llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[OpenMP] Host runtime does not lower teams/thread count properly and oversubscribes the system #64991

Open jdoerfert opened 1 year ago

jdoerfert commented 1 year ago

The benchmark from https://github.com/llvm/llvm-project/issues/64959 currently cannot be run on the host without effectively killing the machine. I think we lower the thread count but probably not the team count.

#include <stdio.h>
#define N   10

int main (void)
{
  long int aa=0;
  int res = 0;

  int ng =12;
  int cmom = 14;
  int nxyz = 5000;

  #pragma omp target teams distribute num_teams(nxyz) thread_limit(ng*(cmom-1)) map(tofrom:aa)
  for (int gid = 0; gid < nxyz; gid++) {
    #pragma omp parallel for  collapse(2)
    for (unsigned int g = 0; g < ng; g++) {
      for (unsigned int l = 0; l < cmom-1; l++) {
        int a = 0;
        #pragma omp parallel for reduction(+:a)
        for (int i = 0; i < N; i++) {
          a += i;
        }
        #pragma omp atomic
        aa += a;
      }
    }
  }
  long exp = (long)ng*(cmom-1)*nxyz*(N*(N-1)/2);
  printf ("The result is = %ld exp:%ld!\n", aa,exp);
  if (aa != exp) {
    printf("Failed %ld\n",aa);
    return 1;
  }
  return 0;
}
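
For reference, the arithmetic behind the reproducer can be checked in isolation. The sketch below recomputes the expected result using the same formula as exp above, and the number of threads the num_teams and thread_limit clauses would ask for if the runtime lowered neither. The helper names expected_sum and requested_threads are hypothetical and not part of the reproducer.

```c
#include <assert.h>

/* Hypothetical helper: expected final value of `aa`,
 * using the same formula as `exp` in the reproducer. */
long expected_sum(long ng, long cmom, long nxyz, long n) {
  assert(ng >= 0 && cmom >= 1 && nxyz >= 0 && n >= 1);
  return ng * (cmom - 1) * nxyz * (n * (n - 1) / 2);
}

/* Hypothetical helper: threads requested by
 * num_teams(nxyz) and thread_limit(ng*(cmom-1))
 * if the runtime lowers neither value. */
long requested_threads(long ng, long cmom, long nxyz) {
  assert(ng >= 0 && cmom >= 1 && nxyz >= 0);
  return nxyz * (ng * (cmom - 1));
}
```

With the reproducer's values (ng = 12, cmom = 14, nxyz = 5000, N = 10), expected_sum gives 35100000 and requested_threads gives 780000, i.e. 5000 teams of up to 156 threads each.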

shiltian commented 1 year ago

That is not because of the teams. We already properly cap the team size and the number of teams. In this case, we create N teams with 1 thread each, where N is the number of threads the system has (not the N macro from the reproducer). The issue is the nested parallel region: every thread creates its own parallel region. Note that since each of those threads is in its own team, even if we set OMP_MAX_ACTIVE_LEVELS=1, there will still be N*N threads.
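
The thread count described above can be written down as a small back-of-the-envelope model. This is only a sketch of the arithmetic, not code from the OpenMP runtime. The function name nested_host_threads is hypothetical, and it assumes each nested parallel region is given as many threads as the system has hardware threads.

```c
#include <assert.h>

/* Hypothetical model, not runtime code: estimates how many host
 * threads exist once every single-thread team has opened its own
 * nested parallel region. */
long nested_host_threads(long hw_threads) {
  assert(hw_threads >= 1);
  long teams = hw_threads;            /* number of teams after capping */
  long threads_per_team = 1;          /* team size after capping */
  /* Assumption: each nested parallel region forks hw_threads threads. */
  long inner_region_width = hw_threads;
  return teams * threads_per_team * inner_region_width;
}
```

Under these assumptions, a machine with 64 hardware threads ends up with 64 * 1 * 64 = 4096 threads, which is the N*N oversubscription.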