llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.37k stars 11.71k forks source link

OpenMP offload Program with atomic does not finished when build with -O3 of AMD GPUs #94047

Closed fel-cab closed 2 months ago

fel-cab commented 4 months ago

On a MI250x

The following program doesn't finish when compiled as: clang -fopenmp -O3 --offload-arch=gfx90a

But it does finish when compiled as: clang -fopenmp --offload-arch=gfx90a

#include <stdio.h>
#include <stdlib.h>

#pragma omp requires atomic_default_mem_order(acq_rel)

int main() {

  int x = 0, y = 0;
  int errors = 0;

#pragma omp target parallel num_threads(2) map(tofrom: x, y, errors)
   {
       int thrd = omp_get_thread_num();
       if (thrd == 0) {
          x = 10;
          #pragma omp atomic write 
          y = 1;
       }
       if (thrd == 1){
          int tmp = 0;
          while (tmp == 0) {
            #pragma omp atomic read 
            tmp = y;
          }
          if(x != 10) errors++;
       }
   }
   if(errors)
     printf("Test Failed\n");
   else
     printf("Test Passed\n");

   return errors;
}
llvmbot commented 4 months ago

@llvm/issue-subscribers-openmp

Author: Felipe Cabarcas (fel-cab)

On a MI250x The following program doesn't finish when compiled as: `clang -fopenmp -O3 --offload-arch=gfx90a ` But it does finish when compiled as: `clang -fopenmp --offload-arch=gfx90a ` ```#include <omp.h> #include <stdio.h> #include <stdlib.h> #pragma omp requires atomic_default_mem_order(acq_rel) int main() { int x = 0, y = 0; int errors = 0; #pragma omp target parallel num_threads(2) map(tofrom: x, y, errors) { int thrd = omp_get_thread_num(); if (thrd == 0) { x = 10; #pragma omp atomic write y = 1; } if (thrd == 1){ int tmp = 0; while (tmp == 0) { #pragma omp atomic read tmp = y; } if(x != 10) errors++; } } if(errors) printf("Test Failed\n"); else printf("Test Passed\n"); return errors; } ```
llvmbot commented 4 months ago

@llvm/issue-subscribers-offload

Author: Felipe Cabarcas (fel-cab)

On a MI250x The following program doesn't finish when compiled as: `clang -fopenmp -O3 --offload-arch=gfx90a ` But it does finish when compiled as: `clang -fopenmp --offload-arch=gfx90a ` ```#include <omp.h> #include <stdio.h> #include <stdlib.h> #pragma omp requires atomic_default_mem_order(acq_rel) int main() { int x = 0, y = 0; int errors = 0; #pragma omp target parallel num_threads(2) map(tofrom: x, y, errors) { int thrd = omp_get_thread_num(); if (thrd == 0) { x = 10; #pragma omp atomic write y = 1; } if (thrd == 1){ int tmp = 0; while (tmp == 0) { #pragma omp atomic read tmp = y; } if(x != 10) errors++; } } if(errors) printf("Test Failed\n"); else printf("Test Passed\n"); return errors; } ```
llvmbot commented 4 months ago

@llvm/issue-subscribers-bug

Author: Felipe Cabarcas (fel-cab)

On a MI250x The following program doesn't finish when compiled as: `clang -fopenmp -O3 --offload-arch=gfx90a ` But it does finish when compiled as: `clang -fopenmp --offload-arch=gfx90a ` ```#include <omp.h> #include <stdio.h> #include <stdlib.h> #pragma omp requires atomic_default_mem_order(acq_rel) int main() { int x = 0, y = 0; int errors = 0; #pragma omp target parallel num_threads(2) map(tofrom: x, y, errors) { int thrd = omp_get_thread_num(); if (thrd == 0) { x = 10; #pragma omp atomic write y = 1; } if (thrd == 1){ int tmp = 0; while (tmp == 0) { #pragma omp atomic read tmp = y; } if(x != 10) errors++; } } if(errors) printf("Test Failed\n"); else printf("Test Passed\n"); return errors; } ```
shiltian commented 2 months ago

This is a backend issue.

llvmbot commented 2 months ago

@llvm/issue-subscribers-backend-amdgpu

Author: Felipe Cabarcas (fel-cab)

On a MI250x The following program doesn't finish when compiled as: `clang -fopenmp -O3 --offload-arch=gfx90a ` But it does finish when compiled as: `clang -fopenmp --offload-arch=gfx90a ` ```#include <omp.h> #include <stdio.h> #include <stdlib.h> #pragma omp requires atomic_default_mem_order(acq_rel) int main() { int x = 0, y = 0; int errors = 0; #pragma omp target parallel num_threads(2) map(tofrom: x, y, errors) { int thrd = omp_get_thread_num(); if (thrd == 0) { x = 10; #pragma omp atomic write y = 1; } if (thrd == 1){ int tmp = 0; while (tmp == 0) { #pragma omp atomic read tmp = y; } if(x != 10) errors++; } } if(errors) printf("Test Failed\n"); else printf("Test Passed\n"); return errors; } ```
jdoerfert commented 2 months ago

This code cannot reliably work. You are trying to synchronize within a warp on an AMDGPU, that does not work. OpenMP 6.0 makes this explicit. If you want to test atomics, split the reader and writer into different warps at least.