llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[OpenMP] miscompilation offload to amdgpu #59759

Closed ye-luo closed 1 year ago

ye-luo commented 1 year ago

On gfx90a with the ROCm 5.4.0 device libs, using Clang at a63b7247299ce6edfbbf47c4a2773e5ca7eb7f11. Reproduced with https://github.com/ye-luo/miniqmc at testing commit 6f526b6062682ec892fb02d2919484c8b4db0875:

mkdir build_llvm_offload_cuda2hip_real; cd build_llvm_offload_cuda2hip_real
cmake -DCMAKE_CXX_COMPILER=clang++ -DENABLE_OFFLOAD=ON -DOFFLOAD_TARGET=amdgcn-amdhsa -DOFFLOAD_ARCH=gfx90a ..
make -j32 test_omptarget_blas
./src/Platforms/tests/OMPTarget/test_omptarget_blas

The failure is sporadic. The results should be integer values stored in floating-point variables, but some of them are not.

-------------------------------------------------------------------------------
OmpBLAS gemv
-------------------------------------------------------------------------------
/ccs/home/yeluo/test/miniqmc/src/Platforms/tests/OMPTarget/test_omp_BLAS.cpp:179
...............................................................................

/ccs/home/yeluo/test/miniqmc/src/Platforms/tests/OMPTarget/test_omp_BLAS.cpp:175: FAILED:
  CHECK( Cs[batch][index] == Ds[batch][index] )
with expansion:
  586417.0317596535 == 586417.0

/ccs/home/yeluo/test/miniqmc/src/Platforms/tests/OMPTarget/test_omp_BLAS.cpp:175: FAILED:
  CHECK( Cs[batch][index] == Ds[batch][index] )
with expansion:
  728143.0635398587 == 728143.0

===============================================================================
test cases:    1 |    0 passed | 1 failed
assertions: 6576 | 6574 passed | 2 failed
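To make the "integer stored in floats" expectation concrete, here is a minimal sketch of how such a result could be screened for a fractional part. The helper names `has_fractional_part` and `count_non_integral` are hypothetical and are not part of miniqmc; the actual test compares `Cs` against the reference `Ds` instead.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from miniqmc): true when v is not an exact
// integer value. NaN is also reported as non-integral.
inline bool has_fractional_part(double v)
{
  return std::trunc(v) != v;
}

// Hypothetical helper: counts the entries that are not exact integers.
inline std::size_t count_non_integral(const std::vector<double>& values)
{
  std::size_t count = 0;
  for (double v : values)
    if (has_fractional_part(v))
      ++count;
  return count;
}
```

With the values from the log above, `has_fractional_part(586417.0317596535)` flags the corrupted entry while `has_fractional_part(586417.0)` does not, so a screen like this could catch the bad results even without the reference array.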

Interestingly, if I edit

diff --git a/src/Platforms/OMPTarget/ompBLAS.cpp b/src/Platforms/OMPTarget/ompBLAS.cpp
index ce895f0..ca9d395 100644
--- a/src/Platforms/OMPTarget/ompBLAS.cpp
+++ b/src/Platforms/OMPTarget/ompBLAS.cpp
@@ -93,7 +93,6 @@ ompBLAS_status gemv(ompBLAS_handle&     handle,
   return gemv_impl(handle, trans, m, n, alpha, A, lda, x, incx, beta, y, incy);
 }

-#if !defined(OPENMP_NO_COMPLEX)
 ompBLAS_status gemv(ompBLAS_handle&                  handle,
                     const char                       trans,
                     const int                        m,
@@ -125,7 +124,6 @@ ompBLAS_status gemv(ompBLAS_handle&                   handle,
 {
   return gemv_impl(handle, trans, m, n, alpha, A, lda, x, incx, beta, y, incy);
 }
-#endif

which basically compiles a few more unused offload regions, then test_omptarget_blas passes reliably.

Even with the above workaround, if I add -DCMAKE_CXX_FLAGS=-foffload-lto to the CMake configuration, the test goes back to failing.

llvmbot commented 1 year ago

@llvm/issue-subscribers-openmp