On gfx90a with the ROCm 5.4.0 device libs. Clang at commit a63b7247299ce6edfbbf47c4a2773e5ca7eb7f11.
Using https://github.com/ye-luo/miniqmc, testing commit 6f526b6062682ec892fb02d2919484c8b4db0875:
mkdir build_llvm_offload_cuda2hip_real; cd build_llvm_offload_cuda2hip_real
cmake -DCMAKE_CXX_COMPILER=clang++ -DENABLE_OFFLOAD=ON -DOFFLOAD_TARGET=amdgcn-amdhsa -DOFFLOAD_ARCH=gfx90a ..
make -j32 test_omptarget_blas
./src/Platforms/tests/OMPTarget/test_omptarget_blas
The failure is sporadic. The results should be integer values stored in floats, but they are not.
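Since the failure is sporadic, a single run says little. A minimal sketch for counting failures over repeated runs follows; the helper name `count_failures` and the run count are illustrative assumptions, not part of miniqmc, and it assumes the test binary exits non-zero on failure.

```shell
# Hypothetical helper: count_failures N CMD [ARGS...]
# Runs CMD N times, discards its output, and prints how many runs
# exited with a non-zero status.
count_failures() {
  runs=$1
  shift
  failures=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # Count the run as a failure if the command exits non-zero.
    "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "$failures"
}
```

From the build directory it could be used as `count_failures 20 ./src/Platforms/tests/OMPTarget/test_omptarget_blas`, where 20 is an arbitrary number of runs.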
Interestingly, if I edit
which basically compiles a few more unused offload regions, test_omptarget_blas passes reliably.
Even with the above workaround, if I add
-DCMAKE_CXX_FLAGS=-foffload-lto
to the CMake command line, the test fails again.