lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

Hotfix/ccomplex #1485

Closed maddyscientist closed 3 weeks ago

maddyscientist commented 1 month ago

Fixes a bug observed with some compilers (nvc++, rocm clang) where the compiler fails to compile if:

I've also fixed a warning with nvc++ and reduced the argument size for the dilution kernel.

stevengottlieb commented 1 month ago

I am compiling this branch on Vista at TACC. Things seemed to be going well, getting past the point where there was an error before. However, there was a long pause here:

[ 78%] Building CUDA object lib/CMakeFiles/quda.dir/dslash_domain_wall_4d_m5pre.cu.o
[ 78%] Building CUDA object lib/CMakeFiles/quda.dir/dslash_domain_wall_4d_m5pre_m5inv.cu.o
[ 79%] Building CUDA object lib/CMakeFiles/quda.dir/dslash_domain_wall_4d_m5inv_m5pre.cu.o
[ 79%] Building CUDA object lib/CMakeFiles/quda.dir/dslash_domain_wall_4d_m5inv_m5inv.cu.o
[ 79%] Building CUDA object lib/CMakeFiles/quda.dir/dslash_domain_wall_4d_m5mob.cu.o
[ 79%] Building CUDA object lib/CMakeFiles/quda.dir/dslash_domain_wall_4d_m5pre_m5mob.cu.o
[ 80%] Building CUDA object lib/CMakeFiles/quda.dir/dslash_pack2.cu.o
[ 80%] Building CUDA object lib/CMakeFiles/quda.dir/laplace.cu.o
[ 80%] Building CUDA object lib/CMakeFiles/quda.dir/covariant_derivative.cu.o
[ 80%] Building CUDA object lib/CMakeFiles/quda.dir/staggered_quark_smearing.cu.o

Then compilation failed on a host reference function:

[ 84%] Building CXX object tests/CMakeFiles/quda_test.dir/host_reference/gauge_force_reference.cpp.o
[ 84%] Building CXX object tests/CMakeFiles/quda_test.dir/utils/misc.cpp.o
"/home1/00282/tg455536/from_frontera/compile_vista/build/_deps/eigen-src/Eigen/src/Core/arch/NEON/Complex.h", line 397: error: statement expressions are only allowed in block scope
  static uint64x2_t p2ul_CONJ_XOR = vld1q_u64( p2ul_conj_XOR_DATA );
  ^

1 error detected in the compilation of "/home1/00282/tg455536/from_frontera/compile_vista/quda/tests/host_reference/clover_force_reference.cpp".
make[2]: *** [tests/CMakeFiles/quda_test.dir/build.make:296: tests/CMakeFiles/quda_test.dir/host_reference/clover_force_reference.cpp.o] Error 2

Is this something specific to ARM? I am not sure what NEON refers to.

Thanks, Steve

In reply to maddyscientist's comment of Aug 13, 2024, 4:39 PM.

maddyscientist commented 1 month ago

Thanks @stevengottlieb for testing this. This is indeed an Arm issue: NEON is Arm's vector (SIMD) instruction set, analogous to SSE on Intel. Moreover, I see the failure occurs when compiling the Eigen headers. I'll investigate and report back.

stevengottlieb commented 1 month ago

Thanks, Kate! I appreciate your help.

In reply to maddyscientist's comment of Aug 14, 2024, 1:30 PM.

maddyscientist commented 1 month ago

@stevengottlieb this is a bug in Eigen it seems. The patch to apply is to replace

#if EIGEN_COMP_CLANG || EIGEN_COMP_CASTXML

with

#if EIGEN_COMP_CLANG || EIGEN_COMP_CASTXML || __NVCOMPILER_LLVM__

in $PATH_TO_QUDA_BUILD_DIR/_deps/eigen-src/Eigen/src/Core/arch/NEON/Complex.h. Can you verify this fixes the issue for you?

I'm trying to work out the best way to fix this issue in the short term (until Eigen is fixed at source).
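As an illustration only, the manual edit described above could be scripted. The following is a minimal Python sketch, not the mechanism used by the QUDA build system. The helper names `patch_guards` and `patch_file` are hypothetical, and the sketch assumes each guard sits on a line of its own, exactly as quoted above.

```python
from pathlib import Path
from typing import Tuple

# Guard lines as quoted in the comment above.
OLD_GUARD = "#if EIGEN_COMP_CLANG || EIGEN_COMP_CASTXML"
NEW_GUARD = OLD_GUARD + " || __NVCOMPILER_LLVM__"


def patch_guards(text: str) -> Tuple[str, int]:
    """Extend every exact occurrence of OLD_GUARD; return (new_text, count)."""
    patched_lines = []
    count = 0
    for line in text.splitlines(keepends=True):
        # Compare the stripped line so indentation and line endings are kept,
        # and so lines that were already patched are left alone.
        if line.strip() == OLD_GUARD:
            line = line.replace(OLD_GUARD, NEW_GUARD, 1)
            count += 1
        patched_lines.append(line)
    return "".join(patched_lines), count


def patch_file(path: str) -> int:
    """Patch the header at `path` in place; return the number of lines changed."""
    header = Path(path)
    new_text, count = patch_guards(header.read_text())
    if count:
        header.write_text(new_text)
    return count
```

Under these assumptions, `patch_file` would be called with the path to `Complex.h` given above. Because only exact matches of the old guard are rewritten, running it a second time leaves the file unchanged.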

stevengottlieb commented 1 month ago

@maddyscientist Thanks, Kate.

I found two such lines in the Complex.h file and applied the fix to both. I then returned to the build directory and typed make -j 32. Everything seems fine now.

Thanks again! Steve

In reply to maddyscientist's comment of Aug 14, 2024, 2:59 PM.

maddyscientist commented 3 weeks ago

Thanks for confirming the fix works @stevengottlieb. I've updated the build system to now apply the patch automatically if using the NVHPC compiler, so this should now work out of the box for you.

maddyscientist commented 3 weeks ago

cscs-ci run

maddyscientist commented 3 weeks ago

cscs-ci run

stevengottlieb commented 3 weeks ago

Thanks @maddyscientist. I started a fresh compile yesterday on Vista and noticed that the build completed without my having to edit the Complex.h. I was wondering how that came about.

In reply to maddyscientist's comment of Aug 18, 2024, 1:24 PM.

maddyscientist commented 3 weeks ago

cscs-ci run

maddyscientist commented 3 weeks ago

cscs-ci run