I am compiling this branch on Vista at TACC. Things seemed to be going well, and the build got past the point where it failed before. However, there was a long pause here:
[ 78%] Building CUDA object lib/CMakeFiles/quda.dir/dslash_domain_wall_4d_m5pre.cu.o
[ 78%] Building CUDA object lib/CMakeFiles/quda.dir/dslash_domain_wall_4d_m5pre_m5inv.cu.o
[ 79%] Building CUDA object lib/CMakeFiles/quda.dir/dslash_domain_wall_4d_m5inv_m5pre.cu.o
[ 79%] Building CUDA object lib/CMakeFiles/quda.dir/dslash_domain_wall_4d_m5inv_m5inv.cu.o
[ 79%] Building CUDA object lib/CMakeFiles/quda.dir/dslash_domain_wall_4d_m5mob.cu.o
[ 79%] Building CUDA object lib/CMakeFiles/quda.dir/dslash_domain_wall_4d_m5pre_m5mob.cu.o
[ 80%] Building CUDA object lib/CMakeFiles/quda.dir/dslash_pack2.cu.o
[ 80%] Building CUDA object lib/CMakeFiles/quda.dir/laplace.cu.o
[ 80%] Building CUDA object lib/CMakeFiles/quda.dir/covariant_derivative.cu.o
[ 80%] Building CUDA object lib/CMakeFiles/quda.dir/staggered_quark_smearing.cu.o
Then compilation failed in a host reference file:
[ 84%] Building CXX object tests/CMakeFiles/quda_test.dir/host_reference/gauge_force_reference.cpp.o
[ 84%] Building CXX object tests/CMakeFiles/quda_test.dir/utils/misc.cpp.o
"/home1/00282/tg455536/from_frontera/compile_vista/build/_deps/eigen-src/Eigen/src/Core/arch/NEON/Complex.h", line 397: error: statement expressions are only allowed in block scope
    static uint64x2_t p2ul_CONJ_XOR = vld1q_u64( p2ul_conj_XOR_DATA );
                                      ^

1 error detected in the compilation of "/home1/00282/tg455536/from_frontera/compile_vista/quda/tests/host_reference/clover_force_reference.cpp".
make[2]: *** [tests/CMakeFiles/quda_test.dir/build.make:296: tests/CMakeFiles/quda_test.dir/host_reference/clover_force_reference.cpp.o] Error 2
Is this something specific to ARM? I am not sure what NEON refers to.
Thanks, Steve
On Aug 13, 2024, at 4:39 PM, maddyscientist @.***> wrote:
Fixes a bug observed with some compilers (nvc++, rocm clang) where the compiler fails to compile if:
I've also fixed a warning with nvc++ and reduced the argument size for the dilution kernel.
You can view, comment on, or merge this pull request online at:
https://github.com/lattice/quda/pull/1485
Thanks @stevengottlieb for testing this. This is indeed an Arm issue: NEON is Arm's SIMD vector instruction set, the counterpart of SSE on Intel. Moreover, the failure occurs while compiling the Eigen headers. I'll investigate and report back.
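For anyone wondering whether a given build is targeting NEON, the compiler's predefined macros answer that. The sketch below is a hypothetical helper (the name `simd_family` is invented for illustration) that checks `__ARM_NEON`, defined by compilers following the Arm C Language Extensions, and `__SSE2__`, which GCC and Clang define on x86 targets. MSVC uses different macros, so treat this as a GCC/Clang-style check. Since NEON (Advanced SIMD) is a standard part of 64-bit Arm, an AArch64 build will normally report NEON, which is why the NEON-specific Eigen code shows up on Arm machines.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports which SIMD family the current compilation targets,
// based only on predefined compiler macros (no runtime CPU detection).
std::string simd_family() {
#if defined(__ARM_NEON)
  return "NEON (Arm)";   // Arm Advanced SIMD, standard on AArch64
#elif defined(__SSE2__)
  return "SSE2 (x86)";   // GCC/Clang define this on x86 targets with SSE2
#else
  return "none detected";
#endif
}
```

Because the answer is fixed at compile time, the same source reports different results when built for an Arm node and for an x86 node.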
Thanks, Kate! I appreciate your help.
@stevengottlieb it seems this is a bug in Eigen. The patch to apply is to replace
#if EIGEN_COMP_CLANG || EIGEN_COMP_CASTXML
with
#if EIGEN_COMP_CLANG || EIGEN_COMP_CASTXML || __NVCOMPILER_LLVM__
in $PATH_TO_QUDA_BUILD_DIR/_deps/eigen-src/Eigen/src/Core/arch/NEON/Complex.h. Can you verify this fixes the issue for you?
I'm trying to work out the best way to fix this issue in the short term (until Eigen is fixed at source).
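A minimal sketch of what the guarded block does, under an assumption drawn from the error message above: the two branches differ in how the static conjugation mask `p2ul_CONJ_XOR` is initialized. The clang branch presumably uses a plain brace initializer, while the fallback calls `vld1q_u64`, which nvc++ evidently expands into a statement expression, and statement expressions are rejected at namespace scope. To keep the sketch buildable on any CPU, NEON's `uint64x2_t` and `vld1q_u64` are replaced by a hypothetical `U64x2` struct and `load_u64x2` helper, and `conj_by_xor` is an illustrative name, not Eigen's own function.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for NEON's uint64x2_t: two 64-bit lanes.
struct U64x2 {
  std::uint64_t lane[2];
};

// Hypothetical stand-in for vld1q_u64: load two lanes from memory.
// An ordinary function call is fine at namespace scope; the real intrinsic
// failed under nvc++ because it expanded to a statement expression.
inline U64x2 load_u64x2(const std::uint64_t* p) { return U64x2{{p[0], p[1]}}; }

// Mask that flips only the sign bit of lane 1 (the imaginary part).
#if defined(__clang__) || defined(__NVCOMPILER_LLVM__)
// Brace-initializer branch: what clang uses, and what nvc++ uses after the patch.
static const U64x2 kConjXor = {{0x0ULL, 0x8000000000000000ULL}};
#else
// Load-from-array branch: mirrors the form that nvc++ rejected.
static const std::uint64_t kConjXorData[] = {0x0ULL, 0x8000000000000000ULL};
static const U64x2 kConjXor = load_u64x2(kConjXorData);
#endif

// Complex conjugation by XOR-ing the IEEE-754 bit pattern with kConjXor:
// lane 0 (real part) is unchanged, lane 1 (imaginary part) has its sign flipped.
inline std::complex<double> conj_by_xor(std::complex<double> z) {
  double parts[2] = {z.real(), z.imag()};
  std::uint64_t bits[2];
  std::memcpy(bits, parts, sizeof bits);
  bits[0] ^= kConjXor.lane[0];
  bits[1] ^= kConjXor.lane[1];
  std::memcpy(parts, bits, sizeof parts);
  return {parts[0], parts[1]};
}
```

Both branches produce the same mask, so extending the condition with `__NVCOMPILER_LLVM__` only changes which initializer form nvc++ sees, not the result.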
@maddyscientist Thanks, Kate.
I found two such lines in the Complex.h file and applied the fix to both. I then returned to the build directory and typed make -j 32. Everything seems fine now.
Thanks again! Steve
Thanks for confirming the fix works @stevengottlieb. I've updated the build system to apply the patch automatically when using the NVHPC compiler, so this should now work out of the box for you.
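QUDA's actual build-system change is not shown in this thread and may work quite differently. As a hypothetical illustration of the same textual edit, the C++ sketch below (function names `patch_neon_guard` and `patch_file` are invented) extends every `#if EIGEN_COMP_CLANG || EIGEN_COMP_CASTXML` guard, since the header contains more than one, and leaves already-extended lines alone so that re-running it on an existing build tree is harmless.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Returns a copy of `text` in which every guard line that ends exactly with
// "#if EIGEN_COMP_CLANG || EIGEN_COMP_CASTXML" gets " || __NVCOMPILER_LLVM__"
// appended. If `count` is non-null, it receives the number of guards changed.
std::string patch_neon_guard(const std::string& text, std::size_t* count = nullptr) {
  const std::string guard = "#if EIGEN_COMP_CLANG || EIGEN_COMP_CASTXML";
  const std::string extra = " || __NVCOMPILER_LLVM__";
  std::string out;
  std::size_t pos = 0;
  std::size_t patched = 0;
  for (std::size_t hit = text.find(guard, pos); hit != std::string::npos;
       hit = text.find(guard, pos)) {
    const std::size_t end = hit + guard.size();
    out.append(text, pos, end - pos);  // copy up to and including the guard
    // Only extend a guard that ends its line; a patched line continues with " || ...".
    const bool at_eol = end == text.size() || text[end] == '\n' || text[end] == '\r';
    if (at_eol) {
      out += extra;
      ++patched;
    }
    pos = end;
  }
  out.append(text, pos, std::string::npos);  // copy the remainder
  if (count != nullptr) *count = patched;
  return out;
}

// Applies the patch to a file in place; returns false on I/O failure.
bool patch_file(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return false;
  std::ostringstream buf;
  buf << in.rdbuf();
  in.close();
  std::size_t patched = 0;
  const std::string result = patch_neon_guard(buf.str(), &patched);
  if (patched == 0) return true;  // nothing to change (or already patched)
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  out << result;
  return static_cast<bool>(out);
}
```

Pointing `patch_file` at `$PATH_TO_QUDA_BUILD_DIR/_deps/eigen-src/Eigen/src/Core/arch/NEON/Complex.h` would reproduce the manual edit described above; skipping lines that already carry extra conditions is what makes repeated runs safe.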
cscs-ci run
Thanks @maddyscientist. I started a fresh compile yesterday on Vista and noticed that the build completed without my having to edit Complex.h. I was wondering how that came about.
cscs-ci run
Fixes a bug observed with some compilers (nvc++, rocm clang) where the compiler fails to compile if:
complex.h
/complex
header
I've also fixed a warning with nvc++ and reduced the argument size for the dilution kernel.