valassi opened 2 years ago
Unfortunately `echo | nvc++ -dM -E -` does not work, while `echo | gcc -dM -E -` does.
Silly workaround, just do:
touch /tmp/dummy.cc
nvc++ -E -dM /tmp/dummy.cc
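The dummy-file trick can be wrapped in a tiny helper so the same incantation works for any compiler; a minimal sketch (the function name `dump_macros` is mine, not from any toolchain):

```shell
# Sketch: dump a compiler's predefined macros without relying on '-' as
# a stdin source (which nvc++ rejects). Preprocess an empty temporary
# .cc file instead, which works for gcc, clang and nvc++ alike.
dump_macros() {
  cxx="$1"
  tmp="$(mktemp)" && mv "$tmp" "$tmp.cc" && tmp="$tmp.cc"  # empty translation unit
  "$cxx" -E -dM "$tmp"
  rc=$?
  rm -f "$tmp"
  return $rc
}
# usage: dump_macros g++ | grep -E '(SSE|AVX)'
```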
But ok, here's the real problem:
[p6asv6bp@rl-cpu-r82-u02 -bash] ~ > g++ -march=nehalem -E -dM /tmp/dummy.cc | egrep '(SSE|AVX)'
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSE2_MATH__ 1
#define __SSE2__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE_MATH__ 1
#define __SSE3__ 1
[p6asv6bp@rl-cpu-r82-u02 -bash] ~ > g++ -march=haswell -E -dM /tmp/dummy.cc | egrep '(SSE|AVX)'
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSE2_MATH__ 1
#define __AVX__ 1
#define __AVX2__ 1
#define __SSE2__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE_MATH__ 1
#define __SSE3__ 1
[p6asv6bp@rl-cpu-r82-u02 -bash] ~ > g++ -march=skylake-avx512 -E -dM /tmp/dummy.cc | egrep '(SSE|AVX)'
#define __AVX512F__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __AVX512BW__ 1
#define __SSE2_MATH__ 1
#define __AVX__ 1
#define __AVX512DQ__ 1
#define __AVX512VL__ 1
#define __AVX512CD__ 1
#define __AVX2__ 1
#define __SSE2__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE_MATH__ 1
#define __SSE3__ 1
[p6asv6bp@rl-cpu-r82-u02 -bash] ~ > nvc++ -E -dM /tmp/dummy.cc | egrep '(SSE|AVX)'
#define __PGI_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __NVCOMPILER_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSSE3__ 1
#define __SSE4A__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __AVX__ 1
#define __AVX2__ 1
That is to say, nvc++ by default switches on SSE4 and AVX2 (for this machine maybe?). The strategy that I have been assuming relies on the fact that by default no SIMD is used, and you must add -march explicitly to switch on a given level of SIMD. This works fine for g++, clang++ and icpx.
I will need to think of a more radical way to fix this. (And maybe SIMD must simply be switched off if CVEs are not supported?...)
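One possible sanity check: have the build inspect the compiler's default macros and classify the SIMD level it implies, so a compiler like nvc++ that defaults to AVX2 can be detected. A minimal sketch, assuming the standard GCC/Clang macro names shown above (the tier names are my own choice; note that baseline x86-64 always defines `__SSE2__`, so "no SIMD" really means "nothing above SSE2"):

```shell
# Sketch: read '<compiler> -E -dM /tmp/dummy.cc' output from stdin and
# print the highest SIMD tier it implies.
simd_level() {
  macros=$(cat)
  if   printf '%s\n' "$macros" | grep -q '__AVX512F__'; then echo avx512
  elif printf '%s\n' "$macros" | grep -q '__AVX2__';    then echo avx2
  elif printf '%s\n' "$macros" | grep -q '__SSE4_2__';  then echo sse4
  elif printf '%s\n' "$macros" | grep -q '__SSE2__';    then echo sse2
  else echo none
  fi
}
# usage: nvc++ -E -dM /tmp/dummy.cc | simd_level
```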
I built a patch around this:
[p6asv6bp@rl-cpu-r82-u02 -bash] ~/madgraph4gpu/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx > nvc++ -mno-sse3 -E -dM /tmp/dummy.cc | egrep '(SSE|AVX)'
#define __PGI_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __NVCOMPILER_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __SSE__ 1
#define __SSE2__ 1
[p6asv6bp@rl-cpu-r82-u02 -bash] ~/madgraph4gpu/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx > nvc++ -mno-avx -E -dM /tmp/dummy.cc | egrep '(SSE|AVX)'
#define __PGI_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __NVCOMPILER_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSSE3__ 1
#define __SSE4A__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
[p6asv6bp@rl-cpu-r82-u02 -bash] ~/madgraph4gpu/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx > nvc++ -march=haswell -E -dM /tmp/dummy.cc | egrep '(SSE|AVX)'
#define __PGI_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __NVCOMPILER_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSSE3__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __AVX__ 1
#define __AVX2__ 1
[p6asv6bp@rl-cpu-r82-u02 -bash] ~/madgraph4gpu/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx > nvc++ -march=skylake -E -dM /tmp/dummy.cc | egrep '(SSE|AVX)'
#define __PGI_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __NVCOMPILER_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSSE3__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __AVX__ 1
#define __AVX2__ 1
#define __AVX512F__ 1
#define __AVX512CD__ 1
#define __AVX512VL__ 1
#define __AVX512BW__ 1
#define __AVX512DQ__ 1
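From these experiments, a patch could map the build's SIMD levels to nvc++ flags roughly as follows. This is a sketch, not the final patch; the level names (`none`/`sse4`/`avx2`/`512y`/`512z`) are the cudacpp AVX tags as I understand them, and note that nvc++ apparently cannot go below SSE2 at all:

```shell
# Sketch: SIMD level -> nvc++ flags, per the -E -dM experiments above.
# Since nvc++ defaults to AVX2, lower levels must be forced off with
# -mno-* flags rather than assumed off as with g++/clang++/icpx.
nvcpp_simd_flags() {
  case "$1" in
    none)      echo "-mno-sse3" ;;      # leaves only SSE/SSE2 on (nvc++ minimum)
    sse4)      echo "-mno-avx" ;;       # SSE up to SSE4.2, no AVX
    avx2)      echo "-march=haswell" ;;
    512y|512z) echo "-march=skylake" ;; # skylake here enables the AVX512 macros
    *)         echo "" ;;
  esac
}
```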
Next I had a link error
I fixed that; now curand.h is missing. I only see curand.mod in /sw/packages/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/nvhpc-22.5-dxjyivdpfqgrgis6nn57piosdjwqakcr/Linux_x86_64/22.5/compilers//include/curand.mod. I think that there is no way out of this one?...
Solved the curand issue by loading nvc++ first and cuda116 afterwards.
Next one:
ccache /sw/packages/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/cuda-11.6.1-w4akflar2kvbl2ix6mw3qbols7kbrxy6/bin/nvcc -o runTest.exe ./CPPProcess.o ./MatrixElementKernels.o ./BridgeKernels.o ./CrossSectionKernels.o ./RandomNumberKernels.o ./RamboSamplingKernels.o ./testxxx.o ./testmisc.o ./runTest.o ./gCPPProcess.o ./gMatrixElementKernels.o ./gBridgeKernels.o ./gCrossSectionKernels.o ./gRandomNumberKernels.o ./gRamboSamplingKernels.o ./testxxx_cu.o ./testmisc_cu.o ./runTest_cu.o -ldl -L../../lib -lmg5amc_common -L/sw/packages/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/nvhpc-22.5-dxjyivdpfqgrgis6nn57piosdjwqakcr/Linux_x86_64/22.5/compilers/lib -lnvhpcatm -lnvcpumath -lnvc -L../../../../../test/googletest/build/lib/ -lgtest -lgtest_main -Xlinker -rpath,'$ORIGIN/../../lib' -lcuda -lgomp -L/sw/packages/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/cuda-11.6.1-w4akflar2kvbl2ix6mw3qbols7kbrxy6/lib64/ -lcurand
/usr/bin/ld: ../../../../../test/googletest/build/lib//libgtest.a(gtest-all.cc.o): relocation R_X86_64_32S against `.data' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: ../../../../../test/googletest/build/lib//libgtest_main.a(gtest_main.cc.o): relocation R_X86_64_32 against `.rodata' can not be used when making a PIE object; recompile with -fPIE
Ouf, this switches googletest to a shared library:
cd googletest/build && cmake -DBUILD_GMOCK=OFF -DBUILD_SHARED_LIBS=ON ../
However, I then still get errors, and ESPECIALLY `make gtest` is no longer a no-op?! It touches files every time, forcing rebuilds of all tests!
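A possible alternative that might avoid both the PIE link error and the rebuild churn of shared libraries: keep googletest static but compile it position-independent, since CMake's `CMAKE_POSITION_INDEPENDENT_CODE` adds `-fPIC` to the static archive's objects. This is a sketch under that assumption, not a tested fix for this build:

```shell
# Sketch: rebuild static libgtest.a/libgtest_main.a with -fPIC so they
# can be linked into a PIE executable, without BUILD_SHARED_LIBS=ON.
cd googletest/build &&
cmake -DBUILD_GMOCK=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON ../ &&
make
```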
The next error is:
/sw/packages/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/nvhpc-22.5-dxjyivdpfqgrgis6nn57piosdjwqakcr/Linux_x86_64/22.5/compilers/bin/nvfortran -I. -c fcheck_sa.f -o fcheck_sa.o
NVFORTRAN-S-0038-Symbol, isnan, has not been explicitly declared (fcheck_sa.f)
This should be tested by installing nvc++ at CERN on itscrd70. It was clear that raplab had other issues, not just nvc++. And we were told that nvc++ is here to stay, so we'd better support it. Keep this open.
Note: this is also related to the old MR #319, which is obsolete, and which I will therefore close. That itself is related to #318.
I am keeping this one open, low priority.
For the hackathon on raplab: add support for nvc++
(The main reason we moved here is to have a Fortran compiler, and the same one on the login nodes and the GPU nodes.)
A few things this implies: