madgraph5 / madgraph4gpu

GPU development for the Madgraph5_aMC@NLO event generator software package

add support for nvc++ #531

Open valassi opened 2 years ago

valassi commented 2 years ago

For the hackathon on raplab: add support for nvc++

(The main reason why we moved here is to have a Fortran compiler, and the same environment on login nodes and GPU nodes)

A few things this implies

valassi commented 2 years ago

Unfortunately this does not work:

echo | nvc++ -dM -E -

while this works:

echo | gcc -dM -E -

valassi commented 2 years ago

silly, just do

touch /tmp/dummy.cc
nvc++ -E -dM /tmp/dummy.cc
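
The workaround can be wrapped in a small helper so that the same call works for g++ and nvc++. This is only a sketch: `dump_macros` is a hypothetical name, not something that exists in madgraph4gpu, and it assumes `mktemp -d` is available.

```shell
# Hypothetical helper, not part of madgraph4gpu: list a compiler's predefined
# macros by preprocessing an empty temporary file instead of reading stdin.
dump_macros() (
  cxx="$1"; shift                        # compiler command; the rest are extra flags
  tmpdir="$(mktemp -d)" || exit 1
  : > "$tmpdir/dummy.cc"                 # create an empty source file
  "$cxx" "$@" -E -dM "$tmpdir/dummy.cc"
  rc=$?
  rm -rf "$tmpdir"
  exit "$rc"                             # the function body is a subshell
)

# Usage:
#   dump_macros g++ -march=haswell | grep -E '(SSE|AVX)'
#   dump_macros nvc++ | grep -E '(SSE|AVX)'
```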

But ok, here's the real problem:

[p6asv6bp@rl-cpu-r82-u02 -bash] ~ > g++ -march=nehalem -E -dM /tmp/dummy.cc | egrep '(SSE|AVX)'
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSE2_MATH__ 1
#define __SSE2__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE_MATH__ 1
#define __SSE3__ 1
[p6asv6bp@rl-cpu-r82-u02 -bash] ~ > g++ -march=haswell -E -dM /tmp/dummy.cc | egrep '(SSE|AVX)'
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSE2_MATH__ 1
#define __AVX__ 1
#define __AVX2__ 1
#define __SSE2__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE_MATH__ 1
#define __SSE3__ 1
[p6asv6bp@rl-cpu-r82-u02 -bash] ~ > g++ -march=skylake-avx512 -E -dM /tmp/dummy.cc | egrep '(SSE|AVX)'
#define __AVX512F__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __AVX512BW__ 1
#define __SSE2_MATH__ 1
#define __AVX__ 1
#define __AVX512DQ__ 1
#define __AVX512VL__ 1
#define __AVX512CD__ 1
#define __AVX2__ 1
#define __SSE2__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE_MATH__ 1
#define __SSE3__ 1
[p6asv6bp@rl-cpu-r82-u02 -bash] ~ > nvc++ -E -dM /tmp/dummy.cc | egrep '(SSE|AVX)'
#define __PGI_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __NVCOMPILER_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSSE3__ 1
#define __SSE4A__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __AVX__ 1
#define __AVX2__ 1

That is to say, nvc++ by default switches on SSE4 and AVX2 (for this machine maybe?). The strategy that I have been assuming relies on the fact that by default no SIMD is used, and you must add -march explicitly to switch on a given level of SIMD. This works fine for g++, clang++ and icpx.

I will need to think of a more radical way to fix this. (And maybe SIMD must simply be switched off if compiler vector extensions, CVEs, are not supported?...)
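
One possible direction, sketched here only as an illustration: instead of assuming that no `-march` means no SIMD, ask the compiler which macros it actually defines and classify the result. The function name `simd_level` and the level labels are assumptions made for this example, not names from the madgraph4gpu build.

```shell
# Hypothetical helper: read a '-E -dM' macro dump on stdin and print the
# highest SIMD level it enables (labels are illustrative only).
simd_level() {
  awk '
    $1 == "#define" && $2 == "__AVX512F__" { avx512 = 1 }
    $1 == "#define" && $2 == "__AVX2__"    { avx2 = 1 }
    $1 == "#define" && $2 == "__AVX__"     { avx = 1 }
    $1 == "#define" && $2 == "__SSE4_2__"  { sse42 = 1 }
    END {
      if (avx512)     print "avx512"
      else if (avx2)  print "avx2"
      else if (avx)   print "avx"
      else if (sse42) print "sse4.2"
      else            print "baseline"
    }'
}

# Usage:
#   nvc++ -E -dM /tmp/dummy.cc | simd_level
```

Fed the default nvc++ dump shown above, this prints avx2, which is exactly the case the current strategy does not expect.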

valassi commented 2 years ago

I built a patch around this

[p6asv6bp@rl-cpu-r82-u02 -bash] ~/madgraph4gpu/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx > nvc++ -mno-sse3 -E -dM /tmp/dummy.cc | egrep '(SSE|AVX)'
#define __PGI_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __NVCOMPILER_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __SSE__ 1
#define __SSE2__ 1

[p6asv6bp@rl-cpu-r82-u02 -bash] ~/madgraph4gpu/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx > nvc++ -mno-avx -E -dM /tmp/dummy.cc | egrep '(SSE|AVX)'
#define __PGI_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __NVCOMPILER_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSSE3__ 1
#define __SSE4A__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1

[p6asv6bp@rl-cpu-r82-u02 -bash] ~/madgraph4gpu/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx > nvc++ -march=haswell -E -dM /tmp/dummy.cc | egrep '(SSE|AVX)'
#define __PGI_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __NVCOMPILER_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSSE3__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __AVX__ 1
#define __AVX2__ 1

[p6asv6bp@rl-cpu-r82-u02 -bash] ~/madgraph4gpu/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx > nvc++ -march=skylake -E -dM /tmp/dummy.cc | egrep '(SSE|AVX)'
#define __PGI_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __NVCOMPILER_CLANG_SSE_INTRINSICS_VERSION__ 110000
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSSE3__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __AVX__ 1
#define __AVX2__ 1
#define __AVX512F__ 1
#define __AVX512CD__ 1
#define __AVX512VL__ 1
#define __AVX512BW__ 1
#define __AVX512DQ__ 1
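
The four dumps above suggest a mapping from a requested SIMD level to nvc++ flags. The following is a minimal sketch of that mapping, based only on those dumps; the function name `nvcxx_simd_flags` and the level names are hypothetical, and the actual patch may differ.

```shell
# Hypothetical mapping from a requested SIMD level to nvc++ flags, derived
# only from the macro dumps above (level names are illustrative).
nvcxx_simd_flags() {
  case "$1" in
    none)   echo "-mno-sse3" ;;       # leaves only __SSE__ and __SSE2__
    sse4)   echo "-mno-avx" ;;        # keeps up to __SSE4_2__, no AVX macros
    avx2)   echo "-march=haswell" ;;  # up to __AVX2__
    avx512) echo "-march=skylake" ;;  # adds the __AVX512*__ macros
    *)      echo "unknown SIMD level: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   nvc++ $(nvcxx_simd_flags sse4) -E -dM /tmp/dummy.cc | egrep '(SSE|AVX)'
```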

valassi commented 2 years ago

Next I had a link error

I fixed that. Now curand.h is missing: I only see curand.mod, in /sw/packages/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/nvhpc-22.5-dxjyivdpfqgrgis6nn57piosdjwqakcr/Linux_x86_64/22.5/compilers//include/curand.mod. I think that there is no way out of this one?...

valassi commented 2 years ago

solved curand by loading nvc++ and afterwards cuda116.

Next one

ccache /sw/packages/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/cuda-11.6.1-w4akflar2kvbl2ix6mw3qbols7kbrxy6/bin/nvcc -o runTest.exe ./CPPProcess.o ./MatrixElementKernels.o ./BridgeKernels.o ./CrossSectionKernels.o ./RandomNumberKernels.o ./RamboSamplingKernels.o ./testxxx.o  ./testmisc.o  ./runTest.o ./gCPPProcess.o ./gMatrixElementKernels.o ./gBridgeKernels.o ./gCrossSectionKernels.o ./gRandomNumberKernels.o ./gRamboSamplingKernels.o ./testxxx_cu.o  ./testmisc_cu.o  ./runTest_cu.o -ldl -L../../lib -lmg5amc_common -L/sw/packages/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/nvhpc-22.5-dxjyivdpfqgrgis6nn57piosdjwqakcr/Linux_x86_64/22.5/compilers/lib -lnvhpcatm -lnvcpumath -lnvc -L../../../../../test/googletest/build/lib/ -lgtest -lgtest_main -Xlinker -rpath,'$ORIGIN/../../lib'  -lcuda -lgomp -L/sw/packages/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/cuda-11.6.1-w4akflar2kvbl2ix6mw3qbols7kbrxy6/lib64/ -lcurand 
/usr/bin/ld: ../../../../../test/googletest/build/lib//libgtest.a(gtest-all.cc.o): relocation R_X86_64_32S against `.data' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: ../../../../../test/googletest/build/lib//libgtest_main.a(gtest_main.cc.o): relocation R_X86_64_32 against `.rodata' can not be used when making a PIE object; recompile with -fPIE

valassi commented 2 years ago

Ouf, this builds a shared library:

        cd googletest/build && cmake -DBUILD_GMOCK=OFF -DBUILD_SHARED_LIBS=ON ../

However, I then still get errors, and ESPECIALLY "make gtest" is no longer a no-op???? It touches files every time, forcing rebuilds of all tests?!
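
An alternative that might be worth trying, untested on raplab: keep googletest as static libraries but compile them as position-independent code, which is what the linker message asks for. `CMAKE_POSITION_INDEPENDENT_CODE` is a standard CMake variable; the wrapper name `configure_gtest_pic` and its arguments are made up for this sketch, and whether this also cures the "make gtest" rebuild problem is unknown.

```shell
# Hypothetical wrapper: configure googletest as static libraries compiled as
# position-independent code. Requires CMake 3.13+ for the -S/-B options.
configure_gtest_pic() {
  src_dir="$1"    # googletest source directory
  build_dir="$2"  # build directory, e.g. googletest/build
  cmake -S "$src_dir" -B "$build_dir" \
    -DBUILD_GMOCK=OFF \
    -DCMAKE_POSITION_INDEPENDENT_CODE=ON
}

# Usage:
#   configure_gtest_pic googletest googletest/build
```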

valassi commented 2 years ago

the next error is

/sw/packages/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/nvhpc-22.5-dxjyivdpfqgrgis6nn57piosdjwqakcr/Linux_x86_64/22.5/compilers/bin/nvfortran -I. -c fcheck_sa.f -o fcheck_sa.o
NVFORTRAN-S-0038-Symbol, isnan, has not been explicitly declared (fcheck_sa.f)

valassi commented 2 years ago

This should be tested by installing nvc++ at CERN on itscrd70. It was clear that raplab had other issues, not just nvc++. And we were told that nvc++ is here to stay, so we'd better support it. Keep this open.

valassi commented 1 year ago

Note: this is also related to the old MR #319, which is obsolete, and which I will therefore close. That itself is related to #318.

I am keeping this one open, low priority.