yambo-code / yambo

This is the official GPL repository of the yambo code
http://www.yambo-code.eu/
GNU General Public License v2.0

Enable CUDA in SLEPc solver #63

Closed · blmelp closed this 10 months ago

blmelp commented 10 months ago

We use the variable have_cuda to check whether Yambo is configured with CUDA. Additionally, we use the PETSc macro PETSC_HAVE_CUDA to check whether PETSc is configured with CUDA.

We moved the calls to SlepcInitialize and SlepcFinalize so that they cover all the relevant code.

@joseeroman

sangallidavide commented 10 months ago

The changes are really minimal, so there is certainly no problem merging this.

I guess you are using nvfortran to run on GPUs.

A few questions:

  1. Where is PETSC_HAVE_CUDA defined? Will the compiler automatically know about it?

  2. Which version of the SLEPc / PETSc library should we use?

  3. So far, on my machine, I've never been able to compile SLEPc (I'm using 3.17.2, should I change version?) with the nvfortran compiler. If I try, at configure time:

    python3 ./configure --prefix=/data/shared/yambo-libs/default/nvfortran/mpif90nv/single
    Checking environment... done
    Checking PETSc installation...
    ERROR: Unable to link with PETSc
    ERROR: See "installed-arch-linux2-c-debug-complex/lib/slepc/conf/configure.log" file for details

    and, in the log

    
    VecCreate(PETSC_COMM_WORLD,&v);
    MatCreate(PETSC_COMM_WORLD,&m);
    KSPCreate(PETSC_COMM_WORLD,&k);
    return 0;
    }

Running command: cd /tmp/slepc-rpuo57pc;/usr/bin/gmake checklink LINKFLAGS=""


Output:
mpiccnv -o checklink.o -c -g -lineinfo -I/data/shared/yambo-libs/default/nvfortran/mpif90nv/single/include pwd/checklink.c
mpiccnv -g -lineinfo -o checklink checklink.o -L/data/shared/yambo-libs/default/nvfortran/mpif90nv/single/lib -L/data/shared/yambo-libs/default/nvfortran/mpif90nv/lib -L/opt/nvidia/openmpi/lib -L/opt/nvidia/hpc_sdk/Linux_x86_64/22.9/compilers/lib -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -llapack -lblas -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lnvf -lnvomp -lnvhpcatm -latomic -lpthread -lnvcpumath -lnsnvc -lnvc -lrt -lgcc_s -lm -lquadmath -ldl
/usr/bin/ld: /data/shared/yambo-libs/default/nvfortran/mpif90nv/single/lib/libpetsc.a(petscsysmod.o): relocation R_X86_64_32S against symbol `_petscmpi8' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: failed to set dynamic section sizes: bad value
collect2: error: ld returned 1 exit status
gmake: *** [makefile:2: checklink] Error 1


ERROR: Unable to link with PETSc
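
The linker hint ("recompile with -fPIE") points at a libpetsc.a built without position-independent objects. This was not pursued further in the thread, but as a generic sketch (reusing the install prefix above), PETSc could be reconfigured to produce PIC or shared libraries, for example:

    # sketch only: reconfigure PETSc so that its objects are position independent
    ./configure --prefix=/data/shared/yambo-libs/default/nvfortran/mpif90nv/single \
                --with-pic=1 --with-shared-libraries=1
    # then rerun the make / make install commands printed by configure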

blmelp commented 10 months ago

1. Where is `PETSC_HAVE_CUDA` defined? Will the compiler automatically know about it?

PETSC_HAVE_CUDA is a macro located in file petscconf.h which is included automatically when you include any PETSc header file. This file is written during configuration of PETSc, and contains macros related to the configuration.
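
A quick way to check whether a given PETSc installation was built with CUDA is to grep for the macro directly in petscconf.h; the path below assumes the prefix install used further down (for in-place builds the file sits under $PETSC_DIR/$PETSC_ARCH/include):

    grep PETSC_HAVE_CUDA $HOME/install/petsc-slepc-3.20-sp/include/petscconf.h
    # a CUDA-enabled build should print something like: #define PETSC_HAVE_CUDA 1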

  2. Which version of the SLEPc / PETSc library should we use?

We used the latest version (3.20) in our tests, but it should work with any version newer than 3.7, though we didn't test this. GPU support has probably been in PETSc for more than 10 years.

I used this configuration:

PETSc

./configure --with-cc=mpicc --with-fc=mpifort --with-cxx=mpicxx --prefix=$HOME/install/petsc-slepc-3.20-sp --with-scalar-type=complex --with-debugging=0 --with-precision=single --with-cuda --with-blaslapack-dir=$NVHPC_ROOT/compilers/lib

SLEPc

./configure --prefix=$HOME/install/petsc-slepc-3.20-sp

YAMBO

./configure CC=nvc CPP="cpp -E" FPP="nvfortran -Mpreprocess -E" MPICC=mpicc FC=nvfortran F77=nvfortran MPIFC=mpifort MPIF77=mpifort --enable-cuda=cuda11.8 --with-blas-libs="-lblas" --with-lapack-libs="-llapack" --with-scalapack-libs=$NVHPC_ROOT/comm_libs/mpi/lib/libscalapack.a --enable-slepc-linalg --enable-par-linalg --with-petsc-path=$HOME/install/petsc-slepc-3.20-sp --with-slepc-path=$HOME/install/petsc-slepc-3.20-sp
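
For completeness, the glue between these three configure lines (environment variables and build steps) might look roughly as follows; the source directory names are placeholders, and the exact make commands are the ones each configure prints at the end:

    # PETSc (prefix install, complex, single precision, CUDA)
    cd petsc-3.20.x        # placeholder source directory
    ./configure ...        # PETSc line above
    # run the 'make ... all' and 'make ... install' commands printed by configure

    # SLEPc, built against that PETSc prefix
    cd ../slepc-3.20.x     # placeholder source directory
    export PETSC_DIR=$HOME/install/petsc-slepc-3.20-sp
    export SLEPC_DIR=$PWD
    ./configure --prefix=$HOME/install/petsc-slepc-3.20-sp
    # again run the printed make / make install commands

    # yambo, pointing --with-petsc-path and --with-slepc-path to the same prefix (line above)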

joseeroman commented 10 months ago
  3. So far, on my machine, I've never been able to compile SLEPc (I'm using 3.17.2, should I change version?) with the nvfortran compiler. If I try, at configure time,

It is not really necessary to use nvfortran. You can install CUDA only (without the HPC toolkit) and then use it from gfortran, for instance. Add --with-cuda to PETSc's configure and it should find CUDA if it is installed in standard paths.

I don't know why nvfortran does not work for you. Does make check work in PETSc? If you want me to check the details, send me PETSc's configure.log and make.log by email, together with SLEPc's configure.log.
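
A minimal sketch of that route, assuming GNU compilers behind the MPI wrappers and an example install prefix (both are assumptions, not from this thread), reusing the options of the configure line quoted earlier:

    ./configure --with-cc=mpicc --with-fc=mpifort --with-cxx=mpicxx \
                --with-scalar-type=complex --with-precision=single --with-debugging=0 \
                --with-cuda --prefix=$HOME/install/petsc-gcc-cuda
    # build and install with the make commands printed by configure, then test with
    make PETSC_DIR=$HOME/install/petsc-gcc-cuda PETSC_ARCH="" check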

andrea-ferretti commented 10 months ago

Hi All,

great news, thanks!!!! Let me reply here to some of the comments:

sangallidavide commented 10 months ago

this is anyway working only with the nvidia compiler (and I'm personally not attempting to support fortran + openacc, at present)

You mean gfortran + openacc?

Anyway yeah, for the time being let's stick with nvfortran and rely on pre-compiled SLEPc and PETSc with CUDA support. Are these available on CINECA, for example?

Finally, the compilation of the internal PETSc/SLEPc is more a matter of the yambo configure. There is a private issue here, https://github.com/yambo-code/yambo-devel/issues/736, which needs to be fixed. I think this is the source of the error.

andrea-ferretti commented 10 months ago

yes, gfortran + openacc, apologies

sangallidavide commented 10 months ago

Branch is clean on my machine: https://media.yambo-code.eu/robots/slepc_gpu/unimi-XPS-8930.php

sangallidavide commented 8 months ago

Dear all, we are testing yambo+slepc with GPU support on the CINECA cluster Leonardo. However, when comparing a CPU-only run (5.2.0) with a GPU run (5.2.1), we do not see any significant difference. I guess something might not be correct in the compilation. PETSc is compiled with CUDA and the PETSC_HAVE_CUDA flag is active (we see lines protected by this flag inside the pre-processed yambo source).

I attach a comparison of two runs:
1) yambo 5.2.0 (before this pull request was merged), without GPU support for the SLEPc solver
2) yambo 5.2.1 (after this pull request was merged), with GPU support for the SLEPc solver

test-slepc.tar.gz

Do you have suggestions?

blmelp commented 8 months ago

  1. Just to make sure: I have noticed some differences between the configurations of the two tests and in their results, for example the line: DipComputed= "R V P" # [DIP] [default R P V; extra P2 Spin Orb]
  2. It is possible that, if the test is small, the overhead of sending data between CPU and GPU is larger than the benefit (especially when MPI is also involved; in our tests we used only one or two MPI processes with GPU). I have noticed that many of the individual timings are also larger, not only the SLEPc solver.
  3. Could you send us some additional logs for PETSc? They can be obtained on the standard output by running the test again with some PETSc options. For example, I use the line PETSC_OPTIONS="-log_view -log_view_gpu_time -eps_view_mat0 ::ascii_info_detail" mpirun -np 1 yambo -F $input -J BSE_JOB -C $results and redirect the standard output to a file.
  4. Another thing we can test is adding -device_enable_cuda eager to the PETSC_OPTIONS, to see if anything changes (see the sketch after this list).
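
A possible way to combine points 3 and 4 in a single run, keeping the PETSc options and the job/result names quoted above (the name of the output file is arbitrary):

    PETSC_OPTIONS="-log_view -log_view_gpu_time -eps_view_mat0 ::ascii_info_detail -device_enable_cuda eager" \
      mpirun -np 1 yambo -F $input -J BSE_JOB -C $results > petsc_gpu_log.txt 2>&1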

palful commented 8 months ago

Hi Blanca, thank you for the feedback.

I've addressed points 3 and 4 by rerunning the tests with the options you specified (logs and stdouts attached). It looks like the GPU stuff is detected. I also checked that running the test with 4 MPI tasks (4 GPU devices) takes approximately the same time as doing it with just 1 MPI task (2 minutes in my present test).

01-test-slepc.tar.gz

sangallidavide commented 8 months ago

I put the latest tests from @palful below for reference. It looks like it is all good: the overhead was the source of the longer time in the GPU case.

So all good.

MPI tasks: 16 | kernel size: 16200 => 1.9GB   | BSSNEig: 2000 | slepc: CPU | Timing: 99.0s
MPI tasks: 16 | kernel size: 16200 => 1.9GB   | BSSNEig: 2000 | slepc: GPU | Timing: 132.1s

MPI tasks:   1 | kernel size: 16200 => 1.9GB   | BSSNEig: 2000 | slepc: CPU | Timing: 565.4s
MPI tasks:   1 | kernel size: 16200 => 1.9GB   | BSSNEig: 2000 | slepc: GPU | Timing: 76.4s

MPI tasks: 16 | kernel size: 43200 => 13.9GB | BSSNEig: 2000 | slepc: CPU | Timing: 511.9s
MPI tasks: 16 | kernel size: 43200 => 13.9GB | BSSNEig: 2000 | slepc: GPU | Timing: 192.5s

joseeroman commented 8 months ago

The CUDA support in SLEPc is work in progress. Some comments:

joseeroman commented 7 months ago

You can try this change in SLEPc, https://gitlab.com/slepc/slepc/-/merge_requests/631, which is already available in SLEPc main (note that trying SLEPc main requires PETSc main).
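
A rough sketch of how one might try this (repository URLs as on gitlab.com; the install prefix is a placeholder, the CUDA flag is carried over from the earlier configure lines, and the exact make commands are again printed by each configure):

    git clone -b main https://gitlab.com/petsc/petsc.git
    git clone -b main https://gitlab.com/slepc/slepc.git
    cd petsc && ./configure --with-cuda --prefix=$HOME/install/petsc-slepc-main
    # build/install PETSc with the printed make commands, then
    cd ../slepc && export PETSC_DIR=$HOME/install/petsc-slepc-main SLEPC_DIR=$PWD
    ./configure --prefix=$HOME/install/petsc-slepc-main
    # build/install SLEPc with the printed make commands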