yambo-code / yambo

This is the official GPL repository of the yambo code
http://www.yambo-code.eu/
GNU General Public License v2.0

Enable CUDA in SLEPc solver #63

Closed · blmelp closed this 10 months ago

blmelp commented 10 months ago

We use the variable have_cuda to check whether Yambo is configured with CUDA. Additionally, we use the PETSc macro PETSC_HAVE_CUDA to check whether PETSc is configured with CUDA.

We moved the calls to SlepcInitialize and SlepcFinalize so that they cover all the relevant code.

@joseeroman

sangallidavide commented 10 months ago

The changes are really minimal, so there is certainly no problem merging this.

I guess you are using nvfortran to run on GPUs.

A few questions:

  1. Where is PETSC_HAVE_CUDA defined? Will the compiler automatically know about it?

  2. Which version of the SLEPc / PETSc library should we use?

  3. So far, on my machine, I've never been able to compile SLEPc (I'm using 3.17.2, should I change version?) with the nvfortran compiler. If I try, at configure time:

    python3 ./configure --prefix=/data/shared/yambo-libs/default/nvfortran/mpif90nv/single
    Checking environment... done
    Checking PETSc installation...
    ERROR: Unable to link with PETSc
    ERROR: See "installed-arch-linux2-c-debug-complex/lib/slepc/conf/configure.log" file for details

    and, in the log

    
    VecCreate(PETSC_COMM_WORLD,&v);
    MatCreate(PETSC_COMM_WORLD,&m);
    KSPCreate(PETSC_COMM_WORLD,&k);
    return 0;
    }

Running command: cd /tmp/slepc-rpuo57pc;/usr/bin/gmake checklink LINKFLAGS=""


Output:
mpiccnv -o checklink.o -c -g -lineinfo -I/data/shared/yambo-libs/default/nvfortran/mpif90nv/single/include pwd/checklink.c
mpiccnv -g -lineinfo -o checklink checklink.o -L/data/shared/yambo-libs/default/nvfortran/mpif90nv/single/lib -L/data/shared/yambo-libs/default/nvfortran/mpif90nv/lib -L/opt/nvidia/openmpi/lib -L/opt/nvidia/hpc_sdk/Linux_x86_64/22.9/compilers/lib -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -llapack -lblas -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lnvf -lnvomp -lnvhpcatm -latomic -lpthread -lnvcpumath -lnsnvc -lnvc -lrt -lgcc_s -lm -lquadmath -ldl
/usr/bin/ld: /data/shared/yambo-libs/default/nvfortran/mpif90nv/single/lib/libpetsc.a(petscsysmod.o): relocation R_X86_64_32S against symbol `_petscmpi8' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: failed to set dynamic section sizes: bad value
collect2: error: ld returned 1 exit status
gmake: *** [makefile:2: checklink] Error 1


ERROR: Unable to link with PETSc
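
The linker hint ("recompile with -fPIE") points at a libpetsc.a built without position-independent objects. This was not pursued further in the thread, but as a generic sketch (reusing the install prefix above), PETSc could be reconfigured to produce PIC or shared libraries, for example:

    # sketch only: reconfigure PETSc so that its objects are position independent
    ./configure --prefix=/data/shared/yambo-libs/default/nvfortran/mpif90nv/single \
                --with-pic=1 --with-shared-libraries=1
    # then rerun the make / make install commands printed by configure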

blmelp commented 10 months ago

1. Where is `PETSC_HAVE_CUDA` defined? Will the compiler automatically know about it?

PETSC_HAVE_CUDA is a macro located in file petscconf.h which is included automatically when you include any PETSc header file. This file is written during configuration of PETSc, and contains macros related to the configuration.
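
A quick way to check whether a given PETSc installation was built with CUDA is to grep for the macro directly in petscconf.h; the path below assumes the prefix install used further down (for in-place builds the file sits under $PETSC_DIR/$PETSC_ARCH/include):

    grep PETSC_HAVE_CUDA $HOME/install/petsc-slepc-3.20-sp/include/petscconf.h
    # a CUDA-enabled build should print something like: #define PETSC_HAVE_CUDA 1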

  2. Which version of the SLEPc / PETSc library should we use?

We used the latest version (3.20) in our tests, but it should work with any version newer than 3.7, though we didn't test this. GPU support has probably been in PETSc for more than 10 years.

I used this configuration:

PETSc

./configure --with-cc=mpicc --with-fc=mpifort --with-cxx=mpicxx --prefix=$HOME/install/petsc-slepc-3.20-sp --with-scalar-type=complex --with-debugging=0 --with-precision=single --with-cuda --with-blaslapack-dir=$NVHPC_ROOT/compilers/lib

SLEPc

./configure --prefix=$HOME/install/petsc-slepc-3.20-sp

YAMBO

./configure CC=nvc CPP="cpp -E" FPP="nvfortran -Mpreprocess -E" MPICC=mpicc FC=nvfortran F77=nvfortran MPIFC=mpifort MPIF77=mpifort --enable-cuda=cuda11.8 --with-blas-libs="-lblas" --with-lapack-libs="-llapack" --with-scalapack-libs=$NVHPC_ROOT/comm_libs/mpi/lib/libscalapack.a --enable-slepc-linalg --enable-par-linalg --with-petsc-path=$HOME/install/petsc-slepc-3.20-sp --with-slepc-path=$HOME/install/petsc-slepc-3.20-sp
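
For completeness, the glue between these three configure lines (environment variables and build steps) might look roughly as follows; the source directory names are placeholders, and the exact make commands are the ones each configure prints at the end:

    # PETSc (prefix install, complex, single precision, CUDA)
    cd petsc-3.20.x        # placeholder source directory
    ./configure ...        # PETSc line above
    # run the 'make ... all' and 'make ... install' commands printed by configure

    # SLEPc, built against that PETSc prefix
    cd ../slepc-3.20.x     # placeholder source directory
    export PETSC_DIR=$HOME/install/petsc-slepc-3.20-sp
    export SLEPC_DIR=$PWD
    ./configure --prefix=$HOME/install/petsc-slepc-3.20-sp
    # again run the printed make / make install commands

    # yambo, pointing --with-petsc-path and --with-slepc-path to the same prefix (line above)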

joseeroman commented 10 months ago
  3. So far, on my machine, I've never been able to compile SLEPc (I'm using 3.17.2, should I change version?) with the nvfortran compiler. If I try, at configure time,

It is not really necessary to use nvfortran. You can install CUDA only (without the HPC toolkit) and then use it from gfortran, for instance. Add --with-cuda to PETSc's configure and it should find CUDA if it is installed in standard paths.

I don't know why nvfortran does not work for you. Does make check work in PETSc? If you want me to check the details, send me PETSc's configure.log and make.log by email, together with SLEPc's configure.log.
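
A minimal sketch of that route, assuming GNU compilers behind the MPI wrappers and an example install prefix (both are assumptions, not from this thread), reusing the options of the configure line quoted earlier:

    ./configure --with-cc=mpicc --with-fc=mpifort --with-cxx=mpicxx \
                --with-scalar-type=complex --with-precision=single --with-debugging=0 \
                --with-cuda --prefix=$HOME/install/petsc-gcc-cuda
    # build and install with the make commands printed by configure, then test with
    make PETSC_DIR=$HOME/install/petsc-gcc-cuda PETSC_ARCH="" check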

andrea-ferretti commented 10 months ago

Hi All,

great news, thanks!!!! Let me reply here to some of the comments:

sangallidavide commented 10 months ago

this is anyway working only with the nvidia compiler (and I'm personally not attempting to support fortran + openacc, at present)

You mean gfortran + openacc?

Anyway yeah, for the time being let's stick with nvfortran and rely on pre-compiled SLEPc and PETSc with CUDA support. Are these available on CINECA, for example?

Finally, the compilation of the internal PETSc/SLEPc is more a matter of the yambo configure. There is a private issue here, https://github.com/yambo-code/yambo-devel/issues/736, which needs to be fixed. I think this is the source of the error.

andrea-ferretti commented 10 months ago

yes, gfortran + openacc, apologies

sangallidavide commented 10 months ago

Branch is clean on my machine: https://media.yambo-code.eu/robots/slepc_gpu/unimi-XPS-8930.php

sangallidavide commented 8 months ago

Dear all, we are testing yambo+slepc with GPU support on the CINECA cluster Leonardo. However, when comparing a CPU-only run (5.2.0) with a GPU run (5.2.1), we do not see any significant difference. I guess something might not be correct in the compilation. PETSc is compiled with CUDA and the PETSC_HAVE_CUDA flag is active (we see lines protected by this flag inside the pre-processed yambo source).

I attach a comparison of two runs:
1) yambo 5.2.0 (before this pull request was merged), without GPU support for the SLEPc solver
2) yambo 5.2.1 (after this pull request was merged), with GPU support for the SLEPc solver

test-slepc.tar.gz

Do you have suggestions?

blmelp commented 8 months ago

  1. Just to make sure: I have noticed some differences between the configurations of the two tests and in their results, for example the line: DipComputed= "R V P" # [DIP] [default R P V; extra P2 Spin Orb]
  2. It is possible that, if the test is small, the overhead of sending data between CPU and GPU is larger than the benefit (especially when MPI is also involved; in our tests we used only one or two MPI processes with GPU). I have noticed that many of the individual timings are also larger, not only the SLEPc solver.
  3. Could you send us some additional logs for PETSc? They can be obtained on the standard output by running the test again with some PETSc options. For example, I use the line PETSC_OPTIONS="-log_view -log_view_gpu_time -eps_view_mat0 ::ascii_info_detail" mpirun -np 1 yambo -F $input -J BSE_JOB -C $results and redirect the standard output to a file.
  4. Another thing we can test is adding -device_enable_cuda eager to the PETSC_OPTIONS, to see if anything changes (see the sketch after this list).
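
A possible way to combine points 3 and 4 in a single run, keeping the PETSc options and the job/result names quoted above (the name of the output file is arbitrary):

    PETSC_OPTIONS="-log_view -log_view_gpu_time -eps_view_mat0 ::ascii_info_detail -device_enable_cuda eager" \
      mpirun -np 1 yambo -F $input -J BSE_JOB -C $results > petsc_gpu_log.txt 2>&1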

palful commented 8 months ago

Hi Blanca, thank you for the feedback.

I've addressed points 3 and 4 by rerunning the tests with the options you specified (logs and stdouts attached). It looks like the GPU stuff is detected. I also checked that running the test with 4 MPI tasks (4 GPU devices) takes approximately the same time as doing it with just 1 MPI task (2 minutes in my present test).

01-test-slepc.tar.gz

sangallidavide commented 8 months ago

I put the latest tests from @palful below for reference. It looks like it is all good: the overhead was the source of the longer time in the GPU case.

So all good.

MPI tasks: 16 | kernel size: 16200 => 1.9GB   | BSSNEig: 2000 | slepc: CPU | Timing: 99.0s
MPI tasks: 16 | kernel size: 16200 => 1.9GB   | BSSNEig: 2000 | slepc: GPU | Timing: 132.1s

MPI tasks:   1 | kernel size: 16200 => 1.9GB   | BSSNEig: 2000 | slepc: CPU | Timing: 565.4s
MPI tasks:   1 | kernel size: 16200 => 1.9GB   | BSSNEig: 2000 | slepc: GPU | Timing: 76.4s

MPI tasks: 16 | kernel size: 43200 => 13.9GB | BSSNEig: 2000 | slepc: CPU | Timing: 511.9s
MPI tasks: 16 | kernel size: 43200 => 13.9GB | BSSNEig: 2000 | slepc: GPU | Timing: 192.5s

joseeroman commented 8 months ago

The CUDA support in SLEPc is work in progress. Some comments:

joseeroman commented 7 months ago

You can try this change in SLEPc, https://gitlab.com/slepc/slepc/-/merge_requests/631, which is already available in SLEPc main (note that trying SLEPc main requires PETSc main).
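
A rough sketch of how one might try this (repository URLs as on gitlab.com; the install prefix is a placeholder, the CUDA flag is carried over from the earlier configure lines, and the exact make commands are again printed by each configure):

    git clone -b main https://gitlab.com/petsc/petsc.git
    git clone -b main https://gitlab.com/slepc/slepc.git
    cd petsc && ./configure --with-cuda --prefix=$HOME/install/petsc-slepc-main
    # build/install PETSc with the printed make commands, then
    cd ../slepc && export PETSC_DIR=$HOME/install/petsc-slepc-main SLEPC_DIR=$PWD
    ./configure --prefix=$HOME/install/petsc-slepc-main
    # build/install SLEPc with the printed make commands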