hypre-space / hypre

Parallel solvers for sparse linear systems featuring multigrid methods.
https://www.llnl.gov/casc/hypre/

Performance drop compiling with OpenMP with pgi/nvidia compilers on CPU #262

Open tpowis opened 3 years ago

tpowis commented 3 years ago

I'm trying to run Hypre (Struct PFMG) on the CPUs of a heterogeneous IBM Power9+V100 system (Traverse at Princeton University). However, I see a 3-4x drop in performance when compiling with OpenMP enabled, even with just OMP_NUM_THREADS=1.
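(For reference, the kind of comparison being made, sketched here with hypre's bundled struct test driver rather than my actual problem; the sizes below are placeholders, and I believe -solver 1 selects PFMG in that driver:)

# build hypre once with --with-openmp and once without, then compare the printed wall clock times
export OMP_NUM_THREADS=1
mpirun -np 1 ./src/test/struct -n 100 100 100 -solver 1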

I'm using the latest NVIDIA nvhpc/20.7 compiler with OpenMPI 4.0.4. Note that the Hypre configure file does not appear to recognise the new nvc, nvc++, nvfortran compiler commands, so I trick it using:

export CC=pgcc
export CXX=pgc++
export FC=pgf77

Without setting these, --with-openmp does not add the necessary -mp (or even -fast) flags to Makefile.config.
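(An alternative I have not verified: assuming hypre's configure honours user-supplied CC/CFLAGS/CXXFLAGS/FFLAGS, the NVHPC flags could be passed explicitly instead of relying on the pgcc aliases, e.g.:)

export CC=nvc
export CXX=nvc++
export FC=nvfortran
# -mp enables OpenMP and -fast the usual optimisations with the NVHPC/PGI compilers
export CFLAGS="-fast -mp"
export CXXFLAGS="-fast -mp"
export FFLAGS="-fast -mp"
./configure --with-openmp --with-MPI-include=<relevant directories> --with-MPI-libs="mpi" --with-MPI-lib-dirs=<relevant directories>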

My configuration looks like the following:

./configure --with-openmp --with-MPI-include=<relevant directories> --with-MPI-libs="mpi" --with-MPI-lib-dirs=<relevant directories>

This build does show a speedup for OMP_NUM_THREADS > 1. However, if I configure without the --with-openmp flag I see a 3-4x speedup (with the same output) for OMP_NUM_THREADS=1, though of course without --with-openmp there is no speedup for OMP_NUM_THREADS > 1.

Note that I have also tried different pgi/nvidia compilers and observe the following behaviour (all comparisons are made against an nvhpc/20.7 configuration without the --with-openmp flag and for OMP_NUM_THREADS=1):

pgi/19.9 - 2.3x slower
pgi/20.4 - 2.5x slower
nvhpc/20.7 - 3.5x slower
nvhpc/20.9 - 3.5x slower
nvhpc/20.11 - 3.5x slower

Any advice or help would be much appreciated! Please let me know if I can provide any further details.

Cheers

cessenat commented 3 years ago

Hello, I can confirm what you found out about compiling with OpenMP + PGI (20.4) with Hypre master (as of 2021/01/29), where I also want to benefit from MPI:

$ ./configure --enable-shared --enable-bigint --with-openmp --with-MPI-include=$MPI_INCDIR --with-MPI-libs="mpi" --with-MPI-lib-dirs=$MPI_LIBDIR --prefix=/opt/hypre_pgi_omp_mpi
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking for mpxlc... no
checking for mpixlc_r... n
... mpigcc... no
checking for mpicc... mpicc
checking for mpxlC... no
checking for mpixlcxx_r...
... mpig++... no
checking for mpic++... mpic++
checking for mpxlf... no
checking for mpixlf77_r... no
checking for mpiifort... no
checking for mpif77... mpif77
checking whether make sets $(MAKE)... yes
checking for ranlib... ranlib
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether mpicc accepts -g... yes
checking for mpicc option to accept ISO C89... none needed
checking whether we are using the GNU C++ compiler... yes
checking whether mpic++ accepts -g... yes
checking whether we are using the GNU Fortran compiler... no
checking whether mpif77 accepts -g... yes
checking how to get verbose linking output from mpif77... -v
...
checking for dummy main to link with Fortran libraries... none
checking for Fortran name-mangling scheme... lower case, underscore, no extra underscore
checking for MPI_Init... yes
checking for mpi.h... yes
checking for MPI_Comm_f2c... yes
checking how to run the C preprocessor... mpicc -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking whether MPI_Comm_f2c is a macro... no
checking for cabs in -lm... no
checking the hostname... r6login
checking the architecture... x86_64
configure: creating ./config.status
config.status: creating config/Makefile.config
config.status: creating HYPRE_config.h
$ make
Making blas ...
make[1]: Entering directory `/tmp/hypre-master210129/src/blas'
mpicc -O2 -fopenmp -fPIC -DHAVE_CONFIG_H -I.. -I../utilities -c dasum.c
pgcc-Error-Unknown switch: -fopenmp

whereas when compiling for the GPU it does look for nvcc properly:

./configure --enable-shared --with-cuda --enable-unified-memory --enable-cublas --with-MPI-include=$MPI_INCDIR --with-MPI-lib-dirs=$MPI_LIBDIR --prefix=/opt/hypre_cuda
... bla bla bla ...
checking for stdint.h... yes
checking for unistd.h... yes
checking /opt/cuda-11.1.0/system/default/include/cuda.h usability... yes
checking /opt/cuda-11.1.0/system/default/include/cuda.h presence... yes
checking for /opt/cuda-11.1.0/system/default/include/cuda.h... yes
checking for nvcc... nvcc

and compiles smoothly. If, for OpenMP, I set export CC=pgcc, it fails to link against "-lmpi":

pgcc -shared -o libHYPRE-2.20.0.so /blablabla/.o -L/opt/openmpi-4.0.3/pgi--20.4/default/lib -lm -Wl,-soname,libHYPRE-2.20.0.so -Wl,-z,defs -mp

I can add the "-lmpi" by hand, but the next time I type make it wants to rebuild it again. So I found no way to compile MPI+OpenMP+PGI. Also, I would appreciate a basic example using a GPU, such as ex3.c or ex12.c on the GPU...

Olivier Cessenat
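PS: one workaround that might be worth trying, though I have not tested it: keep the MPI wrappers as the compilers (so the wrapper pulls in -lmpi itself) and replace the misdetected -fopenmp with PGI's -mp in config/Makefile.config before running make:

./configure --enable-shared --enable-bigint --with-openmp --with-MPI-include=$MPI_INCDIR --with-MPI-libs="mpi" --with-MPI-lib-dirs=$MPI_LIBDIR --prefix=/opt/hypre_pgi_omp_mpi
# swap the OpenMP flag that configure guessed for the one PGI understands
sed -i 's/-fopenmp/-mp/g' config/Makefile.config
make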

tpowis commented 3 years ago

Hi @cessenat,

For GPU examples I built Hypre on the gpu-examples branch. I modified ex3.c for profiling the GPU implementation.
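(Roughly, the build looked like the following; treat it as a sketch rather than my exact commands, since module paths and the install prefix are omitted:)

git clone https://github.com/hypre-space/hypre.git
cd hypre && git checkout gpu-examples
cd src
./configure --with-cuda --enable-unified-memory --enable-cublas
make -j && make install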

However I still haven't solved this MPI+OpenMP issue with pgi/nvc on the CPU.

Cheers

cessenat commented 3 years ago

Thank you very much for the pointer!

Building it here, I get:

nvcc -O2 -lineinfo -ccbin=mpic++ -gencode arch=compute_60,"code=sm_60" -expt-extended-lambda -dc -std=c++11 --x cu -Xcompiler "-O2 " -Xcompiler "-fPIC" -DHAVE_CONFIG_H -I.. -I. -I./.. -I./../utilities -I/ccc/products/cuda-11.1.0/system/default/include -I/ccc/products/openmpi-4.0.3/pgi--20.4/default/include -c csr_matvec_device.c -o csr_matvec_device.obj

csr_matvec_device.c(69): error: identifier "cusparseDcsr2csc" is undefined

csr_matvec_device.c(73): error: identifier "cusparseDcsrmv" is undefined

csr_matvec_device.c(85): error: identifier "cusparseDcsrmv" is undefined

3 errors detected in the compilation of "csr_matvec_device.c".
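(For what it's worth, cusparseDcsr2csc and cusparseDcsrmv are the legacy cuSPARSE routines that were deprecated in CUDA 10.2 and removed in CUDA 11, which would explain why an older branch no longer compiles against this toolkit. A quick check against the local headers, with the path adjusted to the install in use:)

grep -n cusparseDcsrmv /ccc/products/cuda-11.1.0/system/default/include/cusparse.h    # this legacy entry point is still declared on CUDA 10.x toolkits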

ex3 is then compiled and linked with:

mpicc -DHYPRE_USING_CUDA -O2 -Xcompiler "-fPIC" -DHAVE_CONFIG_H -g -Wall -I${HYPRE_DIR}/include -I${CUDA_INCDIR} -c ex3.c
nvcc -o ex3 ex3.o -L${HYPRE_DIR}/lib -lHYPRE -lm -L${CUDA_LIBDIR} -lcudart -lcusparse -lcublas -lcurand -ccbin=mpic++ -gencode arch=compute_60,"code=sm_60" -Xcompiler "" -lstdc++

At execution I get:

mpirun -x -n 1 ./ex3

CUDA ERROR (code = 3, initialization error) at hypre_general.c:137
ex3: hypre_general.c:137: int HYPRE_Init(): Assertion `0' failed.
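(Error code 3 is CUDA's initialization error, so a basic sanity check, independent of hypre, is whether the process launched by mpirun can see a GPU at all:)

mpirun -n 1 nvidia-smi -L                       # does the launched process see a GPU?
mpirun -n 1 env | grep CUDA_VISIBLE_DEVICES     # is the device list being restricted?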

Do you see any obvious mistake I might have made?

Cheers,

Olivier


tpowis commented 3 years ago

Hi @cessenat,

Apologies, but I'm not sure. The branch may not have been updated for CUDA 11 in the way master was a few months back? I also use NVIDIA compute capability 7.0, so that could make a difference.

Sorry I can't be of more help. This might be best raised in a separate issue (if it hasn't been already)?

Cheers