Open tpowis opened 3 years ago
Hello, I can confirm what you found out about compiling with OpenMP + PGI (20.4) with Hypre master (as of 2021/01/29), where I also want to benefit from MPI:

```
$ ./configure --enable-shared --enable-bigint --with-openmp --with-MPI-include=$MPI_INCDIR --with-MPI-libs="mpi" --with-MPI-lib-dirs=$MPI_LIBDIR --prefix=/opt/hypre_pgi_omp_mpi
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking for mpxlc... no
checking for mpixlc_r... n ... mpigcc... no
checking for mpicc... mpicc
checking for mpxlC... no
checking for mpixlcxx_r... ... mpig++... no
checking for mpic++... mpic++
checking for mpxlf... no
checking for mpixlf77_r... no
checking for mpiifort... no
checking for mpif77... mpif77
checking whether make sets $(MAKE)... yes
checking for ranlib... ranlib
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether mpicc accepts -g... yes
checking for mpicc option to accept ISO C89... none needed
checking whether we are using the GNU C++ compiler... yes
checking whether mpic++ accepts -g... yes
checking whether we are using the GNU Fortran compiler... no
checking whether mpif77 accepts -g... yes
checking how to get verbose linking output from mpif77... -v
...
checking for dummy main to link with Fortran libraries... none
checking for Fortran name-mangling scheme... lower case, underscore, no extra underscore
checking for MPI_Init... yes
checking for mpi.h... yes
checking for MPI_Comm_f2c... yes
checking how to run the C preprocessor... mpicc -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking whether MPI_Comm_f2c is a macro... no
checking for cabs in -lm... no
checking the hostname... r6login
checking the architecture... x86_64
configure: creating ./config.status
config.status: creating config/Makefile.config
config.status: creating HYPRE_config.h
$ make
Making blas ...
make[1]: Entering directory `/tmp/hypre-master210129/src/blas'
mpicc -O2 -fopenmp -fPIC -DHAVE_CONFIG_H -I.. -I../utilities -c dasum.c
pgcc-Error-Unknown switch: -fopenmp
```
whereas when compiling for GPU it does look for nvcc properly:

```
$ ./configure --enable-shared --with-cuda --enable-unified-memory --enable-cublas --with-MPI-include=$MPI_INCDIR --with-MPI-lib-dirs=$MPI_LIBDIR --prefix=/opt/hypre_cuda
... bla bla bla ...
checking for stdint.h... yes
checking for unistd.h... yes
checking /opt/cuda-11.1.0/system/default/include/cuda.h usability... yes
checking /opt/cuda-11.1.0/system/default/include/cuda.h presence... yes
checking for /opt/cuda-11.1.0/system/default/include/cuda.h... yes
checking for nvcc... nvcc
```
and compiles smoothly. If, for OpenMP, I set `export CC=pgcc`, the final shared-library link fails because "-lmpi" is not passed:

```
pgcc -shared -o libHYPRE-2.20.0.so /blablabla/.o -L/opt/openmpi-4.0.3/pgi--20.4/default/lib -lm -Wl,-soname,libHYPRE-2.20.0.so -Wl,-z,defs -mp
```
I can add "-lmpi" by hand, but the next time I type make it wants to rebuild the library and the fix is lost, so I have found no way to compile MPI+OpenMP+PGI. I would also appreciate a basic example using a GPU, such as ex3.c or ex12.c on the GPU...

Olivier Cessenat
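One untested direction (a sketch only, not verified against hypre master): keep `configure` pointed at the MPI wrappers, so that "-lmpi" and the MPI paths come for free, but make Open MPI's wrappers call the PGI compilers underneath and pass PGI's OpenMP flag (`-mp`) by hand, since configure mis-detects the wrapper as a GNU compiler and emits `-fopenmp`. The compiler names and flag handling below are assumptions:

```sh
# Sketch only: Open MPI's wrappers honour OMPI_CC/OMPI_CXX/OMPI_FC to pick
# the back-end compiler, so mpicc still adds -lmpi and the MPI include paths.
export OMPI_CC=pgcc
export OMPI_CXX=pgc++
export OMPI_FC=pgfortran

# Pass -mp explicitly instead of relying on the auto-detected -fopenmp.
# configure is an autoconf script, so VAR=value arguments are accepted,
# though hypre may still add flags of its own.
./configure --enable-shared --enable-bigint --with-openmp \
            --with-MPI-include=$MPI_INCDIR --with-MPI-libs="mpi" \
            --with-MPI-lib-dirs=$MPI_LIBDIR \
            CC=mpicc CXX=mpic++ F77=mpif77 \
            CFLAGS="-O2 -mp" CXXFLAGS="-O2 -mp" FFLAGS="-O2 -mp"
```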
Hi @cessenat,
For GPU examples I built Hypre on the gpu-examples branch. I modified ex3.c for profiling the GPU implementation.
However I still haven't solved this MPI+OpenMP issue with pgi/nvc on the CPU.
Cheers
Thank you very much for the pointer! However, when compiling I get:
```
nvcc -O2 -lineinfo -ccbin=mpic++ -gencode arch=compute_60,"code=sm_60" -expt-extended-lambda -dc -std=c++11 --x cu -Xcompiler "-O2 " -Xcompiler "-fPIC" -DHAVE_CONFIG_H -I.. -I. -I./.. -I./../utilities -I/ccc/products/cuda-11.1.0/system/default/include -I/ccc/products/openmpi-4.0.3/pgi--20.4/default/include -c csr_matvec_device.c -o csr_matvec_device.obj
csr_matvec_device.c(69): error: identifier "cusparseDcsr2csc" is undefined
csr_matvec_device.c(73): error: identifier "cusparseDcsrmv" is undefined
csr_matvec_device.c(85): error: identifier "cusparseDcsrmv" is undefined
3 errors detected in the compilation of "csr_matvec_device.c".
```
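For context, `cusparseDcsrmv` and `cusparseDcsr2csc` were deprecated in CUDA 10.2 and removed in CUDA 11, which would explain why this file no longer compiles against cuda-11.1.0. The replacement is the generic cuSPARSE API. A minimal sketch of a double-precision CSR mat-vec with that API follows; it is an illustration only, not hypre's actual code, and the function and variable names are made up for the example:

```c
/* Sketch only: generic cuSPARSE SpMV (y = alpha*A*x + beta*y) for a CSR matrix,
 * showing the calls that replaced cusparseDcsrmv in CUDA 11.
 * The device arrays d_rowptr, d_colind, d_val, d_x, d_y are assumed to be
 * allocated and filled already. */
#include <cuda_runtime.h>
#include <cusparse.h>

static void csr_matvec_cuda11(cusparseHandle_t handle,
                              int nrows, int ncols, int nnz,
                              int *d_rowptr, int *d_colind, double *d_val,
                              double *d_x, double *d_y,
                              double alpha, double beta)
{
   cusparseSpMatDescr_t matA;
   cusparseDnVecDescr_t vecX, vecY;
   size_t bufsize = 0;
   void  *dbuf = NULL;

   /* Describe the CSR matrix and the dense vectors. */
   cusparseCreateCsr(&matA, nrows, ncols, nnz, d_rowptr, d_colind, d_val,
                     CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                     CUSPARSE_INDEX_BASE_ZERO, CUDA_R_64F);
   cusparseCreateDnVec(&vecX, ncols, d_x, CUDA_R_64F);
   cusparseCreateDnVec(&vecY, nrows, d_y, CUDA_R_64F);

   /* CUSPARSE_MV_ALG_DEFAULT is the CUDA 11.x name; later toolkits call it
    * CUSPARSE_SPMV_ALG_DEFAULT. */
   cusparseSpMV_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                           &alpha, matA, vecX, &beta, vecY,
                           CUDA_R_64F, CUSPARSE_MV_ALG_DEFAULT, &bufsize);
   cudaMalloc(&dbuf, bufsize);

   cusparseSpMV(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                &alpha, matA, vecX, &beta, vecY,
                CUDA_R_64F, CUSPARSE_MV_ALG_DEFAULT, dbuf);

   cudaFree(dbuf);
   cusparseDestroySpMat(matA);
   cusparseDestroyDnVec(vecX);
   cusparseDestroyDnVec(vecY);
}
```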
For the ex3 example itself, I compile and link with:

```
mpicc -DHYPRE_USING_CUDA -O2 -Xcompiler "-fPIC" -DHAVE_CONFIG_H -g -Wall -I${HYPRE_DIR}/include -I${CUDA_INCDIR} -c ex3.c
nvcc -o ex3 ex3.o -L${HYPRE_DIR}/lib -lHYPRE -lm -L${CUDA_LIBDIR} -lcudart -lcusparse -lcublas -lcurand -ccbin=mpic++ -gencode arch=compute_60,"code=sm_60" -Xcompiler "" -lstdc++
```
At execution I get:
```
mpirun -x -n 1 ./ex3
CUDA ERROR (code = 3, initialization error) at hypre_general.c:137
ex3: hypre_general.c:137: int HYPRE_Init(): Assertion `0' failed.
```
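For what it's worth, CUDA code 3 is `cudaErrorInitializationError`: the CUDA runtime itself failed to start, which usually points at the job environment (no GPU visible to the process, driver/toolkit mismatch) rather than at hypre or ex3. A minimal standalone check, independent of hypre and run under the same `mpirun` invocation, would be a sketch like:

```c
/* Minimal sketch: verify the CUDA runtime initialises in this job environment
 * before anything calls HYPRE_Init(). Build with: nvcc -o cudacheck cudacheck.c */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
   int ndev = 0;
   cudaError_t err = cudaGetDeviceCount(&ndev);
   if (err != cudaSuccess)
   {
      printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
      return 1;
   }
   printf("visible CUDA devices: %d\n", ndev);
   return 0;
}
```

If this already fails under mpirun on the compute node, the problem is upstream of `HYPRE_Init()`.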
Do you see any obvious mistake I might have made?
Cheers,
Olivier
Hi @cessenat,
Apologies, but I'm not sure. The branch may not have been updated for CUDA 11 the way master was a few months back? I also use NVIDIA compute capability 7.0, so that could make a difference.
Sorry I can't be of more help. This might be best raised in a different issue (if it hasn't been already)?
Cheers
I'm trying to run Hypre (Struct PFMG) on the CPUs of a heterogeneous IBM Power9+V100 system (Traverse at Princeton University). However, I see a 3-4x drop in performance when compiling with OpenMP enabled, even with OMP_NUM_THREADS=1.
I'm using the latest NVIDIA nvhpc/20.7 compiler with OpenMPI 4.0.4. Note that the Hypre `configure` script does not appear to recognise the new `nvc`, `nvc++`, `nvfortran` compiler commands, so I trick it using:
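(The exact overrides are not reproduced here; a plausible sketch, assuming the standard autoconf environment variables, would be:)

```sh
# Illustrative only: point configure's compiler checks at the NVIDIA HPC SDK
# compilers, since it does not recognise nvc/nvc++/nvfortran by itself.
export CC=nvc
export CXX=nvc++
export FC=nvfortran
export F77=nvfortran
```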
Without setting these, `--with-openmp` does not add the necessary `-mp` (or even `-fast`) flags to `Makefile.config`.

My configuration looks like the following:
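(Again only a representative sketch; the prefix and MPI paths below are placeholders, not the exact Traverse settings:)

```sh
./configure --enable-shared --with-openmp \
            --with-MPI-include=$MPI_INCDIR \
            --with-MPI-lib-dirs=$MPI_LIBDIR \
            --with-MPI-libs="mpi" \
            --prefix=$HOME/hypre_nvhpc_omp
```

When `--with-openmp` has taken effect, `-mp` should show up in the `CFLAGS` line of `config/Makefile.config`.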
This build demonstrates a speedup with `OMP_NUM_THREADS > 1`. However, if I configure without the `--with-openmp` flag I see a 3-4x speedup (with the same output) for `OMP_NUM_THREADS=1`, but of course without the `--with-openmp` flag there is no speedup for `OMP_NUM_THREADS > 1`.

Note that I have also tried different pgi/nvidia compilers and observe the following behaviour (all comparisons are made against an `nvhpc/20.7` configuration without the `--with-openmp` flag and for `OMP_NUM_THREADS=1`):

- `pgi/19.9` - 2.3x slower
- `pgi/20.4` - 2.5x slower
- `nvhpc/20.7` - 3.5x slower
- `nvhpc/20.9` - 3.5x slower
- `nvhpc/20.11` - 3.5x slower

Any advice or help would be much appreciated! Please let me know if I can provide any further details.
Cheers