hypre-space / hypre

Parallel solvers for sparse linear systems featuring multigrid methods.
https://www.llnl.gov/casc/hypre/

Building with cuda gives an error in HYPRE_handle.c.o #1129

Open marcosvanella opened 2 months ago

marcosvanella commented 2 months ago

Hi, I'm trying to build hypre with CUDA on my Linux laptop and get this error with both make and CMake. I'm trying to use the Intel MPI libraries. Thank you for your help. Marcos

$ cmake -DHYPRE_WITH_OPENMP=ON -DHYPRE_WITH_CUDA=ON ..
-- The C compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- The CXX compiler identification is GNU 11.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Enabled support for CXX.
-- Using CXX standard: c++11
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/bin/nvcc
-- The CUDA compiler identification is NVIDIA 11.5.119
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Enabled support for CUDA.
-- Using CUDA architecture: 70
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found CUDA: /usr (found version "11.5")
-- Found CUDAToolkit: /usr/include (found version "11.5.119")
-- Found MPI_C: /home/marcosvanella/intel/oneapi/mpi/2021.12/lib/libmpi.so (found version "3.1")
-- Found MPI_CXX: /home/marcosvanella/intel/oneapi/mpi/2021.12/lib/libmpicxx.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Configuring done
-- Generating done
-- Build files have been written to: /home/marcosvanella/Documents/Software/hypre/src/cmbuild

(base) marcosvanella@pop-os cmbuild (master) $ make
[ 0%] Building C object CMakeFiles/HYPRE.dir/blas/dasum.c.o
[ 0%] Building C object CMakeFiles/HYPRE.dir/blas/daxpy.c.o
[ 0%] Building C object CMakeFiles/HYPRE.dir/blas/dcopy.c.o
[ 0%] Building C object CMakeFiles/HYPRE.dir/blas/ddot.c.o
[ 0%] Building C object CMakeFiles/HYPRE.dir/blas/dgemm.c.o
[ 0%] Building C object CMakeFiles/HYPRE.dir/blas/dgemv.c.o
[ 1%] Building C object CMakeFiles/HYPRE.dir/blas/dger.c.o
[ 1%] Building C object CMakeFiles/HYPRE.dir/blas/dnrm2.c.o
[ 1%] Building C object CMakeFiles/HYPRE.dir/blas/drot.c.o
[ 1%] Building C object CMakeFiles/HYPRE.dir/blas/dscal.c.o
[ 1%] Building C object CMakeFiles/HYPRE.dir/blas/dswap.c.o
[ 1%] Building C object CMakeFiles/HYPRE.dir/blas/dsymm.c.o
[ 2%] Building C object CMakeFiles/HYPRE.dir/blas/dsymv.c.o
[ 2%] Building C object CMakeFiles/HYPRE.dir/blas/dsyr2.c.o
[ 2%] Building C object CMakeFiles/HYPRE.dir/blas/dsyr2k.c.o
[ 2%] Building C object CMakeFiles/HYPRE.dir/blas/dsyrk.c.o
[ 2%] Building C object CMakeFiles/HYPRE.dir/blas/dtrmm.c.o
[ 2%] Building C object CMakeFiles/HYPRE.dir/blas/dtrmv.c.o
[ 2%] Building C object CMakeFiles/HYPRE.dir/blas/dtrsm.c.o
[ 3%] Building C object CMakeFiles/HYPRE.dir/blas/dtrsv.c.o
[ 3%] Building C object CMakeFiles/HYPRE.dir/blas/f2c.c.o
[ 3%] Building C object CMakeFiles/HYPRE.dir/blas/idamax.c.o
[ 3%] Building C object CMakeFiles/HYPRE.dir/blas/lsame.c.o
[ 3%] Building C object CMakeFiles/HYPRE.dir/blas/xerbla.c.o
[ 3%] Building C object CMakeFiles/HYPRE.dir/lapack/dbdsqr.c.o
[ 4%] Building C object CMakeFiles/HYPRE.dir/lapack/dgebd2.c.o
[ 4%] Building C object CMakeFiles/HYPRE.dir/lapack/dgebrd.c.o
[ 4%] Building C object CMakeFiles/HYPRE.dir/lapack/dgelq2.c.o
[ 4%] Building C object CMakeFiles/HYPRE.dir/lapack/dgelqf.c.o
[ 4%] Building C object CMakeFiles/HYPRE.dir/lapack/dgels.c.o
[ 4%] Building C object CMakeFiles/HYPRE.dir/lapack/dgeqr2.c.o
[ 5%] Building C object CMakeFiles/HYPRE.dir/lapack/dgeqrf.c.o
[ 5%] Building C object CMakeFiles/HYPRE.dir/lapack/dgesvd.c.o
[ 5%] Building C object CMakeFiles/HYPRE.dir/lapack/dgetrf.c.o
[ 5%] Building C object CMakeFiles/HYPRE.dir/lapack/dgetri.c.o
[ 5%] Building C object CMakeFiles/HYPRE.dir/lapack/dgetrs.c.o
[ 5%] Building C object CMakeFiles/HYPRE.dir/lapack/dgetf2.c.o
[ 5%] Building C object CMakeFiles/HYPRE.dir/lapack/dlabad.c.o
[ 6%] Building C object CMakeFiles/HYPRE.dir/lapack/dlabrd.c.o
[ 6%] Building C object CMakeFiles/HYPRE.dir/lapack/dlacpy.c.o
[ 6%] Building C object CMakeFiles/HYPRE.dir/lapack/dlae2.c.o
[ 6%] Building C object CMakeFiles/HYPRE.dir/lapack/dlaev2.c.o
[ 6%] Building C object CMakeFiles/HYPRE.dir/lapack/dlamch.c.o
[ 6%] Building C object CMakeFiles/HYPRE.dir/lapack/dlange.c.o
[ 7%] Building C object CMakeFiles/HYPRE.dir/lapack/dlanst.c.o
[ 7%] Building C object CMakeFiles/HYPRE.dir/lapack/dlansy.c.o
[ 7%] Building C object CMakeFiles/HYPRE.dir/lapack/dlapy2.c.o
[ 7%] Building C object CMakeFiles/HYPRE.dir/lapack/dlarfb.c.o
[ 7%] Building C object CMakeFiles/HYPRE.dir/lapack/dlarf.c.o
[ 7%] Building C object CMakeFiles/HYPRE.dir/lapack/dlarfg.c.o
[ 8%] Building C object CMakeFiles/HYPRE.dir/lapack/dlarft.c.o
[ 8%] Building C object CMakeFiles/HYPRE.dir/lapack/dlartg.c.o
[ 8%] Building C object CMakeFiles/HYPRE.dir/lapack/dlas2.c.o
[ 8%] Building C object CMakeFiles/HYPRE.dir/lapack/dlascl.c.o
[ 8%] Building C object CMakeFiles/HYPRE.dir/lapack/dlaset.c.o
[ 8%] Building C object CMakeFiles/HYPRE.dir/lapack/dlasq1.c.o
[ 8%] Building C object CMakeFiles/HYPRE.dir/lapack/dlasq2.c.o
[ 9%] Building C object CMakeFiles/HYPRE.dir/lapack/dlasq3.c.o
[ 9%] Building C object CMakeFiles/HYPRE.dir/lapack/dlasq4.c.o
[ 9%] Building C object CMakeFiles/HYPRE.dir/lapack/dlasq5.c.o
[ 9%] Building C object CMakeFiles/HYPRE.dir/lapack/dlasq6.c.o
[ 9%] Building C object CMakeFiles/HYPRE.dir/lapack/dlasr.c.o
[ 9%] Building C object CMakeFiles/HYPRE.dir/lapack/dlasrt.c.o
[ 10%] Building C object CMakeFiles/HYPRE.dir/lapack/dlassq.c.o
[ 10%] Building C object CMakeFiles/HYPRE.dir/lapack/dlaswp.c.o
[ 10%] Building C object CMakeFiles/HYPRE.dir/lapack/dlasv2.c.o
[ 10%] Building C object CMakeFiles/HYPRE.dir/lapack/dlatrd.c.o
[ 10%] Building C object CMakeFiles/HYPRE.dir/lapack/dorg2l.c.o
[ 10%] Building C object CMakeFiles/HYPRE.dir/lapack/dorg2r.c.o
[ 11%] Building C object CMakeFiles/HYPRE.dir/lapack/dorgbr.c.o
[ 11%] Building C object CMakeFiles/HYPRE.dir/lapack/dorgl2.c.o
[ 11%] Building C object CMakeFiles/HYPRE.dir/lapack/dorglq.c.o
[ 11%] Building C object CMakeFiles/HYPRE.dir/lapack/dorgql.c.o
[ 11%] Building C object CMakeFiles/HYPRE.dir/lapack/dorgqr.c.o
[ 11%] Building C object CMakeFiles/HYPRE.dir/lapack/dorgtr.c.o
[ 11%] Building C object CMakeFiles/HYPRE.dir/lapack/dorm2r.c.o
[ 12%] Building C object CMakeFiles/HYPRE.dir/lapack/dormbr.c.o
[ 12%] Building C object CMakeFiles/HYPRE.dir/lapack/dorml2.c.o
[ 12%] Building C object CMakeFiles/HYPRE.dir/lapack/dormlq.c.o
[ 12%] Building C object CMakeFiles/HYPRE.dir/lapack/dormqr.c.o
[ 12%] Building C object CMakeFiles/HYPRE.dir/lapack/dpotf2.c.o
[ 12%] Building C object CMakeFiles/HYPRE.dir/lapack/dpotrf.c.o
[ 13%] Building C object CMakeFiles/HYPRE.dir/lapack/dpotrs.c.o
[ 13%] Building C object CMakeFiles/HYPRE.dir/lapack/dsteqr.c.o
[ 13%] Building C object CMakeFiles/HYPRE.dir/lapack/dsterf.c.o
[ 13%] Building C object CMakeFiles/HYPRE.dir/lapack/dsyev.c.o
[ 13%] Building C object CMakeFiles/HYPRE.dir/lapack/dsygs2.c.o
[ 13%] Building C object CMakeFiles/HYPRE.dir/lapack/dsygst.c.o
[ 14%] Building C object CMakeFiles/HYPRE.dir/lapack/dsygv.c.o
[ 14%] Building C object CMakeFiles/HYPRE.dir/lapack/dsytd2.c.o
[ 14%] Building C object CMakeFiles/HYPRE.dir/lapack/dsytrd.c.o
[ 14%] Building C object CMakeFiles/HYPRE.dir/lapack/dtrti2.c.o
[ 14%] Building C object CMakeFiles/HYPRE.dir/lapack/dtrtri.c.o
[ 14%] Building C object CMakeFiles/HYPRE.dir/lapack/ieeeck.c.o
[ 14%] Building C object CMakeFiles/HYPRE.dir/lapack/ilaenv.c.o
[ 15%] Building C object CMakeFiles/HYPRE.dir/lapack/lsame.c.o
[ 15%] Building C object CMakeFiles/HYPRE.dir/lapack/xerbla.c.o
[ 15%] Building CUDA object CMakeFiles/HYPRE.dir/utilities/HYPRE_handle.c.o
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
      |                                                                                                                                                 ^
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
make[2]: *** [CMakeFiles/HYPRE.dir/build.make:1434: CMakeFiles/HYPRE.dir/utilities/HYPRE_handle.c.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:457: CMakeFiles/HYPRE.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

victorapm commented 2 months ago

@marcosvanella This seems to be a compiler issue. To help debug it, would you be able to share:

  1. Generated CMakeCache.txt file (under cmake's build directory)
  2. Output of make VERBOSE=1
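For reference, both items could be captured from the CMake build directory with something like the following (the copied/log file names are just placeholders):

cp CMakeCache.txt CMakeCache-for-issue.txt
make VERBOSE=1 2>&1 | tee make-verbose.log
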
marcosvanella commented 1 month ago

Hi Victor, apologies for the delay. I also tried building with make; this is my confmake.sh script:

#!/bin/bash
./configure --prefix=/home/marcosvanella/Documents/Software/hypre_ifx \
            CC=mpiicx FC=mpiifx \
            CFLAGS="-O3" FFLAGS="-O3" \
            --with-cuda CUCC=nvcc
make VERBOSE=1

Again, everything goes well up to the first CUDA compilation:

...
mpiicx -O3  -DHAVE_CONFIG_H -I.. -I./.. -I./../struct_mv -I.  -I/usr/include          -c state.c
mpiicx -O3  -DHAVE_CONFIG_H -I.. -I./.. -I./../struct_mv -I.  -I/usr/include          -c threading.c
mpiicx -O3  -DHAVE_CONFIG_H -I.. -I./.. -I./../struct_mv -I.  -I/usr/include          -c timer.c
mpiicx -O3  -DHAVE_CONFIG_H -I.. -I./.. -I./../struct_mv -I.  -I/usr/include          -c timing.c
nvcc -gencode arch=compute_70,code=sm_70  -O2 -lineinfo -expt-extended-lambda -std=c++11 --x cu -Xcompiler "-O2 "  -DHAVE_CONFIG_H -I.. -I./.. -I./../struct_mv -I.  -I/usr/include          -c device_utils.c -o device_utils.obj
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
      |                                                                                                                                                 ^ 
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^ 
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
make[1]: *** [../config/Makefile.config:66: device_utils.obj] Error 1
make[1]: Leaving directory '/home/marcosvanella/Documents/Software/hypre/src/utilities'
make: *** [Makefile:91: all] Error 1
victorapm commented 1 month ago

Hi @marcosvanella, could you share the output of mpiicx --version? Could you also try again without specifying CUCC? Lastly, what is the target GPU you are compiling for?

marcosvanella commented 1 month ago

Hi Victor, this is what I get for mpiicx:

$ mpiicx --version
Intel(R) oneAPI DPC++/C++ Compiler 2024.1.0 (2024.1.0.20240308)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/marcosvanella/intel/oneapi/compiler/2024.1/bin/compiler
Configuration file: /home/marcosvanella/intel/oneapi/compiler/2024.1/bin/compiler/../icx.cfg

Compiling without CUCC=nvcc gives me this interesting error:

...
mpiicx -O3  -DHAVE_CONFIG_H -I.. -I./.. -I./../struct_mv -I.  -I/usr/include          -c state.c
mpiicx -O3  -DHAVE_CONFIG_H -I.. -I./.. -I./../struct_mv -I.  -I/usr/include          -c threading.c
mpiicx -O3  -DHAVE_CONFIG_H -I.. -I./.. -I./../struct_mv -I.  -I/usr/include          -c timer.c
mpiicx -O3  -DHAVE_CONFIG_H -I.. -I./.. -I./../struct_mv -I.  -I/usr/include          -c timing.c
/usr/bin/nvcc -ccbin=mpiicpx -gencode arch=compute_70,code=sm_70  -O2 -lineinfo -expt-extended-lambda -std=c++11 --x cu -Xcompiler "-O2 "  -DHAVE_CONFIG_H -I.. -I./.. -I./../struct_mv -I.  -I/usr/include          -c device_utils.c -o device_utils.obj
In file included from <built-in>:1:
In file included from /usr/include/cuda_runtime.h:83:
/usr/include/crt/host_config.h:147:2: error: -- unsupported clang version! clang version must be less than 13 and greater than 3.2 . The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
  147 | #error -- unsupported clang version! clang version must be less than 13 and greater than 3.2 . The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
      |  ^
1 error generated.
make[1]: *** [../config/Makefile.config:66: device_utils.obj] Error 1
make[1]: Leaving directory '/home/marcosvanella/Documents/Software/hypre/src/utilities'
make: *** [Makefile:91: all] Error 1

It's not clear what clang version it is referring to. Is this the icpx compiler behind mpiicpx? The system is a System76 Pop!_OS Linux laptop with an 8 GB GeForce RTX 4070 with 4608 CUDA cores.

Thank you for taking time with this.

victorapm commented 1 month ago

Thank you! icpx is built on top of LLVM's clang, hence the reference to clang in the error message.

It seems you have an incompatible software stack for CUDA and C++ on your machine. I can think of a few options:

  1. Upgrade your cuda install
  2. Install another compiler compatible with your current cuda install, e.g., clang-13 or gcc (11 will probably work)
  3. Install hypre with cuda support via spack: spack install hypre+cuda cuda_arch=89

I recommend the last option since it will handle all the package dependencies automatically for you.

Hope this solves your issue!

PS: your GPU card supports compute capability 8.9 (cuda_arch=89)
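For reference, a rough sketch of the Spack route (option 3 above); available variants for the hypre package can be checked with spack info hypre:

spack install hypre+cuda cuda_arch=89
spack location -i hypre    # prints the install prefix containing include/ and lib/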

marcosvanella commented 1 month ago

Thank you Victor, I will try Spack. I have this CUDA version installed; I'll also see if I can upgrade it:

$nvidia-smi
Thu Sep 26 15:02:39 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |

I'll let you know. Thanks.

marcosvanella commented 1 month ago

Hi Victor, I was able to build the application with OpenMPI compiled with the GNU 10 compilers, given my CUDA version. Any version higher than 10 was giving the first error I posted. Then I went to the examples dir and typed make to build the examples. This is what I got:


mpicc -g -Wall   -I../hypre/include   -c ex1.c
mpicc -o ex1 ex1.o   -L../hypre/lib -lHYPRE -lm  
/usr/bin/ld: ../hypre/lib/libHYPRE.a(HYPRE_struct_pcg.obj): in function `__nv_hdl_wrapper_t<false, false, __nv_dl_tag<int (*)(hypre_StructSolver_struct*, hypre_StructMatrix_struct*, hypre_StructVector_struct*, hypre_StructVector_struct*), &HYPRE_StructDiagScale, 1u>, void (int), hypre_Boxloop_struct, hypre_Boxloop_struct, hypre_Boxloop_struct, double*, double*, double*>::manager<HYPRE_StructDiagScale::{lambda(int)#1}>::do_copy(void*)':
tmpxft_00115247_00000000-6_HYPRE_struct_pcg.cudafe1.cpp:(.text+0x15e): undefined reference to `operator new(unsigned long)'
/usr/bin/ld: ../hypre/lib/libHYPRE.a(HYPRE_struct_pcg.obj): in function `__device_stub__Z13forall_kernelIZ21HYPRE_StructDiagScaleEUnvhdl0_0_6_PFiP25hypre_StructSolver_structP25hypre_StructMatrix_structP25hypre_StructVector_structS5_E21HYPRE_StructDiagScale1_viE20hypre_Boxloop_structS8_S8_PdS9_S9_EvRPvT_i(void**, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<int (*)(hypre_StructSolver_struct*, hypre_StructMatrix_struct*, hypre_StructVector_struct*, hypre_StructVector_struct*), &HYPRE_StructDiagScale, 1u>, void (int), hypre_Boxloop_struct, hypre_Boxloop_struct, hypre_Boxloop_struct, double*, double*, double*>&, int)':
tmpxft_00115247_00000000-6_HYPRE_struct_pcg.cudafe1.cpp:(.text+0x26f): undefined reference to `__cudaPopCallConfiguration'
/usr/bin/ld: tmpxft_00115247_00000000-6_HYPRE_struct_pcg.cudafe1.cpp:(.text+0x29f): undefined reference to `cudaLaunchKernel'
/usr/bin/ld: ../hypre/lib/libHYPRE.a(HYPRE_struct_pcg.obj): in function `HYPRE_StructDiagScale':
tmpxft_00115247_00000000-6_HYPRE_struct_pcg.cudafe1.cpp:(.text+0x89c): undefined reference to `operator new(unsigned long)'
/usr/bin/ld: ../hypre/lib/libHYPRE.a(HYPRE_struct_pcg.obj): in function `__nv_hdl_wrapper_t<false, false, __nv_dl_tag<int (*)(hypre_StructSolver_struct*, hypre_StructMatrix_struct*, hypre_StructVector_struct*, hypre_StructVector_struct*), &HYPRE_StructDiagScale, 1u>, void (int), hypre_Boxloop_struct, hypre_Boxloop_struct, hypre_Boxloop_struct, double*, double*, double*>::manager<HYPRE_StructDiagScale::{lambda(int)#1}>::do_delete(void*)':
....

Am I missing linking against the CUDA libraries? How would I go about building the examples with GPU offloading? Thanks!
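As an aside, undefined references to operator new and cudaLaunchKernel at this stage usually mean the final link was done by a plain C compiler without the C++ and CUDA runtime libraries. A hedged sketch of a link line that adds them (the CUDA library path and the exact set of CUDA libraries are assumptions for this particular system):

mpicxx -o ex1 ex1.o -L../hypre/lib -lHYPRE -L/usr/lib/x86_64-linux-gnu -lcudart -lcusparse -lcurand -lcublas -lm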

victorapm commented 1 month ago

@marcosvanella can you try the branch https://github.com/hypre-space/hypre/tree/gpu-examples (PR linked to this issue) and let me know how it goes?

marcosvanella commented 1 month ago

Hi Victor, I recompiled the gpu-examples branch in the same way as before. I tried the ij CUDA test and got this error:

(base) marcosvanella@pop-os test (gpu-examples) $ ./ij
=============================================
Hypre init times:
=============================================
Hypre init:
  wall clock time = 0.963593 seconds
  wall MFLOPS     = 0.000000
  cpu clock time  = 0.659546 seconds
  cpu MFLOPS      = 0.000000

Using HYPRE_DEVELOP_STRING: v2.31.0-38-g40ab15a6c (branch gpu-examples; not the develop branch)

Running with these driver parameters:
  solver ID    = 0

  Laplacian:   num_fun = 1
    (nx, ny, nz) = (10, 10, 10)
    (Px, Py, Pz) = (1, 1, 1)
    (cx, cy, cz) = (1.000000, 1.000000, 1.000000)

=============================================
Generate Matrix:
=============================================
Spatial Operator:
  wall clock time = 0.000277 seconds
  wall MFLOPS     = 0.000000
  cpu clock time  = 0.000094 seconds
  cpu MFLOPS      = 0.000000

  Number of vector components: 1
  RHS vector has unit coefficients
  Initial guess is 0
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device
[pop-os:117996] *** Process received signal ***
[pop-os:117996] Signal: Aborted (6)
[pop-os:117996] Signal code:  (-6)
[pop-os:117996] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x12a274042520]
[pop-os:117996] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x12a2740969fc]
[pop-os:117996] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x12a274042476]
[pop-os:117996] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x12a2740287f3]
[pop-os:117996] [ 4] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2b9e)[0x12a2744a2b9e]
[pop-os:117996] [ 5] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c)[0x12a2744ae20c]
[pop-os:117996] [ 6] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277)[0x12a2744ae277]
[pop-os:117996] [ 7] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae4d8)[0x12a2744ae4d8]
[pop-os:117996] [ 8] ./ij(+0xc4d1b)[0x5c20d81bfd1b]
[pop-os:117996] [ 9] ./ij(+0xbd808)[0x5c20d81b8808]
[pop-os:117996] [10] ./ij(+0xa456b)[0x5c20d819f56b]
[pop-os:117996] [11] ./ij(+0x9116a)[0x5c20d818c16a]
[pop-os:117996] [12] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x12a274029d90]
[pop-os:117996] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x12a274029e40]
[pop-os:117996] [14] ./ij(+0x844f5)[0x5c20d817f4f5]
[pop-os:117996] *** End of error message ***
Aborted (core dumped)

I suspect I still have some library dependency issue, or a setting on my computer is not right. I'll dig more into this. When compiling the examples I got:

mpicc -O1 -g  -DHAVE_CONFIG_H  -DHYPRE_EXAMPLE_USING_CUDA -I/usr/include -DHYPRE_TIMING -I/home/marcosvanella/Documents/Software/hypre/src/hypre/include  -I/usr/include           -c ex01.c -o ex01.o
In file included from ex01.c:35:
ex.h:19:2: error: #error *** Running the examples on GPUs requires Unified Memory. Please reconfigure and rebuild with --enable-unified-memory ***
   19 | #error *** Running the examples on GPUs requires Unified Memory. Please reconfigure and rebuild with --enable-unified-memory ***
      |  ^~~~~
make: *** [Makefile:123: ex01.o] Error 1

which makes sense, as I did not enable unified memory in the build. I'll try that next.
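A minimal sketch of that next step, reusing only flags that appear elsewhere in this thread (compiler wrappers and the job count are placeholders):

cd src
./configure --with-cuda --enable-unified-memory --with-gpu-arch=89 \
            CC=mpicc FC=mpifort CFLAGS="-O2" FFLAGS="-O2"
make -j 8 install
cd examples && make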

An aside question: we are interfacing with HYPRE PCG+AMG to solve a Poisson equation (setup at the beginning, and the solution done twice per time step in a low-Mach LES solver). I noted that by default the residual norm is the C-norm (I assume a preconditioner-weighted norm of r, something like $\sqrt{\langle C r, r \rangle}$), and there is the option to make this the L2 norm. We would like to test the infinity norm against the convergence tolerance, to have exact cell-to-cell control in our staggered-grid solver. How would I go about this?

Thanks!

victorapm commented 1 month ago

Hi @marcosvanella, that error should go away after you compile hypre with unified memory support. Let me know if you still have problems. Note that GPU usage in the examples is currently not optimal, and we plan to add specific GPU examples with best practices in the future. For example, when assembling matrices in device memory, you should call HYPRE_IJMatrixSetValues or HYPRE_IJMatrixAddToValues with input data corresponding to several rows at a time, instead of one row at a time as is currently done in the examples (which is fine for CPUs).
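A minimal sketch of what such a chunked assembly call could look like; set_matrix_chunk and its arguments are hypothetical, and only HYPRE_IJMatrixSetValues is the actual hypre API:

#include "HYPRE.h"
#include "HYPRE_IJ_mv.h"

/* Hypothetical helper: pass a whole chunk of rows to hypre in one call,
 * rather than calling HYPRE_IJMatrixSetValues once per row. */
void set_matrix_chunk(HYPRE_IJMatrix A,
                      HYPRE_Int      nrows,    /* number of rows in this chunk    */
                      HYPRE_Int     *ncols,    /* nonzeros per row                */
                      HYPRE_BigInt  *rows,     /* global row indices              */
                      HYPRE_BigInt  *cols,     /* concatenated column indices     */
                      HYPRE_Complex *values)   /* concatenated coefficient values */
{
   /* One call covers the whole chunk: far fewer kernel launches on the GPU
      than one call per row. */
   HYPRE_IJMatrixSetValues(A, nrows, ncols, rows, cols, values);
}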

Regarding C-norm, that refers to the preconditioned norm $\sqrt{r^T M^{-1} r}$.

We don't have the option of using the infinity norm in our Krylov solvers, but that's something I can bring up with the team and discuss adding.
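For completeness, a small sketch of switching the PCG convergence test to the 2-norm on an existing solver handle; the helper name and tolerance value are made up, while HYPRE_PCGSetTwoNorm and HYPRE_PCGSetTol are the actual hypre calls. An infinity-norm check would still have to be done outside hypre, e.g., by recomputing r = b - A x in the application and taking max|r_i|:

#include "HYPRE_utilities.h"
#include "HYPRE_krylov.h"

/* Hypothetical helper applied to a previously created (ParCSR) PCG solver. */
void use_two_norm_check(HYPRE_Solver solver, HYPRE_Real tol)
{
   HYPRE_PCGSetTwoNorm(solver, 1);  /* converge on ||r||_2/||b||_2 instead of the C-norm */
   HYPRE_PCGSetTol(solver, tol);    /* relative convergence tolerance */
}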

Can you share more about your code? Is it open-source? What institution is it linked to?

Best!

marcosvanella commented 1 month ago

Hi Victor, thank you for the information. I added the unified memory flag and the examples compiled. I still have the "no kernel image" issue, which must be related to the CUDA/gcc libraries on my laptop (I'm not well versed in the CUDA stack); I'll try the Spack install next. We are seeing that, setting TWONORM to 1, we get ||r||_2/||b||_2 on the order of our tolerance, and L_inf errors on that order too in the cases we tested, so we might be able to just use this. We develop a fire model here at NIST called FDS (https://github.com/firemodels/fds). It is an LES model used by the fire protection engineering community to simulate buoyancy-driven flows due to combustion. The model has several physics units; in this case we are interested in a solver for the Poisson equation we solve for the pressure on unstructured grids. Thank you for your help!

victorapm commented 1 month ago

Thanks @marcosvanella, this is an interesting application. Please keep me updated on your efforts to use hypre in FDS.

Have you configured hypre with --with-gpu-arch=89, or -DHYPRE_CUDA_SM=89 if using CMake? 89 is the CUDA architecture code supported by the RTX 4070 card.
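For reference, the two forms side by side (no other options shown):

# Autotools build
./configure --with-cuda --with-gpu-arch=89
# CMake build, run from a build directory under src/
cmake -DHYPRE_WITH_CUDA=ON -DHYPRE_CUDA_SM=89 ..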

marcosvanella commented 1 month ago

Hi Victor, I'll try your suggestion. We have added HYPRE PCG+AMG support for one of our Poisson solvers (a local solver that works mesh block by mesh block on a single MPI process, tied within a block-Jacobi iteration), and we are getting our CI workflow prepared to be able to choose between PARDISO and HYPRE as base solvers. Once we have our verification cases up and running with HYPRE and our local solver, we will add HYPRE to our global matrix solver across all MPI processes in a calculation (which currently uses CLUSTER_SPARSE_SOLVER from MKL). The plan is to have these options for users in an upcoming release, which will also be of use for ARM Mac users (they can't compile with MKL, so we plan to have HYPRE as their option). I'll let you know how the CUDA compilation works out on my laptop. On the other hand, we have access to several systems, from local PPC+V100s to Polaris, Frontier, and now Vista at TACC (Grace-Hopper). So something we would like to have is to build the matrix and RHS on the CPU and have the GPU do the solves as time advances. We'll probably need your help with this, if possible. Thanks! cc @rmcdermo
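On the assemble-on-CPU/solve-on-GPU point, a rough sketch of the runtime knobs hypre exposes for this, assuming a unified-memory CUDA build; this is not FDS code, and configure_hypre_for_gpu is a made-up helper name:

#include "HYPRE_utilities.h"

/* Hypothetical helper: call once after MPI_Init and before creating any hypre
 * objects. With a unified-memory build, data assembled by host code stays
 * visible to the GPU kernels used in the setup/solve phases. */
static void configure_hypre_for_gpu(void)
{
   HYPRE_Init();                                  /* initialize hypre's device state             */
   HYPRE_SetMemoryLocation(HYPRE_MEMORY_DEVICE);  /* hypre data lives in (unified) device memory */
   HYPRE_SetExecutionPolicy(HYPRE_EXEC_DEVICE);   /* setup/solve kernels execute on the GPU      */
}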

marcosvanella commented 1 month ago

Hi Victor, we are encountering issues when running our debug target of FDS, which has been linked against hypre compiled with -O2 or -O3 flags. This is happening with both openmpi/gnu and impi/icx/ifort. The errors are erroneous arithmetic operations in all cases. For example, on an Intel Mac where hypre has been compiled with "-m64 -O3 -ffast-math -ggdb" flags, and FDS with -O0 -ggdb, I see this on the first PCG solver call:

Process 75728 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_ARITHMETIC (code=EXC_I386_SSEEXTERR, subcode=0x1926)
    frame #0: 0x0000000106f1a955 fds_ompi_gnu_osx_db`hypre_BoomerAMGSolve(amg_vdata=0x00007fa3c7014200, A=0x00006000003b81a0, f=<unavailable>, u=0x000060000225d440) at par_amg_solve.c:0:35
   19    *--------------------------------------------------------------------*/
   20
   21   HYPRE_Int
-> 22   hypre_BoomerAMGSolve( void               *amg_vdata,
   23                         hypre_ParCSRMatrix *A,
   24                         hypre_ParVector    *f,
   25                         hypre_ParVector    *u         )
Note: this address is compiler-generated code in function hypre_BoomerAMGSolve that has no source code associated with it.
Target 0: (fds_ompi_gnu_osx_db) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_ARITHMETIC (code=EXC_I386_SSEEXTERR, subcode=0x1926)
  * frame #0: 0x0000000106f1a955 fds_ompi_gnu_osx_db`hypre_BoomerAMGSolve(amg_vdata=0x00007fa3c7014200, A=0x00006000003b81a0, f=<unavailable>, u=0x000060000225d440) at par_amg_solve.c:0:35
    frame #1: 0x0000000106f026ef fds_ompi_gnu_osx_db`hypre_PCGSolve(pcg_vdata=0x00006000003b8270, A=0x00006000003b81a0, b=0x000060000225d380, x=0x000060000225d400) at pcg.c:496:4
    frame #2: 0x0000000106effb2a fds_ompi_gnu_osx_db`HYPRE_PCGSolve(solver=<unavailable>, A=<unavailable>, b=<unavailable>, x=<unavailable>) at HYPRE_pcg.c:47:13 [artificial]
    frame #3: 0x0000000106f0819a fds_ompi_gnu_osx_db`HYPRE_ParCSRPCGSolve(solver=<unavailable>, A=<unavailable>, b=<unavailable>, x=<unavailable>) at HYPRE_parcsr_pcg.c:77:13 [artificial]
    frame #4: 0x0000000106f04e8a fds_ompi_gnu_osx_db`hypre_parcsrpcgsolve_(solver=<unavailable>, A=<unavailable>, b=<unavailable>, x=<unavailable>, ierr=<unavailable>) at F90_HYPRE_parcsr_pcg.c:84:14
    frame #5: 0x00000001013b06f4 fds_ompi_gnu_osx_db`__locmat_solver_MOD_ulmat_solve_zone at pres.f90:1727:99
    frame #6: 0x00000001013be851 fds_ompi_gnu_osx_db`__locmat_solver_MOD_ulmat_solver at pres.f90:1444:35
    frame #7: 0x0000000101a09554 fds_ompi_gnu_osx_db`pressure_iteration_scheme.26 at main.f90:1493:38
    frame #8: 0x00000001019747dc fds_ompi_gnu_osx_db`MAIN__ at main.f90:703:59
    frame #9: 0x0000000101a20418 fds_ompi_gnu_osx_db`main at main.f90:6:4
    frame #10: 0x00007ff802734345 dyld`start + 1909

Something similar is seen on a Linux system when using impi (mpiicx) and -O2 or -O3 flags in the hypre compilation, but -O0 -g in our code compilation (FDS is Fortran, BTW):

forrtl: error (73): floating divide by zero
Image              PC                Routine            Line        Source
libc.so.6          00001531D5A54DB0  Unknown               Unknown  Unknown
fds_impi_intel_li  0000000009E33CB1  hypre_PCGSolve            777  pcg.c
fds_impi_intel_li  0000000009CFAFE5  hypre_parcsrpcgso          84  F90_HYPRE_parcsr_pcg.c
fds_impi_intel_li  000000000357672C  locmat_solver_mp_        1722  pres.f90
fds_impi_intel_li  000000000354E0BC  locmat_solver_mp_        1444  pres.f90
fds_impi_intel_li  000000000422D976  fds_IP_pressure_i        1493  main.f90
fds_impi_intel_li  0000000004208D9B  MAIN__                    703  main.f90
fds_impi_intel_li  0000000000408C3D  Unknown               Unknown  Unknown
libc.so.6          00001531D5A3FEB0  Unknown               Unknown  Unknown
libc.so.6          00001531D5A3FF60  __libc_start_main     Unknown  Unknown
fds_impi_intel_li  0000000000408B55  Unknown               Unknown  Unknown
Aborted (core dumped)

Note that we don't see any of these errors in our development (-O1) or production (-O2) targets. Do you have an idea about what could be happening? Are we missing some compilation flag in particular for HYPRE? Thanks! cc @rmcdermo

marcosvanella commented 1 month ago

Hi Victor, I tried these configure flags on my computer to compile hypre with OpenMP offloading:

./configure --prefix=/home/marcosvanella/Documents/Software/hypre_gnu \
            CC=mpicc FC=mpifort \
            CFLAGS='-m64 -O0 -g -fopenmp -no-pie -foffload=-O0' FFLAGS='-m64 -O0 -g -fopenmp -no-pie -foffload=-O0' \
            --with-device-openmp --enable-unified-memory --with-gpu-arch=89

make VERBOSE=1
make install

The code compiles and installs. Now, when I try to build the tests with $ make test,

I see there is something missing in the link phase for each test:

mpicc -m64 -O0 -g -fopenmp -no-pie -foffload=-O0  -DHAVE_CONFIG_H -I. -I/home/marcosvanella/Documents/Software/hypre/src/hypre/include             -DHYPRE_TIMING -DHYPRE_FORTRAN -c ij.c
Building ij ... 
o ij ij.o -L/home/marcosvanella/Documents/Software/hypre/src/hypre/lib -lHYPRE -Wl,-rpath,/home/marcosvanella/Documents/Software/hypre/src/hypre/lib           -lm
make[1]: o: No such file or directory
make[1]: [Makefile:137: ij] Error 127 (ignored)

What could be going on? Thanks!

liruipeng commented 1 month ago

It seems that hypre has problems finding a proper LINK_CC. You can check that in src/config.log.
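A quick way to check what configure resolved those variables to, run from the top of the hypre repository (assuming an autotools build):

grep -n "LINK_CC\|CUCC" src/config.log src/config/Makefile.config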

marcosvanella commented 1 month ago

I think you are right. LINK_CC points to CUCC which is undefined. Should I set it?

liruipeng commented 1 month ago

I think you are right. LINK_CC points to CUCC which is undefined. Should I set it?

Yes, probably. With device-openmp, we only check for these compilers: mpixlc-gpu, mpiclang-gpu, and mpiicx. Apparently you don't have any of them, so I guess you'd set CUCC manually.

marcosvanella commented 1 month ago

Thank you Rui! I'll try it.

victorapm commented 1 month ago

Hi Marcos, is there any specific reason for using the device-openmp build of hypre? In general, we recommend --with-cuda since you are working with NVIDIA cards.
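Applied to the earlier script, that recommendation would amount to swapping --with-device-openmp for --with-cuda while keeping the unified-memory and GPU-arch flags (prefix and compilers as before; the optimization level here is a placeholder):

./configure --prefix=/home/marcosvanella/Documents/Software/hypre_gnu \
            CC=mpicc FC=mpifort \
            CFLAGS="-O2" FFLAGS="-O2" \
            --with-cuda --enable-unified-memory --with-gpu-arch=89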

victorapm commented 1 month ago

For the record, the SIGFPE issues discussed above have been resolved offline. The solution is to compile hypre with -fno-unsafe-math-optimizations or -fp-model=precise, depending on compiler support.
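For reference, what that looks like on a configure line for the two compiler families used in this thread (all other options omitted):

# GNU/Clang-based wrappers (e.g., mpicc over gcc)
./configure CC=mpicc FC=mpifort CFLAGS="-O2 -fno-unsafe-math-optimizations" FFLAGS="-O2"

# Intel oneAPI wrappers (mpiicx/mpiifx)
./configure CC=mpiicx FC=mpiifx CFLAGS="-O2 -fp-model=precise" FFLAGS="-O2"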