xsdk-project / xsdk-examples

Example codes demonstrating the use of various XSDK packages in combination.
AMReX + SUNDIALS example runtime error with HIP #50

Closed by balos1 1 year ago

balos1 commented 1 year ago

Originally posted by @v-dobrev in https://github.com/xsdk-project/xsdk-examples/issues/45#issuecomment-1546523242

When trying to run this example with HIP enabled, I get the following error from ctest -V:

...
test 1
      Start  1: AMREX-amrex_sundials_advection_diffusion

1: Test command: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials/amrex_sundials_advection_diffusion
1: Working Directory: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials
1: Test timeout computed to be: 1500
1: Initializing HIP...
1: HIP initialized.
1: amrex::Abort::0::GPU last error detected in file /dev/shm/dobrev1/spack/var/spack/stage/spack-stage-amrex-22.09-huuzz4saz73j72hmwd6wupsmfpx62owg/spack-src/Src/Base/AMReX_GpuLaunchFunctsG.H line 809: shared object initialization failed !!!
1: SIGABRT
1: /usr/bin/addr2line: Dwarf Error: Invalid or unhandled FORM value: 0x25.
1: /usr/bin/addr2line: Dwarf Error: Invalid or unhandled FORM value: 0x25.
1: /usr/bin/addr2line: Dwarf Error: Invalid or unhandled FORM value: 0x25.
1: /usr/bin/addr2line: Dwarf Error: Invalid or unhandled FORM value: 0x25.
1: /usr/bin/addr2line: Dwarf Error: Invalid or unhandled FORM value: 0x25.
1: /usr/bin/addr2line: Dwarf Error: Invalid or unhandled FORM value: 0x25.
1: /usr/bin/addr2line: Dwarf Error: Invalid or unhandled FORM value: 0x25.
1: See Backtrace.0 file for details
1: MPICH ERROR [Rank 0] [job id 364801714345214976] [Fri May 12 21:03:56 2023] [tioga23] - Abort(6) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 6) - process 0
1: 
 1/26 Test  #1: AMREX-amrex_sundials_advection_diffusion ...***Failed    0.92 sec
...

Does anyone have any suggestions?

The Spack spec for AMReX is:

[+]  amrex@22.09%gcc@12.2.0~amrdata~cuda~eb~fortran~hdf5~hypre~ipo+linear_solvers+mpi~openmp~particles~petsc~pic~plotfile_tools+rocm~shared+sundials~sycl~tiny_profile amdgpu_target=gfx90a build_system=cmake build_type=RelWithDebInfo dimensions=3 generator=make precision=double arch=linux-rhel8-zen3
[+]      ^cmake@3.24.2%gcc@12.2.0~doc+ncurses+ownlibs~qt build_system=generic build_type=Release arch=linux-rhel8-zen3
[+]      ^cray-mpich@8.1.25%gcc@12.2.0+wrappers build_system=generic arch=linux-rhel8-zen3
[+]      ^gmake@4.2.1%gcc@12.2.0~guile build_system=autotools patches=ca60bd9,fe5b60d arch=linux-rhel8-zen3
[+]      ^hip@5.4.3%gcc@12.2.0~cuda~ipo+rocm build_system=cmake build_type=Release generator=make patches=ca523f1 arch=linux-rhel8-zen3
[+]      ^hsa-rocr-dev@5.4.3%gcc@12.2.0+image~ipo+shared build_system=cmake build_type=Release generator=make patches=71e6851 arch=linux-rhel8-zen3
[+]      ^llvm-amdgpu@5.4.3%gcc@12.2.0~ipo~link_llvm_dylib~llvm_dylib~openmp+rocm-device-libs build_system=cmake build_type=Release generator=ninja patches=a08bbe1 arch=linux-rhel8-zen3
[+]      ^rocprim@5.4.3%gcc@12.2.0~ipo amdgpu_target=auto build_system=cmake build_type=Release generator=make arch=linux-rhel8-zen3
[+]      ^rocrand@5.4.3%gcc@12.2.0+hiprand~ipo amdgpu_target=auto build_system=cmake build_type=Release generator=make patches=a35e689 arch=linux-rhel8-zen3
[+]      ^sundials@6.4.1%gcc@12.2.0+ARKODE+CVODE+CVODES+IDA+IDAS+KINSOL~cuda+examples+examples-install~f2003~fcmix+generic-math+ginkgo+hypre~int64~ipo~klu~kokkos~kokkos-kernels~lapack+magma~monitoring+mpi~openmp+petsc~profiling~pthread~raja+rocm+shared+static+superlu-dist~superlu-mt~sycl+trilinos amdgpu_target=gfx90a build_system=cmake build_type=RelWithDebInfo cstd=99 cxxstd=14 generator=make logging-level=0 logging-mpi=OFF precision=double arch=linux-rhel8-zen3
[+]          ^ginkgo@1.5.0%gcc@12.2.0~cuda~develtools~full_optimizations~hwloc~ipo+mpi~oneapi~openmp+rocm+shared amdgpu_target=gfx90a build_system=cmake build_type=RelWithDebInfo generator=make patches=ba0956e arch=linux-rhel8-zen3
[+]              ^hipblas@5.4.3%gcc@12.2.0~cuda~ipo+rocm amdgpu_target=auto build_system=cmake build_type=Release generator=make arch=linux-rhel8-zen3
[+]              ^hipsparse@5.4.3%gcc@12.2.0~cuda~ipo+rocm amdgpu_target=auto build_system=cmake build_type=Release generator=make patches=c447537 arch=linux-rhel8-zen3
[+]              ^rocthrust@5.4.3%gcc@12.2.0~ipo amdgpu_target=auto build_system=cmake build_type=Release generator=make arch=linux-rhel8-zen3
[+]          ^hypre@2.26.0%gcc@12.2.0~complex~cuda~debug+fortran~gptune~int64~internal-superlu~mixedint+mpi~openmp+rocm+shared+superlu-dist~sycl~umpire~unified-memory amdgpu_target=gfx90a build_system=autotools arch=linux-rhel8-zen3
[+]              ^cray-libsci@23.02.1.1%gcc@12.2.0+mpi~openmp+shared build_system=generic arch=linux-rhel8-zen3
[+]              ^rocsparse@5.4.3%gcc@12.2.0~ipo~test amdgpu_target=auto build_system=cmake build_type=Release generator=make arch=linux-rhel8-zen3
[+]          ^magma@2.7.0%gcc@12.2.0~cuda+fortran~ipo+rocm+shared amdgpu_target=gfx90a build_system=cmake build_type=RelWithDebInfo generator=make arch=linux-rhel8-zen3
[+]          ^petsc@3.18.1%gcc@12.2.0~X+batch~cgns~complex~cuda~debug+double~exodusii~fftw+fortran~giflib+hdf5~hpddm~hwloc+hypre~int64~jpeg~knl~kokkos~libpng~libyaml~memkind+metis~mkl-pardiso~mmg~moab~mpfr+mpi~mumps~openmp~p4est~parmmg~ptscotch~random123+rocm~saws~scalapack+shared~strumpack~suite-sparse+superlu-dist~tetgen~trilinos~valgrind amdgpu_target=gfx90a build_system=generic clanguage=C arch=linux-rhel8-zen3
[+]              ^diffutils@3.6%gcc@12.2.0 build_system=autotools arch=linux-rhel8-zen3
[+]              ^hdf5@1.14.0%gcc@12.2.0~cxx+fortran+hl~ipo~java~map+mpi+shared~szip~threadsafe+tools api=default build_system=cmake build_type=RelWithDebInfo generator=make patches=0b5dd6f arch=linux-rhel8-zen3
[+]                  ^pkgconf@1.8.0%gcc@12.2.0 build_system=autotools arch=linux-rhel8-zen3
[+]              ^hipsolver@5.4.3%gcc@12.2.0~cuda~ipo+rocm amdgpu_target=auto build_system=cmake build_type=Release generator=make arch=linux-rhel8-zen3
[+]              ^metis@5.1.0%gcc@12.2.0~gdb~int64~ipo~real64+shared build_system=cmake build_type=RelWithDebInfo generator=make patches=4991da9,93a7903,b1225da arch=linux-rhel8-zen3
[+]              ^parmetis@4.0.3%gcc@12.2.0~gdb~int64~ipo+shared build_system=cmake build_type=RelWithDebInfo generator=make patches=4f89253,50ed208,704b84f arch=linux-rhel8-zen3
[+]              ^python@3.10.10%gcc@12.2.0+bz2+crypt+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tkinter+uuid+zlib build_system=generic patches=0d98e93,7d40923,f2fd060 arch=linux-rhel8-zen3
[+]                  ^bzip2@1.0.6%gcc@12.2.0~debug~pic+shared build_system=generic arch=linux-rhel8-zen3
[+]                  ^expat@2.5.0%gcc@12.2.0+libbsd build_system=autotools arch=linux-rhel8-zen3
[+]                      ^libbsd@0.11.7%gcc@12.2.0 build_system=autotools arch=linux-rhel8-zen3
[+]                          ^libmd@1.0.4%gcc@12.2.0 build_system=autotools arch=linux-rhel8-zen3
[+]                  ^gdbm@1.23%gcc@12.2.0 build_system=autotools arch=linux-rhel8-zen3
[+]                  ^gettext@0.19.8.1%gcc@12.2.0+bzip2+curses+git~libunistring+libxml2+tar+xz build_system=autotools patches=9acdb4e arch=linux-rhel8-zen3
[+]                  ^libffi@3.4.4%gcc@12.2.0 build_system=autotools arch=linux-rhel8-zen3
[+]                  ^libxcrypt@4.4.33%gcc@12.2.0~obsolete_api build_system=autotools arch=linux-rhel8-zen3
[+]                      ^perl@5.26.3%gcc@12.2.0+cpanm+open+shared+threads build_system=generic patches=8cf4302 arch=linux-rhel8-zen3
[+]                  ^ncurses@6.1%gcc@12.2.0~symlinks+termlib abi=none build_system=autotools arch=linux-rhel8-zen3
[+]                  ^openssl@1.1.1k%gcc@12.2.0~docs~shared build_system=generic certs=mozilla arch=linux-rhel8-zen3
[+]                  ^readline@8.2%gcc@12.2.0 build_system=autotools patches=bbf97f1 arch=linux-rhel8-zen3
[+]                  ^sqlite@3.40.1%gcc@12.2.0+column_metadata+dynamic_extensions+fts~functions+rtree build_system=autotools arch=linux-rhel8-zen3
[+]                  ^util-linux-uuid@2.38.1%gcc@12.2.0 build_system=autotools arch=linux-rhel8-zen3
[+]                  ^xz@5.2.4%gcc@12.2.0~pic build_system=autotools libs=shared,static arch=linux-rhel8-zen3
[+]              ^rocblas@5.4.3%gcc@12.2.0~ipo+tensile amdgpu_target=auto build_system=cmake build_type=Release generator=make patches=81591d9 arch=linux-rhel8-zen3
[+]              ^rocsolver@5.4.3%gcc@12.2.0~ipo+optimal amdgpu_target=auto build_system=cmake build_type=Release generator=make patches=8067bfb arch=linux-rhel8-zen3
[+]              ^zlib@1.2.13%gcc@12.2.0+optimize+pic+shared build_system=makefile arch=linux-rhel8-zen3
[+]          ^superlu-dist@8.1.2%gcc@12.2.0~cuda~int64~ipo~openmp~rocm+shared build_system=cmake build_type=RelWithDebInfo generator=make arch=linux-rhel8-zen3
[+]          ^trilinos@13.4.1%gcc@12.2.0~adelus~adios2+amesos+amesos2+anasazi+aztec~basker+belos+boost~chaco~complex~cuda~cuda_rdc~debug~dtk+epetra+epetraext~epetraextbtf~epetraextexperimental~epetraextgraphreorderings~exodus+explicit_template_instantiation~float+fortran~gtest+hdf5+hypre+ifpack+ifpack2~intrepid+intrepid2~ipo~isorropia+kokkos~mesquite~minitensor+ml+mpi+muelu~mumps+nox~openmp~panzer~phalanx~piro~python~rocm~rocm_rdc~rol~rythmos+sacado~scorec+shards+shared~shylu~stk~stokhos+stratimikos~strumpack~suite-sparse~superlu+superlu-dist~teko~tempus+thyra+tpetra~trilinoscouplings~wrapper~x11+zoltan+zoltan2 build_system=cmake build_type=RelWithDebInfo cxxstd=14 generator=make gotype=int arch=linux-rhel8-zen3
[+]              ^boost@1.79.0%gcc@12.2.0+atomic+chrono~clanglibcpp~container~context~contract~coroutine+date_time~debug+exception~fiber+filesystem+graph~graph_parallel~icu+iostreams~json+locale+log+math~mpi+multithreaded~nowide~numpy~pic+program_options~python+random+regex+serialization+shared+signals~singlethreaded+stacktrace+system~taggedlayout+test+thread+timer~type_erasure~versionedlayout+wave build_system=generic cxxstd=14 patches=a440f96 visibility=hidden arch=linux-rhel8-zen3
[+]                  ^zstd@1.5.5%gcc@12.2.0~programs build_system=makefile libs=shared,static arch=linux-rhel8-zen3
[+]              ^hwloc@2.8.0%gcc@12.2.0~cairo~cuda~gl~libudev+libxml2~netloc~nvml~oneapi-level-zero~opencl+pci~rocm build_system=autotools libs=shared,static arch=linux-rhel8-zen3

Note that hypre is built with +rocm -- could that be a problem?

v-dobrev commented 1 year ago

Note that there are two other posts in #45 related to this. I copy them below.


Here are the contents of the file Backtrace.0 mentioned in the error message above:

=== If no file names and line numbers are shown below, one can run
            addr2line -Cpfie my_exefile my_line_address
    to convert `my_line_address` (e.g., 0x4a6b) into file name and line number.
    Or one can use amrex/Tools/Backtrace/parse_bt.py.

=== Please note that the line number reported by addr2line may not be accurate.
    One can use
            readelf -wl my_exefile | grep my_line_address'
    to find out the offset for that line.

 0: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials/amrex_sundials_advection_diffusion() [0x314b19]
    amrex::BLBackTrace::print_backtrace_info(_IO_FILE*) at ??:?

 1: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials/amrex_sundials_advection_diffusion() [0x314707]
    amrex::BLBackTrace::handler(int) at ??:?

 2: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials/amrex_sundials_advection_diffusion() [0x2902cf]
    amrex::Gpu::ErrorCheck(char const*, int) at ??:?

 3: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials/amrex_sundials_advection_diffusion() [0x2cf532]
    amrex::InitRandom(unsigned long, int, unsigned long) at ??:?

 4: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials/amrex_sundials_advection_diffusion() [0x29b7ed]
    amrex::Initialize(int&, char**&, bool, int, std::function<void ()> const&, std::ostream&, std::ostream&, void (*)(char const*)) at ??:?

 5: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials/amrex_sundials_advection_diffusion() [0x288efe]
    main at ??:?

 6: /lib64/libc.so.6(__libc_start_main+0xe5) [0x15553f6d3d85]

 7: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials/amrex_sundials_advection_diffusion() [0x286ade]
    _start at ??:?

I also tried a build of xsdk+rocm with (the default) ^hypre~rocm and I see the same issue.
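
Every frame in the backtrace above resolves to ??:?, so the addresses still have to be passed to addr2line by hand, as the header of Backtrace.0 suggests. Below is a minimal shell sketch of that step. The function name bt_addr2line_cmds is hypothetical (it is not part of AMReX), and the sketch assumes frame lines of the form "N: /path/to/exe() [0xADDR]" as seen here. It only prints the commands rather than running them. The repeated "Invalid or unhandled FORM value: 0x25" messages suggest that /usr/bin/addr2line may be too old for the DWARF version in this binary, so the tool name is a parameter and a newer addr2line (or llvm-addr2line, if one is available) can be substituted.

```shell
# Sketch: turn AMReX Backtrace.N frame lines into addr2line commands.
# bt_addr2line_cmds is a hypothetical helper name, not an AMReX tool.
# Assumed frame format:  " 3: /path/to/exe() [0x2cf532]"

bt_addr2line_cmds() {
  bt_file="$1"
  bt_tool="${2:-addr2line}"   # pass another tool name if the system one rejects the DWARF data
  # Match only frames written as "N: <exe>() [0xADDR]". Frames that carry a
  # symbol inside the parentheses, such as
  # "libc.so.6(__libc_start_main+0xe5) [...]", do not match and are skipped.
  sed -n -E 's|^ *[0-9]+: (/[^(]+)\(\) \[(0x[0-9a-fA-F]+)\] *$|'"$bt_tool"' -Cpfie \1 \2|p' "$bt_file"
}
```

For example, running bt_addr2line_cmds Backtrace.0 prints one "addr2line -Cpfie <exe> <address>" line per matching frame; the output can be inspected first and then piped to sh to execute it.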

v-dobrev commented 1 year ago

Possible solution: https://github.com/xsdk-project/xsdk-examples/pull/52#issuecomment-1568880672

Here's the suggestion:

This should fix the issue.

diff --git a/cmake/FindAMReX.cmake b/cmake/FindAMReX.cmake
index de4b328..91e7979 100644
--- a/cmake/FindAMReX.cmake
+++ b/cmake/FindAMReX.cmake
@@ -7,7 +7,7 @@ if(NOT TARGET XSDK::AMReX)
   target_link_libraries(XSDK_AMREX INTERFACE AMReX::amrex)
    if(ENABLE_HIP)
      target_link_libraries(XSDK_AMREX INTERFACE hip::amdhip64)
-     target_link_options(XSDK_AMREX INTERFACE "-fgpu-rdc")
+     target_link_options(XSDK_AMREX INTERFACE "SHELL:-Xoffload-linker --whole-archive" "-fgpu-rdc")
    endif()
   add_library(XSDK::AMReX ALIAS XSDK_AMREX)
 endif()