Closed v-dobrev closed 1 year ago
This is a separate issue but it is related to the same example, so I'll post it here.
When trying to run this example with HIP enabled I get the following error from ctest -V:
...
test 1
Start 1: AMREX-amrex_sundials_advection_diffusion
1: Test command: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials/amrex_sundials_advection_diffusion
1: Working Directory: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials
1: Test timeout computed to be: 1500
1: Initializing HIP...
1: HIP initialized.
1: amrex::Abort::0::GPU last error detected in file /dev/shm/dobrev1/spack/var/spack/stage/spack-stage-amrex-22.09-huuzz4saz73j72hmwd6wupsmfpx62owg/spack-src/Src/Base/AMReX_GpuLaunchFunctsG.H line 809: shared object initialization failed !!!
1: SIGABRT
1: /usr/bin/addr2line: Dwarf Error: Invalid or unhandled FORM value: 0x25.
1: /usr/bin/addr2line: Dwarf Error: Invalid or unhandled FORM value: 0x25.
1: /usr/bin/addr2line: Dwarf Error: Invalid or unhandled FORM value: 0x25.
1: /usr/bin/addr2line: Dwarf Error: Invalid or unhandled FORM value: 0x25.
1: /usr/bin/addr2line: Dwarf Error: Invalid or unhandled FORM value: 0x25.
1: /usr/bin/addr2line: Dwarf Error: Invalid or unhandled FORM value: 0x25.
1: /usr/bin/addr2line: Dwarf Error: Invalid or unhandled FORM value: 0x25.
1: See Backtrace.0 file for details
1: MPICH ERROR [Rank 0] [job id 364801714345214976] [Fri May 12 21:03:56 2023] [tioga23] - Abort(6) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 6) - process 0
1:
1/26 Test #1: AMREX-amrex_sundials_advection_diffusion ...***Failed 0.92 sec
...
Does anyone have any suggestions?
The Spack spec for AMReX is:
[+] amrex@22.09%gcc@12.2.0~amrdata~cuda~eb~fortran~hdf5~hypre~ipo+linear_solvers+mpi~openmp~particles~petsc~pic~plotfile_tools+rocm~shared+sundials~sycl~tiny_profile amdgpu_target=gfx90a build_system=cmake build_type=RelWithDebInfo dimensions=3 generator=make precision=double arch=linux-rhel8-zen3
[+] ^cmake@3.24.2%gcc@12.2.0~doc+ncurses+ownlibs~qt build_system=generic build_type=Release arch=linux-rhel8-zen3
[+] ^cray-mpich@8.1.25%gcc@12.2.0+wrappers build_system=generic arch=linux-rhel8-zen3
[+] ^gmake@4.2.1%gcc@12.2.0~guile build_system=autotools patches=ca60bd9,fe5b60d arch=linux-rhel8-zen3
[+] ^hip@5.4.3%gcc@12.2.0~cuda~ipo+rocm build_system=cmake build_type=Release generator=make patches=ca523f1 arch=linux-rhel8-zen3
[+] ^hsa-rocr-dev@5.4.3%gcc@12.2.0+image~ipo+shared build_system=cmake build_type=Release generator=make patches=71e6851 arch=linux-rhel8-zen3
[+] ^llvm-amdgpu@5.4.3%gcc@12.2.0~ipo~link_llvm_dylib~llvm_dylib~openmp+rocm-device-libs build_system=cmake build_type=Release generator=ninja patches=a08bbe1 arch=linux-rhel8-zen3
[+] ^rocprim@5.4.3%gcc@12.2.0~ipo amdgpu_target=auto build_system=cmake build_type=Release generator=make arch=linux-rhel8-zen3
[+] ^rocrand@5.4.3%gcc@12.2.0+hiprand~ipo amdgpu_target=auto build_system=cmake build_type=Release generator=make patches=a35e689 arch=linux-rhel8-zen3
[+] ^sundials@6.4.1%gcc@12.2.0+ARKODE+CVODE+CVODES+IDA+IDAS+KINSOL~cuda+examples+examples-install~f2003~fcmix+generic-math+ginkgo+hypre~int64~ipo~klu~kokkos~kokkos-kernels~lapack+magma~monitoring+mpi~openmp+petsc~profiling~pthread~raja+rocm+shared+static+superlu-dist~superlu-mt~sycl+trilinos amdgpu_target=gfx90a build_system=cmake build_type=RelWithDebInfo cstd=99 cxxstd=14 generator=make logging-level=0 logging-mpi=OFF precision=double arch=linux-rhel8-zen3
[+] ^ginkgo@1.5.0%gcc@12.2.0~cuda~develtools~full_optimizations~hwloc~ipo+mpi~oneapi~openmp+rocm+shared amdgpu_target=gfx90a build_system=cmake build_type=RelWithDebInfo generator=make patches=ba0956e arch=linux-rhel8-zen3
[+] ^hipblas@5.4.3%gcc@12.2.0~cuda~ipo+rocm amdgpu_target=auto build_system=cmake build_type=Release generator=make arch=linux-rhel8-zen3
[+] ^hipsparse@5.4.3%gcc@12.2.0~cuda~ipo+rocm amdgpu_target=auto build_system=cmake build_type=Release generator=make patches=c447537 arch=linux-rhel8-zen3
[+] ^rocthrust@5.4.3%gcc@12.2.0~ipo amdgpu_target=auto build_system=cmake build_type=Release generator=make arch=linux-rhel8-zen3
[+] ^hypre@2.26.0%gcc@12.2.0~complex~cuda~debug+fortran~gptune~int64~internal-superlu~mixedint+mpi~openmp+rocm+shared+superlu-dist~sycl~umpire~unified-memory amdgpu_target=gfx90a build_system=autotools arch=linux-rhel8-zen3
[+] ^cray-libsci@23.02.1.1%gcc@12.2.0+mpi~openmp+shared build_system=generic arch=linux-rhel8-zen3
[+] ^rocsparse@5.4.3%gcc@12.2.0~ipo~test amdgpu_target=auto build_system=cmake build_type=Release generator=make arch=linux-rhel8-zen3
[+] ^magma@2.7.0%gcc@12.2.0~cuda+fortran~ipo+rocm+shared amdgpu_target=gfx90a build_system=cmake build_type=RelWithDebInfo generator=make arch=linux-rhel8-zen3
[+] ^petsc@3.18.1%gcc@12.2.0~X+batch~cgns~complex~cuda~debug+double~exodusii~fftw+fortran~giflib+hdf5~hpddm~hwloc+hypre~int64~jpeg~knl~kokkos~libpng~libyaml~memkind+metis~mkl-pardiso~mmg~moab~mpfr+mpi~mumps~openmp~p4est~parmmg~ptscotch~random123+rocm~saws~scalapack+shared~strumpack~suite-sparse+superlu-dist~tetgen~trilinos~valgrind amdgpu_target=gfx90a build_system=generic clanguage=C arch=linux-rhel8-zen3
[+] ^diffutils@3.6%gcc@12.2.0 build_system=autotools arch=linux-rhel8-zen3
[+] ^hdf5@1.14.0%gcc@12.2.0~cxx+fortran+hl~ipo~java~map+mpi+shared~szip~threadsafe+tools api=default build_system=cmake build_type=RelWithDebInfo generator=make patches=0b5dd6f arch=linux-rhel8-zen3
[+] ^pkgconf@1.8.0%gcc@12.2.0 build_system=autotools arch=linux-rhel8-zen3
[+] ^hipsolver@5.4.3%gcc@12.2.0~cuda~ipo+rocm amdgpu_target=auto build_system=cmake build_type=Release generator=make arch=linux-rhel8-zen3
[+] ^metis@5.1.0%gcc@12.2.0~gdb~int64~ipo~real64+shared build_system=cmake build_type=RelWithDebInfo generator=make patches=4991da9,93a7903,b1225da arch=linux-rhel8-zen3
[+] ^parmetis@4.0.3%gcc@12.2.0~gdb~int64~ipo+shared build_system=cmake build_type=RelWithDebInfo generator=make patches=4f89253,50ed208,704b84f arch=linux-rhel8-zen3
[+] ^python@3.10.10%gcc@12.2.0+bz2+crypt+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tkinter+uuid+zlib build_system=generic patches=0d98e93,7d40923,f2fd060 arch=linux-rhel8-zen3
[+] ^bzip2@1.0.6%gcc@12.2.0~debug~pic+shared build_system=generic arch=linux-rhel8-zen3
[+] ^expat@2.5.0%gcc@12.2.0+libbsd build_system=autotools arch=linux-rhel8-zen3
[+] ^libbsd@0.11.7%gcc@12.2.0 build_system=autotools arch=linux-rhel8-zen3
[+] ^libmd@1.0.4%gcc@12.2.0 build_system=autotools arch=linux-rhel8-zen3
[+] ^gdbm@1.23%gcc@12.2.0 build_system=autotools arch=linux-rhel8-zen3
[+] ^gettext@0.19.8.1%gcc@12.2.0+bzip2+curses+git~libunistring+libxml2+tar+xz build_system=autotools patches=9acdb4e arch=linux-rhel8-zen3
[+] ^libffi@3.4.4%gcc@12.2.0 build_system=autotools arch=linux-rhel8-zen3
[+] ^libxcrypt@4.4.33%gcc@12.2.0~obsolete_api build_system=autotools arch=linux-rhel8-zen3
[+] ^perl@5.26.3%gcc@12.2.0+cpanm+open+shared+threads build_system=generic patches=8cf4302 arch=linux-rhel8-zen3
[+] ^ncurses@6.1%gcc@12.2.0~symlinks+termlib abi=none build_system=autotools arch=linux-rhel8-zen3
[+] ^openssl@1.1.1k%gcc@12.2.0~docs~shared build_system=generic certs=mozilla arch=linux-rhel8-zen3
[+] ^readline@8.2%gcc@12.2.0 build_system=autotools patches=bbf97f1 arch=linux-rhel8-zen3
[+] ^sqlite@3.40.1%gcc@12.2.0+column_metadata+dynamic_extensions+fts~functions+rtree build_system=autotools arch=linux-rhel8-zen3
[+] ^util-linux-uuid@2.38.1%gcc@12.2.0 build_system=autotools arch=linux-rhel8-zen3
[+] ^xz@5.2.4%gcc@12.2.0~pic build_system=autotools libs=shared,static arch=linux-rhel8-zen3
[+] ^rocblas@5.4.3%gcc@12.2.0~ipo+tensile amdgpu_target=auto build_system=cmake build_type=Release generator=make patches=81591d9 arch=linux-rhel8-zen3
[+] ^rocsolver@5.4.3%gcc@12.2.0~ipo+optimal amdgpu_target=auto build_system=cmake build_type=Release generator=make patches=8067bfb arch=linux-rhel8-zen3
[+] ^zlib@1.2.13%gcc@12.2.0+optimize+pic+shared build_system=makefile arch=linux-rhel8-zen3
[+] ^superlu-dist@8.1.2%gcc@12.2.0~cuda~int64~ipo~openmp~rocm+shared build_system=cmake build_type=RelWithDebInfo generator=make arch=linux-rhel8-zen3
[+] ^trilinos@13.4.1%gcc@12.2.0~adelus~adios2+amesos+amesos2+anasazi+aztec~basker+belos+boost~chaco~complex~cuda~cuda_rdc~debug~dtk+epetra+epetraext~epetraextbtf~epetraextexperimental~epetraextgraphreorderings~exodus+explicit_template_instantiation~float+fortran~gtest+hdf5+hypre+ifpack+ifpack2~intrepid+intrepid2~ipo~isorropia+kokkos~mesquite~minitensor+ml+mpi+muelu~mumps+nox~openmp~panzer~phalanx~piro~python~rocm~rocm_rdc~rol~rythmos+sacado~scorec+shards+shared~shylu~stk~stokhos+stratimikos~strumpack~suite-sparse~superlu+superlu-dist~teko~tempus+thyra+tpetra~trilinoscouplings~wrapper~x11+zoltan+zoltan2 build_system=cmake build_type=RelWithDebInfo cxxstd=14 generator=make gotype=int arch=linux-rhel8-zen3
[+] ^boost@1.79.0%gcc@12.2.0+atomic+chrono~clanglibcpp~container~context~contract~coroutine+date_time~debug+exception~fiber+filesystem+graph~graph_parallel~icu+iostreams~json+locale+log+math~mpi+multithreaded~nowide~numpy~pic+program_options~python+random+regex+serialization+shared+signals~singlethreaded+stacktrace+system~taggedlayout+test+thread+timer~type_erasure~versionedlayout+wave build_system=generic cxxstd=14 patches=a440f96 visibility=hidden arch=linux-rhel8-zen3
[+] ^zstd@1.5.5%gcc@12.2.0~programs build_system=makefile libs=shared,static arch=linux-rhel8-zen3
[+] ^hwloc@2.8.0%gcc@12.2.0~cairo~cuda~gl~libudev+libxml2~netloc~nvml~oneapi-level-zero~opencl+pci~rocm build_system=autotools libs=shared,static arch=linux-rhel8-zen3
Note that hypre is built with +rocm -- could that be a problem?
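One way to check a spec like the one above for GPU target differences is to pull out each package's amdgpu_target value and list the ones that differ from the target you expect. The following is only a sketch: the spec lines are abbreviated copies of lines from the listing above, the helper names gpu_targets and not_matching are made up for illustration, and a package built with a different target value is not necessarily the cause of the error.

```python
import re

# Abbreviated copies of lines from the spec above (variants trimmed for brevity).
SPEC_LINES = [
    "[+] amrex@22.09%gcc@12.2.0+rocm~shared+sundials amdgpu_target=gfx90a build_system=cmake",
    "[+] ^cmake@3.24.2%gcc@12.2.0~doc+ncurses build_system=generic",
    "[+] ^hip@5.4.3%gcc@12.2.0~cuda~ipo+rocm build_system=cmake",
    "[+] ^rocrand@5.4.3%gcc@12.2.0+hiprand~ipo amdgpu_target=auto build_system=cmake",
    "[+] ^sundials@6.4.1%gcc@12.2.0+rocm+shared+static amdgpu_target=gfx90a build_system=cmake",
]

# Package name is the token right before the first '@'; a leading '^' marks a dependency.
NAME_RE = re.compile(r"\^?(?P<name>[A-Za-z0-9_-]+)@")
TARGET_RE = re.compile(r"\bamdgpu_target=(?P<target>\S+)")


def gpu_targets(spec_lines):
    """Map package name -> amdgpu_target for lines that set one (hypothetical helper)."""
    targets = {}
    for line in spec_lines:
        name = NAME_RE.search(line)
        target = TARGET_RE.search(line)
        if name and target:
            targets[name.group("name")] = target.group("target")
    return targets


def not_matching(targets, expected):
    """Return sorted package names whose amdgpu_target differs from `expected`."""
    return sorted(pkg for pkg, tgt in targets.items() if tgt != expected)


targets = gpu_targets(SPEC_LINES)
print(not_matching(targets, "gfx90a"))
```

With the full output of the spec listing saved to a file, the same functions can be applied to the lines of that file instead of SPEC_LINES.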
Here are the contents of the file Backtrace.0 mentioned in the error message above:
=== If no file names and line numbers are shown below, one can run
addr2line -Cpfie my_exefile my_line_address
to convert `my_line_address` (e.g., 0x4a6b) into file name and line number.
Or one can use amrex/Tools/Backtrace/parse_bt.py.
=== Please note that the line number reported by addr2line may not be accurate.
One can use
readelf -wl my_exefile | grep my_line_address'
to find out the offset for that line.
0: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials/amrex_sundials_advection_diffusion() [0x314b19]
amrex::BLBackTrace::print_backtrace_info(_IO_FILE*) at ??:?
1: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials/amrex_sundials_advection_diffusion() [0x314707]
amrex::BLBackTrace::handler(int) at ??:?
2: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials/amrex_sundials_advection_diffusion() [0x2902cf]
amrex::Gpu::ErrorCheck(char const*, int) at ??:?
3: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials/amrex_sundials_advection_diffusion() [0x2cf532]
amrex::InitRandom(unsigned long, int, unsigned long) at ??:?
4: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials/amrex_sundials_advection_diffusion() [0x29b7ed]
amrex::Initialize(int&, char**&, bool, int, std::function<void ()> const&, std::ostream&, std::ostream&, void (*)(char const*)) at ??:?
5: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials/amrex_sundials_advection_diffusion() [0x288efe]
main at ??:?
6: /lib64/libc.so.6(__libc_start_main+0xe5) [0x15553f6d3d85]
7: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials/amrex_sundials_advection_diffusion() [0x286ade]
_start at ??:?
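Since the backtrace shows no file names or line numbers, the frame addresses can be extracted and turned into the addr2line invocations that the header of Backtrace.0 suggests. This is a rough sketch: the sample frames are copied from the backtrace above, parse_frames and addr2line_commands are hypothetical helper names, and the script only builds the command lines without running them. Frames that belong to shared libraries such as libc are skipped, because their addresses are runtime addresses rather than offsets into the executable.

```python
import re

# Three frame lines copied from the Backtrace.0 contents above.
SAMPLE = """\
0: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials/amrex_sundials_advection_diffusion() [0x314b19]
amrex::BLBackTrace::print_backtrace_info(_IO_FILE*) at ??:?
6: /lib64/libc.so.6(__libc_start_main+0xe5) [0x15553f6d3d85]
7: /dev/shm/dobrev1/xsdk-examples/build/amrex/sundials/amrex_sundials_advection_diffusion() [0x286ade]
_start at ??:?
"""

# "<index>: <object path>(<optional symbol+offset>) [<address>]"
FRAME_RE = re.compile(r"^\s*(\d+):\s+(\S+?)\((.*?)\)\s+\[(0x[0-9a-fA-F]+)\]\s*$")


def parse_frames(text):
    """Return (index, object_path, symbol, address) for each frame line."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m.group(1)), m.group(2), m.group(3), m.group(4)))
    return frames


def addr2line_commands(frames, exe):
    """Build 'addr2line -Cpfie <exe> <addr>' argument lists for frames in `exe` only."""
    return [["addr2line", "-Cpfie", exe, addr]
            for _, path, _, addr in frames if path == exe]


frames = parse_frames(SAMPLE)
exe = frames[0][1]
for cmd in addr2line_commands(frames, exe):
    print(" ".join(cmd))
```

Each printed line can then be pasted into a shell on the machine where the executable was built.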
I also tried a build of xsdk+rocm with (the default) ^hypre~rocm and I see the same issue.
ping: @gardner48, @balos1
Can you please help with the two issues above: (1) the long-running example with a CPU build, and (2) the issue with the HIP build?
Thanks!
I opened a new issue regarding the HIP part since it's a distinct issue. The original topic of this issue is resolved by #49.
Compared to other tests the AMReX + SUNDIALS test takes a very long time:
It would be best if we could get the runtime under 1 minute, or even shorter.
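To keep an eye on the 1 minute goal, the per-test summary lines printed by ctest can be scanned for tests that exceed a time limit. A minimal sketch, assuming summary lines shaped like the one in the HIP log in this thread: the first sample line is copied from that log (its 0.92 sec is the time to the HIP abort, not the long CPU runtime), the second line is a made-up entry for illustration, and parse_ctest and slow_tests are hypothetical helper names. Statuses that contain a space, such as "Not Run", are not handled.

```python
import re

SAMPLE = """\
1/26 Test #1: AMREX-amrex_sundials_advection_diffusion ...***Failed 0.92 sec
2/26 Test #2: HYPOTHETICAL-slow_test .....   Passed  312.40 sec
"""  # the second line is invented for illustration only

# "<i>/<n> Test #<i>: <name> <dots> <status> <seconds> sec"
LINE_RE = re.compile(
    r"^\s*\d+/\d+\s+Test\s+#\d+:\s+(\S+)\s+\.*\s*(\*\*\*\w+|Passed)\s+([\d.]+)\s+sec"
)


def parse_ctest(text):
    """Return (test_name, status, seconds) for each recognized summary line."""
    results = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            results.append((m.group(1), m.group(2).lstrip("*"), float(m.group(3))))
    return results


def slow_tests(results, limit=60.0):
    """Names of tests whose reported time exceeds `limit` seconds."""
    return [name for name, _, seconds in results if seconds > limit]


print(slow_tests(parse_ctest(SAMPLE)))
```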