kokkos / kokkos-kernels

Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels

openmp.sparse_spgemm_{jacobi_}double_int_int_TestExecSpace hangs W/ GCC10+ARMPL21 & c++17 w/ blas TPL enabled #1542

Open e10harvey opened 2 years ago

e10harvey commented 2 years ago

Reproducer that causes hang:

module purge
module load cmake/3.17.0 gcc/10.2.0 armpl/21.1.0
export OMP_NUM_THREADS=47

$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-devices=OpenMP --arch=A64FX --compiler=/opt/spatse/gcc/2020-09-17/spack/opt/spack/linux-rhel8-a64fx/gcc-8.2.1/gcc-10.2.0-f73mwr3ryd77o37a5jyofxet6nk7xowg/bin/g++ --cxxflags="-O3 -Wall -Wunused-parameter -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wignored-qualifiers -Wempty-body -Wclobbered -Wuninitialized " --cxxstandard="17" --ldflags=""   --kokkos-path=$KOKKOS_PATH --kokkoskernels-path=$KOKKOSKERNELS_PATH --with-scalars='double,complex_double' --with-ordinals=int --with-offsets=int,size_t --with-layouts=LayoutLeft --with-tpls=armpl    --with-options= --with-cuda-options=   --no-examples 

./sparse/unit_test/KokkosKernels_sparse_openmp
<snip>
[ RUN      ] openmp.sparse_spadd_unsorted_input_kokkos_complex_double_int_size_t_TestExecSpace
[       OK ] openmp.sparse_spadd_unsorted_input_kokkos_complex_double_int_size_t_TestExecSpace (7 ms)
[ RUN      ] openmp.sparse_spgemm_double_int_int_TestExecSpace

Note that the spgemm tests run fine when isolated from other tests:

$ ./sparse/unit_test/KokkosKernels_sparse_openmp --gtest_filter=*spgemm**
Kokkos::OpenMP::initialize WARNING: OMP_PROC_BIND environment variable not set
  In general, for best performance with OpenMP 4.0 or better set OMP_PROC_BIND=spread and OMP_PLACES=threads
  For best performance with OpenMP 3.1 set OMP_PROC_BIND=true
  For unit testing set OMP_PROC_BIND=false

Note: Google Test filter = *spgemm**
[==========] Running 12 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 12 tests from openmp
[ RUN      ] openmp.sparse_spgemm_jacobi_double_int_int_TestExecSpace
[       OK ] openmp.sparse_spgemm_jacobi_double_int_int_TestExecSpace (44 ms)
[ RUN      ] openmp.sparse_spgemm_jacobi_double_int_size_t_TestExecSpace
[       OK ] openmp.sparse_spgemm_jacobi_double_int_size_t_TestExecSpace (46 ms)
[ RUN      ] openmp.sparse_spgemm_jacobi_kokkos_complex_double_int_int_TestExecSpace
[       OK ] openmp.sparse_spgemm_jacobi_kokkos_complex_double_int_int_TestExecSpace (34 ms)
[ RUN      ] openmp.sparse_spgemm_jacobi_kokkos_complex_double_int_size_t_TestExecSpace
[       OK ] openmp.sparse_spgemm_jacobi_kokkos_complex_double_int_size_t_TestExecSpace (33 ms)
[ RUN      ] openmp.sparse_spgemm_double_int_int_TestExecSpace
[       OK ] openmp.sparse_spgemm_double_int_int_TestExecSpace (979 ms)
[ RUN      ] openmp.sparse_spgemm_double_int_size_t_TestExecSpace
[       OK ] openmp.sparse_spgemm_double_int_size_t_TestExecSpace (973 ms)
[ RUN      ] openmp.sparse_spgemm_kokkos_complex_double_int_int_TestExecSpace
[       OK ] openmp.sparse_spgemm_kokkos_complex_double_int_int_TestExecSpace (1191 ms)
[ RUN      ] openmp.sparse_spgemm_kokkos_complex_double_int_size_t_TestExecSpace
[       OK ] openmp.sparse_spgemm_kokkos_complex_double_int_size_t_TestExecSpace (1204 ms)
[ RUN      ] openmp.sparse_block_spgemm_double_int_int_TestExecSpace
[       OK ] openmp.sparse_block_spgemm_double_int_int_TestExecSpace (2819 ms)
[ RUN      ] openmp.sparse_block_spgemm_double_int_size_t_TestExecSpace
[       OK ] openmp.sparse_block_spgemm_double_int_size_t_TestExecSpace (2797 ms)
[ RUN      ] openmp.sparse_block_spgemm_kokkos_complex_double_int_int_TestExecSpace
[       OK ] openmp.sparse_block_spgemm_kokkos_complex_double_int_int_TestExecSpace (7104 ms)
[ RUN      ] openmp.sparse_block_spgemm_kokkos_complex_double_int_size_t_TestExecSpace
[       OK ] openmp.sparse_block_spgemm_kokkos_complex_double_int_size_t_TestExecSpace (8884 ms)
[----------] 12 tests from openmp (26108 ms total)

[----------] Global test environment tear-down
[==========] 12 tests from 1 test case ran. (26108 ms total)
[  PASSED  ] 12 tests.
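
Since the spgemm tests pass on their own but hang in the full run, one way to narrow it down is to run growing subsets of the suite under a time limit. The sketch below is only an illustration: the helper name `run_limited`, the `TEST_BIN` variable, and the 300-second limit are assumptions, not part of the original report. It relies on GNU coreutils `timeout` and on the colon-separated pattern syntax of `--gtest_filter`. Note that Google Test still runs the selected tests in its own order, not the order given in the filter.

```shell
#!/usr/bin/env bash
# Sketch: run a gtest subset under a time limit so a hang shows up as a
# distinct result instead of blocking the terminal.
# Assumes GNU coreutils `timeout` is installed.

# Hypothetical helper: prints PASS, FAIL, or HANG for the given command.
run_limited() {
  local limit="$1"; shift
  timeout "$limit" "$@" >/dev/null 2>&1
  local rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "HANG"   # timeout(1) exits with status 124 when the limit is reached
  elif [ "$rc" -eq 0 ]; then
    echo "PASS"
  else
    echo "FAIL"
  fi
}

# Binary path taken from the report; override TEST_BIN if it lives elsewhere.
TEST_BIN="${TEST_BIN:-./sparse/unit_test/KokkosKernels_sparse_openmp}"

if [ -x "$TEST_BIN" ]; then
  # spgemm alone (passes in the report), then spadd together with spgemm.
  for filter in '*spgemm*' '*spadd*:*spgemm*'; do
    echo "$filter: $(run_limited 300 "$TEST_BIN" --gtest_filter="$filter")"
  done
fi
```

A HANG result only means the limit was exceeded, so the limit should sit well above the roughly 26 s the isolated spgemm run took here.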

brian-kelley commented 1 year ago

@e10harvey I'm thinking this is related to #1777, because:

But I wasn't able to replicate this issue with the develop branch yesterday (after re-enabling the SpGEMM tests that were disabled for ArmPL builds).

I tried going back to a version from Sep. 21, but I get undefined references to HostBlas functions. Do you remember which KokkosKernels version you originally saw this on?

e10harvey commented 1 year ago

@brian-kelley: IIRC, this was found with the source branch of https://github.com/kokkos/kokkos-kernels/pull/1511. I believe you can reproduce this with Kokkos@4cd0fc44a693128d77e80cd8167485bf063a4f34 and KokkosKernels@f1b968116db450c22b0be1eed0988b357441901c.
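
For anyone trying to reproduce, pinning both repositories to those commits might look like the sketch below. The helper name `checkout_at`, the `FETCH` switch, and the target directory names are assumptions made for illustration. The clone URLs are the upstream repositories; if the KokkosKernels commit exists only on the PR's source branch, the checkout will fail and that branch would have to be fetched separately.

```shell
#!/usr/bin/env bash
# Commit hashes quoted in the comment above.
KOKKOS_SHA=4cd0fc44a693128d77e80cd8167485bf063a4f34
KOKKOSKERNELS_SHA=f1b968116db450c22b0be1eed0988b357441901c

# Hypothetical helper: clone a repository and check out a single commit.
checkout_at() {
  local url="$1" dir="$2" sha="$3"
  git clone "$url" "$dir" && git -C "$dir" checkout --detach "$sha"
}

# Set FETCH=1 to actually clone; by default the script only defines the
# variables and the helper, so it is safe to source.
if [ "${FETCH:-0}" = "1" ]; then
  checkout_at https://github.com/kokkos/kokkos.git kokkos "$KOKKOS_SHA"
  checkout_at https://github.com/kokkos/kokkos-kernels.git kokkos-kernels "$KOKKOSKERNELS_SHA"
  # These two variables are the ones the reproducer command expects.
  export KOKKOS_PATH="$PWD/kokkos"
  export KOKKOSKERNELS_PATH="$PWD/kokkos-kernels"
fi
```

With `KOKKOS_PATH` and `KOKKOSKERNELS_PATH` exported this way, the `cm_generate_makefile.bash` command from the original report can be run unchanged.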