Closed Sachitt closed 2 months ago
Can you please provide the compiler used?
Hi, the compiler used is g++
@Sachitt The matrix you're squaring has almost a billion nonzero entries, and by default KokkosKernels uses int (32-bit) to represent the row offsets in sparse matrices. So it's likely that the NNZ count of the C matrix, printed here as 382465606, has overflowed, and C's entries/values are therefore allocated with the wrong size.
I'm not sure if the C result will fit in memory, but you can at least try
-DKokkosKernels_INST_OFFSET_SIZE_T=ON
-DKokkosKernels_INST_OFFSET_INT=OFF
to use size_t (64 bit) to represent row offsets and nonzero counts. If you do run out of memory you should get an informative message and not just a segfault.
@brian-kelley is correct: for uk-2005 the correct number of nonzeros is 8972400198, which overflows a 32-bit integer.
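The two numbers in this thread are consistent with a 32-bit wraparound: the true count reduced modulo 2^32 is exactly the C SIZE printed in the verbose log. A minimal shell sketch of that check (the variable names are illustrative, and it assumes a shell with 64-bit integer arithmetic, such as bash):

```shell
#!/bin/bash
# True nonzero count of C = A*A for uk-2005, as reported in this thread.
TRUE_NNZ=8972400198
# Largest value a signed 32-bit int can hold.
INT32_MAX=2147483647
# A 32-bit offset keeps only the low 32 bits, i.e. the value modulo 2^32.
WRAPPED=$(( TRUE_NNZ % 4294967296 ))

if [ "$TRUE_NNZ" -gt "$INT32_MAX" ]; then
  echo "true nnz $TRUE_NNZ exceeds INT32_MAX ($INT32_MAX)"
fi
# Prints 382465606, the same value as "C SIZE:382465606" in the log.
echo "value seen through a 32-bit offset: $WRAPPED"
```

Because the wrapped value happens to be positive, nothing looks obviously wrong in the log, which is part of why the failure only shows up later as a segfault.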
This code determines nnz, but it uses the row map value type, which is 32-bit by default: https://github.com/kokkos/kokkos-kernels/blob/b2210058826672c8de838541a36f7b946ecbb79a/sparse/impl/KokkosSparse_spgemm_impl_symbolic.hpp#L1954-L1955
This is causing some downstream effects where the resulting allocation for the product matrix isn't large enough, leading to an out of bounds access. I think we can't even catch this case without adding an allocation or expensive runtime checks in one way or another.
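To get a feel for how far off the allocation is, here is a rough estimate of the storage for C's entries and values. It is only a sketch: it assumes 8-byte values and 4-byte column indices (the `sizeof(scalar_t): 8` and `sizeof(nnz_lno_t): 4` lines in the verbose log), and it ignores the row map and any temporary workspace.

```shell
#!/bin/bash
# Per-nonzero storage, taken from the verbose log:
# sizeof(scalar_t) = 8 (value) plus sizeof(nnz_lno_t) = 4 (column index).
BYTES_PER_NNZ=$(( 8 + 4 ))
WRAPPED_NNZ=382465606     # "C SIZE" printed after the 32-bit overflow
TRUE_NNZ=8972400198       # correct count reported in this thread
GIB=1073741824            # bytes per GiB

ALLOCATED_BYTES=$(( WRAPPED_NNZ * BYTES_PER_NNZ ))
REQUIRED_BYTES=$(( TRUE_NNZ * BYTES_PER_NNZ ))

# Integer division rounds down, so the GiB figures are approximate.
echo "allocated: $ALLOCATED_BYTES bytes (about $(( ALLOCATED_BYTES / GIB )) GiB)"
echo "required:  $REQUIRED_BYTES bytes (about $(( REQUIRED_BYTES / GIB )) GiB)"
```

Under these assumptions the numeric phase writes into buffers that are a small fraction of the size it needs, and even with 64-bit offsets the product needs on the order of 100 GiB for entries and values alone.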
This seems to work with the following CMake options:
-DKokkosKernels_INST_OFFSET_SIZE_T=ON
-DKokkosKernels_INST_OFFSET_INT=OFF
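As a sketch, the KokkosKernels configure step could be rerun with these two options added. The directory layout mirrors the reproducer in this thread; the separate build directory name `build-kernels-offset64` is hypothetical, chosen here only to avoid reusing a build tree configured with 32-bit offsets.

```shell
#!/bin/bash
set -eu

# Paths are assumptions that mirror the reproducer's layout; adjust as needed.
ROOT="${ROOT:-${HOME:-.}/proj/kk-issue-2291}"
KOKKOS_INSTALL="$ROOT/install-kokkos"
KERNELS_SRC="$ROOT/kernels"
KERNELS_BUILD="$ROOT/build-kernels-offset64"   # hypothetical build directory

# The two options from this thread: instantiate size_t offsets, drop int offsets.
OFFSET_FLAGS="-DKokkosKernels_INST_OFFSET_SIZE_T=ON -DKokkosKernels_INST_OFFSET_INT=OFF"

if [ -d "$KERNELS_SRC" ]; then
  # $OFFSET_FLAGS is intentionally unquoted so it splits into two arguments.
  cmake -S "$KERNELS_SRC" -B "$KERNELS_BUILD" \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DKokkos_ROOT="$KOKKOS_INSTALL" \
    -DKokkosKernels_ENABLE_PERFTESTS=ON \
    $OFFSET_FLAGS
else
  echo "skipping configure: $KERNELS_SRC not found"
fi
```

After configuring, the `sparse_spgemm` target would be rebuilt and rerun as in the reproducer.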
For posterity, here's a reproducer:
#! /bin/bash
set -eou pipefail
ROOT=$HOME/proj/kk-issue-2291
KOKKOS_SRC=$ROOT/kokkos
KOKKOS_BUILD=$ROOT/build-kokkos
KOKKOS_INSTALL=$ROOT/install-kokkos
KERNELS_SRC=$ROOT/kernels
KERNELS_BUILD=$ROOT/build-kernels
mkdir -p "$ROOT"
git clone https://github.com/kokkos/kokkos.git $KOKKOS_SRC || true
git clone git@github.com:cwpearson/kokkos-kernels.git $KERNELS_SRC || true
if [ ! -d $KOKKOS_INSTALL ]; then
  cmake -S $KOKKOS_SRC -B $KOKKOS_BUILD \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DKokkos_ENABLE_OPENMP=ON \
    -DCMAKE_INSTALL_PREFIX=$KOKKOS_INSTALL
  nice -n20 cmake --build $KOKKOS_BUILD --target install --parallel 52
fi
cmake -S $KERNELS_SRC -B $KERNELS_BUILD \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DKokkos_ROOT=$KOKKOS_INSTALL \
  -DKokkosKernels_ENABLE_TESTS=ON \
  -DKokkosKernels_ENABLE_BENCHMARK=ON \
  -DKokkosKernels_ENABLE_PERFTESTS=ON
nice -n20 cmake --build $KERNELS_BUILD/perf_test/sparse --target sparse_spgemm --parallel 52
if [ ! -d uk-2005 ]; then
  wget --continue https://suitesparse-collection-website.herokuapp.com/MM/LAW/uk-2005.tar.gz
  tar -xvf uk-2005.tar.gz
fi
$KERNELS_BUILD/perf_test/sparse/sparse_spgemm --amtx uk-2005/uk-2005.mtx --algorithm KKMEM --verbose --openmp 48
Hi,
I followed the build instructions for kokkos-kernels with OpenMP support and perf tests enabled, and ran ./sparse_spgemm in kokkos-kernels/build/perf_test/sparse. For smaller datasets, such as cit-Patents, ca-GrQc, and roadnet-CA, this ran fine. However, when I ran it on the uk-2005 dataset, I got a segfault during the numeric phase (the symbolic phase completes fine).
For each dataset, I am computing A*A. I am running on aarch64 with OpenMP enabled.
Please let me know if you have any ideas about why this could be happening.
Here is the output with the --verbose flag:
Running on OpenMP backend.
B is not provided or is the same as A. Multiplying AxA.
m:39459925 n:39459925 k:39459925
SYMBOLIC PHASE
Original Max Row Flops:348499
Original overall_flops Flops:71150593524
tOriginal Max Row Flop Calc Time:1.7172
COMPRESS MATRIX-B PHASE
n:39459925 nnz:936364282 vector_size:1 team_size:1 chunk_size::16 shmem:16128
Compression Allocations:1.28e-07
COMPRESS -- thread_memory:16128 unit_memory:16 initial key size:1006
COMPRESS -- adjusted hashsize:512 shmem_key_size:1170
POOL chunksize:26810 num_chunks:18 min_hash_size:8192 max_row_nnz:5213
Pool Alloc MB:1.8409
Compression Count Kernel:0.865291
Compressed Max Row Flops:50700
Compressed Overall Row Flops:6963612531
Compressed Flops ratio:0.0978715 min_reduction:0.85
Compressed Max Row Flop Calc Time:1.70562
COMPRESS MATRIX-B overall time:3.74296
C SIZE:382465606
Numeric PHASE
HASH MODE
initial PortableNumericCHASH -- thread_memory:16120 unit_memory:20 initial key size:805
initial PortableNumericCHASH -- team_memory:16128 unit_memory:20 initial team key size:805
Running SPGEMM_KK_MEMORY col_size:39459925 max_column_cut_off:250000
PortableNumericCHASH -- adjusted hashsize:512 thread_shmem_key_size:878
PortableNumericCHASH -- adjusted team hashsize:512 team_shmem_key_size:878
max_nnz: 128941 chunk_size:653229 min_hash_size:262144 concurrency:18 MyExecSpace().concurrency():18 numchunks:18
num_chunks:32 chunk_size:653229 overall_size:20903328 modular_num_chunks:31
Printing chunk_locks view
Printing data view
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ... ... ... -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
Pool Alloc Time:0.0244646 Pool Size(MB):44.8537
PortableNumericCHASH -- sizeof(scalar_t): 8 sizeof(nnz_lno_t): 4 suggested_team_size: 1
PortableNumericCHASH -- thread_memory:16128 unit_memory:20 initial key size:805
PortableNumericCHASH -- team shared_memory:16128 unit_memory:20 initial team key size:805
PortableNumericCHASH -- thread_memory:16128 unit_memory:20 resized key size:878
PortableNumericCHASH -- team shared_memory:16128 unit_memory:20 resized team key size:878
PortableNumericCHASH -- thread_memory:16128 unit_memory:20 initial key size:878
PortableNumericCHASH -- team_memory:16128 unit_memory:20 initial team key size:878
PortableNumericCHASH -- adjusted hashsize:512 thread_shmem_key_size:878
PortableNumericCHASH -- adjusted team hashsize:512 team_shmem_key_size:878
team_cuckoo_key_size:1024 team_cuckoo_hash_func:1023 max_first_level_hash_size:512 pow2_hash_size:262144 pow2_hash_func:262143
vector_size:1 chunk_size:16 suggested_team_size:1
Segmentation fault (core dumped)