Open ndellingwood opened 6 years ago
Looks like this test began failing by v2.7.00. Including some debugging info as reference to start chasing this issue down:
From cuda-gdb, this is failing in the cusparseXcsrgemmNnz call at this line:
kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_cuSPARSE_impl.hpp:114
With this runtime message
CUDA Exception: Warp Out-of-range Address
Program received signal CUDA_EXCEPTION_5, Warp Out-of-range Address.
[Switching focus to CUDA kernel 1, grid 75, block (29,0,0), thread (0,0,0), device 0, sm 0, warp 7, lane 0]
0x0000000008d81a18 in void csrgemmNnz_kernel2<128, 32, 2, 4>(csrgemmNnz_params)<<<(2500,1,1),(128,1,1)>>> ()
From debug step-through it appears that the cuSparseHandle stuff is properly set.
Is it possible to run cuda-memcheck on the failing test? Apparently that exception is a catch-all for out-of-bounds memory accesses.
I'm pretty sure the issue is that cusparse can't handle the matrix that is being generated for the spgemm test.
Note: The spgemm test is testing A*A using two copies of a single generated sparse matrix.
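Since the test multiplies A by itself, a host-side reference for the expected structure of A*A is cheap to build and gives something to compare the cusparseXcsrgemmNnz result against. The following is only a minimal sketch, not code from kokkos-kernels: the `CsrPattern` struct and `spgemm_row_nnz` function are hypothetical names, and it assumes zero-based CSR and counts structural nonzeros only (no numerical cancellation).

```cpp
#include <cassert>
#include <vector>

// Hypothetical zero-based CSR sparsity pattern (values omitted on purpose:
// only the structure matters for the symbolic phase).
struct CsrPattern {
  int num_rows;
  int num_cols;
  std::vector<int> row_ptr;  // size num_rows + 1
  std::vector<int> col_idx;  // size row_ptr.back()
};

// Count the structural nonzeros in each row of C = A * B on the host.
std::vector<int> spgemm_row_nnz(const CsrPattern& a, const CsrPattern& b) {
  assert(a.num_cols == b.num_rows);
  std::vector<int> row_nnz(a.num_rows, 0);
  // marker[j] == i means column j was already counted for row i of C.
  std::vector<int> marker(b.num_cols, -1);
  for (int i = 0; i < a.num_rows; ++i) {
    for (int p = a.row_ptr[i]; p < a.row_ptr[i + 1]; ++p) {
      const int k = a.col_idx[p];  // A(i,k) is nonzero, so row k of B contributes
      for (int q = b.row_ptr[k]; q < b.row_ptr[k + 1]; ++q) {
        const int j = b.col_idx[q];
        if (marker[j] != i) {
          marker[j] = i;
          ++row_nnz[i];
        }
      }
    }
  }
  return row_nnz;
}
```

Summing the returned vector gives the total nonzero count that the symbolic phase of any backend should report for the same input.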
Here's a list of things I've tried before reporting this:
1. Checked for any changes in the cusparse API related to the spgemm usage here - no changes. Fixed a couple of macro bugs, though they had no impact on the code.
2. Copied relevant parts of the unit test to create a standalone test outside of kokkos-kernels. Same error.
3. From the standalone test, I removed all kokkos-kernels calls except the sparse matrix generator, removed all the kokkos-kernels handle stuff, calls to spgemm_symbolic etc., and replaced them with Cuda and CUSPARSE code. Same error.
4. I output the sparse matrix to matrix market format, and wrote raw Cuda code that read the matrix and tested it. Same error.
5. I tested the output matrix in Octave to see if it could handle the spgemm. No problems reading it or multiplying to get the result A*A.
6. I tested the code posted on this link by Robert Crovella as a sanity check, unaltered - it passed.
7. I took the code from (6) and replaced the internally generated matrices with the output matrix from (4) in case I made an error in my own code. Same error.
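For anyone repeating the Matrix Market round trip described above, keep in mind that the coordinate format stores one-based row and column indices, while cuSPARSE descriptors default to zero-based indexing. The following is a minimal host-side sketch of that conversion, not the code used in this thread: `CsrMatrix` and `read_matrix_market_general` are hypothetical names, and it assumes a `coordinate real general` file (no symmetric or pattern variants).

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical zero-based CSR container for this sketch.
struct CsrMatrix {
  int num_rows = 0;
  int num_cols = 0;
  std::vector<int> row_ptr;
  std::vector<int> col_idx;
  std::vector<double> values;
};

// Read a "coordinate real general" Matrix Market stream into zero-based CSR.
// Column indices within a row keep the file order (they are not sorted).
CsrMatrix read_matrix_market_general(std::istream& in) {
  std::string line;
  bool have_size = false;
  // Skip the banner and comment lines, which all start with '%'.
  while (std::getline(in, line)) {
    if (!line.empty() && line[0] != '%') { have_size = true; break; }
  }
  if (!have_size) throw std::runtime_error("missing size line");

  std::istringstream size_line(line);
  int rows = 0, cols = 0, nnz = 0;
  if (!(size_line >> rows >> cols >> nnz) || rows < 0 || cols < 0 || nnz < 0)
    throw std::runtime_error("bad size line");

  CsrMatrix m;
  m.num_rows = rows;
  m.num_cols = cols;
  m.row_ptr.assign(rows + 1, 0);
  std::vector<int> r(nnz), c(nnz);
  std::vector<double> v(nnz);
  for (int e = 0; e < nnz; ++e) {
    if (!(in >> r[e] >> c[e] >> v[e])) throw std::runtime_error("truncated entry list");
    if (r[e] < 1 || r[e] > rows || c[e] < 1 || c[e] > cols)
      throw std::runtime_error("index out of range");
    --r[e];  // one-based (file) to zero-based (CSR)
    --c[e];
    ++m.row_ptr[r[e] + 1];  // count entries per row
  }
  // Prefix sum turns per-row counts into row offsets.
  for (int i = 0; i < rows; ++i) m.row_ptr[i + 1] += m.row_ptr[i];

  m.col_idx.resize(nnz);
  m.values.resize(nnz);
  std::vector<int> next(m.row_ptr.begin(), m.row_ptr.end() - 1);
  for (int e = 0; e < nnz; ++e) {
    const int pos = next[r[e]]++;
    m.col_idx[pos] = c[e];
    m.values[pos] = v[e];
  }
  return m;
}
```

The range check on each entry is deliberate: it rules out a malformed input file before the matrix ever reaches the GPU.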
@ndellingwood : Thanks for checking this thoroughly. Did the generated matrix change in any way between 2.6 and 2.7 in our code? A git log on that file might help. Also, can you attach the matrices and the sample codes you used? I am going to forward this to NVIDIA.
One thing you can do is check with CUDA 10 on kokkos-dev. Do module use /home/projects/modulefiles to load the module. That should have a new CuSparse version. Check your raw CUDA code.
I'll test these things when I get a chance to log in to White and kokkos-dev, thanks for the feedback!
@ndellingwood : Just a ping that we might want to test this on 10.1
I just tested cusparse spgemm on 10.1. It is still failing with the same error. @srajama1 @ndellingwood @brian-kelley @lucbv
I also tried running it with a smaller matrix of size 100 x 100. Scalar=double failed with the same error without producing an output matrix, whereas Scalar=float failed after producing an output matrix. I dumped this output matrix (of cuSPARSE spgemm) to a file and compared it against the output matrices of SPGEMM_KK_MEMORY, SPGEMM_DEBUG, and Octave:
1) The number of nonzeros in the cuSPARSE version is wrong (it is 1262 but should be 1260).
2) Some of the column indices are out of range.
3) Some entries match in their column indices but have the wrong value; some entries match in both column indices and values.
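A structural check of this kind can be automated on the host once the output matrix has been copied back. The following is a rough sketch under the assumption of zero-based CSR; `CsrCheck` and `check_csr` are hypothetical names, not part of kokkos-kernels or cuSPARSE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result of a structural sanity check on a zero-based CSR matrix.
struct CsrCheck {
  bool row_ptr_ok;        // starts at 0, non-decreasing, ends at col_idx.size()
  int out_of_range_cols;  // entries whose column index is outside [0, num_cols)
};

CsrCheck check_csr(int num_rows, int num_cols,
                   const std::vector<int>& row_ptr,
                   const std::vector<int>& col_idx) {
  CsrCheck result{true, 0};

  // The offsets array must have num_rows + 1 entries and bracket col_idx exactly.
  if (num_rows < 0 ||
      row_ptr.size() != static_cast<std::size_t>(num_rows) + 1 ||
      row_ptr.front() != 0 ||
      static_cast<std::size_t>(row_ptr.back()) != col_idx.size()) {
    result.row_ptr_ok = false;
  }
  // Offsets must never decrease from one row to the next.
  for (std::size_t i = 0; i + 1 < row_ptr.size(); ++i) {
    if (row_ptr[i] > row_ptr[i + 1]) result.row_ptr_ok = false;
  }
  // Count column indices that fall outside the matrix.
  for (int c : col_idx) {
    if (c < 0 || c >= num_cols) ++result.out_of_range_cols;
  }
  return result;
}
```

For a 100 x 100 result, any column index of 100 or more would be counted here, which separates a corrupted structure from a mere ordering difference between backends.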
@seheracer Could this be a one-based vs zero-based indexing error?
Hmm I don't think that's it, because cusparseCreateMatDescr will use CUSPARSE_INDEX_BASE_ZERO by default.
When I tested this previously the issue seemed to come from CuSparse having issues with some sparse matrices that were generated for the test. I dumped the generated matrix to file and ran with a standalone CuSparse code (independent from Kokkos and KokkosKernels) and it died in that case as well. I tested the same matrix with either Octave or Matlab and had no issues. If I can find the standalone code I used I'll share.
@ndellingwood I was debating trying to do a direct call to the Cuda kernel but if you already tried that then it's probably not worth it?
@brian-kelley There are entries with column indices in {101, 127, 128} in the first two rows of the output matrix. I checked the other rows as well, and couldn't see such out-of-range entries in the rest of the matrix. Yet, mismatches in the column indices and values still occur in the rest of the matrix.
I think we shouldn't spend more time on this until we have the Cuda 11 version.
@seheracer But we'll still have to support Cuda 8-10.
The test
KokkosKernels_UnitTest_Cuda --gtest_filter=cuda.sparse_spgemm_double_int_int_TestExecSpace
fails at runtime with the following output:
The first issue to address is updating the macro name used in the unit test so that it matches the one defined when the cusparse tpl is enabled, see e.g. PR . Following that, the test still fails with this output:
First reported in trilinos/Trilinos#3438