@lowener, are you only seeing this on 64-bit data? Just from the looks of the cuda-memcheck message, and because the density is only 1%, I'm thinking the 12-byte alignment (4 bytes for int, 8 for double) might be an issue here, or the calculation of the shared memory to allocate might be off (e.g., it might be making a bad assumption that double precision means int64 and float64).
I can't recreate this on 32-bit data, so it's probably only on 64-bit data
Describe the bug
While testing pairwise distances, I ran into an error caused by an invalid write (according to cuda-memcheck) in raft::sparse::distance::classic_csr_semiring_spmv_smem_kernel.
Steps/Code to reproduce bug
The bug can be reproduced with a small Python snippet (see the sketch below) on my cuML branch that exposes the pairwise distance API:
https://github.com/lowener/cuml/tree/019-expose-spmv
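
For reference, a minimal sketch of what that reproduction could look like, assuming the branch accepts a SciPy CSR matrix directly through cuml.metrics.pairwise_distances; the entry point, matrix shape, and metric used here are illustrative, not the exact code from the branch:

```python
# Minimal reproduction sketch. Assumes the 019-expose-spmv branch accepts CSR
# input through cuml.metrics.pairwise_distances; the exact entry point and the
# metric that routes to the semiring SPMV kernel may differ on that branch.
import numpy as np
from scipy.sparse import random as sparse_random

from cuml.metrics import pairwise_distances

# 64-bit values at ~1% density, matching the failing case described above
X = sparse_random(1000, 10000, density=0.01, format="csr", dtype=np.float64)

# cuda-memcheck reports the invalid write inside
# raft::sparse::distance::classic_csr_semiring_spmv_smem_kernel
D = pairwise_distances(X, metric="l1")
```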
Environment details (please complete the following information):
docker run --gpus all rapidsai/rapidsai-dev-nightly:0.19-cuda11.0-devel-ubuntu20.04-py3.8
Additional context
Core dump message:
And here's the output of cuda-memcheck: pairwisedist_cudamemcheck.txt