rapidsai / raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
https://docs.rapids.ai/api/raft/stable/
Apache License 2.0

[BUG] Invalid write in raft::sparse::distance::classic_csr_semiring_spmv_smem_kernel #182

Closed lowener closed 3 years ago

lowener commented 3 years ago

Describe the bug

While testing pairwise distances, I ran into an error caused by an invalid write (according to cuda-memcheck) in raft::sparse::distance::classic_csr_semiring_spmv_smem_kernel.

Steps/Code to reproduce bug

A short Python script that reproduces the problem on my cuml branch, which implements the pairwise distance API: https://github.com/lowener/cuml/tree/019-expose-spmv

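import cupy as cp
import cupyx.scipy.sparse  # import the sparse submodule explicitly
from cuml.metrics import pairwise_distances as pd

# 20 x 10000 sparse float64 matrix with 1% density
X = cupyx.scipy.sparse.random(20, 10000, dtype=cp.float64, random_state=123, density=0.01)
# Computing L1 pairwise distances triggers the invalid write
pd(X, metric='l1')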

Environment details (please complete the following information):

Additional context

Core dump message:

terminate called after throwing an instance of 'raft::cuda_error'
  what():  CUDA error encountered at: file=/rapids/cuml/cpp/build/raft/src/raft/cpp/include/raft/handle.hpp line=270: call='cudaEventDestroy(event_)', Reason=cudaErrorMisalignedAddress:misaligned address
Obtained 28 stack frames
#0 in /opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/raft/common/handle.cpython-38-x86_64-linux-gnu.so(_ZN4raft9exception18collect_call_stackEv+0x46) [0x7f52e024cf96]
#1 in /opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/raft/common/handle.cpython-38-x86_64-linux-gnu.so(_ZN4raft10cuda_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x69) [0x7f52e024d6b9]
#2 in /opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/raft/common/handle.cpython-38-x86_64-linux-gnu.so(_ZN4raft8handle_t17destroy_resourcesEv+0x5dd) [0x7f52e024e30d]
#3 in /opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/raft/common/handle.cpython-38-x86_64-linux-gnu.so(_ZN4raft8handle_tD1Ev+0x30) [0x7f52e024e4d0]
#4 in /opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/raft/common/handle.cpython-38-x86_64-linux-gnu.so(+0x2bf69) [0x7f52e0247f69]
#5 in /opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/metrics/pairwise_distances.cpython-38-x86_64-linux-gnu.so(+0x31b98) [0x7f52ac052b98]
#6 in python(PyObject_Call+0x255) [0x55fde3dd22b5]
#7 in python(_PyEval_EvalFrameDefault+0x21c1) [0x55fde3e7ede1]
#8 in python(_PyEval_EvalCodeWithName+0x2c3) [0x55fde3e5d503]
#9 in python(_PyFunction_FastCallDict+0x1b2) [0x55fde3d792e8]
#10 in /opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/metrics/pairwise_distances.cpython-38-x86_64-linux-gnu.so(+0x2500c) [0x7f52ac04600c]
#11 in /opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/metrics/pairwise_distances.cpython-38-x86_64-linux-gnu.so(+0x280f0) [0x7f52ac0490f0]
#12 in python(PyObject_Call+0x255) [0x55fde3dd22b5]
#13 in python(_PyEval_EvalFrameDefault+0x21c1) [0x55fde3e7ede1]
#14 in python(_PyEval_EvalCodeWithName+0x2c3) [0x55fde3e5d503]
#15 in python(_PyFunction_Vectorcall+0x378) [0x55fde3e5e8d8]
#16 in python(_PyEval_EvalFrameDefault+0x1782) [0x55fde3e7e3a2]
#17 in python(_PyEval_EvalCodeWithName+0x2c3) [0x55fde3e5d503]
#18 in python(PyEval_EvalCodeEx+0x39) [0x55fde3e5e559]
#19 in python(PyEval_EvalCode+0x1b) [0x55fde3f019ab]
#20 in python(+0x254a43) [0x55fde3f01a43]
#21 in python(+0x26e6b3) [0x55fde3f1b6b3]
#22 in python(+0x2735b2) [0x55fde3f205b2]
#23 in python(PyRun_SimpleFileExFlags+0x1b2) [0x55fde3f20792]
#24 in python(Py_RunMain+0x36d) [0x55fde3f20d0d]
#25 in python(Py_BytesMain+0x39) [0x55fde3f20ec9]
#26 in /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f53b7b610b3]
#27 in python(+0x1e9369) [0x55fde3e96369]

Aborted (core dumped)

And here's the cuda-memcheck output: pairwisedist_cudamemcheck.txt

cjnolet commented 3 years ago

@lowener, are you only seeing this on 64-bit data? Judging from the cuda-memcheck message, and because the density is only 1%, I suspect the 12-byte alignment (4 bytes for int, 8 for double) might be the issue here, or the calculation of how much shared memory to allocate might be off (e.g. it might wrongly assume that double precision means both int64 and float64).
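To make the alignment concern concrete, here is a minimal sketch of the arithmetic, assuming a hypothetical shared-memory layout in which an array of 4-byte int keys is immediately followed by an array of 8-byte double values. The helper names and the layout are illustrative only and are not taken from the RAFT kernel. With an odd number of keys, the values start at an offset that is not a multiple of 8, which is the kind of situation that can surface as a misaligned address on the GPU.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def value_offset(n_keys, key_bytes=4, val_bytes=8, pad=False):
    """Byte offset of a value array placed right after n_keys keys.

    Hypothetical layout: [keys ... | values ...] in one buffer.
    With pad=True the offset is rounded up to the value alignment.
    """
    offset = n_keys * key_bytes
    return align_up(offset, val_bytes) if pad else offset


if __name__ == "__main__":
    for n_keys in (4, 5):
        raw = value_offset(n_keys)
        padded = value_offset(n_keys, pad=True)
        # raw % 8 != 0 means an 8-byte double would be misaligned
        print(n_keys, raw, raw % 8 == 0, padded)
```

With 32-bit values (val_bytes=4) every offset in this layout is already a multiple of 4, so the same arithmetic cannot misalign the values. That would be consistent with a bug that only shows up on 64-bit data, although it does not prove that this is the cause.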

lowener commented 3 years ago

I can't reproduce this on 32-bit data, so it probably only affects 64-bit data.