Open albert-cwkuo opened 1 year ago
@albert-cwkuo, I'm running into the same error. Did you find a solution in the end?
@RijndertAriese do you have details of how you ran into this issue?
@dantegd I tried data with 5M rows and it reproduces the same error. Any updates?
Tested on CUDA 12.0, Ubuntu 20.04 amd64, RTX 3090 GPU, driver version 525.60.11, with cuML installed via pip.
The traceback is below:
File "/data/cuml-test/find_optimum_outlier_param.py", line 307, in <module>
main_run(args.dataset_name, args.collection_text_filepath, args.collection_embedding_filepath, args.remove_outlier, args.normalize, args.target_ratios)
File "/data/cuml-test/find_optimum_outlier_param.py", line 214, in main_run
outlier_label = dbscan_outlier(normalized_target_docids_embs, eps=eps, min_samples=5)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/cuml-test/find_optimum_outlier_param.py", line 122, in dbscan_outlier
outlier_label = dbscan.fit_predict(data, np.int64)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/cuml-test/venv/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
ret = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/data/cuml-test/venv/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/cuml-test/venv/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "base.pyx", line 687, in cuml.internals.base.UniversalBase.dispatch_func
File "dbscan.pyx", line 466, in cuml.cluster.dbscan.DBSCAN.fit_predict
File "/data/cuml-test/venv/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
ret = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/data/cuml-test/venv/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/cuml-test/venv/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "base.pyx", line 687, in cuml.internals.base.UniversalBase.dispatch_func
File "dbscan.pyx", line 442, in cuml.cluster.dbscan.DBSCAN.fit
File "/data/cuml-test/venv/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
ret = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "dbscan.pyx", line 353, in cuml.cluster.dbscan.DBSCAN._fit
RuntimeError: CUDA error encountered at: file=/__w/cuml/cuml/python/cuml/build/cp312-cp312-linux_x86_64/_deps/raft-src/cpp/include/raft/spatial/knn/detail/epsilon_neighborhood.cuh line=197: call='cudaGetLastError()', Reason=cudaErrorInvalidConfiguration:invalid configuration argument
Obtained 30 stack frames
#1 in /data/cuml-test/venv/lib/python3.12/site-packages/cuml/internals/../libcuml++.so: raft::cuda_error::cuda_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) +0x5a [0x7f5d582ef58a]
#2 in /data/cuml-test/venv/lib/python3.12/site-packages/cuml/internals/../libcuml++.so: void raft::spatial::knn::detail::epsUnexpL2SqNeighImpl<float, long, 4>(bool*, long*, float const*, float const*, long, long, long, float, CUstream_st*) +0x3f1 [0x7f5d584e0ac1]
#3 in /data/cuml-test/venv/lib/python3.12/site-packages/cuml/internals/../libcuml++.so: void ML::Dbscan::VertexDeg::Algo::launcher<float, long>(raft::handle_t const&, ML::Dbscan::VertexDeg::Pack<float, long>, long, long, CUstream_st*, raft::distance::DistanceType) +0x591 [0x7f5d585f3511]
#4 in /data/cuml-test/venv/lib/python3.12/site-packages/cuml/internals/../libcuml++.so: void ML::Dbscan::VertexDeg::run<float, long>(raft::handle_t const&, raft::neighbors::ball_cover::BallCoverIndex<long, float, long, long>*, long*, rmm::device_uvector<long>*, long, bool*, long*, float*, float const*, float const*, float, long, long, int, long, long, CUstream_st*, raft::distance::DistanceType) +0x2bd [0x7f5d585f492d]
#5 in /data/cuml-test/venv/lib/python3.12/site-packages/cuml/internals/../libcuml++.so(+0xa82b92) [0x7f5d58757b92]
#6 in /data/cuml-test/venv/lib/python3.12/site-packages/cuml/internals/../libcuml++.so: void ML::Dbscan::dbscanFitImpl<float, long, false>(raft::handle_t const&, float*, long, long, float, long, raft::distance::DistanceType, long*, long*, float*, unsigned long, ML::Dbscan::EpsNnMethod, CUstream_st*, int) +0x1804 [0x7f5d5875bf24]
#7 in /data/cuml-test/venv/lib/python3.12/site-packages/cuml/cluster/dbscan.cpython-312-x86_64-linux-gnu.so(+0x2f5ae) [0x7f5cafcd15ae]
#8 in /root/.pyenv/versions/3.12.4/lib/libpython3.12.so.1.0: _PyEval_EvalFrameDefault +0x919 [0x7f5e54c13999]
#9 in /data/cuml-test/venv/lib/python3.12/site-packages/cuml/cluster/dbscan.cpython-312-x86_64-linux-gnu.so(+0x3806e) [0x7f5cafcda06e]
#10 in /data/cuml-test/venv/lib/python3.12/site-packages/cuml/internals/base.cpython-312-x86_64-linux-gnu.so(+0x1043e) [0x7f5cc002043e]
#11 in /data/cuml-test/venv/lib/python3.12/site-packages/cuml/internals/base.cpython-312-x86_64-linux-gnu.so(+0x217a2) [0x7f5cc00317a2]
#12 in /root/.pyenv/versions/3.12.4/lib/libpython3.12.so.1.0: _PyEval_EvalFrameDefault +0x919 [0x7f5e54c13999]
#13 in /root/.pyenv/versions/3.12.4/lib/libpython3.12.so.1.0(+0x176df3) [0x7f5e54c7ddf3]
#14 in /root/.pyenv/versions/3.12.4/lib/libpython3.12.so.1.0: _PyEval_EvalFrameDefault +0x919 [0x7f5e54c13999]
#15 in /data/cuml-test/venv/lib/python3.12/site-packages/cuml/cluster/dbscan.cpython-312-x86_64-linux-gnu.so(+0x37a32) [0x7f5cafcd9a32]
#16 in /data/cuml-test/venv/lib/python3.12/site-packages/cuml/internals/base.cpython-312-x86_64-linux-gnu.so(+0x1043e) [0x7f5cc002043e]
#17 in /data/cuml-test/venv/lib/python3.12/site-packages/cuml/internals/base.cpython-312-x86_64-linux-gnu.so(+0x217a2) [0x7f5cc00317a2]
#18 in /root/.pyenv/versions/3.12.4/lib/libpython3.12.so.1.0: _PyEval_EvalFrameDefault +0x919 [0x7f5e54c13999]
#19 in /root/.pyenv/versions/3.12.4/lib/libpython3.12.so.1.0(+0x176df3) [0x7f5e54c7ddf3]
#20 in /root/.pyenv/versions/3.12.4/lib/libpython3.12.so.1.0: _PyEval_EvalFrameDefault +0x919 [0x7f5e54c13999]
#21 in /root/.pyenv/versions/3.12.4/lib/libpython3.12.so.1.0: PyEval_EvalCode +0x217 [0x7f5e54d909f7]
#22 in /root/.pyenv/versions/3.12.4/lib/libpython3.12.so.1.0(+0x2e4bf6) [0x7f5e54debbf6]
#23 in /root/.pyenv/versions/3.12.4/lib/libpython3.12.so.1.0(+0x2e4d05) [0x7f5e54debd05]
#24 in /root/.pyenv/versions/3.12.4/lib/libpython3.12.so.1.0: _PyRun_SimpleFileObject +0x17b [0x7f5e54deebab]
#25 in /root/.pyenv/versions/3.12.4/lib/libpython3.12.so.1.0: _PyRun_AnyFileObject +0x3f [0x7f5e54def12f]
#26 in /root/.pyenv/versions/3.12.4/lib/libpython3.12.so.1.0(+0x30ef39) [0x7f5e54e15f39]
#27 in /root/.pyenv/versions/3.12.4/lib/libpython3.12.so.1.0: Py_RunMain +0x2a [0x7f5e54e162ea]
#28 in /root/.pyenv/versions/3.12.4/lib/libpython3.12.so.1.0: Py_BytesMain +0x5e [0x7f5e54e164be]
#29 in /lib/x86_64-linux-gnu/libc.so.6: __libc_start_main +0xf3 [0x7f5e5492a083]
#30 in python: _start +0x2e [0x56349198409e]
I also tested with a conda install on another machine and got the same error:
Traceback (most recent call last):
File "/data/cuml-test/find_optimum_outlier_param.py", line 307, in <module>
main_run(args.dataset_name, args.collection_text_filepath, args.collection_embedding_filepath, args.remove_outlier, args.normalize, args.target_ratios)
File "/data/cuml-test/find_optimum_outlier_param.py", line 214, in main_run
outlier_label = dbscan_outlier(normalized_target_docids_embs, eps=eps, min_samples=5)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/cuml-test/find_optimum_outlier_param.py", line 122, in dbscan_outlier
outlier_label = dbscan.fit_predict(data, np.int64)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/rapids-24.10/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
ret = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/rapids-24.10/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/rapids-24.10/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "base.pyx", line 687, in cuml.internals.base.UniversalBase.dispatch_func
File "dbscan.pyx", line 466, in cuml.cluster.dbscan.DBSCAN.fit_predict
File "/root/miniconda3/envs/rapids-24.10/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
ret = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/rapids-24.10/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/rapids-24.10/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "base.pyx", line 687, in cuml.internals.base.UniversalBase.dispatch_func
File "dbscan.pyx", line 442, in cuml.cluster.dbscan.DBSCAN.fit
File "/root/miniconda3/envs/rapids-24.10/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
ret = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "dbscan.pyx", line 353, in cuml.cluster.dbscan.DBSCAN._fit
RuntimeError: CUDA error encountered at: file=/root/miniconda3/envs/rapids-24.10/include/raft/spatial/knn/detail/epsilon_neighborhood.cuh line=197:
This machine has CUDA 12.2, Ubuntu 20.04 amd64, an RTX TITAN GPU, driver version 535.183.01, with cuML installed via conda.
Thanks for updating this thread.
As a temporary workaround, would you be able to try HDBSCAN instead of DBSCAN? This PyData conference talk on HDBSCAN makes a compelling case for using it vs. DBSCAN -- and it hopefully shouldn't run into this issue.
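For anyone who wants to try the swap, here is a minimal sketch. The helper name `hdbscan_outlier_labels` and its `estimator_cls` parameter are hypothetical (they are only there so the function can be exercised without a GPU), and `min_cluster_size=5` is an assumed value, not something from this thread. Note that HDBSCAN has no direct equivalent of DBSCAN's `eps`, so results will not match one-to-one; noise points are still labelled -1.

```python
def hdbscan_outlier_labels(data, min_samples=5, min_cluster_size=5,
                           estimator_cls=None):
    """Cluster `data` with HDBSCAN and return per-row labels (-1 = noise).

    `estimator_cls` is a hypothetical hook for swapping in another
    estimator; by default cuML's HDBSCAN is imported lazily, so a GPU
    and a cuML install are only needed when the default is used.
    """
    if estimator_cls is None:
        from cuml.cluster import HDBSCAN  # needs a GPU and cuML
        estimator_cls = HDBSCAN

    # Assumed parameter values; tune these for your own data.
    model = estimator_cls(min_cluster_size=min_cluster_size,
                          min_samples=min_samples)
    return model.fit_predict(data)
```

In the script from the traceback above, this would stand in for the `dbscan.fit_predict(data, np.int64)` call inside `dbscan_outlier`; whether it avoids the crash on 5M rows is something to verify on your own data.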
Describe the bug
When DBSCAN.fit_predict is given data x with a large number of rows, it crashes instantly with the following error:

Steps/Code to reproduce bug
Here's the code snippet to reproduce the bug with x having 5M rows

Expected behavior
The last line
labels = dbscan.fit_predict(x)
crashes immediately.

Environment details (please complete the following information):

Additional context
This issue seems related to this solved issue: https://github.com/rapidsai/cuml/issues/1753
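One possible reading of `cudaErrorInvalidConfiguration` in the tracebacks above is that a kernel launch asked for a grid larger than CUDA allows. The sketch below is an illustration of that idea only, not the RAFT implementation: the tile size of 64, the batch size of 10,000, and the mapping of the row count onto the grid's y axis are all assumptions. The limits themselves (x up to 2^31 - 1, y and z up to 65535) are the documented CUDA launch limits for compute capability 3.0 and later.

```python
# Documented CUDA grid-dimension limits (compute capability >= 3.0).
MAX_GRID_X = 2**31 - 1
MAX_GRID_Y = 65535


def ceildiv(a, b):
    """Integer ceiling division, as typically used to size a CUDA grid."""
    return -(-a // b)


def grid_dims(m, n, tile_m, tile_n):
    """Grid needed to cover an m x n pairwise problem with tile_m x tile_n tiles."""
    return (ceildiv(m, tile_m), ceildiv(n, tile_n))


def launch_is_valid(grid):
    """True if the 2D grid fits within CUDA's per-dimension limits."""
    gx, gy = grid
    return 1 <= gx <= MAX_GRID_X and 1 <= gy <= MAX_GRID_Y


# Hypothetical numbers: a batch of 10,000 rows against all 5M rows,
# with an assumed 64 x 64 tile.
grid = grid_dims(10_000, 5_000_000, 64, 64)
# grid[1] is 78125 blocks along y, above the 65535 limit.
too_large = not launch_is_valid(grid)
```

Under these assumed numbers, anything above 65535 * 64 = 4,194,240 rows would exceed the y limit, which would be consistent with 5M rows failing while smaller inputs work. The actual tile sizes and axis layout in `epsilon_neighborhood.cuh` would need to be checked to confirm this.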