rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[BUG] Dense PCA fails with CUDA12 #5555

Closed Intron7 closed 2 months ago

Intron7 commented 1 year ago

Describe the bug
Dense PCA fails for larger datasets with the following error:

RuntimeError: cuSOLVER error encountered at: file=/home/sdicks/micromamba/envs/rapids-23.08_12/include/raft/linalg/detail/eig.cuh line=118:

With CUDA 11.8 and RAPIDS 23.08 it works.

Steps/Code to reproduce bug

import cupy as cp
from cuml import PCA

X = cp.random.rand(90000, 5000, dtype=cp.float32)
pca_func = PCA(
    n_components=100, random_state=42, output_type="numpy"
)
X_pca = pca_func.fit_transform(X)

Expected behavior
It should work as it does with cuML 23.08 and CUDA 11.8.

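For context on where this fails: cuML's dense "full" PCA solver forms the covariance matrix and then takes a symmetric eigendecomposition, which is the `eigDC`/syevd step named in the traceback. The following is a rough CPU sketch of that pipeline in NumPy, purely for illustration; it is not cuML's actual code path, and the small random matrix stands in for the 90000x5000 case.

```python
# Rough CPU sketch of what a dense "full" PCA solver does internally:
# form the covariance matrix, then take a symmetric eigendecomposition.
# The eigendecomposition below is the analogue of the cuSOLVER syevd
# call that fails in this issue.
import numpy as np

def pca_via_eigh(X, n_components):
    """Project X onto its top principal components via covariance eigh."""
    Xc = X - X.mean(axis=0)                   # center the data
    cov = (Xc.T @ Xc) / (X.shape[0] - 1)      # (n_features, n_features)
    eig_vals, eig_vecs = np.linalg.eigh(cov)  # symmetric eigendecomposition
    order = np.argsort(eig_vals)[::-1]        # eigh returns ascending order
    components = eig_vecs[:, order[:n_components]]
    return Xc @ components

rng = np.random.default_rng(42)
X = rng.random((1000, 50)).astype(np.float32)  # small stand-in matrix
X_pca = pca_via_eigh(X, n_components=10)
print(X_pca.shape)  # (1000, 10)
```

Note that the covariance matrix is only n_features x n_features, so the size that matters for the syevd call is the feature count (5000 here), not the row count.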

divyegala commented 1 year ago

@Intron7 thanks for raising this issue. Is there a stack trace that you could share?

Intron7 commented 1 year ago

Traceback (most recent call last):
  File "/tmp/ipykernel_5301/2924074572.py", line 2, in <module>
    X_pca = pca_func.fit_transform(X)
  File "/home/severin/conda/envs/rapids-23.08_12/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
    ret = func(*args, **kwargs)
  File "/home/severin/conda/envs/rapids-23.08_12/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
    return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
  File "/home/severin/conda/envs/rapids-23.08_12/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
    return func(*args, **kwargs)
  File "base.pyx", line 665, in cuml.internals.base.UniversalBase.dispatch_func
  File "pca.pyx", line 509, in cuml.decomposition.pca.PCA.fit_transform
  File "/home/severin/conda/envs/rapids-23.08_12/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
    ret = func(*args, **kwargs)
  File "/home/severin/conda/envs/rapids-23.08_12/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
    return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
  File "/home/severin/conda/envs/rapids-23.08_12/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
    return func(*args, **kwargs)
  File "base.pyx", line 665, in cuml.internals.base.UniversalBase.dispatch_func
  File "pca.pyx", line 470, in cuml.decomposition.pca.PCA.fit
RuntimeError: cuSOLVER error encountered at: file=/home/severin/conda/envs/rapids-23.08_12/include/raft/linalg/detail/eig.cuh line=118: 

pca-23.08_cu1.zip

@dantegd I hope that is enough. I can also attach nsys reports for the CUDA 12 and CUDA 11.8 runs.

Intron7 commented 11 months ago

It seems like the bug still persists in rapids-23.10

Intron7 commented 10 months ago

Just checking in: is work still being done to fix this issue? As far as I can tell, the bug still persists in 23.12.

lharri73 commented 10 months ago

Also wondering about this. It must be related to RAPIDS, because PyTorch doesn't have an issue running the same calculation on one GPU.

Intron7 commented 6 months ago

This is still broken, also in the development version of 24.04.

Intron7 commented 5 months ago

@dantegd So I did some more testing on this, because it seems to mainly affect Ampere GPUs. Can this please be fixed?

lharri73 commented 5 months ago

@Intron7 no, this also affects H100s (Hopper).

lowener commented 4 months ago

I submitted PR rapidsai/raft#2332 to fix this issue. It should be resolved in version 24.08.

liyaodev commented 3 months ago

Hi @lowener, I also had a similar problem with CUDA 12.1. Do I need to switch to CUDA 12.0?

Traceback (most recent call last):
  File "/app/r_sc_test.py", line 115, in <module>
    rsc.tl.pca(adata, n_comps=100)
  File "/usr/local/lib/python3.10/site-packages/rapids_singlecell/preprocessing/_pca.py", line 163, in pca
    X_pca = pca_func.fit_transform(X)
  File "/usr/local/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
    ret = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
    return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
    return func(*args, **kwargs)
  File "base.pyx", line 687, in cuml.internals.base.UniversalBase.dispatch_func
  File "pca.pyx", line 507, in cuml.decomposition.pca.PCA.fit_transform
  File "/usr/local/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
    ret = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
    return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
    return func(*args, **kwargs)
  File "base.pyx", line 687, in cuml.internals.base.UniversalBase.dispatch_func
  File "pca.pyx", line 468, in cuml.decomposition.pca.PCA.fit
RuntimeError: cuSOLVER error encountered at: file=/__w/cuml/cuml/python/build/cp310-cp310-linux_x86_64/_deps/raft-src/cpp/include/raft/linalg/detail/eig.cuh line=121: call='cusolverDnxsyevd(cusolverH, dn_params, CUSOLVER_EIG_MODE_VECTOR, CUBLAS_FILL_MODE_UPPER, static_cast<int64_t>(n_rows), eig_vectors, static_cast<int64_t>(n_cols), eig_vals, d_work.data(), workspaceDevice, h_work.data(), workspaceHost, d_dev_info.data(), stream)', Reason=7:CUSOLVER_STATUS_INTERNAL_ERROR
Obtained 42 stack frames
#1 in /usr/local/lib/python3.10/site-packages/cuml/internals/../libcuml++.so: raft::cusolver_error::cusolver_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) +0xbd [0x7f301aaa289d]
#2 in /usr/local/lib/python3.10/site-packages/cuml/internals/../libcuml++.so: void raft::linalg::detail::eigDC<float>(raft::resources const&, float const*, unsigned long, unsigned long, float*, float*, CUstream_st*) +0xe6b [0x7f301b0468fb]
#3 in /usr/local/lib/python3.10/site-packages/cuml/internals/../libcuml++.so: void ML::truncCompExpVars<float, ML::solver>(raft::handle_t const&, float*, float*, float*, float*, ML::paramsTSVDTemplate<ML::solver> const&, CUstream_st*) +0x5de [0x7f301b5f579e]
#4 in /usr/local/lib/python3.10/site-packages/cuml/internals/../libcuml++.so(+0x2cb60c5) [0x7f301b5e90c5]
#5 in /usr/local/lib/python3.10/site-packages/cuml/decomposition/pca.cpython-310-x86_64-linux-gnu.so(+0x3fedb) [0x7f2fe3c8eedb]
#6 in /usr/local/lib/python3.10/site-packages/cuml/internals/base.cpython-310-x86_64-linux-gnu.so(+0x1006e) [0x7f2fe455006e]
#7 in /usr/local/lib/python3.10/site-packages/cuml/internals/base.cpython-310-x86_64-linux-gnu.so(+0x2e7a6) [0x7f2fe456e7a6]
#8 in python: PyVectorcall_Call +0x6c [0x55ee4702169c]
#9 in python: _PyEval_EvalFrameDefault +0x44c6 [0x55ee47008c06]
#10 in python(+0x131d08) [0x55ee470dad08]
#11 in python(+0x2248c1) [0x55ee471cd8c1]
#12 in python: PyVectorcall_Call +0x6c [0x55ee4702169c]
#13 in python: _PyEval_EvalFrameDefault +0x44c6 [0x55ee47008c06]
#14 in python(+0x131d08) [0x55ee470dad08]
#15 in python: PyVectorcall_Call +0x6c [0x55ee4702169c]
#16 in python: _PyEval_EvalFrameDefault +0x44c6 [0x55ee47008c06]
#17 in python(+0x131d08) [0x55ee470dad08]
#18 in /usr/local/lib/python3.10/site-packages/cuml/decomposition/pca.cpython-310-x86_64-linux-gnu.so(+0x305d5) [0x7f2fe3c7f5d5]
#19 in /usr/local/lib/python3.10/site-packages/cuml/internals/base.cpython-310-x86_64-linux-gnu.so(+0x1006e) [0x7f2fe455006e]
#20 in /usr/local/lib/python3.10/site-packages/cuml/internals/base.cpython-310-x86_64-linux-gnu.so(+0x2e7a6) [0x7f2fe456e7a6]
#21 in python: PyVectorcall_Call +0x6c [0x55ee4702169c]
#22 in python: _PyEval_EvalFrameDefault +0x44c6 [0x55ee47008c06]
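Until a fix lands, one generic mitigation for this kind of solver failure is to catch the RuntimeError and retry with an alternative solver (in cuML terms, retrying PCA with `svd_solver="jacobi"`, which avoids the syevd path; that workaround is my assumption, not something confirmed in this thread). The retry pattern is sketched below in plain Python with stand-in solver functions so it runs without a GPU; `full_solver`, `jacobi_solver`, and `fit_with_fallback` are hypothetical names.

```python
# Generic retry pattern: try the default (syevd-based) solver first and fall
# back to an alternative if it raises. The solver functions here are
# stand-ins, not real cuML calls, so the sketch runs on any machine.
def fit_with_fallback(fit_fns, X):
    """Try each (name, fit_fn) in order; return the first result that succeeds."""
    last_err = None
    for name, fit_fn in fit_fns:
        try:
            return name, fit_fn(X)
        except RuntimeError as err:
            last_err = err  # e.g. "cuSOLVER error ... CUSOLVER_STATUS_INTERNAL_ERROR"
    raise RuntimeError("all solvers failed") from last_err

# Stand-in for the failing full (syevd) solver.
def full_solver(X):
    raise RuntimeError("cuSOLVER error: CUSOLVER_STATUS_INTERNAL_ERROR")

# Stand-in for the fallback solver; returns a placeholder "projection".
def jacobi_solver(X):
    return [[sum(row)] for row in X]

name, result = fit_with_fallback(
    [("full", full_solver), ("jacobi", jacobi_solver)], [[1, 2], [3, 4]]
)
print(name, result)  # jacobi [[3], [7]]
```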

lowener commented 2 months ago

Closing this issue now that rapidsai/raft#2332 has been merged. It is resolved in version 24.10.

Intron7 commented 2 months ago

This already works with rapids-24.08.