rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[BUG] PCA doesn't work with sparse matrices #5475

Closed Intron7 closed 1 year ago

Intron7 commented 1 year ago

Describe the bug

The PCA function doesn't work with sparse matrices.

Steps/Code to reproduce bug

import cupy as cp 
from cuml.decomposition import PCA
from cupyx.scipy import sparse
from cuml.testing.utils import array_equal

X = sparse.random(100000,5000,density=0.05,format="csr",dtype=cp.float32, random_state=0)

pca_func = PCA(
    n_components=50, random_state=0, output_type="cupy"
)
X_pca = pca_func.fit_transform(X)

X_dense = X.toarray()
pca_func_dense = PCA(
    n_components=50, random_state=0, output_type="cupy"
)
X_pca_dense = pca_func_dense.fit_transform(X_dense)

assert array_equal(X_pca_dense, X_pca, 1e-3, with_sign=False)

Output

Traceback (most recent call last):
  File "cupy_backends/cuda/libs/cusparse.pyx", line 1544, in cupy_backends.cuda.libs.cusparse.check_status
  File "cupy_backends/cuda/libs/cusparse.pyx", line 1534, in cupy_backends.cuda.libs.cusparse.CuSparseError.__init__
KeyError: 11
Exception ignored in: 'cupy_backends.cuda.libs.cusparse.spGEMM_workEstimation'
Traceback (most recent call last):
  File "cupy_backends/cuda/libs/cusparse.pyx", line 1544, in cupy_backends.cuda.libs.cusparse.check_status
  File "cupy_backends/cuda/libs/cusparse.pyx", line 1534, in cupy_backends.cuda.libs.cusparse.CuSparseError.__init__
KeyError: 11
Traceback (most recent call last):
  File "/sc-scratch/sc-scratch-rosen-neurogenetics/git/rapids_singlecell_tester/notebooks/pca_sparse.py", line 19, in <module>
    assert array_equal(X_pca_dense, X_pca, 1e-3, with_sign=False)
AssertionError

Expected behavior

PCA should run on a sparse matrix without the error and give the same result as on the dense matrix.

Environment details (please complete the following information):


 - Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]
 - Linux Distro/Architecture: [Ubuntu 20.04 amd64, Rocky Linux 8.5 amd 64]
 - GPU Model/Driver: [A100 driver 525.78.01 & A100 470.161.03]
 - CUDA: [12, 11.4]
 - Method of cuDF & cuML install: [conda]
   - name: rapids-23.06
channels:
  - nvidia
  - rapidsai
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - aiohttp=3.8.4=py310h2372a71_1
  - aiosignal=1.3.1=pyhd8ed1ab_0
  - anyio=3.7.0=pyhd8ed1ab_1
  - aom=3.5.0=h27087fc_0
  - appdirs=1.4.4=pyh9f0ad1d_0
  - argon2-cffi=21.3.0=pyhd8ed1ab_0
  - argon2-cffi-bindings=21.2.0=py310h5764c6d_3
  - asttokens=2.2.1=pyhd8ed1ab_0
  - async-timeout=4.0.2=pyhd8ed1ab_0
  - attrs=23.1.0=pyh71513ae_1
  - aws-c-auth=0.6.28=hccec9ca_5
  - aws-c-cal=0.5.27=hf85dbcb_0
  - aws-c-common=0.8.20=hd590300_0
  - aws-c-compression=0.2.17=h4b87b72_0
  - aws-c-event-stream=0.3.0=hc5de78f_6
  - aws-c-http=0.7.8=h412fb1b_4
  - aws-c-io=0.13.26=h0d05201_0
  - aws-c-mqtt=0.8.13=ha5d9b87_2
  - aws-c-s3=0.3.4=h95e21fb_5
  - aws-c-sdkutils=0.1.10=h4b87b72_0
  - aws-checksums=0.1.16=h4b87b72_0
  - aws-crt-cpp=0.20.2=h5289e1f_9
  - aws-sdk-cpp=1.10.57=h8101662_14
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=pyhd8ed1ab_3
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - beautifulsoup4=4.12.2=pyha770c72_0
  - bleach=6.0.0=pyhd8ed1ab_0
  - blosc=1.21.4=h0f2a231_0
  - bokeh=2.4.3=pyhd8ed1ab_3
  - boost-cpp=1.78.0=h5adbc97_2
  - branca=0.6.0=pyhd8ed1ab_0
  - brotli=1.0.9=h166bdaf_8
  - brotli-bin=1.0.9=h166bdaf_8
  - brotlipy=0.7.0=py310h5764c6d_1005
  - brunsli=0.1=h9c3ff4c_0
  - bzip2=1.0.8=h7f98852_4
  - c-ares=1.19.1=hd590300_0
  - c-blosc2=2.9.2=hb4ffafa_0
  - ca-certificates=2023.5.7=hbcca054_0
  - cachetools=5.3.0=pyhd8ed1ab_0
  - cairo=1.16.0=ha61ee94_1014
  - certifi=2023.5.7=pyhd8ed1ab_0
  - cffi=1.15.1=py310h255011f_3
  - cfitsio=4.2.0=hd9d235c_0
  - charls=2.4.2=h59595ed_0
  - charset-normalizer=3.1.0=pyhd8ed1ab_0
  - click=8.1.3=unix_pyhd8ed1ab_2
  - click-plugins=1.1.1=py_0
  - cligj=0.7.2=pyhd8ed1ab_1
  - cloudpickle=2.2.1=pyhd8ed1ab_0
  - colorama=0.4.6=pyhd8ed1ab_0
  - colorcet=3.0.1=pyhd8ed1ab_0
  - comm=0.1.3=pyhd8ed1ab_0
  - contourpy=1.1.0=py310hd41b1e2_0
  - cryptography=41.0.1=py310h75e40e8_0
  - cubinlinker=0.3.0=py310hfdf336d_0
  - cucim=23.06.00=cuda11_py310_230607_gfdc657b_0
  - cuda-profiler-api=11.8.86=0
  - cuda-python=11.8.2=py310h01a121a_0
  - cuda-version=11.8=h70ddcb2_2
  - cudatoolkit=11.8.0=h37601d7_11
  - cudf=23.06.00=cuda11_py310_230607_gf881d40c63_0
  - cudf_kafka=23.06.00=py310_230607_gf881d40c63_0
  - cugraph=23.06.02=cuda11_py310_230613_gdb9d3c12_0
  - cuml=23.06.00=cuda11_py310_230607_ga381e03f2_0
  - cupy=12.0.0=py310h9216885_4
  - curl=8.1.2=h409715c_0
  - cusignal=23.06.00=py39_230607_g22c7120_0
  - cuspatial=23.06.00=py310_230607_g7b3284af_0
  - custreamz=23.06.00=py310_230607_gf881d40c63_0
  - cuxfilter=23.06.00=py310_230607_g862c7d1_0
  - cycler=0.11.0=pyhd8ed1ab_0
  - cyrus-sasl=2.1.27=h9033bb2_6
  - cytoolz=0.12.0=py310h5764c6d_1
  - dask=2023.3.2=pyhd8ed1ab_0
  - dask-core=2023.3.2=pyhd8ed1ab_0
  - dask-cuda=23.06.00=py310_230607_gfd3ab2d_0
  - dask-cudf=23.06.00=cuda11_py310_230607_gf881d40c63_0
  - datashader=0.15.0=pyhd8ed1ab_0
  - datashape=0.5.4=py_1
  - dav1d=1.2.1=hd590300_0
  - debugpy=1.6.7=py310heca2aa9_0
  - decorator=5.1.1=pyhd8ed1ab_0
  - defusedxml=0.7.1=pyhd8ed1ab_0
  - distributed=2023.3.2.1=pyhd8ed1ab_0
  - dlpack=0.5=h9c3ff4c_0
  - entrypoints=0.4=pyhd8ed1ab_0
  - exceptiongroup=1.1.1=pyhd8ed1ab_0
  - executing=1.2.0=pyhd8ed1ab_0
  - expat=2.5.0=hcb278e6_1
  - fastrlock=0.8=py310hd8f1fbe_3
  - fiona=1.9.1=py310ha325b7b_0
  - flit-core=3.9.0=pyhd8ed1ab_0
  - fmt=9.1.0=h924138e_0
  - folium=0.14.0=pyhd8ed1ab_0
  - font-ttf-dejavu-sans-mono=2.37=hab24e00_0
  - font-ttf-inconsolata=3.000=h77eed37_0
  - font-ttf-source-code-pro=2.038=h77eed37_0
  - font-ttf-ubuntu=0.83=hab24e00_0
  - fontconfig=2.14.2=h14ed4e7_0
  - fonts-conda-ecosystem=1=0
  - fonts-conda-forge=1=0
  - fonttools=4.40.0=py310h2372a71_0
  - freetype=2.12.1=hca18f0e_1
  - freexl=1.0.6=h166bdaf_1
  - frozenlist=1.3.3=py310h5764c6d_0
  - fsspec=2023.6.0=pyh1a96a4e_0
  - gdal=3.6.2=py310hc1b7723_3
  - gdk-pixbuf=2.42.10=h05c8ddd_0
  - geopandas=0.13.2=pyhd8ed1ab_1
  - geopandas-base=0.13.2=pyha770c72_1
  - geos=3.11.1=h27087fc_0
  - geotiff=1.7.1=h7157cca_5
  - gettext=0.21.1=h27087fc_0
  - gflags=2.2.2=he1b5a44_1004
  - giflib=5.2.1=h0b41bf4_3
  - glib=2.76.3=hfc55251_0
  - glib-tools=2.76.3=hfc55251_0
  - glog=0.6.0=h6f12383_0
  - gmock=1.13.0=ha770c72_1
  - gtest=1.13.0=h00ab1b0_1
  - hdf4=4.2.15=h9772cbc_5
  - hdf5=1.12.2=nompi_h4df4325_101
  - holoviews=1.15.4=pyhd8ed1ab_0
  - icu=70.1=h27087fc_0
  - idna=3.4=pyhd8ed1ab_0
  - imagecodecs=2023.1.23=py310ha3ed6a1_0
  - imageio=2.31.1=pyh24c5eb1_0
  - importlib-metadata=6.7.0=pyha770c72_0
  - importlib_metadata=6.7.0=hd8ed1ab_0
  - importlib_resources=5.12.0=pyhd8ed1ab_0
  - ipykernel=6.23.2=pyh210e3f2_0
  - ipython=8.14.0=pyh41d4057_0
  - ipywidgets=8.0.6=pyhd8ed1ab_0
  - jbig=2.1=h7f98852_2003
  - jedi=0.18.2=pyhd8ed1ab_0
  - jinja2=3.1.2=pyhd8ed1ab_1
  - joblib=1.2.0=pyhd8ed1ab_0
  - jpeg=9e=h0b41bf4_3
  - json-c=0.16=hc379101_0
  - jsonschema=4.17.3=pyhd8ed1ab_0
  - jupyter-server-proxy=4.0.0=pyhd8ed1ab_0
  - jupyter_client=8.2.0=pyhd8ed1ab_0
  - jupyter_core=5.3.1=py310hff52083_0
  - jupyter_events=0.6.3=pyhd8ed1ab_0
  - jupyter_server=2.6.0=pyhd8ed1ab_0
  - jupyter_server_terminals=0.4.4=pyhd8ed1ab_1
  - jupyterlab_pygments=0.2.2=pyhd8ed1ab_0
  - jupyterlab_widgets=3.0.7=pyhd8ed1ab_1
  - jxrlib=1.1=h7f98852_2
  - kealib=1.5.0=ha7026e8_0
  - keyutils=1.6.1=h166bdaf_0
  - kiwisolver=1.4.4=py310hbf28c38_1
  - krb5=1.20.1=h81ceb04_0
  - lazy_loader=0.2=pyhd8ed1ab_0
  - lcms2=2.15=hfd0df8a_0
  - ld_impl_linux-64=2.40=h41732ed_0
  - lerc=4.0.0=h27087fc_0
  - libabseil=20230125.2=cxx17_h59595ed_2
  - libaec=1.0.6=hcb278e6_1
  - libarrow=11.0.0=hc00ebf5_25_cpu
  - libavif=0.11.1=h8182462_2
  - libblas=3.9.0=17_linux64_openblas
  - libbrotlicommon=1.0.9=h166bdaf_8
  - libbrotlidec=1.0.9=h166bdaf_8
  - libbrotlienc=1.0.9=h166bdaf_8
  - libcblas=3.9.0=17_linux64_openblas
  - libcrc32c=1.1.2=h9c3ff4c_0
  - libcublas=11.11.3.6=0
  - libcublas-dev=11.11.3.6=0
  - libcucim=23.06.00=cuda11_230607_gfdc657b_0
  - libcudf=23.06.00=cuda11_230607_gf881d40c63_0
  - libcudf_kafka=23.06.00=230607_gf881d40c63_0
  - libcufft=10.9.0.58=0
  - libcufile=1.4.0.31=0
  - libcufile-dev=1.4.0.31=0
  - libcugraph=23.06.02=cuda11_230613_gdb9d3c12_0
  - libcugraph_etl=23.06.02=cuda11_230613_gdb9d3c12_0
  - libcugraphops=23.06.00=cuda11_230607_g77d012ac_0
  - libcuml=23.06.00=cuda11_230607_ga381e03f2_0
  - libcumlprims=23.06.00=cuda11_230607_g7081940_0
  - libcurand=10.3.0.86=0
  - libcurand-dev=10.3.0.86=0
  - libcurl=8.1.2=h409715c_0
  - libcusolver=11.4.1.48=0
  - libcusolver-dev=11.4.1.48=0
  - libcusparse=11.7.5.86=0
  - libcusparse-dev=11.7.5.86=0
  - libcuspatial=23.06.00=cuda11_230607_g7b3284af_0
  - libdeflate=1.17=h0b41bf4_0
  - libedit=3.1.20191231=he28a2e2_2
  - libev=4.33=h516909a_1
  - libevent=2.1.12=hf998b51_1
  - libexpat=2.5.0=hcb278e6_1
  - libffi=3.4.2=h7f98852_5
  - libgcc-ng=13.1.0=he5830b7_0
  - libgdal=3.6.2=h10cbb15_3
  - libgfortran-ng=13.1.0=h69a702a_0
  - libgfortran5=13.1.0=h15d22d2_0
  - libglib=2.76.3=hebfc3b9_0
  - libgomp=13.1.0=he5830b7_0
  - libgoogle-cloud=2.11.0=hac9eb74_1
  - libgrpc=1.54.2=hb20ce57_2
  - libiconv=1.17=h166bdaf_0
  - libkml=1.3.0=h37653c0_1015
  - libkvikio=23.06.00=cuda11_230607_gd3b823c_0
  - liblapack=3.9.0=17_linux64_openblas
  - libllvm14=14.0.6=hcd5def8_3
  - libnetcdf=4.8.1=nompi_h261ec11_106
  - libnghttp2=1.52.0=h61bc06f_0
  - libnsl=2.0.0=h7f98852_0
  - libntlm=1.4=h7f98852_1002
  - libnuma=2.0.16=h0b41bf4_1
  - libopenblas=0.3.23=pthreads_h80387f5_0
  - libpng=1.6.39=h753d276_0
  - libpq=15.2=hb675445_0
  - libprotobuf=3.21.12=h3eb15da_0
  - libraft=23.06.01=cuda11_230612_g9147c907_0
  - libraft-headers=23.06.01=cuda11_230612_g9147c907_0
  - libraft-headers-only=23.06.01=cuda11_230612_g9147c907_0
  - librdkafka=1.9.2=ha5a0de0_2
  - librmm=23.06.00=cuda11_230607_gacaf3f5e_0
  - librttopo=1.1.0=ha49c73b_12
  - libsodium=1.0.18=h36c2ea0_1
  - libspatialindex=1.9.3=h9c3ff4c_4
  - libspatialite=5.0.1=h7c8129e_22
  - libsqlite=3.42.0=h2797004_0
  - libssh2=1.11.0=h0841786_0
  - libstdcxx-ng=13.1.0=hfd8a6a1_0
  - libthrift=0.18.1=h8fd135c_2
  - libtiff=4.5.0=h6adf6a1_2
  - libutf8proc=2.8.0=h166bdaf_0
  - libuuid=2.38.1=h0b41bf4_0
  - libuv=1.44.2=h166bdaf_0
  - libwebp=1.2.4=h1daa5a0_1
  - libwebp-base=1.2.4=h166bdaf_0
  - libxcb=1.13=h7f98852_1004
  - libxgboost=1.7.5dev.rapidsai23.06=cuda11_0
  - libxml2=2.10.3=hca2bb57_4
  - libzip=1.9.2=hc929e4a_1
  - libzlib=1.2.13=hd590300_5
  - libzopfli=1.0.3=h9c3ff4c_0
  - llvmlite=0.40.0=py310h1b8f574_0
  - locket=1.0.0=pyhd8ed1ab_0
  - lz4=4.3.2=py310h0cfdcf0_0
  - lz4-c=1.9.4=hcb278e6_0
  - mapclassify=2.5.0=pyhd8ed1ab_1
  - markdown=3.4.3=pyhd8ed1ab_0
  - markupsafe=2.1.3=py310h2372a71_0
  - matplotlib-base=3.7.1=py310he60537e_0
  - matplotlib-inline=0.1.6=pyhd8ed1ab_0
  - mistune=3.0.0=pyhd8ed1ab_0
  - msgpack-python=1.0.5=py310hdf3cbec_0
  - multidict=6.0.4=py310h1fa729e_0
  - multipledispatch=0.6.0=py_0
  - munch=3.0.0=pyhd8ed1ab_0
  - munkres=1.1.4=pyh9f0ad1d_0
  - nbclient=0.8.0=pyhd8ed1ab_0
  - nbconvert-core=7.6.0=pyhd8ed1ab_0
  - nbformat=5.9.0=pyhd8ed1ab_0
  - nccl=2.18.3.1=h12f7317_0
  - ncurses=6.4=hcb278e6_0
  - nest-asyncio=1.5.6=pyhd8ed1ab_0
  - networkx=3.1=pyhd8ed1ab_0
  - nodejs=18.15.0=h8d033a5_0
  - nspr=4.35=h27087fc_0
  - nss=3.89=he45b914_0
  - numba=0.57.0=py310h0f6aa51_2
  - numpy=1.24.3=py310ha4c1d20_0
  - nvtx=0.2.5=py310h1fa729e_0
  - openjpeg=2.5.0=hfec8fc6_2
  - openslide=3.4.1=h7773abc_6
  - openssl=3.1.1=hd590300_1
  - orc=1.8.4=h2f23424_0
  - overrides=7.3.1=pyhd8ed1ab_0
  - packaging=23.1=pyhd8ed1ab_0
  - pandas=1.5.3=py310h9b08913_1
  - pandocfilters=1.5.0=pyhd8ed1ab_0
  - panel=0.14.1=pyhd8ed1ab_0
  - param=1.13.0=pyh1a96a4e_0
  - parso=0.8.3=pyhd8ed1ab_0
  - partd=1.4.0=pyhd8ed1ab_0
  - pcre2=10.40=hc3806b6_0
  - pexpect=4.8.0=pyh1a96a4e_2
  - pickleshare=0.7.5=py_1003
  - pillow=9.4.0=py310h023d228_1
  - pip=23.1.2=pyhd8ed1ab_0
  - pixman=0.40.0=h36c2ea0_0
  - pkgutil-resolve-name=1.3.10=pyhd8ed1ab_0
  - platformdirs=3.6.0=pyhd8ed1ab_0
  - pooch=1.7.0=pyha770c72_3
  - poppler=22.12.0=h091648b_1
  - poppler-data=0.4.12=hd8ed1ab_0
  - postgresql=15.2=h3248436_0
  - proj=9.1.0=h8ffa02c_1
  - prometheus_client=0.17.0=pyhd8ed1ab_0
  - prompt-toolkit=3.0.38=pyha770c72_0
  - prompt_toolkit=3.0.38=hd8ed1ab_0
  - protobuf=4.21.12=py310heca2aa9_0
  - psutil=5.9.5=py310h1fa729e_0
  - pthread-stubs=0.4=h36c2ea0_1001
  - ptxcompiler=0.8.1=py310h01a121a_0
  - ptyprocess=0.7.0=pyhd3deb0d_0
  - pure_eval=0.2.2=pyhd8ed1ab_0
  - py-xgboost=1.7.5dev.rapidsai23.06=cuda11_py310_0
  - pyarrow=11.0.0=py310he6bfd7f_25_cpu
  - pycparser=2.21=pyhd8ed1ab_0
  - pyct=0.4.6=py_0
  - pyct-core=0.4.6=py_0
  - pydeck=0.5.0=pyh9f0ad1d_0
  - pyee=8.1.0=pyhd8ed1ab_0
  - pygments=2.15.1=pyhd8ed1ab_0
  - pylibcugraph=23.06.02=cuda11_py310_230613_gdb9d3c12_0
  - pylibraft=23.06.01=cuda11_py310_230612_g9147c907_0
  - pynvml=11.4.1=pyhd8ed1ab_0
  - pyopenssl=23.2.0=pyhd8ed1ab_1
  - pyparsing=3.1.0=pyhd8ed1ab_0
  - pyppeteer=1.0.2=pyhd8ed1ab_0
  - pyproj=3.4.0=py310hb1338dc_2
  - pyrsistent=0.19.3=py310h1fa729e_0
  - pysocks=1.7.1=pyha2e5f31_6
  - python=3.10.11=he550d4f_0_cpython
  - python-confluent-kafka=1.9.2=py310h5764c6d_2
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python-fastjsonschema=2.17.1=pyhd8ed1ab_0
  - python-json-logger=2.0.7=pyhd8ed1ab_0
  - python_abi=3.10=3_cp310
  - pytz=2023.3=pyhd8ed1ab_0
  - pyviz_comms=2.3.2=pyhd8ed1ab_0
  - pywavelets=1.4.1=py310h0a54255_0
  - pyyaml=6.0=py310h5764c6d_5
  - pyzmq=25.1.0=py310h5bbb5d0_0
  - raft-dask=23.06.01=cuda11_py310_230612_g9147c907_0
  - rapids=23.06.02=cuda11_py310_230613_g9b052fc_0
  - rapids-xgboost=23.06.02=cuda11_py310_230613_g9b052fc_0
  - rdma-core=28.9=h59595ed_1
  - re2=2023.03.02=h8c504da_0
  - readline=8.2=h8228510_1
  - requests=2.31.0=pyhd8ed1ab_0
  - rfc3339-validator=0.1.4=pyhd8ed1ab_0
  - rfc3986-validator=0.1.1=pyh9f0ad1d_0
  - rmm=23.06.00=cuda11_py310_230607_gacaf3f5e_0
  - rtree=1.0.1=py310hbdcdc62_1
  - s2n=1.3.45=h06160fa_0
  - scikit-image=0.20.0=py310h9b08913_1
  - scikit-learn=1.2.2=py310hf7d194e_2
  - scipy=1.10.1=py310ha4c1d20_3
  - send2trash=1.8.2=pyh41d4057_0
  - setuptools=67.7.2=pyhd8ed1ab_0
  - shapely=2.0.1=py310h8b84c32_0
  - simpervisor=1.0.0=pyhd8ed1ab_0
  - six=1.16.0=pyh6c4a22f_0
  - snappy=1.1.10=h9fff704_0
  - sniffio=1.3.0=pyhd8ed1ab_0
  - sortedcontainers=2.4.0=pyhd8ed1ab_0
  - soupsieve=2.3.2.post1=pyhd8ed1ab_0
  - spdlog=1.11.0=h9b3ece8_1
  - sqlite=3.42.0=h2c6b66d_0
  - stack_data=0.6.2=pyhd8ed1ab_0
  - streamz=0.6.4=pyh6c4a22f_0
  - tblib=1.7.0=pyhd8ed1ab_0
  - terminado=0.17.1=pyh41d4057_0
  - threadpoolctl=3.1.0=pyh8a188c0_0
  - tifffile=2023.4.12=pyhd8ed1ab_0
  - tiledb=2.13.2=hd532e3d_0
  - tinycss2=1.2.1=pyhd8ed1ab_0
  - tk=8.6.12=h27826a3_0
  - toolz=0.12.0=pyhd8ed1ab_0
  - tornado=6.3.2=py310h2372a71_0
  - tqdm=4.65.0=pyhd8ed1ab_1
  - traitlets=5.9.0=pyhd8ed1ab_0
  - treelite=3.2.0=py310h1be96d9_0
  - typing-extensions=4.6.3=hd8ed1ab_0
  - typing_extensions=4.6.3=pyha770c72_0
  - typing_utils=0.1.0=pyhd8ed1ab_0
  - tzcode=2023c=h0b41bf4_0
  - tzdata=2023c=h71feb2d_0
  - ucx=1.14.1=h4a2ce2d_2
  - ucx-proc=1.0.0=gpu
  - ucx-py=0.32.00=py310_230607_gded9ea2_0
  - unicodedata2=15.0.0=py310h5764c6d_0
  - urllib3=1.26.15=pyhd8ed1ab_0
  - wcwidth=0.2.6=pyhd8ed1ab_0
  - webencodings=0.5.1=py_1
  - websocket-client=1.6.0=pyhd8ed1ab_0
  - websockets=10.4=py310h5764c6d_1
  - wheel=0.40.0=pyhd8ed1ab_0
  - widgetsnbextension=4.0.7=pyhd8ed1ab_0
  - xarray=2023.5.0=pyhd8ed1ab_0
  - xerces-c=3.2.4=h55805fa_1
  - xgboost=1.7.5dev.rapidsai23.06=cuda11_py310_0
  - xorg-kbproto=1.0.7=h7f98852_1002
  - xorg-libice=1.1.1=hd590300_0
  - xorg-libsm=1.2.4=h7391055_0
  - xorg-libx11=1.8.4=h0b41bf4_0
  - xorg-libxau=1.0.11=hd590300_0
  - xorg-libxdmcp=1.1.3=h7f98852_0
  - xorg-libxext=1.3.4=h0b41bf4_2
  - xorg-libxrender=0.9.10=h7f98852_1003
  - xorg-renderproto=0.11.1=h7f98852_1002
  - xorg-xextproto=7.3.0=h0b41bf4_1003
  - xorg-xproto=7.0.31=h7f98852_1007
  - xyzservices=2023.5.0=pyhd8ed1ab_1
  - xz=5.2.6=h166bdaf_0
  - yaml=0.2.5=h7f98852_2
  - yarl=1.9.2=py310h2372a71_0
  - zeromq=4.3.4=h9c3ff4c_1
  - zfp=1.0.0=h27087fc_3
  - zict=3.0.0=pyhd8ed1ab_0
  - zipp=3.15.0=pyhd8ed1ab_0
  - zlib=1.2.13=hd590300_5
  - zlib-ng=2.0.7=h0b41bf4_0
  - zstd=1.5.2=h3eb15da_6
  - pip:
      - treelite-runtime==3.2.0
prefix: /home/dleonpe/miniconda3/envs/rapids-23.06

Additional context

This has also occurred in older RAPIDS versions.

lowener commented 1 year ago

You are likely running into the CUSPARSE_STATUS_INSUFFICIENT_RESOURCES error (see here). On my 48GB GPU I was not able to run the sparse PCA with a density of 0.05, but after lowering the density to less than 0.02 I was able to run it. From a few quick sparse PCA runs, I saw that for a matrix of these dimensions a density of 0.02 leads to a 26GB memory allocation. This is related to the covariance matrix computation: https://github.com/rapidsai/cuml/blob/93f7ddc6c473d93093900061afd539c08b53287f/python/cuml/decomposition/pca.pyx#L365

So you can either reduce the dimensions of the matrix or lower its density so that it fits in your GPU memory.
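As a rough illustration of why density matters so much here, the sketch below counts the scalar multiplications a row-by-row sparse product of the form X.T @ X would generate. It assumes the covariance is built from such a product (the spGEMM call in the traceback suggests this) and that every row holds the same number of non-zeros. The helper names `gram_product_count` and `dense_nbytes` are made up for this example, and the numbers are operation counts, not a model of what cuSPARSE actually allocates.

```python
def gram_product_count(n_rows, n_cols, density):
    """Rough count of scalar products needed to form X.T @ X.

    Assumption: each row has exactly round(n_cols * density) non-zeros.
    A row with k non-zeros contributes k * k products to the Gram matrix.
    """
    nnz_per_row = round(n_cols * density)
    return n_rows * nnz_per_row ** 2


def dense_nbytes(n_rows, n_cols, itemsize=4):
    """Bytes needed to store a dense matrix (float32 by default)."""
    return n_rows * n_cols * itemsize


n_rows, n_cols = 100_000, 5_000  # shape used in the reproducer
for density in (0.05, 0.02):
    products = gram_product_count(n_rows, n_cols, density)
    print(f"density={density}: about {products:,} intermediate products")

print(f"dense float32 X: {dense_nbytes(n_rows, n_cols):,} bytes")
print(f"dense float32 covariance: {dense_nbytes(n_cols, n_cols):,} bytes")
```

Under these assumptions the product count grows with the square of the density, so going from 0.02 to 0.05 multiplies it by 6.25, while the dense 5000 x 5000 result itself only needs 100 MB in float32. This does not reproduce the 26GB figure above; it only shows why the intermediate work of the sparse product can far exceed the size of its output.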

Intron7 commented 1 year ago

Ok thank you. I'm just confused because the dense version works with way bigger matrices. Usually sparse matrices are used to save VRAM.

flying-sheep commented 1 year ago

@lowener why was this closed? Doesn’t it seem like a bug that dense PCA needs less memory than sparse PCA?

Intron7 commented 1 year ago

I think it might be a bug with cuSPARSE. I created a replacement implementation for the cov function with a custom cupy kernel for csr matrices based on a spGEMM paper. It performs really well. I can create a PR.
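The kernel itself is not shown in this thread, so the following is not that implementation. It is only a plain-Python CPU sketch of one way to avoid a sparse intermediate: walk the CSR rows and accumulate each row's products directly into a dense n_cols x n_cols matrix. The function name `csr_covariance` and the choice of the n - 1 divisor are assumptions made for this example.

```python
def csr_covariance(data, indices, indptr, n_cols):
    """Sample covariance (divisor n_rows - 1) of a CSR matrix, as nested lists.

    data, indices, indptr are the usual CSR arrays. The Gram matrix X.T @ X
    is accumulated straight into a dense buffer, one row of X at a time.
    """
    n_rows = len(indptr) - 1
    gram = [[0.0] * n_cols for _ in range(n_cols)]
    col_sum = [0.0] * n_cols

    for row in range(n_rows):
        start, end = indptr[row], indptr[row + 1]
        for a in range(start, end):
            col_sum[indices[a]] += data[a]
            # Outer product of the row's non-zeros with themselves.
            for b in range(start, end):
                gram[indices[a]][indices[b]] += data[a] * data[b]

    mean = [s / n_rows for s in col_sum]
    # cov = (X.T @ X - n * mean mean^T) / (n - 1)
    return [
        [(gram[i][j] - n_rows * mean[i] * mean[j]) / (n_rows - 1)
         for j in range(n_cols)]
        for i in range(n_cols)
    ]


# X = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 0]] in CSR form
cov = csr_covariance(
    data=[1.0, 2.0, 3.0, 4.0],
    indices=[0, 2, 1, 0],
    indptr=[0, 2, 3, 4],
    n_cols=3,
)
for line in cov:
    print(line)
```

The only large buffer in this sketch is the dense output, whose size depends on the number of columns and not on the density. A GPU version would need to parallelize the per-row accumulation (for example with atomic adds), which this sketch does not attempt.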

lowener commented 1 year ago

@Intron7 This would be a great new feature that would be very helpful for many users! Your PR would be very welcome! cuSPARSE has recently released more SpGEMM algorithms (as of this February) that use less memory (see here), so I created this issue to add them to CuPy: https://github.com/cupy/cupy/issues/7699

@flying-sheep I closed this issue too early. I'll reopen it.