alexwilson1 opened this issue 3 years ago
Thank you for opening the issue. A minimal reproducer would be welcome to allow a more informed answer. However, from the available information, I see two possible causes. First, you may simply be running out of memory (device or host). Otherwise, the failure might stem from an incompatibility between the CUDA version installed in the environment (11.6) and the one RAPIDS (and FAISS) was compiled with (11.2). You may be able to fix this by installing an 11.4 CUDA environment and RAPIDS software suite.
@viclafargue the CUDA 11.6 version reported by SMI is just the maximum CUDA version supported by that driver; it doesn't mean that CUDA 11.6 is actually being used in the environment/locally.
@alexwilson1 could you run https://github.com/rapidsai/cuml/blob/branch-21.12/print_env.sh and put the output in a reply here? That'll also help triage.
**git*** *REDACTED* **git submodules*** ***OS Information*** DISTRIB_ID=Ubuntu DISTRIB_RELEASE=18.04 DISTRIB_CODENAME=bionic DISTRIB_DESCRIPTION="Ubuntu 18.04.6 LTS" NAME="Ubuntu" VERSION="18.04.6 LTS (Bionic Beaver)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 18.04.6 LTS" VERSION_ID="18.04" HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" VERSION_CODENAME=bionic UBUNTU_CODENAME=bionic Linux 816b6aa61b81 5.10.60.1-microsoft-standard-WSL2 #1 SMP Wed Aug 25 23:20:18 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux ***GPU Information*** Mon Oct 18 18:33:09 2021 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 510.00 Driver Version: 510.10 CUDA Version: 11.6 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. 
| |===============================+======================+======================| | 0 NVIDIA TITAN RTX On | 00000000:01:00.0 Off | N/A | | 41% 29C P8 10W / 280W | 316MiB / 24576MiB | N/A Default | | | | N/A | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | No running processes found | +-----------------------------------------------------------------------------+ ***CPU*** Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 1 Core(s) per socket: 8 Socket(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 158 Model name: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz Stepping: 13 CPU MHz: 3600.011 BogoMIPS: 7200.02 Virtualization: VT-x Hypervisor vendor: Microsoft Virtualization type: full L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 12288K Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves md_clear flush_l1d arch_capabilities ***CMake*** ***g++*** /usr/bin/g++ g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 Copyright (C) 2017 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
***nvcc*** ***Python*** /opt/conda/envs/rapids/bin/python Python 3.8.12 ***Environment Variables*** PATH : /opt/conda/envs/rapids/bin:/opt/conda/condabin:/opt/conda/envs/rapids/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin LD_LIBRARY_PATH : /usr/local/nvidia/lib:/usr/local/nvidia/lib64 NUMBAPRO_NVVM : NUMBAPRO_LIBDEVICE : CONDA_PREFIX : /opt/conda/envs/rapids PYTHON_PATH : ***conda packages*** /opt/conda/condabin/conda # packages in environment at /opt/conda/envs/rapids: # # Name Version Build Channel _libgcc_mutex 0.1 conda_forge conda-forge _openmp_mutex 4.5 1_llvm conda-forge abseil-cpp 20210324.2 h9c3ff4c_0 conda-forge aiohttp 3.7.4.post0 py38h497a2fe_0 conda-forge anyio 3.3.2 py38h578d9bd_0 conda-forge appdirs 1.4.4 pyh9f0ad1d_0 conda-forge apscheduler 3.8.0 py38h578d9bd_0 conda-forge argon2-cffi 20.1.0 py38h497a2fe_2 conda-forge arrow-cpp 5.0.0 py38h327e1ba_4_cuda conda-forge arrow-cpp-proc 3.0.0 cuda conda-forge async-timeout 3.0.1 py_1000 conda-forge async_generator 1.10 py_0 conda-forge attrs 21.2.0 pyhd8ed1ab_0 conda-forge aws-c-cal 0.5.11 h95a6274_0 conda-forge aws-c-common 0.6.2 h7f98852_0 conda-forge aws-c-event-stream 0.2.7 h3541f99_13 conda-forge aws-c-io 0.10.5 hfb6a706_0 conda-forge aws-checksums 0.1.11 ha31a3da_7 conda-forge aws-sdk-cpp 1.8.186 hb4091e7_3 conda-forge backcall 0.2.0 pyh9f0ad1d_0 conda-forge backports 1.0 py_2 conda-forge backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge blas 2.108 mkl conda-forge blas-devel 3.9.0 8_mkl conda-forge blazingsql 21.10.0 pypi_0 pypi bleach 4.1.0 pyhd8ed1ab_0 conda-forge blosc 1.21.0 h9c3ff4c_0 conda-forge bokeh 2.4.0 py38h578d9bd_0 conda-forge boost 1.72.0 py38h1e42940_1 conda-forge boost-cpp 1.72.0 h312852a_5 conda-forge brotli 1.0.9 h7f98852_5 conda-forge brotli-bin 1.0.9 h7f98852_5 conda-forge brotli-python 1.0.9 py38h709712a_5 conda-forge brotlipy 0.7.0 py38h497a2fe_1001 conda-forge brunsli 0.1 h9c3ff4c_0 
conda-forge bzip2 1.0.8 h7f98852_4 conda-forge c-ares 1.17.2 h7f98852_0 conda-forge ca-certificates 2021.10.8 ha878542_0 conda-forge cachetools 4.2.4 pyhd8ed1ab_0 conda-forge cairo 1.16.0 h6cf1ce9_1008 conda-forge certifi 2021.10.8 py38h578d9bd_0 conda-forge cffi 1.14.6 py38h3931269_1 conda-forge cfitsio 3.470 hb418390_7 conda-forge chardet 4.0.0 py38h578d9bd_1 conda-forge charls 2.2.0 h9c3ff4c_0 conda-forge charset-normalizer 2.0.0 pyhd8ed1ab_0 conda-forge chart-studio 1.1.0 pyh9f0ad1d_0 conda-forge click 7.1.2 pyh9f0ad1d_0 conda-forge click-plugins 1.1.1 py_0 conda-forge cligj 0.7.2 pyhd8ed1ab_0 conda-forge cloudpickle 2.0.0 pyhd8ed1ab_0 conda-forge colorama 0.4.4 pyh9f0ad1d_0 conda-forge colorcet 2.0.6 pyhd8ed1ab_0 conda-forge cryptography 3.4.7 py38ha5dfef3_0 conda-forge cucim 21.10.00 cuda_11.0_py38_gd7ac21f_0 rapidsai cudatoolkit 11.0.221 h6bb024c_0 nvidia cudf 21.10.00 cuda_11.0_py38_g072fd862cc_0 rapidsai cudf_kafka 21.10.00 py38_g072fd862cc_0 rapidsai cugraph 21.10.00 cuda11.0_py38_g84617024_0 rapidsai cuml 21.10.00 cuda11.0_py38_g0fd3503ba_0 rapidsai cupy 9.0.0 py38hc350bd8_0 conda-forge curl 7.79.1 h2574ce0_1 conda-forge cusignal 21.10.00 py37_gff14a10_0 rapidsai cuspatial 21.10.00 py38_gba20298_0 rapidsai custreamz 21.10.00 py38_g072fd862cc_0 rapidsai cuxfilter 21.10.00 py38_g003d3d6_0 rapidsai cycler 0.10.0 py_2 conda-forge cyrus-sasl 2.1.27 h230043b_3 conda-forge cytoolz 0.11.0 py38h497a2fe_3 conda-forge dash 2.0.0 pypi_0 pypi dash-auth 1.4.1 pyhd3deb0d_0 conda-forge dash-bootstrap-components 0.13.1 pyhd8ed1ab_0 conda-forge dash-core-components 2.0.0 pypi_0 pypi dash-cytoscape 0.3.0 pypi_0 pypi dash-daq 0.5.0 pyh9f0ad1d_1 conda-forge dash-extensions 0.0.60 pypi_0 pypi dash-html-components 2.0.0 pypi_0 pypi dash-renderer 1.9.1 pyhd8ed1ab_0 conda-forge dash-table 5.0.0 pypi_0 pypi dash-uploader 0.6.0 pypi_0 pypi dask 2021.9.1 pyhd8ed1ab_0 conda-forge dask-core 2021.9.1 pyhd8ed1ab_0 conda-forge dask-cuda 21.10.00 py38_0 rapidsai dask-cudf 21.10.00 
py38_g072fd862cc_0 rapidsai dataclasses 0.8 pyhc8e2a94_3 conda-forge datasets 1.13.2 pyhd8ed1ab_0 conda-forge datashader 0.11.1 pyh9f0ad1d_0 conda-forge datashape 0.5.4 py_1 conda-forge debugpy 1.4.1 py38h709712a_0 conda-forge decorator 4.4.2 py_0 conda-forge defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge dill 0.3.4 pyhd8ed1ab_0 conda-forge distributed 2021.9.1 py38h578d9bd_0 conda-forge dlpack 0.5 h9c3ff4c_0 conda-forge easynmt 2.0.1 pypi_0 pypi entrypoints 0.3 pyhd8ed1ab_1003 conda-forge expat 2.4.1 h9c3ff4c_0 conda-forge faiss-proc 1.0.0 cuda rapidsai fastavro 1.4.5 py38h497a2fe_0 conda-forge fastrlock 0.6 py38h709712a_1 conda-forge fasttext 0.9.2 pypi_0 pypi filelock 3.3.0 pyhd8ed1ab_0 conda-forge fiona 1.8.20 py38hbb147eb_1 conda-forge flask 2.0.2 pyhd8ed1ab_0 conda-forge flask-caching 1.10.1 pypi_0 pypi flask-compress 1.10.1 pyhd8ed1ab_0 conda-forge flask-seasurf 0.3.1 pyhd8ed1ab_0 conda-forge font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge font-ttf-inconsolata 3.000 h77eed37_0 conda-forge font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge font-ttf-ubuntu 0.83 hab24e00_0 conda-forge fontconfig 2.13.1 hba837de_1005 conda-forge fonts-conda-ecosystem 1 0 conda-forge fonts-conda-forge 1 0 conda-forge freetype 2.10.4 h0708190_1 conda-forge freexl 1.0.6 h7f98852_0 conda-forge fsspec 2021.10.0 pyhd8ed1ab_0 conda-forge future 0.18.2 py38h578d9bd_3 conda-forge gdal 3.3.1 py38h81a01a0_3 conda-forge geopandas 0.9.0 pyhd8ed1ab_1 conda-forge geopandas-base 0.9.0 pyhd8ed1ab_1 conda-forge geos 3.9.1 h9c3ff4c_2 conda-forge geotiff 1.6.0 h4f31c25_6 conda-forge gettext 0.19.8.1 h73d1719_1008 conda-forge gevent 21.8.0 py38h497a2fe_0 conda-forge gflags 2.2.2 he1b5a44_1004 conda-forge giflib 5.2.1 h36c2ea0_2 conda-forge git 2.33.0 pl5321hc30692c_1 conda-forge glog 0.5.0 h48cff8f_0 conda-forge google-cloud-cpp 1.29.0 hb967e95_1 conda-forge gpuci-tools 0.3.1 10 gpuci greenlet 1.1.2 py38h709712a_0 conda-forge grpc-cpp 1.39.1 h850795e_1 conda-forge hdbscan 0.8.27 py38h5c078b8_0 
conda-forge hdf4 4.2.15 h10796ff_3 conda-forge hdf5 1.12.1 nompi_h2750804_101 conda-forge heapdict 1.0.1 py_0 conda-forge huggingface_hub 0.0.19 pyhd8ed1ab_0 conda-forge humanize 3.12.0 pypi_0 pypi icu 68.1 h58526e2_0 conda-forge idna 3.1 pyhd3deb0d_0 conda-forge imagecodecs 2021.7.30 py38hb5ce8f7_1 conda-forge imageio 2.9.0 py_0 conda-forge importlib-metadata 4.8.1 py38h578d9bd_0 conda-forge importlib_metadata 4.8.1 hd8ed1ab_0 conda-forge iniconfig 1.1.1 pypi_0 pypi ipykernel 6.4.1 py38he5a9106_0 conda-forge ipython 7.28.0 py38he5a9106_0 conda-forge ipython_genutils 0.2.0 py_1 conda-forge ipywidgets 7.6.5 pyhd8ed1ab_0 conda-forge itsdangerous 2.0.1 pyhd8ed1ab_0 conda-forge jbig 2.1 h7f98852_2003 conda-forge jedi 0.18.0 py38h578d9bd_2 conda-forge jinja2 3.0.1 pyhd8ed1ab_0 conda-forge joblib 1.0.1 pyhd8ed1ab_0 conda-forge jpeg 9d h36c2ea0_0 conda-forge jpype1 1.3.0 py38h1fd1430_0 conda-forge json-c 0.15 h98cffda_0 conda-forge jsonschema 4.0.1 pyhd8ed1ab_0 conda-forge jupyter-server-proxy 3.1.0 pyhd8ed1ab_0 conda-forge jupyter_client 7.0.6 pyhd8ed1ab_0 conda-forge jupyter_core 4.8.1 py38h578d9bd_0 conda-forge jupyter_server 1.11.1 pyhd8ed1ab_0 conda-forge jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge jupyterlab_widgets 1.0.2 pyhd8ed1ab_0 conda-forge jxrlib 1.1 h7f98852_2 conda-forge kealib 1.4.14 h87e4c3c_3 conda-forge kiwisolver 1.3.2 py38h1fd1430_0 conda-forge krb5 1.19.2 hcc1bbae_2 conda-forge lcms2 2.12 hddcbb42_0 conda-forge ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge lerc 2.2.1 h9c3ff4c_0 conda-forge libaec 1.0.6 h9c3ff4c_0 conda-forge libblas 3.9.0 8_mkl conda-forge libbrotlicommon 1.0.9 h7f98852_5 conda-forge libbrotlidec 1.0.9 h7f98852_5 conda-forge libbrotlienc 1.0.9 h7f98852_5 conda-forge libcblas 3.9.0 8_mkl conda-forge libcrc32c 1.1.1 h9c3ff4c_2 conda-forge libcucim 21.10.00 cuda11.0_gd7ac21f_0 rapidsai libcudf 21.10.00 cuda11.0_g072fd862cc_0 rapidsai libcudf_kafka 21.10.00 g072fd862cc_0 rapidsai libcugraph 21.10.00 cuda11.0_g84617024_0 rapidsai 
libcuml 21.10.00 cuda11.0_g0fd3503ba_0 rapidsai libcumlprims 21.10.00 cuda11.0_g167dc59_0 nvidia libcurl 7.79.1 h2574ce0_1 conda-forge libcuspatial 21.10.00 cuda11.0_gba20298_0 rapidsai libdap4 3.20.6 hd7c4107_2 conda-forge libdeflate 1.7 h7f98852_5 conda-forge libedit 3.1.20191231 he28a2e2_2 conda-forge libev 4.33 h516909a_1 conda-forge libevent 2.1.10 h9b69904_4 conda-forge libfaiss 1.7.0 cuda110h8045045_8_cuda conda-forge libffi 3.4.2 h9c3ff4c_4 conda-forge libgcc-ng 9.4.0 hfa6338b_9 conda-forge libgcrypt 1.9.4 h7f98852_0 conda-forge libgdal 3.3.1 h6214c1d_3 conda-forge libgfortran-ng 11.2.0 h69a702a_9 conda-forge libgfortran5 11.2.0 h5c6108e_9 conda-forge libglib 2.68.4 h174f98d_1 conda-forge libgpg-error 1.42 h9c3ff4c_0 conda-forge libgsasl 1.10.0 h5b4c23d_0 conda-forge libhwloc 2.3.0 h5e5b7d1_1 conda-forge libiconv 1.16 h516909a_0 conda-forge libkml 1.3.0 hd79254b_1012 conda-forge liblapack 3.9.0 8_mkl conda-forge liblapacke 3.9.0 8_mkl conda-forge libllvm10 10.0.1 he513fc3_3 conda-forge libnetcdf 4.8.1 nompi_hb3fd0d9_101 conda-forge libnghttp2 1.43.0 h812cca2_1 conda-forge libntlm 1.4 h7f98852_1002 conda-forge libopenblas 0.3.17 pthreads_h8fe5266_1 conda-forge libpng 1.6.37 h21135ba_2 conda-forge libpq 13.3 hd57d9b9_0 conda-forge libprotobuf 3.16.0 h780b84a_0 conda-forge librdkafka 1.6.1 hc49e61c_1 conda-forge librmm 21.10.00 cuda11.0_g72ee41f_0 rapidsai librttopo 1.1.0 h1185371_6 conda-forge libsodium 1.0.18 h36c2ea0_1 conda-forge libspatialindex 1.9.3 h9c3ff4c_4 conda-forge libspatialite 5.0.1 h8694cbe_6 conda-forge libssh2 1.10.0 ha56f1ee_2 conda-forge libstdcxx-ng 9.4.0 h79bfe98_9 conda-forge libthrift 0.14.2 he6d91bd_1 conda-forge libtiff 4.3.0 hf544144_1 conda-forge libutf8proc 2.6.1 h7f98852_0 conda-forge libuuid 2.32.1 h7f98852_1000 conda-forge libuv 1.42.0 h7f98852_0 conda-forge libwebp 1.2.1 h3452ae3_0 conda-forge libwebp-base 1.2.1 h7f98852_0 conda-forge libxcb 1.13 h7f98852_1003 conda-forge libxgboost 1.4.2dev.rapidsai21.10 cuda11.0_0 rapidsai 
libxml2 2.9.12 h72842e0_0 conda-forge libzip 1.8.0 h4de3113_1 conda-forge libzlib 1.2.11 h36c2ea0_1013 conda-forge libzopfli 1.0.3 h9c3ff4c_0 conda-forge llvm-openmp 12.0.1 h4bd325d_1 conda-forge llvmlite 0.36.0 py38h4630a5e_0 conda-forge locket 0.2.0 py_2 conda-forge lz4-c 1.9.3 h9c3ff4c_1 conda-forge mapclassify 2.4.3 pyhd8ed1ab_0 conda-forge markdown 3.3.4 pyhd8ed1ab_0 conda-forge markupsafe 2.0.1 py38h497a2fe_0 conda-forge matplotlib-base 3.4.3 py38hf4fb855_1 conda-forge matplotlib-inline 0.1.3 pyhd8ed1ab_0 conda-forge mistune 0.8.4 py38h497a2fe_1004 conda-forge mkl 2020.4 h726a3e6_304 conda-forge mkl-devel 2020.4 ha770c72_305 conda-forge mkl-include 2020.4 h726a3e6_304 conda-forge more-itertools 8.10.0 pypi_0 pypi msgpack-python 1.0.2 py38h1fd1430_1 conda-forge multidict 5.2.0 py38h497a2fe_0 conda-forge multipledispatch 0.6.0 py_0 conda-forge multiprocess 0.70.12.2 py38h497a2fe_0 conda-forge munch 2.5.0 py_0 conda-forge nbclient 0.5.4 pyhd8ed1ab_0 conda-forge nbconvert 6.2.0 py38h578d9bd_0 conda-forge nbformat 5.1.3 pyhd8ed1ab_0 conda-forge nccl 2.11.4.1 h96e36e3_0 conda-forge ncurses 6.2 h58526e2_4 conda-forge nest-asyncio 1.5.1 pyhd8ed1ab_0 conda-forge netifaces 0.10.9 py38h497a2fe_1003 conda-forge networkx 2.6.3 pyhd8ed1ab_0 conda-forge ninja 1.10.2 h4bd325d_1 conda-forge nlohmann_json 3.9.1 h9c3ff4c_1 conda-forge nltk 3.6.5 pyhd8ed1ab_0 conda-forge nodejs 14.17.4 h92b4a50_0 conda-forge nostril 1.2.0 pypi_0 pypi notebook 6.4.4 pyha770c72_0 conda-forge nspr 4.30 h9c3ff4c_0 conda-forge nss 3.69 hb5efdd6_1 conda-forge numba 0.53.1 py38h8b71fd7_1 conda-forge numpy 1.21.2 py38he2449b9_0 conda-forge nvtx 0.2.3 py38h497a2fe_0 conda-forge olefile 0.46 pyh9f0ad1d_1 conda-forge openjdk 8.0.302 h7f98852_0 conda-forge openjpeg 2.4.0 hb52868f_1 conda-forge openssl 1.1.1l h7f98852_0 conda-forge orc 1.6.10 h58a87f1_0 conda-forge packaging 21.0 pyhd8ed1ab_0 conda-forge pandas 1.3.3 py38h43a58ef_0 conda-forge pandoc 2.14.2 h7f98852_0 conda-forge pandocfilters 1.5.0 
pyhd8ed1ab_0 conda-forge panel 0.12.4 pyhd8ed1ab_0 conda-forge param 1.11.1 pyh6c4a22f_0 conda-forge parquet-cpp 1.5.1 2 conda-forge parso 0.8.2 pyhd8ed1ab_0 conda-forge partd 1.2.0 pyhd8ed1ab_0 conda-forge patsy 0.5.2 pyhd8ed1ab_0 conda-forge pcre 8.45 h9c3ff4c_0 conda-forge pcre2 10.37 h032f7d1_0 conda-forge perl 5.32.1 1_h7f98852_perl5 conda-forge pexpect 4.8.0 pyh9f0ad1d_2 conda-forge pickleshare 0.7.5 py_1003 conda-forge pillow 8.3.2 py38h8e6f84c_0 conda-forge pip 21.3 pyhd8ed1ab_0 conda-forge pixman 0.40.0 h36c2ea0_0 conda-forge plac 1.3.3 pypi_0 pypi plotly 5.3.1 pyhd8ed1ab_0 conda-forge pluggy 1.0.0 pypi_0 pypi pooch 1.5.1 pyhd8ed1ab_0 conda-forge poppler 21.03.0 h93df280_0 conda-forge poppler-data 0.4.11 hd8ed1ab_0 conda-forge postgresql 13.3 h2510834_0 conda-forge proj 8.0.1 h277dcde_0 conda-forge prometheus_client 0.11.0 pyhd8ed1ab_0 conda-forge prompt-toolkit 3.0.20 pyha770c72_0 conda-forge protobuf 3.16.0 py38h709712a_0 conda-forge psutil 5.8.0 py38h497a2fe_1 conda-forge pthread-stubs 0.4 h36c2ea0_1001 conda-forge ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge pure-sasl 0.6.2 pyhd8ed1ab_0 conda-forge py 1.10.0 pypi_0 pypi py-xgboost 1.4.2dev.rapidsai21.10 cuda11.0py38_0 rapidsai pyarrow 5.0.0 py38hed47224_4_cuda conda-forge pybind11 2.8.0 pypi_0 pypi pycparser 2.20 pyh9f0ad1d_2 conda-forge pyct 0.4.6 py_0 conda-forge pyct-core 0.4.6 py_0 conda-forge pydeck 0.5.0 pyh9f0ad1d_0 conda-forge pyee 8.1.0 pyh9f0ad1d_0 conda-forge pygments 2.10.0 pyhd8ed1ab_0 conda-forge pyhive 0.6.4 pyhd8ed1ab_0 conda-forge pynndescent 0.5.4 pyh6c4a22f_0 conda-forge pynvml 11.0.0 pyhd8ed1ab_0 conda-forge pyopenssl 21.0.0 pyhd8ed1ab_0 conda-forge pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge pyppeteer 0.2.6 pyhd8ed1ab_0 conda-forge pyproj 3.1.0 py38h4df08a6_4 conda-forge pyrsistent 0.17.3 py38h497a2fe_2 conda-forge pysocks 1.7.1 py38h578d9bd_3 conda-forge pytest 6.2.5 pypi_0 pypi python 3.8.12 hb7a2778_1_cpython conda-forge python-confluent-kafka 1.6.0 py38h497a2fe_1 conda-forge 
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge python-xxhash 2.0.2 py38h497a2fe_0 conda-forge python_abi 3.8 2_cp38 conda-forge pytz 2021.3 pyhd8ed1ab_0 conda-forge pyviz_comms 2.1.0 pyhd8ed1ab_0 conda-forge pywavelets 1.1.1 py38h6c62de6_3 conda-forge pyyaml 5.4.1 py38h497a2fe_1 conda-forge pyzmq 22.3.0 py38h2035c66_0 conda-forge rapids 21.10.00 cuda11.0_py38_ge66f011_114 rapidsai rapids-blazing 21.10.00 cuda11.0_py38_ge66f011_114 rapidsai rapids-xgboost 21.10.00 cuda11.0_py38_ge66f011_114 rapidsai re2 2021.09.01 h9c3ff4c_0 conda-forge readline 8.1 h46c0cb4_0 conda-forge redis-py 3.5.3 pyh9f0ad1d_0 conda-forge regex 2021.10.8 py38h497a2fe_0 conda-forge requests 2.26.0 pyhd8ed1ab_0 conda-forge requests-unixsocket 0.2.0 py_0 conda-forge retrying 1.3.3 py_2 conda-forge rmm 21.10.00 cuda_11.0_py38_g72ee41f_0 rapidsai rtree 0.9.7 py38h02d302b_2 conda-forge s2n 1.0.10 h9b69904_0 conda-forge sacremoses 0.0.43 pyh9f0ad1d_0 conda-forge scikit-image 0.18.1 py38h51da96c_0 conda-forge scikit-learn 0.24.2 py38hacb3eff_1 conda-forge scipy 1.7.1 py38h56a6a73_0 conda-forge send2trash 1.8.0 pyhd8ed1ab_0 conda-forge sentence-transformers 2.1.0 pypi_0 pypi sentencepiece 0.1.96 pypi_0 pypi setuptools 49.6.0 py38h578d9bd_3 conda-forge shapely 1.7.1 py38hb7fe4a8_5 conda-forge simpervisor 0.4 pyhd8ed1ab_0 conda-forge six 1.16.0 pyh6c4a22f_0 conda-forge snappy 1.1.8 he1b5a44_3 conda-forge sniffio 1.2.0 py38h578d9bd_1 conda-forge sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge spdlog 1.8.5 h4bd325d_0 conda-forge sqlalchemy 1.4.25 py38h497a2fe_0 conda-forge sqlite 3.36.0 h9cd32fc_2 conda-forge statsmodels 0.13.0 py38h6c62de6_0 conda-forge streamz 0.6.3 pyh6c4a22f_0 conda-forge tabulate 0.8.9 pypi_0 pypi tbb 2020.2 h4bd325d_4 conda-forge tblib 1.7.0 pyhd8ed1ab_0 conda-forge tenacity 8.0.1 pyhd8ed1ab_0 conda-forge terminado 0.12.1 py38h578d9bd_0 conda-forge testpath 0.5.0 pyhd8ed1ab_0 conda-forge thai-segmenter 0.4.1 pypi_0 pypi threadpoolctl 3.0.0 pyh8a188c0_0 conda-forge thrift 0.14.0 
py38h709712a_0 conda-forge thrift_sasl 0.4.3 pyhd8ed1ab_1 conda-forge tifffile 2021.8.30 pyhd8ed1ab_0 conda-forge tiledb 2.3.4 he87e0bf_0 conda-forge tk 8.6.11 h27826a3_1 conda-forge tokenizers 0.10.3 py38hb63a372_1 conda-forge toml 0.10.2 pypi_0 pypi toolz 0.11.1 py_0 conda-forge torch 1.9.1 pypi_0 pypi torchvision 0.10.1 pypi_0 pypi tornado 6.1 py38h497a2fe_1 conda-forge tqdm 4.62.3 pyhd8ed1ab_0 conda-forge traitlets 5.1.0 pyhd8ed1ab_0 conda-forge transformers 4.11.3 pyhd8ed1ab_0 conda-forge treelite 2.1.0 py38hdd725b4_0 conda-forge treelite-runtime 2.1.0 pypi_0 pypi typing-extensions 3.10.0.2 hd8ed1ab_0 conda-forge typing_extensions 3.10.0.2 pyha770c72_0 conda-forge tzcode 2021c h7f98852_0 conda-forge tzdata 2021c he74cb21_0 conda-forge tzlocal 2.0.0 py_0 conda-forge ua-parser 0.10.0 pyh9f0ad1d_0 conda-forge ucx 1.11.1+gc58db6b cuda11.0_0 rapidsai ucx-proc 1.0.0 gpu rapidsai ucx-py 0.22.0 py38_gc58db6b_0 rapidsai umap-learn 0.5.1 py38h578d9bd_1 conda-forge urllib3 1.26.7 pyhd8ed1ab_0 conda-forge wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge webencodings 0.5.1 py_1 conda-forge websocket-client 0.57.0 py38h578d9bd_4 conda-forge websockets 9.1 py38h497a2fe_0 conda-forge werkzeug 2.0.1 pyhd8ed1ab_0 conda-forge wheel 0.37.0 pyhd8ed1ab_1 conda-forge widgetsnbextension 3.5.1 py38h578d9bd_4 conda-forge xarray 0.19.0 pyhd8ed1ab_1 conda-forge xerces-c 3.2.3 h9d8b166_2 conda-forge xgboost 1.4.2dev.rapidsai21.10 cuda11.0py38_0 rapidsai xorg-kbproto 1.0.7 h7f98852_1002 conda-forge xorg-libice 1.0.10 h7f98852_0 conda-forge xorg-libsm 1.2.3 hd9c2040_1000 conda-forge xorg-libx11 1.7.2 h7f98852_0 conda-forge xorg-libxau 1.0.9 h7f98852_0 conda-forge xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge xorg-libxext 1.3.4 h7f98852_1 conda-forge xorg-libxrender 0.9.10 h7f98852_1003 conda-forge xorg-renderproto 0.11.1 h7f98852_1002 conda-forge xorg-xextproto 7.3.0 h7f98852_1002 conda-forge xorg-xproto 7.0.31 h7f98852_1007 conda-forge xxhash 0.8.0 h7f98852_3 conda-forge xz 5.2.5 h516909a_1 conda-forge 
yaml 0.2.5 h516909a_0 conda-forge yarl 1.6.3 py38h497a2fe_2 conda-forge zeromq 4.3.4 h9c3ff4c_1 conda-forge zfp 0.5.5 h9c3ff4c_7 conda-forge zict 2.0.0 py_0 conda-forge zipp 3.6.0 pyhd8ed1ab_0 conda-forge zlib 1.2.11 h36c2ea0_1013 conda-forge zope.event 4.5.0 pyh9f0ad1d_0 conda-forge zope.interface 5.4.0 py38h497a2fe_0 conda-forge zstd 1.5.0 ha95c52a_0 conda-forge
For the environment given above, the following reproducers should work:
from cuml.manifold.umap import UMAP
import numpy as np
umap_model = UMAP(n_components=4, n_neighbors=20, min_dist=0.1)
umap_model.fit_transform(np.random.rand(285658, 768))
Gives error:
RuntimeError: Error in virtual void faiss::gpu::StandardGpuResourcesImpl::initializeForDevice(int) at /home/conda/feedstock_root/build_artifacts/faiss-split_1618468126454/work/faiss/gpu/StandardGpuResources.cpp:285: Error: 'err == cudaSuccess' failed: failed to cudaHostAlloc 268435456 bytes for CPU <-> GPU async copy buffer (error 2 out of memory)
Whereas:
from cuml.manifold.umap import UMAP
import numpy as np
umap_model = UMAP(n_components=4, n_neighbors=20, min_dist=0.1)
umap_model.fit_transform(np.random.rand(700000, 768))
Executes successfully, as does:
from cuml.manifold.umap import UMAP
import numpy as np
umap_model = UMAP(n_components=4, n_neighbors=20, min_dist=0.1)
umap_model.fit_transform(np.random.rand(100000, 768))
I can execute these in any order, and they pass/fail reliably (a matrix of dim 285658×768 always fails, but 700000 and 100000 always pass). I'm wondering if the FAISS code is taking a different path within this range that is causing OOM?
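The sizes involved can be sanity-checked with a bit of arithmetic (a sketch based only on the numbers in this thread; `np.random.rand` produces float64, so each 768-feature row is 768 × 8 bytes):

```python
# Rough input sizes for the three reproducers in this thread.
bytes_per_row = 768 * 8  # float64 features

for n_rows in (100_000, 285_658, 700_000):
    gib = n_rows * bytes_per_row / 2**30
    print(f"{n_rows:>7} rows -> {gib:.2f} GiB")

# The failing cudaHostAlloc asks for 268435456 bytes, i.e. a 256 MiB
# pinned staging buffer -- small compared to the inputs themselves.
print(268435456 / 2**20)  # -> 256.0 (MiB)
```

Notably, the failing 285658-row input is smaller than the passing 700000-row one, which is consistent with the "different code path" suspicion rather than the input size itself exhausting memory.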
> I'm wondering if the FAISS code is taking a different path within this range that is causing OOM?

That's probably the case. However, I could not reproduce the bug.
As @dantegd said, the driver should support all CUDA features up to version 11.6, so that shouldn't be the issue here. However, I think the cudatoolkit package should match what the other RAPIDS packages were compiled with (11.2). Have you tried reproducing the bug on a clean Docker setup, before installing PyTorch or any other package that might alter the cudatoolkit version?
I can repro with the following steps:
Stop all containers
Start the Rapids container in interactive mode
docker run -i --gpus all rapidsai/rapidsai:21.10-cuda11.2-base-ubuntu18.04-py3.8
Open the CLI of the container using Docker Desktop and execute the following:
# ls
NVIDIA_Deep_Learning_Container_License.pdf utils
# bash
(rapids) root@70b49e55437e:/rapids# python
Python 3.8.12 | packaged by conda-forge | (default, Sep 29 2021, 19:52:28)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from cuml.manifold.umap import UMAP
>>> import numpy as np
>>> umap_model = UMAP(n_components=4, n_neighbors=20, min_dist=0.1)
>>> umap_model.fit_transform(np.random.rand(285658, 768))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/internals/api_decorators.py", line 549, in inner_set_get
ret_val = func(*args, **kwargs)
File "cuml/manifold/umap.pyx", line 659, in cuml.manifold.umap.UMAP.fit_transform
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/internals/api_decorators.py", line 409, in inner_with_setters
return func(*args, **kwargs)
File "cuml/manifold/umap.pyx", line 600, in cuml.manifold.umap.UMAP.fit
RuntimeError: Error in virtual void faiss::gpu::StandardGpuResourcesImpl::initializeForDevice(int) at /home/conda/feedstock_root/build_artifacts/faiss-split_1618468141526/work/faiss/gpu/StandardGpuResources.cpp:285: Error: 'err == cudaSuccess' failed: failed to cudaHostAlloc 268435456 bytes for CPU <-> GPU async copy buffer (error 2 out of memory)
>>>
Honestly I'm quite surprised, I expected it to be a dependency related issue.
Note, as before, all other containers are stopped and nothing is using the GPU (apart from the Windows desktop manager); Task Manager shows 1.6GB/24GB memory used on the GPU, so there should be more than enough(?). In addition, the system is only using 30GB of its 128GB of RAM.
I ran the same commands, but could not reproduce the error. It is possible that the problem arises because of the Windows host (I have a Linux host). The cudaHostAlloc call allocates page-locked memory via the mlock syscall on Linux; however, this call might not be appropriately forwarded to the host OS.
If you want to investigate this lead, you can try the following:
- Add --cap-add=IPC_LOCK to your Docker command
- Check the limit on pinned memory (max locked memory) with ulimit -a
- Raise it with ulimit -l <large value>
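The `max locked memory` limit mentioned above can also be queried from Python with the standard-library `resource` module (a sketch; it reads the same RLIMIT_MEMLOCK value that `ulimit -l` reports, though whether CUDA's pinned allocations are actually governed by it on a given setup is not certain):

```python
import resource

# Soft/hard limits are in bytes; RLIM_INFINITY means "unlimited".
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print("RLIMIT_MEMLOCK soft:", soft, "hard:", hard)

# FAISS asks for a 268435456-byte (256 MiB) pinned buffer; if the soft
# limit is finite and smaller than that, the allocation could be refused.
needed = 268_435_456
if soft != resource.RLIM_INFINITY and soft < needed:
    print("locked-memory limit is below the 256 MiB FAISS requests")
```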
Thanks for the reply! I ran the ulimit -a command and the result is:
(rapids) root@14605cd77a24:/rapids# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257249
max locked memory (kbytes, -l) 82000
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
(rapids) root@14605cd77a24:/rapids# ulimit -l 999999999999999
bash: ulimit: max locked memory: cannot modify limit: Operation not permitted
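Comparing the numbers above directly (a sketch of the arithmetic only; ulimit reports 1024-byte blocks, and it is not certain that WSL2's pinned-memory behaviour is controlled by this exact limit):

```python
# "max locked memory (kbytes, -l) 82000" from the ulimit -a output above,
# versus the allocation FAISS attempts per the cudaHostAlloc error.
max_locked_kib = 82_000
faiss_request_bytes = 268_435_456  # 256 MiB

limit_bytes = max_locked_kib * 1024
print(limit_bytes)                        # roughly 80 MiB
print(limit_bytes < faiss_request_bytes)  # -> True: limit is well below 256 MiB
```

So the container's locked-memory cap (~80 MiB) is well below the 256 MiB pinned buffer FAISS requests, which would line up with the allocation failing if that limit applies here.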
I'll spend time trying to deploy on a Linux host first before we try and figure out the locked memory issue.
Still waiting to try this on a Linux host, but could this issue be related to this
As you have said @viclafargue, it seems like allocating large blocks may be a possible limitation of WSL2...
> Still waiting to try this on a Linux host, but could this issue be related to this
> As you have said @viclafargue, it seems like allocating large blocks may be a possible limitation of WSL2...

Thank you for spotting this. It may very well be the issue indeed.
No problem! Got it set up on a Linux host and can't repro the issue, so it seems like this is the WSL2 limitation.
My WSL2 setup is:
# uname -r
5.10.60.1-microsoft-standard-WSL2
@viclafargue what would be the recommended course of action here please?
I could communicate internally on the matter. As the thread you pointed out suggested, there are indeed some limits on how much host memory can be pinned on WSL at once. CUDA on WSL is still in development.
Thank you @viclafargue! I'm glad we got to the bottom of this, please keep us updated.
Describe the bug
I have a dataset of 800k items (768-dim vectors). UMAP works with the full 800k dataset, and with smaller (randomly sampled) datasets of around 150k, but medium-sized datasets of ~300k, 350k, etc. crash with this error.
I'm using a Titan RTX GPU with 24GB of memory, and nvidia-smi shows more than enough free memory for this operation before applying fit_transform.
This is using parameters (n_components=3, n_neighbors=15, min_dist=0.0) to create the UMAP model and the fit_transform operation to apply it. Using rapidsai/rapidsai:21.10-cuda11.2-base-ubuntu18.04-py3.8 with torch==1.9.1+cu111 applied on top of the environment.
Any idea why this works for the large dataset and not intermediate-sized datasets please?