rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[BUG] Kernel Restarting (kernel died) #4951

Open artyomboyko opened 2 years ago

artyomboyko commented 2 years ago

Kernel restarting (kernel died). I use a laptop for clustering with HDBSCAN.

Video card: Nvidia 3050 4GB Laptop

OS: Windows 11 Home, WSL2 with Ubuntu 20.04.5 LTS

I installed RAPIDS in WSL using Miniconda:

conda create -n rapids-22.10 -c rapidsai -c nvidia -c conda-forge \
    cudf=22.10 cuml=22.10 python=3.9 cudatoolkit=11.5 \
    jupyterlab

While trying different HDBSCAN parameters, I get the following error in JupyterLab: "Kernel Restarting. The kernel for Task_1/BUG.ipynb appears to have died. It will restart automatically."

Steps/Code to reproduce bug: all steps to reproduce the error are in the attached notebook (zip archive). A screenshot of the error is also included in the archive: https://drive.google.com/file/d/1wIR4G1s-tBK3-B51zDAMvtjC9WMKcX5I/view?usp=sharing

Expected behavior: the loop should iterate over the parameter values and display the results under the corresponding cell.
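
One way to narrow down which parameter value kills the kernel is to run each setting in its own Python subprocess, so a native crash only ends the child process and leaves an exit code behind. The following is a minimal sketch, not the code from the attached notebook: `run_isolated`, `sweep`, the snippet template and the parameter values are all hypothetical, and the clustering call is left as a comment to be replaced with the real reproducer.

```python
import subprocess
import sys

# Hypothetical reproducer template; replace the body with the real clustering call.
# The commented lines only mark where a cuml.cluster.HDBSCAN fit would go.
SNIPPET = """
min_cluster_size = {value}
# from cuml.cluster import HDBSCAN
# HDBSCAN(min_cluster_size=min_cluster_size).fit(X)
print("finished", min_cluster_size)
"""


def run_isolated(code, timeout=600):
    """Run `code` in a fresh interpreter; return (returncode, stdout, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


def sweep(values, template=SNIPPET):
    """Try each parameter value in isolation and collect the exit codes."""
    results = {}
    for value in values:
        returncode, _out, _err = run_isolated(template.format(value=value))
        results[value] = returncode
        status = "ok" if returncode == 0 else f"failed (exit code {returncode})"
        print(f"min_cluster_size={value}: {status}")
    return results
```

Calling `sweep([5, 10, 50])` prints one status line per value. On Linux, a negative exit code means the child was terminated by a signal (for example, -11 corresponds to SIGSEGV), which helps separate a native crash from an ordinary Python exception.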

Environment details:

# packages in environment at /home/artyom/miniconda3/envs/rapids-22.10:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
anyio                     3.6.2              pyhd8ed1ab_0    conda-forge
argon2-cffi               21.3.0             pyhd8ed1ab_0    conda-forge
argon2-cffi-bindings      21.2.0           py39hb9d737c_2    conda-forge
arrow-cpp                 9.0.0           py39hd3ccb9b_2_cpu    conda-forge
asttokens                 2.0.8              pyhd8ed1ab_0    conda-forge
attrs                     22.1.0             pyh71513ae_1    conda-forge
aws-c-cal                 0.5.11               h95a6274_0    conda-forge
aws-c-common              0.6.2                h7f98852_0    conda-forge
aws-c-event-stream        0.2.7               h3541f99_13    conda-forge
aws-c-io                  0.10.5               hfb6a706_0    conda-forge
aws-checksums             0.1.11               ha31a3da_7    conda-forge
aws-sdk-cpp               1.8.186              hb4091e7_3    conda-forge
babel                     2.10.3             pyhd8ed1ab_0    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
beautifulsoup4            4.11.1             pyha770c72_0    conda-forge
bleach                    5.0.1              pyhd8ed1ab_0    conda-forge
bokeh                     2.4.3              pyhd8ed1ab_3    conda-forge
brotlipy                  0.7.0           py39hb9d737c_1005    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.07.19           h06a4308_0    anaconda
cachetools                5.2.0              pyhd8ed1ab_0    conda-forge
certifi                   2022.6.15        py39h06a4308_0    anaconda
cffi                      1.15.1           py39he91dace_2    conda-forge
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
click                     8.1.3            py39hf3d152e_1    conda-forge
cloudpickle               2.2.0              pyhd8ed1ab_0    conda-forge
contourpy                 1.0.5                    pypi_0    pypi
cryptography              38.0.2           py39hd97740a_1    conda-forge
cubinlinker               0.2.0            py39h11215e4_1    rapidsai
cuda-python               11.7.0           py39h3fd9d12_0    nvidia
cudatoolkit               11.5.1               hcf5317a_9    nvidia
cudf                      22.10.00        cuda_11_py39_g8ffe375d85_0    rapidsai
cuml                      22.10.00        cuda11_py39_g963d46299_0    rapidsai
cupy                      11.2.0           py39hc3c280e_0    conda-forge
cycler                    0.11.0                   pypi_0    pypi
cytoolz                   0.12.0           py39hb9d737c_0    conda-forge
dask                      2022.9.2           pyhd8ed1ab_0    conda-forge
dask-core                 2022.9.2           pyhd8ed1ab_0    conda-forge
dask-cuda                 22.10.00        py39_g382e519_0    rapidsai
dask-cudf                 22.10.00        cuda_11_py39_g8ffe375d85_0    rapidsai
debugpy                   1.6.3            py39h5a03fae_0    conda-forge
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
distributed               2022.9.2           pyhd8ed1ab_0    conda-forge
dlpack                    0.5                  h9c3ff4c_0    conda-forge
entrypoints               0.4                pyhd8ed1ab_0    conda-forge
et-xmlfile                1.1.0                    pypi_0    pypi
executing                 1.1.1              pyhd8ed1ab_0    conda-forge
faiss-proc                1.0.0                      cuda    rapidsai
fastavro                  1.6.1            py39hb9d737c_0    conda-forge
fastrlock                 0.8              py39h5a03fae_2    conda-forge
flit-core                 3.7.1              pyhd8ed1ab_0    conda-forge
fonttools                 4.38.0                   pypi_0    pypi
freetype                  2.12.1               hca18f0e_0    conda-forge
fsspec                    2022.10.0          pyhd8ed1ab_0    conda-forge
gflags                    2.2.2             he1b5a44_1004    conda-forge
glog                      0.6.0                h6f12383_0    conda-forge
grpc-cpp                  1.47.1               hbad87ad_6    conda-forge
heapdict                  1.0.1                      py_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
importlib-metadata        4.11.4           py39hf3d152e_0    conda-forge
importlib_resources       5.10.0             pyhd8ed1ab_0    conda-forge
ipykernel                 6.16.2             pyh210e3f2_0    conda-forge
ipython                   8.5.0              pyh41d4057_1    conda-forge
ipython-autotime          0.3.1                    pypi_0    pypi
ipython_genutils          0.2.0                      py_1    conda-forge
jedi                      0.18.1             pyhd8ed1ab_2    conda-forge
jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
joblib                    1.2.0              pyhd8ed1ab_0    conda-forge
jpeg                      9e                   h166bdaf_2    conda-forge
json5                     0.9.5              pyh9f0ad1d_0    conda-forge
jsonschema                4.16.0             pyhd8ed1ab_0    conda-forge
jupyter_client            7.3.4              pyhd8ed1ab_0    conda-forge
jupyter_core              4.11.1           py39hf3d152e_0    conda-forge
jupyter_server            1.21.0             pyhd8ed1ab_0    conda-forge
jupyterlab                3.5.0              pyhd8ed1ab_0    conda-forge
jupyterlab_pygments       0.2.2              pyhd8ed1ab_0    conda-forge
jupyterlab_server         2.16.1             pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.4                    pypi_0    pypi
krb5                      1.19.3               h3790be6_0    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.39                 hc81fddc_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libabseil                 20220623.0      cxx17_h48a1fff_4    conda-forge
libblas                   3.9.0           16_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_7    conda-forge
libbrotlidec              1.0.9                h166bdaf_7    conda-forge
libbrotlienc              1.0.9                h166bdaf_7    conda-forge
libcblas                  3.9.0           16_linux64_openblas    conda-forge
libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
libcudf                   22.10.00        cuda11_g8ffe375d85_0    rapidsai
libcuml                   22.10.00        cuda11_g963d46299_0    rapidsai
libcumlprims              22.10.00        cuda11_gfdb85e0_0    nvidia
libcurl                   7.85.0               h7bff187_0    conda-forge
libcusolver               11.4.1.48                     0    nvidia
libcusparse               11.7.5.86                     0    nvidia
libdeflate                1.14                 h166bdaf_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               h9b69904_4    conda-forge
libfaiss                  1.7.0           cuda112h5bea7ad_8_cuda    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgfortran-ng            12.2.0              h69a702a_19    conda-forge
libgfortran5              12.2.0              h337968e_19    conda-forge
libgomp                   12.2.0              h65d4601_19    conda-forge
libgoogle-cloud           2.1.0                h9ebe8e8_2    conda-forge
liblapack                 3.9.0           16_linux64_openblas    conda-forge
libllvm11                 11.1.0               he0ac6c6_4    conda-forge
libnghttp2                1.47.0               hdcd2b5c_1    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
libpng                    1.6.38               h753d276_0    conda-forge
libprotobuf               3.20.1               h6239696_4    conda-forge
libraft-distance          22.10.00        cuda11_g31ae597_0    rapidsai
libraft-headers           22.10.00        cuda11_g31ae597_0    rapidsai
libraft-nn                22.10.00        cuda11_g31ae597_0    rapidsai
librmm                    22.10.00        cuda11_g9d5a8c37_0    rapidsai
libsodium                 1.0.18               h36c2ea0_1    conda-forge
libsqlite                 3.39.4               h753d276_0    conda-forge
libssh2                   1.10.0               haa6b8db_3    conda-forge
libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
libthrift                 0.16.0               h491838f_2    conda-forge
libtiff                   4.4.0                h55922b4_4    conda-forge
libutf8proc               2.7.0                h7f98852_0    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libwebp-base              1.2.4                h166bdaf_0    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libzlib                   1.2.13               h166bdaf_4    conda-forge
llvmlite                  0.39.1           py39h7d9a04d_0    conda-forge
locket                    1.0.0              pyhd8ed1ab_0    conda-forge
lz4                       4.0.0            py39h029007f_2    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
markupsafe                2.1.1            py39hb9d737c_2    conda-forge
matplotlib                3.6.0                    pypi_0    pypi
matplotlib-inline         0.1.6              pyhd8ed1ab_0    conda-forge
mistune                   2.0.4              pyhd8ed1ab_0    conda-forge
msgpack-python            1.0.4            py39hf939315_1    conda-forge
nbclassic                 0.4.7              pyhd8ed1ab_0    conda-forge
nbclient                  0.7.0              pyhd8ed1ab_0    conda-forge
nbconvert                 7.2.2              pyhd8ed1ab_0    conda-forge
nbconvert-core            7.2.2              pyhd8ed1ab_0    conda-forge
nbconvert-pandoc          7.2.2              pyhd8ed1ab_0    conda-forge
nbformat                  5.7.0              pyhd8ed1ab_0    conda-forge
nccl                      2.14.3.1             h0800d71_0    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
nest-asyncio              1.5.6              pyhd8ed1ab_0    conda-forge
notebook                  6.4.12             pyha770c72_0    conda-forge
notebook-shim             0.2.0              pyhd8ed1ab_0    conda-forge
numba                     0.56.3           py39h61ddf18_0    conda-forge
numpy                     1.23.4           py39h3d75532_0    conda-forge
nvtx                      0.2.3            py39h3811e60_1    conda-forge
openjpeg                  2.5.0                h7d73246_1    conda-forge
openpyxl                  3.0.10                   pypi_0    pypi
openssl                   1.1.1q               h7f8727e_0    anaconda
orc                       1.7.6                h6c59b99_0    conda-forge
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pandas                    1.5.1            py39h4661b88_0    conda-forge
pandoc                    2.19.2               h32600fe_1    conda-forge
pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
parquet-cpp               1.5.1                         2    conda-forge
parso                     0.8.3              pyhd8ed1ab_0    conda-forge
partd                     1.3.0              pyhd8ed1ab_0    conda-forge
pexpect                   4.8.0              pyh9f0ad1d_2    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pillow                    9.2.0            py39hd5dbb17_2    conda-forge
pip                       22.3               pyhd8ed1ab_0    conda-forge
pkgutil-resolve-name      1.3.10             pyhd8ed1ab_0    conda-forge
prometheus_client         0.15.0             pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.31             pyha770c72_0    conda-forge
protobuf                  3.20.1           py39h5a03fae_0    conda-forge
psutil                    5.9.3            py39hb9d737c_1    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
ptxcompiler               0.6.1            py39h1eff087_0    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
pyarrow                   9.0.0           py39hc0775d8_2_cpu    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pygments                  2.13.0             pyhd8ed1ab_0    conda-forge
pylibraft                 22.10.00        cuda11_py39_g31ae597_0    rapidsai
pynvml                    11.4.1             pyhd8ed1ab_0    conda-forge
pyopenssl                 22.1.0             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pyrsistent                0.18.1           py39hb9d737c_1    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.9.13          h9a8a25e_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-fastjsonschema     2.16.2             pyhd8ed1ab_0    conda-forge
python_abi                3.9                      2_cp39    conda-forge
pytz                      2022.5             pyhd8ed1ab_0    conda-forge
pyyaml                    6.0              py39hb9d737c_5    conda-forge
pyzmq                     24.0.1           py39headdf64_0    conda-forge
raft-dask                 22.10.00        cuda11_py39_g31ae597_0    rapidsai
re2                       2022.06.01           h27087fc_0    conda-forge
readline                  8.1.2                h0f457ee_0    conda-forge
requests                  2.28.1             pyhd8ed1ab_1    conda-forge
rmm                       22.10.00        cuda11_py39_g9d5a8c37_0    rapidsai
s2n                       1.0.10               h9b69904_0    conda-forge
scikit-learn              1.1.1            py39h6a678d5_0    anaconda
scipy                     1.9.3            py39hddc5342_0    conda-forge
seaborn                   0.12.1                   pypi_0    pypi
send2trash                1.8.0              pyhd8ed1ab_0    conda-forge
setuptools                65.5.0             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.9                hbd366e4_1    conda-forge
sniffio                   1.3.0              pyhd8ed1ab_0    conda-forge
sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
soupsieve                 2.3.2.post1        pyhd8ed1ab_0    conda-forge
spdlog                    1.8.5                h4bd325d_1    conda-forge
sqlite                    3.39.4               h4ff8645_0    conda-forge
stack_data                0.5.1              pyhd8ed1ab_0    conda-forge
tblib                     1.7.0              pyhd8ed1ab_0    conda-forge
terminado                 0.17.0             pyh41d4057_0    conda-forge
threadpoolctl             2.2.0              pyh0d69192_0    anaconda
tinycss2                  1.2.1              pyhd8ed1ab_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
toolz                     0.12.0             pyhd8ed1ab_0    conda-forge
tornado                   6.1              py39hb9d737c_3    conda-forge
traitlets                 5.5.0              pyhd8ed1ab_0    conda-forge
treelite                  3.0.0            py39hc7ff369_0    conda-forge
treelite-runtime          3.0.0                    pypi_0    pypi
typing_extensions         4.4.0              pyha770c72_0    conda-forge
tzdata                    2022e                h191b570_0    conda-forge
ucx                       1.13.1               h538f049_0    conda-forge
ucx-proc                  1.0.0                       gpu    rapidsai
ucx-py                    0.28.00         py39_g8292636_0    rapidsai
urllib3                   1.26.11            pyhd8ed1ab_0    conda-forge
wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
webencodings              0.5.1                      py_1    conda-forge
websocket-client          1.4.1              pyhd8ed1ab_0    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
zeromq                    4.3.4                h9c3ff4c_1    conda-forge
zict                      2.2.0              pyhd8ed1ab_0    conda-forge
zipp                      3.10.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               h166bdaf_4    conda-forge
zstd                      1.5.2                h6239696_4    conda-forge


dantegd commented 2 years ago

Hi @blademoon, thanks for the issue! I suspect it might be an out-of-memory error. I will try your reproducer and update you on the results as soon as I can. Thanks!

artyomboyko commented 2 years ago

Hello @dantegd. I monitored memory usage and saw no peaks, so it doesn't look like an out-of-memory error. The dataset is very small. Either way, this is the first time I've encountered this.
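
For reference, GPU memory can also be sampled from inside the notebook between iterations rather than watched by eye. This is a rough sketch, assuming `nvidia-smi` is on the PATH inside WSL2. The helper names `parse_memory_line` and `gpu_memory_mib` are hypothetical, and a short spike between two samples could still go unnoticed.

```python
import shutil
import subprocess

# Ask nvidia-smi for used and total memory as bare numbers (MiB), one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_line(line):
    """Parse one 'used, total' CSV line (values in MiB) into two ints."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def gpu_memory_mib():
    """Return (used, total) in MiB for the first GPU, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    proc = subprocess.run(QUERY, capture_output=True, text=True)
    if proc.returncode != 0 or not proc.stdout.strip():
        return None
    try:
        return parse_memory_line(proc.stdout.splitlines()[0])
    except ValueError:
        # Output was not two integers (for example "[N/A]").
        return None
```

Printing `gpu_memory_mib()` before and after each fit in the parameter loop gives a record of memory use that survives in the cell output up to the point where the kernel dies.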