Open willb opened 1 year ago
Thanks for the issue @willb, I can confirm that I can reproduce it. On a V100, with the provided size it ran fine:
(cuml0419) ➜ RAPIDS vim umap_repro.py
(cuml0419) ➜ RAPIDS python umap_repro.py
Total memory usage for `subset` is 0.008382 GB
Total memory usage for `df` is 0.1118 GB
projecting 1.0% of df...
projecting 5.0% of df...
projecting 10.0% of df...
projecting 25.0% of df...
projecting 50.0% of df...
projecting 75.0% of df...
projecting 100.0% of df...
But once I increased the size of the dataframe, I did run into the crash. We'll look into it and provide a fix as soon as possible.
Thank you!
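The log above suggests a driver loop that reports frame sizes and then projects growing fractions of the frame. A minimal sketch of that loop, shown with pandas in place of cuDF so it runs without a GPU; `mem_gb` and `project` are hypothetical stand-ins (the real script would call a fitted `cuml.UMAP`'s `.transform`):

```python
import numpy as np
import pandas as pd

def mem_gb(frame: pd.DataFrame) -> float:
    # Total memory footprint of the frame, in gigabytes
    return frame.memory_usage(deep=True).sum() / 1e9

rng = np.random.default_rng(42)
df = pd.DataFrame(rng.standard_normal((150_000, 100)))  # full frame
subset = df.sample(frac=0.075, random_state=42)         # small training sample

print(f"Total memory usage for `subset` is {mem_gb(subset):.4g} GB")
print(f"Total memory usage for `df` is {mem_gb(df):.4g} GB")

def project(frame: pd.DataFrame) -> np.ndarray:
    # Hypothetical placeholder for a fitted cuml.UMAP's .transform(frame)
    return frame.to_numpy() @ rng.standard_normal((frame.shape[1], 2))

for frac in (0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 1.0):
    print(f"projecting {frac:.1%} of df...")
    emb = project(df.head(int(len(df) * frac)))
```

With cuDF/cuML the structure is identical; the crash reported here appears only once the frame grows past roughly 1.5 million rows.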
@dantegd is this still relevant?
Describe the bug
As of RAPIDS 23.02, UMAP transformations consistently fail with CUDA errors when projecting more than 1.5 million rows or so. These transformations consistently worked in RAPIDS 21.12 and 21.10, so this is a regression.
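The regression shows up in the common fit-on-a-sample, transform-everything workflow. A runnable sketch of that pattern, using a trivial PCA-like class (`StandInUMAP`, hypothetical here) in place of `cuml.manifold.UMAP` so it can execute without a GPU:

```python
import numpy as np

class StandInUMAP:
    """Hypothetical CPU stand-in for cuml.manifold.UMAP (same fit/transform shape)."""
    def __init__(self, n_components: int = 2):
        self.n_components = n_components

    def fit(self, X: np.ndarray) -> "StandInUMAP":
        # Use the top principal directions as a cheap embedding surrogate
        self.mean_ = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        return (X - self.mean_) @ self.components_.T

rng = np.random.default_rng(0)
full = rng.standard_normal((2_000_000, 16)).astype(np.float32)  # > 1.5M rows
reducer = StandInUMAP(n_components=2).fit(full[:100_000])       # fit on a sample
embedding = reducer.transform(full)                             # transform everything
```

With `cuml.manifold.UMAP` substituted for the stand-in, it is this final `transform` over the full frame that fails with CUDA errors in 23.02 but succeeds in 21.10/21.12.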
Steps/Code to reproduce bug
Clone the umap-crash repository and cd to umap-crash. We'll then run this code with RAPIDS 21.12:
docker run --gpus all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -v ${PWD}:/rapids/notebooks/host --workdir /rapids/notebooks/host rapidsai/rapidsai:21.12-cuda11.5-runtime-ubuntu20.04 jupyter nbconvert --execute umap.ipynb --to html --allow-errors --log-level 0
This will succeed. However, if we run the same notebook under RAPIDS 23.02 or later (the command below uses the 23.04 image), it will fail with an illegal access or CURAND error:
docker run --gpus all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -v ${PWD}:/rapids/notebooks/host --workdir /rapids/notebooks/host nvcr.io/nvidia/rapidsai/rapidsai-core:23.04-cuda11.8-runtime-ubuntu22.04-py3.10 jupyter nbconvert --execute umap.ipynb --to html --allow-errors --log-level 0
Expected behavior
It should be possible to transform a small (~⅛ GB) dataset with UMAP and RAPIDS on any supported GPU.
Environment details (please complete the following information):
Additional context
This may be related to #4984.