Open zhimin-z opened 1 year ago
Hi @zhimin-z thanks for the issue! Currently the RAPIDS pip packages only support CUDA 11.x, so that is very likely the issue you're facing.
What can I do now? I found that I do not have permission to downgrade the CUDA driver, since I am not the owner of the server.
I also have a similar issue, but running nvidia-smi shows my environment has CUDA 11.7.
Issue is, after installing:
!pip install cugraph-cu11 cudf-cu11 cuml-cu11 --extra-index-url=https://pypi.nvidia.com
!pip uninstall cupy-cuda115 -y
!pip uninstall cupy-cuda11x -y
!pip install cupy-cuda11x -f https://pip.cupy.dev/aarch64
I try to import:
from cuml.cluster import HDBSCAN
But get:
OSError: libcudart.so: cannot open shared object file: No such file or directory
Just adding another data point, and posting a thanks to developers for their work on this. Currently, the installation guide (https://docs.rapids.ai/install#pip) claims support for CUDA 12 with pip. I am running CUDA 12.0. My cuml installation was successful with pip (pip install cudf-cu12 cuml-cu12 --extra-index-url=https://pypi.nvidia.com). But I get the same libcudart.so error when I try to train a model.
mike@henry:~$ nvidia-smi
Thu Jul 27 17:42:47 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06 Driver Version: 525.125.06 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX A300... Off | 00000000:01:00.0 On | N/A |
| N/A 56C P8 17W / 115W | 865MiB / 6144MiB | 25% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2044 G /usr/lib/xorg/Xorg 362MiB |
| 0 N/A N/A 2527 G /usr/bin/gnome-shell 142MiB |
| 0 N/A N/A 3481 G ...veSuggestionsOnlyOnDemand 82MiB |
| 0 N/A N/A 8067 G ...8/usr/lib/firefox/firefox 183MiB |
| 0 N/A N/A 37940 G ...RendererForSitePerProcess 35MiB |
+-----------------------------------------------------------------------------+
Can confirm. Pip installation is successful with CUDA Version: 12.0, but when running import cudf I get the following error as well.
OSError: libcudart.so: cannot open shared object file: No such file or directory
@mfschmidt @brendanartley Can you share more about your OS and version (e.g. Ubuntu 20.04, whether you're using containers or WSL), how you installed the CUDA Toolkit, and the output of ls -al /usr/local/cuda*?
@bdice Thanks for your response and interest; sorry I'm slow getting back to this. I'm running Ubuntu 22.04.3 on a Dell Precision Workstation with an nVidia RTX A3000 GPU and nVidia drivers version 525.125.06. I'm using a python virtual environment, but no docker or WSL.
I had no /usr/local/cuda* paths and I had not installed CUDA Toolkit. After installing the CUDA Toolkit this morning, I imported cuml from within python and the error does not occur.
I think it may have been unclear to me (rapidly and mindlessly copy/pasting commands rather than actually reading instructions) that the CUDA Toolkit was required in addition to nvidia drivers. I assumed the nvidia drivers were sufficient.
Thank you for your help!! I believe my issue is now resolved by installing CUDA Toolkit, and I'll post back to this thread if I discover additional related problems.
If possible, it would be ideal if the pip installer could install CUDA Toolkit as a dependency. If that's not possible, an informative warning or error that it's missing and must be installed separately would be very useful.
Thank you again for your help, and for making the world better with open source software!! :)
Hi @mfschmidt
If possible, it would be ideal if the pip installer could install CUDA Toolkit as a dependency. If that's not possible, an informative warning or error that it's missing and must be installed separately would be very useful.
We do statically link libcudart in RAPIDS wheels; however, some dependencies like numba/cupy link to libcudart dynamically, and the error stack trace shows that they are the ones unable to find libcudart. We'll need to consider whether we should add this as a warning or our upstream libraries should - thanks for your suggestion.
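To see whether the dynamic loader can find libcudart from the same Python environment, a minimal sketch like the following may help. It uses only the standard library; the helper name can_load and the candidate library names are assumptions for illustration, not part of RAPIDS, numba, or cupy.

```python
import ctypes


def can_load(name):
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Candidate names are illustrative; adjust them to the CUDA major version in use.
    for candidate in ("libcudart.so", "libcudart.so.11.0", "libcudart.so.12"):
        print(candidate, "->", "found" if can_load(candidate) else "not found")
```

If every candidate prints "not found", the loader is probably hitting the same problem that numba/cupy report in the stack trace.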
I also face the same error with CUDA 11.4 (RTX 3090)
I try to import:
from cuml.manifold import UMAP
And get this error:
OSError: libcudart.so: cannot open shared object file: No such file or directory
[Edited]
Solved this issue by installing via conda
conda create -n rapids -c rapidsai -c conda-forge -c nvidia rapids=23.08 python=3.9 cuda-version=11.8
@mdsatria it looks to me like you don't have CUDA toolkit installed on your system, which is a requirement for cuML wheels
I had a very similar issue, where the problem was mismatched versions of CUDA and the CUDA toolkit.
You can check the highest CUDA version your driver supports with:
nvidia-smi
You can check your version of CUDA toolkit with:
nvcc --version
If you don't have CUDA toolkit installed, I find that the easiest way to install it is with Anaconda:
conda install -c nvidia cuda-nvcc
I hope it helps! :)
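The two checks above can also be scripted. The sketch below pulls the "CUDA Version" field out of the nvidia-smi header and the "release" field out of nvcc --version so the two can be compared side by side. The function names are made up for illustration, and the regular expressions assume the output formats shown in this thread.

```python
import re
import shutil
import subprocess


def driver_cuda_version(nvidia_smi_output):
    """Extract (major, minor) from the 'CUDA Version: X.Y' header field, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_cuda_version(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' text of `nvcc --version`, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def run(cmd):
    """Run a command and return its stdout, or None if the tool is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True).stdout


if __name__ == "__main__":
    smi = run(["nvidia-smi"])
    nvcc = run(["nvcc", "--version"])
    print("driver supports up to CUDA:", driver_cuda_version(smi) if smi else "nvidia-smi not found")
    print("toolkit (nvcc) release:", toolkit_cuda_version(nvcc) if nvcc else "nvcc not found")
```

A missing nvcc is itself a useful hint: it may mean that no CUDA toolkit is installed, only the driver.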
The same is happening also on Google Colab with V100
Wed Nov 8 07:01:04 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:00:04.0 Off | 0 |
| N/A 34C P0 24W / 300W | 2MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
installed as suggested in the docs
pip install \
--extra-index-url=https://pypi.nvidia.com \
cudf-cu12 dask-cudf-cu12 cuml-cu12 cugraph-cu12 cuspatial-cu12 cuproj-cu12 cuxfilter-cu12 cucim
failing with:
/content# python
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cudf
/usr/local/lib/python3.10/dist-packages/cupy/_environment.py:447: UserWarning:
--------------------------------------------------------------------------------
CuPy may not function correctly because multiple CuPy packages are installed
in your environment:
cupy-cuda11x, cupy-cuda12x
Follow these steps to resolve this issue:
1. For all packages listed above, run the following command to remove all
existing CuPy installations:
$ pip uninstall <package_name>
If you previously installed CuPy via conda, also run the following:
$ conda uninstall cupy
2. Install the appropriate CuPy package.
Refer to the Installation Guide for detailed instructions.
https://docs.cupy.dev/en/stable/install.html
--------------------------------------------------------------------------------
warnings.warn(f'''
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/cupy/__init__.py", line 17, in <module>
from cupy import _core # NOQA
File "/usr/local/lib/python3.10/dist-packages/cupy/_core/__init__.py", line 3, in <module>
from cupy._core import core # NOQA
ImportError: libcudart.so.12: cannot open shared object file: No such file or directory
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.10/dist-packages/cudf/__init__.py", line 12, in <module>
import cupy
File "/usr/local/lib/python3.10/dist-packages/cupy/__init__.py", line 19, in <module>
raise ImportError(f'''
ImportError:
================================================================
Failed to import CuPy.
If you installed CuPy via wheels (cupy-cudaXXX or cupy-rocm-X-X), make sure that the package matches with the version of CUDA or ROCm installed.
On Linux, you may need to set LD_LIBRARY_PATH environment variable depending on how you installed CUDA/ROCm.
On Windows, try setting CUDA_PATH environment variable.
Check the Installation Guide for details:
https://docs.cupy.dev/en/latest/install.html
Original error:
ImportError: libcudart.so.12: cannot open shared object file: No such file or directory
================================================================
Is this a tracked issue? @dantegd
@Borda for installations via the pip package manager, you need cudatoolkit installed at the system level. This is because pip managed cupy dynamically links to system level libcudart.
Also, it seems like your environment has multiple cupy installations.
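To list which cupy packages are installed in the current environment, something like this sketch should work. It relies on the standard-library importlib.metadata; the function name cupy_distributions is hypothetical, and the prefix match on "cupy" is an assumption based on the package names in the warning above.

```python
from importlib import metadata


def cupy_distributions(names=None):
    """Return sorted, lower-cased names of installed distributions starting with 'cupy'."""
    if names is None:
        # Collect the declared name of every installed distribution.
        names = [dist.metadata.get("Name") for dist in metadata.distributions()]
    return sorted({n.lower() for n in names if n and n.lower().startswith("cupy")})


if __name__ == "__main__":
    found = cupy_distributions()
    print("cupy packages installed:", found or "none")
    if len(found) > 1:
        print("More than one cupy package is installed; uninstall all but the matching one.")
```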
for installations via the pip package manager, you need cudatoolkit installed at the system level. This is because pip managed cupy dynamically links to system level libcudart.
interesting so you say I need to install: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#upgrading-from-cudatoolkit-package
Also, it seems like your environment has multiple cupy installations.
yes but it came with your installation cmd, it was not there before
@Borda could you share the output of !nvcc --version?
The nvidia-smi output indicates that your CUDA Driver version supports CUDA 12.0, but your CUDA runtime may be 11.x. At least some of Colab's GPU runtimes are using CUDA Toolkit 11.8, in which case when you start from a fresh runtime you should install the cu11 packages.
The rapids.ai quick start has a Colab launcher that includes a script that should hopefully get you up and running!
could you share the output of !nvcc --version?
/content# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
@Borda Google Colab uses CUDA 11, but your installation command above uses CUDA 12. That is what is causing the failure to find the linked libcudart.so. If using pip packages, you must match the CUDA major versions by replacing cu12 with cu11 in the package names like this:
pip install \
--extra-index-url=https://pypi.nvidia.com/ \
cudf-cu11 dask-cudf-cu11 cuml-cu11 cugraph-cu11 cuspatial-cu11 cuproj-cu11 cuxfilter-cu11 cucim
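The same suffix swap can be done programmatically. This is only a sketch: retarget is a hypothetical helper, and it assumes that package names carry the CUDA major version as a trailing -cuNN suffix, as in the commands in this thread.

```python
import re


def retarget(packages, cuda_major):
    """Swap a trailing -cuNN suffix for -cu<cuda_major>; other names are left unchanged."""
    return [re.sub(r"-cu\d+$", f"-cu{cuda_major}", name) for name in packages]


if __name__ == "__main__":
    wanted = ["cudf-cu12", "dask-cudf-cu12", "cuml-cu12", "cucim"]
    print(" ".join(retarget(wanted, 11)))
```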
edit: Sorry, I scrolled too fast and missed that @beckernick already gave this answer above. Apologies for the noise.
I had a similar problem on Ubuntu, but it had to do with the naming of the .so file. I just made a copy of the .so and changed its name to match the one the library is looking for, and voila! Everything works.
Find the .so file's location:
find / -name libcudart.so.12
cd into the folder containing the libcudart.so.12 file and make a copy, leaving out the .12.
cd .../anaconda3/envs/envname/lib/python3.11/site-packages/nvidia/cuda_runtime/lib/
cp libcudart.so.12 libcudart.so
you might have to add the folders to the path too. I had to do it for every single library :face_with_spiral_eyes:
export PATH=.../anaconda3/envs/envname/lib/python3.11/site-packages/nvidia/cublas/lib/${PATH:+:${PATH}}
export LD_LIBRARY_PATH=.../anaconda3/envs/envname/lib/python3.11/site-packages/nvidia/cublas/lib/${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
...
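Rather than typing one export per library, the folders can be collected in a single pass. The sketch below assumes the pip-installed NVIDIA libraries live under site-packages/nvidia/<name>/lib, as in the paths above. The helper name nvidia_lib_dirs is made up, and the script only prints an export line for you to run in the shell before starting Python.

```python
import os
import sysconfig


def nvidia_lib_dirs(site_packages):
    """List the existing `<site_packages>/nvidia/*/lib` directories, sorted by name."""
    root = os.path.join(site_packages, "nvidia")
    if not os.path.isdir(root):
        return []
    dirs = []
    for entry in sorted(os.listdir(root)):
        lib = os.path.join(root, entry, "lib")
        if os.path.isdir(lib):
            dirs.append(lib)
    return dirs


if __name__ == "__main__":
    # "purelib" is the site-packages directory of the running interpreter.
    dirs = nvidia_lib_dirs(sysconfig.get_paths()["purelib"])
    if dirs:
        joined = ":".join(dirs)
        print('export LD_LIBRARY_PATH="' + joined + '${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"')
    else:
        print("no nvidia/*/lib directories found in this environment")
```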
Describe the bug I installed cuml and found that it throws an error when running:
Steps/Code to reproduce bug
Expected behavior It runs successfully.
Environment details (please complete the following information):
Bare-metal
Linux docjk-gpu-01 5.15.0-67-generic #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
A100 and 525.85.12
12.0
according to https://docs.rapids.ai/install#pip
Additional context Error trace:
https://stackoverflow.com/questions/69934320/oserror-libcudart-so-10-2-cannot-open-shared-object-file-no-such-file-or-dire does not work for me, since I can run PyTorch successfully.