rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[BUG] libcudart.so: cannot open shared object file: No such file or directory #5300

Open zhimin-z opened 1 year ago

zhimin-z commented 1 year ago

Describe the bug
I installed cuML and found that it throws an error when running my script.

Steps/Code to reproduce bug

import os
import pandas as pd

path_dataset = 'Dataset'
df_all = pd.read_json(os.path.join(path_dataset, 'filtered.json'))

from sentence_transformers import SentenceTransformer
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

docs = df_all['Challenge_summary'].tolist()
embeddings = embedding_model.encode(docs)

import matplotlib.pyplot as plt
import cuml

# Reduce the sentence embeddings to 2D with cuML's t-SNE
model = cuml.TSNE(n_neighbors=32)
embed2D = model.fit_transform(embeddings)
df_all['x'] = embed2D[:, 0]
df_all['y'] = embed2D[:, 1]

# Plot the embedding and save it (figsize is in inches)
fig = plt.figure(figsize=(10, 10))
plt.scatter(df_all.x, df_all.y, color='blue', s=10, label='Clusters')
fig.savefig('test.png')

Expected behavior
It runs successfully.

Environment details (please complete the following information):

Additional context
Error trace:

(.venv) 21zz42@docjk-gpu-01:~/Asset-Management-Topic-Modeling$ python "Code/best_challenge copy.py"
Traceback (most recent call last):
  File "/home/21zz42/Asset-Management-Topic-Modeling/Code/best_challenge copy.py", line 52, in <module>
    import cuml
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/__init__.py", line 17, in <module>
    from cuml.internals.base import Base, UniversalBase
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/internals/__init__.py", line 17, in <module>
    from cuml.internals.base_helpers import (
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/internals/base_helpers.py", line 20, in <module>
    from cuml.internals.api_decorators import (
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 24, in <module>
    from cuml.internals import input_utils as iu
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/internals/input_utils.py", line 19, in <module>
    from cuml.internals.array import CumlArray
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/internals/array.py", line 22, in <module>
    from cuml.internals.global_settings import GlobalSettings
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/internals/global_settings.py", line 20, in <module>
    from cuml.internals.device_type import DeviceType
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/internals/device_type.py", line 19, in <module>
    from cuml.internals.mem_type import MemoryType
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/internals/mem_type.py", line 25, in <module>
    cudf = gpu_only_import('cudf')
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/internals/safe_imports.py", line 366, in gpu_only_import
    return importlib.import_module(module)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cudf/__init__.py", line 5, in <module>
    validate_setup()
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cudf/utils/gpu_utils.py", line 95, in validate_setup
    cuda_runtime_version = runtimeGetVersion()
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/rmm/_cuda/gpu.py", line 87, in runtimeGetVersion
    major, minor = numba.cuda.runtime.get_version()
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/numba/cuda/cudadrv/runtime.py", line 111, in get_version
    self.cudaRuntimeGetVersion(ctypes.byref(rtver))
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/numba/cuda/cudadrv/runtime.py", line 65, in __getattr__
    self._initialize()
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/numba/cuda/cudadrv/runtime.py", line 51, in _initialize
    self.lib = open_cudalib('cudart')
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/numba/cuda/cudadrv/libs.py", line 60, in open_cudalib
    return ctypes.CDLL(path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcudart.so: cannot open shared object file: No such file or directory

The fix in https://stackoverflow.com/questions/69934320/oserror-libcudart-so-10-2-cannot-open-shared-object-file-no-such-file-or-dire does not work for me, since I can run PyTorch successfully.

dantegd commented 1 year ago

Hi @zhimin-z thanks for the issue! Currently the RAPIDS pip packages only support CUDA 11.x, so that is very likely the issue you're facing.
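
One quick way to see which CUDA runtime, if any, your environment can actually load is a check along these lines (a minimal diagnostic sketch, not part of cuML; the soname list is an assumption covering CUDA 11.x and 12.x):

import ctypes

# Try the sonames that the CUDA 11.x and 12.x wheels would ultimately need.
for name in ("libcudart.so", "libcudart.so.11.0", "libcudart.so.12"):
    try:
        ctypes.CDLL(name)
        print(f"loaded {name}")
    except OSError as exc:
        print(f"could not load {name}: {exc}")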

zhimin-z commented 1 year ago

Hi @zhimin-z thanks for the issue! Currently the RAPIDS pip packages only support CUDA 11.x, so that is very likely the issue you're facing.

What can I do now? I found I do not have permission to downgrade the CUDA driver, since I am not the owner of the server.

noahberhe commented 1 year ago

I also have a similar issue, but running nvidia-smi shows my environment has CUDA 11.7 (nvidia-smi screenshot attached).

The issue is that after installing:

!pip install cugraph-cu11 cudf-cu11 cuml-cu11 --extra-index-url=https://pypi.nvidia.com
!pip uninstall cupy-cuda115 -y
!pip uninstall cupy-cuda11x -y
!pip install cupy-cuda11x -f https://pip.cupy.dev/aarch64

I try to import: from cuml.cluster import HDBSCAN

But get: OSError: libcudart.so: cannot open shared object file: No such file or directory

mfschmidt commented 1 year ago

Just adding another data point, and a thank-you to the developers for their work on this. Currently, the installation guide (https://docs.rapids.ai/install#pip) claims pip support for CUDA 12, and I am running CUDA 12.0. My cuML installation with pip (pip install cudf-cu12 cuml-cu12 --extra-index-url=https://pypi.nvidia.com) was successful, but I get the same libcudart.so error when I try to train a model.

mike@henry:~$ nvidia-smi
Thu Jul 27 17:42:47 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A300...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   56C    P8    17W / 115W |    865MiB /  6144MiB |     25%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2044      G   /usr/lib/xorg/Xorg                362MiB |
|    0   N/A  N/A      2527      G   /usr/bin/gnome-shell              142MiB |
|    0   N/A  N/A      3481      G   ...veSuggestionsOnlyOnDemand       82MiB |
|    0   N/A  N/A      8067      G   ...8/usr/lib/firefox/firefox      183MiB |
|    0   N/A  N/A     37940      G   ...RendererForSitePerProcess       35MiB |
+-----------------------------------------------------------------------------+

brendanartley commented 1 year ago

Can confirm. Pip installation is successful with CUDA Version 12.0, but when running import cudf I get the following error as well.

OSError: libcudart.so: cannot open shared object file: No such file or directory

bdice commented 1 year ago

@mfschmidt @brendanartley Can you share more about your OS and version (e.g. Ubuntu 20.04, whether you're using containers or WSL), how you installed the CUDA Toolkit, and the outputs of ls -al /usr/local/cuda*?

mfschmidt commented 1 year ago

@bdice Thanks for your response and interest; sorry I'm slow getting back to this. I'm running Ubuntu 22.04.3 on a Dell Precision workstation with an NVIDIA RTX A3000 GPU and NVIDIA driver version 525.125.06. I'm using a Python virtual environment, but no Docker or WSL.

I had no /usr/local/cuda* paths and had not installed the CUDA Toolkit. After installing the CUDA Toolkit this morning, I imported cuml from within Python and the error no longer occurs.

I think it may have been unclear to me (rapidly and mindlessly copy/pasting commands rather than actually reading instructions) that the CUDA Toolkit was required in addition to nvidia drivers. I assumed the nvidia drivers were sufficient.

Thank you for your help!! I believe my issue is now resolved by installing CUDA Toolkit, and I'll post back to this thread if I discover additional related problems.

mfschmidt commented 1 year ago

If possible, it would be ideal if the pip installer could install CUDA Toolkit as a dependency. If that's not possible, an informative warning or error that it's missing and must be installed separately would be very useful.

Thank you again for your help, and for making the world better with open source software!! :)

divyegala commented 1 year ago

Hi @mfschmidt

If possible, it would be ideal if the pip installer could install CUDA Toolkit as a dependency. If that's not possible, an informative warning or error that it's missing and must be installed separately would be very useful.

We do statically link libcudart in the RAPIDS wheels; however, some dependencies like numba and cupy link to libcudart dynamically, and the error stack trace shows that they are the ones unable to find it. We'll need to consider whether we should add this as a warning or whether our upstream libraries should. Thanks for your suggestion.
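
For illustration only, the kind of check that could surface a friendlier message before numba/cupy fail deep inside dlopen might look like this (a hypothetical sketch, not cuML's actual code; the warning text and soname list are made up):

import ctypes
import warnings

# Hypothetical import-time check: warn early if no CUDA runtime is resolvable.
def warn_if_libcudart_missing(sonames=("libcudart.so.12", "libcudart.so.11.0", "libcudart.so")):
    for name in sonames:
        try:
            ctypes.CDLL(name)
            return  # a CUDA runtime is visible, nothing to do
        except OSError:
            continue
    warnings.warn(
        "No libcudart could be loaded. The RAPIDS wheels link libcudart statically, "
        "but numba and cupy need a CUDA Toolkit (or a CUDA runtime on LD_LIBRARY_PATH) "
        "whose major version matches the installed -cuXX packages."
    )

warn_if_libcudart_missing()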

mdsatria commented 1 year ago

I also face the same error with CUDA 11.4 (RTX 3090)

I try to import: from cuml.manifold import UMAP

And get this error: OSError: libcudart.so: cannot open shared object file: No such file or directory

[Edited] Solved this issue by installing via conda:

conda create -n rapids -c rapidsai -c conda-forge -c nvidia \
    rapids=23.08 python=3.9 cuda-version=11.8

divyegala commented 1 year ago

@mdsatria it looks to me like you don't have the CUDA Toolkit installed on your system, which is a requirement for the cuML wheels.

MariyaSha commented 10 months ago

I had a very similar issue, where the problem was mismatched versions of the CUDA driver and the CUDA Toolkit.

You can check the CUDA version supported by your driver with: nvidia-smi

You can check your CUDA Toolkit version with: nvcc --version

If you don't have the CUDA Toolkit installed, I find that the easiest way to install it is with Anaconda: conda install -c nvidia cuda-nvcc
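
If you'd rather check both from Python, something along these lines works too (a rough sketch using the raw CUDA APIs through ctypes; it assumes libcuda and libcudart can be found, and the CDLL call failing is exactly the error from this thread):

import ctypes

# Both calls report versions encoded as 1000*major + 10*minor (e.g. 12000 = 12.0).
ver = ctypes.c_int(0)

libcuda = ctypes.CDLL("libcuda.so.1")      # ships with the NVIDIA driver
libcuda.cuDriverGetVersion(ctypes.byref(ver))
print(f"driver supports CUDA {ver.value // 1000}.{(ver.value % 1000) // 10}")

libcudart = ctypes.CDLL("libcudart.so")    # ships with the CUDA Toolkit
libcudart.cudaRuntimeGetVersion(ctypes.byref(ver))
print(f"runtime / toolkit is CUDA {ver.value // 1000}.{(ver.value % 1000) // 10}")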

I hope it helps! :)

Borda commented 10 months ago

The same is also happening on Google Colab with a V100:

Wed Nov  8 07:01:04 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    24W / 300W |      2MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Installed as suggested in the docs:

pip install \
    --extra-index-url=https://pypi.nvidia.com \
    cudf-cu12 dask-cudf-cu12 cuml-cu12 cugraph-cu12 cuspatial-cu12 cuproj-cu12 cuxfilter-cu12 cucim

failing with:

/content# python
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cudf
/usr/local/lib/python3.10/dist-packages/cupy/_environment.py:447: UserWarning: 
--------------------------------------------------------------------------------

  CuPy may not function correctly because multiple CuPy packages are installed
  in your environment:

    cupy-cuda11x, cupy-cuda12x

  Follow these steps to resolve this issue:

    1. For all packages listed above, run the following command to remove all
       existing CuPy installations:

         $ pip uninstall <package_name>

      If you previously installed CuPy via conda, also run the following:

         $ conda uninstall cupy

    2. Install the appropriate CuPy package.
       Refer to the Installation Guide for detailed instructions.

         https://docs.cupy.dev/en/stable/install.html

--------------------------------------------------------------------------------

  warnings.warn(f'''
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/cupy/__init__.py", line 17, in <module>
    from cupy import _core  # NOQA
  File "/usr/local/lib/python3.10/dist-packages/cupy/_core/__init__.py", line 3, in <module>
    from cupy._core import core  # NOQA
ImportError: libcudart.so.12: cannot open shared object file: No such file or directory

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/cudf/__init__.py", line 12, in <module>
    import cupy
  File "/usr/local/lib/python3.10/dist-packages/cupy/__init__.py", line 19, in <module>
    raise ImportError(f'''
ImportError: 
================================================================
Failed to import CuPy.

If you installed CuPy via wheels (cupy-cudaXXX or cupy-rocm-X-X), make sure that the package matches with the version of CUDA or ROCm installed.

On Linux, you may need to set LD_LIBRARY_PATH environment variable depending on how you installed CUDA/ROCm.
On Windows, try setting CUDA_PATH environment variable.

Check the Installation Guide for details:
  https://docs.cupy.dev/en/latest/install.html

Original error:
  ImportError: libcudart.so.12: cannot open shared object file: No such file or directory
================================================================

Borda commented 10 months ago

Is this a tracked issue? @dantegd

divyegala commented 10 months ago

@Borda for installations via the pip package manager, you need the CUDA Toolkit installed at the system level. This is because pip-managed cupy dynamically links to the system-level libcudart.
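
As a quick sanity check of what the loader would resolve, something like this can help (a minimal sketch; ctypes.util.find_library roughly mirrors what ldconfig knows about):

import ctypes.util

# None here means the dynamic loader cannot see that library at all,
# which is what pip-installed numba/cupy run into in this thread.
for lib in ("cudart", "cuda"):
    print(lib, "->", ctypes.util.find_library(lib))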

Also, it seems like your environment has multiple cupy installations.

Borda commented 10 months ago

for installations via the pip package manager, you need the CUDA Toolkit installed at the system level. This is because pip-managed cupy dynamically links to the system-level libcudart.

Interesting, so you're saying I need to install this: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#upgrading-from-cudatoolkit-package

Also, it seems like your environment has multiple cupy installations.

Yes, but it came with your installation command; it was not there before.

beckernick commented 10 months ago

@Borda could you share the output of !nvcc --version?

The nvidia-smi output indicates that your CUDA Driver version supports CUDA 12.0, but your CUDA runtime may be 11.x. At least some of Colab's GPU runtimes are using CUDA Toolkit 11.8, in which case when you start from a fresh runtime you should install the cu11 packages.

The rapids.ai quick start has a Colab launcher that includes a script that should hopefully get you up and running!
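
For reference, the core of what such a script has to do is detect the toolkit's major version and pick the matching wheel suffix; a simplified sketch of that idea (not the actual launcher script):

import re
import subprocess

# Parse the toolkit major version out of `nvcc --version` and map it to a wheel suffix.
try:
    out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
except FileNotFoundError:
    out = ""
match = re.search(r"release (\d+)\.", out)
major = int(match.group(1)) if match else None

if major in (11, 12):
    print(f"pip install cudf-cu{major} cuml-cu{major} --extra-index-url=https://pypi.nvidia.com")
else:
    print("could not detect a supported CUDA Toolkit (is nvcc on PATH?)")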

Borda commented 10 months ago

could you share the output of !nvcc --version?


/content# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

bdice commented 10 months ago

@Borda Google Colab uses CUDA 11, but your installation command above uses CUDA 12. That is what is causing the failure to find the linked libcudart.so. If using pip packages, you must match the CUDA major versions by replacing cu12 with cu11 in the package names like this:

pip install \
    --extra-index-url=https://pypi.nvidia.com/ \
    cudf-cu11 dask-cudf-cu11 cuml-cu11 cugraph-cu11 cuspatial-cu11 cuproj-cu11 cuxfilter-cu11 cucim

edit: Sorry, I scrolled too fast and missed that @beckernick already gave this answer above. Apologies for the noise.

jwnz commented 4 months ago

I had a similar problem on Ubuntu, but it had to do with the naming of the .so file. I just made a copy of the .so and renamed it to match what the library is looking for, and voilà, everything works.

  1. Find the .so file's location:

    find / -name libcudart.so.12

  2. cd into the folder containing the libcudart.so.12 file and make a copy, leaving out the .12:

    cd .../anaconda3/envs/envname/lib/python3.11/site-packages/nvidia/cuda_runtime/lib/
    cp libcudart.so.12 libcudart.so

  3. You might have to add the folders to the path too. I had to do it for every single library :face_with_spiral_eyes: (see the sketch after these steps):

    export PATH=.../anaconda3/envs/envname/lib/python3.11/site-packages/nvidia/cublas/lib/${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=.../anaconda3/envs/envname/lib/python3.11/site-packages/nvidia/cublas/lib/${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
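
If it helps anyone, the per-library exports in step 3 can also be generated in one go (a convenience sketch; it only prints an export line for whatever nvidia/*/lib directories the active environment happens to contain):

import glob
import os
import sysconfig

# Collect the lib/ directories of the pip-installed NVIDIA runtime wheels
# (cuda_runtime, cublas, ...) in the active environment.
site_packages = sysconfig.get_paths()["purelib"]
lib_dirs = sorted(glob.glob(os.path.join(site_packages, "nvidia", "*", "lib")))

if lib_dirs:
    print("export LD_LIBRARY_PATH=" + ":".join(lib_dirs) + "${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}")
else:
    print("no pip-installed NVIDIA runtime libraries found under", site_packages)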

...