rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

OSError: libcudart.so: cannot open shared object file: No such file or directory (Nvidia Studio Driver 561.09) #16961

Closed artyomboyko closed 1 month ago

artyomboyko commented 1 month ago

Describe the bug

Trying to use RAPIDS on WSL2 fails with the following error:

/home/artyom/.local/lib/python3.10/site-packages/cudf/utils/_ptxcompiler.py:64: UserWarning: Error getting driver and runtime versions:

stdout:

stderr:

Traceback (most recent call last):
  File "<string>", line 7, in <module>
  File "/home/artyom/.local/lib/python3.10/site-packages/numba/cuda/cudadrv/runtime.py", line 111, in get_version
    self.cudaRuntimeGetVersion(ctypes.byref(rtver))
  File "/home/artyom/.local/lib/python3.10/site-packages/numba/cuda/cudadrv/runtime.py", line 65, in __getattr__
    self._initialize()
  File "/home/artyom/.local/lib/python3.10/site-packages/numba/cuda/cudadrv/runtime.py", line 51, in _initialize
    self.lib = open_cudalib('cudart')
  File "/home/artyom/.local/lib/python3.10/site-packages/numba/cuda/cudadrv/libs.py", line 65, in open_cudalib
    return ctypes.CDLL(path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcudart.so: cannot open shared object file: No such file or directory

Not patching Numba
  warnings.warn(msg, UserWarning)
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[1], line 1
----> 1 import cudf
      2 print(cudf.Series([1, 2, 3]))

File ~/.local/lib/python3.10/site-packages/cudf/__init__.py:10
      7 from cudf.utils.gpu_utils import validate_setup
      9 _setup_numba()
---> 10 validate_setup()
     12 import cupy
     13 from numba import config as numba_config, cuda

File ~/.local/lib/python3.10/site-packages/cudf/utils/gpu_utils.py:96, in validate_setup()
     86     minor_version = getDeviceAttribute(
     87         cudaDeviceAttr.cudaDevAttrComputeCapabilityMinor, 0
     88     )
     89     raise UnsupportedCUDAError(
     90         "A GPU with NVIDIA Volta™ (Compute Capability 7.0) "
      91         "or newer architecture is required.\n"
     92         f"Detected GPU 0: {device_name}\n"
     93         f"Detected Compute Capability: {major_version}.{minor_version}"
     94     )
---> 96 cuda_runtime_version = runtimeGetVersion()
     98 if cuda_runtime_version < 11000:
     99     # Require CUDA Runtime version 11.0 or greater.
    100     major_version = cuda_runtime_version // 1000

File ~/.local/lib/python3.10/site-packages/rmm/_cuda/gpu.py:88, in runtimeGetVersion()
     84 # TODO: Replace this with `cuda.cudart.cudaRuntimeGetVersion()` when the
     85 # limitation is fixed.
     86 import numba.cuda
---> 88 major, minor = numba.cuda.runtime.get_version()
     89 return major * 1000 + minor * 10

File ~/.local/lib/python3.10/site-packages/numba/cuda/cudadrv/runtime.py:111, in Runtime.get_version(self)
    107 """
    108 Returns the CUDA Runtime version as a tuple (major, minor).
    109 """
    110 rtver = ctypes.c_int()
--> 111 self.cudaRuntimeGetVersion(ctypes.byref(rtver))
    112 # The version is encoded as (1000 * major) + (10 * minor)
    113 major = rtver.value // 1000

File ~/.local/lib/python3.10/site-packages/numba/cuda/cudadrv/runtime.py:65, in Runtime.__getattr__(self, fname)
     62 argtypes = proto[1:]
     64 if not self.is_initialized:
---> 65     self._initialize()
     67 # Find function in runtime library
     68 libfn = self._find_api(fname)

File ~/.local/lib/python3.10/site-packages/numba/cuda/cudadrv/runtime.py:51, in Runtime._initialize(self)
     47     msg = ("CUDA is disabled due to setting NUMBA_DISABLE_CUDA=1 "
     48            "in the environment, or because CUDA is unsupported on "
     49            "32-bit systems.")
     50     raise CudaSupportError(msg)
---> 51 self.lib = open_cudalib('cudart')
     53 self.is_initialized = True

File ~/.local/lib/python3.10/site-packages/numba/cuda/cudadrv/libs.py:65, in open_cudalib(lib)
     63 def open_cudalib(lib):
     64     path = get_cudalib(lib)
---> 65     return ctypes.CDLL(path)

File /usr/lib/python3.10/ctypes/__init__.py:374, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    371 self._FuncPtr = _FuncPtr
    373 if handle is None:
--> 374     self._handle = _dlopen(self._name, mode)
    375 else:
    376     self._handle = handle

OSError: libcudart.so: cannot open shared object file: No such file or directory
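
The bottom of the traceback is ctypes failing to dlopen libcudart.so. A minimal way to reproduce just that step in isolation (a sketch using only the Python standard library) is:

    import ctypes
    import ctypes.util

    # Ask the dynamic loader whether it can locate the CUDA runtime at all.
    path = ctypes.util.find_library("cudart")
    print("cudart resolved to:", path)

    try:
        # This mirrors numba's open_cudalib('cudart') call from the traceback.
        ctypes.CDLL(path or "libcudart.so")
        print("libcudart loaded successfully")
    except OSError as exc:
        # Same OSError as above when the CUDA Toolkit is not installed or its
        # lib64 directory is not on the loader path.
        print("failed to load libcudart:", exc)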

Steps/Code to reproduce bug

  1. Install a clean WSL2 Ubuntu 22.04.

  2. $ sudo apt-get update && sudo apt-get install -y python3-pip

  3. $ pip install jupyterlab jupyterlab-git ipywidgets

  4. Install RAPIDS in WSL2:

    pip install \
    --extra-index-url=https://pypi.nvidia.com \
    cudf-cu12==24.8.* dask-cudf-cu12==24.8.* cuml-cu12==24.8.* \
    cugraph-cu12==24.8.* cuspatial-cu12==24.8.* cuproj-cu12==24.8.* \
    cuxfilter-cu12==24.8.* cucim-cu12==24.8.* pylibraft-cu12==24.8.* \
    raft-dask-cu12==24.8.* cuvs-cu12==24.8.* nx-cugraph-cu12==24.8.*

  5. Launch JupyterLab and create a notebook:

    jupyter-lab --NotebookApp.iopub_data_rate_limit=1e10

  6. Execute a cell with the following code:

    import cudf
    print(cudf.Series([1, 2, 3]))

    Image

Environment overview (please complete the following information)

Image

Environment details

Additional context

artyomboyko commented 1 month ago

After downgrading the driver on Windows 11, the problem still reproduces:

Image

bdice commented 1 month ago

Did you install the CUDA Toolkit? See our WSL2 instructions here, specifically step 4: https://docs.rapids.ai/install/#wsl2-pip

bdice commented 1 month ago

If you use conda to install rather than pip, it will come with a CUDA Toolkit and you don't need to install it yourself.
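
Either way, once a CUDA Toolkit is present you can verify that the runtime library is found using the same call that fails in the traceback above (a quick check, assuming numba is installed):

    from numba import cuda

    # This is the call that raised the OSError above; with libcudart.so on the
    # loader path it returns the runtime version as a tuple, e.g. (12, 6).
    print(cuda.runtime.get_version())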

artyomboyko commented 1 month ago

@bdice OK, I will test that.

Can you please tell me why a Docker container (VS Code dev container) returns the exception too?

Image

Devcontainer project TEST.zip

artyomboyko commented 1 month ago

@bdice The problem with WSL2 is solved. I installed the latest driver again (NVIDIA Studio 561.09) and installed the latest CUDA Toolkit (12.6 Update 1) in WSL2 on Ubuntu 22.04.

Only one question remains: how to fix the dev container in VS Code. Or should I close this bug and open a new one? If so, where would be more appropriate to open it?
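
For reference, with the toolkit installed the original repro from this issue now runs without the OSError:

    import cudf

    # libcudart.so is found, so the import-time validate_setup() check passes.
    print(cudf.Series([1, 2, 3]))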

bdice commented 1 month ago

I'm not sure how to diagnose the error shown at the bottom of your screenshot. It looks like you are using Docker. Are you using Docker Desktop? Did you follow the RAPIDS Docker instructions to install the NVIDIA Container Toolkit?

Please try to compile and run a sample CUDA program, or try another GPU library like CuPy, so we can see what might be going wrong.

A sample CuPy program:

import cupy as cp
print(cp.array([[1, 2, 3], [4, 5, 6]]))
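
If CuPy imports fine and you want to confirm which CUDA runtime it sees, a slightly extended variant (assuming CuPy is installed) is:

    import cupy as cp

    # Encoded as 1000 * major + 10 * minor, e.g. 12060 for CUDA 12.6.
    print(cp.cuda.runtime.runtimeGetVersion())
    print(cp.array([[1, 2, 3], [4, 5, 6]]).sum())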

artyomboyko commented 1 month ago

@bdice

Yes. I have Windows 11 Pro + WSL2 + Docker 4.34.2 (167172). I installed VS Code and tried to use a dev container with a custom Dockerfile.

As I understand it, Docker on Windows runs on top of WSL2 (Ubuntu 22.04). I installed the NVIDIA toolkit in WSL2, and WSL2 itself works as expected.

Yesterday I tried installing the NVIDIA toolkit in Windows and then rebuilt the dev container, but that didn't help.

Everything is pretty much done as described in the video.

My test "project" - TEST.zip

When I went to run the notebook to capture the error... surprise! It all worked.

Bottom line, what I did:

  1. After yesterday's experiment with installing the NVIDIA Toolkit in Windows, I uninstalled it from Windows, but left the driver untouched.
  2. This morning I upgraded Docker to Docker Desktop 4.34.2 (167172) and rebooted.
  3. Rebuilt the containers.

And the result:

Image

Looks like we won! :)

artyomboyko commented 1 month ago

@bdice I'll check it out and come back with feedback.

artyomboyko commented 1 month ago

@bdice Tested. I use the container nvcr.io/nvidia/pytorch:24.09-py3 as the dev container. Everything seems to be working properly now.
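
A quick sanity check inside that dev container (a sketch; it assumes the image ships PyTorch and that cuDF has been pip-installed into it) would be something like:

    import torch
    import cudf

    # Confirms the container sees the GPU through the NVIDIA Container Toolkit
    # and that cuDF can load the CUDA runtime available inside the image.
    print("torch sees GPU:", torch.cuda.is_available())
    print(cudf.Series([1, 2, 3]))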

Image

Thanks!

bdice commented 1 month ago

Great. Glad that worked out for you @blademoon! 🥳

artyomboyko commented 1 month ago

@bdice Thank you so much for your help! 👍