rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

OSError: libcudart.so: cannot open shared object file: No such file or directory (Nvidia Studio Driver 561.09) #16961

Closed artyomboyko closed 1 month ago

artyomboyko commented 1 month ago

Describe the bug

Trying to use RAPIDS on WSL2 fails with the following error:

/home/artyom/.local/lib/python3.10/site-packages/cudf/utils/_ptxcompiler.py:64: UserWarning: Error getting driver and runtime versions:

stdout:

stderr:

Traceback (most recent call last):
  File "<string>", line 7, in <module>
  File "/home/artyom/.local/lib/python3.10/site-packages/numba/cuda/cudadrv/runtime.py", line 111, in get_version
    self.cudaRuntimeGetVersion(ctypes.byref(rtver))
  File "/home/artyom/.local/lib/python3.10/site-packages/numba/cuda/cudadrv/runtime.py", line 65, in __getattr__
    self._initialize()
  File "/home/artyom/.local/lib/python3.10/site-packages/numba/cuda/cudadrv/runtime.py", line 51, in _initialize
    self.lib = open_cudalib('cudart')
  File "/home/artyom/.local/lib/python3.10/site-packages/numba/cuda/cudadrv/libs.py", line 65, in open_cudalib
    return ctypes.CDLL(path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcudart.so: cannot open shared object file: No such file or directory

Not patching Numba
  warnings.warn(msg, UserWarning)
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[1], line 1
----> 1 import cudf
      2 print(cudf.Series([1, 2, 3]))

File ~/.local/lib/python3.10/site-packages/cudf/__init__.py:10
      7 from cudf.utils.gpu_utils import validate_setup
      9 _setup_numba()
---> 10 validate_setup()
     12 import cupy
     13 from numba import config as numba_config, cuda

File ~/.local/lib/python3.10/site-packages/cudf/utils/gpu_utils.py:96, in validate_setup()
     86     minor_version = getDeviceAttribute(
     87         cudaDeviceAttr.cudaDevAttrComputeCapabilityMinor, 0
     88     )
     89     raise UnsupportedCUDAError(
     90         "A GPU with NVIDIA Volta™ (Compute Capability 7.0) "
      91         "or newer architecture is required.\n"
     92         f"Detected GPU 0: {device_name}\n"
     93         f"Detected Compute Capability: {major_version}.{minor_version}"
     94     )
---> 96 cuda_runtime_version = runtimeGetVersion()
     98 if cuda_runtime_version < 11000:
     99     # Require CUDA Runtime version 11.0 or greater.
    100     major_version = cuda_runtime_version // 1000

File ~/.local/lib/python3.10/site-packages/rmm/_cuda/gpu.py:88, in runtimeGetVersion()
     84 # TODO: Replace this with `cuda.cudart.cudaRuntimeGetVersion()` when the
     85 # limitation is fixed.
     86 import numba.cuda
---> 88 major, minor = numba.cuda.runtime.get_version()
     89 return major * 1000 + minor * 10

File ~/.local/lib/python3.10/site-packages/numba/cuda/cudadrv/runtime.py:111, in Runtime.get_version(self)
    107 """
    108 Returns the CUDA Runtime version as a tuple (major, minor).
    109 """
    110 rtver = ctypes.c_int()
--> 111 self.cudaRuntimeGetVersion(ctypes.byref(rtver))
    112 # The version is encoded as (1000 * major) + (10 * minor)
    113 major = rtver.value // 1000

File ~/.local/lib/python3.10/site-packages/numba/cuda/cudadrv/runtime.py:65, in Runtime.__getattr__(self, fname)
     62 argtypes = proto[1:]
     64 if not self.is_initialized:
---> 65     self._initialize()
     67 # Find function in runtime library
     68 libfn = self._find_api(fname)

File ~/.local/lib/python3.10/site-packages/numba/cuda/cudadrv/runtime.py:51, in Runtime._initialize(self)
     47     msg = ("CUDA is disabled due to setting NUMBA_DISABLE_CUDA=1 "
     48            "in the environment, or because CUDA is unsupported on "
     49            "32-bit systems.")
     50     raise CudaSupportError(msg)
---> 51 self.lib = open_cudalib('cudart')
     53 self.is_initialized = True

File ~/.local/lib/python3.10/site-packages/numba/cuda/cudadrv/libs.py:65, in open_cudalib(lib)
     63 def open_cudalib(lib):
     64     path = get_cudalib(lib)
---> 65     return ctypes.CDLL(path)

File /usr/lib/python3.10/ctypes/__init__.py:374, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    371 self._FuncPtr = _FuncPtr
    373 if handle is None:
--> 374     self._handle = _dlopen(self._name, mode)
    375 else:
    376     self._handle = handle

OSError: libcudart.so: cannot open shared object file: No such file or directory
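
The bottom of the traceback is ctypes failing to dlopen libcudart.so. A minimal way to reproduce just that step in isolation (a sketch using only the Python standard library) is:

    import ctypes
    import ctypes.util

    # Ask the dynamic loader whether it can locate the CUDA runtime at all.
    path = ctypes.util.find_library("cudart")
    print("cudart resolved to:", path)

    try:
        # This mirrors numba's open_cudalib('cudart') call from the traceback.
        ctypes.CDLL(path or "libcudart.so")
        print("libcudart loaded successfully")
    except OSError as exc:
        # Same OSError as above when the CUDA Toolkit is not installed or its
        # lib64 directory is not on the loader path.
        print("failed to load libcudart:", exc)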

Steps/Code to reproduce bug

  1. Install a clean WSL2 Ubuntu 22.04.

  2. $ sudo apt-get update && sudo apt-get install -y python3-pip

  3. $ pip install jupyterlab jupyterlab-git ipywidgets

  4. Install RAPIDS in WSL2:

    pip install \
    --extra-index-url=https://pypi.nvidia.com \
    cudf-cu12==24.8.* dask-cudf-cu12==24.8.* cuml-cu12==24.8.* \
    cugraph-cu12==24.8.* cuspatial-cu12==24.8.* cuproj-cu12==24.8.* \
    cuxfilter-cu12==24.8.* cucim-cu12==24.8.* pylibraft-cu12==24.8.* \
    raft-dask-cu12==24.8.* cuvs-cu12==24.8.* nx-cugraph-cu12==24.8.*

  5. Launch JupyterLab and create a notebook:

    jupyter-lab --NotebookApp.iopub_data_rate_limit=1e10

  6. Execute a cell with the following code:

    import cudf
    print(cudf.Series([1, 2, 3]))

    Image

Environment overview (please complete the following information)

Image

Environment details

Additional context

artyomboyko commented 1 month ago

After downgrading the driver on Windows 11, the problem still reproduces:

Image

bdice commented 1 month ago

Did you install the CUDA Toolkit? See our WSL2 instructions here, specifically step 4: https://docs.rapids.ai/install/#wsl2-pip

bdice commented 1 month ago

If you use conda to install rather than pip, it will come with a CUDA Toolkit and you don't need to install it yourself.
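
Either way, once a CUDA Toolkit is present you can verify that the runtime library is found using the same call that fails in the traceback above (a quick check, assuming numba is installed):

    from numba import cuda

    # This is the call that raised the OSError above; with libcudart.so on the
    # loader path it returns the runtime version as a tuple, e.g. (12, 6).
    print(cuda.runtime.get_version())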

artyomboyko commented 1 month ago

@bdice OK, I will test that.

Can you please tell me why a Docker container (VS Code dev container) returns the exception too?

Image

Devcontainer project TEST.zip

artyomboyko commented 1 month ago

@bdice The problem with WSL2 is solved. I installed the latest driver again (NVIDIA Studio 561.09) and installed the latest CUDA Toolkit (12.6 Update 1) in WSL2 on Ubuntu 22.04.

Only one question remains: how to fix the dev container in VS Code. Or should I close this bug and open a new one? If so, where would be more appropriate to open it?
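
For reference, with the toolkit installed the original repro from this issue now runs without the OSError:

    import cudf

    # libcudart.so is found, so the import-time validate_setup() check passes.
    print(cudf.Series([1, 2, 3]))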

bdice commented 1 month ago

I'm not sure how to diagnose the error shown at the bottom of your screenshot. It looks like you are using Docker. Are you using Docker Desktop? Did you follow the RAPIDS Docker instructions to install the NVIDIA Container Toolkit?

Please try to compile and run a sample CUDA program, or try another GPU library like CuPy, so we can see what might be going wrong.

A sample CuPy program:

import cupy as cp
print(cp.array([[1, 2, 3], [4, 5, 6]]))
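
If CuPy imports fine and you want to confirm which CUDA runtime it sees, a slightly extended variant (assuming CuPy is installed) is:

    import cupy as cp

    # Encoded as 1000 * major + 10 * minor, e.g. 12060 for CUDA 12.6.
    print(cp.cuda.runtime.runtimeGetVersion())
    print(cp.array([[1, 2, 3], [4, 5, 6]]).sum())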

artyomboyko commented 1 month ago

@bdice

Yes. I have Windows 11 Pro + WSL2 + Docker 4.34.2 (167172). I installed VS Code and tried to use a dev container with a custom Dockerfile.

As I understand it, Docker on Windows runs on top of WSL2 (Ubuntu 22.04). I installed the NVIDIA toolkit in WSL2, and WSL2 itself works as expected.

Yesterday I tried installing the NVIDIA toolkit in Windows and then rebuilt the dev container, but that didn't help.

Everything is pretty much done as described in the video.

My test "project" - TEST.zip

When I went to run the notebook to capture the error... surprise! It all worked.

Bottom line, what I did:

  1. After yesterday's experiment with installing the NVIDIA Toolkit in Windows, I uninstalled it from Windows, but left the driver untouched.
  2. This morning I upgraded Docker to Docker Desktop 4.34.2 (167172) and rebooted.
  3. Rebuilt the containers.

And the result:

Image

Looks like we won! :)

artyomboyko commented 1 month ago

@bdice I'll check it out and come back with feedback.

artyomboyko commented 1 month ago

@bdice Tested. I use the container nvcr.io/nvidia/pytorch:24.09-py3 as the dev container. Everything seems to be working properly now.
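
A quick sanity check inside that dev container (a sketch; it assumes the image ships PyTorch and that cuDF has been pip-installed into it) would be something like:

    import torch
    import cudf

    # Confirms the container sees the GPU through the NVIDIA Container Toolkit
    # and that cuDF can load the CUDA runtime available inside the image.
    print("torch sees GPU:", torch.cuda.is_available())
    print(cudf.Series([1, 2, 3]))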

Image

Thanks!

bdice commented 1 month ago

Great. Glad that worked out for you @blademoon! 🥳

artyomboyko commented 1 month ago

@bdice Thank you so much for your help! 👍