Open betatim opened 1 year ago
Clicked my way through from NGC to a Vertex notebook. The custom kernel takes minutes to appear. At first it wasn't clear to me why there was no RAPIDS kernel, or that after loading the notebook UI I had to wait for an additional kernel to appear. I think this is a bit weird; most people won't expect it.
Once I had the custom RAPIDS kernel, opening a notebook that used it was quite quick (cf. the "takes 8min to spin up" comment above). However, when I tried to execute import cudf
in the notebook I got an error message:
/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/utils/gpu_utils.py:62: UserWarning: Failed to dlopen libcuda.so
warnings.warn(str(e))
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
/opt/conda/envs/rapids/lib/python3.10/site-packages/cupy/__init__.py in <module>
17 try:
---> 18 from cupy import _core # NOQA
19 except ImportError as exc:
/opt/conda/envs/rapids/lib/python3.10/site-packages/cupy/_core/__init__.py in <module>
2
----> 3 from cupy._core import core # NOQA
4 from cupy._core import fusion # NOQA
cupy/_core/core.pyx in init cupy._core.core()
/opt/conda/envs/rapids/lib/python3.10/site-packages/cupy/cuda/__init__.py in <module>
7 from cupy._environment import get_hipcc_path # NOQA
----> 8 from cupy.cuda import compiler # NOQA
9 from cupy.cuda import device # NOQA
/opt/conda/envs/rapids/lib/python3.10/site-packages/cupy/cuda/compiler.py in <module>
13 from cupy.cuda import device
---> 14 from cupy.cuda import function
15 from cupy.cuda import get_rocm_path
cupy/cuda/function.pyx in init cupy.cuda.function()
cupy/_core/_carray.pyx in init cupy._core._carray()
cupy/_core/internal.pyx in init cupy._core.internal()
cupy/cuda/memory.pyx in init cupy.cuda.memory()
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
The above exception was the direct cause of the following exception:
ImportError Traceback (most recent call last)
/tmp/ipykernel_7/619004098.py in <module>
----> 1 import cudf
/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/__init__.py in <module>
5 validate_setup()
6
----> 7 import cupy
8 from numba import config as numba_config, cuda
9
/opt/conda/envs/rapids/lib/python3.10/site-packages/cupy/__init__.py in <module>
18 from cupy import _core # NOQA
19 except ImportError as exc:
---> 20 raise ImportError(f'''
21 ================================================================
22 {_environment._diagnose_import_error()}
ImportError:
================================================================
Failed to import CuPy.
If you installed CuPy via wheels (cupy-cudaXXX or cupy-rocm-X-X), make sure that the package matches with the version of CUDA or ROCm installed.
On Linux, you may need to set LD_LIBRARY_PATH environment variable depending on how you installed CUDA/ROCm.
On Windows, try setting CUDA_PATH environment variable.
Check the Installation Guide for details:
https://docs.cupy.dev/en/latest/install.html
Original error:
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
================================================================
The shared library exists at /usr/local/cuda-11.2/compat/libcuda.so.1, but LD_LIBRARY_PATH is /usr/local/nvidia/lib:/usr/local/nvidia/lib64.
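For what it's worth, the failure can be modeled with a tiny sketch of how the dynamic loader searches LD_LIBRARY_PATH (find_libcuda and the file set below are hypothetical stand-ins, not the real loader): libcuda.so.1 only resolves if its directory appears on the search path, and the compat directory does not.

```python
import os

def find_libcuda(search_path, existing_files):
    """Return the first directory on a colon-separated search path (like
    LD_LIBRARY_PATH) that contains libcuda.so.1, or None if it is nowhere
    on the path. `existing_files` stands in for the filesystem so this
    sketch runs on any machine."""
    for directory in search_path.split(":"):
        if os.path.join(directory, "libcuda.so.1") in existing_files:
            return directory
    return None

# Reproduce the situation from the traceback: the compat copy of the
# driver library exists, but the loader never looks in that directory.
files = {"/usr/local/cuda-11.2/compat/libcuda.so.1"}

# The path from the error report -> library not found -> ImportError
print(find_libcuda("/usr/local/nvidia/lib:/usr/local/nvidia/lib64", files))

# Prepending the compat directory would make the library resolvable
print(find_libcuda("/usr/local/cuda-11.2/compat:/usr/local/nvidia/lib", files))
```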
The docker image that was used is nvcr.io/nvidia/rapidsai/rapidsai:cuda11.2-runtime-centos7-py3.10
Weirdly enough, when I look at the "managed notebooks" tab in the Google Cloud UI it tells me that the notebook doesn't have a GPU.
Thanks @betatim!
I ran through myself and found similar things. Here are the steps I followed:
- ran nvidia-smi and saw the T4
- ran import cudf

Then I ran through a second time (partly to refresh my memory and write the list above) and the process felt a little different.
- Reduce the number of clicks to launch on Vertex AI from NGC.
- import cudf fails in the default kernel; users need to select the RAPIDS kernel once it is available.