rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[QST] Cannot import `cudf` #15268

Closed blue-cat-whale closed 6 months ago

blue-cat-whale commented 8 months ago

I've downloaded cudf==24.4.0a on RHEL 8.9, but when I tried `import cudf` as in this tutorial, the Python console just crashed. How can I fix it? fig

wence- commented 8 months ago

Hmm, can you show the output of nvidia-smi on this system please?

blue-cat-whale commented 8 months ago

nvidia-smi returns nothing. I use a cloud/shared GPU. fig

wence- commented 8 months ago

Do you know what version of the cuda driver is installed on this system? Or if there's anywhere we can look to see details of the virtualisation setup?

It might be possible to get more information by launching the process under gdb. Assuming gdb is already installed (if not, you'll have to install it with your operating system's package manager), what does the following show?

gdb -ex run --args python -c "import cudf"

Thanks
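
Where gdb is not available, a lighter-weight check is Python's built-in faulthandler, which prints the Python-level traceback when the interpreter dies on a fatal signal. The following is only a minimal sketch, not something proposed in this thread: the helper name `run_with_faulthandler` and the 120-second timeout are assumptions made for illustration. It runs the import in a child interpreter so the crash does not take down the calling console.

```python
import subprocess
import sys


def run_with_faulthandler(module_name, timeout=120):
    """Import `module_name` in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr). On POSIX, a negative return code means the
    child was killed by that signal number (for example -11 for SIGSEGV).
    """
    proc = subprocess.run(
        # -X faulthandler dumps the Python traceback on SIGSEGV, SIGABRT, etc.
        [sys.executable, "-X", "faulthandler", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr
```

Calling `run_with_faulthandler("cudf")` and printing both values shows which signal ended the process and which Python frame was active at the time. It does not show native frames, so the gdb backtrace is still the more complete source.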

blue-cat-whale commented 8 months ago

I installed CUDA 12.4 locally:

[root@localhost code]# whereis nvcc
nvcc: /usr/local/cuda-12.4/bin/nvcc /usr/local/cuda-12.4/bin/nvcc.profile

But the cloud GPU has multiple CUDA versions installed, and I'm not sure which one is active.

[root@localhost code]# ls /opt/orion/orion_runtime/gpu/cuda/current/orion-cuda* -d
/opt/orion/orion_runtime/gpu/cuda/current/orion-cuda-11.0  /opt/orion/orion_runtime/gpu/cuda/current/orion-cuda-11.3  /opt/orion/orion_runtime/gpu/cuda/current/orion-cuda-11.6  /opt/orion/orion_runtime/gpu/cuda/current/orion-cuda-12.0
/opt/orion/orion_runtime/gpu/cuda/current/orion-cuda-11.1  /opt/orion/orion_runtime/gpu/cuda/current/orion-cuda-11.4  /opt/orion/orion_runtime/gpu/cuda/current/orion-cuda-11.7  /opt/orion/orion_runtime/gpu/cuda/current/orion-cuda-12.1
/opt/orion/orion_runtime/gpu/cuda/current/orion-cuda-11.2  /opt/orion/orion_runtime/gpu/cuda/current/orion-cuda-11.5  /opt/orion/orion_runtime/gpu/cuda/current/orion-cuda-11.8  /opt/orion/orion_runtime/gpu/cuda/current/orion-cuda-12.2
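
One way to see which CUDA libraries a running process actually loaded is to read its memory map on Linux. The sketch below is an assumption-laden illustration rather than anything specific to this virtualisation layer: the helper name `cuda_libs_in_maps` is hypothetical, and it simply filters the text of `/proc/<pid>/maps` for shared objects whose file names start with `libcuda` (which also covers `libcudart`) or `libnvrtc`.

```python
import os


def cuda_libs_in_maps(maps_text):
    """Return the sorted, de-duplicated CUDA library paths in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no backing file
        path = fields[5].strip()
        name = os.path.basename(path)
        if name.startswith(("libcuda", "libnvrtc")) and ".so" in name:
            paths.add(path)
    return sorted(paths)
```

For example, `cuda_libs_in_maps(open("/proc/self/maps").read())` can be called after a CUDA library has been loaded in the current process, or the same function can be fed the maps file of a process paused under gdb. The directory each path lives in indicates which of the installed versions was picked up.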

fig1 fig2

I'm an end user and I don't know how this cloud GPU is set up on the server side. This is how I set it up as an end user.

wget http://<private_url>/dev/rpm/orionx-cuda-4.2.0-1.all.rpm
wget http://<private_url>/dev/rpm/orionx-engine-4.2.0-1.all.rpm
wget http://<private_url>/dev/rpm/orionx-runtime-4.2.0-1.all.rpm

rpm -i ./orionx-engine-4.2.0-1.all.rpm
rpm -i ./orionx-cuda-4.2.0-1.all.rpm
rpm -i ./orionx-runtime-4.2.0-1.all.rpm

systemctl start oriond

export ORION_CLIENT_ID=client-id
export ORION_VGPU=1
export ORION_GMEM=10000
export ORION_RATIO=100
export ORION_DEVICE_ENABLE=1
export ORION_RESERVED=0
export LD_LIBRARY_PATH=/opt/orion/orion_runtime/gpu/cuda/current

bdice commented 8 months ago

If nvidia-smi is reporting CUDA Version: N/A, that's not a good sign. I'm not sure what that means, but it could be an issue with your drivers. I would first try compiling and running a basic CUDA program, something like this hello world example. I also recommend reaching out to your cloud provider for support.
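
If compiling a CUDA program is inconvenient, a rough Python analogue of that sanity check is to call the CUDA driver API directly through ctypes. This is a swapped-in technique, not the hello world example referred to above, and it is only a sketch. It assumes the driver library is exposed as `libcuda.so.1` on the loader path, which may not hold under a virtualised GPU runtime, and the helper name `cuda_driver_version` is hypothetical.

```python
import ctypes


def cuda_driver_version(lib_name="libcuda.so.1"):
    """Return the CUDA driver API version as (major, minor), or None if unavailable."""
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError:
        return None  # driver library not found on the loader path
    if libcuda.cuInit(0) != 0:  # 0 is CUDA_SUCCESS
        return None  # driver is present but failed to initialise
    version = ctypes.c_int(0)
    if libcuda.cuDriverGetVersion(ctypes.byref(version)) != 0:
        return None
    # The driver encodes the version as 1000 * major + 10 * minor.
    return version.value // 1000, (version.value % 1000) // 10
```

A result of `None` suggests the driver library is missing or cannot initialise, which would be consistent with nvidia-smi showing no CUDA version. A tuple such as `(12, 2)` gives the highest CUDA version the driver supports, which can then be compared with the CUDA version the installed cudf package was built for.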

vyasr commented 6 months ago

Closing as stale. Please reopen if needed.