msminhas93 / nviwatch

NviWatch: A blazingly fast rust based TUI for managing and monitoring NVIDIA GPU processes
GNU General Public License v3.0
176 stars 4 forks

Error: LibloadingError(DlOpen { desc: "libnvidia-ml.so: cannot open shared object file: No such file or directory" }) #1

Open sangshuduo opened 2 months ago

sangshuduo commented 2 months ago

$ nviwatch
Error: LibloadingError(DlOpen { desc: "libnvidia-ml.so: cannot open shared object file: No such file or directory" })

$ uname -a
Linux sn4622120254 5.15.0-101-generic #111-Ubuntu SMP Tue Mar 5 20:16:58 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/issue
Ubuntu 22.04.4 LTS \n \l

$ rustc --version
rustc 1.81.0 (eeb90cda1 2024-09-04)

$ cargo --version
cargo 1.81.0 (2dbb1af80 2024-08-20)

msminhas93 commented 2 months ago

Please check your NVIDIA drivers and verify that the nvidia-smi and nvcc commands work. Also export LD_LIBRARY_PATH. See https://docs.nvidia.com/cuda/cuda-installation-guide-linux/
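The checks suggested above can be sketched as a small script. This is only an illustration: `check_cmd` is a hypothetical helper name, and the script reports whether each command is on PATH without verifying that the driver itself is healthy.

```shell
#!/bin/sh
# check_cmd: hypothetical helper that reports whether a command is on PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
    fi
}

# The two commands mentioned in the comment above.
for cmd in nvidia-smi nvcc; do
    check_cmd "$cmd"
done

# Show what the dynamic linker will be told to search first.
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"
```

If either command is reported as missing, the driver or CUDA toolkit installation is the first thing to look at before touching LD_LIBRARY_PATH.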

sangshuduo commented 2 months ago

$ nvidia-smi
Thu Sep 12 20:37:52 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          Off |   00000000:01:00.0 Off |                    0 |

After I exported LD_LIBRARY_PATH with the location of libnvidia-ml.so and ran nviwatch again, it reported the following:

$ nviwatch

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING:

You should always run with libnvidia-ml.so that is installed with your
NVIDIA Display Driver. By default it's installed in /usr/lib and /usr/lib64.
libnvidia-ml.so in GDK package is a stub library that is attached only for
build purposes (e.g. machine that you build your application doesn't have
to have Display Driver installed).
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
      Output information may be incomplete.
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1005/gvfs
      Output information may be incomplete.
Linked to libnvidia-ml library at wrong path : /usr/local/cuda-12.4/targets/x86_64-linux/lib/stubs/libnvidia-ml.so

Error: DriverNotLoaded
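The warning above says the library that was picked up lives in a CUDA `stubs` directory, which is meant for building only. A minimal sketch for listing the copies of libnvidia-ml that the dynamic linker cache knows about and flagging stub paths follows. `is_stub_path` and `list_nvml_candidates` are hypothetical helper names, and the script assumes `ldconfig -p` is available; if it is not, or if nothing is registered, it simply prints nothing.

```shell
#!/bin/sh
# is_stub_path: hypothetical helper; succeeds if the path lies inside a "stubs" directory.
is_stub_path() {
    case "$1" in
        */stubs/*) return 0 ;;
        *) return 1 ;;
    esac
}

# list_nvml_candidates: print the paths of libnvidia-ml entries in the linker cache.
# Each ldconfig -p entry ends with "=> /path/to/lib", so the last field is the path.
list_nvml_candidates() {
    { ldconfig -p 2>/dev/null || true; } | awk '/libnvidia-ml\.so/ { print $NF }'
}

list_nvml_candidates | while IFS= read -r lib; do
    if is_stub_path "$lib"; then
        echo "stub (build-only): $lib"
    else
        echo "driver library:    $lib"
    fi
done
```

The linker cache does not include directories added only through LD_LIBRARY_PATH, so a stub directory exported by hand, as in the run above, has to be checked separately.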

msminhas93 commented 2 months ago

I think I was getting the same error a few weeks ago on WSL, and adding these two lines seemed to fix the issue.

export PATH=$PATH:/usr/local/cuda-12.2/bin
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/lib/wsl/drivers/:$LD_LIBRARY_PATH

So in your case the CUDA version would be 12.4. My guess is that the lib or lib64 directory, whichever your system has, isn't on the dynamic linker's search path.

A rough equivalent for the driver path on Ubuntu is:

export PATH=$PATH:/usr/local/cuda-12.4/bin
export LD_LIBRARY_PATH=/usr/lib:/usr/lib/modules/$(uname -r):$LD_LIBRARY_PATH
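Since the earlier run failed because LD_LIBRARY_PATH pointed at the CUDA `stubs` directory, it may also help to drop any such entry before adding driver paths. The following is a sketch under that assumption; `strip_stub_dirs` is a hypothetical helper, not part of nviwatch or CUDA, and it also discards empty entries.

```shell
#!/bin/sh
# strip_stub_dirs: hypothetical helper; prints a colon-separated path list
# with empty entries and entries ending in "/stubs" removed.
strip_stub_dirs() {
    result=""
    old_ifs=$IFS
    IFS=':'
    for dir in $1; do
        case "$dir" in
            ""|*/stubs|*/stubs/) continue ;;
        esac
        result="${result:+$result:}$dir"
    done
    IFS=$old_ifs
    printf '%s\n' "$result"
}

# Clean the current value and re-export it.
LD_LIBRARY_PATH=$(strip_stub_dirs "${LD_LIBRARY_PATH:-}")
export LD_LIBRARY_PATH
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

After this, the export lines above can be applied on top of the cleaned value, so the stub library no longer shadows the one installed with the driver.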