gpuopenanalytics / pynvml

Provide Python access to the NVML library for GPU diagnostics
BSD 3-Clause "New" or "Revised" License

undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2 #43

Open kinredon opened 2 years ago

kinredon commented 2 years ago

When I run the following code to get GPU process information:

import psutil
import pynvml  # import the NVML bindings

UNIT = 1024 * 1024

pynvml.nvmlInit()
gpuDeriveInfo = pynvml.nvmlSystemGetDriverVersion()

gpuDeviceCount = pynvml.nvmlDeviceGetCount()

for i in range(gpuDeviceCount):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)  # get the handle for GPU i, used for the queries below

    pidAllInfo = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)  # list the processes running on this GPU
    for pidInfo in pidAllInfo:
        pidUser = psutil.Process(pidInfo.pid).username()
        print("pid:", pidInfo.pid, "user:", pidUser,
              "GPU memory used:", pidInfo.usedGpuMemory / UNIT, "MiB")  # memory used by this pid

pynvml.nvmlShutdown()  # shut down NVML when finished

I get the following error:

Traceback (most recent call last):
  File "/mnt/data0/home/dengjinhong/miniconda3/envs/python3/lib/python3.6/site-packages/pynvml.py", line 782, in _nvmlGetFunctionPointer
    _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
  File "/mnt/data0/home/dengjinhong/miniconda3/envs/python3/lib/python3.6/ctypes/__init__.py", line 361, in __getattr__
    func = self.__getitem__(name)
  File "/mnt/data0/home/dengjinhong/miniconda3/envs/python3/lib/python3.6/ctypes/__init__.py", line 366, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/nvidia-430/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "gpu_info.py", line 21, in <module>
    pidAllInfo = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)  # list the processes running on this GPU
  File "/mnt/data0/home/dengjinhong/miniconda3/envs/python3/lib/python3.6/site-packages/pynvml.py", line 2223, in nvmlDeviceGetComputeRunningProcesses
    return nvmlDeviceGetComputeRunningProcesses_v2(handle);
  File "/mnt/data0/home/dengjinhong/miniconda3/envs/python3/lib/python3.6/site-packages/pynvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
  File "/mnt/data0/home/dengjinhong/miniconda3/envs/python3/lib/python3.6/site-packages/pynvml.py", line 785, in _nvmlGetFunctionPointer
    raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.NVMLError_FunctionNotFound: Function Not Found

Here is the nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 20%   26C    P8     8W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
| 20%   28C    P8     8W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:82:00.0 Off |                  N/A |
| 20%   24C    P8     9W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:83:00.0 Off |                  N/A |
| 20%   27C    P8     8W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

The installed version of nvidia-ml-py is 11.495.46. Why does this happen?
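
For context, a quick way to check which of these process-listing entry points the driver's NVML library actually exports is to probe it with ctypes. This is only a rough sketch; the library path is taken from the traceback above and will differ on other systems.

import ctypes

# Probe the driver's NVML library for each variant of the symbol; getattr on a
# ctypes.CDLL object raises AttributeError when a symbol is not exported.
lib = ctypes.CDLL("/usr/lib/nvidia-430/libnvidia-ml.so.1")
for name in ("nvmlDeviceGetComputeRunningProcesses",
             "nvmlDeviceGetComputeRunningProcesses_v2",
             "nvmlDeviceGetComputeRunningProcesses_v3"):
    try:
        getattr(lib, name)
        print(name, "-> exported")
    except AttributeError:
        print(name, "-> missing")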

fbcotter commented 1 year ago

This also happens on the latest version, which now tries to call nvmlDeviceGetComputeRunningProcesses_v3 for me with NVIDIA driver version 470. The older function nvmlDeviceGetComputeRunningProcesses still seems to be available when I try it. Maybe we could add a try/except fallback here?
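
A rough sketch of what such a try/except fallback might look like inside pynvml.py, assuming the suffixed wrappers named in this thread (nvmlDeviceGetComputeRunningProcesses_v2 / _v3) are defined in the module. This is not the library's actual implementation; how the final unsuffixed symbol would be resolved depends on pynvml internals such as _nvmlGetFunctionPointer.

# Sketch only (not the real pynvml code): try the newest NVML entry point first
# and fall back when the loaded driver library does not export the suffixed symbol.
def nvmlDeviceGetComputeRunningProcesses(handle):
    try:
        return nvmlDeviceGetComputeRunningProcesses_v3(handle)  # newest drivers
    except NVMLError_FunctionNotFound:
        pass
    try:
        return nvmlDeviceGetComputeRunningProcesses_v2(handle)  # drivers that only ship the _v2 symbol
    except NVMLError_FunctionNotFound:
        # Old drivers such as the 430.x series above export only the unsuffixed
        # symbol; a complete fix would look that one up as well.
        raise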