gpuopenanalytics / pynvml

Provide Python access to the NVML library for GPU diagnostics
BSD 3-Clause "New" or "Revised" License

When failing on NVMLError exception, bug in handling #37

Open qwertAsc opened 3 years ago

qwertAsc commented 3 years ago

When this call in smi.py raises an exception:

    nvmlDeviceGetSupportedMemoryClocks(handle)

the following handler fails with "TypeError: list indices must be integers or slices, not str":

    except NVMLError as err:
        supportedClocks['Error'] = nvidia_smi.__handleError(err)

because supportedClocks is defined as a list.
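One way to avoid the type mismatch is to keep the clock list and the error message together in a dict, where a string key such as 'Error' is always valid. The sketch below is not the smi.py code. `NVMLErrorStandIn`, `supported_memory_clocks_stub`, `query_supported_clocks` and the 'clocks' key are hypothetical names, chosen so the example runs without a GPU.

```python
class NVMLErrorStandIn(Exception):
    """Hypothetical stand-in for pynvml's NVMLError, so no GPU is needed."""


def supported_memory_clocks_stub(handle):
    """Hypothetical stub that fails like the reporter's GPU ("Not Supported")."""
    raise NVMLErrorStandIn("Not Supported")


def query_supported_clocks(handle, query=supported_memory_clocks_stub):
    """Collect supported memory clocks, recording any NVML-style error by key.

    The result is a dict, so result['Error'] = ... is valid; doing the same
    on a list raises "TypeError: list indices must be integers or slices".
    """
    result = {'clocks': []}
    try:
        for mem_clock in query(handle):
            result['clocks'].append(mem_clock)
    except NVMLErrorStandIn as err:
        result['Error'] = str(err)
    return result


# Failing query: the error is stored instead of crashing the handler.
print(query_supported_clocks(None))
# Successful query (values taken from the example output in this thread).
print(query_supported_clocks(None, query=lambda h: [7001, 6501, 5001, 810, 405]))
```

Downstream code can then check for the 'Error' key before reading the clock list.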

rjzamora commented 3 years ago

Thanks for raising an issue, @qwertAsc. Can you provide a full reproducer here? How are you getting the handle?

For example, here is how I would expect someone to use nvmlDeviceGetSupportedMemoryClocks:

In [1]: import pynvml

In [2]: pynvml.nvmlInit()

In [3]: handle = pynvml.nvmlDeviceGetHandleByIndex(0)

In [4]: pynvml.nvmlDeviceGetSupportedMemoryClocks(handle)
Out[4]: [7001, 6501, 5001, 810, 405]

qwertAsc commented 3 years ago

Thanks @rjzamora. I am just calling the following:

    from pynvml.smi import nvidia_smi
    nvidia_smi.getInstance().DeviceQuery()

and receiving this error:

    supportedClocks['Error'] = nvidia_smi.__handleError(err)
    TypeError: list indices must be integers or slices, not str

When running your example, I get the following:

    pynvml.nvmlDeviceGetSupportedMemoryClocks(handle)
      File "/home/.../env/lib/python3.6/site-packages/pynvml/nvml.py", line 1135, in nvmlDeviceGetSupportedMemoryClocks
        raise NVMLError(ret)
    pynvml.nvml.NVMLError_NotSupported: Not Supported
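Because `nvmlDeviceGetSupportedMemoryClocks` raises `NVMLError_NotSupported` on this GPU, a caller can record the failure per device instead of letting it abort the whole query. The sketch below assumes a hypothetical helper, `safe_query`, which is not part of pynvml. `report_memory_clocks` uses the pynvml calls shown in this thread plus `nvmlDeviceGetCount`, `nvmlShutdown` and the `NVMLError` base class. It needs an NVIDIA driver, so it is only defined here, not called.

```python
from typing import Any, Callable, Dict, Tuple, Type


def safe_query(fn: Callable[..., Any], *args: Any,
               errors: Tuple[Type[BaseException], ...] = (Exception,)) -> Dict[str, Any]:
    """Call fn(*args); return {'value': ...} on success or {'Error': ...} on failure.

    Hypothetical helper for illustration; only exceptions listed in `errors`
    are converted, anything else still propagates.
    """
    try:
        return {'value': fn(*args)}
    except errors as err:
        return {'Error': str(err)}


def report_memory_clocks() -> Dict[int, Dict[str, Any]]:
    """Sketch: supported memory clocks per GPU, tolerating 'Not Supported'.

    Requires pynvml and a working NVIDIA driver.
    """
    import pynvml

    pynvml.nvmlInit()
    try:
        report = {}
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            # NVMLError_NotSupported is a subclass of NVMLError.
            report[index] = safe_query(
                pynvml.nvmlDeviceGetSupportedMemoryClocks, handle,
                errors=(pynvml.NVMLError,),
            )
        return report
    finally:
        pynvml.nvmlShutdown()
```

Restricting `errors` to `pynvml.NVMLError` keeps genuine programming mistakes (such as a TypeError) visible instead of silently turning them into 'Error' entries.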

rjzamora commented 3 years ago

Thanks for the info! Are you passing a query string to DeviceQuery (e.g. nvidia_smi.getInstance().DeviceQuery('memory.free'))?

Also, can you specify which version of CUDA you are using and whether you happen to be using MIG support?

qwertAsc commented 3 years ago

No, I am not passing any query string. I am using CUDA version 11.0 (not sure regarding MIG). Also, I am less bothered that the clock query fails, and more that the except doesn't catch the error.
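Judging from the failing line, the except clause does run: the TypeError is raised by the handler body itself, after the NVML error was caught. Python keeps the original error as the new exception's `__context__`, which is why a traceback like this prints "During handling of the above exception, another exception occurred". A minimal pure-Python sketch, with a hypothetical `NVMLErrorStandIn` in place of pynvml's `NVMLError` and a hypothetical `buggy_device_query` mimicking the handler quoted above:

```python
class NVMLErrorStandIn(Exception):
    """Hypothetical stand-in for pynvml's NVMLError."""


def buggy_device_query():
    supported_clocks = []  # defined as a list, as described in the issue
    try:
        raise NVMLErrorStandIn("Not Supported")  # the NVML call fails
    except NVMLErrorStandIn as err:
        # The handler runs, so the NVML-style error WAS caught...
        supported_clocks['Error'] = str(err)  # ...but a str index on a list raises TypeError
    return supported_clocks


def capture_failure():
    """Return the exception that escapes buggy_device_query, if any."""
    try:
        buggy_device_query()
    except TypeError as exc:
        return exc
    return None


failure = capture_failure()
# The original error survives as implicit context of the TypeError.
print(type(failure).__name__, "<-", type(failure.__context__).__name__)
# prints: TypeError <- NVMLErrorStandIn
```

So the fix belongs in the handler (what it assigns to), not in which exceptions the except clause lists.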

danielbraun89 commented 3 years ago

I have the same issue when running nvidia_smi.getInstance().DeviceQuery():

    supportedClocks['Error'] = nvidia_smi.__handleError(err)
    TypeError: list indices must be integers or slices, not str