lfwa / carbontracker

Track and predict the energy consumption and carbon footprint of training deep learning models.
MIT License

Error message when GPU model does not support power retrieval #36

Closed: lfwa closed this issue 1 year ago

lfwa commented 3 years ago

Some NVIDIA GPU models do not support retrieving power usage through NVML. These errors are currently suppressed, and power usage retrieval is silently skipped.

It should instead throw a descriptive error when the GPU model does not support power retrieval (see power_usage() in nvidia.py).

See, e.g., the error reported by Princec711 when running this code snippet:

import pynvml

pynvml.nvmlInit()

device_indices = range(pynvml.nvmlDeviceGetCount())
handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in device_indices]

for handle in handles:
    name = pynvml.nvmlDeviceGetName(handle)
    device = name.decode("utf-8")
    power_usage = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000
    print(f"{device} uses {power_usage} W")

pynvml.nvmlShutdown()

---------------------------------------------------------------------------
NVMLError_NotSupported                    Traceback (most recent call last)
<ipython-input-2-7e19c443106e> in <module>
      9     name = pynvml.nvmlDeviceGetName(handle)
     10     device = name.decode("utf-8")
---> 11     power_usage = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000
     12     print(f"{device} uses {power_usage} W")
     13 

C:\ProgramData\Anaconda3\lib\site-packages\pynvml\nvml.py in nvmlDeviceGetPowerUsage(handle)
   1243     fn = get_func_pointer("nvmlDeviceGetPowerUsage")
   1244     ret = fn(handle, byref(c_mWatts))
-> 1245     check_return(ret)
   1246     return c_mWatts.value
   1247 

C:\ProgramData\Anaconda3\lib\site-packages\pynvml\nvml.py in check_return(ret)
    364 def check_return(ret):
    365     if (ret != NVML_SUCCESS):
--> 366         raise NVMLError(ret)
    367     return ret
    368 

NVMLError_NotSupported: Not Supported

Originally posted by @Princec711 in https://github.com/lfwa/carbontracker/issues/33#issuecomment-679210377
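The descriptive error requested in this issue could look roughly like the sketch below. It assumes only pynvml's `NVMLError_NotSupported` exception (the one in the traceback) and the `nvmlDeviceGetPowerUsage` / `nvmlDeviceGetName` calls from the snippet. The exception class `GPUPowerUsageRetrievalError`, the function signature, and passing the `nvml` module in as a parameter (so it can be replaced by a stand-in for testing) are hypothetical choices. This is not carbontracker's actual `power_usage()` in nvidia.py.

```python
from typing import Any, List, Sequence

ISSUE_URL = "https://github.com/lfwa/carbontracker/issues/36"


class GPUPowerUsageRetrievalError(Exception):
    """Hypothetical exception: the GPU cannot report its power draw via NVML."""


def _device_name(nvml: Any, handle: Any) -> str:
    # Depending on the pynvml version, the name is returned as bytes or str.
    name = nvml.nvmlDeviceGetName(handle)
    return name.decode("utf-8") if isinstance(name, bytes) else name


def power_usage(nvml: Any, handles: Sequence[Any]) -> List[float]:
    """Return the current power draw in watts for each GPU handle.

    `nvml` is the imported pynvml module (or a stand-in with the same
    attributes). Instead of silently skipping a GPU whose power usage NVML
    cannot report, raise a descriptive error that names the GPU.
    """
    watts = []
    for handle in handles:
        try:
            # NVML reports power usage in milliwatts.
            milliwatts = nvml.nvmlDeviceGetPowerUsage(handle)
        except nvml.NVMLError_NotSupported as err:
            # Chain the original NVML error so the root cause stays visible.
            raise GPUPowerUsageRetrievalError(
                f"GPU '{_device_name(nvml, handle)}' does not support power "
                f"usage retrieval through NVML; see {ISSUE_URL}"
            ) from err
        watts.append(milliwatts / 1000)
    return watts
```

With pynvml, this would be called as `power_usage(pynvml, handles)` between `pynvml.nvmlInit()` and `pynvml.nvmlShutdown()`, with `handles` built as in the snippet above. Raising, rather than returning a partial result, makes an unsupported GPU impossible to miss. The trade-off is that the caller must decide whether to abort or continue.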

PedramBakh commented 1 year ago

Release 1.1.7 addresses this issue by informing the user when power usage retrieval through the NVML API is not supported and referring them to this issue thread.
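A non-fatal variant of "inform the user and point to this thread" might look like the sketch below. It warns about each unsupported GPU and keeps measuring the others. It is not a copy of the 1.1.7 change. The function name `power_usage_or_skip`, the use of Python's `warnings` module with `RuntimeWarning`, and the dict-by-device-index return value are all illustrative assumptions. As before, `nvml` is the pynvml module or a stand-in exposing `NVMLError_NotSupported` and `nvmlDeviceGetPowerUsage`.

```python
import warnings
from typing import Any, Dict, Sequence

ISSUE_URL = "https://github.com/lfwa/carbontracker/issues/36"


def power_usage_or_skip(nvml: Any, handles: Sequence[Any]) -> Dict[int, float]:
    """Return {device index: watts}, skipping GPUs that cannot report power.

    Hypothetical helper: emits a RuntimeWarning for each unsupported GPU
    (with a link to the issue thread) instead of aborting the measurement.
    """
    watts = {}
    for index, handle in enumerate(handles):
        try:
            # NVML reports power usage in milliwatts; convert to watts.
            watts[index] = nvml.nvmlDeviceGetPowerUsage(handle) / 1000
        except nvml.NVMLError_NotSupported:
            warnings.warn(
                f"GPU {index} does not support power usage retrieval through "
                f"the NVML API and is skipped; see {ISSUE_URL}",
                RuntimeWarning,
                stacklevel=2,
            )
    return watts
```

Compared with raising, this keeps a training run going on mixed hardware. The cost is that any energy total computed from the result silently omits the skipped GPUs, so the warning text should say so clearly.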