Any call to pynvml.nvmlDeviceGetTotalEnergyConsumption can raise an exception. Logs taken from the other issue:
[codecarbon WARNING @ 11:19:56] Invalid gpu_ids format. Expected a string or a list of ints.
[codecarbon INFO @ 11:19:56] [setup] RAM Tracking...
[codecarbon INFO @ 11:19:56] [setup] GPU Tracking...
[codecarbon INFO @ 11:19:56] Tracking Nvidia GPU via pynvml
Traceback (most recent call last):
  File "/home/ainhoa.vivel/TFM/transfer_learning/baseline.py", line 191, in <module>
    tracker = EmissionsTracker(project_name="baseline")
  File "/home/ainhoa.vivel/anaconda3/envs/tl/lib/python3.9/site-packages/codecarbon/emissions_tracker.py", line 296, in __init__
    gpu_devices = GPU.from_utils(self._gpu_ids)
  File "/home/ainhoa.vivel/anaconda3/envs/tl/lib/python3.9/site-packages/codecarbon/external/hardware.py", line 121, in from_utils
    return cls(gpu_ids=gpu_ids)
  File "<string>", line 4, in __init__
  File "/home/ainhoa.vivel/anaconda3/envs/tl/lib/python3.9/site-packages/codecarbon/external/hardware.py", line 63, in __post_init__
    self.devices = AllGPUDevices()
  File "/home/ainhoa.vivel/anaconda3/envs/tl/lib/python3.9/site-packages/codecarbon/core/gpu.py", line 186, in __init__
    gpu_device = GPUDevice(handle=handle, gpu_index=i)
  File "<string>", line 8, in __init__
  File "/home/ainhoa.vivel/anaconda3/envs/tl/lib/python3.9/site-packages/codecarbon/core/gpu.py", line 24, in __post_init__
    self.last_energy = self._get_energy_kwh()
  File "/home/ainhoa.vivel/anaconda3/envs/tl/lib/python3.9/site-packages/codecarbon/core/gpu.py", line 28, in _get_energy_kwh
    return Energy.from_millijoules(self._get_total_energy_consumption())
  File "/home/ainhoa.vivel/anaconda3/envs/tl/lib/python3.9/site-packages/codecarbon/core/gpu.py", line 95, in _get_total_energy_consumption
    return pynvml.nvmlDeviceGetTotalEnergyConsumption(self.handle)
  File "/home/ainhoa.vivel/anaconda3/envs/tl/lib/python3.9/site-packages/pynvml/nvml.py", line 2411, in nvmlDeviceGetTotalEnergyConsumption
    _nvmlCheckReturn(ret)
  File "/home/ainhoa.vivel/anaconda3/envs/tl/lib/python3.9/site-packages/pynvml/nvml.py", line 833, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError: System is not in ready state
Goal
It looks like this error is not caught properly. If it occurs, we should catch it and skip the GPU measurements.
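A minimal sketch of what the fix could look like, assuming the energy read is wrapped in a small helper. The name read_total_energy_mj and its reader parameter are hypothetical and not part of codecarbon; only pynvml.NVMLError and pynvml.nvmlDeviceGetTotalEnergyConsumption come from the traceback above. The stand-in exception class exists only so the sketch also runs on machines without pynvml installed.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)

try:
    import pynvml
    NVMLError = pynvml.NVMLError
except ImportError:
    pynvml = None

    class NVMLError(Exception):
        """Stand-in used only when pynvml is not installed."""


def read_total_energy_mj(
    handle: Any,
    reader: Optional[Callable[[Any], int]] = None,
) -> Optional[int]:
    """Return total GPU energy in millijoules, or None if NVML cannot provide it.

    Hypothetical helper: `reader` defaults to the pynvml call from the
    traceback and is injectable so the error path can be exercised without a GPU.
    """
    if reader is None:
        if pynvml is None:
            logger.debug("pynvml is not installed; skipping GPU energy measurement")
            return None
        reader = pynvml.nvmlDeviceGetTotalEnergyConsumption
    try:
        return reader(handle)
    except NVMLError as exc:
        # e.g. "System is not in ready state": log it and let the caller skip this GPU
        logger.warning("Skipping GPU energy measurement: %s", exc)
        return None
```

A caller such as GPUDevice.__post_init__ could then treat a None result as "no GPU measurement available" instead of letting the exception abort EmissionsTracker construction. Only NVMLError is caught here, so unrelated bugs still surface.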
Description
This is a follow-up issue discovered by #578.