You may have a look at https://forums.developer.nvidia.com/t/bug-nvml-incorrectly-detects-certain-gpus-as-unsupported/30165/4 (and https://github.com/CFSworks/nvml_fix ).
There seems to be a way to patch the Nvidia code to support GPUs that they apparently don't want to support :exploding_head:
Description
I am trying to reproduce the results of Luccioni, Sasha, et al. "Power Hungry Processing: Watts Driving the Cost of AI Deployment?" The 2024 ACM Conference on Fairness, Accountability, and Transparency, ACM, 2024, https://doi.org/10.1145/3630106.3658542, as baseline results for my research thesis. When I run this script from the provided code repo, https://github.com/sashavor/co2_inference/blob/main/code/qa/qa_squadv2.py, I get the following error:
{ "name": "NVMLError_NotSupported", "message": "Not Supported", "stack": "--------------------------------------------------------------------------- NVMLError_NotSupported Traceback (most recent call last) Cell In[7], line 38 36 for model in qa_models: 37 print(model) ---> 38 tracker = EmissionsTracker(project_name=model, measure_power_secs=1, logging_logger=_logger, output_file='./qa_squadv2.csv') 39 tracker.start() 40 tracker.start_task(\"load model\")
File /lib/python3.10/site-packages/codecarbon/emissions_tracker.py:284, in BaseEmissionsTracker.init(self, project_name, measure_power_secs, api_call_interval, api_endpoint, api_key, output_dir, output_file, save_to_file, save_to_api, save_to_logger, logging_logger, save_to_prometheus, prometheus_url, gpu_ids, emissions_endpoint, experiment_id, experiment_name, co2_signal_api_token, tracking_mode, log_level, on_csv_write, logger_preamble, default_cpu_power, pue) 282 if gpu.is_gpu_details_available(): 283 logger.info(\"Tracking Nvidia GPU via pynvml\") --> 284 gpu_devices = GPU.from_utils(self._gpu_ids) 285 self._hardware.append(gpu_devices) 286 gpu_names = [n[\"name\"] for n in gpu_devices.devices.get_gpu_static_info()]
File /lib/python3.10/site-packages/codecarbon/external/hardware.py:121, in GPU.from_utils(cls, gpu_ids) 119 @classmethod 120 def from_utils(cls, gpu_ids: Optional[List] = None) -> \"GPU\": --> 121 return cls(gpu_ids=gpu_ids)
File:4, in init(self, gpu_ids)
File/lib/python3.10/site-packages/codecarbon/external/hardware.py:63, in GPU.post_init(self)
62 def post_init(self):
---> 63 self.devices = AllGPUDevices()
64 self.num_gpus = self.devices.device_count
65 self._total_power = Power(
66 0 # It will be 0 until we call for the first time measure_power_and_energy
67 )
File/lib/python3.10/site-packages/codecarbon/core/gpu.py:208, in AllGPUDevices.init(self)
206 for i in range(self.device_count):
207 handle = pynvml.nvmlDeviceGetHandleByIndex(i)
--> 208 gpu_device = GPUDevice(handle=handle, gpu_index=i)
209 self.devices.append(gpu_device)
File:8, in init(self, handle, gpu_index, energy_delta, power, last_energy)
File/lib/python3.10/site-packages/codecarbon/core/gpu.py:46, in GPUDevice.post_init(self)
45 def post_init(self):
---> 46 self.last_energy = self._get_energy_kwh()
47 self._init_static_details()
File/lib/python3.10/site-packages/codecarbon/core/gpu.py:50, in GPUDevice._get_energy_kwh(self)
49 def _get_energy_kwh(self):
---> 50 return Energy.from_millijoules(self._get_total_energy_consumption())
File/lib/python3.10/site-packages/codecarbon/core/gpu.py:117, in GPUDevice._get_total_energy_consumption(self)
113 def _get_total_energy_consumption(self):
114 \"\"\"Returns total energy consumption for this GPU in millijoules (mJ) since the driver was last reloaded
115 https://docs.nvidia.com/deploy/nvml-api/group__nvmlDeviceQueries.html#group__nvmlDeviceQueries_1g732ab899b5bd18ac4bfb93c02de4900a
116 \"\"\"
--> 117 return pynvml.nvmlDeviceGetTotalEnergyConsumption(self.handle)
File/lib/python3.10/site-packages/pynvml/nvml.py:2411, in nvmlDeviceGetTotalEnergyConsumption(handle)
2409 fn = _nvmlGetFunctionPointer(\"nvmlDeviceGetTotalEnergyConsumption\")
2410 ret = fn(handle, byref(c_millijoules))
-> 2411 _nvmlCheckReturn(ret)
2412 return c_millijoules.value
File/watts/lib/python3.10/site-packages/pynvml/nvml.py:833, in _nvmlCheckReturn(ret)
831 def _nvmlCheckReturn(ret):
832 if (ret != NVML_SUCCESS):
--> 833 raise NVMLError(ret)
834 return ret
NVMLError_NotSupported: Not Supported" }
It seems that the issue is that the GPU I am using (an NVIDIA GeForce GTX Titan X) does not support nvmlDeviceGetTotalEnergyConsumption.
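For reference, the failure can be reproduced outside CodeCarbon with pynvml alone. This is only a minimal sketch and assumes the Titan X is device index 0:

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumption: the Titan X is GPU 0
print(pynvml.nvmlDeviceGetName(handle))

try:
    # Same NVML call that CodeCarbon makes in GPUDevice._get_total_energy_consumption
    energy_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    print(f"Total energy since driver reload: {energy_mj} mJ")
except pynvml.NVMLError_NotSupported:
    print("nvmlDeviceGetTotalEnergyConsumption is not supported on this GPU")
finally:
    pynvml.nvmlShutdown()
```

On cards where the counter is unsupported, the except branch fires instead of printing a value, which matches the traceback above.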
Are there any workarounds that you are aware of?
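In case it helps the discussion, one fallback I could imagine is integrating instantaneous power readings over time instead of relying on the total-energy counter. This is only a sketch, not how CodeCarbon works internally: it assumes nvmlDeviceGetPowerUsage still returns valid readings on this card, and estimate_energy_kwh is a hypothetical helper name.

```python
import time
import pynvml

def estimate_energy_kwh(duration_s: float, interval_s: float = 1.0, gpu_index: int = 0) -> float:
    """Rough GPU energy estimate obtained by integrating instantaneous power over time.

    Sketch only: assumes nvmlDeviceGetPowerUsage (milliwatts) works on the card,
    even though the total-energy counter is unsupported.
    """
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        energy_j = 0.0
        elapsed = 0.0
        while elapsed < duration_s:
            power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
            time.sleep(interval_s)
            energy_j += power_w * interval_s  # accumulate W * s = J
            elapsed += interval_s
        return energy_j / 3.6e6  # J -> kWh
    finally:
        pynvml.nvmlShutdown()
```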