mlco2 / codecarbon

Track emissions from Compute and recommend ways to reduce their impact on the environment.
https://mlco2.github.io/codecarbon
MIT License

Not able to record GPU Power/Energy consumed for GPUs #378

Closed: pugantsov closed this issue 1 year ago

pugantsov commented 1 year ago

Description

GPU power is not being tracked; the rest of the metrics are fine.

I've been using codecarbon for the last few months without any problems tracking metrics, but all of a sudden it no longer records GPU energy consumed (kWh) or GPU power (W).

Example output:

[codecarbon INFO @ 21:35:01] Energy consumed for RAM : 0.000787 kWh. RAM Power : 188.75854682922363 W
[codecarbon INFO @ 21:35:01] Energy consumed for all GPUs : 0.000000 kWh. All GPUs Power : 0.0 W
[codecarbon INFO @ 21:35:01] Energy consumed for all CPUs : 0.000775 kWh. All CPUs Power : 186.00187191683423 W
[codecarbon INFO @ 21:35:01] 0.001562 kWh of electricity used since the begining.
[codecarbon INFO @ 21:35:16] Energy consumed for RAM : 0.001573 kWh. RAM Power : 188.75854682922363 W
[codecarbon INFO @ 21:35:16] Energy consumed for all GPUs : 0.000000 kWh. All GPUs Power : 0.0 W
[codecarbon INFO @ 21:35:16] Energy consumed for all CPUs : 0.001548 kWh. All CPUs Power : 185.50437082241802 W
[codecarbon INFO @ 21:35:16] 0.003121 kWh of electricity used since the begining.
[codecarbon INFO @ 21:35:31] Energy consumed for RAM : 0.002359 kWh. RAM Power : 188.75854682922363 W
[codecarbon INFO @ 21:35:31] Energy consumed for all GPUs : 0.000000 kWh. All GPUs Power : 0.0 W
[codecarbon INFO @ 21:35:31] Energy consumed for all CPUs : 0.002323 kWh. All CPUs Power : 185.86241305233906 W
[codecarbon INFO @ 21:35:31] 0.004681 kWh of electricity used since the begining.
[codecarbon INFO @ 21:35:46] Energy consumed for RAM : 0.003145 kWh. RAM Power : 188.75854682922363 W
[codecarbon INFO @ 21:35:46] Energy consumed for all GPUs : 0.000000 kWh. All GPUs Power : 0.0 W
[codecarbon INFO @ 21:35:46] Energy consumed for all CPUs : 0.003088 kWh. All CPUs Power : 183.75305036532507 W
[codecarbon INFO @ 21:35:46] 0.006233 kWh of electricity used since the begining.
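If you are monitoring many runs, one way to catch this symptom early is to scan the log for GPU lines that never leave 0.0 W while RAM and CPU keep climbing. The following is a minimal sketch that assumes the exact GPU line format shown above; it is not part of codecarbon's API, and `gpu_readings` / `gpu_always_zero` are hypothetical helper names.

```python
import re

# Matches the per-interval GPU line as it appears in the log above, e.g.
# "Energy consumed for all GPUs : 0.000000 kWh. All GPUs Power : 0.0 W"
# Assumption: values are printed as plain decimals (no scientific notation).
GPU_LINE = re.compile(
    r"Energy consumed for all GPUs : (?P<kwh>[\d.]+) kWh\. "
    r"All GPUs Power : (?P<watts>[\d.]+) W"
)


def gpu_readings(log_text):
    """Return (kWh, W) tuples for every GPU line found in the log text."""
    return [
        (float(m.group("kwh")), float(m.group("watts")))
        for m in GPU_LINE.finditer(log_text)
    ]


def gpu_always_zero(log_text):
    """True if at least one GPU line was found and every one reports 0.0 W."""
    readings = gpu_readings(log_text)
    return bool(readings) and all(watts == 0.0 for _, watts in readings)


if __name__ == "__main__":
    sample = (
        "[codecarbon INFO @ 21:35:01] Energy consumed for all GPUs : "
        "0.000000 kWh. All GPUs Power : 0.0 W\n"
        "[codecarbon INFO @ 21:35:16] Energy consumed for all GPUs : "
        "0.000000 kWh. All GPUs Power : 0.0 W\n"
    )
    print(gpu_always_zero(sample))
```

Returning `False` when no GPU lines are found keeps "no GPU detected" distinct from "GPU detected but reading zero", which is the case in this issue.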

I'm using codecarbon through an explicit EmissionsTracker object (i.e. calling start() and stop()) to track model-related metrics on an NVIDIA TITAN RTX GPU. I don't think the problem is on the pynvml side, since Weights & Biases also uses pynvml and it does track GPU power draw in watts.

I wondered whether the consumption was simply too low to register (my code reads the kWh value from tracker._total_gpu_energy.kWh), but my other models report, for example, ~0.2 kW of GPU power and 0.0009264 kWh consumed on a dataset half the size. Since codecarbon's own log output also reports 0.0 at every 15-second interval, I don't think the problem is how I'm accessing the variables. Is this a known issue, or is there a workaround for the way I'm doing things?
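The "too low to register" explanation can be checked with energy = power × time. A minimal sketch, assuming constant power over each 15-second interval (the spacing of the timestamps in the log) and the six-decimal kWh precision the log prints; the helper names are hypothetical. It reproduces the RAM increments seen in the log, then shows that a ~0.2 kW GPU would add about 0.00083 kWh per interval and that anything above roughly 0.12 W would already print as nonzero after the first reading.

```python
# Worked check: would a real GPU draw be visible in the log above?
# Assumes constant power over each 15 s interval; helper names are hypothetical.

JOULES_PER_KWH = 3.6e6  # 1 kWh = 1000 W * 3600 s


def interval_energy_kwh(power_watts, seconds):
    """Energy in kWh for a constant power draw sustained for `seconds`."""
    return power_watts * seconds / JOULES_PER_KWH


def min_visible_watts(seconds, decimals=6):
    """Approximate smallest constant power whose energy over `seconds` still
    shows as nonzero when printed with `decimals` decimal places in kWh."""
    half_last_digit_kwh = 0.5 * 10 ** -decimals
    return half_last_digit_kwh * JOULES_PER_KWH / seconds


# RAM at ~188.76 W over 15 s -> ~0.000786 kWh, matching the per-interval
# increase of the RAM line in the log (0.000787 -> 0.001573 -> 0.002359).
print(round(interval_energy_kwh(188.75854682922363, 15), 6))
# A GPU at ~0.2 kW over the same interval would add ~0.000833 kWh.
print(round(interval_energy_kwh(200, 15), 6))
# Roughly 0.12 W is enough to print as nonzero after a single interval.
print(round(min_visible_watts(15), 3))
```

Since even an idle GPU draws far more than 0.12 W, a reading of exactly 0.000000 kWh points to the measurement not happening at all rather than to low consumption.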

pugantsov commented 1 year ago

Found the cause: pyNVML 11.5 was breaking the tracking of GPU-related metrics. Reverting to 11.4.1 fixed it.
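If you hit the same symptom, it may be worth checking which pyNVML version is installed before digging further; per this thread, 11.5 broke GPU tracking and pinning `pynvml==11.4.1` restored it. A minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+). It assumes the package is installed under the distribution name `pynvml`, and `is_reported_bad_pynvml` is a hypothetical helper that only flags the 11.5 series named here, since the thread says nothing about later releases.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(ver):
    """Turn '11.5.0' into (11, 5, 0); stops at the first non-numeric part."""
    parts = []
    for piece in ver.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_reported_bad_pynvml(ver):
    """True for the 11.5.x series reported in this thread to break GPU tracking.
    Assumption: later releases are not covered by the report, so not flagged."""
    return parse_release(ver)[:2] == (11, 5)


def check_installed_pynvml():
    """Describe the installed pynvml version (distribution name assumed 'pynvml')."""
    try:
        installed = version("pynvml")
    except PackageNotFoundError:
        return "pynvml is not installed"
    if is_reported_bad_pynvml(installed):
        return f"pynvml {installed} installed; 11.4.1 was reported to work"
    return f"pynvml {installed} installed"


if __name__ == "__main__":
    print(check_installed_pynvml())
```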