mlco2 / codecarbon

Track emissions from Compute and recommend ways to reduce their impact on the environment.

https://mlco2.github.io/codecarbon

MIT License

1.01k stars, 158 forks

feat: get gpu energy consumption directly from pynvml #401

Closed: inimaz closed this pull request 11 months ago

inimaz commented 1 year ago

Until now, GPU energy consumption has been computed as gpu_power * time, where gpu_power is sampled once every measure_power_secs. The problem with this approach is that consumption spikes occurring between two samples go unnoticed.
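To make the sampling problem concrete, here is a small pure-Python simulation (not codecarbon code). It assumes a hypothetical 30 s power trace at 1 s resolution, with a 5 s spike to 250 W on a 100 W baseline, and a measure_power_secs of 15. Integrating the full trace gives the true energy, while the power * time approach only sees the samples taken at t = 0 s and t = 15 s.

```python
def true_energy_joules(power_w, dt_s):
    """Integrate a finely resolved power trace (watts, step dt_s seconds) into joules."""
    return sum(p * dt_s for p in power_w)


def sampled_energy_joules(power_w, dt_s, measure_power_secs):
    """Old approach: read power once every measure_power_secs and multiply by that interval."""
    step = int(round(measure_power_secs / dt_s))
    return sum(power_w[i] * measure_power_secs for i in range(0, len(power_w), step))


# Hypothetical 30 s trace at 1 s resolution: 100 W baseline, 250 W spike from t=5 s to t=9 s.
trace = [100.0] * 30
for t in range(5, 10):
    trace[t] = 250.0

print(true_energy_joules(trace, dt_s=1.0))    # 3750.0
print(sampled_energy_joules(trace, 1.0, 15))  # 3000.0
```

In this scenario the sampled estimate is 3000 J against a true 3750 J: the whole spike falls between two samples and is never seen.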

Goal of this PR

It is preferable to read the energy consumption directly from pynvml instead. This PR does that.
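A minimal sketch of the counter-based approach follows. NVML exposes a cumulative energy counter through nvmlDeviceGetTotalEnergyConsumption, which reports millijoules since the driver was last reloaded; the energy for an interval is then the difference between two readings, so nothing that happens between samples is lost. The helper names read_gpu_total_energy_mj and interval_energies_joules are hypothetical, not codecarbon's API, and the demo feeds hand-written counter readings instead of querying a real GPU.

```python
from typing import List, Sequence


def read_gpu_total_energy_mj(gpu_index: int = 0) -> int:
    """Cumulative energy of one GPU in millijoules, as reported by NVML."""
    import pynvml  # imported lazily so the demo below runs without a GPU or pynvml

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        return pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    finally:
        pynvml.nvmlShutdown()


def interval_energies_joules(cumulative_mj: Sequence[int]) -> List[float]:
    """Energy per interval (J) from successive readings of a cumulative counter (mJ)."""
    return [(later - earlier) / 1000.0 for earlier, later in zip(cumulative_mj, cumulative_mj[1:])]


# Hand-written counter readings taken 15 s apart; the counter does not start at 0.
readings_mj = [5_000_000, 7_250_000, 8_750_000]
print(interval_energies_joules(readings_mj))  # [2250.0, 1500.0]
```

Any spike inside an interval is already accumulated in the counter, so it shows up in that interval's difference regardless of when the readings are taken.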

benoit-cty commented 11 months ago

I tested it with and without gpu_burn and it works well.

We have a problem with the first measurement: the estimated power is higher than expected. With my GTX 1080 Ti we estimate 323 W for the first measurement, then 245 W. The maximum power specified for my card is 250 W, so I suspect we initialize the energy counter some time after we start _last_measured_time. I tried moving it to the end of https://github.com/mlco2/codecarbon/blob/6e5df712d512966f6177a5672d3e60e1548439d9/codecarbon/emissions_tracker.py#L409 but it didn't change anything.

By the way, this is probably not related to this PR; we may have the same problem with the CPU.
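One way to see why a misaligned initialization would inflate only the first measurement: if power is derived as energy delta divided by time delta, and the energy baseline is captured earlier than the time baseline, the first energy delta covers a longer window than the time delta it is divided by. The sketch below assumes a card drawing a constant 245 W and a hypothetical 3 s gap between the two baselines; it illustrates the mechanism and is not a reproduction of the 323 W reading.

```python
def apparent_power_w(true_power_w, energy_baseline_s, time_baseline_s, measure_s):
    """Power computed as energy delta / time delta when the two baselines may differ.

    The energy delta really covers [energy_baseline_s, measure_s], but it is divided
    by the time delta [time_baseline_s, measure_s].
    """
    energy_delta_j = true_power_w * (measure_s - energy_baseline_s)
    time_delta_s = measure_s - time_baseline_s
    return energy_delta_j / time_delta_s


# Constant 245 W; energy baseline taken at t=0 s, time baseline at t=3 s (hypothetical gap).
print(apparent_power_w(245.0, 0.0, 3.0, 18.0))    # 294.0: first measurement inflated
print(apparent_power_w(245.0, 18.0, 18.0, 33.0))  # 245.0: later intervals share one baseline
```

After the first measurement both baselines are updated at the same moment, which matches the observation that only the first value is off.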

inimaz commented 11 months ago

it's probably not relative to this PR and we may have the same problem with CPU

@benoit-cty Do you think this blocks the merge, or should we merge it and open an issue for this?

PS: I initialize the last energy to 0 at https://github.com/mlco2/codecarbon/blob/6b151244887e29acf8f3cec48c5d8e6e32ba2959/codecarbon/core/gpu.py#L40 and maybe that is not correct: there may be an initial boost at startup or something like that. I could take a measurement there to set the initial last_energy.
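The alternative inimaz suggests, seeding last_energy with an actual reading rather than 0, can be sketched as below. It assumes the counter is cumulative (NVML documents nvmlDeviceGetTotalEnergyConsumption as energy since the driver was last reloaded), so a zero seed attributes everything consumed before tracking started to the first interval. EnergyDelta, its initial_last_mj parameter and fake_counter are hypothetical names, and the counter is faked with fixed readings: 5,000,000 mJ when tracking starts and 7,250,000 mJ at the first measurement.

```python
from typing import Callable, Optional


class EnergyDelta:
    """Joules consumed between successive calls, from a cumulative counter in mJ."""

    def __init__(self, read_total_mj: Callable[[], int], initial_last_mj: Optional[int] = None) -> None:
        self._read_total_mj = read_total_mj
        # None: seed with a real reading now; otherwise use the given value (e.g. 0).
        self.last_mj = read_total_mj() if initial_last_mj is None else initial_last_mj

    def delta_joules(self) -> float:
        current_mj = self._read_total_mj()
        delta_mj = current_mj - self.last_mj
        self.last_mj = current_mj
        return delta_mj / 1000.0


def fake_counter(*readings_mj: int) -> Callable[[], int]:
    """Stand-in for the NVML counter that returns the given readings in order."""
    it = iter(readings_mj)
    return lambda: next(it)


# Seeded meter reads the counter at start (5_000_000) and at the first measurement.
seeded = EnergyDelta(fake_counter(5_000_000, 7_250_000))
# Zero-seeded meter only reads at the first measurement.
zeroed = EnergyDelta(fake_counter(7_250_000), initial_last_mj=0)
print(seeded.delta_joules())  # 2250.0
print(zeroed.delta_joules())  # 7250.0: includes the 5000 J consumed before tracking started
```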

benoit-cty commented 11 months ago

Maybe we could implement a start for the GPU, like we do for the CPU: https://github.com/mlco2/codecarbon/blob/6b151244887e29acf8f3cec48c5d8e6e32ba2959/codecarbon/external/hardware.py#L204C26-L204C26

EDIT: I just pushed a change that adds a start for the GPU, because currently we store the energy in __post_init__ and not when monitoring starts.
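A start step along the lines described above could look like the sketch below: the constructor (the counterpart of __post_init__) reads nothing, and start() captures the energy counter and the timestamp together, so the first energy delta and the first time delta cover the same window. GpuPowerMeter, its clock parameter and the fake readings are hypothetical; in practice read_total_mj would wrap nvmlDeviceGetTotalEnergyConsumption for one device.

```python
import time
from typing import Callable, Optional, Tuple


class GpuPowerMeter:
    """Energy and average power per interval from a cumulative energy counter (mJ)."""

    def __init__(self, read_total_mj: Callable[[], int],
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Nothing is read here: construction may happen long before tracking starts.
        self._read_total_mj = read_total_mj
        self._clock = clock
        self._last_mj: Optional[int] = None
        self._last_t: Optional[float] = None

    def start(self) -> None:
        """Capture the energy and time baselines together when monitoring starts."""
        self._last_mj = self._read_total_mj()
        self._last_t = self._clock()

    def measure(self) -> Tuple[float, float]:
        """Return (energy in J, average power in W) since start() or the previous call."""
        if self._last_mj is None or self._last_t is None:
            raise RuntimeError("start() must be called before measure()")
        current_mj = self._read_total_mj()
        now = self._clock()
        energy_j = (current_mj - self._last_mj) / 1000.0
        power_w = energy_j / (now - self._last_t)
        self._last_mj, self._last_t = current_mj, now
        return energy_j, power_w


# Fake counter and clock: start() at t=10 s, first measurement at t=25 s.
energy_readings = iter([1_000_000, 4_675_000])
clock_readings = iter([10.0, 25.0])
meter = GpuPowerMeter(lambda: next(energy_readings), clock=lambda: next(clock_readings))
meter.start()
print(meter.measure())  # (3675.0, 245.0)
```

Injecting the counter and the clock keeps the sketch testable without a GPU; a monotonic clock is used by default so that wall-clock adjustments cannot distort the time delta.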