mlco2 / codecarbon

Track emissions from Compute and recommend ways to reduce their impact on the environment.
https://mlco2.github.io/codecarbon
MIT License
1k stars 157 forks

Adding support for CUDA_VISIBLE_DEVICES #567

Closed: lipengfeizju closed this issue 2 weeks ago

lipengfeizju commented 3 weeks ago

Description

On our university's cluster, we want to measure the energy consumption of a deep learning model. The server uses SLURM, and we are allocated only 1 A100 (out of 8 GPUs). The GPU power reported by codecarbon covers all 8 GPUs instead of only the GPU we have been allocated.

Technically speaking, I guess that line 184 of codecarbon/core/gpu.py queries all GPUs from NVML instead of only the GPU we are actually using. To get a more accurate measurement, it would be better to look up only the power consumption of the devices listed in CUDA_VISIBLE_DEVICES.

A similar discussion of this topic can also be found here in pynvml.

So, is it possible to add a feature that restricts measurements to CUDA_VISIBLE_DEVICES? I think this is important for deep learning applications, since the non-visible devices are usually unrelated to the power consumption of the DL application.
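
To make the request concrete, here is a minimal sketch of reading power for a chosen subset of GPUs rather than for every device NVML reports. This is not codecarbon's implementation: the helper names (total_power_watts, nvml_read_milliwatts, measure_selected_gpus) are hypothetical, and the sketch assumes the given indices match NVML's device numbering, which is usually the case only when CUDA_DEVICE_ORDER=PCI_BUS_ID.

```python
def total_power_watts(indices, read_milliwatts):
    """Sum the power draw of the selected GPU indices, in watts.

    `read_milliwatts` is any callable mapping a GPU index to a reading
    in milliwatts, so the summing logic can be tried without a GPU.
    """
    return sum(read_milliwatts(i) for i in indices) / 1000.0


def nvml_read_milliwatts(index):
    """Read one GPU's current power draw through pynvml (milliwatts)."""
    import pynvml  # imported lazily: only needed on a machine with NVIDIA GPUs

    handle = pynvml.nvmlDeviceGetHandleByIndex(index)
    return pynvml.nvmlDeviceGetPowerUsage(handle)


def measure_selected_gpus(indices):
    """Initialize NVML, read only the selected GPUs, then shut NVML down."""
    import pynvml

    pynvml.nvmlInit()
    try:
        return total_power_watts(indices, nvml_read_milliwatts)
    finally:
        pynvml.nvmlShutdown()
```

On a node where the job was allocated GPU 3, calling measure_selected_gpus([3]) would report that GPU's draw only, leaving the other seven out of the total.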

Thank you again for providing the code base for carbon measurement.

inimaz commented 3 weeks ago

Hello @lipengfeizju! Thanks for using codecarbon. I didn't know about the CUDA_VISIBLE_DEVICES variable. If they end up implementing this in pynvml, it would indeed be useful to use it.

In the meantime, codecarbon can filter the GPUs it tracks if you provide their gpu_ids: https://github.com/mlco2/codecarbon/blob/master/codecarbon/emissions_tracker.py#L191 So maybe you can look up the ids of the GPUs you want to track with nvidia-smi and do:

from codecarbon import EmissionsTracker

tracker = EmissionsTracker(
    ...
    gpu_ids="0,3,4",  # only track the GPUs with these ids
)

Is this what you need?

lipengfeizju commented 3 weeks ago

Thanks! That's exactly what I need.

lipengfeizju commented 3 weeks ago

Sorry to reopen the issue. Is it possible to measure the power of a few specific CPU cores, maybe the same way we do with the GPU ids?

benoit-cty commented 2 weeks ago

Maybe we could initialize gpu_ids with os.environ.get("CUDA_VISIBLE_DEVICES")?
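
A rough sketch of that idea, outside of codecarbon itself: the helper below (gpu_ids_from_env is a hypothetical name, not an existing codecarbon function) turns CUDA_VISIBLE_DEVICES into the comma-separated string used in the gpu_ids example above. It assumes the variable holds plain integer indices; when it is unset, empty, or contains other identifiers such as GPU UUIDs, the helper returns None so the caller can fall back to the default behavior.

```python
import os


def gpu_ids_from_env(environ=None):
    """Build a gpu_ids string such as "0,3,4" from CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset, empty, or contains anything
    other than plain integer indices (for example GPU UUIDs).
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("CUDA_VISIBLE_DEVICES")
    if not raw:
        return None  # unset or empty: no filter is derived
    entries = [entry.strip() for entry in raw.split(",")]
    if not all(entry.isdecimal() for entry in entries):
        return None  # UUIDs or malformed entries: leave the choice to the caller
    return ",".join(entries)
```

A caller would then pass the result as gpu_ids to EmissionsTracker only when it is not None. As with nvidia-smi ids, this assumes the CUDA indices line up with NVML's numbering, which usually requires CUDA_DEVICE_ORDER=PCI_BUS_ID.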

For the CPU cores, could you open another issue and provide the codecarbon debug logs? It's not possible yet, but maybe we could find a way to do it.

lipengfeizju commented 2 weeks ago

Thanks!