mlco2 / codecarbon

Track emissions from Compute and recommend ways to reduce their impact on the environment.
https://mlco2.github.io/codecarbon
MIT License
1.18k stars 178 forks

CodeCarbon spams Slurm controller with `scontrol` #447

Closed RemiLacroix-IDRIS closed 1 year ago

RemiLacroix-IDRIS commented 1 year ago

Hello,

We are facing some issues with CodeCarbon in our facility because it spams our Slurm controller with `scontrol` calls.

If rate-limiting is enabled, jobs that use CodeCarbon are also slowed down, because they spend a lot of time waiting for `scontrol` to return.

As far as I can tell, the values read from `scontrol` do not change during a run, so CodeCarbon could cache them instead of fetching them every time.
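A minimal sketch of the caching idea, not CodeCarbon's actual implementation: the output of a command is stored the first time it is requested and reused afterwards. `CommandCache` and `run_command` are hypothetical names introduced for illustration, and the `scontrol show job` invocation in the usage note is an assumption about which call would be cached.

```python
import subprocess
from typing import Callable, Dict, Sequence, Tuple


def run_command(cmd: Sequence[str]) -> str:
    """Run a command and return its standard output as text."""
    completed = subprocess.run(
        list(cmd), capture_output=True, text=True, check=True
    )
    return completed.stdout


class CommandCache:
    """Hypothetical helper: run each distinct command at most once."""

    def __init__(self, runner: Callable[[Sequence[str]], str] = run_command):
        # The runner is injectable so the cache can be exercised
        # on a machine without Slurm installed.
        self._runner = runner
        self._results: Dict[Tuple[str, ...], str] = {}

    def output(self, *cmd: str) -> str:
        # Only the first request for a given command reaches the runner;
        # later requests are served from memory.
        if cmd not in self._results:
            self._results[cmd] = self._runner(cmd)
        return self._results[cmd]
```

With a single shared instance, a call such as `cache.output("scontrol", "show", "job", job_id)` would reach the Slurm controller once per job, however many times the measurement loop asks for it. This only makes sense for values that stay constant for the lifetime of the job.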

We are willing to provide a PR if needed. If caching is used somewhere else, feel free to point us to the relevant code so that we use a similar implementation.

Rémi

SaboniAmine commented 1 year ago

Hello, thanks for reporting this. `scontrol` is used at tracker initialization to detect hardware values, so it is surprising that your execution context is spammed with it. Could you please provide some codecarbon-related logs surrounding the `scontrol` calls, and the tracker instantiation context?

RemiLacroix-IDRIS commented 1 year ago

I think that's true for the `count_cpus` function: https://github.com/mlco2/codecarbon/blob/d99fe87604791ad667a4c0ec54a10f8182dea0ab/codecarbon/core/util.py#L77

But the `slurm_memory_GB` function is indirectly used by `measure_power_and_energy`, which is called in a loop: https://github.com/mlco2/codecarbon/blob/d99fe87604791ad667a4c0ec54a10f8182dea0ab/codecarbon/external/hardware.py#L312

benoit-cty commented 1 year ago

Hello, good point. Feel free to propose a PR so that the `slurm_memory_GB` value is not computed each time. We, the CodeCarbon contributors, do not have access to a Slurm cluster.

Do you know what rate is problematic? The default value of `measure_power_secs` is 15 seconds.
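To put rough numbers on that question, here is a small sketch of the aggregate load on the controller. It assumes each tracked process issues one `scontrol` call per measurement; that figure and the function name are assumptions for illustration, not measured values.

```python
def scontrol_calls_per_second(
    num_processes: int,
    measure_power_secs: float = 15.0,
    calls_per_measurement: int = 1,
) -> float:
    """Estimate the cluster-wide scontrol call rate.

    Assumes every process measures on the same interval and issues
    `calls_per_measurement` scontrol calls per measurement.
    """
    return num_processes * calls_per_measurement / measure_power_secs


# Example: 1500 concurrently tracked processes at the default interval.
print(scontrol_calls_per_second(1500))  # 100.0
```

Under these assumptions a 15-second interval is harmless for a single job, but the rate grows linearly with the number of tracked processes running at the same time.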

benoit-cty commented 1 year ago

Hello, this is fixed by today's release.

RemiLacroix-IDRIS commented 1 year ago

> Hello, this is fixed by today's release.

Thanks! :)