Closed RemiLacroix-IDRIS closed 1 year ago
Hello,
Thanks for reporting this. scontrol is used at tracker initialization to detect hardware values, so it is surprising that your execution context is spammed with it.
Could you please provide some codecarbon-related logs surrounding the scontrol calls, and the tracker instantiation context?
I think that's true for the count_cpus function
https://github.com/mlco2/codecarbon/blob/d99fe87604791ad667a4c0ec54a10f8182dea0ab/codecarbon/core/util.py#L77
but the slurm_memory_GB function is indirectly used by measure_power_and_energy, which is called in a loop:
https://github.com/mlco2/codecarbon/blob/d99fe87604791ad667a4c0ec54a10f8182dea0ab/codecarbon/external/hardware.py#L312
Hello,
Good point, feel free to propose a PR so that the slurm_memory_GB value is not recomputed each time. We, the CodeCarbon contributors, do not have access to a Slurm cluster.
Do you know what call rate is problematic? I ask because the default value of measure_power_secs is 15 seconds.
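For reference, here is a minimal sketch of what such caching could look like. It is not the actual CodeCarbon code: the class name CachedScontrol, its runner parameter, and the choice of `scontrol show job <id>` as the query are all assumptions made for illustration. The idea is simply to call scontrol once per job ID and reuse the raw output afterwards.

```python
import subprocess
from typing import Callable, Dict


def run_scontrol(job_id: str) -> str:
    """Run `scontrol show job <job_id>` once and return its raw text output.

    Assumption: this is the query whose result does not change during a run.
    """
    return subprocess.check_output(["scontrol", "show", "job", job_id], text=True)


class CachedScontrol:
    """Hypothetical helper that remembers scontrol output per job ID."""

    def __init__(self, runner: Callable[[str], str] = run_scontrol) -> None:
        # The runner is injectable so the cache can be exercised without Slurm.
        self._runner = runner
        self._cache: Dict[str, str] = {}

    def show_job(self, job_id: str) -> str:
        # Only the first request for a given job ID reaches the Slurm controller.
        if job_id not in self._cache:
            self._cache[job_id] = self._runner(job_id)
        return self._cache[job_id]
```

With something like this, a function called from the measurement loop would read from the cache instead of spawning scontrol at every measure_power_secs interval.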
Hello, this is fixed by today's release.
Thanks! :)
Hello,
We are facing some issues with CodeCarbon in our facility because it spams our Slurm controller with scontrol.
If rate-limiting is enabled, this also means that jobs using CodeCarbon are slowed down because they spend a lot of time waiting for scontrol to return.
As far as I could tell, the values that are read from scontrol won't change during the run, so they could be cached by CodeCarbon instead of being fetched every time.
We are willing to provide a PR if needed. If caching is used somewhere else, feel free to point us to the relevant code so that we use a similar implementation.
Rémi