mlco2 / codecarbon

Track emissions from Compute and recommend ways to reduce their impact on the environment.
https://mlco2.github.io/codecarbon
MIT License

Unexpected output after running `scontrol show job $SLURM_JOBID` to count SLURM-available RAM #462

Closed benoit-cty closed 7 months ago

benoit-cty commented 8 months ago

Description

Use CodeCarbon on a SLURM cluster.

What I Did

CodeCarbon emitted a warning:

[codecarbon WARNING @ 11:21:51] Unexpected output after running `scontrol show job $SLURM_JOBID` to count SLURM-available RAM. Using the machine's total RAM.
[codecarbon INFO @ 11:21:51] Energy consumed for RAM : 0.000293 kWh. RAM Power : 70.18182134628296 W
[codecarbon WARNING @ 11:21:51] Unexpected output after running `scontrol show job $SLURM_JOBID` to count SLURM-available RAM. Using the machine's total RAM.

Here is the output of the command:

scontrol show job $SLURM_JOBID
JobId=XXXX JobName=gpu-jupyterhub
   UserId=XXXX GroupId=XXXX MCS_label=N/A
   Priority=255342 Nice=0 Account=puk@v100 QOS=qos_gpu-t3
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:33:42 TimeLimit=08:00:00 TimeMin=N/A
   SubmitTime=2023-10-23T10:45:25 EligibleTime=2023-10-23T10:45:25
   AccrueTime=2023-10-23T10:45:25
   StartTime=2023-10-23T10:45:35 EndTime=2023-10-23T18:45:35 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-10-23T10:45:35 Scheduler=Main
   Partition=gpu_p13 AllocNode:Sid=idrsrv12-ib0:500994
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=r13i5n0
   BatchHost=r13i5n0
   NumNodes=1 NumCPUs=64 NumTasks=1 CPUs/Task=32 ReqB:S:C:T=0:0:*:1
   ReqTRES=cpu=32,mem=128G,node=1,billing=40,gres/gpu=4
   AllocTRES=cpu=64,mem=128G,node=1,billing=40,gres/gpu=4
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=32 MinMemoryCPU=4G MinTmpDiskNode=0
   Features=v100-16g DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
   WorkDir=/linkhome/rech/gendxh01/uei48xr
   StdErr=/linkhome/rech/gendxh01/uei48xr/jupyterhub_slurm.err
   StdIn=/dev/null
   StdOut=/linkhome/rech/gendxh01/uei48xr/jupyterhub_slurm.out
   Power=
   TresPerNode=gres:gpu:4
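
For illustration, the job's allocated memory could be read from the `AllocTRES` line of the output above. This is a minimal sketch, not CodeCarbon's actual implementation: the function name `parse_alloc_mem_gb` and the unit table are hypothetical, and it assumes the `mem=` value always carries a K/M/G/T suffix as it does in this job.

```python
import re
from typing import Optional

# Conversion factors to gigabytes (binary multiples); an assumption of this sketch.
_UNITS_TO_GB = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}


def parse_alloc_mem_gb(scontrol_output: str) -> Optional[float]:
    """Return the memory allocated to the job in GB, or None if it is not found.

    Hypothetical helper: it only inspects the AllocTRES field, so a ReqTRES
    line that also contains `mem=` does not cause an ambiguous match.
    """
    match = re.search(
        r"AllocTRES=\S*?\bmem=(\d+(?:\.\d+)?)([KMGT])", scontrol_output
    )
    if match is None:
        # Caller can fall back to the machine's total RAM, as the warning describes.
        return None
    value, unit = match.groups()
    return float(value) * _UNITS_TO_GB[unit]


if __name__ == "__main__":
    sample = (
        "   ReqTRES=cpu=32,mem=128G,node=1,billing=40,gres/gpu=4\n"
        "   AllocTRES=cpu=64,mem=128G,node=1,billing=40,gres/gpu=4\n"
    )
    print(parse_alloc_mem_gb(sample))
```

Anchoring the pattern on `AllocTRES=` rather than on a bare `mem=` is a design choice for this sketch: the output above contains `mem=128G` twice (in `ReqTRES` and `AllocTRES`), and a parser expecting a single match would have to decide which one to trust.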

See also https://github.com/mlco2/codecarbon/issues/447

benoit-cty commented 7 months ago

Fixed in today's release.