Open martialblog opened 2 years ago
Hi,
I reworked the PR https://github.com/vpenso/prometheus-slurm-exporter/pull/57 to be compatible with the current version. To keep this a minimal working version, I left out the GPU type; we can add it later.
Fixes #60
Tested on Slurm 20.11.9 with and without GRES.
    sinfo -h -N -O "NodeList: ,AllocMem: ,Memory: ,CPUsState: ,StateLong: ,Gres: ,GresUsed:"
    gpu-01 113440 187000 34/22/0/56 mixed gpu:tesla:4 gpu:tesla:4(IDX:0-3)
    gpu-02 80000 187000 8/48/0/56 mixed gpu:tesla:4 gpu:tesla:4(IDX:0-3)
    gpu-03 64000 187000 8/48/0/56 mixed gpu:tesla:4 gpu:tesla:4(IDX:0-3)
    gpu-04 36000 187000 6/50/0/56 mixed gpu:tesla:4 gpu:tesla:3(IDX:0,2-3)
    gpu-05 0 187000 0/56/0/56 idle gpu:tesla:4 gpu:tesla:0(IDX:N/A)
    gpu-06 12000 187000 2/54/0/56 mixed gpu:tesla:4 gpu:tesla:1(IDX:3)
    gpu-07 24000 187000 4/52/0/56 mixed gpu:tesla:4 gpu:tesla:2(IDX:1-2)
    gpu-08 48000 187000 8/48/0/56 mixed gpu:tesla:4 gpu:tesla:4(IDX:0-3)
    cpu-01 0 502000 0/56/0/56 idle (null) gpu:0
    cpu-02 0 502000 0/56/0/56 idle (null) gpu:0
    cpu-03 0 502000 0/56/0/56 idle (null) gpu:0
    cpu-04 0 502000 0/56/0/56 idle (null) gpu:0

    curl localhost:8080/metrics | grep gpu
    # HELP slurm_node_gpu_alloc Allocated GPUs per node
    # TYPE slurm_node_gpu_alloc gauge
    slurm_node_gpu_alloc{node="gpu-01",status="mixed"} 4
    slurm_node_gpu_alloc{node="gpu-02",status="mixed"} 4
    slurm_node_gpu_alloc{node="gpu-03",status="mixed"} 4
    slurm_node_gpu_alloc{node="gpu-04",status="mixed"} 3
    slurm_node_gpu_alloc{node="gpu-05",status="idle"} 0
    slurm_node_gpu_alloc{node="gpu-06",status="mixed"} 1
    slurm_node_gpu_alloc{node="gpu-07",status="mixed"} 2
    slurm_node_gpu_alloc{node="gpu-08",status="mixed"} 4
    # HELP slurm_node_gpu_total Total GPUs per node
    # TYPE slurm_node_gpu_total gauge
    slurm_node_gpu_total{node="gpu-01",status="mixed"} 4
    slurm_node_gpu_total{node="gpu-02",status="mixed"} 4
    slurm_node_gpu_total{node="gpu-03",status="mixed"} 4
    slurm_node_gpu_total{node="gpu-04",status="mixed"} 4
    slurm_node_gpu_total{node="gpu-05",status="idle"} 4
    slurm_node_gpu_total{node="gpu-06",status="mixed"} 4
    slurm_node_gpu_total{node="gpu-07",status="mixed"} 4
    slurm_node_gpu_total{node="gpu-08",status="mixed"} 4
    sinfo -h -N -O "NodeList: ,AllocMem: ,Memory: ,CPUsState: ,StateLong: ,Gres: ,GresUsed:"
    localhost 0 1 0/1/0/1 unknown* (null) (null)

    curl localhost:8080/metrics | grep gpu
    # empty
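For anyone reviewing, here is a rough sketch of how the `Gres` and `GresUsed` columns in the `sinfo` output map to the two gauges. It is an illustration only and not necessarily the code in this PR: `NodeGPU`, `stripParens`, `gpuCount` and `parseNodeLine` are hypothetical names. It assumes the seven whitespace-separated columns come in the order passed to `-O`, and that a node whose `Gres` column has no `gpu` entry should produce no GPU metrics, as in the outputs shown here.

```go
package main

import (
	"strconv"
	"strings"
)

// NodeGPU holds the per-node values behind slurm_node_gpu_total and
// slurm_node_gpu_alloc. Hypothetical type, for illustration only.
type NodeGPU struct {
	Node   string
	Status string
	Total  int
	Alloc  int
}

// stripParens removes parenthesised suffixes such as "(IDX:0,2-3)" so the
// commas inside them are not mistaken for GRES entry separators.
func stripParens(s string) string {
	var b strings.Builder
	depth := 0
	for _, r := range s {
		switch {
		case r == '(':
			depth++
		case r == ')':
			if depth > 0 {
				depth--
			}
		case depth == 0:
			b.WriteRune(r)
		}
	}
	return b.String()
}

// gpuCount sums the counts of all "gpu" entries in a Gres or GresUsed
// field, e.g. "gpu:tesla:4", "gpu:tesla:3(IDX:0,2-3)" or "gpu:0".
// found is false when the field has no gpu entry, e.g. "(null)".
func gpuCount(field string) (count int, found bool) {
	for _, entry := range strings.Split(stripParens(field), ",") {
		parts := strings.Split(strings.TrimSpace(entry), ":")
		if len(parts) < 2 || parts[0] != "gpu" {
			continue
		}
		// The count is the last colon-separated element; the GPU type,
		// if present, sits in between and is ignored here.
		n, err := strconv.Atoi(parts[len(parts)-1])
		if err != nil {
			continue
		}
		count += n
		found = true
	}
	return count, found
}

// parseNodeLine turns one sinfo row into a NodeGPU. ok is false for
// malformed rows and for nodes without GPUs in their Gres column.
func parseNodeLine(line string) (node NodeGPU, ok bool) {
	f := strings.Fields(line)
	if len(f) < 7 {
		return NodeGPU{}, false
	}
	total, hasGPU := gpuCount(f[5])
	if !hasGPU {
		return NodeGPU{}, false
	}
	alloc, _ := gpuCount(f[6]) // a missing GresUsed entry counts as 0
	return NodeGPU{Node: f[0], Status: f[4], Total: total, Alloc: alloc}, true
}
```

Feeding the `gpu-04` row through `parseNodeLine` gives `Total` 4 and `Alloc` 3, while the `cpu-01` and `localhost` rows are skipped because their `Gres` column is `(null)`. Dropping the type segment is what keeps this minimal; adding the GPU type later would mean keeping the middle element as an extra label.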
Let me know if I should change anything.
Cheers, Markus