rivosinc / prometheus-slurm-exporter

Export select slurm metrics to prometheus
Apache License 2.0

[wrapper] sum children resource #37

Closed abhinavDhulipala closed 7 months ago

abhinavDhulipala commented 7 months ago

Sum the resource usage of child processes and fix the cpu_percent calculation.
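As a rough illustration of the summing step, the sketch below folds child-process samples into one record reported under the parent's pid. It assumes each per-process sample carries the same numeric fields as the dry-run output in this PR. `ProcSample` and `sum_tree` are hypothetical names, not taken from wrappers/proctrac.py, and the numbers are made up.

```python
from dataclasses import dataclass, asdict


@dataclass
class ProcSample:
    """One process's usage at a sampling instant (hypothetical structure)."""
    pid: int
    cpus: float        # CPU percent; 100.0 means one fully busy core
    threads: int
    mem: int           # bytes
    read_bytes: int
    write_bytes: int


def sum_tree(parent: ProcSample, children: list[ProcSample]) -> ProcSample:
    """Fold the children's usage into one sample reported under the parent's pid."""
    members = [parent, *children]
    return ProcSample(
        pid=parent.pid,
        cpus=round(sum(p.cpus for p in members), 1),
        threads=sum(p.threads for p in members),
        mem=sum(p.mem for p in members),
        read_bytes=sum(p.read_bytes for p in members),
        write_bytes=sum(p.write_bytes for p in members),
    )


# Made-up numbers: a shell wrapper that sits idle while its child does the work.
shell = ProcSample(pid=100, cpus=0.0, threads=1, mem=1_000_000, read_bytes=0, write_bytes=0)
worker = ProcSample(pid=101, cpus=99.5, threads=1, mem=500_000, read_bytes=0, write_bytes=0)
total = sum_tree(shell, [worker])
print(asdict(total))
```

In this made-up case, sampling only the wrapper process would report the shell's near-zero usage, which is presumably why the children need to be included.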

Test

Dry run with the following commands; the final line is the resulting JSON output:

echo '#!/bin/bash' > tmp.sh && echo 'timeout 60 yes >/dev/null' >> tmp.sh
chmod +x tmp.sh
python wrappers/proctrac.py --jobid 26515966 --dry-run --endpoint http://endpoint:9092/trace --sample-rate 2 --cmd ./tmp.sh
{"pid": 41445, "cpus": 109.8, "threads": 2, "mem": 3932160, "read_bytes": 0, "write_bytes": 0, "job_id": 26515966, "username": "codespace", "hostname": "codespaces-0e6327"}
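To check a dry-run line programmatically, one option is to parse it as JSON and confirm the expected fields are present. This is a minimal sketch: `parse_trace_line` and `EXPECTED_KEYS` are hypothetical helpers, and the key set is simply copied from the sample output above rather than from any documented schema.

```python
import json

# Keys observed in the sample dry-run output (an assumption, not a documented schema).
EXPECTED_KEYS = {
    "pid", "cpus", "threads", "mem", "read_bytes",
    "write_bytes", "job_id", "username", "hostname",
}


def parse_trace_line(line: str) -> dict:
    """Parse one JSON trace line and fail loudly if a field is missing."""
    sample = json.loads(line)
    missing = EXPECTED_KEYS - sample.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return sample


line = (
    '{"pid": 41445, "cpus": 109.8, "threads": 2, "mem": 3932160, '
    '"read_bytes": 0, "write_bytes": 0, "job_id": 26515966, '
    '"username": "codespace", "hostname": "codespaces-0e6327"}'
)
sample = parse_trace_line(line)
print(sample["job_id"], sample["cpus"], sample["threads"])
```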