rivosinc / prometheus-slurm-exporter

Export select Slurm metrics to Prometheus
Apache License 2.0

[bug] handle N/A free memory node value #15

Closed: abhinavDhulipala closed this 9 months ago

abhinavDhulipala commented 9 months ago

Handle N/A values gracefully, or simply drop poorly formatted metrics.
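A minimal sketch of the "drop poorly formatted metrics" option, assuming sinfo output arrives as one JSON object per line (as in the log below). The `sinfoNode` struct and `parseNodesLenient` helper are hypothetical and not the exporter's actual types: each line is decoded independently, and a line that fails to decode is logged and skipped instead of failing the whole scrape.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"strings"
)

// sinfoNode is a hypothetical struct mirroring a few keys seen in the
// sinfo output; the real exporter's types may differ.
type sinfoNode struct {
	Hostname string  `json:"n"`
	State    string  `json:"s"`
	Mem      float64 `json:"mem"`
	FreeMem  float64 `json:"fmem"`
}

// parseNodesLenient decodes one JSON object per line. Lines that fail to
// decode are logged and dropped, so one bad node does not hide the others.
func parseNodesLenient(out string) []sinfoNode {
	var nodes []sinfoNode
	for i, line := range strings.Split(strings.TrimSpace(out), "\n") {
		var n sinfoNode
		if err := json.Unmarshal([]byte(line), &n); err != nil {
			log.Printf("dropping malformed sinfo line %d: %v", i, err)
			continue
		}
		nodes = append(nodes, n)
	}
	return nodes
}

func main() {
	out := `{"s": "idle", "mem": 770000, "n": "cs155", "fmem": 512000}
{"s": "completing", "mem": 770000, "n": "cs156", "fmem": N/A}`
	nodes := parseNodesLenient(out)
	fmt.Println(len(nodes), nodes[0].Hostname)
}
```

The trade-off is that a node with a single `N/A` field disappears from every metric for that scrape, not just from free memory.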

resolves #17

abhinavDhulipala commented 9 months ago

In response to the following error logs on prod:

"Failed to parse node metrics: sinfo failed to parse line 0: {\"s\": \"completing\", \"mem\": 770000, \"n\": \"cs156\", \"l\": N/A, \"p\": \"hw\", \"fmem\": N/A, \"cstate\": \"56/8/0/64\", \"w\": 1}"
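The line fails to decode because `N/A` appears unquoted for `l` and `fmem`, which is not valid JSON. A minimal sketch of the "handle gracefully" option, assuming the same one-object-per-line format: rewrite bare `N/A` values to JSON `null` before decoding, and use pointer fields so an unknown value decodes to `nil` instead of a misleading `0`. The `bareNA` pattern, `parseNodeLine` helper, and struct are hypothetical, and treating `l` as CPU load is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// bareNA matches an unquoted N/A used as a JSON value, e.g. `"fmem": N/A,`.
// A quoted "N/A" string is not matched because of its leading quote.
var bareNA = regexp.MustCompile(`:\s*N/A\s*([,}])`)

// sinfoNode is hypothetical. Pointer fields decode JSON null to nil, so
// "unknown" can be told apart from a real zero.
type sinfoNode struct {
	Hostname string   `json:"n"`
	State    string   `json:"s"`
	Mem      float64  `json:"mem"`
	Load     *float64 `json:"l"`    // assumed to be CPU load
	FreeMem  *float64 `json:"fmem"` // free memory; nil when sinfo reports N/A
}

// parseNodeLine rewrites bare N/A values to null, then decodes the line.
// Keys not present in the struct (p, cstate, w) are ignored by encoding/json.
func parseNodeLine(line string) (sinfoNode, error) {
	var n sinfoNode
	fixed := bareNA.ReplaceAllString(line, ": null$1")
	err := json.Unmarshal([]byte(fixed), &n)
	return n, err
}

func main() {
	line := `{"s": "completing", "mem": 770000, "n": "cs156", "l": N/A, "p": "hw", "fmem": N/A, "cstate": "56/8/0/64", "w": 1}`
	n, err := parseNodeLine(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(n.Hostname, n.FreeMem == nil, n.Load == nil)
}
```

With this approach the collector could skip only the free-memory (and load) samples for `cs156` while still reporting its state and total memory. The regex is a simple text rewrite, so it assumes no quoted field value itself contains a `: N/A,` sequence.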