rivosinc / prometheus-slurm-exporter

Export select slurm metrics to prometheus
Apache License 2.0

[cli-fallback] squeue 22.05 NAbleTime Unmarshal error #26

Closed by jmcastelo 8 months ago

jmcastelo commented 8 months ago

When executing the exporter as: ./prometheus-slurm-exporter -slurm.cli-fallback

some error messages appear:

squeue --states=all -h -o {"a": "%a", "id": %A, "end_time": "%e", "u": "%u", "state": "%T", "p": "%P", "cpu": %C, "mem": "%m"}

squeue fallback parse error: failed on line 0 {"a": "(null)", "id": 18804, "end_time": "NONE", "u": "eskua", "state": "RUNNING", "p": "magma", "cpu": 24, "mem": "118G"}

job failed to parse with "parsing time \"NONE\" as \"2006-01-02T15:04:05\": cannot parse \"NONE\" as \"2006\""

Perhaps you need to adjust your parser to take into account this situation? Maybe we can debug together? Thanks!


I forgot to mention that I am using release v1.0.0, built with the go build command, against SLURM version 22.05.10.1-1 on Arch Linux.

abhinavDhulipala commented 8 months ago

Howdy! Thanks so much for putting this issue down! Frustratingly, Slurm seems to change its null placeholder between releases. Slurm 23.x reports N/A for a missing time, while Slurm 22.x appears to use the string NONE. We can fix this by adding that value to the none list in NAbleTime#UnmarshalJSON. Would you like to give this a crack?
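For anyone following along, here is a rough sketch of the kind of change being described. It is not the exporter's actual code: `NullableTime`, `nullTimeValues`, `job`, and `parseJob` are hypothetical names made up for illustration, and the only facts taken from this thread are the time layout `2006-01-02T15:04:05` and the two placeholders `N/A` and `NONE`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// slurmTimeLayout is the layout quoted in the parse error in this issue.
const slurmTimeLayout = "2006-01-02T15:04:05"

// nullTimeValues lists the placeholders treated as "no time set".
// Only "N/A" and "NONE" are mentioned in this thread; other Slurm
// versions may emit different strings.
var nullTimeValues = map[string]bool{
	"N/A":  true,
	"NONE": true,
}

// NullableTime is a hypothetical stand-in for the exporter's NAbleTime.
// A zero Time means squeue reported no value.
type NullableTime struct {
	time.Time
}

// UnmarshalJSON accepts either a known null placeholder or a timestamp
// in slurmTimeLayout. Anything else is returned as a parse error.
func (nt *NullableTime) UnmarshalJSON(data []byte) error {
	var raw string
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	if nullTimeValues[raw] {
		nt.Time = time.Time{}
		return nil
	}
	t, err := time.Parse(slurmTimeLayout, raw)
	if err != nil {
		return err
	}
	nt.Time = t
	return nil
}

// job holds only the fields needed for this illustration.
type job struct {
	ID      int          `json:"id"`
	EndTime NullableTime `json:"end_time"`
}

// parseJob decodes one line of the JSON-shaped squeue output.
func parseJob(line string) (job, error) {
	var j job
	err := json.Unmarshal([]byte(line), &j)
	return j, err
}

func main() {
	j, err := parseJob(`{"id": 18804, "end_time": "NONE"}`)
	fmt.Println(j.EndTime.IsZero(), err)
}
```

Keeping the placeholders in one lookup table means the next Slurm release that changes its null string only needs a one-line addition, and an unrecognized value still surfaces as an error rather than being silently dropped.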

jmcastelo commented 8 months ago

Hey, I am going to build that branch and check the results. I'll let you know if the issue is solved. Thanks!

jmcastelo commented 8 months ago

Alright, now running the exporter with debug log level shows no error messages. I guess the issue is solved with the changes on that branch. Thank you again!

abhinavDhulipala commented 8 months ago

Awesome! I'll merge it and cut a release.