Closed gabrieleiannetti closed 3 years ago
Thanks, I only added the try/catch because I had 1 HCA with the error and could not test the previous changes; it wasn't reproducible once the counters were reset, since I did not keep the output of ibqueryerrors.
Hi,
you ran into it before I could create an issue for this :-).
The problem I see with the error handling implemented in https://github.com/guilbaults/infiniband-exporter/pull/33 is that just catching the error and printing it will not help us much, because we need to be able to recognize that something went wrong.
We can handle it as you did in the above implementation and not crash the exporter.
But then we need to signal that something went wrong in the exporter; for that I have set the `scrape_ok` metric to 0. That way we can check whether that metric was set to 0 on non-critical errors during exporter scrapes.
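
To illustrate the idea, here is a minimal sketch of catching a non-critical error without crashing while still flagging it through `scrape_ok`. It is not the exporter's actual code: `ParsingError`, `parse_line`, `scrape`, and the `name=value` input format are all hypothetical, and a plain dict stands in for the real Prometheus metric objects.

```python
import logging


class ParsingError(Exception):
    """Hypothetical error for a line that cannot be parsed."""


def parse_line(line):
    # Hypothetical parser: expects "name=value" with a non-negative integer value.
    name, sep, value = line.partition("=")
    if not sep or not value.strip().isdigit():
        raise ParsingError(f"cannot parse line: {line!r}")
    return name.strip(), int(value)


def scrape(lines):
    # A plain dict stands in for the real metric objects (assumption).
    metrics = {}
    scrape_ok = 1
    for line in lines:
        try:
            name, value = parse_line(line)
        except ParsingError as err:
            # Do not crash the scrape, but remember that something went wrong.
            logging.error("%s", err)
            scrape_ok = 0
            continue
        metrics[name] = value
    # Expose the flag so that monitoring can detect the failed parse.
    metrics["scrape_ok"] = scrape_ok
    return metrics
```

With this shape, a scrape that hits one bad line still returns the counters it could parse, and a check on `scrape_ok == 0` picks up the problem.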