jenningsloy318 / redfish_exporter

exporter to get metrics from redfish based hardware such as lenovo/dell/superc servers
Apache License 2.0

Redfish_exporter container crashes after some time it scrapes data #62

Open Hydrapozza opened 1 year ago

Hydrapozza commented 1 year ago

Hello there!

I have an issue using the exporter. Sometimes, for an unknown reason, the exporter stops scraping data. Here is a screenshot showing _scrape_duration_seconds{job="redfish"}: [image]

All I know is that sometimes the container is still running, but with very low CPU usage, and Prometheus reports a "server misbehaving" error for each redfish target because the scrape duration has been exceeded. When I check the container logs, everything seems to be all right; the process just stops scraping at some point without returning any error.

At other times, the container doesn't respond anymore and can't be restarted, stopped, or killed. I then have to restart the Docker service with systemctl to delete it, so I can't check the container logs.

In all cases, the exporter doesn't scrape anymore...

Does anybody have an idea of what is wrong and how to fix it?

jenningsloy318 commented 1 year ago

Is the redfish exporter process still running when this occurs? Did you try using curl to query the redfish-exporter endpoint?
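For anyone reproducing this, the same check can be run with a hard time limit, so that a hung exporter shows up as a timeout instead of a stuck terminal. The following is a minimal sketch and not part of redfish_exporter: `probe` is a hypothetical helper name, and the URL in the usage note is an assumption about the deployment.

```python
import http.client
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 10.0) -> dict:
    """Fetch `url` once and report whether it answered within `timeout` seconds.

    `probe` is a hypothetical helper for illustration only.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
            return {
                "ok": True,
                "status": resp.status,
                "bytes": len(body),
                "seconds": time.monotonic() - start,
            }
    except urllib.error.HTTPError as exc:
        # The server answered, but with an HTTP error status (4xx/5xx).
        return {
            "ok": False,
            "status": exc.code,
            "error": str(exc),
            "seconds": time.monotonic() - start,
        }
    except (OSError, http.client.HTTPException) as exc:
        # Connection refused, DNS failure, timeout, or a broken response.
        return {
            "ok": False,
            "status": None,
            "error": str(exc),
            "seconds": time.monotonic() - start,
        }
```

A call might look like `probe("http://localhost:9610/redfish?target=10.0.0.1", timeout=30)`, where the host, port, path, and target are placeholders to replace with the real exporter address. A result with `status` set to `None` and `seconds` close to the timeout suggests the process is hung rather than returning an error.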

Hydrapozza commented 1 year ago

I've tried to curl it and to exec into the container, but the container doesn't respond. Only ping works.

I think the container crashes because there is a lot of equipment to scrape. I've noticed that when I split the workload across several redfish exporters, they no longer crash.
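The workaround described above, spreading the targets over several exporter instances, can be sketched as a simple round-robin assignment. This is only an illustration under stated assumptions: `split_targets` is a hypothetical helper, and the exporter addresses and BMC hostnames in the usage note are made up. The thread does not say how the workload was actually split.

```python
from typing import Dict, List


def split_targets(targets: List[str], exporters: List[str]) -> Dict[str, List[str]]:
    """Assign each target to one exporter instance, round-robin.

    Hypothetical helper: targets are sorted first so the assignment is
    stable between runs as long as the two lists do not change.
    """
    if not exporters:
        raise ValueError("at least one exporter instance is required")
    groups: Dict[str, List[str]] = {exporter: [] for exporter in exporters}
    for index, target in enumerate(sorted(targets)):
        exporter = exporters[index % len(exporters)]
        groups[exporter].append(target)
    return groups
```

For example, `split_targets(["bmc-3", "bmc-1", "bmc-2"], ["exporter-a:9610", "exporter-b:9610"])` assigns `bmc-1` and `bmc-3` to the first instance and `bmc-2` to the second. Each group can then become its own Prometheus scrape job pointing at the matching exporter.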