treydock / gpfs_exporter

Apache License 2.0
Feature request: add interval for running the gpfs commands #50

Closed creativefre closed 2 years ago

creativefre commented 2 years ago

Hello,

While testing the exporter and letting it run for a few hours, I noticed the following behavior: mmlsfileset, mmrepquota, and other commands get executed about every minute.

We have a large GPFS system with a lot of data, so those commands take longer to execute. Because they are started again every minute, a new run begins before the previous one has completed and then waits on it. The commands pile up, and eventually the exporter stops exporting new data. Setting the timeouts doesn't help in this situation, so the exporter is not usable for us as it stands.

Would it be possible to add a configurable interval, so that the commands are executed, for example, every 10 minutes rather than every minute?

Thank you.

treydock commented 2 years ago

The only way to adjust this is to change the scrape interval in Prometheus. At my site we run GPFS scrapes every 3 minutes. Anything longer than 4-5 minutes will produce problems for Prometheus as the data will be marked as stale so any alerts may clear and you will get false negatives.
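As a rough illustration of that setting, a Prometheus scrape job with a 3 minute interval could look like the fragment below. This is only a sketch: the job name and target host are made up, the port 9303 is assumed to be the exporter's listen port, and the timeout value is a guess that would need tuning against how long the GPFS commands actually take.

```yaml
scrape_configs:
  - job_name: gpfs                        # hypothetical job name
    scrape_interval: 3m                   # the interval suggested above
    scrape_timeout: 2m                    # assumption; must not exceed scrape_interval
    static_configs:
      - targets:
          - gpfs-node.example.com:9303    # hypothetical host, assumed port
```

Setting the interval per job keeps other, cheaper exporters on the global default.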

A few of the commands that are known to run long, like mmdf, have command line exporters that can write their metrics to the node_exporter textfile directory so that you can run them with cron. Our GPFS system is rather large and we've never had issues with mmlsfileset or mmrepquota taking a long time. Unfortunately for the exporter, it's not a good design to "cache" or otherwise not return fresh data when scraped by Prometheus.
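If you go the cron route, overlapping runs like the ones described in this issue can be avoided with a small wrapper. The following is a minimal sketch, not something shipped with the exporter: `run_exclusive` is a hypothetical helper name, and the lock file, time limit, and output path are whatever you pass in. It relies only on `flock` (util-linux) and `timeout` (coreutils).

```shell
#!/usr/bin/env bash
# run_exclusive LOCKFILE TIMEOUT OUTFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND unless a previous run still holds
# LOCKFILE, kills it after TIMEOUT, and replaces OUTFILE only on success.
run_exclusive() {
  local lock="$1" limit="$2" out="$3"
  shift 3
  local tmp
  # Temp file lives next to OUTFILE so the final mv is an atomic rename.
  tmp="$(mktemp "${out}.XXXXXX")" || return 1
  # flock -n fails immediately if another run holds the lock;
  # timeout ends a run that exceeds the limit (exit status 124).
  if flock -n "$lock" timeout "$limit" "$@" > "$tmp"; then
    chmod 644 "$tmp"   # mktemp creates 0600; make it readable for the collector
    mv "$tmp" "$out"
  else
    local rc=$?        # status of the failed flock/timeout/command
    rm -f "$tmp"
    return "$rc"
  fi
}
```

A cron job would then call `run_exclusive` with the command line exporter for the slow command and a `.prom` file inside the textfile directory as OUTFILE. A run that is skipped or times out leaves the previous file in place, so the data goes stale on disk instead of commands stacking up.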

So I'd suggest adjusting your scrape interval to 3 minutes and seeing if that solves the problem. You can also time the commands to verify that they are really what is slow. Examples:

# time /usr/lpp/mmfs/bin/mmlsfileset ess -Y | wc -l
390

real    0m0.475s
user    0m0.382s
sys     0m0.058s

# time /usr/lpp/mmfs/bin/mmrepquota -j -Y -a | wc -l
394

real    0m0.902s
user    0m0.360s
sys     0m0.075s