creativefre closed this issue 2 years ago
The only way to adjust this is to change the scrape interval in Prometheus. At my site we run GPFS scrapes every 3 minutes. An interval longer than 4-5 minutes causes problems because Prometheus marks the data as stale, so firing alerts may clear and you will get false negatives.
A few of the commands that are known to run long, like mmdf, have command line exporters that can write their metrics to the node_exporter textfile directory, so you can run them from cron. Our GPFS system is rather large and we've never had issues with mmlsfileset or mmrepquota taking a long time. Unfortunately, it's not good design for an exporter to "cache" data or otherwise return anything other than fresh data when scraped by Prometheus.
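As a rough illustration of the cron plus textfile approach, here is a minimal sketch of a wrapper that runs a slow collector and publishes its output for the node_exporter textfile collector. The function name write_textfile_metrics and the lock-directory convention are my own assumptions, not part of the exporter. It skips a run if the previous one is still going and renames the finished file into place so a partial file is never read.

```shell
#!/bin/sh
# Hypothetical helper (not part of gpfs_exporter): run a slow collector
# from cron and publish its output via the node_exporter textfile collector.
# Usage: write_textfile_metrics OUTPUT_FILE COMMAND [ARGS...]
write_textfile_metrics() {
    out="$1"; shift
    lock="${out}.lock"

    # mkdir is atomic: if the lock directory already exists, a previous
    # run has not finished yet, so skip this run instead of piling up.
    if ! mkdir "$lock" 2>/dev/null; then
        echo "previous run still in progress, skipping" >&2
        return 0
    fi

    # Write to a temporary file in the same directory, then rename it.
    # The textfile collector only reads *.prom files, so the .tmp file
    # is ignored until the rename makes it visible in one step.
    tmp="${out}.$$.tmp"
    if "$@" > "$tmp"; then
        mv "$tmp" "$out"
        status=0
    else
        rm -f "$tmp"
        status=1
    fi

    rmdir "$lock"
    return "$status"
}
```

From a cron script you would call it with the output file first and the collector command after it, for example write_textfile_metrics /var/lib/node_exporter/textfile_collector/gpfs_mmdf.prom followed by whatever command prints the metrics. The directory shown is an assumed location and must match the textfile directory node_exporter is configured with. If a run is killed before it finishes, the lock directory is left behind and has to be removed by hand.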
So I'd suggest adjusting your scrape interval to 3 minutes and seeing if that solves the problem. You can also time the commands to verify that they are really what is slow. Examples:
# time /usr/lpp/mmfs/bin/mmlsfileset ess -Y | wc -l
390
real 0m0.475s
user 0m0.382s
sys 0m0.058s
# time /usr/lpp/mmfs/bin/mmrepquota -j -Y -a | wc -l
394
real 0m0.902s
user 0m0.360s
sys 0m0.075s
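To compare a command's run time against a limit such as your scrape interval, something like the following sketch could help. The helper name check_duration is made up for illustration, and it only measures whole seconds.

```shell
#!/bin/sh
# Hypothetical helper: check_duration LIMIT_SECONDS COMMAND [ARGS...]
# Runs COMMAND, discards its output, and reports whether its wall-clock
# time stayed within LIMIT_SECONDS (for example, the scrape interval).
check_duration() {
    limit="$1"; shift
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    elapsed=$((end - start))

    if [ "$elapsed" -gt "$limit" ]; then
        echo "SLOW: $* took ${elapsed}s (limit ${limit}s)"
        return 1
    fi
    echo "OK: $* took ${elapsed}s (limit ${limit}s)"
    return 0
}
```

For example, check_duration 60 /usr/lpp/mmfs/bin/mmrepquota -j -Y -a would flag the command if it needs more than a minute. The 60 is only an assumed limit, so substitute your own scrape interval or timeout.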
Hello,
While testing the exporter and letting it run for a few hours, I noticed the following behavior: mmlsfileset, mmrepquota and other commands get executed about every minute.
We have a large GPFS system with a lot of data, which causes those commands to take a bit longer to execute. The commands get stuck because they are executed again each minute while the previous execution has not yet completed, so each new command waits on the earlier one. Eventually the exporter stops exporting new data. Setting the timeouts doesn't help in this situation. As it stands, the exporter is not usable for us.
Would it be possible to add a configurable interval, so that the commands are executed, for example, every 10 minutes rather than every minute?
Thank you.