First of all, thank you for your great work on this check_nwc_health plugin!
I was wondering whether you could find some time to let me know your thoughts on the following:
check_nwc_health release 7.12.1.2
CentOS 7 host (kernel 3.10.0-957.10.1.el7.x86_64)
I might be missing something, but in mode "cpu-load" on Linux hosts, when I try to modify the thresholds for the load averages, I do not get the expected behaviour:
1) If I set the load average thresholds to 0, alerting on the load averages is not disabled:
CRITICAL - load-1 is 21.67 (1 min Load Average too high (= 21.67)), load-5 is 24.43 (5 min Load Average too high (= 24.43)), load-15 is 23.53 (15 min Load Average too high (= 23.53)), cpu (total): 71.20%, user: 61.25%, system: 9.85%, nice: 0.00%, wait: 0.10%, kernel: 0.00%, interrupt: 0.00% | 'cpu_usage'=71.20%;95;99;0;100 'user_usage'=61.25%;95;99;0;100 'system_usage'=9.85%;95;99;0;100 'nice_usage'=0%;95;99;0;100 'wait_usage'=0.10%;95;99;0;100 'kernel_usage'=0%;95;99;0;100 'interrupt_usage'=0%;95;99;0;100 'load-1'=21.67;0;0;; 'load-5'=24.43;0;0;; 'load-15'=23.53;0;0;;
2) If I set the load average thresholds to a value greater than 12 (40, for example), any load average above 12 still triggers an alert, even though it is below the configured threshold of 40:
CRITICAL - load-1 is 27.41 (1 min Load Average too high (= 27.41)), load-5 is 24.75 (5 min Load Average too high (= 24.75)), load-15 is 23.89 (15 min Load Average too high (= 23.89)), cpu (total): 71.53%, user: 61.87%, system: 9.60%, nice: 0.00%, wait: 0.05%, kernel: 0.00%, interrupt: 0.00% | 'cpu_usage'=71.53%;95;99;0;100 'user_usage'=61.87%;95;99;0;100 'system_usage'=9.60%;95;99;0;100 'nice_usage'=0%;95;99;0;100 'wait_usage'=0.05%;95;99;0;100 'kernel_usage'=0%;95;99;0;100 'interrupt_usage'=0%;95;99;0;100 'load-1'=27.41;40;40;; 'load-5'=24.75;40;40;; 'load-15'=23.89;40;40;;
=> It looks like there is some kind of hardcoded upper limit for the load average thresholds, which is 12 in my case.
Note: this Linux VM has 6 CPU cores, so there may be a correlation, such as a hardcoded load average limit of twice the number of cores.
3) However, if I set the load average thresholds to a value greater than 0 but lower than 12 (the "hardcoded limit" mentioned in point 2), the configured thresholds are enforced normally (i.e. a load average lower than 12 but greater than the configured threshold correctly triggers an alert).
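
To make the hypothesis in points 1 to 3 concrete, here is a minimal Python sketch of the behaviour as I observe it. It is not taken from the plugin's source: the function name `is_alerting`, the `factor` of 2 and the "minimum of configured threshold and cap" rule are all my assumptions, chosen only because they reproduce the three observations on a 6-core host.

```python
def is_alerting(load, configured, cores, factor=2.0):
    """Hypothetical model of the observed behaviour (an assumption, not plugin code).

    The effective threshold appears to be the lower of the configured
    threshold and a cap of factor * cores (12 on a 6-core host).
    """
    cap = factor * cores
    effective = min(configured, cap)
    return load > effective


CORES = 6  # the VM in this report

# (load average, configured threshold) pairs
cases = [
    (21.67, 0),   # point 1: threshold 0 does not disable alerting
    (27.41, 40),  # point 2: alerts although the load is below 40
    (10.0, 8),    # point 3: configured threshold below 12 is honoured
    (5.0, 8),     # point 3: load below the configured threshold, no alert
]

for load, configured in cases:
    state = "alert" if is_alerting(load, configured, CORES) else "ok"
    print(f"load={load} configured={configured} -> {state}")
```

Note that point 1 alone cannot tell a cap apart from a plain "0 means alert above 0" rule; it is point 2 that suggests the cap.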
Thank you in advance for your feedback on this.
Regards, Olivier