I monitor multiple remote servers, and connections to them regularly time out, which isn't a problem in itself. The problem is that some of these servers have failed hardware components, and the replacement hardware often takes a few days to arrive. During that time I "Acknowledge" the problem in Icinga2. But when a connection times out, Icinga marks the check as "Warning", which resets the acknowledgement. When the check connects successfully again, it re-detects the same hardware failure. This happens multiple times a day, so I have to "Acknowledge" the same error over and over.
The only way I can think of to deal with this is to have check_ipmi_sensor report "OK" when the connection times out. Is this possible?
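To make the idea concrete, here is a rough sketch of the kind of wrapper I have in mind. Everything in it is an assumption on my part: the wrapper itself is hypothetical, it is not a feature of check_ipmi_sensor, and the marker strings are only my guess at what the plugin prints on a timeout. I also haven't verified whether Icinga would keep the acknowledgement once the state flips to "OK".

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: run a check command and report OK if it timed out."""
import subprocess
import sys

# Assumption: substrings that appear in the plugin output on a connection
# timeout. Adjust these to match what the plugin actually prints.
TIMEOUT_MARKERS = ("timeout", "timed out")

OK = 0  # standard monitoring-plugin exit code for "OK"


def remap_status(exit_code, output, markers=TIMEOUT_MARKERS):
    """Return (exit_code, output), downgraded to OK if the output looks like a timeout."""
    lowered = output.lower()
    if exit_code != OK and any(marker in lowered for marker in markers):
        return OK, "OK - connection timed out, result ignored: " + output.strip()
    return exit_code, output


def main(argv):
    # argv[1:] is the real check command and its arguments, passed through unchanged.
    proc = subprocess.run(
        argv[1:],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    code, text = remap_status(proc.returncode, proc.stdout)
    print(text.strip())
    return code


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

The check command in Icinga would then call this wrapper with the original check_ipmi_sensor command line as its arguments. The obvious downside is that a host that is really unreachable would also show as "OK".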
Is there a better way to deal with this problem that I'm not thinking of?
Thanks for any help you can give.