mpellegrin / nagios-eventhandler-cachet

A Nagios event handler to push Nagios notifications to Cachet API
MIT License

Incident not updated when status is going from partial outage -> operational #12

Closed by 2Belette 1 month ago

2Belette commented 7 years ago

Hi, I tested this and it works when the status goes to critical and is then solved (operational). But if the status from Nagios is partial outage, my components stay in partial outage indefinitely...

I think https://github.com/mkh1973/nagios-eventhandler-cachet/commit/d156505bdb8db7048a749959cd641150919cc12b is a great update for supporting the new JSON API, but there is still an issue.

Has anyone been able to make Cachet and Nagios work together correctly? Thanks.

2Belette commented 7 years ago

Thanks Michael, I see you merged the typo fix on master, good idea :) Any idea about the incident not being closed? Have you been able to test, or do you want me to do further tests? (It's still an issue for me: all my components stay in partial outage for ages.)

rjr162 commented 7 years ago

Looking at the Nagios logs, it appears that check_http (that's what I'm testing with) doesn't send an "OK;HARD;" when the previous status was "CRITICAL;SOFT;", which is the state Nagios sets when only 1 or 2 checks in a row fail (the "soft" state). Cachet is updated to Performance Issues, but when I re-enable HTTP on the test box and force a Nagios check, the status is "OK;SOFT;", which leaves the component in the Performance Issues state. Nagios never seems to send "OK;HARD;", even after 3 successful checks.

My thought is to make OK;SOFT; act the same as OK;HARD;, at least for HTTP, unless someone better with Nagios knows how to get check_http to send an OK;HARD; after 3 successful checks.
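
A minimal sketch of that idea, written in Python for illustration only; it is not the handler's actual code. The function name `cachet_status` and the mapping for hard WARNING and CRITICAL states are assumptions. The numeric values are Cachet's component status codes (1 = Operational, 2 = Performance Issues, 3 = Partial Outage, 4 = Major Outage), and the two arguments correspond to the Nagios `$SERVICESTATE$` and `$SERVICESTATETYPE$` macros.

```python
# Cachet component status codes.
OPERATIONAL = 1
PERFORMANCE_ISSUES = 2
PARTIAL_OUTAGE = 3
MAJOR_OUTAGE = 4


def cachet_status(state, state_type):
    """Map a Nagios service state to a Cachet component status.

    Hypothetical helper: `state` is $SERVICESTATE$ (OK, WARNING, CRITICAL,
    UNKNOWN) and `state_type` is $SERVICESTATETYPE$ (SOFT or HARD).
    Returns None when the component should be left unchanged.
    """
    if state == "OK":
        # Treat OK;SOFT exactly like OK;HARD, so a recovery from a soft
        # problem state also resets the component to Operational.
        return OPERATIONAL
    if state_type == "SOFT":
        # Soft problem states show up as Performance Issues, as observed above.
        return PERFORMANCE_ISSUES
    if state == "WARNING":
        return PARTIAL_OUTAGE  # assumed mapping for hard WARNING
    if state == "CRITICAL":
        return MAJOR_OUTAGE  # assumed mapping for hard CRITICAL
    return None  # UNKNOWN or anything else: leave the component as it is


if __name__ == "__main__":
    for args in [("CRITICAL", "SOFT"), ("OK", "SOFT"), ("OK", "HARD")]:
        print(args, "->", cachet_status(*args))
```

The point of the design is that the OK branch ignores the state type entirely, so the handler does not depend on Nagios ever reporting OK;HARD after a soft failure.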

2Belette commented 7 years ago

Thanks for your suggestion and pull request, I will try to test it tomorrow. I remember running some tests on Nagios, trying to hack how Nagios sends the status, and I think I came to the same conclusion: either Nagios never sends the OK;HARD or Cachet never interprets it. Out of curiosity, do you have the same issue with a simple availability test from Nagios (ICMP)?