mpellegrin / nagios-eventhandler-cachet

A Nagios event handler to push Nagios notifications to Cachet API
MIT License

Added updating of existing Incidents so new incidents are constantly … #15

Open rjr162 opened 7 years ago

rjr162 commented 7 years ago

…created with each new warning/critical

Added a metric update option by using the event_handler in the following fashion:

    event_handler   cachet_notify!<host> -m=true

The cachet_notify script checks for the presence of '-m=true' in the string, then takes the first 'word' (everything before the first space) as the component name.
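To illustrate the parsing described above, here is a minimal sketch in Python. It is not the code from this pull request; the function name `split_component_arg` and the sample component name `web01` are made up for the example.

```python
def split_component_arg(raw):
    """Split a handler argument such as 'web01 -m=true'.

    Returns (component_name, metric_enabled).
    Hypothetical helper, not part of cachet_notify itself.
    """
    # The metric option is considered enabled when '-m=true' appears anywhere.
    metric_enabled = "-m=true" in raw
    # The component name is the first 'word' before the first space.
    component = raw.split(" ", 1)[0]
    return component, metric_enabled


print(split_component_arg("web01 -m=true"))
print(split_component_arg("web01"))
```

A plain substring check like this would also match a component whose name happens to contain `-m=true`, so a stricter parser may be preferable in practice.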

rjr162 commented 7 years ago

It's been a while since the last merge and feedback. I think I have all the original issues resolved, and I added a metric component for anyone using metrics for page load times. The downside is that I don't think there's a way to tell Cachet not to auto-update at the set interval (the lowest value is 1), which can create a bit of wonkiness in the chart. That's really an issue on the Cachet auto-update side of things, combined with the fact that Nagios only fires the event handler when there's a status change. It may be better to use the Python Cachet utility for URL checking if you want something that just runs and updates the page load time metric on a consistent basis.

To use it, just add -m=true after the component name in the event handler. Give it a test if you wish and give some feedback. You may need to pre-create the metric in Cachet for it to work right. If it's too poor, no issues yanking that part out.

2Belette commented 7 years ago

It has been a while since I gave you feedback: I had no time to test and the server needed to be reinstalled, which is done now :) I have tested, but I have an issue: with -m=true I keep getting multiple events created instead of one event being updated.

I also tried testing with:

    ./cachet_notify 'host.fr' 'dispo' CRITICAL HARD 'test service down' -m=true

I got:

    KO HARD: creating incident
    Array
    (
        [name] => nagios dispo
        [message] => test service down
        [status] => 1
        [visible] => 1
        [component_id] => 5
        [component_status] => 4
        [notify] => 1
    )

But if I run:

    ./cachet_notify 'host.fr' 'dispo' OK HARD 'test service down' -m=true

I got:

    OK Hard: creating incident
    Array
    (
        [name] => nagios dispo
        [message] => test service down
        [status] => 4
        [visible] => 1
        [component_id] => 5
        [component_status] => 1
        [notify] => 1
    )
    OK HARD: updating incident
    Can't find incident "nagios dispo"

And on Cachet I still get two incidents created: one for the CRITICAL and one for the OK when it goes back to normal.
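For reference, the behaviour this pull request is aiming for (update the open incident rather than create a second one) can be sketched as a small decision function. This is only an illustration in Python, not the script's actual logic: the function name, the dictionary shape of `existing` (with `id`, `name` and `status` keys) and the "most recent unresolved incident wins" rule are all assumptions. The status values 1 and 4 are the ones shown in the output above.

```python
INCIDENT_FIXED = 4  # status sent with the OK event in the output above


def plan_incident_action(existing, name, new_status, message):
    """Decide whether to update an open incident or create a new one.

    `existing` is a list of dicts with 'id', 'name' and 'status' keys
    (an assumed shape for incidents already known to Cachet).
    Returns a tuple: (action, incident_id_or_None, payload).
    """
    payload = {"name": name, "message": message, "status": new_status}
    # Walk from the newest incident to the oldest (highest id first).
    for incident in sorted(existing, key=lambda i: i["id"], reverse=True):
        # Reuse the incident only if it has the same name and is not fixed yet.
        if incident["name"] == name and incident["status"] != INCIDENT_FIXED:
            return ("update", incident["id"], payload)
    return ("create", None, payload)


open_incidents = [{"id": 7, "name": "nagios dispo", "status": 1}]
print(plan_incident_action(open_incidents, "nagios dispo", 4, "test service down"))
print(plan_incident_action([], "nagios dispo", 1, "test service down"))
```

With this rule, the OK event following a CRITICAL one would resolve incident 7 instead of opening a new incident, provided the lookup by name succeeds.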

For Nagios alerts I get the same issue, or sometimes it doesn't update Cachet at all...

Any idea? many thanks

EDIT: I am still trying to understand the issue. In the meantime, I can confirm that your earlier pull request, which fixed going back to Normal status after CRITICAL or WARNING, seems to work well :)

Another thing: at the beginning of cachet_notify you test the number of parameters. I think this has to be extended to 7, as -m=true adds one more; I needed to change it to make it work.
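One way to avoid hard-coding two different parameter counts is to strip the optional flag before counting. The sketch below shows the idea in Python; it is not the check used in cachet_notify. The function name and the returned dictionary keys are hypothetical, and the five positional arguments are inferred from the test commands earlier in this thread.

```python
EXPECTED_POSITIONAL = 5  # host, service, state, state type, message


def parse_handler_args(argv):
    """Validate handler arguments, treating '-m=true' as optional.

    `argv` includes the script name as its first element.
    """
    args = list(argv[1:])  # drop the script name
    metric = "-m=true" in args
    if metric:
        args.remove("-m=true")  # do not count the flag as a positional
    if len(args) != EXPECTED_POSITIONAL:
        raise ValueError(
            "expected %d arguments, got %d" % (EXPECTED_POSITIONAL, len(args))
        )
    host, service, state, state_type, message = args
    return {
        "host": host,
        "service": service,
        "state": state,
        "state_type": state_type,
        "message": message,
        "metric": metric,
    }


print(parse_handler_args(
    ["./cachet_notify", "host.fr", "dispo", "CRITICAL", "HARD",
     "test service down", "-m=true"]
))
```

The same call without the trailing `-m=true` passes the check as well, with `metric` set to `False`.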

rjr162 commented 7 years ago

Hey!

I'll have to dig in and take a look. It's been so long since I had a chance to touch the code, I can't remember what I did lol.

I did think about cutting out the metrics code and resubmitting to keep things cleaner, and then maybe have another with the metrics code, although the way cachet defaults to a 1 or 0 for metrics at a set interval sort of screws up the flow/view... So the metrics part may not be worth it in the end.

I'll let you know when I get a chance to play with it again as I also have to finish up our info VM (haven't touched that really since the last update either)

Thanks!

-- Ron


2Belette commented 7 years ago

Thanks for your reply ;)

Another thing I am thinking about is that it would be useful to select which alerts we want to receive from Nagios. For example, I have "hacked" your script to exit(0) for Warning Soft, because otherwise Cachet receives too many false positives from Nagios on my installation.

It would be great to add a parameter such as -warning or -critical, where -warning includes both and -critical only critical alerts.
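The filter suggested above could look roughly like this. It is a Python sketch of the proposal only; no such option exists in the script as far as this thread shows. The function name `should_forward` and the `level` parameter are hypothetical, and letting OK events always through (so that open incidents can still be resolved) is an assumption.

```python
def should_forward(state, level="warning"):
    """Return True if a Nagios state should be pushed to Cachet.

    level="warning"  forwards WARNING and CRITICAL.
    level="critical" forwards CRITICAL only.
    OK is always forwarded (assumption: recoveries must reach Cachet).
    """
    if state == "OK":
        return True
    if level == "critical":
        return state == "CRITICAL"
    return state in ("WARNING", "CRITICAL")


for state in ("OK", "WARNING", "CRITICAL"):
    print(state, should_forward(state, "warning"), should_forward(state, "critical"))
```

A similar check on the state type (SOFT or HARD) would cover the Warning Soft case mentioned above.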

Just an idea

PS: I confirm the metrics are messed up, and Cachet keeps creating multiple events instead of updating the same one.

Many thanks :)