Closed fessyfoo closed 7 years ago
TODO need to take this into consideration.
@decklin @portertech any thoughts on this?
@bobtfish @solarkennedy I'd be interested in your opinions as well, as we're using your pagerduty handler and are trying to integrate acks from pagerduty into stashing in sensu which currently would result in pagerduty incidents not getting automatically resolved because the resolve event is filtered.
Yea, we've talked about this at Yelp a bit, and somehow we have gotten away without acks all this time?
I wouldn't "mind" an ack primitive, but I would want it to be on the same playing field as native silences and not something that handlers have to "opt into".
I do agree that if there is to be an ack primitive it needs to be handled differently than a normal silence.
Hi @fessyfoo,
Thanks for opening this PR, I can tell significant effort went into these changes and really appreciate the effort you put in to write tests and thoroughly describe your use case.
[...] which currently would result in pagerduty incidents not getting automatically resolved because the resolve event is filtered
I think the filter_repeated
method is biting you here. The existing filter methods are being deprecated; I've blogged about this here https://sensuapp.org/blog/2016/07/07/sensu-plugin-filter-deprecation.html and have opened #136 to implement the deprecation.
Because silencing is moving into Sensu Core as a primitive, and because I believe it is in the best interests of the Sensu project to remove filtering from this library, I think adding a new feature of the existing filter_silenced
method is probably the wrong thing to do right now.
Thank you very much for your feedback, @cwjohnston I solidly understand the desire to move filtering out of sensu-handler. I'm all for that effort.
you are right filter_repeated
looks like it would also filter out the resolution event, in some conditions. IMO It shouldn't filter resolves. #142
That's not what i'm getting bitten by though. Silencing filters out the resolve event as well
In the case of silencing, I think there are two valid use cases:
The silencing implemented in sensu-handler.rb
and presumably to be accommodated in https://github.com/sensu/sensu/issues/1328 filters out the resolution event. and that's the key difference between what i'm calling an acknowledgment vs downtime. In a pagerduty example, if the resolution event is silenced by a stash then you can't automatically issue a resolve to pagerduty. when the on-call intended the silence as aknowledgement not getting the resolution event is frustrating.
I think both acknowledgment and downtime forms of silence should be first class citizens. in both the old and new forms of silencing.
I've commented in https://github.com/sensu/sensu/issues/1328#issuecomment-228484884 which brings silencing into the core and thankfully so have you. I very much hope that the idea of acknowledgment can be accommodated there.
I believe it is in the best interests of the Sensu project to remove filtering from this library, I think adding a new feature of the existing filter_silenced method is probably the wrong thing to do right now
I understand that view. I agree with it.
I do feel that filter_silenced
is serving as the reference implementation for things that the built in silencing will accommodate. So depending on the timeline for the release of built in silencing I would like to make one more plea for merging this. :)
updated to use expire_on_resolve: true
matching built in silencing's proposed implementation [ yes I know this is likely not to get merged. :D ]
noting for clarity that the support for this feature as expire_on_resolve
is included in the proposal https://github.com/sensu/sensu/issues/1328 and pending implementation https://github.com/sensu/sensu/pull/1400 of built in silencing.
I'm closing this issue as the silencing implementation with expire_on_resolve
shipped in 0.26!
short version: if a stash has
type: "acknowledgment"expire_on_resolve: true
then when the event is resolved delete the stash and handle the event.I propose people need two workflows with regards to silencing alerts.
right now sensu-handler only supports the notion of downtime with it's silence stashes. this has a few problems related to an acknowledgment workflow
This commit introduces acknowledgments by way of the type field in the normal /silence/* stashes if the type field is set to 'acknowledgment' and the event action is resolve then the stash will be deleted and the event processed.
a note on testing:
I wasn't sure how to test this. I did my best to imitate the current style of tests which is essentially stubbing and mocking via a subclass in an external script and signaling back expectations via stdout. mine still ended up more complicated than any existing ones.
I also tried to just use minitest and new() up a handler object making use of Sensu::Handler.disable_autorun but I got lost in the world of minitest mocks. if someone thinks that's better let me know I can try again.
An alternate abandoned approach:
I initially considered having acknowledgments as their own namespace in the stashes, such as /acknowledgment/:client/:name, however I decided that this would not have backwards compatibility with existing systems checking for silenced alerts and would add to the number of api calls made per handled event.