sensu-plugins / sensu-plugin

A framework for writing Sensu plugins & handlers with Ruby.
http://sensuapp.org
MIT License
126 stars 117 forks source link

Deprecate event filtering in Sensu::Handler #136

Closed cwjohnston closed 8 years ago

cwjohnston commented 8 years ago

Description

The filtering methods in this library are computationally expensive and confusing to users. See http://bit.ly/sensu-plugin for more detail.

This change adds logic which honors the check.enable_deprecated_filtering and check.enable_deprecated_occurrence_filtering event attributes:

When check results are to be processed by the filter_repeated method (enabled by default), a warning will be logged indicating that sensu-plugin event occurrence filtering is deprecated.

The filter_disabled, filter_dependencies and filter_silenced methods will continue to be applied to events.

Motivation and Context

As documented in #134 and my blog post at http://bit.ly/sensu-plugin, deprecating and eventually removing the filtering behavior of sensu-plugin will transfer more control to operators instead of trying to cover all cases via this library.

How Has This Been Tested?

Added tests to validate that:

Edit: I've updated the description of this PR to reflect that we've narrowed the focus of this change to the event occurrence filtering in the filter_repeated method.

Edit: I've updated the description of this PR again to reflect the more granular approach we're now proposing for deprecating these filter behaviors. See https://github.com/sensu-plugins/sensu-plugin/commit/7d089fc762ffa198a997b02f74bf2d5b851cfa51

cwjohnston commented 8 years ago

cc @sensu-plugins/commit-bit for review

CVTJNII commented 8 years ago

:+1: for making it optional and having the default match current behavior. This option will make the situation I"m hitting in https://github.com/sensu/sensu/issues/1341 much easier to deal with.

cwjohnston commented 8 years ago

Marking as WIP again as we think we'll be keeping the dependency filtering here.

mattyjones commented 8 years ago

@cwjohnston Any reason why it is now being left in the plugin vs brought into Sensu core?

cwjohnston commented 8 years ago

@mattyjones in our internal discussions @portertech has said he's 👎 on moving filter_dependencies logic into Sensu Core; the implementation in this library is not a pattern we want to encourage by adopting in Core.

Regarding filter_silenced, we're planning to implement first-class support for silencing. That effort will include an extension in Sensu Core to filter silenced events. Even with this in place, the filter_silenced method will probably need to stay around in order to give folks time to update any custom tooling for the new API.

mattyjones commented 8 years ago

Just so I am clear on the roadmap for my stuff, what concerning filters is actually going to be in core. Specifically I am looking at intervals and occurrences.

Thanks!

On Wed, Jul 13, 2016 at 7:18 PM, Cameron Johnston notifications@github.com wrote:

@mattyjones https://github.com/mattyjones in our internal discussions @portertech https://github.com/portertech has said he's 👎 on moving filter_dependencies logic into Sensu Core; the implementation in this library is not a pattern we want to encourage by adopting in Core.

Regarding filter_silenced, we're planning to implement first-class support for silencing https://github.com/sensu/sensu/issues/1328. That effort will include an extension in Sensu Core to filter silenced events. Even with this in place, the filter_silenced method will probably need to stay around in order to give folks time to update any custom tooling for the new API.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/sensu-plugins/sensu-plugin/pull/136#issuecomment-232514516, or mute the thread https://github.com/notifications/unsubscribe/AApErLjCHfM6XEJAHWszYwIWzgac8RSCks5qVXJBgaJpZM4I0ied .

Matt Jones @DevopsMatt Senior Infrastructure Engineer - Yieldbot Inc. http://yieldbot.com/ Core Contributor - Sensu Plugins http://sensu-plugins.github.io/ Co-Organizer - Boston Infrastructure Coders http://www.meetup.com/Boston-Infrastructure-Coders/ Organizer - Metrowest Golang Meetup http://www.meetup.com/Metrowest-Golang-Meetup/ https://linkedin.com/in/mattyjones

cwjohnston commented 8 years ago

@mattyjones I've changed the title of this PR and it's description to clarify that the change is limited to deprecating the filter_repeated method, which implements event occurrence filtering. I believe this is the only filtering method from sensu-plugin that we're currently planning to re-implement in Sensu Core.

For the implementation of the occurrence filtering logic in Sensu Core you can watch https://github.com/sensu/sensu-extensions/issues/10

CVTJNII commented 8 years ago

I'm :-1: on this scope change. The problem I'm seeing is that filtering in the handler is very expensive, not because the filtering itself is expensive but because spinning up the Ruby process is. So keeping filtering at the handler level still leaves the problems I documented in sensu/sensu#1341.

I would like to see all filtering removed from the handlers. In my opinion filtering should be handled in a long lived process, not something launched on each event, so that the handlers aren't load bombs waiting to go off when a major outage occurs. Whether this is core or not doesn't matter too much.

My current solution is to make a long lived process which does the filtering, and then handles launching the handlers only when the event is unfiltered. This currently results in two different processes evaluating if the event should be filtered, which I consider risky. I was excited by the original scope of this as it would allow me to disable the duplicated filter code in the handlers, ensuring the handler will always alert on whatever it's passed and not second-guess my main filter process. I would strongly like to see the ability to disable all filtering maintained.

mattyjones commented 8 years ago

👍

mattyjones commented 8 years ago

My issue as well is that all my handlers are written in golang/lua/python and I had to port all that over to each language and it would be cleaner if it lived in core as from a purely arch standpoint it is core functionality necessary for handler operation. By keeping it in the plugin, among other reasons, you are forcing Ruby and I personally detest carrying around a Ruby runtime or being forced to use a language/tool just because. If this all goes to core people can truly bring what they want to the handler party like you can, to a much easier extent, with the checks.

CVTJNII commented 8 years ago

So, in reading the comments I'd like to officially propose my middleware solution. Basically:

The benefits of this are:

This is the solution I'm currently testing, and I believe it has some strong benefits that address all the concerns raised here thus far. This is really getting out of scope of this PR, so let me know how I should properly propose this. I also have currently implemented this as a simple 1:1 filter to handler intermediary which will demonstrate the concept, if there is interest I can bark up the appropriate trees in my organization to open source it.

mattyjones commented 8 years ago

👍 I think this is a fair trade-off and would be fine with this.

cwjohnston commented 8 years ago

Hi @CVTJNII,

I'm 👎 on this scope change. The problem I'm seeing is that filtering in the handler is very expensive, not because the filtering itself is expensive but because spinning up the Ruby process is. So keeping filtering at the handler level still leaves the problems I documented in sensu/sensu#1341.

I also think it would be good to eventually remove filtering from this library completely, but it will take time.

As I've tried to express in the blog post about this change, I am certainly concerned with the computational cost of filters in this library, but I think the way the filter_repeated method's occurrence filtering implementation affects people who are on-call is the most pressing concern for Sensu end-users: the folks who are on-call. On-call personnel may not receive resolve notifications due to this filtering method. I believe this is a bad user experience which may lead some people to mistrust Sensu as a monitoring and alerting system. I think this is detrimental to our community.

In order to enable folks to benefit from fixes and enhancements to the sensu-plugin library, e.g. the ability to disable this harmful filter_repeated logic, we still need to work through the process of updating the sensu-plugin dependency for a large number of Sensu Plugin gems which are have pinned sensu-plugin at = 1.2.0. This bit us a while ago when we tried to ship only sensu-plugin 1.3.0 in Sensu 0.25.

We're trying to adhere to semantic versioning guidelines. All these changes take time.

Since you asked about it in a previous post, here's a take on the Sensu filter equivalent of filter_repeated:

{
  "filters": {
    "filter_repeated": {
      "attributes": {
        "action": "eval: [:create, :resolve].include?(value)",
        "occurrences": "eval: value == :::check.occurrences|1::: || (value - :::check.occurrences|1:::) % :::check.refresh|1800:::.fdiv(:::check.interval|30:::).to_i == 0"
      }
    }
  }
}

I'm using this filter on my Sensu instance at home; I've found that it lowers pipe handler-related load quite a bit. You can configure a filter like this on your existing handlers regardless of any changes in this library. As I mentioned before we'll also be adding a filter extension with similar behavior to Sensu Core so that this filter definition doesn't become required cargo-culting.

I'd like to officially propose my middleware solution. Basically:

  • Sensu sends the event over TCP to a long lived process

The solution you describe should already possible using Sensu Enterprise's Event Stream integration.

CVTJNII commented 8 years ago

Hi, @cwjohnston:

Not disagreeing with anything you said, but I feel you've completely missed the point of my :-1:. I'm unhappy not because of my opinions on filter-repeated, but because I don't want any event filtering in the handers at all. Zero. None. In my testing any filtering in the handler is dangerous when other filtering is in use. Say I enable the Sensu filter you suggested, but it gets blocked by the handler filtering for some reason? Then no messages get through.

The original commit, 2e6f35ef4cee59dcc064972e09b52f20f6b678ac, turned off all filtering in the handler. I need this for reasons I've already discussed. I'd like to still see this feature toggle in the code.

So I'm :-1: on fb08c8c4d7d4108100f9053ed770ef9de40f9ff4 and the edited scope not because I have opinions on filter_repeated, but because I need what this PR originally did: disable all filtering at the handler level.

CVTJNII commented 8 years ago

Also, I totally get what you're saying about it taking time. This is also why I was :+1: for the original flag. It would let me turn off the filters in my environment, where I'm ready for it and the handler level filtering is to my detriment, but still leave the old behavior for everyone else. I'm a fan of slow moving, well considered changes as it makes it easier for me to maintain the install of the project.

So can we have both flags? Can we put back the flag that turns off all the filters and have the flag that turns off filter_repeated? This doesn't seem like an either/or scenario to me, we just need appropriate granularity in the filter disable bits.

CVTJNII commented 8 years ago

:+1: for 7d089fc, thanks!

cwjohnston commented 8 years ago

@CVTJNII thanks for your helpful suggestion 👍

@sensu-plugins/commit-bit I think this is ready to be merged for 1.4.0

mattyjones commented 8 years ago

@cwjohnston Do you want to drop a note with a link to this to the users group? There is a lot of talk and decisions here, the more eyes the better as this will affect everybody at some point.

cwjohnston commented 8 years ago

@mattyjones good idea. I've posted a reply to my initial sensu-users message here.

portertech commented 8 years ago

These changes look great and move us forward on the path to remove certain filtering logic from sensu-plugin :+1: I am going to merge, as the only change to the user experience are deprecation warnings, and Core 0.26 will provide the deprecated functionality, e.g. https://github.com/sensu/sensu-extensions/pull/11

rkjindal91 commented 7 years ago

I have configured at check definition. Two of my servers are subscribed to it. While testing; I decrease the critical thresholds. As a result, one of my servers, SERVER[ALPHA] is sending the first mail after 2 occurrences of criticality and since my refresh = 600 it resends me mail each 10mins.

The other server, SERVER[BETA] however, is not sending alert emails. Nor does it resend them after refresh period is over. Although I am getting the first alert through call (For critical events we have set two handlers----mailer--sends-emails----caller--notifies-by-call----one is working I suppose).

Server log is generating "not enough occurrences" for [BETA] for mailer.rb. How do I overcome this?