sensu-plugins / sensu-plugin

A framework for writing Sensu plugins & handlers with Ruby.
http://sensuapp.org
MIT License
126 stars 117 forks source link

DISCUSS: Event filtering in Sensu::Handler should be deprecated and then removed #134

Closed cwjohnston closed 7 years ago

cwjohnston commented 8 years ago

The problem

The Sensu::Handler class currently implements a number of methods for filtering events, e.g. filter_disabled, filter_repeated, filter_silenced, and filter_dependencies. Each of these filters is evaluated every time an event is handled by a Sensu::Handler plugin.

This Sensu::Handler class has been widely adopted by the plugins in the community Sensu Plugins collection. The original intention was for organizations to maintain their own plugins and override the filter method for their own purposes, but thats not what has manifest in our ecosystem.

Because of how these filter methods are implemented inside Sensu::Handler, and owing to the popularity of the community plugins in our ecosystem, it has become very difficult for operators to distinguish between bugs in Sensu::Handler's filtering methods and strange behavior in Sensu Core itself. For example, even rather commonly used check attributes like refresh aren't implemented directly in Sensu Core, but are used only in Sensu::Helper filter methods.

In particular, the filter_repeated method uses logic to compare occurrences, refresh and status in order to determine if the event should be disregarded as a repeat. It's apparently prone to filtering events which should have been sent , often to indicate the outage was resolved. There may be some room for improvements to the existing filtering made possible by making use of last_ok and last_state_change event attributes, but we lack tests to describe the current behavior and can't change much without breaking many plugins.

The proposed solution

As stewards of Sensu, we'd like to see filters to become the primary way for operators to separate signal from noise, instead of relying on this library to cover all cases. Although we think operators can use Sensu filters in a similar capacity now, events that pass all filter evaluation may still be erroneously dropped by handlers that use the filter_repeated event occurrence filtering behavior in this library, and we cannot change this without breaking a few eggs.

To that end we're proposing the following steps:

Edit: I've updated the description of this PR to indicate that the scope of my proposed change has narrowed to deprecating the event occurrence logic of the filter_repeated method.

Edit: I've changed this description again to describe the more granular implementation now being proposed in #136.

CVTJNII commented 8 years ago

I also agree in moving this onto the core. With the current behavior the handler is called on every event as a separate process, which is expensive. This could be a landmine for users, as with checks set to 'standard' OK events aren't handled, but if something major happens (like a core router dropping) suddenly the majority of an environment could go into alert, effectively fork-bombing the Sensu server with handlers. This may be a case many users have not tested for.

Sensu core supports filters now, but with the filter logic in the handlers I consider using both dangerous as it would be easy to create a filter setup that blocks all events, so no events are ever handled. It would be very nice, at the minimum, to have an option in the base handler class to switch the default filter method to a no-op. This would allow the default behavior to stay the same but let Sensu filter users shut off the handler filters as needed. Perhaps this flag should be a subtask of this larger story?

cwjohnston commented 8 years ago

@CVTJNII I opened #136 with a proposed implementation of deprecating the filtering behavior, including a per-check configuration to make filtering a no-op as you mentioned.

CVTJNII commented 8 years ago

Cool. One thing I would like to see is either have the current filter functionality moved into Sensu server or have a filter config block that reproduces this functionality added to the Sensu docs. In reading the current filter docs I have to admit I'm not sure how to replicate all the current handler filter functionality, so having an example of how to do so would be super useful and I believe would de-duplicate effort otherwise required by users. I'd like to see this be an explicit step to ensure it doesn't get forgotten and to make the transition easier for users.

cwjohnston commented 8 years ago

@CVTJNII I agree regarding documentation, and I am working on enhancements to provide example filters which reproduce much of the functionality in sensu-plugin that I have proposed to remove.

There are aspects of filtering behavior in sensu-plugin which cannot be readily reproduced in native Sensu filters. Specifically, we recognize that there's currently no way to do things like query the stashes API for silenced checks/clients. Toward that end we've started to discuss silencing as a first class concept in https://github.com/sensu/sensu/issues/1328. We believe implementing this proposal will address that particular concern. Please have a look and let us know what you think?

cwjohnston commented 8 years ago

I've published a post about this change on the Sensu blog: https://sensuapp.org/blog/2016/07/07/sensu-plugin-filter-deprecation.html

zbintliff commented 8 years ago

@cwjohnston do you see this ultimately ending with the removal of #api_request function? We have handlers that will work with the event filtering but mostly our deregistration handler uses the #api_request function to remove autoscaled instances.

Also, happy this change is happening. When we do maintenance and have to stash a check over a few hundred hosts we need to scale sensu server because it handles all of the events just to query stash and die.

cwjohnston commented 8 years ago

@zbintliff I don't think we'll be removing #api_request in the foreseeable future, but we can probably deprecate checking silenced stashes in this library before too long.

We're planning to move silencing into Sensu Core as a dedicated API endpoint and providing an filter extension to match (see https://github.com/sensu/sensu/issues/1328).

I think there's still a case to keep the #api_request method in this library for other operations which can't be neatly implemented as native filters, e.g. #filter_dependencies.

cwjohnston commented 8 years ago

I've updated the title and description of this PR to indicate that the scope of my proposed change has narrowed to deprecating the event occurrence logic of the filter_repeated method.

fessyfoo commented 8 years ago

In particular, the filter_repeated method uses logic to compare occurrences, refresh and status in order to determine if the event should be disregarded as a repeat. It's apparently prone to filtering events which should have been sent , often to indicate the outage was resolved.

I'm curious why the proposed built in occurrences filter or even regular sensu filters would be in any better position to solve those issues.

IMO an occurrences filter needs to never filter resolve events except in the one case where all other events were filtered. if I handled a create I should handle a resolve. If I haven't handled a create I shouldn't handle a resolve. Knowledge of wether all other events were filtered is not computable from information in the stored event AFAICT. I think we might need to store if a filter was successfully applied in the stored event.

portertech commented 8 years ago

I have updated the built-in silencing proposal description to more accurately represent the required/desired functionality https://github.com/sensu/sensu/issues/1328

cwjohnston commented 8 years ago

Version 1.4.0 with deprecations is now available on rubygems.org.

cwjohnston commented 8 years ago

Hi @fessyfoo,

I'm curious why the proposed built in occurrences filter or even regular sensu filters would be in any better position to solve those issues.

You're right, it is not going to solve those issues immediately. In this case we're choosing not to change the existing implementation with the knowledge that it is flawed, but with the option to iterate on these things in the future with a more decoupled model.

As of this writing we've merged changes in sensu-extensions to enable loading extensions from gems, and we've moved the occurences filter into its own gem and we've opened a ticket for tracking the need for improvement in handling resolve events.

Internally we're tracking tasks for publishing other filter extensions which will demonstrate alternate approaches to event filtering, e.g. using last_ok & last_state_change.

IMO an occurrences filter needs to never filter resolve events except in the one case where all other events were filtered.

We agree here, but I believe that making a determination as to whether all other events were filtered will require access to the API. AFAIK there's actually nothing in the Sensu Core codebase that provides a canonical method for generating the API URI. Adding some reasonable way to access the API from extensions is probably a prerequisite for being able to make meaningful changes to this particular filter and unlocking other possibilities as well. See #122 for some related discussion.

cwjohnston commented 7 years ago

I've opened #170 to disable deprecated filters by default as part of the 2.0 release.

cwjohnston commented 7 years ago

We've released Sensu 0.29 today, which includes sensu-plugin versions 1.4.5 and 2.0.0. The 2.0.0 release disables event filtering by default, but note that most sensu-plugins projects will need a new release to utilize this new major version. :tada:

julian7 commented 7 years ago

There is an issue with keepalive handling after migration to filters: every client has to opt out of the old event and occurrence filters, in their keepalive config.

This is not under control of the server in any way.

This kind of filtering exists on the handler side, why don't we have this tunable at the handler? Eg. instead of client.enable_deprecated_occurrence_filtering, we could have handler.enable_deprecated_occurrence_filtering.

cwjohnston commented 7 years ago

@julian7 sensu-plugin 1.4.5, shipped in Sensu 0.29, adds support for globally disabling the deprecated filtering. See https://github.com/sensu-plugins/sensu-plugin/tree/v1.4.5#important for example configuration. Does this address your concern?

cwjohnston commented 7 years ago

The remaining to-do described here is to completely remove filtering methods in a future major version release. I think this issue has served its purpose for now. Closing.

julian7 commented 7 years ago

I just upgraded from 0.26.5 to 1.0.2, which was tricky for a completely different reason. Filtering works very smoothly, even legacy keepalive alerts are nicely filtered. It took a while, but it is totally worth it.