sensu / sensu-chef

Sensu Chef cookbook.
https://supermarket.chef.io/cookbooks/sensu
Apache License 2.0

If the Chef converge takes a long time, fake alerts are fired #333

Closed: akerekes closed this issue 8 years ago

akerekes commented 9 years ago

When the Sensu client service is installed and started by the sensu::client_service recipe but the rest of the converge takes a long time, false alerts can fire, because not all monitored services are up by the time Sensu starts checking them. Is it possible to delay the start of the service?

There is a workaround: configure the client with a dummy subscription that matches none of the actual subscriptions, start the service, and then configure the real subscriptions, which triggers a restart of the Sensu service at the end of the converge. It works, but it is a workaround, not an elegant solution.

sge-babrams commented 9 years ago

@akerekes, if I understand you correctly (and I may well be misunderstanding you), on spin-up of a new node you enable the Sensu client but then receive alerts because the services you want it to monitor are not up yet? This seems like a chicken-and-egg scenario. Keep in mind that this cookbook is only meant to provide the LWRPs; the logic for your own workflow belongs in your Sensu wrapper cookbook, in the same way that the cookbook doesn't tell you to define checks via a data bag or attributes. If it's that much of a concern, don't enable (start) the client immediately: put it later in the run list, or declare the service with action :nothing and start it through a delayed notification (notifies ... :delayed). Another idea is to start the client only when certain conditions are met, for example with an only_if or not_if guard that checks whether some file exists or a port is open.

It's hard to say more without knowing how you are using this cookbook. I can look into this more when I have some time, but it has not been an issue for me: I always want to know when a server or container I have designated for a purpose is not serving that purpose, even on spin-up, and I try to keep our converge times to a minimum.
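A minimal sketch of the delayed-start suggestion, assuming a hypothetical wrapper recipe that declares the client service itself instead of including sensu::client_service (which, per the report above, starts the client right away). The service name 'sensu-client' and the ruby_block name are assumptions; match them to your platform and init system. An only_if or not_if guard could be added to the service resource for the conditional-start idea.

```ruby
# Hypothetical wrapper recipe (e.g. my_monitoring::client).
# Assumes the Sensu client package and configuration are already in place
# and that the init service is named 'sensu-client' (an assumption).

# Enable the client at boot, but do not start it at this point in the run.
service 'sensu-client' do
  action :enable
end

# ... resources or included recipes that install and start the monitored services ...

# A ruby_block reports itself as updated whenever it runs, so its
# notification is always queued. The :delayed timer defers the start
# until the end of the Chef run, after the resources above have converged.
ruby_block 'start sensu-client at end of converge' do
  block do
    Chef::Log.info('Queued delayed start of sensu-client')
  end
  notifies :start, 'service[sensu-client]', :delayed
end
```

Delayed notifications run in the order they were queued, so if other recipes also queue delayed restarts of the monitored services, placing this recipe last in the run list should keep the client start behind them.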

cwjohnston commented 8 years ago

Per the readme:

This cookbook provides the building blocks for creating a monitoring cookbook specific to your environment (wrapper). 

As mentioned in the prior comment, the issue you're reporting appears to be a matter of timing (e.g. run list ordering), which can be resolved either by changes in your wrapper cookbook or by changing where in the run list your wrapper cookbook is included.
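For the run-list route, a minimal sketch of a Chef role written in the Ruby role DSL. The role name and the cookbooks my_app (which brings up the monitored services) and my_monitoring (a wrapper around this cookbook) are hypothetical.

```ruby
# roles/app_server.rb (role and cookbook names are hypothetical)
name 'app_server'
description 'Application server monitored by Sensu'

# Recipes converge in run-list order, so the monitored services are
# installed and started before the wrapper brings up the Sensu client.
run_list(
  'recipe[my_app]',
  'recipe[my_monitoring::client]'
)
```

Ordering alone only helps for services that my_app starts immediately; a service started through a delayed notification still comes up at the end of the run, after a client that was started immediately.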