sensu-plugins / sensu-plugins-sensu

This plugin provides monitoring and metrics for Sensu.
http://sensu-plugins.io
MIT License

Remediation not executing despite api log showing status 202 #50

Open drhey opened 6 years ago

drhey commented 6 years ago

This is in regards to this commit

I'm running into an issue with this current version of handler-sensu and have found that the brackets put in around "subscribers" (line 136) are causing me an issue (not sure if others are seeing this or not). Basically, the logs show that the API call is accepted/successful (202), but nothing happens on the afflicted host. I found in sensu-api.log that additional brackets were being added around the subscribers the call was using (e.g. "subscribers\":[[\"client:sensu-client-one\",\"sensu-client-one\"]],). When I took this same syntax and tried it with curl I received a 202 but still nothing happened on the afflicted host.

I then went and removed the brackets around "subscribers" in my local copy of the handler-sensu.rb script and set up an event to trigger and remediation worked.

This is all being done in a vagrant environment at the moment, running the latest sensu everything. My configs have remained relatively static, but I did attempt to change things once remediation stopped working (because I wasn't sure if my memory on the process was fuzzy or not).

majormoses commented 6 years ago

@drhey thanks for reporting this, it definitely sounds like a regression. I am currently running 2.2.2 in my environment, which is still working for me. It looks like those commits are indeed what broke it. We really need tests on this to prevent this kind of regression in the future. For the time being I'd suggest rolling any production systems back to the version I am running while we work on getting this sorted out.
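A regression check along these lines might look like the following sketch. It is only an illustration: `build_remediation_body` and `flat_subscribers?` are hypothetical helpers, not methods in the plugin, and the payload keys are copied from the request body quoted in this thread.

```ruby
require 'json'

# Hypothetical stand-in for the request body the handler builds;
# the keys mirror the payload quoted in this issue.
def build_remediation_body(check, subscribers)
  JSON.dump('check' => check,
            'subscribers' => subscribers,
            'creator' => 'sensu-plugins-sensu',
            'reason' => 'Auto remediation triggered')
end

# Hypothetical check: true only when "subscribers" is a flat array of strings.
def flat_subscribers?(body)
  subs = JSON.parse(body)['subscribers']
  subs.is_a?(Array) && subs.all? { |s| s.is_a?(String) }
end

good = build_remediation_body('remediator_restart_sshd',
                              ['client:sensu-client-one', 'sensu-client-one'])
bad  = build_remediation_body('remediator_restart_sshd',
                              [['client:sensu-client-one', 'sensu-client-one']])

puts flat_subscribers?(good) # flat array of strings
puts flat_subscribers?(bad)  # nested array, as seen in sensu-api.log
```

A spec built on this idea would fail as soon as the subscribers list is wrapped in a second array, which is the shape reported here.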

majormoses commented 6 years ago

Thanks for doing some of the initial triage. When I have some time I will try to dig a little deeper.

drhey commented 6 years ago

Thanks, Major Ben! And my pleasure! I'm not sure what the initial intent/desire was for the latest release of the plugin viz. code changes, but I'd be happy to try and spend some time pulling things apart and seeing if I can't maintain the spirit of the code while finding the pesky additional brackets.

FWIW, as part of my testing I went back and rebuilt my vagrant setup with each version of the sensu-plugins-sensu plugin (starting at 1.0.0) and found that 2.2.2 is stable, too.

drhey commented 6 years ago

It seems like removing the brackets around subscribers (within the trigger_remediation function/method) is the way to go:

req.body = JSON.dump('check' => check, 'subscribers' => subscribers, 'creator' => 'sensu-plugins-sensu', 'reason' => 'Auto remediation triggered')

The code block that builds out subscribers further up in the script

subscribers = trigger_on ? @event['check']['trigger_on'] : ['client:' + client, client]

can't be modified (i.e. have the brackets removed from ['client:' + client, client]), as that doesn't fix the actual issue with remediation. I tried doing that and I still got a 202, but nothing on the afflicted host ran as I'd defined it, and the additional brackets remained in the req.body API call (as confirmed by the additional logging described below).

However, when I remove the brackets in the req.body stuff (within the trigger_remediation function/method) it functions correctly.

I added some additional logging to the trigger_remediation function/method just to see what's going on when remediation kicks off after an event, like so:

  def trigger_remediation(check, subscribers)
    output = File.open("/tmp/remediation.out", "w")
    api_request(:POST, '/request') do |req|
      req.body = JSON.dump('check' => check, 'subscribers' => subscribers, 'creator' => 'sensu-plugins-sensu', 'reason' => 'Auto remediation triggered')
      output.puts req.body
      output.puts subscribers
      output.puts subscribers.class
    end
  end

It's definitely not elegant, but it helped me gain a little insight and validate my changes were doing something. Here's the resulting output file before making any changes to the trigger_remediation function viz. brackets:

[root@sensu-server ~]# cat /tmp/remediation.out
{"check":"remediator_restart_sshd","subscribers":[["client:sensu-client-one","sensu-client-one"]],"creator":"sensu-plugins-sensu","reason":"Auto remediation triggered"}
client:sensu-client-one
sensu-client-one
Array

One can see the additional brackets in the API call (from req.body).

When I updated the trigger_remediation function to remove those brackets around subscribers it was rendered as I'd hoped (i.e. without additional brackets) within my output file. Little else changed (i.e. subscribers remained an array, the clients are still correct etc).
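The difference can be reproduced outside the handler with a short sketch. The client name and check name are taken from the output above; the rest is a standalone illustration, not the plugin's code, and it assumes the extra brackets come from wrapping an existing array in another array literal.

```ruby
require 'json'

client = 'sensu-client-one'
# Already an Array, as built further up in the handler.
subscribers = ['client:' + client, client]

# Wrapping an existing array in brackets nests it one level deeper.
wrapped = JSON.dump('check' => 'remediator_restart_sshd', 'subscribers' => [subscribers])

# Passing the array through unchanged keeps it flat.
flat = JSON.dump('check' => 'remediator_restart_sshd', 'subscribers' => subscribers)

puts wrapped
puts flat
```

The first line printed carries the doubled brackets seen in the log, while the second has a single flat list of subscribers.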

I'm definitely not a Ruby pro, nor do I know all of the innards of Sensu. This is just what I've observed in my testing. I don't know if changing those brackets would cause a ruckus somewhere else, but I HTH!

majormoses commented 6 years ago

@drhey awesome thanks for triaging this, can you submit a PR? I'd love to get this fixed, it's my favorite handler. If not I will create a PR and test it.

drhey commented 6 years ago

@majormoses Absolutely! As you can tell, I'm pretty attached to it, too! Glad to be of service; I'll get the PR submitted shortly.