sensu-plugins / sensu-plugins-process-checks

This plugin provides native process instrumentation for monitoring and metrics collection, including: process status, uptime, thread count, and others.
http://sensu-plugins.io
MIT License

Sensu - Start a Particular Process if not running #44

Closed: amruthapbhat closed this issue 7 years ago

amruthapbhat commented 7 years ago

Hi,

I have set up Sensu and am monitoring a particular process with check-process.rb. If the process is not running, it shows up in the Uchiwa dashboard and an email is sent to the relevant people stating that the process is not running.

Is there any way to start the process automatically when this happens?

majormoses commented 7 years ago

Yes, you need to use the remediator feature, which is by far one of the least documented features.

Here is one of the articles I used as reference when setting up remediation: http://thesoftjaguar.com/posts/2015/06/14/sensu-remediation/

majormoses commented 7 years ago

If you still need help after reading that, I can try sharing some snippets of Chef to set this up.

majormoses commented 7 years ago

I just posted some stuff here: https://github.com/sensu-plugins/sensu-plugins-sensu/issues/25#issuecomment-305955997 that might be helpful to you.

amruthapbhat commented 7 years ago

Hi @majormoses

I have made the required changes as mentioned above.

I have used the remediation configs below.

check-process.json:

{
  "checks": {
    "check_process": {
       "command": "/etc/sensu/plugins/check-process.rb -p /usr/lib/jvm/java-8-oracle/jre/bin/java -W 1",
       "interval": 120,
       "subscribers": [ "nexus-server-sensu-client" ],
       "handlers": [ "remediator" ],
       "remediation": {
         "check_remediation": {
         "occurrences": ["1+"],
         "severities": [2]
         }
      }
    }
  }
}

check-nexus.json:

{
  "checks": {
    "check_remediation": {
      "command": "sudo /usr/local/nexus-3.0.2-02/bin/nexus run",
      "subscribers": [ "nexus-server-sensu-client" ],
      "handlers": [ "remediator" ],
      "standalone": false,
      "publish": false
    }
  }
}

remediator.json:

{
  "handlers": {
    "remediator": {
      "command": "/etc/sensu/handlers/sensu.rb",
      "type": "pipe",
      "severities": ["critical"]
    }
  }
}

client.json:

{
  "client": {
    "name": "check-process",
    "address": "IP address of the server",
    "subscriptions": ["nexus-server-sensu-client"],
    "safe_mode": false
  }
}

I have placed the handler at /sensu/handlers/sensu.rb, which I took from https://github.com/sensu-plugins/sensu-plugins-sensu/blob/master/bin/handler-sensu.rb

The output of the sensu-client:

{"timestamp":"2017-06-05T10:35:45.938499+0000","level":"info","message":"received check request","check":{"command":"/etc/sensu/plugins/check-process.rb -p /usr/lib/jvm/java-8-oracle/jre/bin/java -W 1","handlers":["remediator"],"remediation":{"check_remediation":{"occurrences":["1+"],"severities":[2]}},"name":"check_process","issued":1496658945}}
{"timestamp":"2017-06-05T10:35:46.087514+0000","level":"info","message":"publishing check result","payload":{"client":"check-process","check":{"command":"/etc/sensu/plugins/check-process.rb -p /usr/lib/jvm/java-8-oracle/jre/bin/java -W 1","handlers":["remediator"],"remediation":{"check_remediation":{"occurrences":["1+"],"severities":[2]}},"name":"check_process","issued":1496658945,"interval":120,"subscribers":["nexus-server-sensu-client"],"executed":1496658945,"duration":0.148,"output":"CheckProcess CRITICAL: Found 0 matching processes; cmd //usr/lib/jvm/java-8-oracle/jre/bin/java/\n","status":2}}}

The sensu-server log:

{"timestamp":"2017-06-05T10:47:46.334224+0000","level":"info","message":"handler output","handler":{"command":"/etc/sensu/handlers/sensu.rb","type":"pipe","severities":["critical"],"name":"remediator"},"output":["REMEDIATION: Evaluating remediation: check-process {"check_remediation"=>{"occurrences"=>["1+"], "severities"=>[2]}} #=556 sev=2\nREMEDIATION: Triggering remediation check 'check_remediation' for ["check-process"]\nREMEDIATION: Received API Response (202): {"issued":1496659666}, exiting.\n"]}

The API response shows the request as issued, but the remediation command defined in check-nexus.json is not executing.

Could you let me know if I am missing any config?

majormoses commented 7 years ago

@amruthapbhat it looks like you have an extra } in what you posted here. Can you confirm whether this is a copy/paste error in the issue or whether it matches your config? I updated the formatting to be a bit easier to read but did not remove any extra }.
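
One way to rule out stray braces is to run each config file through a JSON parser. The following is a minimal sketch, not part of Sensu itself: the helper names `json_error` and `check_dir` are hypothetical, and the directory to scan is whatever you pass on the command line (for example a conf.d directory).

```ruby
require 'json'

# Hypothetical helper: returns nil when the text parses as JSON,
# otherwise the parser's error message.
def json_error(text)
  JSON.parse(text)
  nil
rescue JSON::ParserError => e
  e.message
end

# Hypothetical helper: report the parse status of every *.json file under dir.
def check_dir(dir)
  Dir.glob(File.join(dir, '**', '*.json')).each do |path|
    err = json_error(File.read(path))
    puts(err ? "INVALID #{path}: #{err}" : "ok      #{path}")
  end
end

# Only scans when a directory is given as the first argument.
check_dir(ARGV[0]) if ARGV[0]
```

A file with an extra closing brace, or a fragment that lacks the outer braces, is reported as INVALID along with the parser's message.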

amruthapbhat commented 7 years ago

@majormoses that's a copy/paste error.

majormoses commented 7 years ago

OK, I updated the comment to reflect that.

amruthapbhat commented 7 years ago

@majormoses thank you.

Please let me know if I am missing any configuration.

majormoses commented 7 years ago

It looks like the remediator fired per:

{"timestamp":"2017-06-05T10:47:46.334224+0000","level":"info","message":"handler output","handler":{"command":"/etc/sensu/handlers/sensu.rb","type":"pipe","severities":["critical"],"name":"remediator"},"output":["REMEDIATION: Evaluating remediation: check-process {"check_remediation"=>{"occurrences"=>["1+"], "severities"=>[2]}} #=556 sev=2\nREMEDIATION: Triggering remediation check 'check_remediation' for ["check-process"]\nREMEDIATION: Received API Response (202): {"issued":1496659666}, exiting.\n"]}

Also, I seem to recall you opening another issue related to this. Can you post a link to it so we can see them both with all the appropriate context?

amruthapbhat commented 7 years ago

@majormoses According to the server log, the request was sent to the client. But the client is not executing the command, as far as I can see from the client log.

I created another issue by mistake and could not close it: https://github.com/sensu-plugins/sensu-plugins-sensu/issues/26

majormoses commented 7 years ago

Hmm, I'd have to take a closer look tonight, as this seems right from a quick pass. One thing you could try is using sensu-plugin 1.x. I have not upgraded my env to 2.x, so I cannot say for sure whether changes are required to make that work. I closed the other issue for you so we can keep all the relevant info and discussion here.

majormoses commented 7 years ago

One thing you will want to add when you get it working is a unique subscription (I recommend a hostname or UUID). That way you can ensure it only restarts the process on that one machine and not on all of them.

majormoses commented 7 years ago

Your client name looks wrong; that should probably be a hostname or something.

amruthapbhat commented 7 years ago

@majormoses OK. But it's still not starting the process.

amruthapbhat commented 7 years ago

@majormoses The client name could be anything, right? It's just a name for display purposes.

majormoses commented 7 years ago

In check-nexus.json, I don't think you would want the remediator to call the remediator.

majormoses commented 7 years ago

Other than that, it all looks like it matches my env, so can we confirm whether it works if you go back to sensu-plugin 1.x?

amruthapbhat commented 7 years ago

@majormoses I should remove "handlers": [ "remediator" ], right?

majormoses commented 7 years ago

Yes, ideally you would set it to something like email, pagerduty, etc. so you get notified if it cannot auto-resolve.

amruthapbhat commented 7 years ago

@majormoses I tried removing the handler. It still does not start the process. Could you please let me know how I can go back to sensu-plugin 1.x?

majormoses commented 7 years ago

Here are my examples.

Process check:

root@ip-10-55-142-253:/etc/sensu/conf.d# cat checks/chef_client_process.json
{
  "checks": {
    "chef_client_process": {
      "command": "check-process.rb -p '/opt/chef/embedded/bin/ruby /usr/bin/chef-client'",
      "subscribers": [
        "chef_client"
      ],
      "handlers": [
        "pagerduty",
        "remediator"
      ],
      "interval": 60,
      "pager_team": "urgent",
      "notification": "No chef-client service is running",
      "occurrences": 15,
      "remediation": {
        "chef_client_process_remediate": {
          "occurrences": [
            "1-5"
          ],
          "severities": [
            2
          ]
        }
      }
    }
  }
}

Remediation:

root@ip-10-55-142-253:/etc/sensu/conf.d# cat checks/chef_client_process_remediate.json
{
  "checks": {
    "chef_client_process_remediate": {
      "command": "sudo service chef-client start",
      "subscribers": [
        "chef_client",
        "ip-10-55-142-253.us-west-2.compute.internal"
      ],
      "standalone": false,
      "handlers": [
        "pagerduty"
      ],
      "publish": false,
      "interval": 10,
      "pager_team": "urgent",
      "notification": "Remediate failed: Can not start chef-client service",
      "occurrences": 3
    }
  }
}
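
The "occurrences" and "severities" entries in the remediation blocks above ("1+", "1-5", severity 2) act as trigger conditions. The sketch below shows one way such specs could be evaluated. It is an illustration only, not the handler's actual code: `occurrence_matches?` and `remediate?` are hypothetical names, and it assumes "N+" means N or more occurrences and "A-B" means an inclusive range.

```ruby
# Sketch only: not taken from handler-sensu.rb.
# Hypothetical helper: does an occurrence spec (5, "5", "1-5", "1+") match the count?
def occurrence_matches?(spec, count)
  case spec.to_s
  when /\A(\d+)\z/
    count == Regexp.last_match(1).to_i
  when /\A(\d+)-(\d+)\z/
    count.between?(Regexp.last_match(1).to_i, Regexp.last_match(2).to_i)
  when /\A(\d+)\+\z/
    count >= Regexp.last_match(1).to_i
  else
    false
  end
end

# Hypothetical helper: trigger only when both the severity and an occurrence spec match.
def remediate?(conditions, count, severity)
  conditions['severities'].include?(severity) &&
    conditions['occurrences'].any? { |spec| occurrence_matches?(spec, count) }
end

# An event at its 622nd occurrence with severity 2 (critical):
puts remediate?({ 'occurrences' => ['1+'],  'severities' => [2] }, 622, 2) # true
puts remediate?({ 'occurrences' => ['1-5'], 'severities' => [2] }, 622, 2) # false
```

Under these assumptions, a "1-5" spec stops triggering remediation after the fifth occurrence, while "1+" keeps triggering it for as long as the check stays critical.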

majormoses commented 7 years ago

To validate that you do not have 2.x installed:

root@ip-10-55-142-253:/etc/sensu/conf.d# /opt/sensu/embedded/bin/gem list | grep sensu-plugin | head -n 1
sensu-plugin (1.4.2, 1.2.0)

To install 1.x you can do something like:

/opt/sensu/embedded/bin/gem install sensu-plugin -v 1.4.2

And to remove 2.x:

/opt/sensu/embedded/bin/gem uninstall sensu-plugin --version '>= 2'
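
The same check can be scripted. This is a rough sketch, assuming it is run with Sensu's embedded Ruby (the path /opt/sensu/embedded/bin/ruby is an assumption based on the gem path above) so that it sees the same gems; `any_2x?` is a hypothetical helper name.

```ruby
require 'rubygems'

# Hypothetical helper: true when any of the given version strings is 2.0.0 or later.
def any_2x?(versions)
  versions.any? { |v| Gem::Version.new(v) >= Gem::Version.new('2.0.0') }
end

# List every installed version of the sensu-plugin gem visible to this Ruby.
installed = Gem::Specification.find_all_by_name('sensu-plugin').map { |s| s.version.to_s }
puts "sensu-plugin installed: #{installed.inspect}"
puts "2.x present: #{any_2x?(installed)}"
```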

amruthapbhat commented 7 years ago

@majormoses I tried everything listed above, but the process is still not running.

Please find the logs below:

Client log:

{"timestamp":"2017-06-06T07:47:45.934209+0000","level":"info","message":"received check request","check":{"command":"/etc/sensu/plugins/check-process.rb -p /usr/lib/jvm/java-8-oracle/jre/bin/java -W 1","handlers":["mailer","remediator"],"remediation":{"check_nexus":{"occurrences":["1+"],"severities":[2]}},"name":"check_process","issued":1496735265}}
{"timestamp":"2017-06-06T07:47:46.074050+0000","level":"info","message":"publishing check result","payload":{"client":"check-process","check":{"command":"/etc/sensu/plugins/check-process.rb -p /usr/lib/jvm/java-8-oracle/jre/bin/java -W 1","handlers":["mailer","remediator"],"remediation":{"check_nexus":{"occurrences":["1+"],"severities":[2]}},"name":"check_process","issued":1496735265,"subscribers":["nexus-server-sensu-client"],"interval":60,"executed":1496735265,"duration":0.139,"output":"CheckProcess CRITICAL: Found 0 matching processes; cmd //usr/lib/jvm/java-8-oracle/jre/bin/java/\n","status":2}}}

API log:

{"timestamp":"2017-06-06T07:47:46.284135+0000","level":"info","message":"api response","request":{"remote_address":"96.118.6.251","user_agent":"Ruby","method":"GET","uri":"/stash/silence/all/check_process","query_string":null,"body":""},"status":404,"content_length":0}
{"timestamp":"2017-06-06T07:47:46.286118+0000","level":"info","message":"publishing check request","payload":{"command":"sudo /usr/local/nexus-3.0.2-02/bin/nexus run","subscribers":["check-process"],"standalone":false,"handlers":["mailer"],"publish":false,"interval":10,"name":"check_nexus","issued":1496735266},"subscribers":["check-process"]}
{"timestamp":"2017-06-06T07:47:46.286795+0000","level":"info","message":"api response","request":{"remote_address":"96.118.6.251","user_agent":"Ruby","method":"POST","uri":"/request","query_string":null,"body":"{\"check\":\"check_nexus\",\"subscribers\":[\"check-process\"]}"},"status":202,"content_length":21}
{"timestamp":"2017-06-06T07:47:46.871769+0000","level":"info","message":"api response","request":{"remote_address":"96.118.6.251","user_agent":"Ruby","method":"GET","uri":"/stash/silence/check-process","query_string":null,"body":""},"status":404,"content_length":0}
{"timestamp":"2017-06-06T07:47:46.873653+0000","level":"info","message":"api response","request":{"remote_address":"96.118.6.251","user_agent":"Ruby","method":"GET","uri":"/stash/silence/check-process/check_process","query_string":null,"body":""},"status":404,"content_length":0}

Server Log:

{"timestamp":"2017-06-06T07:44:45.925182+0000","level":"info","message":"publishing check request","payload":{"command":"/etc/sensu/plugins/check-process.rb -p /usr/lib/jvm/java-8-oracle/jre/bin/java -W 1","handlers":["mailer","remediator"],"remediation":{"check_nexus":{"occurrences":["1+"],"severities":[2]}},"name":"check_process","issued":1496735085},"subscribers":["nexus-server-sensu-client"]}
{"timestamp":"2017-06-06T07:44:46.078892+0000","level":"info","message":"processing event","event":{"client":{"name":"check-process","address":"96.118.6.251","subscriptions":["nexus-server-sensu-client","client:check-process"],"safe_mode":false,"version":"0.26.5","timestamp":1496735083},"check":{"command":"/etc/sensu/plugins/check-process.rb -p /usr/lib/jvm/java-8-oracle/jre/bin/java -W 1","subscribers":["nexus-server-sensu-client"],"handlers":["mailer","remediator"],"interval":60,"remediation":{"check_nexus":{"occurrences":["1+"],"severities":[2]}},"name":"check_process","issued":1496735085,"executed":1496735085,"duration":0.144,"output":"CheckProcess CRITICAL: Found 0 matching processes; cmd //usr/lib/jvm/java-8-oracle/jre/bin/java/\n","status":2,"type":"standard","history":["2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2"],"total_state_change":0},"occurrences":622,"occurrences_watermark":622,"action":"create","timestamp":1496735086,"id":"d355ce69-49c4-4beb-945c-8c06dc0dd118","last_state_change":1496660626,"last_ok":1496660626,"silenced":false,"silenced_by":[]}}
{"timestamp":"2017-06-06T07:44:46.295489+0000","level":"info","message":"handler output","handler":{"command":"/etc/sensu/handlers/sensu.rb","type":"pipe","severities":["critical"],"name":"remediator"},"output":["warning: event filtering in sensu-plugin is deprecated, see http://bit.ly/sensu-plugin\nwarning: occurrence filtering in sensu-plugin is deprecated, see http://bit.ly/sensu-plugin\nREMEDIATION: Evaluating remediation: check-process {\"check_nexus\"=>{\"occurrences\"=>[\"1+\"], \"severities\"=>[2]}} #=622 sev=2\nREMEDIATION: Triggering remediation check 'check_nexus' for [\"check-process\"]\nREMEDIATION: Received API Response (202): {\"issued\":1496735086}, exiting.\n"]}

amruthapbhat commented 7 years ago

@majormoses The process is getting started now. The issue was, as you said, that the name in client.json should be the same as the subscription. This works with sensu-plugin 2.x as well.

Thank you for helping out.
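
For anyone landing here later: the API log above shows the remediation request being published with "subscribers":["check-process"], that is, addressed to the client name, while the client was only subscribed to "nexus-server-sensu-client" and "client:check-process". A minimal sketch of that condition follows. `receives_remediation?` is a hypothetical helper, and adding the name to the subscriptions is only one possible way to make the two match.

```ruby
require 'json'

# Hypothetical helper: true when the client is subscribed to its own name,
# which is where the remediation request in the log above was published.
def receives_remediation?(client)
  Array(client['subscriptions']).include?(client['name'])
end

original = JSON.parse(<<~CONFIG)
  { "client": { "name": "check-process",
                "subscriptions": ["nexus-server-sensu-client"] } }
CONFIG

# One possible fix (an assumption): add the client name as a subscription.
fixed = JSON.parse(<<~CONFIG)
  { "client": { "name": "check-process",
                "subscriptions": ["nexus-server-sensu-client", "check-process"] } }
CONFIG

puts receives_remediation?(original['client']) # false
puts receives_remediation?(fixed['client'])    # true
```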

majormoses commented 7 years ago

Awesome, glad I could help.