amruthapbhat closed this issue 7 years ago
Yes, you need to use the remediator feature, which is one of the least documented features.
Here is one of the articles I used as a reference when setting up remediation: http://thesoftjaguar.com/posts/2015/06/14/sensu-remediation/
If you still need help after reading that, I can try sharing some Chef snippets to set this up.
I just posted some stuff here: https://github.com/sensu-plugins/sensu-plugins-sensu/issues/25#issuecomment-305955997 that might be helpful to you.
Hi @majormoses
I have made the changes described above.
Here are my remediation configs.
check-process.json:
{
  "checks": {
    "check_process": {
      "command": "/etc/sensu/plugins/check-process.rb -p /usr/lib/jvm/java-8-oracle/jre/bin/java -W 1",
      "interval": 120,
      "subscribers": [ "nexus-server-sensu-client" ],
      "handlers": [ "remediator" ],
      "remediation": {
        "check_remediation": {
          "occurrences": ["1+"],
          "severities": [2]
        }
      }
    }
  }
}
check-nexus.json:
"checks": {
  "check_remediation": {
    "command": "sudo /usr/local/nexus-3.0.2-02/bin/nexus run",
    "subscribers": [ "nexus-server-sensu-client" ],
    "handlers": [ "remediator" ],
    "standalone": false,
    "publish": false
  }
}
remeadiator.json:
{
  "handlers": {
    "remediator": {
      "command": "/etc/sensu/handlers/sensu.rb",
      "type": "pipe",
      "severities": ["critical"]
    }
  }
}
client.json:
"client": {
  "name": "check-process",
  "address": "IP address of the server",
  "subscriptions": ["nexus-server-sensu-client"],
  "safe_mode":false
}
I have placed the handler at /sensu/handlers/sensu.rb; I took it from https://github.com/sensu-plugins/sensu-plugins-sensu/blob/master/bin/handler-sensu.rb
The output of the sensu-client:
{"timestamp":"2017-06-05T10:35:45.938499+0000","level":"info","message":"received check request","check":{"command":"/etc/sensu/plugins/check-process.rb -p /usr/lib/jvm/java-8-oracle/jre/bin/java -W 1","handlers":["remediator"],"remediation":{"check_remediation":{"occurrences":["1+"],"severities":[2]}},"name":"check_process","issued":1496658945}}
{"timestamp":"2017-06-05T10:35:46.087514+0000","level":"info","message":"publishing check result","payload":{"client":"check-process","check":{"command":"/etc/sensu/plugins/check-process.rb -p /usr/lib/jvm/java-8-oracle/jre/bin/java -W 1","handlers":["remediator"],"remediation":{"check_remediation":{"occurrences":["1+"],"severities":[2]}},"name":"check_process","issued":1496658945,"interval":120,"subscribers":["nexus-server-sensu-client"],"executed":1496658945,"duration":0.148,"output":"CheckProcess CRITICAL: Found 0 matching processes; cmd //usr/lib/jvm/java-8-oracle/jre/bin/java/\n","status":2}}}
The sensu-server log:
{"timestamp":"2017-06-05T10:47:46.334224+0000","level":"info","message":"handler output","handler":{"command":"/etc/sensu/handlers/sensu.rb","type":"pipe","severities":["critical"],"name":"remediator"},"output":["REMEDIATION: Evaluating remediation: check-process {"check_remediation"=>{"occurrences"=>["1+"], "severities"=>[2]}} #=556 sev=2\nREMEDIATION: Triggering remediation check 'check_remediation' for ["check-process"]\nREMEDIATION: Received API Response (202): {"issued":1496659666}, exiting.\n"]}
The API response shows the request as issued, but the command defined in the remediation check (check-nexus.json) is not executing.
Could you let me know if I am missing any config?
@amruthapbhat it looks like you have an extra } in what you posted here. Can you check whether this is a copy/paste error in the issue or whether it matches your config? I updated the formatting to be a bit easier to read but did not remove any extra }.
@majormoses that's a copy/paste error.
OK, I updated the comment to reflect that.
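One way to tell a copy/paste slip from a real config problem is to run each file's contents through a JSON parser. The sketch below is a generic check built on Python's standard library, not a Sensu tool; `check_json` is a hypothetical helper and the two sample strings are made-up stand-ins for config files.

```python
import json


def check_json(text):
    """Return None if text is valid JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Hypothetical snippets: the second one has a stray closing brace.
good = '{"checks": {"check_remediation": {"publish": false}}}'
bad = '{"checks": {"check_remediation": {"publish": false}}}}'

print(check_json(good))  # None means the text parsed cleanly
print(check_json(bad))   # a message pointing at the offending position
```

A fragment that lacks its outer braces fails the same check, so this catches both kinds of slip.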
@majormoses thank you.
Please let me know if I am missing any configuration.
It looks like the remediator fired per:
{"timestamp":"2017-06-05T10:47:46.334224+0000","level":"info","message":"handler output","handler":{"command":"/etc/sensu/handlers/sensu.rb","type":"pipe","severities":["critical"],"name":"remediator"},"output":["REMEDIATION: Evaluating remediation: check-process {"check_remediation"=>{"occurrences"=>["1+"], "severities"=>[2]}} #=556 sev=2\nREMEDIATION: Triggering remediation check 'check_remediation' for ["check-process"]\nREMEDIATION: Received API Response (202): {"issued":1496659666}, exiting.\n"]}
Also, I seem to recall you opening another issue related to this. Can you post a link to it so we can see them both with all the appropriate context?
@majormoses According to the server log, it has sent the request to the client. But the client is not executing the command, which is what I can see from the client log.
I created another issue by mistake and could not close it: https://github.com/sensu-plugins/sensu-plugins-sensu/issues/26
Hmm, I'd have to take a closer look tonight, as this seems right from a quick pass. One thing you could try is using sensu-plugin 1.x: I have not upgraded my env to 2.x, so I cannot say for sure whether changes are required to make that work. I closed the other issue for you so we can keep all the relevant info and discussion here.
One thing you will want to add once you get it working is a unique subscription (I recommend a hostname or UUID); that way you can ensure it only restarts the process on that one machine and not on all of them.
Your client name looks wrong; that should probably be a hostname or something.
@majormoses OK. But it's still not starting the process.
@majormoses The client name could be anything, right? It's just a name for display purposes.
In check-nexus.json, I don't think you want the remediator to call the remediator.
Other than that, it all looks like it matches my env, so can we confirm whether it works if you go to sensu-plugin 1.x?
@majormoses I should remove "handlers": [ "remediator" ], right?
Yes. Ideally you would set it to something like email, pagerduty, etc. so you get notified if it cannot auto-resolve.
@majormoses I tried removing the handler. It still does not trigger the process. Could you please let me know how I can go back to sensu-plugin 1.x?
Here are my examples. Process check:
root@ip-10-55-142-253:/etc/sensu/conf.d# cat checks/chef_client_process.json
{
  "checks": {
    "chef_client_process": {
      "command": "check-process.rb -p '/opt/chef/embedded/bin/ruby /usr/bin/chef-client'",
      "subscribers": [
        "chef_client"
      ],
      "handlers": [
        "pagerduty",
        "remediator"
      ],
      "interval": 60,
      "pager_team": "urgent",
      "notification": "No chef-client service is running",
      "occurrences": 15,
      "remediation": {
        "chef_client_process_remediate": {
          "occurrences": [
            "1-5"
          ],
          "severities": [
            2
          ]
        }
      }
    }
  }
}
remediation:
root@ip-10-55-142-253:/etc/sensu/conf.d# cat checks/chef_client_process_remediate.json
{
  "checks": {
    "chef_client_process_remediate": {
      "command": "sudo service chef-client start",
      "subscribers": [
        "chef_client",
        "ip-10-55-142-253.us-west-2.compute.internal"
      ],
      "standalone": false,
      "handlers": [
        "pagerduty"
      ],
      "publish": false,
      "interval": 10,
      "pager_team": "urgent",
      "notification": "Remediate failed: Can not start chef-client service",
      "occurrences": 3
    }
  }
}
To validate that you do not have 2.x installed:
root@ip-10-55-142-253:/etc/sensu/conf.d# /opt/sensu/embedded/bin/gem list | grep sensu-plugin | head -n 1
sensu-plugin (1.4.2, 1.2.0)
To install 1.x you can do something like:
/opt/sensu/embedded/bin/gem install sensu-plugin -v 1.4.2
and to remove 2.x:
/opt/sensu/embedded/bin/gem uninstall sensu-plugin --version '>= 2'
@majormoses I tried everything listed above, but the process is still not running.
Please find the logs below:
Client log:
{"timestamp":"2017-06-06T07:47:45.934209+0000","level":"info","message":"received check request","check":{"command":"/etc/sensu/plugins/check-process.rb -p /usr/lib/jvm/java-8-oracle/jre/bin/java -W 1","handlers":["mailer","remediator"],"remediation":{"check_nexus":{"occurrences":["1+"],"severities":[2]}},"name":"check_process","issued":1496735265}}
{"timestamp":"2017-06-06T07:47:46.074050+0000","level":"info","message":"publishing check result","payload":{"client":"check-process","check":{"command":"/etc/sensu/plugins/check-process.rb -p /usr/lib/jvm/java-8-oracle/jre/bin/java -W 1","handlers":["mailer","remediator"],"remediation":{"check_nexus":{"occurrences":["1+"],"severities":[2]}},"name":"check_process","issued":1496735265,"subscribers":["nexus-server-sensu-client"],"interval":60,"executed":1496735265,"duration":0.139,"output":"CheckProcess CRITICAL: Found 0 matching processes; cmd //usr/lib/jvm/java-8-oracle/jre/bin/java/\n","status":2}}}
API log:
{"timestamp":"2017-06-06T07:47:46.284135+0000","level":"info","message":"api response","request":{"remote_address":"96.118.6.251","user_agent":"Ruby","method":"GET","uri":"/stash/silence/all/check_process","query_string":null,"body":""},"status":404,"content_length":0}
{"timestamp":"2017-06-06T07:47:46.286118+0000","level":"info","message":"publishing check request","payload":{"command":"sudo /usr/local/nexus-3.0.2-02/bin/nexus run","subscribers":["check-process"],"standalone":false,"handlers":["mailer"],"publish":false,"interval":10,"name":"check_nexus","issued":1496735266},"subscribers":["check-process"]}
{"timestamp":"2017-06-06T07:47:46.286795+0000","level":"info","message":"api response","request":{"remote_address":"96.118.6.251","user_agent":"Ruby","method":"POST","uri":"/request","query_string":null,"body":"{\"check\":\"check_nexus\",\"subscribers\":[\"check-process\"]}"},"status":202,"content_length":21}
{"timestamp":"2017-06-06T07:47:46.871769+0000","level":"info","message":"api response","request":{"remote_address":"96.118.6.251","user_agent":"Ruby","method":"GET","uri":"/stash/silence/check-process","query_string":null,"body":""},"status":404,"content_length":0}
{"timestamp":"2017-06-06T07:47:46.873653+0000","level":"info","message":"api response","request":{"remote_address":"96.118.6.251","user_agent":"Ruby","method":"GET","uri":"/stash/silence/check-process/check_process","query_string":null,"body":""},"status":404,"content_length":0}
Server Log:
{"timestamp":"2017-06-06T07:44:45.925182+0000","level":"info","message":"publishing check request","payload":{"command":"/etc/sensu/plugins/check-process.rb -p /usr/lib/jvm/java-8-oracle/jre/bin/java -W 1","handlers":["mailer","remediator"],"remediation":{"check_nexus":{"occurrences":["1+"],"severities":[2]}},"name":"check_process","issued":1496735085},"subscribers":["nexus-server-sensu-client"]}
{"timestamp":"2017-06-06T07:44:46.078892+0000","level":"info","message":"processing event","event":{"client":{"name":"check-process","address":"96.118.6.251","subscriptions":["nexus-server-sensu-client","client:check-process"],"safe_mode":false,"version":"0.26.5","timestamp":1496735083},"check":{"command":"/etc/sensu/plugins/check-process.rb -p /usr/lib/jvm/java-8-oracle/jre/bin/java -W 1","subscribers":["nexus-server-sensu-client"],"handlers":["mailer","remediator"],"interval":60,"remediation":{"check_nexus":{"occurrences":["1+"],"severities":[2]}},"name":"check_process","issued":1496735085,"executed":1496735085,"duration":0.144,"output":"CheckProcess CRITICAL: Found 0 matching processes; cmd //usr/lib/jvm/java-8-oracle/jre/bin/java/\n","status":2,"type":"standard","history":["2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2"],"total_state_change":0},"occurrences":622,"occurrences_watermark":622,"action":"create","timestamp":1496735086,"id":"d355ce69-49c4-4beb-945c-8c06dc0dd118","last_state_change":1496660626,"last_ok":1496660626,"silenced":false,"silenced_by":[]}}
{"timestamp":"2017-06-06T07:44:46.295489+0000","level":"info","message":"handler output","handler":{"command":"/etc/sensu/handlers/sensu.rb","type":"pipe","severities":["critical"],"name":"remediator"},"output":["warning: event filtering in sensu-plugin is deprecated, see http://bit.ly/sensu-plugin\nwarning: occurrence filtering in sensu-plugin is deprecated, see http://bit.ly/sensu-plugin\nREMEDIATION: Evaluating remediation: check-process {\"check_nexus\"=>{\"occurrences\"=>[\"1+\"], \"severities\"=>[2]}} #=622 sev=2\nREMEDIATION: Triggering remediation check 'check_nexus' for [\"check-process\"]\nREMEDIATION: Received API Response (202): {\"issued\":1496735086}, exiting.\n"]}
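Since these logs are JSON, the subscribers that the remediation request was published to can be pulled out with a few lines of code rather than read by eye. This is a minimal sketch under stated assumptions: `iter_json_objects` and `published_requests` are hypothetical helper names, and the sample is a shortened stand-in modelled on the API log above, not the full entries.

```python
import json


def iter_json_objects(line):
    """Yield each JSON object found back to back in a single string."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(line):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(line) and line[pos].isspace():
            pos += 1
        if pos >= len(line):
            break
        obj, pos = decoder.raw_decode(line, pos)
        yield obj


def published_requests(line):
    """Return (check name, subscribers) for each 'publishing check request' entry."""
    return [
        (entry["payload"]["name"], entry["subscribers"])
        for entry in iter_json_objects(line)
        if entry.get("message") == "publishing check request"
    ]


# Shortened sample modelled on the API log (most fields trimmed for brevity).
sample = (
    '{"level":"info","message":"api response","status":404} '
    '{"level":"info","message":"publishing check request",'
    '"payload":{"name":"check_nexus","subscribers":["check-process"]},'
    '"subscribers":["check-process"]}'
)

print(published_requests(sample))  # [('check_nexus', ['check-process'])]
```

Comparing that subscriber list with the `subscriptions` array in the "processing event" entry shows whether the client is listening on the subscription the request was sent to.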
@majormoses The process is getting executed now. The issue was, as you said, that the name in client.json should be the same as the subscription. This works with sensu-plugin 2.x as well.
Thank you for helping out.
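The logs in this thread show why that fix works: the remediator posted the request with subscribers ["check-process"] (the client name), while the client was only subscribed to "nexus-server-sensu-client" and "client:check-process". The sketch below is a deliberately simplified model of subscription matching, not Sensu's actual transport code; `client_receives` is a hypothetical helper, and the `after` list assumes the fix was to add the client name as a subscription in client.json.

```python
def client_receives(request_subscribers, client_subscriptions):
    """A client picks up a check request only if it shares a subscription with it."""
    return bool(set(request_subscribers) & set(client_subscriptions))


# Values taken from the logs in this thread.
request_subscribers = ["check-process"]  # what the remediator asked the API to target
before = ["nexus-server-sensu-client", "client:check-process"]

# Assumed fix: the client name is also listed as a subscription in client.json.
after = before + ["check-process"]

print(client_receives(request_subscribers, before))  # False
print(client_receives(request_subscribers, after))   # True
```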
Awesome, glad I could help.
Hi,
I have set up Sensu and am monitoring a particular process with check-process.rb. If the process is not running, it shows up in the Uchiwa dashboard and an email is sent to the relevant people stating that the process is not running.
Is there any way to start the process when it is found not running in the above scenario?