Open gopalvd opened 7 years ago
Hmm, that is indeed strange. When I have the time I will try to replicate it in my environment. I never really considered using sensu to monitor the sensu process: how would it even get the request from the transport (rabbitmq|redis) if the process is dead? Even with standalone checks we have the same problem, since a dead client cannot schedule the check locally either.
I think you should probably leverage native functionality in your process supervisor (upstart, runit, systemd, etc.) to attempt restarting the process if it dies. It's kind of a "who watches the watcher" problem. Here are some other suggestions which are not mutually exclusive with what was proposed above:
That being said, if the process is in fact up then there is definitely a bug, config, or environment issue somewhere.
Can you please include the version of this plugin, sensu-server, and sensu-client?
@majormoses Thanks for looking into this. I totally agree that monitoring sensu-client with sensu is not a great idea, and we are using a different tool for that. And yes, this plugin works perfectly with other processes like crond, chef-client, systemd, etc. I noticed this strange behaviour only when I tried it with sensu-client, and that was not intentional: I wanted to test the plugin, gave it the sensu-client process to check, and noticed this.
Yes, to me it seems like a bug.
Here are the details that you are looking for. gem version for plugins ---- 2.5.0 sensu-client version --- "sensu-0.26.3-1.x86_64" sensu-client version --- "sensu-1.0.2-1.el7.x86_64"
Sorry the last is the sensu-server version sensu-server version --- "sensu-1.0.2-1.el7.x86_64"
Hmm this is what I show in my environment and I can't explain it, I will look through the code when I have some time. This is what I got manually which matches your output:
babrams@ip-10-55-141-110:~$ /opt/sensu/embedded/bin/check-process.rb -p sensu-client
CheckProcess OK: Found 1 matching processes; cmd /sensu-client/
I was scratching my head for a minute (clearly tired) as this is what I initially did and got back:
babrams@ip-10-55-141-110:~$ /opt/sensu/embedded/bin/check-process.rb - sensu-client
CheckProcess OK: Found 121 matching processes
Does that mean it's a bug, or do some options need to be passed to the command?
I was missing the p after the -, so it was matching all processes, which basically matches:
babrams@ip-10-55-141-110:~$ ps -Al | wc -l
122
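To illustrate why the mistyped invocation reported every process, here is a minimal Ruby sketch, assuming that a pattern which never reaches the filter simply means nothing is filtered out. This is not the plugin's code: `count_matching` and the sample command list are hypothetical.

```ruby
# Hypothetical helper, not taken from check-process.rb: counts the commands
# that match a pattern, treating a missing pattern as "no filter at all".
def count_matching(commands, pattern = nil)
  return commands.size if pattern.nil?

  regex = Regexp.new(pattern)
  commands.count { |cmd| cmd =~ regex }
end

# Made-up command lines, for illustration only.
def sample_commands
  [
    '/usr/sbin/crond -n',
    '/opt/sensu/embedded/bin/ruby /opt/sensu/bin/sensu-client',
    '/usr/sbin/sshd -D'
  ]
end
```

With the pattern supplied, `count_matching(sample_commands, 'sensu-client')` counts only the one matching entry; without it, `count_matching(sample_commands)` counts the whole list, which is the same effect as the count lining up with `ps -Al | wc -l`.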
I did the same: CheckProcess OK: Found 402 matching processes
Correct, when I run the command it gives one matching process. But when sensu-client executes it, I see 0 matching processes and it throws the error: CheckProcess CRITICAL: Found 0 matching processes; cmd /sensu-client/
@majormoses Any updates you have on this issue?
No, I honestly have not thought about it much, as having sensu check whether sensu is running is such a limited use case. Without using TTLs there is no value in it. I marked it as low priority to reflect that it will not likely see a quick resolution without another contributor who is more motivated to solve it. It is an intriguing problem, so if no one volunteers to triage it I will get to it when I have the time.
Hello, I ran into this issue today as well. IMO monitoring the sensu client process is necessary to define proper dependencies: I don't want any further notifications in case the sensu-client dies. And it's as described above: executing check-process.rb on the shell returns 1 running process, while doing the same check via sensu returns no running process.
Can you please take a look?
I agree it's a bug, but I don't see much value in fixing it. By the very nature of this check and how sensu works, if the sensu-client process is not running the check would never be executed. I would suggest looking at the keepalive, which is a built-in TTL check meant to solve this exact problem. You can also add a very simple check which runs /bin/true and another TTL check on top of that, which is essentially what the built-in healthcheck does for you, except that it runs at a fixed 20 second interval. Other options include scheduling a wrapper script via cron that checks $? and, if there are issues, reports them via the server's API, or using monit and having it update the API. Bottom line: unless it is a TTL check there is zero value in executing this from within the sensu client process, as it is a chicken-and-egg scenario.
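A rough Ruby sketch of the cron wrapper idea follows. It assumes the Sensu 1.x API accepts check results via a POST to `/results` with `source`, `name`, `output`, and `status` fields; verify that against the API docs for your version. The helper names `build_result` and `post_result`, the host name, the check name, and the server URL are all placeholders, not anything from the plugin or this thread.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: turns a command's exit status into a result payload.
# Status 0 is reported as OK (0); anything else as CRITICAL (2).
def build_result(source, name, exit_status, output)
  {
    source: source,
    name: name,
    status: exit_status.zero? ? 0 : 2,
    output: output
  }
end

# Hypothetical helper: posts the payload to the API.
# The /results endpoint and its fields are an assumption, see above.
def post_result(api_url, result)
  uri = URI.join(api_url, '/results')
  Net::HTTP.post(uri, result.to_json, 'Content-Type' => 'application/json')
end

# Example wiring for a script run from cron (placeholder names throughout):
#   output = `pgrep -f sensu-client`
#   result = build_result('my-host', 'sensu_client_process', $?.exitstatus, output)
#   post_result('http://sensu-server.example.com:4567', result)
```

Because the script runs from cron rather than from sensu-client, it can still report when the client is dead, which is the point of doing it outside the client.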
I do not work for sensu, nor am I monetarily compensated for maintaining the plugins. There are lots of issues for me to look at: fixing bugs, enhancing plugins, and doing housekeeping. This means I need to prioritize the things I spend my efforts on. While I find the problem itself intriguing, the use case makes little sense to me, so I will not be prioritizing this over things that I consider to bring greater value to the community. I am more than happy to review a PR should you or someone else find the issue.
If you think this is important you have a couple options:
I have pinged the other maintainers as well for another set of :eyes: as maybe they will have a different perspective than I do.
I agree with @majormoses that the use case here isn't totally clear. Looking through the code, though, I would be interested to see if passing -m or possibly -M causes this to work: https://github.com/sensu-plugins/sensu-plugins-process-checks/blob/master/bin/check-process.rb#L79-L84
@jaredledvina good call, these options might help, but again this gives you no real value without adding a TTL check, and I would suggest using the keepalive instead as that is its whole purpose.
The options are defined here:
The code that rejects the processes unless args are changed:
@majormoses The sane philosophy for this would be to use another tool, even one as simple as cron, to monitor a monitoring tool like sensu-client. It's weird to have a monitoring tool watch itself, because when it dies, what would alert you... you are already DEAD? Anyway, if you insist, this line may shed more light on why @gopalvd saw what he saw... https://github.com/sensu-plugins/sensu-plugins-process-checks/blob/master/bin/check-process.rb#L258-L259
> The sane philosophy for this would be to use another tool, even one as simple as cron, to monitor a monitoring tool like sensu-client. It's weird to have a monitoring tool watch itself, because when it dies, what would alert you... you are already DEAD?
Yup, the only real alternative that is native to sensu, and that provides value in certain scenarios such as a client dying while the server remains functional, is a TTL. Bottom line: even if adding those args solves the problem, it will not provide any value without adding a TTL. If you are going to do that, you already have something built into sensu that does this for you; it's called the keepalive check. It is hardcoded to send an update every 20 seconds, and you can adjust your keepalive thresholds and dependencies as you see fit.
After reading through the code and seeing those options pointed out I highly doubt this is a bug anymore. It sounds like intended functionality and with sane defaults. The use case is still flawed but it would appear to be possible.
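To make that concrete, here is a minimal Ruby sketch of the filtering idea, assuming the plugin drops its own PID and its parent's PID from the matches unless -m or -M is passed. It is an illustration, not the plugin's actual code; `filter_processes` and the sample PIDs are hypothetical.

```ruby
# Hypothetical illustration: each process is a hash with :pid and :command.
# Matches on the command line, then drops the check itself and its parent
# unless the corresponding flag is set.
def filter_processes(procs, pattern, self_pid:, parent_pid:,
                     match_self: false, match_parent: false)
  regex = Regexp.new(pattern)
  matches = procs.select { |p| p[:command] =~ regex }
  matches = matches.reject { |p| p[:pid] == self_pid } unless match_self
  matches = matches.reject { |p| p[:pid] == parent_pid } unless match_parent
  matches
end

# Made-up process table: PID 100 is sensu-client, PID 200 is the check.
# Note that the check's own command line also contains "sensu-client".
def sample_procs
  [
    { pid: 100, command: 'ruby /opt/sensu/bin/sensu-client' },
    { pid: 200, command: 'ruby check-process.rb -p sensu-client' }
  ]
end
```

Run from a shell (parent PID is the shell, say 50), the sketch keeps PID 100 and reports one match. Run with sensu-client as the direct parent (parent PID 100), the only real match is discarded and the count is zero, which lines up with the CRITICAL result reported in this issue. Setting `match_parent: true` brings the match back.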
Well, after reading your arguments I came to the conclusion that I need to rethink my approach. It came from my nagios installation, where I used service dependencies on the check for whether nrpe is running, to suppress further service notifications when the host died.
I admit that I need to learn more about how sensu handles those situations.
You can do the same thing, just use keepalive. As this is clearly something that multiple people have tried and that does not work, I feel like we should add some documentation to help users out.
Team, I am seeing some strange behaviour. I am trying to monitor the sensu-client process running on my Linux server. For that I have set up the check definition and installed the respective dependent gems. This is what I am seeing.
When I run the script manually I see CheckProcess OK with 1 matching process:
[root@XXXXXXXX checks]# /opt/sensu/embedded/bin/ruby /etc/sensu/plugins/check-process.rb -p sensu-client
CheckProcess OK: Found 1 matching processes; cmd /sensu-client/
When sensu-client itself is running the check, I see this. {"timestamp":"2017-11-09T03:53:50.181638-0600","level":"info","message":"publishing check result","payload":{"client":"xxxxxxxxxx","check":{"command":"/etc/sensu/plugins/check-process.rb -p sensu-client","subscribers":[],"standalone":true,"handlers":["default","mailer"],"interval":30,"type":"standard","mail_to":"xxxxxxxxxxxx@target.com","name":"check_process-test","issued":1510221230,"executed":1510221230,"duration":0.17,"output":"CheckProcess CRITICAL: Found 0 matching processes; cmd /sensu-client/\n","status":2}}}
Can someone help? What could be the issue?