sensu-plugins / sensu-plugins-process-checks

This plugin provides native process instrumentation for monitoring and metrics collection, including: process status, uptime, thread count, and others.
http://sensu-plugins.io
MIT License
20 stars 55 forks source link

check-process.rb filtering behavior does not work as expected #45

Closed jgnagy closed 7 years ago

jgnagy commented 7 years ago

Trying to alert on processes with RSS above some value (-r) or with more threads than some threshold (-T) does not work the way I imagined it would based on the description of those flags. The confusion seems to come from:

    procs.select! { |p| p[:vsz].to_f > config[:vsz] } if config[:vsz]
    procs.select! { |p| p[:rss].to_f > config[:rss] } if config[:rss]
    procs.select! { |p| p[:cpu].to_f > config[:cpu_utilization] } if config[:cpu_utilization]
    procs.select! { |p| p[:thcount].to_i > config[:thcount] } if config[:thcount]

I would have expected check-process.rb -p foo -u bar -r 100000 to find my process running as bar, matching foo (let's assume that is specific enough). I would then expect it to throw an alert if RSS for the process is over 100,000, and otherwise return CheckProcess OK when it is under this value.

Instead, this is returned:

CheckProcess CRITICAL: Found 0 matching processes; cmd /foo/; user bar; rss > 100000

Without -r, I get:

CheckProcess OK: Found 1 matching processes; cmd /foo/; user bar

I suppose this can still be useful to someone looking to make sure their process is consuming more than the RSS value. Was this the intention? If so, maybe I'll work on a PR to clarify things in the README plus a -R to allow the behavior I was hoping to see. If not, this may be a bug.

majormoses commented 7 years ago

I can not speak to intention, I briefly read the code and feel its more likely a bug than intended feature. That being said I see the use case for both and would prefer a new flag as proposed to avoid breaking real use cases.

majormoses commented 7 years ago

@jgnagy @eheydrick so it can do exactly what you want to do but is super unintuitive after review there is no code change required to make the expected functionality as outlined here: https://github.com/sensu-plugins/sensu-plugins-process-checks/pull/46#discussion_r124469977 I am tempted to say we should remove all defaults on warn and crits (above and under) which would be a breaking change and force users to always explicitly set this or we should maybe add some output to the check result that might give the user a hint as to their useage of the plugin might not do what they expect. Thoughts?

majormoses commented 7 years ago

see https://github.com/sensu-plugins/sensu-plugins-process-checks/issues/22 which is really the same issue.

majormoses commented 7 years ago

@jgnagy did that help you understand how you can do what you want with no code change to the plugin. I am in agreement this is a poor experience but right now I just do not have the time to deal with this so I would like to make sure that you are unblocked and we can address making this suck less separately.

jgnagy commented 7 years ago

Honestly, I just went with a specifically-tailored one-off check for a bit while I was troubleshooting a process with a memory leak. I have since corrected the leak and used one of the metrics checks in this plugin to keep track of it, so although I never quite understood what changes I needed to make to catch what I was trying to alert for, we can close this issue.

majormoses commented 7 years ago

@jgnagy well sorry you were not able to get this one to work but I am glad that you had a reasonable work around. I definitely feel this check is bloated and really needs an overhaul to figure out how to make it easier to use and understand. I am gonna open an issue and either someone or myself will eventually take care of making this a lot better.