Open nmat opened 7 years ago
Not sure I understand, but it sounds like you're looking for detail-syntax?
From: https://docs.nsclient.org/reference/windows/CheckSystem/#usage_11
Changing the returned text:
check_process process=explorer.exe "warn=working_set > 70m" "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${user}s"
explorer.exe ws:77271040, handles: 800, user time:107s
Performance data: 'explorer.exe ws_size'=73M;70;0
I have tested various ways of using detail-syntax with check_nrpe, but the output does not change as intended. This may be a lack of knowledge on my part, but if I use:
./check_nrpe -H [ip] -c check_process -a "process=[process]" "warn=working_set > 70m" "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${user}s"
The output will still be:
OK: all processes are ok
The detail syntax is very confusing in this case.
There are several format strings... You can see this by running the command with show-defaults:
check_cpu show-default
L cli OK: "filter=core = 'total'" "warning=load > 80" "critical=load > 90" "empty-state=ignored" "top-syntax=${status}: ${problem_list}" "ok-syntax=%(status): CPU load is ok." "detail-syntax=${time}: ${load}%" "perf-syntax=${core} ${time}"
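To see at a glance which format strings a check carries, the show-default line above can be split into key/value pairs. A minimal Python sketch, assuming the line is copied verbatim from the console (the variable names are illustrative only):

```python
import shlex

# Output of "check_cpu show-default", copied from the console line above.
line = (
    '"filter=core = \'total\'" "warning=load > 80" "critical=load > 90" '
    '"empty-state=ignored" "top-syntax=${status}: ${problem_list}" '
    '"ok-syntax=%(status): CPU load is ok." "detail-syntax=${time}: ${load}%" '
    '"perf-syntax=${core} ${time}"'
)

# shlex.split honours the double quotes; each token is then "key=value".
defaults = dict(token.split("=", 1) for token in shlex.split(line))

# Print only the four format strings that shape the returned text.
for key in ("top-syntax", "ok-syntax", "detail-syntax", "perf-syntax"):
    print(f"{key:14} -> {defaults[key]}")
```

In the OK outputs quoted in this thread, the text "all processes are ok" appears to come from ok-syntax rather than detail-syntax, which would explain why changing only detail-syntax has no visible effect when nothing is wrong.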
So in your case what you want is most likely to:
Hello,
I tested the suggestion above with check_nrpe on OP5 and here is the line for nrpe:
./check_nrpe -H [host] -c check_process -a "process=[process]" "warn=working_set > 700M" "top-syntax=${status}: ${list}" "ok-syntax=none" "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${user}s"
Result:
none|process state'=1;0;0 'Process ws_size'=68.941MB;700;0 'count'=1;0;0
So unfortunately it did not work as expected. :/
From my point of view, the output differs between running the command through NRPE and running it locally. If that is the case, it is hard to follow the documentation on the website, where you would expect the result to be the same in both tests.
Not sure I follow. The result is the same; the only difference is the options that check_nrpe itself requires (which means you put -a before the check options, as well as -H and -c)...
But more importantly, check_nrpe is only one of many ways to interact with NSClient++, so I tend to opt for the generic version in the docs. How to use check_nrpe can be found in the check_nrpe docs...
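The mapping from the generic form to a check_nrpe call can be sketched in a few lines of Python. This is only an illustration of the -H / -c / -a layout described above; the helper name and the placeholder address are made up for the example:

```python
import shlex

def nrpe_argv(host, command, check_args, nrpe="./check_nrpe"):
    """Wrap a generic NSClient++ command in a check_nrpe invocation."""
    # -H: target host, -c: command to run, -a: the check's own options follow.
    return [nrpe, "-H", host, "-c", command, "-a", *check_args]

argv = nrpe_argv(
    "192.0.2.10",  # placeholder address, not taken from the thread
    "check_process",
    [
        "process=explorer.exe",
        "warn=working_set > 70m",
        "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${user}s",
    ],
)

# shlex.join (Python 3.8+) renders the list as a POSIX shell command line,
# single-quoting every argument that contains spaces or "$".
print(shlex.join(argv))
```

Passing the same list to subprocess.run(argv) without shell=True hands each argument to check_nrpe unchanged, with no shell in between.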
Your result yields the following:
check_process "process=[process]" "warn=working_set > 700M" "top-syntax=${status}: ${list}" "ok-syntax=none" "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${user}s"
L cli CRITICAL: CRITICAL: [process] ws:0, handles: 0, user time:0s
L cli Performance data: '[process] state'=0;0;0 '[process] ws_size'=0MB;700;0 'count'=1;0;0
In an ok scenario I get the same:
check_process "process=[process]" critical=none "warn=working_set > 7000M" "top-syntax=${status}: ${list}" "ok-syntax=none" "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${user}s"
L cli OK: OK: [process] ws:0, handles: 0, user time:0s
L cli Performance data: '[process] ws_size'=0GB;6.83593;0
So if you get something else it could be a bug which has since been fixed (as I am on the latest version).
Hi,
I can also confirm this behaviour of the client in my environment.
The version is 0.5.0.65 2016-11-13, and I have seen this behaviour in earlier versions as well.
The following command line:
./check_nrpe -H [IP] -c check_process -a "process=nscp.exe" "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${user}s"
results in
OK: all processes are ok.|'nscp.exe state'=1;0;0 'count'=1;0;0
I enabled debugging on the system and this is what I can see:
D w32system Parsing: state != 'unreadable'
D w32system Parsing succeeded: (tbd){(int)var:state ? (s){unreadable}}
D w32system Type resolution succeeded: (bool){(int)var:state ? {ui:1}convert((s){unreadable})}
D w32system Binding succeeded: (bool){(int)var:state ? {ui:1}convert((s){unreadable})}
D w32system Static evaluation succeeded: (bool){(int)var:state ? {ui:1}convert((s){unreadable})}
D w32system Parsing: state not in ('started')
D w32system Parsing succeeded: (tbd){(int)var:state not in (s){started}}
D w32system Type resolution succeeded: (bool){(int)var:state not in {ui:1}convert((s){started})}
D w32system Binding succeeded: (bool){(int)var:state not in {ui:1}convert((s){started})}
D w32system Static evaluation succeeded: (bool){(int)var:state not in {ui:1}convert((s){started})}
D w32system Parsing: state = 'stopped'
D w32system Parsing succeeded: (tbd){(int)var:state = (s){stopped}}
D w32system Type resolution succeeded: (bool){(int)var:state = {ui:1}convert((s){stopped})}
D w32system Binding succeeded: (bool){(int)var:state = {ui:1}convert((s){stopped})}
D w32system Static evaluation succeeded: (bool){(int)var:state = {ui:1}convert((s){stopped})}
D w32system Parsing: count = 0
D w32system Parsing succeeded: (tbd){(int)var:count = (i){0}}
D w32system Type resolution succeeded: (bool){(int)var:count = (i){0}}
D w32system Binding succeeded: (bool){(int)var:count = (i){0}}
D w32system Static evaluation succeeded: (bool){(int)var:count = (i){0}}
D w32system Crit/warn/ok did not match: ws:, handles: , user time:s
D w32system Crit/warn/ok did not match: <END>
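The last "did not match" lines above show the detail text with every placeholder empty. One possible explanation, which nobody in this thread has confirmed, is that the shell calling check_nrpe expanded ${exe}, ${working_set} and the others inside the double quotes before NSClient++ ever saw them. The Python sketch below is a simplified model of that expansion (it handles only the ${name} form, and the function name is made up for illustration):

```python
import re

def expand_like_double_quotes(text, env):
    """Simplified model of POSIX shell ${name} expansion inside double quotes.

    Unset variables expand to the empty string, as in a shell without `set -u`.
    """
    return re.sub(r"\$\{(\w+)\}", lambda m: env.get(m.group(1), ""), text)

detail = "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${user}s"

# Assumption: none of these lowercase names are set in the caller's environment.
received = expand_like_double_quotes(detail, env={})
print(received)  # detail-syntax= ws:, handles: , user time:s
```

If this is what happens, wrapping the argument in single quotes, or using the %(name) placeholder form that appears in the show-default output earlier in the thread, should keep the text intact on its way to the agent.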
But if I run the following (only in the client, as the payload is too large) the result is as expected:
check_process "top-syntax=${status}: ${list}" "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${user}s" debug
In the debug log (just an example) I can also see a bit more:
D w32system Parsing: state != 'unreadable'
D w32system Parsing succeeded: (tbd){(int)var:state ? (s){unreadable}}
D w32system Type resolution succeeded: (bool){(int)var:state ? {ui:1}convert((s){unreadable})}
D w32system Binding succeeded: (bool){(int)var:state ? {ui:1}convert((s){unreadable})}
D w32system Static evaluation succeeded: (bool){(int)var:state ? {ui:1}convert((s){unreadable})}
D w32system Parsing: state not in ('started')
D w32system Parsing succeeded: (tbd){(int)var:state not in (s){started}}
D w32system Type resolution succeeded: (bool){(int)var:state not in {ui:1}convert((s){started})}
D w32system Binding succeeded: (bool){(int)var:state not in {ui:1}convert((s){started})}
D w32system Static evaluation succeeded: (bool){(int)var:state not in {ui:1}convert((s){started})}
D w32system Parsing: state = 'stopped'
D w32system Parsing succeeded: (tbd){(int)var:state = (s){stopped}}
D w32system Type resolution succeeded: (bool){(int)var:state = {ui:1}convert((s){stopped})}
D w32system Binding succeeded: (bool){(int)var:state = {ui:1}convert((s){stopped})}
D w32system Static evaluation succeeded: (bool){(int)var:state = {ui:1}convert((s){stopped})}
D w32system Parsing: count = 0
D w32system Parsing succeeded: (tbd){(int)var:count = (i){0}}
D w32system Type resolution succeeded: (bool){(int)var:count = (i){0}}
D w32system Binding succeeded: (bool){(int)var:count = (i){0}}
D w32system Static evaluation succeeded: (bool){(int)var:count = (i){0}}
D w32system Filter did not match: ws:0, handles: 0, user time:0s
D w32system Crit/warn/ok did not match: smss.exe ws:565248, handles: 32, user time:0s
D w32system Crit/warn/ok did not match: csrss.exe ws:2891776, handles: 1099, user time:0s
D w32system Crit/warn/ok did not match: wininit.exe ws:1843200, handles: 82, user time:0s
D w32system Crit/warn/ok did not match: csrss.exe ws:95416320, handles: 1184, user time:1s
D w32system Crit/warn/ok did not match: services.exe ws:11513856, handles: 391, user time:23s
D w32system Crit/warn/ok did not match: winlogon.exe ws:4767744, handles: 135, user time:0s
D w32system Crit/warn/ok did not match: lsass.exe ws:16502784, handles: 1248, user time:14s
D w32system Crit/warn/ok did not match: lsm.exe ws:4210688, handles: 274, user time:0s
D w32system Crit/warn/ok did not match: svchost.exe ws:9449472, handles: 445, user time:69s
D w32system Crit/warn/ok did not match: svchost.exe ws:8556544, handles: 556, user time:2s
D w32system Crit/warn/ok did not match: svchost.exe ws:17145856, handles: 615, user time:10s
Every time I add the "process" value, the result is not the expected one. Only when I add a warning or critical condition such as "warn=working_set > 70m" do I get this value in the result as well...
With 0.5.0 I get the expected result, so I am not sure what is amiss... Could you let me know whether it is w32 or x64, and attach any relevant config?
check_process "process=explorer.exe" "warn=working_set > 700M" "top-syntax=${status}: ${list}" "ok-syntax=none" "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${user}s"
L cli OK: OK: explorer.exe ws:86052864, handles: 3027, user time:105s
L cli Performance data: 'explorer.exe state'=1;0;0 'explorer.exe ws_size'=82.0664MB;700;0 'count'=1;0;0
As well as:
check_process "process=nscp.exe" "warn=working_set > 700M" "top-syntax=${status}: ${list}" "ok-syntax=none" "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${user}s"
L cli OK: OK: nscp.exe ws:21393408, handles: 467, user time:15s, nscp.exe ws:44482560, handles: 439, user time:541s, nscp.exe ws:33116160, handles: 414, user time:0s
L cli Performance data: 'nscp.exe state'=1;0;0 'nscp.exe ws_size'=20.40234MB;700;0 'nscp.exe state'=1;0;0 'nscp.exe ws_size'=42.42187MB;700;0 'nscp.exe state'=1;0;0 'nscp.exe ws_size'=31.58203MB;700;0 'count'=3;0;0
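The performance data in these outputs follows the usual Nagios plugin layout of 'label'=value[unit];warn;crit. When comparing local and NRPE results it can help to pull the fields apart. A small Python sketch (the regular expression covers only the fields seen in this thread, not the optional min/max):

```python
import re

# 'label'=value[unit];warn;crit  (min/max are not handled here)
PERF_ITEM = re.compile(r"'([^']+)'=(-?[\d.]+)([A-Za-z%]*);([^;\s]*);([^;\s]*)")

def parse_perfdata(text):
    """Return one dict per perfdata item found in text."""
    items = []
    for label, value, unit, warn, crit in PERF_ITEM.findall(text):
        items.append(
            {"label": label, "value": float(value), "unit": unit, "warn": warn, "crit": crit}
        )
    return items

# Sample taken from the explorer.exe output above.
sample = "'explorer.exe state'=1;0;0 'explorer.exe ws_size'=82.0664MB;700;0 'count'=1;0;0"
for item in parse_perfdata(sample):
    print(item)
```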
Hello,
So I have tested a few versions now and here is the result:
check_process "process=nscp.exe" "warn=working_set > 700M" "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${user}s"
L cli OK: OK: all processes are ok.
L cli Performance data: 'nscp.exe state'=1;0;0 'nscp.exe ws_size'=9.20312MB;700;0 'nscp.exe state'=1;0;0 'nscp.exe ws_size'=26.90234MB;700;0 'count'=2;0;0
check_process "process=nscp.exe" "warn=working_set > 700M" "ok-syntax=none" "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${user}s"
L cli OK: none
L cli Performance data: 'nscp.exe state'=1;0;0 'nscp.exe ws_size'=8.9414MB;700;0 'nscp.exe state'=1;0;0 'nscp.exe ws_size'=26.89843MB;700;0 'count'=2;0;0
check_process "process=nscp.exe" "warn=working_set > 700M" "top-syntax=${status}: ${list}" "ok-syntax=none" "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${user}s"
L cli OK: OK: nscp.exe ws:9580544, handles: 377, user time:3s, nscp.exe ws:28319744, handles: 388, user time:0s
L cli Performance data: 'nscp.exe state'=1;0;0 'nscp.exe ws_size'=9.13671MB;700;0 'nscp.exe state'=1;0;0 'nscp.exe ws_size'=27.00781MB;700;0 'count'=2;0;0
Now, to understand why this was hard to figure out: in the documentation, the example looks like this:
check_process process=explorer.exe "warn=working_set > 70m" "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${user}s"
explorer.exe ws:77271040, handles: 800, user time:107s
Performance data: 'explorer.exe ws_size'=73M;70;0
So the documentation is not really correct, right? You need to add some extra parameters to get the result it shows?
I am testing this on:
OS: Windows Server 2016
Bit: 64-bit
PowerShell: 5.1.14393.1532
Note that I have not tested this with output returned via check_nrpe.
All tests were done using NSClient++ 0.5.1.44.
Now I have tested with the check_nrpe that is provided by the OP5 monitoring system.
The result is still the same; I get the following:
./check_nrpe -H $HOSTADDRESS$ -c check_process -a "process=nscp.exe" "warn=working_set > 1000M" "crit=working_set > 1300M" "ok-syntax=none" "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${users}s"
none|'nscp.exe ws_size'=0.02587GB;0.97656;1.26953
./check_nrpe -H $HOSTADDRESS$ -c check_process -a "process=nscp.exe" "warn=working_set > 1000M" "crit=working_set > 1300M" "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${users}s"
OK: all processes are ok.|'nscp.exe ws_size'=0.02587GB;0.97656;1.26953
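The thresholds in the performance data above differ from the ones typed on the command line: 1000M shows up as 0.97656 and 1300M as 1.26953 (and 7000M as 6.83593 earlier in the thread). The numbers look consistent with a rescale from MB to GB by dividing by 1024 and cutting off after five decimals. That is inferred from the values in this thread, not taken from the NSClient++ source. A quick Python check:

```python
import math

def mb_to_gb_truncated(megabytes, decimals=5):
    """Divide by 1024 and truncate (not round) to the given number of decimals."""
    scale = 10 ** decimals
    return math.floor(megabytes / 1024 * scale) / scale

# Thresholds used in the commands quoted in this thread.
for mb in (1000, 1300, 7000):
    print(f"{mb}M -> {mb_to_gb_truncated(mb)}GB")
```

Truncation rather than rounding is the assumption that makes 7000M come out as 6.83593; rounding would give 6.83594.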
In general, running the same command on the server without NRPE yields a different result than what NRPE returns. So running the command remotely against the server gives wrong information.
It would be nice to have a better way to present summary information for a specific process.
For performance monitoring, it would help to have a preset filter that tells you more about the process:
OK - [process] - CPU: [cpu_used] , mem: [mem_used_workingset] MB, Handles: [handles]
Having more in the check summary, along with graphs for those values, would make process monitoring better. Is this possible right now? At the moment I can only see byte values in NSClient++ 0.5.0.64, unless I use counters.
It would be nice to have more units to work with (MB, GB, etc.) for those specific checks.
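If the check itself cannot produce this, the raw byte values could be converted on the monitoring side. A rough Python sketch of the summary format requested above, assuming a hypothetical wrapper that already has the process name, the working set in bytes and the handle count (CPU is left out because none of the outputs in this thread include it):

```python
def summarize(status, exe, working_set_bytes, handles):
    """Format a one-line summary with the working set converted to MB."""
    mem_mb = working_set_bytes / (1024 * 1024)
    return f"{status} - {exe} - mem: {mem_mb:.2f} MB, Handles: {handles}"

# Values taken from the explorer.exe output earlier in the thread.
print(summarize("OK", "explorer.exe", 86052864, 3027))
# OK - explorer.exe - mem: 82.07 MB, Handles: 3027
```

The same division reproduces the 82.0664MB that NSClient++ reports in the performance data for that process.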