Open Tontonitch opened 6 years ago
Tested the latest version, 0.5.2.20: the frozen situation is still there.
Just to give more details with version 0.5.2.20:
C:\WINDOWS\system32>"C:\Program Files\ICINGA2\/sbin/check_nscp_api" --password xxxxxxxx -H localhost -P xxxx -q check_cpu
==> timeout
Humm, I honestly did not even know that Icinga had a REST based check for CPU (before they used the client option)... So do you have any idea what it does?
The "legacy API" which I assume it is using (as the new one is not even complete yet) should not have changed at all. I know prior to 0.5.2.20 the authentication was broken, but that has been fixed.
And if something "stopped working" I assume it is related to the configuration. Could you paste the relevant bits of the NSClient++ config (and validate that passwords and ports seem correct)?
Hello Michael,
> Humm, I honestly did not even know that Icinga had a REST based check for CPU (before they used the client option)... So do you have any idea what it does?
Icinga2 developers have been experimenting with this legacy API since the end of 2016, and it has been officially available since August of this year (with Icinga2 2.7.0) through the check_nscp_api plugin, to handle runtime metrics such as CPU usage and Windows event logs. https://www.icinga.com/2016/09/16/nsclient-0-5-0-rest-api-and-icinga-2-integration/ https://www.icinga.com/2017/07/05/monitoring-windows-clients-with-icinga-2-and-local-nsclient-checks/ https://www.icinga.com/2017/08/02/icinga-2-v2-7-0-released/
Icinga2 2.7.1 and then 2.8.0 added and completed the related documentation: https://www.icinga.com/docs/icinga2/snapshot/doc/06-distributed-monitoring/#nsclient-with-check_nscp_api
This plugin is already listed in the NSCP documentation; I guess it was a contribution to your docs by the Icinga2 core dev team. https://docs.nsclient.org/api/#integrations
> The "legacy API" which I assume it is using (as the new one is not even complete yet) should not have changed at all. I know prior to 0.5.2.20 the authentication was broken, but that has been fixed. And if something "stopped working" I assume it is related to the configuration. Could you paste relevant bits of the NSClient++ config (and validate that passwords and ports seem correct)
Password and port are correct. My configuration did not change. I use it with NSCP 0.5.0.62 on servers where I don't face any issue (other than some known issues like false-positive error messages, fixed in 5.1.x).
I will attach it asap.
But what really worries me is that, on a server where NSCP 0.5.0.62 works correctly, if I install 0.5.1.46 the issue starts to occur, and then I cannot roll back: even if I put the old version back, the issue persists. I really don't understand what's going on.
It is an official API, so it should work for sure... I was just unaware they used it :)
Are the passwords different?
(there are two in the file)
Passwords are the same
> It is an official API, so it should work for sure... I was just unaware they used it :)
I hope the information provided is useful to you. As I understood from the documentation, running checks via the check_nscp_api plugin, i.e. querying the REST API (currently the legacy API), is the direction the Icinga2 dev team is taking, as it is more flexible and less limited than the "nscp client" command.
Maybe you will meet some Icinga2 developers during the coming OSMC. There is a pending task about the future NSCP version integrated with Icinga2 (github task https://github.com/Icinga/icinga2/issues/5633).
Hello Michael,
Any news, at least about the frozen stats situation that appeared after the upgrade of NSCP?
While trying to find a good icinga2 client <-> nscp integration, I still face this frozen stats issue.
An example with check_nrpe requests: as you can see, after an NSCP service restart it returns good, changing values, and after some seconds it freezes, always returning the same value.
[root@monitorsrv1 plugins]# ./check_nrpe -H xxxxxxxxxxxxx -p 5666
I (0.5.1.46 2017-09-24) seem to be doing fine...
[root@monitorsrv1 plugins]# ./check_nrpe -H xxxxxxxxxxxxx -p 5666 -c check_cpu
WARNING: 5s: 84%|'total 5m'=0%;80;90 'total 1m'=32%;80;90 'total 5s'=84%;80;90
[root@monitorsrv1 plugins]# ./check_nrpe -H xxxxxxxxxxxxx -p 5666 -c check_cpu
OK: CPU load is ok.|'total 5m'=0%;80;90 'total 1m'=39%;80;90 'total 5s'=69%;80;90
[root@monitorsrv1 plugins]# ./check_nrpe -H xxxxxxxxxxxxx -p 5666 -c check_cpu
OK: CPU load is ok.|'total 5m'=0%;80;90 'total 1m'=39%;80;90 'total 5s'=69%;80;90
[root@monitorsrv1 plugins]# ./check_nrpe -H xxxxxxxxxxxxx -p 5666 -c check_cpu
OK: CPU load is ok.|'total 5m'=0%;80;90 'total 1m'=39%;80;90 'total 5s'=69%;80;90
[root@monitorsrv1 plugins]# ./check_nrpe -H xxxxxxxxxxxxx -p 5666 -c check_cpu
OK: CPU load is ok.|'total 5m'=0%;80;90 'total 1m'=39%;80;90 'total 5s'=69%;80;90
[root@monitorsrv1 plugins]# ./check_nrpe -H xxxxxxxxxxxxx -p 5666 -c check_cpu
OK: CPU load is ok.|'total 5m'=0%;80;90 'total 1m'=39%;80;90 'total 5s'=69%;80;90
[root@monitorsrv1 plugins]# ./check_nrpe -H xxxxxxxxxxxxx -p 5666 -c check_cpu
OK: CPU load is ok.|'total 5m'=0%;80;90 'total 1m'=39%;80;90 'total 5s'=69%;80;90
[root@monitorsrv1 plugins]# ./check_nrpe -H xxxxxxxxxxxxx -p 5666 -c check_cpu
OK: CPU load is ok.|'total 5m'=0%;80;90 'total 1m'=39%;80;90 'total 5s'=69%;80;90
[root@monitorsrv1 plugins]# ./check_nrpe -H xxxxxxxxxxxxx -p 5666 -c check_cpu
OK: CPU load is ok.|'total 5m'=0%;80;90 'total 1m'=39%;80;90 'total 5s'=69%;80;90
[root@monitorsrv1 plugins]# ./check_nrpe -H xxxxxxxxxxxxx -p 5666 -c check_cpu
OK: CPU load is ok.|'total 5m'=0%;80;90 'total 1m'=39%;80;90 'total 5s'=69%;80;90
[root@monitorsrv1 plugins]# ./check_nrpe -H xxxxxxxxxxxxx -p 5666 -c check_cpu
OK: CPU load is ok.|'total 5m'=0%;80;90 'total 1m'=39%;80;90 'total 5s'=69%;80;90
[root@monitorsrv1 plugins]# ./check_nrpe -H xxxxxxxxxxxxx -p 5666 -c check_cpu
OK: CPU load is ok.|'total 5m'=0%;80;90 'total 1m'=39%;80;90 'total 5s'=69%;80;90
[root@monitorsrv1 plugins]# ./check_nrpe -H xxxxxxxxxxxxx -p 5666 -c check_cpu
OK: CPU load is ok.|'total 5m'=0%;80;90 'total 1m'=39%;80;90 'total 5s'=69%;80;90
[root@monitorsrv1 plugins]# ./check_nrpe -H xxxxxxxxxxxxx -p 5666 -c check_cpu
OK: CPU load is ok.|'total 5m'=0%;80;90 'total 1m'=39%;80;90 'total 5s'=69%;80;90
[root@monitorsrv1 plugins]# ./check_nrpe -H xxxxxxxxxxxxx -p 5666 -c check_cpu
OK: CPU load is ok.|'total 5m'=0%;80;90 'total 1m'=39%;80;90 'total 5s'=69%;80;90
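A frozen series like the one above can also be spotted automatically. The following is only a minimal sketch, not part of NSClient++ or Icinga2: the helper names `parse_perfdata` and `looks_frozen` and the `min_repeats` threshold are made up for illustration, and it assumes the plugin output has already been captured as text lines in the format shown above.

```python
import re

# Matches one perfdata entry such as 'total 5s'=84%;80;90
# and captures the label and the numeric value.
PERFDATA_RE = re.compile(r"'([^']+)'=([0-9.]+)")


def parse_perfdata(line):
    """Return {label: value} for the perfdata after the '|' of one output line."""
    _, _, perf = line.partition("|")
    return {label: float(value) for label, value in PERFDATA_RE.findall(perf)}


def looks_frozen(lines, label="total 5s", min_repeats=5):
    """True if the last `min_repeats` samples of `label` are all identical.

    `min_repeats` is an arbitrary threshold chosen for this sketch.
    """
    values = [parse_perfdata(line).get(label) for line in lines]
    values = [v for v in values if v is not None]
    tail = values[-min_repeats:]
    return len(tail) == min_repeats and len(set(tail)) == 1


# Sample lines copied from the check_nrpe output above.
samples = [
    "WARNING: 5s: 84%|'total 5m'=0%;80;90 'total 1m'=32%;80;90 'total 5s'=84%;80;90",
] + [
    "OK: CPU load is ok.|'total 5m'=0%;80;90 'total 1m'=39%;80;90 'total 5s'=69%;80;90"
] * 5

print(looks_frozen(samples))
```

An identical 5-second value over many consecutive samples is only a hint, not proof: an idle machine can legitimately report the same value several times, so the threshold should be tuned to the check interval.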
What could I try to fix that?
BR, Yannick
Hello Michael,
I have a working configuration now:
It seems that after
... the returned stats are ok.
I'm testing to see which change and/or action produced the issue.
By the way, is there a way to keep real-time stats across NSCP restarts, to avoid the following drops in the cpu usage stats for example?
Edit: opened a separate "issue" as it is not related to this issue (#555)
Hello Michael,
I faced the issue again, even with NSCP 0.5.0.62 (bundled with Icinga2). Restarting the NSCP service fixed the issue, but for how long? The bad part is that, as the CPU check returned wrong values, we were not notified about an important issue.
BR, Yannick
For your information, I had 10 servers impacted. Restarting the service fixed the issue for the moment.
Please open a ticket about keeping CPU stats across restarts.
As for the other issue, I don't have access to an Icinga client myself, but I have fixed a REST issue in the next version, so please see if that resolves it: https://github.com/mickem/nscp/releases/tag/0.5.3.3
Issue and Steps to Reproduce
Using NSCP 0.5.0.62 with Icinga2 and the check_nscp_api.exe plugin, I wanted to upgrade the running NSCP version to last stable 0.5.1.46, to
Unfortunately, the new version introduces:
nscp client
(done by the icinga2 agent) as regularly it returns an "unknown" message. It happens on a server with high CPU activity. Nothing in the NSCP log (info level). No way to get the CPU usage check working correctly. The other checks, done via Icinga2 launching commands like nscp client…, seem to have continued to work correctly. For example, the memory check:
The most important problem is that I tried to revert to version 0.5.0.62, but the issue still occurs! I cannot get my CPU check to work as before, even after a server reboot!
I had that issue on the 2 servers where I did the migration from 0.5.0.62 to 0.5.1.46.
Any idea to fix my situation, at least to get 0.5.0.62 working again?
Expected Behavior
check_cpu should still work via the REST API after migrating from 0.5.0.62 to 0.5.1.46
check_cpu should work again after downgrading to 0.5.0.62
Actual Behavior
check_cpu via the REST API seems to be frozen to the first gathered values after migrating from 0.5.0.62 to 0.5.1.46
check_cpu via the REST API seems to keep the issue after downgrading to 0.5.0.62
Details
Additional Details
NSClient++ log: absolutely nothing in the log