netphantm closed this issue 2 years ago
Indeed, it seems "incr" is not a valid value according to performance data rules: https://nagios-plugins.org/doc/guidelines.html#AEN200
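To make the rule concrete, here is a rough sketch of a perfdata item check, written in Python rather than the plugin's Perl. It is a simplified approximation of the `'label'=value[UOM];[warn];[crit];[min];[max]` format from the linked guidelines, not the parser Icinga actually uses; the name `is_valid_perfdata` and the exact regular expression are assumptions made for illustration.

```python
import re

# Simplified approximation of the plugin guidelines:
#   value: digits, minus sign and dot only, or "U" for an undetermined value
#   UOM:   nothing, s/ms/us, %, B/KB/MB/GB/TB, or c (counter)
# Up to four optional ";"-separated fields (warn, crit, min, max) may follow.
PERFDATA_RE = re.compile(
    r"^(?P<label>'[^'=]+'|[^\s'=]+)="
    r"(?P<value>U|-?\d+(?:\.\d+)?)"
    r"(?P<uom>s|ms|us|%|B|KB|MB|GB|TB|c)?"
    r"(?:;[^;\s]*){0,4}$"
)


def is_valid_perfdata(item: str) -> bool:
    """Return True if a single perfdata item looks well formed."""
    return PERFDATA_RE.match(item) is not None


if __name__ == "__main__":
    for item in ("incr=1", "latest_bck_type=incr", "latest_age=1s"):
        print(item, is_valid_perfdata(item))
```

Under this check, numeric items such as `incr=1` pass, while items whose value is a string (`latest_bck_type=incr`) or a backup label do not.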
Hi,
Sorry about the late answer but I've been thinking about this issue a lot.
FWIW, here's an example of all the keys used:
# Retention service
"Long message : full=1",
"Long message : diff=1",
"Long message : incr=1",
"Long message : latest=incr,20211012-130037F_20211012-130045I",
"Long message : latest_age=1s",
"Long message : latest_full=20211012-130037F",
"Long message : latest_full_age=7s",
# Archives service
"Long message : latest_archive_age=5s",
"Long message : num_unique_archives=4",
"Long message : min_wal=00000001000000000000000D",
"Long message : max_wal=000000010000000000000010",
"Long message : latest_archive=000000010000000000000010",
"Long message : latest_bck_archive_start=00000001000000000000000F",
"Long message : latest_bck_type=incr",
"Long message : oldest_archive=00000001000000000000000D",
"Long message : oldest_bck_archive_start=00000001000000000000000D",
"Long message : oldest_bck_type=full",
First of all, I strongly believe that the latest info is really interesting. So, I'll split latest into latest_bck and latest_bck_type (to make it consistent with the archives service).
For the prtg output format, we've added some kind of filter:
# Define which @longmsg keys will use TimeSeconds or Count units.
# Otherwise, it will be added to TEXT message.
my @TimeKeys = ("latest_age", "latest_full_age", "latest_archive_age");
my @CountKeys = ("full", "diff", "incr", "num_unique_archives", "num_missing_archives");
Applying the same filter would move latest_bck, latest_bck_type and latest_full to the text output and give something like:
BACKUPS_RETENTION OK - backups policy checks ok - latest_bck=20211013-064032F_20211013-064105I, latest_bck_type=incr
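The filtering idea can be sketched as follows. This is an illustration in Python, not the plugin's Perl code: the key lists are copied from the snippet above, while the function name `split_longmsg` and the list-of-pairs input shape are assumptions.

```python
# Key lists copied from the Perl snippet in this thread.
TIME_KEYS = {"latest_age", "latest_full_age", "latest_archive_age"}
COUNT_KEYS = {"full", "diff", "incr", "num_unique_archives", "num_missing_archives"}


def split_longmsg(pairs):
    """Split (key, value) pairs into perfdata items and text-only items.

    Keys listed in TIME_KEYS or COUNT_KEYS are kept as perfdata; every
    other key is moved to the human-readable text part of the output.
    """
    perfdata, text = [], []
    for key, value in pairs:
        target = perfdata if key in TIME_KEYS or key in COUNT_KEYS else text
        target.append(f"{key}={value}")
    return perfdata, text


if __name__ == "__main__":
    perfdata, text = split_longmsg([
        ("full", "1"),
        ("incr", "1"),
        ("latest_bck_type", "incr"),
        ("latest_age", "1s"),
    ])
    print("text:", ", ".join(text))
    print("perfdata:", " ".join(perfdata))
```

Keeping the known numeric keys in a whitelist means any key added later lands in the text part unless it is explicitly declared as a time or count value.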
There are other possibilities:
- nagios_strict output format to simply exclude those keys from the perfdata part
- human output format only (like it's done for the archives service); it would also have an impact on the prtg output format
Not sure what's the best option yet.
Kind Regards
First of all, I strongly believe that the latest info is really interesting. So, I'll split latest into latest_bck and latest_bck_type (to make it consistent with the archives service).
+1
Adding a nagios_strict format seems the best option to avoid breaking compatibility with existing installations. We can also add min/max thresholds in the perfdata.
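For reference, thresholds and min/max values go after the value in a perfdata item, as `label=value[UOM];warn;crit;min;max`. The helper below is a hypothetical Python sketch of that layout, not code from check_pgbackrest; the function name and the threshold numbers in the usage lines are made up for illustration.

```python
def format_perfdata(label, value, uom="", warn=None, crit=None,
                    minimum=None, maximum=None):
    """Build one perfdata item: label=value[UOM];warn;crit;min;max.

    Unset fields are left empty, and trailing empty fields are dropped
    so that an item without thresholds stays as plain label=value.
    """
    fields = ["" if f is None else str(f) for f in (warn, crit, minimum, maximum)]
    while fields and fields[-1] == "":
        fields.pop()
    return ";".join([f"{label}={value}{uom}"] + fields)


if __name__ == "__main__":
    # Threshold values here are invented examples.
    print(format_perfdata("full", 2, minimum=0))
    print(format_perfdata("latest_bck_age", 17837, "s", warn=86400, crit=172800, minimum=0))
```

Empty positions are kept when a later field is set, so a minimum without warn or crit still ends up in the fourth position.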
Hi,
I've just pushed the nagios_strict format.
It should solve this issue. @netphantm, can you please try it?
Thanks, Kind regards
Hi @pgstef, looks OK. No more warnings in the log, and the perfdata looks like this:
check_pgbackrest -s retention -S data -O nagios_strict
BACKUPS_RETENTION OK - backups policy checks ok | full=2 diff=0 incr=11 latest_bck_age=4h57m17s
greetings, hugo.-
on the other hand, the production machine still shows this:
[2021-10-15 10:50:12 +0200] information/ExternalCommandListener: Executing external command: [1634287812] SCHEDULE_FORCED_SVC_CHECK;foo-prodns-pdb;Pgbackrest retention;1634287812
[2021-10-15 10:50:12 +0200] warning/GraphiteWriter: Ignoring invalid perfdata for checkable 'foo-prodns-pdb!Pgbackrest retention' and command 'by_ssh' with value: latest_bck_age=5h37m2s
Context:
(0) Processing check result for 'foo-prodns-pdb!Pgbackrest retention'
[2021-10-15 10:50:12 +0200] warning/ElasticsearchWriter: Ignoring invalid perfdata for checkable 'foo-prodns-pdb!Pgbackrest retention' and command 'by_ssh' with value: latest_bck_age=5h37m2s
Context:
(0) Elasticwriter processing check result for 'foo-prodns-pdb!Pgbackrest retention'
the icinga2 version on that is
root@monitoring-01:~$ icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.13.1-1)
Copyright (c) 2012-2021 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <https://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
System information:
Platform: Debian GNU/Linux
Platform version: 10 (buster)
Kernel: Linux
Kernel version: 4.19.0-17-amd64
Architecture: x86_64
Build information:
Compiler: GNU 8.3.0
Build host: runner-hh8q3bz2-project-508-concurrent-0
OpenSSL version: OpenSSL 1.1.1d 10 Sep 2019
sorry, I forgot, on the staging I don't have 'elasticsearch' and 'graphite' features enabled, my bad :)
greetings
That's not an error from Icinga itself. Anyway, I found out why and pushed the fix ;-)
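The thread does not say what the fix was. The rejected value, `latest_bck_age=5h37m2s`, is a human-readable duration, which is not a plain number followed by a single unit. One plausible way to make such a value acceptable is to emit it in seconds. The Python sketch below only illustrates that conversion; the function name and the set of supported units (`d`, `h`, `m`, `s`) are assumptions.

```python
import re

# Seconds per unit; the supported units are an assumption.
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def duration_to_seconds(text: str) -> int:
    """Convert a duration such as '5h37m2s' to a number of seconds."""
    parts = re.findall(r"(\d+)([dhms])", text)
    # Reject input with anything besides <number><unit> groups.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(int(n) * UNIT_SECONDS[u] for n, u in parts)


if __name__ == "__main__":
    seconds = duration_to_seconds("5h37m2s")
    print(f"latest_bck_age={seconds}s")  # latest_bck_age=20222s
```

A value like `latest_bck_age=20222s` uses the `s` unit from the plugin guidelines, so writers such as Graphite or Elasticsearch can store it as a number.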
looks OK now:
[2021-10-15 11:03:08 +0200] information/ExternalCommandListener: Executing external command: [1634288588] SCHEDULE_FORCED_SVC_CHECK;foo-prodns-pdb;Pgbackrest retention;1634288588
thanks :-) looking forward to this getting into the repos, and then I'll update all machines and icinga2 config.
I'm getting these in the icinga2.log all the time.
If no unit of measurement is specified, I think it assumes a number (int or float) of things (e.g., users, processes, load averages) and gets confused about that value. Would it help to put it in quotes perhaps?