pgstef / check_pgbackrest

pgBackRest backup check plugin for Nagios
PostgreSQL License
Output format 'prtg' added #21

Closed hpza closed 2 years ago

hpza commented 3 years ago

Dear Stefan, I'm using PRTG for monitoring and your check_pgbackrest was exactly what I needed. Thanks for this nice tool! I therefore added another output format to suit the PRTG advanced sensors format, and I successfully added a sensor for my PostgreSQL backups. I'm not a real Perl coder, but my code seems to work fine. Would you please consider integrating my contribution?

Thanks and kind regards! Hans-Peter

pgstef commented 3 years ago

Hi,

Thank you for suggesting this new feature. To be honest, I don't know PRTG at all, and it doesn't seem to be open-source or free-to-use software. I don't really feel comfortable supporting this feature without finding a way to cover it in the test suite.

I'm currently modifying the test system (switching to Docker containers to allow multi-repo and GH Actions). I'd feel more able to integrate this feature if you could provide an example of how to install/use PRTG and execute check_pgbackrest as a sensor, so I could look at how to add it to the test system.

Many thanks for your help and patience, Kind Regards

hpza commented 3 years ago

Hi,

Unfortunately PRTG is not open source. I will check with PRTG Support whether they provide a tool for testing custom sensors. If not, I agree with you that it's not a good idea to introduce this feature. I'll get back to you later. Thanks and kind regards

hpza commented 3 years ago

Hi, I just got a response from PRTG Support: they can't provide any tool for testing/validating the generated output format. I'm not in the mood to write my own testing tool, and setting up a PRTG server for testing is not really an option (it runs on Windows).

So for my use case I think I have to stay with my own fork. Tell me if you should change your mind ;-) Regards Hans-Peter

pgstef commented 2 years ago

Hi,

I finally pushed the test suite refactoring recently and then had the time to give this change a new look. I still haven't found any way to emulate PRTG to validate the output, but since it's just a formatting feature, I don't believe it would be hard to maintain.

I took a closer look at your code, and I don't really like changing the longmsg content to msg for the text output. I strongly believe the "global" text output should be consistent across all the output formats.

Also, based on the PRTG custom sensors guide, when no unit is defined, Custom is used. We could simply transform all longmsg entries to use that Custom unit. I listed the keys included in that part and found it pretty easy to maintain TimeKeys and CountKeys lists to adjust the unit accordingly. With that logic, we could even add the human_only_longmsg values if that would be useful to you.

I can imagine something like this:

"--service=retention --retention-full",
"Service        : BACKUPS_RETENTION",
"Returns        : 0 (OK)",
"Message        : backups policy checks ok",
"Long message   : full=1",
"Long message   : diff=1",
"Long message   : incr=1",
"Long message   : latest=incr,20210826-144059F_20210826-144107I",
"Long message   : latest_age=1s",

"<prtg>
<result><channel>status</channel><value>0</value><LimitMaxWarning>0</LimitMaxWarning><LimitMaxError>1</LimitMaxError><LimitMode>1</LimitMode></result>
<result><channel>full</channel><value>1</value><Unit>Count</Unit></result>
<result><channel>diff</channel><value>1</value><Unit>Count</Unit></result>
<result><channel>incr</channel><value>1</value><Unit>Count</Unit></result>
<result><channel>latest</channel><value>incr,20210826-144059F_20210826-144107I</value></result>
<result><channel>latest_age</channel><value>1</value><Unit>TimeSeconds</Unit></result>
<text>BACKUPS_RETENTION OK - backups policy checks ok</text>
</prtg>",

"--service=archives",
"Service        : WAL_ARCHIVES",
"Returns        : 0 (OK)",
"Message        : 4 WAL archived",
"Message        : latest archived since 1s",
"Long message   : latest_archive_age=1s",
"Long message   : num_archives=4",
"Long message   : archives_dir=archive/c7pg/13-1",
"Long message   : min_wal=00000007000000000000009D",
"Long message   : max_wal=0000000700000000000000A0",
"Long message   : latest_archive=0000000700000000000000A0",
"Long message   : latest_bck_archive_start=00000007000000000000009F",
"Long message   : latest_bck_type=incr",
"Long message   : oldest_archive=00000007000000000000009D",
"Long message   : oldest_bck_archive_start=00000007000000000000009D",
"Long message   : oldest_bck_type=full",

"<prtg>
<result><channel>status</channel><value>0</value><LimitMaxWarning>0</LimitMaxWarning><LimitMaxError>1</LimitMaxError><LimitMode>1</LimitMode></result>
<result><channel>latest_archive_age</channel><value>1</value><Unit>TimeSeconds</Unit></result>
<result><channel>num_archives</channel><value>4</value><Unit>Count</Unit></result>
<text>WAL_ARCHIVES OK - 4 WAL archived, latest archived since 1s</text>
</prtg>",

The code change is pretty simple:

sub prtg_output ($$;$$) {
    my $rc  = shift;
    my $service = shift;
    my $ret;
    my @msg;
    my @longmsg;

    @msg      = @{ $_[0] } if defined $_[0];
    @longmsg  = @{ $_[1] } if defined $_[1];

    # Generate TEXT message
    my $text = "<text>";
    $text .= $service . " OK"       if $rc == 0;
    $text .= $service . " WARNING"  if $rc == 1;
    $text .= $service . " CRITICAL" if $rc == 2;
    $text .= $service . " UNKNOWN"  if $rc == 3;
    $text .= " - ". join( ', ', @msg )  if @msg;
    $text .= "</text>";

    # Generate service status result
    my $results = "<result><channel>status</channel><value>$rc</value>";
    $results .= "<LimitMaxWarning>0</LimitMaxWarning>";
    $results .= "<LimitMaxError>1</LimitMaxError>";
    $results .= "<LimitMode>1</LimitMode></result>";

    # Define which @longmsg keys will use TimeSeconds or Count units.
    # Otherwise, the default unit is Custom.
    my @TimeKeys = ("latest_age", "latest_full_age", "latest_archive_age");
    my @CountKeys = ("full", "diff", "incr", "num_archives", "num_missing_archives");

    foreach my $msg_to_split (@longmsg) {
        my ($key, $value) = split(/=/, $msg_to_split);
        chop($value) if ( grep /^$key$/, @TimeKeys );
        $results .= "<result><channel>$key</channel><value>$value</value>";
        $results .= "<Unit>TimeSeconds</Unit>" if ( grep /^$key$/, @TimeKeys );
        $results .= "<Unit>Count</Unit>" if ( grep /^$key$/, @CountKeys );
        $results .= "</result>";
    }

    print "<prtg>" . $results . $text. "</prtg>";
    return $rc;
}

If that works for you, I can try to push my mods (and refresh the code from the main branch) directly to this PR branch, and then merge the PR.

Let me know what you think about this ;-)

Kind Regards, Stefan

hpza commented 2 years ago

Dear Stefan,

Thank you for looking at my code and for your proposal. Since I'm not a real Perl coder, I completely rely on your suggestions. So I will certainly give that a try and let you know if it works. Many thanks! Hans-Peter

pgstef commented 2 years ago

Hi Hans-Peter,

I've just refreshed your PR branch with the latest commits from the main branch and pushed some updates to the prtg output format.

You can try with that now and if that works (I believe it should) I'll merge this PR ;-) (and probably make a new release soon)

hpza commented 2 years ago

Hi Stefan,

I just tested and it works, except that the channels 'latest' and 'latest_full' don't. The value of a channel is always an integer or a float; it can't be text! So we can add this information to the global text element, as I did before. I pushed this change again. Maybe this could be done in a more elegant way? Thanks
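The numeric-only constraint can be checked without a PRTG server. Below is a minimal sketch, assuming a hypothetical helper named `non_numeric_values` (it is not part of check_pgbackrest) and assuming that "integer or float" means plain decimal notation. It only scans the generated string for `<value>` payloads and is not a PRTG validator.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of check_pgbackrest: return every <value>
# payload in a PRTG XML string that is not a plain integer or float.
sub non_numeric_values {
    my ($xml) = @_;

    # In list context, m//g returns all captured <value> payloads.
    my @values = $xml =~ m{<value>(.*?)</value>}g;

    # Assumption: "integer or float" means optional sign, digits,
    # and an optional decimal part.
    return grep { !/^-?\d+(?:\.\d+)?$/ } @values;
}

my $output = '<prtg>'
    . '<result><channel>full</channel><value>1</value><Unit>Count</Unit></result>'
    . '<result><channel>latest</channel><value>incr,20210826-144059F_20210826-144107I</value></result>'
    . '<text>BACKUPS_RETENTION OK - backups policy checks ok</text>'
    . '</prtg>';

my @bad = non_numeric_values($output);
print "non-numeric channel value: $_\n" for @bad;
```

With the sample string above, only the `latest` channel value is reported, which matches the channel that failed in PRTG.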

pgstef commented 2 years ago

Hm, right. Let's move the values that are not int/float into the text.

I still don't really want to keep a list of the "not count" or "not time" keys. We can simply have:

    # Define which @longmsg keys will use TimeSeconds or Count units.
    # Otherwise, it will be added to TEXT message.
    my @TimeKeys = ("latest_age", "latest_full_age", "latest_archive_age");
    my @CountKeys = ("full", "diff", "incr", "num_archives", "num_missing_archives");
    my @textmsg;

    foreach my $msg_to_split (@longmsg) {
        my ($key, $value) = split(/=/, $msg_to_split);

        if ( grep /^$key$/, @TimeKeys ) {
            chop($value);
            $results .= "<result><channel>$key</channel><value>$value</value><Unit>TimeSeconds</Unit></result>";

        } elsif ( grep /^$key$/, @CountKeys ) {
            $results .= "<result><channel>$key</channel><value>$value</value><Unit>Count</Unit></result>";

        } else {
            push @textmsg, $msg_to_split;
        }
    }

    $text .= " - ". join( ', ', @textmsg )  if @textmsg;
    $text .= "</text>";

So every key not in our lists will be sent to the text.
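For comparison, the same routing can be sketched with a single hash lookup instead of two `grep` passes over the key lists. This is only an illustration under assumptions: `%unit_for` and `classify_longmsg` are hypothetical names, the `split` limit of 2 and the `s/s$//` substitution (in place of `chop`) are swapped-in variations, and none of it is the code that was merged.

```perl
use strict;
use warnings;

# Unit per known key; any key absent from this map goes to the text message.
my %unit_for = (
    latest_age           => 'TimeSeconds',
    latest_full_age      => 'TimeSeconds',
    latest_archive_age   => 'TimeSeconds',
    full                 => 'Count',
    diff                 => 'Count',
    incr                 => 'Count',
    num_archives         => 'Count',
    num_missing_archives => 'Count',
);

# Hypothetical helper: returns the <result> XML string and a reference to
# the leftover "key=value" entries.
sub classify_longmsg {
    my @longmsg = @_;
    my $results = '';
    my @textmsg;

    foreach my $entry (@longmsg) {
        # Limit the split to 2 fields so a value containing '=' stays intact.
        my ( $key, $value ) = split /=/, $entry, 2;
        my $unit = $unit_for{$key};

        if ( !defined $unit ) {
            push @textmsg, $entry;
            next;
        }

        # Time values arrive as "<n>s"; remove only a trailing "s".
        $value =~ s/s$// if $unit eq 'TimeSeconds';
        $results .= "<result><channel>$key</channel><value>$value</value>"
            . "<Unit>$unit</Unit></result>";
    }
    return ( $results, \@textmsg );
}

my ( $results, $textmsg ) = classify_longmsg(
    'full=1',
    'latest=incr,20210831-121527F_20210831-121533I',
    'latest_age=609s',
);
print "$results\n";
print join( ', ', @$textmsg ), "\n";
```

The hash keeps each key and its unit in one place, so adding a new numeric key is a one-line change.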

That will produce:

# --service=archives
WAL_ARCHIVES OK - 5 WAL archived, latest archived since 1m25s | latest_archive_age=85s num_archives=5

# --service=archives --output=prtg 
<prtg>
<result><channel>status</channel><value>0</value><LimitMaxWarning>0</LimitMaxWarning><LimitMaxError>1</LimitMaxError><LimitMode>1</LimitMode></result>
<result><channel>latest_archive_age</channel><value>101</value><Unit>TimeSeconds</Unit></result>
<result><channel>num_archives</channel><value>5</value><Unit>Count</Unit></result>
<text>WAL_ARCHIVES OK - 5 WAL archived, latest archived since 1m41s</text>
</prtg>

# --service=archives --output=human
Service        : WAL_ARCHIVES
Returns        : 0 (OK)
Message        : 5 WAL archived
Message        : latest archived since 1m37s
Long message   : latest_archive_age=1m37s
Long message   : num_archives=5
Long message   : archives_dir=archive/c7pg/13-1
Long message   : min_wal=00000001000000000000000D
Long message   : max_wal=000000010000000000000011
Long message   : latest_archive=000000010000000000000011
Long message   : latest_bck_archive_start=00000001000000000000000F
Long message   : latest_bck_type=incr
Long message   : oldest_archive=00000001000000000000000D
Long message   : oldest_bck_archive_start=00000001000000000000000D
Long message   : oldest_bck_type=full

# --service=retention
BACKUPS_RETENTION OK - backups policy checks ok | full=1 diff=1 incr=1 latest=incr,20210831-121527F_20210831-121533I latest_age=609s latest_full=20210831-121527F latest_full_age=615s

# --service=retention --output=prtg 
<prtg>
<result><channel>status</channel><value>0</value><LimitMaxWarning>0</LimitMaxWarning><LimitMaxError>1</LimitMaxError><LimitMode>1</LimitMode></result>
<result><channel>full</channel><value>1</value><Unit>Count</Unit></result>
<result><channel>diff</channel><value>1</value><Unit>Count</Unit></result>
<result><channel>incr</channel><value>1</value><Unit>Count</Unit></result>
<result><channel>latest_age</channel><value>616</value><Unit>TimeSeconds</Unit></result>
<result><channel>latest_full_age</channel><value>622</value><Unit>TimeSeconds</Unit></result>
<text>BACKUPS_RETENTION OK - backups policy checks ok - latest=incr,20210831-121527F_20210831-121533I, latest_full=20210831-121527F</text></prtg>

# --service=retention --output=human 
Service        : BACKUPS_RETENTION
Returns        : 0 (OK)
Message        : backups policy checks ok
Long message   : full=1
Long message   : diff=1
Long message   : incr=1
Long message   : latest=incr,20210831-121527F_20210831-121533I
Long message   : latest_age=10m14s
Long message   : latest_full=20210831-121527F
Long message   : latest_full_age=10m20s

The keys are joined with $text .= " - ". join( ', ', @textmsg ) if @textmsg; so perhaps replacing " - " with " | ", as in the Nagios output, would make sense?
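A minimal sketch of that variant, assuming a hypothetical `build_text` helper and a status label array (neither is taken from the merged code). The main messages keep the " - " prefix, and the leftover key=value entries follow a " | " separator, space-joined as in the Nagios output shown above.

```perl
use strict;
use warnings;

# Status labels indexed by return code (0..3).
my @status = qw(OK WARNING CRITICAL UNKNOWN);

# Hypothetical helper: build the <text> element, separating the main
# messages from the leftover key=value entries with " | ".
sub build_text {
    my ( $service, $rc, $msg, $textmsg ) = @_;

    my $text = "$service $status[$rc]";
    $text .= ' - ' . join( ', ', @$msg )    if @$msg;
    $text .= ' | ' . join( ' ', @$textmsg ) if @$textmsg;

    return "<text>$text</text>";
}

my $text = build_text(
    'BACKUPS_RETENTION',
    0,
    ['backups policy checks ok'],
    [
        'latest=incr,20210831-121527F_20210831-121533I',
        'latest_full=20210831-121527F',
    ],
);
print "$text\n";
```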

hpza commented 2 years ago

Fantastic! I just tested it on PRTG.

Thanks!

pgstef commented 2 years ago

Thanks for your feedback and tests!

I've added the changelog entry and merged the PR.

hpza commented 2 years ago

great! thanks!