pgstef / check_pgbackrest

pgBackRest backup check plugin for Nagios
PostgreSQL License
35 stars 14 forks source link

service=retention should check presence of backups #17

Closed mbanck closed 3 years ago

mbanck commented 3 years ago

I have the feeling (though I have not checked the code), that check_pgbackrest -s retention only runs pgbackrest info and verifies that output for sanity.

However, if I e.g. move away the directory of a full backup (simulating a mistaken delete), check_pgbackrest keeps considering the backups to be OK:

test(postgres,t4,postgres):/srv/container/postgres-test4> check_pgbackrest -c $PGETC/pgbackrest.conf -s retention -S test4 ; echo
BACKUPS_RETENTION OK - backups policy checks ok | full=2 diff=0 incr=0 latest=full,20201215-135410F latest_age=2614s
test(postgres,t4,postgres):/srv/container/postgres-test4> mv backup/pgbackrest/backup/test4/20201215-133738F/ backup/
test(postgres,t4,postgres):/srv/container/postgres-test4> check_pgbackrest -c $PGETC/pgbackrest.conf -s retention -S test4 ; echo
BACKUPS_RETENTION OK - backups policy checks ok | full=2 diff=0 incr=0 latest=full,20201215-135410F latest_age=3095s
test(postgres,t4,postgres):/srv/container/postgres-test4> /srv/container/postgres-test4/bin/pgbackrest --config /srv/container/postgres-test4/etc/pgbackrest.conf --stanza test4 info
stanza: test4
    status: ok
    cipher: none

    db (current)
        wal archive min/max (10-1): 0000000200000A32000000F9/0000000200000A32000000FD

        full backup: 20201215-133738F
            timestamp start/stop: 2020-12-15 13:37:38 / 2020-12-15 13:43:19
            wal start/stop: 0000000200000A32000000F9 / 0000000200000A32000000F9
            database size: 250.3GB, backup size: 250.3GB
            repository size: 6.6GB, repository backup size: 6.6GB

        full backup: 20201215-135410F
            timestamp start/stop: 2020-12-15 13:54:10 / 2020-12-15 13:59:50
            wal start/stop: 0000000200000A32000000FB / 0000000200000A32000000FB
            database size: 250.3GB, backup size: 250.3GB
            repository size: 6.6GB, repository backup size: 6.6GB

I think it would be prudent for check_pgbackrest to at least check that those backup directories exist and/or are actually a backup

pgstef commented 3 years ago

Indeed, the retention service trust the pgBackRest info command and only check that output without checking anything on disk. The purpose of this service is to validate the retention policy, not verifying the backups themselves.

The underlying problem here is that the info command trust the backup.info manifest and doesn't verify the repository content (yet). There's a WIP in the pgBackRest side to implement a verify command. I'm then not sure if that's something to add here or to wait the verify command from pgBackRest itself.

Now that we'll only rely on the pgBackRest repo* commands to reach the repository content in the next release, it could probably be possible to make sure that the backup.manifest file is there for each backup but then what would be the output ? Not really a retention issue but more a "repo consistency" issue.

mbanck commented 3 years ago

I think implementing a full-blown verify (i.e. checking against a manifest that all files for a backup are available, possibly even verifying the checksum) is out-of-scope (and would probably not be feasible time/performance-wise).

But I could image that somebody doing rm -rf on a backup directory to reclaim space is not out of the question, so having that flagged would be sensible.

In my opinion, if the retention policy is e.g. 2 full backups, and one of them is missing (even though pgBackRest doesn't know it yet), then the retention policy is violated and check_pgbackrest should return CRITICAL.

By the way, it would be nice if check_pgbackrest could figure out the retention policy from pgbackrest.conf (if provided), would that be possible?

pgstef commented 3 years ago

I've just pushed the directory check-up in the v2 development branch. Does it look like what you expected ? You can probably try this dev version before release, it would be much appreciated.

Regarding the retention policy settings, I prefer not to parse the configuration files since those settings could be anywhere (given the new multi-repo config possibilities, remote backup hosts,...). And that's also better to define it manually here so it will detect configuration errors/changes in the pgBackRest side.

pgstef commented 3 years ago

v2.0 have been released and should solve this issue. Don't hesitate to reopen it the request isn't answered. Kind regards