Closed by mbanck 3 years ago
Indeed, the retention service trusts the pgBackRest info command and only checks that output, without checking anything on disk. The purpose of this service is to validate the retention policy, not to verify the backups themselves.
The underlying problem here is that the info command trusts the backup.info manifest and doesn't verify the repository content (yet). There is work in progress on the pgBackRest side to implement a verify command, so I'm not sure whether this is something to add here or whether to wait for the verify command from pgBackRest itself.
Now that we'll only rely on the pgBackRest repo* commands to reach the repository content in the next release, it would probably be possible to make sure that the backup.manifest file is there for each backup. But then what would the output be? This is not really a retention issue but more of a "repo consistency" issue.
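To make the idea concrete, here is a minimal sketch of such a manifest presence check. It is not the check_pgbackrest implementation. It assumes a locally mounted POSIX repository laid out as `<repo>/backup/<stanza>/<label>/backup.manifest`, and it takes the text produced by `pgbackrest info --output=json` as input. The function name `missing_manifests` is made up for illustration.

```python
from __future__ import annotations

import json
from pathlib import Path


def missing_manifests(info_json: str, repo_path: Path) -> dict[str, list[str]]:
    """Return, per stanza, the backup labels that have no backup.manifest on disk.

    info_json is the JSON text printed by `pgbackrest info --output=json`.
    repo_path is assumed to be a local, directly readable repository path.
    """
    missing: dict[str, list[str]] = {}
    for stanza in json.loads(info_json):
        stanza_dir = repo_path / "backup" / stanza["name"]
        for backup in stanza.get("backup", []):
            manifest = stanza_dir / backup["label"] / "backup.manifest"
            # A removed or moved backup directory fails this test as well,
            # because the manifest path no longer resolves to a file.
            if not manifest.is_file():
                missing.setdefault(stanza["name"], []).append(backup["label"])
    return missing
```

An empty result means every backup listed by the info command still has its manifest. A repository that is only reachable through the pgBackRest repo* commands would need the same test expressed through those commands instead of direct filesystem access.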
I think implementing a full-blown verify (i.e. checking against a manifest that all files for a backup are available, possibly even verifying the checksums) is out of scope (and would probably not be feasible time/performance-wise).
But I could imagine that somebody doing rm -rf on a backup directory to reclaim space is not out of the question, so having that flagged would be sensible.
In my opinion, if the retention policy is e.g. 2 full backups, and one of them is missing (even though pgBackRest doesn't know it yet), then the retention policy is violated and check_pgbackrest should return CRITICAL.
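As a rough illustration of that reasoning (a sketch, not how check_pgbackrest is written): count only the full backups whose directory still exists and compare that number with the configured retention. The function name `retention_status`, its parameters, and the message text are hypothetical. The exit codes follow the usual Nagios plugin convention (0 = OK, 2 = CRITICAL), and the backup entries are assumed to carry the `label` and `type` keys found in `pgbackrest info --output=json`.

```python
from __future__ import annotations

from pathlib import Path

# Usual Nagios plugin exit codes.
OK = 0
CRITICAL = 2


def retention_status(
    backups: list[dict], stanza_dir: Path, retention_full: int
) -> tuple[int, str]:
    """Compare the full backups actually present on disk with the expected count.

    backups: backup entries for one stanza, each with 'label' and 'type'.
    stanza_dir: assumed local path of the form <repo>/backup/<stanza>.
    retention_full: number of full backups the policy requires (set manually).
    """
    listed = [b["label"] for b in backups if b["type"] == "full"]
    present = [label for label in listed if (stanza_dir / label).is_dir()]
    gone = sorted(set(listed) - set(present))

    if len(present) < retention_full:
        detail = f", missing on disk: {', '.join(gone)}" if gone else ""
        return CRITICAL, (
            f"CRITICAL - {len(present)} full backup(s) found, "
            f"{retention_full} expected{detail}"
        )
    return OK, f"OK - {len(present)} full backup(s) found"
```

With a retention of 2 and one of the two listed full backup directories removed, this returns the CRITICAL code even though the info output alone still looks healthy.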
By the way, it would be nice if check_pgbackrest could figure out the retention policy from pgbackrest.conf (if provided). Would that be possible?
I've just pushed the directory check to the v2 development branch. Does it look like what you expected? If you can try this dev version before the release, it would be much appreciated.
Regarding the retention policy settings, I prefer not to parse the configuration files, since those settings could be anywhere (given the new multi-repo config possibilities, remote backup hosts, ...). It's also better to define the policy manually here, so that the check detects configuration errors or changes on the pgBackRest side.
I have the feeling (though I have not checked the code) that check_pgbackrest -s retention only runs pgbackrest info and verifies that output for sanity. However, if I e.g. move away the directory of a full backup (simulating a mistaken delete), check_pgbackrest keeps considering the backups to be OK.
I think it would be prudent for check_pgbackrest to at least check that those backup directories exist and/or are actually a backup.