Also, the script rotates backups without checking whether the latest backup succeeded.
This opens the door to two very bad scenarios:
1. Something goes wrong and every backup from that point on is broken. For example, mariabackup starts failing, but rotation keeps running, so after N days the user has N corrupted backups and zero good ones.
2. The server goes offline for longer than the --delete-days value used in the cron job. Once it comes back online, the script deletes every backup except the one it has just taken.
The two scenarios can combine: the server goes offline for N+M days, then comes back online, but now Docker stops working because /var/lib/docker failed to mount. The backup volume, however, mounted correctly, so the cron job creates a bad backup and deletes all the good ones.
I suggest:
1. Add a check that the backup succeeded, and run rotation only if it did.
2. Replace the --delete-days parameter with a safer approach such as --number-backups-to-keep.
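To make the two suggestions concrete, here is a rough sketch of what I have in mind. It is not taken from backup_and_restore.sh: the function names rotate_keep_last and backup_then_rotate are hypothetical, and it assumes every backup is a separate entry in one directory with a name that sorts chronologically (for example a date prefix).

```shell
# Hypothetical helpers, not part of backup_and_restore.sh.

# Keep only the newest "$2" entries in directory "$1".
# Assumes entry names sort chronologically (e.g. 2024-05-01_0300).
rotate_keep_last() {
    local dir="$1"
    local keep="$2"
    local total excess
    total=$(ls -1 "$dir" | wc -l)
    excess=$((total - keep))
    # Nothing to delete when there are "keep" entries or fewer.
    [ "$excess" -gt 0 ] || return 0
    # The oldest entries sort first, so remove the first "excess" names.
    ls -1 "$dir" | sort | head -n "$excess" | while IFS= read -r name; do
        rm -rf -- "${dir:?}/$name"
    done
}

# Run the backup command given after the first two arguments and
# rotate only if that command exits with status 0.
backup_then_rotate() {
    local dir="$1"
    local keep="$2"
    local status=0
    shift 2
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "backup failed (exit $status), skipping rotation" >&2
        return "$status"
    fi
    rotate_keep_last "$dir" "$keep"
}
```

Because rotation counts backups instead of looking at their age, a server that was offline for a long time still keeps its last good backups, and a failed backup run never triggers any deletion.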
Motivation
Users will be less likely to find themselves without a correct backup.
Also, if the backup script returns an exit code indicating success or failure, it can be integrated with monitoring.
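As an illustration of the monitoring idea, a cron job could call the backup through a small wrapper like the one below. This is only a sketch: run_and_report and the status file are hypothetical, and it assumes the backup command exits non-zero on failure, which is exactly what this issue asks for.

```shell
# Hypothetical wrapper, not part of backup_and_restore.sh.
# Runs the given command, records its exit code with a UTC timestamp
# in a status file, and returns the same exit code to the caller.
run_and_report() {
    local status_file="$1"
    local status=0
    shift
    "$@" || status=$?
    printf '%s exit=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$status" > "$status_file"
    return "$status"
}
```

A monitoring agent can then alert either on the non-zero exit code of the cron job or on a status file that is stale or does not end in exit=0.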
Additional context
When I was just starting out in IT, I learned the hard way that backups should be rotated only after verifying that the newly taken backup is good. Let's spare everybody else that experience :)
A discussion on adopting existing backup software such as borgbackup or restic might be appropriate here. This is basic functionality that has been implemented at least a dozen times. Why reinvent the wheel?
Summary
backup_and_restore.sh has a --delete-days parameter that works in a very straightforward way: it deletes backups older than the given number of days.