nextcloud / backup

Backup now. Restore later.
GNU Affero General Public License v3.0

Storage requirements & incremental backups #101

Open dodedodo opened 2 years ago

dodedodo commented 2 years ago

The problem

Making regular 'full-backups' every other week takes a heavy toll on uptime and resources. Each full backup needs 65% free disk space and enables maintenance mode for a significant amount of time. This is theoretically unnecessary because most of that data is already stored off-site in the previous full-backup. The whole process is for the most part redundant.

However, regular full backups are currently necessary because the backup app implements differential rather than incremental backups.

A differential backup always takes the last full-backup as its starting point, making every partial backup larger than the previous one. An incremental backup doesn't exhibit that problem because it takes the last full OR partial backup as its starting point.
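To make the distinction concrete, here is a minimal sketch of how the two strategies pick files. It is not how the backup app works internally; the `mtimes` dictionary, the integer timestamps, and all function names are hypothetical and exist only for illustration.

```python
# Hypothetical model: each file maps to the time it was last modified.
def changed_since(mtimes, reference_time):
    """Return the names of files modified after reference_time, sorted."""
    return sorted(name for name, mtime in mtimes.items() if mtime > reference_time)


def differential(mtimes, last_full_time, last_partial_time):
    # A differential backup ignores earlier partials:
    # the reference point is always the last full backup.
    return changed_since(mtimes, last_full_time)


def incremental(mtimes, last_full_time, last_partial_time):
    # An incremental backup uses the most recent backup of any kind.
    return changed_since(mtimes, max(last_full_time, last_partial_time))


# Full backup taken at t=10, one partial backup taken at t=20.
mtimes = {"a.txt": 5, "b.txt": 12, "c.txt": 25}

diff_files = differential(mtimes, last_full_time=10, last_partial_time=20)
incr_files = incremental(mtimes, last_full_time=10, last_partial_time=20)

print(diff_files)  # b.txt is included again although the partial at t=20 already has it
print(incr_files)  # only the file changed since the partial at t=20
```

In this toy example the differential run re-uploads `b.txt`, while the incremental run only picks up `c.txt`.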

The solution

By implementing incremental backups, the need for full-backups reduces significantly. The only remaining reason to create full-backups is to cut back on restore times. I feel like a lot of admins would accept longer disaster-recovery times in exchange for better uptime, choosing for example to make a full-backup once every few months.

Also, if a regular full backup isn't necessary, an admin could choose to incur the major time penalty of writing a full-backup directly to off-site storage. This would reduce the disk space requirements of the nextcloud instance by ~65% (!).


What are your thoughts on this? Would you consider implementing incremental backups?

EDIT: Readability.

ArtificialOwl commented 2 years ago

While this might be interesting, restoring incremental backups requires a lot of time. A solution might be to add incremental backups between differential backups, ending with:

May I ask what the current size of your last differential backup is?

dodedodo commented 2 years ago

What is your estimate on that extra time requirement? I'd say download + decompress + disk-write for each incremental I need to restore, am I missing something? With highly redundant datacenter storage, I don't expect to restore from backup often (if ever 🤞). I don't mind waiting a day for the restore to finish.
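The restore cost discussed here comes down to how many archives have to be fetched and unpacked. A rough sketch, assuming a simple chronological history of backups (the `history` list, its labels, and `restore_chain` are made up for illustration and are not part of the app):

```python
def restore_chain(history, strategy):
    """Return the backups needed to restore the latest state.

    history  -- chronological list of (kind, name), kind is "full" or "partial"
    strategy -- "differential" or "incremental"
    """
    # Index of the most recent full backup.
    last_full = max(i for i, (kind, _) in enumerate(history) if kind == "full")
    partials = history[last_full + 1:]

    chain = [history[last_full]]
    if strategy == "differential":
        # Only the newest differential is needed on top of the full backup.
        chain += partials[-1:]
    elif strategy == "incremental":
        # Every incremental since the full backup must be replayed in order.
        chain += partials
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [name for _, name in chain]


history = [
    ("full", "week0"),
    ("partial", "week1"),
    ("partial", "week2"),
    ("partial", "week3"),
]

diff_chain = restore_chain(history, "differential")
incr_chain = restore_chain(history, "incremental")
print(diff_chain)
print(incr_chain)
```

Under these assumptions a differential restore always touches two archives, while an incremental restore grows by one download + decompress + disk-write step for every partial since the last full backup.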

Your suggestion would improve things a bit, but not at the same rate full+incremental would. But if the settings are there I could 'abuse' them by setting full backups to something ridiculous like once a decade (run on demand), and use differential as full instead.

I'm not running backups through the app yet (had issues with external storage). But I'd very roughly estimate ~10GB/week file changes, on top of a ~300GB full-backup. This could grow in the future.
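As a back-of-the-envelope check on those numbers, the sketch below compares partial backup sizes over 12 weeks. It assumes the rough figures above (~300 GB full, ~10 GB/week of changes) and, as a simplification, that each week's changes touch different files, which is the worst case for differential backups. The constants and the function name are illustrative only.

```python
FULL_GB = 300          # rough estimate from this thread
WEEKLY_CHANGE_GB = 10  # rough estimate from this thread


def partial_sizes(weeks, strategy):
    """Size in GB of each weekly partial backup after one full backup."""
    if strategy == "differential":
        # Each differential contains everything changed since the full backup.
        return [WEEKLY_CHANGE_GB * week for week in range(1, weeks + 1)]
    if strategy == "incremental":
        # Each incremental contains only that week's changes.
        return [WEEKLY_CHANGE_GB] * weeks
    raise ValueError(f"unknown strategy: {strategy}")


weeks = 12
diff_sizes = partial_sizes(weeks, "differential")
incr_sizes = partial_sizes(weeks, "incremental")

print("last differential:", diff_sizes[-1], "GB")
print("total differential upload:", sum(diff_sizes), "GB")
print("total incremental upload:", sum(incr_sizes), "GB")
```

With these assumptions the twelfth differential is already 120 GB, and the differentials add up to 780 GB of uploads against 120 GB for incrementals. Real numbers would be lower for differentials whenever the same files change repeatedly.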

ArtificialOwl commented 2 years ago

> Your suggestion would improve things a bit, but not at the same rate full+incremental would. But if the settings are there I could 'abuse' them by setting full backups to something ridiculous like once a decade (run on demand), and use differential as full instead.

So we're good :) The customization will help fit instances with lots of data and few edits as well as instances with little data and many edits.

Now, the last factor is the time to spend on its implementation; but I have added a tag to this request. Just be patient :]

neufeind commented 2 years ago

Maybe we can make use of the efficient backup handling and deduplication of tools like borg? Part of the needed functionality could imho be provided by a driver-based approach that hands over part of the handling from this plugin to, for example, borg.

288

dodedodo commented 2 years ago

Hey @neufeind,

While I agree that borg is a nice and efficient tool for backups, I personally don't see how this fits within nextcloud-backup's scope.

As far as I can tell, the backup app is meant as a simple and approachable way to set up relatively simple backups. Borg is very powerful, but far from simple. Nextcloud backup would need to add a whole lot of toggles and buttons to mirror borg's features. Or maybe a user would set up their borg repository outside of nextcloud and provide a command string to nc-backup. In which case, the user is probably familiar with sysadmin stuff, and should just add a regular cron-job while they're at it.

Another issue is the assumption that borg can be installed by the user.

Your idea is different from mine because incremental backups would provide a lot of technical advantages under the hood, without exposing many more options in the UI. Zero even, if the decision is made to swap 'differential' for 'incremental' altogether.

There's absolutely nothing wrong with using borg/restic/duplicati/whatever. If you've got the expertise, go for it!