restic / restic

Fast, secure, efficient backup program
https://restic.net
BSD 2-Clause "Simplified" License

Full Backup Flag to eliminate this SPOF #3561

Open layer7gmbh opened 2 years ago

layer7gmbh commented 2 years ago

Output of restic version

restic 0.12.0 compiled with go1.15.5 on linux/amd64

What should restic do differently? Which functionality do you think we should add?

Hi,

Right now, if I understand it correctly, restic will first pull a whole copy of the files to be backed up.

For the next backup run, thanks to deduplication, it will only copy changed data and reference unchanged files against the full copy that was taken during the very first backup.
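The deduplication described above can be pictured with a toy content-addressed store (a sketch only; restic's real format chunks files into blobs and groups them into pack files, but the principle of storing data under its content hash is the same):

```shell
# Toy content-addressed store: a blob is saved under the SHA-256 of its
# content, so identical content is only ever stored once.
mkdir -p store

store_blob() {
  id=$(sha256sum "$1" | cut -d' ' -f1)
  if [ -e "store/$id" ]; then
    echo "dedup: $1 -> $id (already stored, only referenced)"
  else
    cp "$1" "store/$id"
    echo "new:   $1 -> $id"
  fi
}

echo "same content" > a.txt
echo "same content" > b.txt
store_blob a.txt
store_blob b.txt

# Two files, one stored blob -- the second file is just a reference.
ls store | wc -l
```

This also illustrates the concern raised in this issue: every later "backup" of `b.txt` points at the single stored blob, so corruption of that one object affects all snapshots referencing it.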

I would like to have a flag that forces restic to take a full copy again and makes this new full backup the reference for future deduplication.

The existing --force / --ignore-ctime / --ignore-inode are not helpful to achieve this.

What are you trying to do? What problem would this solve?

The way restic works now actually creates a single point of failure (SPOF).

If, for whatever reason (physical or logical damage), this first full copy becomes corrupted, all subsequently taken backups, i.e. snapshots, might be worthless.

In real life this situation actually happens more often than anything else: because of a physical drive failure, some parts/sectors of a drive (no matter whether it is a single drive or an array of drives) might become unreadable. If you have a SPOF in your data, you had better hope that this SPOF is not located in the failing area.

Or think of a logical mistake: some script or human had a bad day and accidentally destroyed parts of the data. Having a SPOF by design is just a bad idea here, as backups should ideally not have to be doubted (too much).

In most backup strategies you actually do a full copy, then various incrementals, and then again a full copy. So during a month you might hold 1-2 full copies or even more, which eliminates the logical SPOF.

Of course you could create a new restic repository for every new full copy. But this would add organizational complexity: each repository has to be initialized again with a new password, which makes any kind of automation hard if multiple repositories/passwords/etc. have to be managed for the same server/backup.

Maybe it's possible to make the code consider only the latest existing full copy for deduplication -- without losing too much speed or adding too much complexity to the code.

Did restic help you today? Did it make you happy in any way?

We are actually coming from BackupPC and are currently migrating to restic.

Restic is faster, scriptable for our own automation and, thanks to the REST server, more in the "now" than in the past, enabling quite a few other ways to access/work with the backed-up data. The tagging feature is also a smart idea.

Restic clients are available for the major operating systems as precompiled, ready-to-run binaries, or are even part of the package repository of the (Linux) distribution in use.

The whole picture of restic is so far the best I could find to get the job done. We have started developing a simple web-based UI that we will publish for free on Git as soon as it is ready, to give something back to the community.

Thank you for your time and work! Greetings Oliver

MichaelEischer commented 2 years ago

This is partially related to #804 and #256. Telling restic to store blobs twice would, however, have to work a bit differently from telling a specific backup run to store blobs a second time. Such a feature must be integrated with the prune command, which removes no-longer-used blobs. Thus we'd probably end up with something that tells restic to keep two copies of each blob that is used more than once. That way the redundancy would be added automatically (one just has to specify the amount of redundancy), and pruning a backup wouldn't remove all duplicates later on.

Currently the best way to avoid such SPOFs is to use two repositories and either store the backups there alternately, or use the copy command to sync data between the repositories (the copy command verifies data before copying).
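The two-repository approach can be sketched roughly as follows (paths are placeholders; flag names assume a recent restic where `copy` takes `--from-repo` and `--from-password-file` -- older versions used `--repo2`-style flags instead):

```shell
# Back up into the primary repository as usual.
restic -r /backup/repo-a --password-file /etc/restic/pass-a backup /data

# Periodically mirror snapshots into a second, independent repository.
# copy reads and verifies the source data before writing it, so silent
# corruption in repo-a is detected rather than propagated.
restic -r /backup/repo-b --password-file /etc/restic/pass-b \
    copy --from-repo /backup/repo-a --from-password-file /etc/restic/pass-a
```

Because the two repositories use different chunking parameters and encryption keys, a damaged blob in one repository has no counterpart in the other, which removes the single point of failure described in this issue.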

A proper solution to avoid SPOFs would also require taking care of #3404 and probably a few other well hidden issues.