restic / restic

Fast, secure, efficient backup program
https://restic.net
BSD 2-Clause "Simplified" License

Backup to multiple repositories in one backup run of restic #4432

Open smessmer opened 1 year ago

smessmer commented 1 year ago

Output of restic version

restic 0.16.0 compiled with go1.20.6 on linux/amd64

What should restic do differently? Which functionality do you think we should add?

Backup to multiple repositories in one backup run of restic

What are you trying to do? What problem would this solve?

I'm building a tool on top of restic to make periodic cloud-to-cloud backups easy to set up. For example, it could be deployed as a Docker image and configured to run once a day, backing up all the data you store in Dropbox or pCloud into a restic repository that itself is hosted on S3. Or it could periodically back up your S3 data to a restic repository in Backblaze, or back up your emails (IMAP) or git repositories into a restic repository stored in pCloud. During the backup run, many of those data sources are only mounted (e.g. with rclone) and not actually present on the machine running the backup.

To make the backups more resilient against one cloud provider failing, I would like to add functionality to back up to multiple different cloud providers, e.g. S3, backblaze, maybe also pcloud. As I see it, there are currently multiple ways of doing this but all have their downsides:

  1. I could just run restic multiple times, once for each backup destination. This is usually the method of choice if your data source is local. But since in my case the data source itself is just an rclone mount, running restic multiple times would mean downloading all the data multiple times. This is slow and, depending on the cloud provider, expensive.
  2. I could run a restic backup to one cloud provider and then use restic copy to copy any new snapshots to the other cloud providers. This probably incurs lower costs from the cloud providers, but it still requires downloading all the snapshots you just created so they can be uploaded to a different cloud. Also, you have to trust that the first cloud provider created the snapshots correctly and doesn't hand back corrupted data when you copy it to a different provider.
  3. I could run a restic backup to a repository on the local file system and then copy snapshots to several remote repositories. This has the lowest network cost, but keeping the repository on the local disk might exceed the available disk space of the Docker container.
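For reference, option 2 can be sketched as a two-step run. This is only an illustration; the repository URLs, password files, and mount path below are placeholders, not taken from the issue:

```shell
#!/bin/sh
# Option 2 sketch: back up to a primary repository, then copy new
# snapshots to a secondary one. All repository URLs and paths are
# hypothetical examples.

# 1. Back up the (possibly rclone-mounted) source to the primary repo.
restic -r s3:s3.amazonaws.com/primary-bucket \
    --password-file primary-password.txt \
    backup /mnt/source

# 2. Copy snapshots that are missing from the secondary repo. This
# downloads the data from the primary repo and re-uploads it, which is
# the extra network cost mentioned above. The secondary repo should
# have been initialized with `restic init --copy-chunker-params` so
# deduplication works across both repositories.
restic -r b2:secondary-bucket \
    --password-file secondary-password.txt \
    --from-repo s3:s3.amazonaws.com/primary-bucket \
    --from-password-file primary-password.txt \
    copy latest
```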

The ideal solution I'm imagining is to run backup once but have it back up to multiple repositories at the same time. It would only download the data from the data source once, chunk it once, but then upload it to multiple repositories.

Did restic help you today? Did it make you happy in any way?

Restic is an amazing tool. I was using duplicity before this and restic is much better. The advanced deduplication (duplicity only deduplicates based on timestamps, not on file contents) means my backups only take about 10% of the space they did before.

konidev20 commented 1 year ago

Hey @smessmer,

I had a similar requirement in the past. My solution was to initialize a single repository and back up to it; this would be your primary repository. After the backup completes, you can run rclone sync between the primary repository and the secondary repository location. rclone takes care of syncing only the files that differ between the two repositories, and rclone sync works between cloud providers as well.
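The workflow described above can be sketched as follows. The remote names and bucket paths are placeholder examples (remotes would be set up beforehand with `rclone config`):

```shell
#!/bin/sh
# Sketch of the backup-then-mirror approach. Repository URL, remote
# names, and paths are hypothetical examples.

# 1. Back up to the primary repository as usual.
restic -r s3:s3.amazonaws.com/primary-bucket \
    --password-file repo-password.txt \
    backup /mnt/source

# 2. Mirror the repository files to the secondary location. rclone
# transfers only files that are new or changed, so unchanged packs
# are not re-uploaded.
rclone sync s3-remote:primary-bucket b2-remote:secondary-bucket
```

Because this is a file-level copy, the secondary location is an exact clone of the primary repository, opened with the same repository password.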

This is close to option 3 that you mentioned; the only difference is that you don't have to create a local repository.

This solution works because restic uses the same repository structure for all storage backends. For example, you can back up to an S3 storage location, move all the repository files to an Azure Blob Storage location, and then use the Azure storage backend to access the moved repository.

smessmer commented 1 year ago

Yeah, I believe that is basically the same as option 2, just using a different tool to copy the snapshots over. Both rclone sync and restic copy should only transfer the differences. But it comes with the downsides I mentioned above; native support for multiple repositories in restic would be superior.

MichaelEischer commented 3 months ago

Related to https://github.com/restic/restic/issues/265. Native support in restic would be rather complex to implement, as the data in each repository can vary wildly. Restic would also have to load the index of each destination repository separately, which would significantly increase memory usage.