restic / restic

Fast, secure, efficient backup program
https://restic.net
BSD 2-Clause "Simplified" License

Backup option to remove a leading path prefix #2092

Open cdhowie opened 5 years ago

cdhowie commented 5 years ago

Output of restic version

restic 0.9.3 compiled with go1.11.1 on linux/amd64

What should restic do differently? Which functionality do you think we should add?

It would be helpful to have an option to restic backup that says "strip this leading path from the paths of all backed-up files." For example, --backup-root /some/path. This would have the following effects:

(I think this may be related to #1376.)

What are you trying to do?

One of our backup scripts is run on a system with many running services that we cannot stop. These services guarantee that recovery is possible from a specific point in time (e.g. they do enough journaling to get their data back in a consistent state following a power cut). However, restic backups are not atomic; therefore, restic backups break the recovery guarantee from the service.

To fix this, we:

  1. Take an LVM snapshot of /. The snapshot is an atomic block-level copy of the entire volume.
  2. Mount the LVM snapshot under /mnt/backup-snapshot.
  3. Run the restic backup against /mnt/backup-snapshot.
  4. Unmount the LVM snapshot.
  5. Delete the LVM snapshot.

This makes the backup truly point-in-time and guarantees that the restored backup is effectively in a consistent state.
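The five steps above could be scripted roughly as follows. This is only a sketch: the volume group vg0, the logical volume root, the snapshot name backup-snap and the snapshot size 5G are placeholders that do not come from this issue, and error handling is kept minimal.

```shell
#!/bin/sh
# Hypothetical names; adjust them to the local volume layout.
VG=vg0
LV=root
SNAP=backup-snap
SNAP_SIZE=5G
MNT=/mnt/backup-snapshot

snapshot_backup() {
    # 1. Take an atomic block-level snapshot of the volume.
    lvcreate --snapshot --size "$SNAP_SIZE" --name "$SNAP" "/dev/$VG/$LV" || return 1
    # 2. Mount the snapshot read-only.
    if mount -o ro "/dev/$VG/$SNAP" "$MNT"; then
        # 3. Back up the frozen copy and remember restic's exit status.
        restic backup "$MNT"
        rc=$?
        # 4. Unmount the snapshot.
        umount "$MNT"
    else
        rc=1
    fi
    # 5. Delete the snapshot even if an earlier step failed.
    lvremove --force "/dev/$VG/$SNAP"
    return "$rc"
}
```

Calling snapshot_backup as root runs the whole sequence; every file still ends up under /mnt/backup-snapshot in the repository, which is exactly the prefix this issue asks to strip.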

Unfortunately, this also causes files to be stored in our restic repository with the (useless) prefix /mnt/backup-snapshot. This can complicate restore efforts, and it's also a bit confusing if you don't know the details of how the backup was created.

The only feasible workaround I can think of is to run the backup within a chroot. While not the end of the world, it might be nicer for restic to provide an option to remove some leading prefix from files.

themightychris commented 5 years ago

Here's an older incarnation of this request I found: #555

aliron19 commented 5 years ago

+1

I also think this would be a really useful feature.

fd0 commented 5 years ago

So, let me summarize: you're running restic backup /mnt/backup-snapshot, so the file /mnt/backup-snapshot/foo is /mnt/backup-snapshot/foo in the snapshot, but you'd like it to be /foo. Is that correct?

You can achieve that with restic > 0.9.0 by changing the current directory: just run cd /mnt/backup-snapshot and then restic backup . (a single dot, meaning the current directory).

Does that work for you?
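As a sketch, the cd approach can be wrapped in a subshell so that the caller's working directory is left untouched. The function name backup_from_root is made up for illustration and is not part of restic.

```shell
# Run "restic backup ." from inside the given directory.
# The ( ... ) body is a subshell, so the cd does not leak to the caller.
backup_from_root() (
    cd "$1" || exit 1
    restic backup .
)
```

For the case described above this would be called as backup_from_root /mnt/backup-snapshot.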

sstallion commented 5 years ago

Changing the cwd works, but I've noticed that there is an unpleasant side-effect if using files for include/exclude. It seems that if absolute paths are placed in there, then they will be skipped when changing the cwd. I'd much rather use absolute paths as well - for now I'll probably head down the chroot path, but I agree it would be nicer to have something similar to the -C flag in tar.

arosl commented 5 years ago

I think this fake root option would be a useful feature. I would love to do the same as cdhowie but with APFS snapshots on macOS. To access the read-only APFS snapshots, they need to be mounted somewhere. But when restoring, I would like the "original" path to be the canonical path stored in the snapshot.

The cd trick is unfortunately not optimal, as I have lots (125) of absolute paths collected from StdExclusions.plist (the macOS standard backup exclusion list) plus all files and folders that mdfind can find with the com_apple_backup_excludeItem attribute set.
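One possible stopgap for a long list of absolute excludes is to rewrite the list so that every absolute pattern carries the mount point in front, and pass the result to --exclude-file. The sketch below assumes that excludes are matched against the real on-disk path (which is what the reports in this thread suggest, but is not confirmed here); prefix_excludes is a hypothetical helper name.

```shell
# Usage: prefix_excludes MOUNT_POINT < excludes.txt > excludes.prefixed
# Lines starting with "/" get MOUNT_POINT prepended; everything else
# (relative patterns, comments, blank lines, negated patterns) passes through.
prefix_excludes() {
    prefix=${1%/}    # drop a trailing slash from the mount point
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            /*) printf '%s%s\n' "$prefix" "$line" ;;
            *)  printf '%s\n' "$line" ;;
        esac
    done
}
```

With a mount point of /mnt/backup-snapshot, an entry such as /var/cache becomes /mnt/backup-snapshot/var/cache, while a pattern like *.tmp is left alone.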

blurayne commented 5 years ago

That just leaves the problem that if you put /mnt into the ignore file and start the backup from /mnt/fs-snapshot, it will exclude itself.

Plus cd $path && restic backup . still gives $path in the snapshot overview, while paths in the snapshot are /-based.

I found a workaround with proot.

whi-tw commented 4 years ago

I also wanted to find a way to remove the path prefix. My use case is slightly different - I'm creating a zfs snapshot (fs@$(date +%s)) and wanted to back this up without having to mount it (/path/to/mount/.zfs/snapshots/${TS}) - this way hopefully I don't have to worry about the snapshot not unmounting and then hanging around forever in the case of something crashing.

The restic forget output for this makes me think that snapshots with different paths won't be forgotten as-per the schedule (daily / weekly / etc.).

The proot comment from @blurayne was a nice starting point, and I think I've come to the same conclusion:

snap_path="/path/to/where/snapshot/is/accessible"
orig_fs="/path/to/filesystem"
proot -b "${snap_path}":"${orig_fs}" restic backup "${orig_fs}"

This works nicely, and now all the snapshots have the same path, with no cd or pushd required. Also, proot is available in user-space, so if backups aren't done as root, it's still possible.

cr1st1p commented 4 years ago

My use case: dumping database data into a temporary directory, like /tmp/tmpzmn28r02 (obtained via mktemp or python's mkdtemp()) and then backing it up. This method will mark all files between 2 snapshots as being different. So I need a way to tell restic to totally ignore the temporary directory prefix. Another possible use case: today I have all my pictures in '/mnt/something/pictures', but tomorrow the same content will be under '/mnt/external/pictures-from-home' (different partitioning scheme/whatever).

Also, if you want to use restic to back up multiple directories in the same run, in order to use the same snapshot, this gets even more complicated.

Until a fix is done, I'm going to use the 'proot' proposal - thank you @blurayne and @whi-tw

mmospanenko commented 4 years ago

Hi! I have a similar case. For example, I have these paths:

/srv/my/long/server1/path/data (with many subfolders and dozens of files)
/path/to/dump.sql
/path/certbot.tar.gz

and I want the backup to look like

/data
/dump.sql
/certbot.tar.gz

and have the ability to restore on another server (where I don't know the previous folder structure) to a different (relative) path.

I have no idea how to solve this trivial task. Restic is an amazing tool, but... why is it so difficult for end users?

For now I copy everything I need into a predefined backup folder (/backup) and run the restic backup there (via cd). But this solution only works for small amounts of data.

It would be great to be able to restore with --include subfolder right after template (or including this mask), e.g. restic restore --include data --target /my/new/path, and get /my/new/path/data as the result.


Thank you @whi-tw for solution with proot -b /path/i/wanted:./path_in_repo restic backup . - it works for me.

rolfw commented 4 years ago

My use case is migrating snapshots from other backup solutions to restic (Time Machine and disk images in my case).

I migrate them from where I mount the image or the subdirectory of the snapshot created by TM, which can get very long, e.g. /Volumes/TimeMachine-Backups/Backups.backupdb/MacBook Pro/2019-05-22-185113/Macintosh SSD/.

The cd solution works when using restic mount and restic restore, but the absolute path of the original snapshot is listed when I run restic snapshots.

Since it's a migrated snapshot, I'd like that to be the path from where the original snapshot was taken too. Apart from that, the long paths also make the output of restic snapshots a bit noisy.

A flag to set an alternative prefix would be ideal for me too.

rolfw commented 4 years ago

This works nicely, and now all the snapshots have the same path, with no cd or pushd required. Also, proot is available in user-space, so if backups aren't done as root, it's still possible.

This would have been a nice workaround, but proot isn't available on macOS and doesn't seem to be coming anytime soon (most of the code is written against Linux specifically); see "Does PRoot work on MacOSX?"

Is there another workaround that comes to mind?

cdhowie commented 4 years ago

My use case: dumping database data into a temporary directory, like /tmp/tmpzmn28r02 (obtained via mktemp or python's mkdtemp()) and then backing it up. This method will mark all files between 2 snapshots as being different.

Note that the files probably are different anyway; database backups usually include a timestamp in the first few lines.

themightychris commented 4 years ago

You can tune database dump commands to exclude dynamic comments and to sort rows by primary key, though, which makes slow-changing data really dedupable.

cr1st1p commented 4 years ago

An update: 'proot' worked only on one machine for me; on another it segfaults. A newer alternative to it is bubblewrap. I added a wrapper over it (attached) which should work with the same '-b' parameters. It seems to work so far. Note that depending on your needs and directory locations, you might have to change the wrapper a bit. I hope it helps you guys, but I'm looking forward to support inside restic itself.

proot.sh.txt

pohly commented 4 years ago

I tried proot. It seems to break the ability to run restic as non-root with additional capabilities (https://restic.readthedocs.io/en/stable/080_examples.html#full-backup-without-root); at least I got scan: Open: open /.pulse: permission denied errors that I didn't get when running restic without proot.

Same problem with bwrap.

So to me, stripping a path prefix in restic itself still seems useful.

andynd commented 4 years ago

This missing feature makes it harder than it needs to be to back up VMs. My VM snapshots end up in a temporary folder and are then backed up by restic. This results in the following:

ID        Time                 Host         Tags        Paths
--------------------------------------------------------------------------------------
02c536db  2020-04-10 14:28:27  resolver-02              /tmp/tmp.vOFFxxly9O/config.xml
c5709aed  2020-04-10 14:28:29  resolver-02              /tmp/tmp.vOFFxxly9O/sdb.img
a88cc1e7  2020-04-10 14:36:22  resolver-02              /tmp/tmp.FoY1j5JPIZ/config.xml
7c44e6ee  2020-04-10 14:36:24  resolver-02              /tmp/tmp.FoY1j5JPIZ/sdb.img
65456111  2020-04-10 14:37:48  resolver-02              /tmp/tmp.vjtI9JE3Iz/config.xml
eaced756  2020-04-10 14:37:49  resolver-02              /tmp/tmp.vjtI9JE3Iz/sdb.img
8eccec2c  2020-04-10 16:04:30  resolver-02              /tmp/tmp.YtLYRd0rNI/config.xml
34c897e1  2020-04-10 16:04:31  resolver-02              /tmp/tmp.YtLYRd0rNI/sdb.img
99b67b97  2020-04-10 16:07:53  resolver-02              /tmp/tmp.aWaEDqAaTq/config.xml
cad2c9d8  2020-04-10 16:07:54  resolver-02              /tmp/tmp.aWaEDqAaTq/sdb.img
--------------------------------------------------------------------------------------

This breaks restic forget because it doesn't recognize it's the same file and keeps a snapshot for every instance. I'd prefer it if there was a way to either remove a known prefix or store only a relative path instead of an absolute one. I'm already calling restic with a relative path after cd-ing into the temporary folder. That doesn't help, sadly, and I'd prefer not to have to use bind mounts for this.

cdhowie commented 4 years ago

This breaks restic forget because it doesn't recognize it's the same file and keeps a snapshot for every instance.

We run into this too but the solution is pretty straightforward: tag each backup based on the file(s) being backed up.

For example, you could use the tags config.xml and sdb.img here. Then add --group-by host,tags when running restic forget.
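A rough sketch of that tagging scheme, assuming one file per snapshot as in the listing above. The helper names backup_tagged and forget_grouped are invented for illustration, and the --keep-daily retention policy is an arbitrary example that does not come from this thread.

```shell
# Back up a single file and tag the snapshot with the file's base name,
# e.g. /tmp/tmp.vOFFxxly9O/sdb.img gets the tag "sdb.img".
backup_tagged() {
    restic backup --tag "$(basename "$1")" "$1"
}

# Apply the retention policy per host and tag instead of per path, so the
# changing /tmp/tmp.XXXXXXXXXX prefix no longer splits the groups.
# $1 = number of daily snapshots to keep (example policy only).
forget_grouped() {
    restic forget --group-by host,tags --keep-daily "$1"
}
```

For example, backup_tagged /tmp/tmp.vOFFxxly9O/config.xml followed later by forget_grouped 7 would treat all config.xml snapshots from one host as a single series.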

themightychris commented 4 years ago

What makes this feature so hard to implement? Isn't it just some basic string filtering on snapshot metadata? The value it would bring is enormous. Yeah, you can work around it with tagging, but there's a path field and it could be usable...

pohly commented 4 years ago

What makes this feature so hard to implement?

Speaking as a developer myself (not of restic, but other open source projects): it's often not the complexity of a feature that prevents implementing it, but rather mundane things like lack of time, motivation or simply "real life"...

themightychris commented 4 years ago

Of course. My aim was not to be critical; I'm genuinely looking to map out the complexity for potential contributors.

TheRealVincentVanGogh commented 4 years ago

Hi all, I've started working on implementing this "custom root" function. The implementation itself was seemingly simple, though I've had to learn golang having previously only known C#... Anyways, I'm trying to gauge what sort of support this issue still has, seeing as this stems from 2018, 2 years ago. I'll be committing to https://github.com/TheRealVincentVanGogh/restic/tree/2092-feature-custom-path-prefix soon, should anyone want to help me out with golang 😅. Hopefully soon after, I'll be putting up a pull request here.

themightychris commented 4 years ago

@TheRealVincentVanGogh I'm not going to learn Go, but I'm still eager for this feature and have a ton of backups I still want to port to restic but for this issue. Open a PR once you have something that looks like it's working and post the link here, I'll lend some heavy testing

MichaelEischer commented 4 years ago

@TheRealVincentVanGogh How does your planned implementation relate to PR #2010?

TheRealVincentVanGogh commented 4 years ago

@TheRealVincentVanGogh How does your planned implementation relate to PR #2010?

@MichaelEischer Oh shoot! Looks like someone beat me to it already. Yeah, PR #2010 is exactly what I'm in the midst of... re-accomplishing... Darn. Perhaps @cdhowie could link PR #2010 to this issue to help avoid future confusion? Thanks!

@themightychris Here's a link to that PR. Looks like dev dropped out in 2018 as well... curious.

Edit:

There seems to be some ambiguity between removing a path prefix from the snapshot file VS. removing a path prefix from every file structure + snapshot file. Looks like PR #2010 only addresses the former. Since OP was looking for "strip this leading path from the paths of all backed-up files" (AKA, file-structure level path fixing) I have to take back what I said about linking PR #2010 to this issue. Sorry for the mention cdhowie!

Nevertheless! @MichaelEischer My intentions have always been to get a file-structure + snapshot level path prefix slicing implementation in Restic (man that's a long feature/sentence). So I'll most likely begin working on that off of PR #2010 's existing code, which should speed up implementation.

P.S. I'm pretty busy nowadays so work might be slow for a while; of course, I'll post a PR when I think I have something worth sharing with all you folks! Stay Safe Everyone! 😄

intentionally-left-nil commented 3 years ago

The proot solution is still causing data to be uploaded even when nothing changes. From running restic diff it says the tree blobs have changed:

Files:           0 new,     0 removed,     0 changed
Dirs:            0 new,     0 removed
Others:          0 new,     0 removed
Data Blobs:      0 new,     0 removed
Tree Blobs:  19789 new, 19789 removed
  Added:   30.445 MiB
  Removed: 30.445 MiB

Any ideas?

rawtaz commented 3 years ago

@AnilRedshift Have you tried the options to the backup command (see restic help backup)?

intentionally-left-nil commented 3 years ago

edit: my issue is https://github.com/restic/restic/issues/3041

malemburg commented 3 years ago

So, let me summarize: you're running restic backup /mnt/backup-snapshot, so the file /mnt/backup-snapshot/foo is /mnt/backup-snapshot/foo in the snapshot, but you'd like it to be /foo. Is that correct?

You can achieve that with restic > 0.9.0 by changing the current directory: just run cd /mnt/backup-snapshot and then restic backup . (a single dot, meaning the current directory).

Does that work for you?

This does indeed work for the files in the snapshot (they are rooted at the current dir when using e.g. the relative "." dir as starting point for the backup), but the snapshots listing still shows the absolute path of where the snapshot was taken. I think that is confusing users.

Perhaps all that is needed is to adjust the snapshots command output.

fthdgn commented 3 years ago

I need this feature to correctly back up my flash drive. I scheduled a Windows task so that whenever a device with a specific name is connected to the PC, it backs up that drive. Sometimes it's E:/, sometimes it's F:/; it depends on other external devices. I overrode the host name with the device name, but it would be great to override the path part as / too.

greedystack commented 2 years ago

+1

bgurmendi commented 1 year ago

Can we pay for this functionality?

perariontaeadastra commented 1 year ago

+1

andreymal commented 1 year ago

Also note a specific use case from the issue linked above — there may be multiple prefixes, for example:

It seems that cd doesn't help here, and the only possible way is to make two independent snapshots, which is not really convenient

It could probably be solved by specifying both source and destination, something like:

restic backup /local-path:/snapshot-path /Volumes/Some/dirToBackup:/dirToBackup /Volumes/ext_hdd/docs:/docs

hruzgar commented 10 months ago

Just wrote a comment in the restic forum about this exact same issue. We need this please!!

rybalkoss commented 10 months ago

+1, using restic to backup external ssd while using macos/ubuntu/windows, very important to be independent from machine/user

axeldunkel commented 10 months ago

There are many different scenarios that require something like this - that's why tar implemented the --strip-components=NUMBER option what feels like centuries ago. It's relatively easy to implement and has a wide range of use cases.

MichaelEischer commented 7 months ago

The plan is to include this feature (at least in a basic version) in restic 0.18: https://forum.restic.net/t/roadmap-for-restic-0-17-to-0-19/7197 .

bluppfisk commented 5 months ago

I need this feature to correctly back up my flash drive. I scheduled a Windows task so that whenever a device with a specific name is connected to the PC, it backs up that drive. Sometimes it's E:/, sometimes it's F:/; it depends on other external devices. I overrode the host name with the device name, but it would be great to override the path part as / too.

This exactly, especially because attempting to restore a snapshot with path \\?\Volume{77013d6b-0000...} (which is Windows' way of referring to a volume by its ID rather than drive letter) is considered illegal ("invalid child node name").

piscocgn commented 5 months ago

I came across this issue with the same question: how can I remove the leading directory path during restore? For that, dump with -a tar works fine when passing the output to tar, which provides an option to strip so-called components from the path:

restic dump latest /path/to/my/directory2restore -a tar | tar -xv --strip-components=3 -C restore/

ends up with ./restore/directory2restore, "v" can be omitted from tar to make it more quiet.
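The value 3 in the command above is the number of path components in front of the last one. As a sketch, it can be derived from the path instead of counted by hand; strip_count is a hypothetical helper name, and it assumes a normalized absolute path without doubled slashes.

```shell
# Print how many leading components tar has to strip so that only the
# last component of the given path remains.
strip_count() {
    path=${1#/}      # drop the leading slash
    path=${path%/}   # drop a trailing slash, if any
    # The remaining slashes separate the components, so their count is
    # the number of components in front of the last one.
    echo $(( $(printf '%s' "$path" | tr -cd '/' | wc -c) ))
}
```

For /path/to/my/directory2restore this prints 3, which matches the --strip-components=3 used above.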

MichaelEischer commented 4 months ago

I came across this issue with the same question: how can I remove the leading directory path during restore.

To restore only a subfolder, you can use the <snapshot>:<subfolder> syntax described at https://restic.readthedocs.io/en/stable/050_restore.html#restoring-from-a-snapshot . That is much faster than using dump.
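A small sketch of that syntax, assuming a restic release recent enough to support it as described in the linked documentation. The wrapper name restore_subfolder is invented for illustration.

```shell
# $1 = snapshot ID or "latest"
# $2 = absolute path of the subfolder inside the snapshot
# $3 = directory to restore into
restore_subfolder() {
    restic restore "$1:$2" --target "$3"
}
```

For the earlier example this would be restore_subfolder latest /path/to/my/directory2restore restore/. As far as I understand the documentation, the contents of the subfolder then land directly in the target directory rather than under a directory2restore subdirectory, so the result is not identical to the tar pipeline.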

guybrush commented 1 month ago

I also ran into this problem, a possible solution can be to use docker and map the directories to what you need.

For example, if the path in your repo is /home/me/foo/bar but on your current computer it's /home/someone/x/y, you can just run docker run -v /home/someone/x/y:/home/me/foo/bar restic/restic backup --repo <your-repo> /home/me/foo/bar and it will use the same metadata.

It still would be nice if restic would provide a way to change paths of existing backups in repos tho.

andreymal commented 1 month ago

When using Docker, /etc/passwd and /etc/group need also to be synchronized to make sure file owners are properly backed up

And since the restic/restic image is based on Alpine, this may cause conflicts with Alpine's users (but maybe this is not a problem)

This is also incompatible with the userns-remap Docker option