restic / restic

Fast, secure, efficient backup program
https://restic.net
BSD 2-Clause "Simplified" License

Restic 0.10.0 always reports all directories "changed", adds duplicate metadata, when run on ZFS snapshots #3041

Open stephenedie opened 3 years ago

stephenedie commented 3 years ago

I run my backups from a ZFS snapshot in order to ensure the entire file-system is in a consistent state. After I upgraded to restic 0.10.0 from the previous official release, the backup started adding a duplicate copy of all the directory meta-data while claiming that all the directories have been changed. For example (pardon my bash):

# restic version
restic 0.10.0 compiled with go1.15.2 on freebsd/amd64

# commands executed for bug (repeatedly on unchanging 
# /usr/local/bin/restic backup -H $HOSTNAME --verbose=1 --cache-dir=$RESTIC_CACHE --exclude-file                 "${path}/${EXCLUDE_NAME}" "$path"
scan finished in 2.928s: 1604 files, 341.787 GiB

Files:           0 new,     0 changed,  1604 unmodified
Dirs:            0 new,   231 changed,     0 unmodified
Data Blobs:      0 new
Tree Blobs:    219 new
Added to the repo: 17.876 MiB

The result occurs repeatedly after re-running the backup on a new ZFS snapshot of an otherwise static file-system. I expect it to work like the previous version in which directories were not seen to be "changed".

I tested this on the same file-system except without using a ZFS snapshot, and it does not report directories as "changed" or upload duplicate metadata. Therefore, this problem seems to be particular to using ZFS snapshots. My method for backing up from ZFS snapshots is as follows:

I find it interesting that restic is uploading new/unique directory meta-data with every run, suggesting that something about the directory meta-data is actually changing between runs. However, earlier versions of restic did not "see" these changes. I'm at a loss as to what's causing this.

In terms of severity, this is merely a nuisance to me---about 30ish MiBs added to the repo each day. However, I could see this being a bigger problem on systems with a lot more small files. Is there any way I can find out what aspect of the directory is being identified as "changed" from the command-line? Adding verbosity did not appear to do the trick.

stephenedie commented 3 years ago

Sorry, the formatting is a bit weird in my restic command example above. It should read:

# /usr/local/bin/restic backup -H $HOSTNAME --verbose=1 --cache-dir=$RESTIC_CACHE \
    --exclude-file "${path}/${EXCLUDE_NAME}" "$path"
rawtaz commented 3 years ago

Can you identify a file that is experiencing this problem, and stat it between runs so that we can see what the filesystem says about it? Also, there's a PR that adds an option (for now named --trust-mtime) which in practice ignores ctime changes and just looks at mtime changes. I have no indication that ctime is the problem, but if you want to you can try it to see if it makes a difference: https://github.com/restic/restic/pull/2823 . The stat information is probably more relevant though.

greatroar commented 3 years ago

2823 only changes the behavior for regular files, not directories.

aawsome commented 3 years ago

Is there any way I can find out what aspect of the directory is being identified as "changed" from the command-line?

You can run restic diff <old snapshot ID> <new snapshot ID> --metadata

stephenedie commented 3 years ago

Thank you for the suggestions! The metadata diff just listed every file and directory with a U to its left. Boring.

The output of running stat on a contained directory is more insightful. All the fields remain the same between different ZFS snapshots, except st_dev. The st_dev field specifies the id of the underlying device, so it makes sense that this changes between ZFS snapshots. It also makes sense that Restic treats these directories as novel as a consequence, and this behavior appears to be more correct than in previous versions. Curiously, it seems that the st_dev field is also being stored by Restic in the directory meta-data, which is why I'm uploading brand new tree blobs with each run. However previously this did not occur, so the older versions must have ignored st_dev when checking for changes even though it's part of the stored meta-data!

Things get weirder when I run stat on files instead of directories. The st_dev field still changes between zfs snapshots but Restic 0.10.0 doesn't seem to care and behaves as earlier versions did for directories. This is an inconsistency that should perhaps be corrected. I can't tell from program behavior whether Restic is storing st_dev with the file meta-data too. If so, it would seem more correct to also check st_dev when comparing files for changes, but that would cause my runs to re-scan all the file contents. :(

With this in mind, perhaps a switch could be added to always ignore st_dev when comparing for changes, to accommodate my use-case and others where st_dev might be changing between runs? Thoughts?

rawtaz commented 3 years ago

Can you please be more elaborate when you describe this? For example, can you show the commands and output of the stat commands? It would be nice to see what you're talking about here.

stephenedie commented 3 years ago

Here is a complete annotated session illustrating how the output of stat changes between ZFS snapshots and how this (presumably) affects Restic (starting with 0.10.0):

Step 1: Create a new ZFS snapshot. Stat a file and a directory in that ZFS snapshot. Take note of the value of the first field, which is st_dev:

# zfs snapshot main/media/video@restic
# stat /data/media/video/.zfs/snapshot/restic/download
10575765696535701816 27 drwxrwxr-t 2 xbmc media 18446744073709551615 19 "May 30 23:51:52 2020" "Jun 12 01:25:35 2017" "May 31 01:10:24 2020" "May 30 21:20:04 2020" 16384 49 0x800 /data/media/video/.zfs/snapshot/restic/download
# stat /data/media/video/.zfs/snapshot/restic/backups.txt
10575765696535701816 97 -rw-rw-r-T 1 root media 18446744073709551615 799 "May 30 21:20:12 2020" "May  1 18:53:09 2010" "May 30 21:20:12 2020" "May 30 21:20:12 2020" 4096 9 0x800 /data/media/video/.zfs/snapshot/restic/backups.txt

Step 2: Backup the contents of the ZFS snapshot. I presume Restic sees changed values for st_dev for all directories and adds tree blobs for them. I believe this behavior is new in 0.10.0. However, it still ignores st_dev for files:

# restic --cache-dir=./temp --verbose=1 backup /data/media/video/.zfs/snapshot/restic
open repository
repository XXXXXXXX opened successfully, password is correct
lock repository
load index files
using parent snapshot XXXXXXXX
start scan on [/data/media/video/.zfs/snapshot/restic]
start backup on [/data/media/video/.zfs/snapshot/restic]
scan finished in 3.199s: 1604 files, 341.787 GiB

Files:           0 new,     0 changed,  1604 unmodified
Dirs:            0 new,   231 changed,     0 unmodified
Data Blobs:      0 new
Tree Blobs:    219 new
Added to the repo: 17.878 MiB

processed 1604 files, 341.787 GiB in 0:30
snapshot XXXXXXXX saved

Step 3: Run the backup again on the same ZFS snapshot. Note that nothing new is added to the repo this time:

# restic --cache-dir=./temp --verbose=1 backup /data/media/video/.zfs/snapshot/restic
open repository
repository XXXXXXXX opened successfully, password is correct
lock repository
load index files
using parent snapshot XXXXXXXX
start scan on [/data/media/video/.zfs/snapshot/restic]
start backup on [/data/media/video/.zfs/snapshot/restic]
scan finished in 3.311s: 1604 files, 341.787 GiB

Files:           0 new,     0 changed,  1604 unmodified
Dirs:            0 new,     0 changed,   231 unmodified
Data Blobs:      0 new
Tree Blobs:    219 new
Added to the repo: 0 B

processed 1604 files, 341.787 GiB in 0:04
snapshot XXXXXXXX saved

Step 4: Destroy the old ZFS snapshot and create a new ZFS snapshot of the exact same file-system. Note how st_dev is changed for the stat of both the file and directory:

# zfs destroy main/media/video@restic
# zfs snapshot main/media/video@restic
# stat /data/media/video/.zfs/snapshot/restic/download
11704657377658002536 27 drwxrwxr-t 2 xbmc media 18446744073709551615 19 "May 30 23:51:52 2020" "Jun 12 01:25:35 2017" "May 31 01:10:24 2020" "May 30 21:20:04 2020" 16384 49 0x800 /data/media/video/.zfs/snapshot/restic/download
# stat /data/media/video/.zfs/snapshot/restic/backups.txt 
11704657377658002536 97 -rw-rw-r-T 1 root media 18446744073709551615 799 "May 30 21:20:12 2020" "May  1 18:53:09 2010" "May 30 21:20:12 2020" "May 30 21:20:12 2020" 4096 9 0x800 /data/media/video/.zfs/snapshot/restic/backups.txt

Step 5: Run the backup one more time. Note that Restic reports all directories as changed and stores new tree blobs for them:

# restic --cache-dir=./temp --verbose=1 backup /data/media/video/.zfs/snapshot/restic
open repository
repository 509797e0 opened successfully, password is correct
lock repository
load index files
using parent snapshot 7615de93
start scan on [/data/media/video/.zfs/snapshot/restic]
start backup on [/data/media/video/.zfs/snapshot/restic]
scan finished in 3.504s: 1604 files, 341.787 GiB

Files:           0 new,     0 changed,  1604 unmodified
Dirs:            0 new,   231 changed,     0 unmodified
Data Blobs:      0 new
Tree Blobs:    219 new
Added to the repo: 17.876 MiB

processed 1604 files, 341.787 GiB in 0:31
snapshot 70edf3cd saved

Unfortunately, I'm not sure of any easy way to reproduce this behavior unless you have a way to change the underlying st_dev (the ID of the mounted device!) while leaving all the other data and metadata alone. Without being able to manage ZFS snapshots, I suppose you could dd a whole file-system between block devices to test changing st_dev. Ugh!

Just to cover bases, is there another possible explanation? Or should stat account for everything that might change about a directory, other than its contents?

rawtaz commented 3 years ago

Thanks a lot for that clarity :) We'll have to look at whether it makes sense to look at the device ID.

stephenedie commented 3 years ago

Let me point out something that dawned on me in case it isn't obvious to you: The st_dev attribute on all the files contained in a directory is also changing between ZFS snapshots. This means that the directory change detection logic may be based on changing st_dev of the constituent files rather than st_dev of the directory itself.

I'm also having second thoughts about whether it is correct that changes to st_dev alone should be treated as changes by Restic. I'm not sure how st_dev is used by different OSes. My FreeBSD man page says "The st_dev and st_ino fields together identify the file uniquely within the system". My Linux man page makes no such promise. What happens if the file-system is on a USB stick, and the USB stick gets inserted into different computers or even different ports? Does st_dev change? It seems like it could, absent some guarantee that st_dev will always be stable over time for files and directories residing on the same file-system. I don't know that it's designed to work that way though.

greatroar commented 3 years ago

My Linux man page makes no such promise. What happens if the file-system is on a USB stick, and the USB stick gets inserted into different computers or even different ports? Does st_dev change?

The GNU libc manual and The Linux Programming Interface both do. However, st_dev is better thought of as the connection to the device rather than the actual device. For internal disk drives that doesn't matter, but when I unplug my USB disk, plug in a USB stick and then plug in the disk again, the stick gets the disk's former device number and the disk gets a new one.

Restic doesn't look at the st_dev field for files: it's simply not part of the change detection heuristic. For directories there is no such heuristic, and there cannot be one: the timestamps on directories don't reflect changes to the files within, so those changes would get skipped too. In any case, restic still records the metadata change, even for a file that is reported as unmodified (#2823 documents this in some more detail than the current manual). If you think of directories as entirely metadata, the fact that directories still change should make more sense.

The situation is a bit strange at first glance, but it usually works well. There are a few possibilities for improvement:

mamoit commented 3 years ago

I'm running restic 0.10.0 on android (because I can), and the "all files and dirs have changed" situation happens when backing up the sdcard. Running stat on the files yields a weird access time of 1979-12-31 00:00:00; everything else looks normal. mount reports that the file system is sdcardfs, which I had never heard of, to be honest. Can this be considered the same issue as described, or should I report it in a new issue?

I also had a similar issue on my desktop, where a file with a weird modified time of 1903 (or some year of that sort) would always be backed up. I touched it and it stopped being backed up all the time.

wpbrown commented 3 years ago

I'm doing the exact same thing as @stephenedie except with btrfs snapshots and I'm having the same problem. Restic is resaving all the unchanged tree data for every snapshot.

  File: snap/z7
  Size: 34          Blocks: 0          IO Block: 4096   directory
Device: 9bh/155d    Inode: 256         Links: 1
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2020-11-13 00:01:25.815487088 +0000
Modify: 2020-11-13 00:01:35.807477764 +0000
Change: 2020-11-13 00:01:35.807477764 +0000
 Birth: -
  File: snap/z8
  Size: 34          Blocks: 0          IO Block: 4096   directory
Device: 9ch/156d    Inode: 256         Links: 1
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2020-11-13 02:57:22.120369133 +0000
Modify: 2020-11-13 00:01:35.807477764 +0000
Change: 2020-11-13 00:01:35.807477764 +0000
 Birth: -

Files use this logic that doesn't include device id: https://github.com/restic/restic/blob/b67b7ebfe6ca32deb071a6f45e13ee2a7d28cf15/internal/archiver/archiver.go#L450

If I'm reading this right, there is no change function for trees; it just relies on hash collisions: https://github.com/restic/restic/blob/445b84526735ba0a47e8303088031bce1efe4785/internal/archiver/archiver.go#L162
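
To make the asymmetry concrete, here is a rough Go sketch (hypothetical names, not restic's actual code): the file check compares metadata fields without the device id, while a tree blob is just the hash of the serialized child metadata, DeviceID included, so a changed device id means a brand-new blob.

package sketch

import (
    "crypto/sha256"
    "encoding/json"
    "time"
)

// NodeMeta is a stand-in for the per-entry metadata restic records.
type NodeMeta struct {
    ModTime    time.Time
    ChangeTime time.Time
    Size       uint64
    Inode      uint64
    DeviceID   uint64 // recorded for every node, directories included
}

// fileUnchanged mimics the file-level heuristic: DeviceID is never consulted.
func fileUnchanged(prev, cur NodeMeta) bool {
    return prev.ModTime.Equal(cur.ModTime) &&
        prev.ChangeTime.Equal(cur.ChangeTime) &&
        prev.Size == cur.Size &&
        prev.Inode == cur.Inode
}

// treeBlobID mimics tree deduplication: the blob ID is the hash of the
// serialized children, so a changed DeviceID on any entry yields a new blob.
func treeBlobID(children []NodeMeta) [32]byte {
    buf, _ := json.Marshal(children)
    return sha256.Sum256(buf)
}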

wpbrown commented 3 years ago

For btrfs I bind mount the latest snapshot to a directory so restic sees a stable path. Commenting out this line solves the issue, and unchanged snapshots are properly detected as unchanged by restic.

https://github.com/restic/restic/blob/41a45ae9083c6afdb0832b14bb5605e2b448f3d5/internal/restic/node.go#L581

Looking at @greatroar's PR, it seems like integrating an --ignore-device option there would be a good solution. I'm not sure there is much value to tracking device id at all.

rawtaz commented 3 years ago

I'm not sure there is much value to tracking device id at all.

I agree, what's the use case for looking at the device ID? It can change very much. Shan't we just stop doing that (instead of introducing an option for it)?

stephenedie commented 3 years ago

In the detailed example I gave, I'm storing 17.876 MiB for 341.787 GiB. It's relatively small, but these are all video files with a very high average size. Running this on trees with lots of small files may be very inefficient.

I tend to agree that Restic should just always ignore and not store/hash st_dev.

wpbrown commented 3 years ago

Shan't we just stop doing that (instead of introducing an option for it)?

I tend to agree that Restic should just always ignore and not store/hash st_dev.

I can't think of a good reason. It's already ignored for files. Restic is already keying the tree off the absolute path. If that changes, the tree data is repacked. If the device under an absolute path changes (maybe a pre-failure disk gets replaced) and it's mounted at the same path, that wouldn't be any reason to repack the tree data.

The only reason I thought of an ignore option is "backward compat". If you stop tracking the device, everyone would get a one-time tree data repack upon upgrade. I suppose you could automatically include the device if an existing repo already has it, but maybe, since restic is pre-1.0, now would be a great time to make the one-time change and not carry this baggage? Only users with a huge number of small files would likely even notice, and even then it would only be a one-time annoyance.

NobbZ commented 3 years ago

Can I assume that this very same issue will also exist on BTRFS and LVM snapshots?

Currently I do not use either, but am tempted to use LVM snapshotting to prepare the volumes before backing them up.

After my experiences so far, backing up a snapshot of the drive that contains e.g. the mysql database is less error-prone than doing an unsnapshotted backup, due to how files change over time...

intentionally-left-nil commented 3 years ago

My 2 cents - it's best to make the file & directory logic similar. I can't imagine why it would be okay to treat a file as cached with a different device ID, but not the folder. So, either restic should add device ID checking to files, or remove it from folders. IMO the latter makes sense to me here.

For restic to produce the wrong behavior:

  1. The file would have to be in the exact same location as in the parent snapshot.
  2. The ctime/mtime (depending on flags) would have to be the same.
  3. The user would nonetheless have managed to change the file on a different device.

Yes, in theory it's possible to modify two files with the same timestamp on different devices, but I really don't think it's a likely case (and if it's something you're worried about, then it would make more sense to check files for this behavior as well as folders).

Since the device ID isn't stable to begin with, and there's a given use case (ZFS/btrfs snapshots) that is impacted by this, I would just remove the device ID entirely (my 2 cents).

eyJhb commented 3 years ago

I am not sure what the best solution for this would be, but this is exactly the use case that I want to use restic for, i.e. mount zfs snapshots and then use restic to back them up :)

And in my case, I don't have video files, but am backing up my code projects, etc. I have 248644 files and 60264 folders.

I agree with https://github.com/restic/restic/issues/3041#issuecomment-727132962 that either ...

  1. Add an option to ignore it
  2. Just ignore the device id altogether. I am unsure what value it adds.

Are there any use cases where it adds value? I would be happy to know of any. Does borgbackup keep track of the device id as well?

Also, thanks for the project, I am excited to get going! (also thanks to OP for making me aware of this issue)

eyJhb commented 3 years ago

Looked at the source for borgbackup, and they do not use the DeviceID to determine if a file has changed, nor store it in any metadata. What they have instead is an option named --one-file-system, which tells it not to recurse into other filesystems.

Description can be found here - https://github.com/borgbackup/borg/blob/3ad8dc8bd081a32206246f2a008dee161720144f/src/borg/archiver.py#L3320-L3329

I suggest we remove the DeviceID for now, as there is no reason to have it. We could add the same --one-file-system, but I am unsure whether we want this?

Can any contributors comment on this? :) I will make a PR if it sounds OK.

rawtaz commented 3 years ago

Is borg's --one-file-system different from restic's --one-file-system (that already exists for the backup command)?

eyJhb commented 3 years ago

Is borg's --one-file-system different from restic's --one-file-system (that already exists for the backup command)?

Sorry, this is one of the issues that is keeping me from using restic, so I didn't know restic already had this option. Thanks for pointing it out!

I guess just removing the DeviceID as metadata would suffice then, and of course keeping the current --one-file-system as-is.

MichaelEischer commented 2 years ago

The DeviceID is necessary for the hardlink detection in the restore command to work properly. Otherwise two files on different devices but with the same inode numbers (+ both files already using hardlinks) could end up hardlinked together.

My suggestion would be to use pseudo device ids instead. restic could just map device ids to pseudo device ids starting from 0 and increment that counter each time it encounters a new device id. That should essentially let the subvolume always get the same pseudo device id, which would then ensure that no new tree blobs are created.
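
As a minimal sketch of that mapping (hypothetical code, not an actual implementation):

package sketch

// devMapper sketches the pseudo device id idea: real st_dev values are
// replaced by small counters in the order they are first encountered during
// one backup run, so a snapshot that gets a fresh st_dev every time still
// ends up with the same pseudo id as long as the set of mount points stays
// stable.
type devMapper struct {
    next    uint64
    mapping map[uint64]uint64 // real st_dev -> pseudo device id
}

func newDevMapper() *devMapper {
    return &devMapper{mapping: make(map[uint64]uint64)}
}

func (m *devMapper) pseudoID(realDev uint64) uint64 {
    if id, ok := m.mapping[realDev]; ok {
        return id
    }
    id := m.next
    m.next++
    m.mapping[realDev] = id
    return id
}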

[Edit] An alternative could be to add a --ignore-hardlinks option which would allow removing the DeviceID. But I'd rather prefer the pseudo device ids variant [/Edit]

smlx commented 2 years ago

Since there is no way heuristically for restic to know that it should consider a new device ID to be the same as a different one in another snapshot, maybe an option --override-device-id or --force-device-id? Then people that want to consider different ZFS snapshots to be the same device can hard-code a value in their backup scripts.

pvgoran commented 2 years ago

Probably --map-device-id would be the best way to go. Like this: --map-device-id 1234=2345, or --map-device-id 1234:2345.
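
For illustration, parsing and applying one such pair (hypothetical option and code, using the colon form) could be as small as:

package sketch

import (
    "fmt"
    "strconv"
    "strings"
)

// parseDeviceMap parses a single FROM:TO pair for a hypothetical
// --map-device-id option.
func parseDeviceMap(spec string) (from, to uint64, err error) {
    parts := strings.SplitN(spec, ":", 2)
    if len(parts) != 2 {
        return 0, 0, fmt.Errorf("expected FROM:TO, got %q", spec)
    }
    if from, err = strconv.ParseUint(parts[0], 10, 64); err != nil {
        return 0, 0, err
    }
    if to, err = strconv.ParseUint(parts[1], 10, 64); err != nil {
        return 0, 0, err
    }
    return from, to, nil
}

// mapDevice rewrites a device id according to the parsed mapping before the
// value would be stored in the tree metadata.
func mapDevice(dev, from, to uint64) uint64 {
    if dev == from {
        return to
    }
    return dev
}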

MichaelEischer commented 2 years ago

Since there is no way heuristically for restic to know that it should consider a new device ID to be the same as a different one in another snapshot, maybe an option --override-device-id or --force-device-id? Then people that want to consider different ZFS snapshots to be the same device can hard-code a value in their backup scripts.

When does restic compare device id across snapshots? The mapping would happen directly during the backup process such that the created snapshots only contain pseudo device ids. If one always backs up the same paths with usually the same mount points then the pseudo ids should be stable across different snapshots for these paths. Although I'm not completely sure whether that would work out properly for a full system backup.

Probably --map-device-id would be the best way to go. Like this: --map-device-id 1234=2345, or --map-device-id 1234:2345.

That sounds way too complicated to be usable by the average restic user. Not to mention that we'd have to check the device id mappings for collisions with device ids that are not mapped.

rawtaz commented 2 years ago

I was just about to write that the suggested --map-device-id is way too complicated, @MichaelEischer beat me to it :D :+1:

pvgoran commented 2 years ago

Hm, maybe my "complication scoring" is not properly calibrated, because --map-device-id seems quite simple to me. :)

Collision checking is actually not required, because this option would be used to map the Device ID of a snapshot to the Device ID of the underlying filesystem, thus consciously creating a collision. And unlike --force-device-id, it would still allow reliably backing up files from several filesystems/snapshots in one go.

Anyway, it's just a suggestion. :)

MichaelEischer commented 2 years ago

Collision checking is actually not required, because this option would be used to map the Device ID of a snapshot to the Device ID of the underlying filesystem, thus consciously creating a collision. And unlike --force-device-id, it would still allow reliably backing up files from several filesystems/snapshots in one go.

What is the source of the Device IDs you want to map? I'm not exactly sure if I understood what and at which point in the backup process --map-device-id would map ids. To me it sounds like you want to map the device ids which are stored in an existing snapshot. However, that would be completely useless as the device ids are not compared at all during backup. It's just that a different device id will lead to a different tree blob which then has to be uploaded again. If you want to map device ids of files on the filesystem, then you have to check for collisions as otherwise restore might detect hardlinks which shouldn't exist.

The usage of that option would also require a user to first get the device id, and then to decide for a new value to map to.

pvgoran commented 2 years ago

What is the source of the Device IDs you want to map?

Source device is a temporary snapshot from which restic actually reads files. Target device is the underlying persistent filesystem.

I'm not exactly sure if I understood what and at which point in the backup process --map-device-id would map ids.

When reading files that are backed up. So files from the snapshot would get the same device ID as if they were read from the underlying persistent filesystem.

This way, each new snapshot will not change files' device IDs.

Hardlink detection will work properly too, unless a single backup mixes files from multiple snapshots of the same filesystem, or mixes files from a snapshot with files from the underlying filesystem. Both are quite exotic scenarios, probably much less common than just backing up one or several snapshots of different filesystems.

To me it sounds like you want to map the device ids which are stored in an existing snapshot. However, that would be completely useless as the device ids are not compared at all during backup. It's just that a different device id will lead to a different tree blob which then has to be uploaded again.

Well, this issue is essentially about not generating a different tree blob, isn't it?

The usage of that option would also require a user to first get the device id, and then to decide for a new value to map to.

True. If a user creates a snapshot in their backup script, I assume they would be able to easily acquire this information (FS's Device ID and snapshot's Device ID) in the same script.

MichaelEischer commented 2 years ago

When reading files that are backed up. So files from the snapshot would get the same device ID as if they were read from the underlying persistent filesystem.

It took me a while, but now I know why I've misunderstood the explanation: snapshot here refers to ZFS snapshot and not a restic snapshot as created by the backup command.

True. If a user creates a snapshot in their backup script, I assume they would be able to easily acquire this information (FS's Device ID and snapshot's Device ID) in the same script.

When backing up ZFS snapshots there are probably no submounts inside the mounted snapshot. Then my proposed pseudo device ids would just use 0 as the device id, which also leads to identical tree blobs but without requiring any configuration.

intentionally-left-nil commented 2 years ago

@MichaelEischer thanks for the replies. It makes more sense to me why device ID's are checked for files but not directories (since directories can't be hard linked).

IMO the simplest fix follows the existing pattern for ignoring things. We already have

    --ignore-ctime    ignore ctime changes when checking for modified files
    --ignore-inode    ignore inode number changes when checking for modified files

So we should either add a new --ignore-device-id flag OR add that functionality when --ignore-inode is selected (If you are ignoring the inode then I can't imagine how the device id would possibly matter).

If the latter proposal (adding this functionality to --ignore-inode) sounds good to folks I can take a crack at this.

smlx commented 2 years ago

+1 for --ignore-device-id. Would that effectively assume some hard-coded device ID?

pvgoran commented 2 years ago

It took me a while, but now I know why I've misunderstood the explanation: snapshot here refers to ZFS snapshot and not a restic snapshot as created by the backup command.

It didn't even occur to me that the word is ambiguous here, sorry. :)

When backing up ZFS snapshots there are probably no submounts inside the mounted snapshot. Then my proposed pseudo device ids would just use 0 as the device id, which also leads to identical tree blobs but without requiring any configuration.

Right. However, the device ID mapping functionality would support a wider range of use cases: it would be possible to safely back up several ZFS/BTRFS snapshots at once, and each would get its own stable device ID.

pvgoran commented 2 years ago

@MichaelEischer I just re-read your earlier comment with the pseudo-ID suggestion, and see that it takes the possibility of multiple devices being backed up into account. So it would work well for many cases, and its simplicity is tempting.

However, I can see how it can sometimes lead to surprising behaviour. Namely, if the directories being backed up gain file(s) with a Device ID that didn't occur in previous restic snapshots for some reason (a new mount was created in the system, or just a new directory was included in backup this time), then the pseudo-IDs will shift, which would result in new tree blobs again. This scenario would be probably rare (which is good), but because it's rare, it would be difficult to trace it if it occurs (which is bad).

rawtaz commented 2 years ago

For the love of everything, let's keep things simple.

kakra commented 2 years ago

The problem with device IDs from stat is that posix doesn't claim that they are stable, neither across reboots nor after remounts. So it's not a proper stable ID to begin with. Most of the time it's true that it doesn't change, even across reboots. But it's never true for virtual devices (those which have no associated block device in /dev), and that's the case for NFS mounts, btrfs subvols, probably ZFS snapshots, fuse mounts, and many others; you get the idea.

To work around this, restic should probably store the previously known device ID per mount point it encounters (that's where the device ID changes compared to the parent), and then auto-map newly encountered device IDs of that same device onto those. Care must be taken to avoid collisions when mount points are encountered that didn't previously exist.

This could be integrated with pseudo ID generation where each pseudo ID would just be an index into a table of known device IDs.
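
A small sketch of that combination (hypothetical code): the value stored in the archive would be an index into a table keyed by mount point, persisted across runs, so the same mount path keeps its id even when the kernel hands out a new st_dev.

package sketch

// mountTable sketches the LUT idea: each mount point gets a stable index
// the first time it is seen; later runs reuse the stored index regardless
// of the st_dev the kernel currently reports. New mount points get the
// next free index, which avoids collisions with previously known ones.
type mountTable struct {
    byPath map[string]uint64 // mount point -> stable index
}

func newMountTable() *mountTable {
    return &mountTable{byPath: make(map[string]uint64)}
}

func (t *mountTable) stableID(mountPoint string) uint64 {
    if id, ok := t.byPath[mountPoint]; ok {
        return id
    }
    id := uint64(len(t.byPath))
    t.byPath[mountPoint] = id
    return id
}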

The KDE Baloo project is having similar issues with its indexer, which uses device and inode IDs to detect files already indexed. On btrfs, those device IDs change with every reboot because subvolumes are virtual devices and IDs are allocated ad hoc, so all files are indexed as new files and duplicated into the index (which keeps growing because Baloo expects the other device ID to eventually return in the future).

Thus, device IDs should never be considered stable IDs. We can probably expect the user to back up from a stable source path, so we could easily automap a device ID from the old to the new. But as outlined above, this would need to be tracked for each device crossing in the stored paths.

Posix doesn't even consider inode IDs stable: For file-systems that don't store inode IDs internally, these may and will change per umount/mount. This essentially leaves the path and ctime to detect changes, and this disallows comparing inodes discovered from the filesystem with inodes stored in the backup to detect hardlinks. So even inode IDs can be considered valid only for the duration of the backup session.

intentionally-left-nil commented 2 years ago

So, it wasn't trivial to implement --ignore-deviceid. Restic uses it to determine if links span across filesystems. Doing something like hard-coding the DeviceID to 0 would mean that this check would be lost.

Instead, I chose to add the proposed device-map flag and allow users to map one device ID to the other. See the attached PR for more details. If the concept is agreeable I'll push forward and add tests, etc.

ArsenArsen commented 2 years ago

device ids could use partuuids instead, or some similar mechanism

kakra commented 2 years ago

device ids could use partuuids instead, or some similar mechanism

This is probably the way to go; pseudo IDs as stated above could be used as a LUT index into a table of known PARTUUIDs. But this is only half the story because file systems like ZFS and btrfs can store multiple virtual volumes in a single partition. And some file systems can cross multiple partitions (probably the same).

So depending on whether you mount the file system via one pool member or another, the PARTUUID will change. There's also a UUID which identifies the file system instance itself. Not sure if that works for NFS; it will require some research. Also, a UUID doesn't solve the problem for the virtual sub-volumes that btrfs etc. provide: those share the same UUID, but btrfs has stable subvolume ids (not sure about ZFS). Using those would collide with the example from above, "take snapshot, back up the snapshot", because each snapshot would have a new ID.

The best thing to do is to base file system identification on the mount path only (which restic already more or less does by storing the full base path in the archive metadata, I think). For hard link detection, a mapping between the device IDs of the backup's base snapshot and the new backup snapshot must be created (so it always converts current device IDs to the original device IDs of the first snapshot) - or we just live with metadata duplication.

A more proper solution would probably be not trying to re-assemble the hard link detection during restore; by posix definition, this must be a backup-time-only implementation. Instead of storing dev id and inode id, it should store the information that this object is a hard link to a previously encountered object (using a restic object ID). Device and inode ids must not be part of the metadata stored in the archive because they are not reliable information for restore. During restore, if restic encounters a hardlink object, it first needs to check whether it already restored the linked object, and if it didn't, it needs to restore the hardlink as a normal file object. Otherwise we may overwrite files during restore that are not part of the restore job; in other words, it is not allowed to restore hardlinks for objects that are not part of the restore set, and restic needs to fall back to the file contents itself instead.

This is more complicated than it may look at first glance, so there is no simple solution. And I think the problem is in the storage design, and not in the directory traversal implementation. But changing archive storage format for backup programs is always a headache - you'd want to avoid that at all cost. Luckily, we are still in the 0.x versions and I think, restic doesn't claim that the storage format is finalized.

I think we should completely discard ideas about mapping IDs back and forth and rethink the implementation design: dev id and inode id do not belong into the archive, it's probably that simple and will then force a proper implementation for hardlink detection during store, and also force a proper implementation for hardlink restore. Plus, file change detection cannot make wrong assumptions about dev and inode id stability (it should use ctime instead which is the inode change time).

ArsenArsen commented 2 years ago

Sorry, in the original comment I meant UUID, not PARTUUID, which are FS-bound rather than partition table bound.

But this is only half the story because file systems like ZFS and btrfs can store multiple virtual volumes in a single partition. And some file systems can cross multiple partitions (probably the same).

zhome guid 13213858044556755772 - at the very least ZFS has GUIDs for filesystems.

And some file systems can cross multiple partitions (probably the same).

If you mean LVM, the FS UUID still stays the same inside each logical volume, as they're FS bound.

A more proper solution would probably be not trying to re-assemble[...]

+1

I was under the impression that a type: hardlink object: <...> thing was already stored, and that this is only in RAM.

I'm looking at what NFS does for an identifier, but the most stable thing might be ip:path. I'll get back to you in due time.

EDIT: There's an fsid that could be queried via RPC, but I think this (and (deviceid,inode) in general) becomes obsolete if hardlinks are stored as a special file instead of being detected at restore time.

Since hardlinks are just an inode being mapped multiple times in one FS, it'd be sufficient during restore to know where a certain blob was first restored to, and then relink that again later. In the edge case where the restore and backup disk layouts differ, outputting a warning and creating a new file with the same content is probably the sanest thing to do.

kakra commented 2 years ago

If you mean LVM, the FS UUID still stays the same inside each logical volume, as they're FS bound.

No, I didn't mean LVM because that presents a single virtual block device to the file system. Rather, it was about file systems that natively support device pooling like f2fs, btrfs, and zfs (and maybe others like bcachefs). In that case, the file system itself spans multiple devices without an intermediate block layer. You can then mount any block device that's a member of the pool, and it would mount the whole file system across all block devices. At least for btrfs, the mounted block device identifies the block device id of the mount point, and thus also the PARTUUID.

Sorry, in the original comment I meant UUID

Okay, this mostly obsoletes my remarks about that idea.

I was under the impression that a type: hardlink object: <...> thing was already stored

I didn't look at the actual implementation but the current problems imply that the restore routine reconstructs hard links from the (deviceid,inode) number instead of a native restic object identifier. I believe that's actually not needed at restore time if the archive stores a proper type: hardlink object: <...> item. Only backup-time needs to read (devid,inode) to detect hard links and then act properly by keeping a mapping in RAM. As far as I can see, it already traverses the whole directory tree at backup time - so it can just detect hard links without comparing anything to the stored base archive.
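
A minimal sketch of that backup-time-only detection (hypothetical code, using the first path seen for a (devid,inode) pair as the canonical link target):

package sketch

// devInode is only ever used as an in-memory key during a single backup run;
// neither value ends up in the archive.
type devInode struct {
    dev, inode uint64
}

type hardlinkIndex struct {
    firstSeen map[devInode]string // (dev, inode) -> path of first occurrence
}

func newHardlinkIndex() *hardlinkIndex {
    return &hardlinkIndex{firstSeen: make(map[devInode]string)}
}

// resolve returns the canonical path this node should be recorded as a
// hardlink to, or "" if it is the first occurrence (stored as a normal file).
func (h *hardlinkIndex) resolve(path string, dev, inode, nlink uint64) string {
    if nlink < 2 {
        return "" // not hardlinked, nothing to remember
    }
    key := devInode{dev, inode}
    if first, ok := h.firstSeen[key]; ok {
        return first
    }
    h.firstSeen[key] = path
    return ""
}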

A good plan could probably be to rename those fields in the source code, then see where the compiler blows up, then fix the restore function to not use these fields at all, and fix the backup routine to properly store the data without storing these ids itself. If it works, rename the fields back - or leave as-is, wouldn't properly matter. It's probably oversimplified but could be sufficient to get an idea which parts of the code need changes.

To stay compatible with existing archives, the fields could just stay in place, and stored with a value of 0. It probably needs some flag/tag in the metadata so older restic clients won't be able to restore from archives with zero-valued id fields. If restic's archive design somewhat follows ideas of file system superblocks, it should already have some concept of incompat and compat archive flags.

In the edge case where the restore and backup disk layouts differ

Yep, another edge case would be partial restores. And in that case, you don't want to hardlink anyways: A user won't expect some unrelated file to be changed only because the restore encountered a hard link, it needs to be restored as a new instance of the file then. Logging a warning would be a nice feature then, I think. It enables the user to properly act upon this in the aftermath, restic cannot really decide what's the right thing to do.

kakra commented 2 years ago

BTW: There's an edge case for file change detection which storing the inode in the archive tries to fix: It is possible to replace a file with another file that has the same ctime and same size but different contents. Looking at ctime only wouldn't detect this change then. This tells us that storing the inode may still be a good thing to do, but by no means it is allowed to use this information at restore time, otherwise it breaks posix assumptions.

It may make sense to rather implement a "file change detection cache" that is not part of the archive but rather kept locally in an xdg .cache directory. I think borg-backup implements it this way. Such a cache, if abstracted properly, could be optimized for the individual file system; e.g. it could use btrfs snapshot compare functions (by listing the extents that changed between the old and current generation number) to quickly list content that actually changed. Since it wouldn't be part of the archive, its format can be changed at any time. For all other file systems, it would fall back to inode numbers, and if the user backs up a file system without stable inode numbers, there'd still be --ignore-inode available. In no case would deviceid be needed, other than for invalidating said cache (and yes, for file system boundary checks).
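
A rough sketch of what such a cache could look like (hypothetical layout, assuming a JSON file under the user cache directory):

package sketch

import (
    "encoding/json"
    "os"
    "path/filepath"
    "time"
)

// cacheEntry holds just enough to decide "probably unchanged"; st_dev is
// deliberately not stored at all.
type cacheEntry struct {
    Size  int64     `json:"size"`
    CTime time.Time `json:"ctime"`
    Inode uint64    `json:"inode"`
}

// changeCache is keyed by absolute path and lives outside the repository,
// so its format can change at any time.
type changeCache map[string]cacheEntry

func loadCache() (changeCache, error) {
    dir, err := os.UserCacheDir()
    if err != nil {
        return nil, err
    }
    data, err := os.ReadFile(filepath.Join(dir, "restic-change-cache.json"))
    if os.IsNotExist(err) {
        return changeCache{}, nil // first run, start with an empty cache
    } else if err != nil {
        return nil, err
    }
    var c changeCache
    return c, json.Unmarshal(data, &c)
}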

ArsenArsen commented 2 years ago

No, I didn't mean LVM because that presents a single virtual block device to the file system. Rather, it was about file systems that natively support device pooling like f2fs, btrfs, and zfs (and maybe others like bcachefs).

I can only speak for ZFS here when I say it doesn't expose the PARTUUID but does expose a different guid (see example in the previous comment)

At least for btrfs, the mounted block device identifies the block device id of the mount point, and thus also the PARTUUID.

I'd expect a native Linux filesystem to expose a filesystem-level UUID, though I don't have btrfs on any devices to test with.

To stay compatible with existing archives, the fields could just stay in place, and stored with a value of 0. It probably needs some flag/tag in the metadata so older restic clients won't be able to restore from archives with zero-valued id fields. If restic's archive design somewhat follows ideas of file system superblocks, it should already have some concept of incompat and compat archive flags.

There's a version field in the JSON objects. See: https://github.com/restic/restic/blob/90473ea9ffadf3178667a052d047ed717d0b9746/doc/design.rst. Newer versions of restic would probably just detect and use the inode field, slowly replacing it on write operations (or acquiring a write lock and doing it all at once)

MichaelEischer commented 2 years ago

A more proper solution would probably be not trying to re-assemble the hard link detection during restore; by posix definition, this must be a backup-time-only implementation. Instead of storing dev id and inode id, it should store the information that this object is a hard link to a previously encountered object (using a restic object ID).

restic can just use the path of the first occurrence of a hardlinked file as the canonical name. There's no such thing as an object ID to refer to a file.

Yep, another edge case would be partial restores. And in that case, you don't want to hardlink anyways: A user won't expect some unrelated file to be changed only because the restore encountered a hard link, it needs to be restored as a new instance of the file then. Logging a warning would be a nice feature then, I think. It enables the user to properly act upon this in the aftermath, restic cannot really decide what's the right thing to do.

For a partial restore, restore should only hardlink files within the set of restored files. I think that's what the current implementation does.

BTW: There's an edge case for file change detection which storing the inode in the archive tries to fix: It is possible to replace a file with another file that has the same ctime and same size but different contents.

The overall plan was to get rid of the device id or to make it deterministic. But keeping the inode won't be a problem. The JSON won't include the field if it is set to 0.

Luckily, we are still in the 0.x versions and I think, restic doesn't claim that the storage format is finalized.

Still, newer restic versions must be backwards compatible to the current format. The current format is already used too much to just break compatibility without a migration path.

Newer versions of restic would probably just detect and use the inode field, slowly replacing it on write operations (or acquiring a write lock and doing it all at once)

The backup command always creates the directory metadata from scratch (and deduplicates it if the JSON is already in the repo), that way backup just creates all new metadata in the new format. The only challenge then would be to handle both formats during restore.

[Edit]Updating the existing metadata won't work without changing the ids of all affected snapshots, which I'd rather want to avoid. To upgrade a repository from one version to another the simplest way would be if restic supports both the new and old format at the same time. That would also be a prerequisite if we wanted to provide the possibility for downgrading a repository. In that case the metadata would have to be rewritten first.[/Edit]

Wojtek242 commented 1 year ago

FWIW, I've noticed that the impact of backing up ZFS snapshots can be reduced by first cloning the snapshot with zfs clone -o mountpoint=<somewhere else than the original dataset> <snapshot> <clone_name>, backing up from the clone, and then deleting the clone. On my system at least, the clones will often, but not always, have the same device id. This also has the upside of having the same path for each backup.

haslersn commented 1 year ago

I have the same problem as stated in this issue. I'm backing up from btrfs snapshots.

BTW: @Wojtek242 having the same path for each backup can be achieved more easily by bind-mounting to a stable path, like @wpbrown suggested.

kakra commented 1 year ago

Maybe as a first step, restic could stop writing metadata changes unless inode times or file contents changed. It would still involve reading and comparing file contents, because with a different device ID the file may actually be different. But as we cannot restore "device ID", there's actually no point in writing metadata that doesn't differ from previously recorded data in the other fields.

Additionally, there could be a command line option "ignore device ID" for those special cases when you knowingly only back up read-only snapshots, so it would go by path only for file identity.

MichaelEischer commented 1 year ago

Not updating the deviceID will likely cause weird bugs when restoring hardlinks, which would be missing somewhat randomly depending on how the snapshot was created (and therefore which DeviceID is used for a file). I'd prefer a different approach: the deviceID is only used to correctly detect hardlinks. When restic knows that there are no hardlinks, then there's no need to store it. See https://github.com/restic/restic/pull/4006 for an implementation of that suggestion.
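
The gist of that approach, as a sketch (hypothetical code; the linked PR is the real change): clear the device id whenever a node cannot be part of a hardlink group, and let omitempty drop the zero value from the stored JSON so the tree blob stays identical across snapshots.

package sketch

// Node is a stand-in for the stored metadata; with omitempty a zero
// DeviceID simply disappears from the serialized JSON.
type Node struct {
    Name     string `json:"name"`
    Type     string `json:"type"`
    Links    uint64 `json:"links,omitempty"`
    Inode    uint64 `json:"inode,omitempty"`
    DeviceID uint64 `json:"device_id,omitempty"`
}

// stripDeviceID drops the device id whenever it cannot matter for hardlink
// detection: directories cannot be hardlinked, and files with a single link
// need no detection at all.
func stripDeviceID(n *Node) {
    if n.Type == "dir" || n.Links < 2 {
        n.DeviceID = 0
    }
}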

kakra commented 1 year ago

Not updating the deviceID will likely cause weird bugs when restoring hardlinks, which would be missing somewhat randomly depending on how the snapshot was created (and therefore which DeviceID is used for a file).

Oh, true. I didn't think of that.