psy0rz / zfs_autobackup

ZFS autobackup is used to periodically back up ZFS filesystems to other locations. Easy to use and very reliable.
https://github.com/psy0rz/zfs_autobackup
GNU General Public License v3.0

Incompatible snapshot: false positives #262

Closed lowjoel closed 2 months ago

lowjoel commented 2 months ago

I'm using zfs_autobackup to perform backups onto different sets of media locally, including one set of offsite backups.

/usr/local/bin/zfs-autobackup --no-snapshot --no-thinning --utc --decrypt --encrypt --snapshot-format auto_%Y%m%d-%H%M%SZ --buffer 256M --clear-refreservation --clear-mountpoint data backup{1,2,3...}

I've swapped my media, so I am now updating the old set of media with new snapshots, but that set still contains snapshots that have since been thinned on the source. An example with fictional dates:

| Date | Source | Disk 1 | Disk 2 |
| --- | --- | --- | --- |
| 20240630... | Yes | Yes | No |
| 20240701... | thinned on 20240703 | Yes | No |
| 20240702... | thinned on 20240704 | Yes | No |
| (swap disk) | | | |
| 20240703... | Yes | No | Yes |
| 20240704... | Yes | No | Yes |
| (swap again) | | | |
| 20240705... (today) | Yes | Backup fails: Incompatible snapshots | |

Essentially, on 20240705:

| Snapshot | Source | Disk 1 | Disk 2 |
| --- | --- | --- | --- |
| 20240630... | Yes | Yes | No |
| 20240701... | No | Yes | No |
| 20240702... | No | Yes | No |
| 20240703... | Yes | No | Yes |
| 20240704... | Yes | No | Yes |
| 20240705... | Yes | Backup fails | |

zfs-autobackup erroneously detects 20240701... and 20240702... as incompatible and refuses to back up the newer snapshot (20240705...).

I've dug through the code and found that the method for computing incompatible snapshots is at c52857@zfs_autobackup/ZfsDataset.py#L920. Only snapshots with written equal to 0 are considered compatible -- but per the definition of written in zfsprops(7):

The amount of referenced space written to this dataset since the specified snapshot. This is the space that is referenced by this dataset but was not referenced by the specified snapshot.

This means that any snapshot that has a child snapshot, even one from the same ancestry, would be considered incompatible. Interestingly, even the 20240702... snapshot on the backup medium has written != 0.
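For illustration, both forms of the property can be inspected directly with zfs get (the dataset and snapshot names below are hypothetical, following the snapshot format from the command above); per the check referenced above, only snapshots reporting written as 0 are treated as compatible:

```
# hypothetical names, using the auto_%Y%m%d-%H%M%SZ format from the command above
# written property of a target snapshot (the value the check above looks at)
$ zfs get -H -o value written backup1/data@auto_20240702-000000Z
1.21M
# written@<snapshot> on the dataset itself, as defined in the zfsprops(7) quote above
$ zfs get -H -o value written@auto_20240630-000000Z backup1/data
3.4M
```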

My current workaround is to manually delete the offending snapshots, and I'm OK with this. But I think the offsite backup strategy is something we should support. I'll trawl through the code to figure out whether I can fix it.
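For anyone hitting the same thing, the manual workaround is just a zfs destroy of the target-side snapshots that were already thinned on the source (hypothetical names, following the fictional dates above):

```
# hypothetical target dataset; 20240701... and 20240702... were thinned on
# the source, so destroy them on the target (disk 1)
$ zfs destroy backup1/data@auto_20240701-000000Z
$ zfs destroy backup1/data@auto_20240702-000000Z
# the newest remaining target snapshot (20240630...) still exists on the
# source, so the next incremental send can resume from it
```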

lowjoel commented 2 months ago

Oh, it's a limitation of zfs recv:

If an incremental stream is received, then the destination file system must already exist, and its most recent snapshot must match the incremental stream's source. For zvols, the destination device link is destroyed and recreated, which means the zvol cannot be accessed during the receive operation.
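So with the disks above, the incremental transfer would look roughly like this (hypothetical names; a sketch, not the exact commands the tool runs):

```
# incremental stream from the newest snapshot common to the source and disk 1
$ zfs send -i data/pool@auto_20240630-000000Z data/pool@auto_20240705-000000Z \
    | zfs recv backup1/data
# zfs recv rejects this because disk 1's most recent snapshot is
# @auto_20240702-000000Z, not the stream's source @auto_20240630-000000Z
```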

Closing 🙇

psy0rz commented 2 months ago

yep, works as designed :)

a solution would be to make the thinning on the source side happen a bit later, if you have the space. Or use --destroy-incompatible to automatically destroy snapshots that are in the way.
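For example, applied to the invocation from the report above, that would be (just the original command plus the flag):

```
# the original invocation with --destroy-incompatible added, so snapshots on
# the target that block the incremental receive are destroyed automatically
/usr/local/bin/zfs-autobackup --no-snapshot --no-thinning --utc --decrypt --encrypt \
    --snapshot-format auto_%Y%m%d-%H%M%SZ --buffer 256M \
    --clear-refreservation --clear-mountpoint --destroy-incompatible \
    data backup{1,2,3...}
```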