oetiker / znapzend

zfs backup with remote capabilities and mbuffer integration.
www.znapzend.org
GNU General Public License v3.0
604 stars, 136 forks

Feature: Add ability of DST to store snapshots as files #617

Closed (eharris closed 3 months ago)

eharris commented 8 months ago

I was looking at znapzend to back up some ZFS volumes, and it initially looked like a pretty good solution until I realized that there doesn't appear to be a way to make it create a DST that just stores the zfs send stream as a file (either locally or remotely), rather than expecting the destination to have a ZFS pool to receive it.

In my use case, the backup destination I'd like to use does not have ZFS support, so I'd like a destination type that just dumps each individual zfs send stream as a file into a destination directory, but otherwise works the same (including thinning). These snapshot files could then be applied to an offsite and offline pool in the case of a catastrophic failure of the source.
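To make the request concrete, here is a minimal Python sketch of what "dump each zfs send as a file" could look like outside of znapzend. This is not something znapzend does today; the function names, the file naming scheme, and the `.partial` suffix are all hypothetical, and only `zfs send` and `zfs send -i` are real commands.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def stream_filename(snapshot: str, base: Optional[str] = None) -> str:
    """Hypothetical naming scheme that records the base and target snapshot.

    '/' is replaced with '_' so the name is a valid file name. This is lossy
    if dataset names themselves contain '_'.
    """
    def safe(name: str) -> str:
        return name.replace("/", "_")

    if base is None:
        return f"{safe(snapshot)}.full.zfs"
    return f"{safe(base)}..{safe(snapshot)}.incr.zfs"


def send_command(snapshot: str, base: Optional[str] = None) -> List[str]:
    """Build a full send, or an incremental send from `base` to `snapshot`."""
    if base is None:
        return ["zfs", "send", snapshot]
    return ["zfs", "send", "-i", base, snapshot]


def dump_snapshot(snapshot: str, dest_dir: Path, base: Optional[str] = None) -> Path:
    """Write the send stream to a file instead of piping it into zfs receive."""
    target = dest_dir / stream_filename(snapshot, base)
    partial = target.with_name(target.name + ".partial")
    with open(partial, "wb") as out:
        # check=True raises if zfs send fails, so a broken stream is never renamed.
        subprocess.run(send_command(snapshot, base), stdout=out, check=True)
    partial.rename(target)
    return target
```

For example, `dump_snapshot("tank/data@b", Path("/mnt/backup"), base="tank/data@a")` would write the increment between the two snapshots into `/mnt/backup`, assuming both snapshots exist on the source.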

Another way to think of it is as a destination that is rarely or never online (due to either security or operational concerns), but where you don't want to risk losing all the data that has changed since the last time it was online (or the last time it was updated, in the never-online case) if the source pool fails catastrophically. These files could then be applied to a ZFS instance by a simple shell script that iterates over all the snapshot files, checks whether the corresponding snapshot already exists on the destination, and applies the file if not.
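The restore loop described above could be sketched roughly as follows (in Python rather than shell, to keep it testable). It assumes the caller already has the stream files in the right order, paired with the snapshot name each one produces; that pairing and all function names are hypothetical. The `zfs list` and `zfs receive` invocations are standard, but whether a plain `zfs receive` is enough depends on the state of the destination dataset.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List, Set, Tuple

# (snapshot name after '@', path of the stream file that creates it)
Stream = Tuple[str, Path]


def existing_snapshots(dataset: str) -> Set[str]:
    """Names (the part after '@') of the snapshots the dataset already has."""
    result = subprocess.run(
        ["zfs", "list", "-H", "-t", "snapshot", "-o", "name", "-d", "1", dataset],
        capture_output=True, text=True, check=True,
    )
    return {line.split("@", 1)[1] for line in result.stdout.splitlines() if "@" in line}


def pending_streams(ordered: Iterable[Stream], present: Set[str]) -> List[Stream]:
    """Keep only the streams whose snapshot is not on the destination yet."""
    return [(snap, path) for snap, path in ordered if snap not in present]


def apply_streams(dataset: str, ordered: Iterable[Stream]) -> None:
    """Feed each missing stream file into zfs receive, in the given order."""
    present = existing_snapshots(dataset)
    for _snap, path in pending_streams(ordered, present):
        with open(path, "rb") as stream:
            subprocess.run(["zfs", "receive", dataset], stdin=stream, check=True)
```

Note that this only works if every increment between the newest snapshot on the destination and the newest file is still available.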

Obviously, there would also need to be a way to specify the snapshot to use as the starting point for generating incremental backups, and a way to record which updates have already been applied so they can be pruned.

stale[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

eharris commented 6 months ago

bump to keep alive

jimklimov commented 5 months ago

Just in case, FYI: since the early days of ZFS, sending snapshots into files has been more of an experimental approach (to help with demos and simulations) than a "first-class citizen" one. Snapshot streams stored in files are not well protected against bit-rot (or at least not as well as "real" pools are, with block checksumming and device or copies redundancy where available). Maybe wrapping them in ZIP files (with CRC), or equivalent containers with some bit-rot protection, can address that.
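As a rough illustration of the ZIP idea, the following Python sketch wraps a stream file in an archive and later checks it against the stored CRC-32 using the standard `zipfile` module. The helper names are hypothetical, and storing without compression is an assumption (the stream may already be compressed). A CRC only detects damage; it cannot repair it, so this is weaker than the redundancy a real pool offers.

```python
import zipfile
from pathlib import Path
from typing import Optional


def wrap_stream(stream_path: Path) -> Path:
    """Store a send-stream file in a ZIP archive, which records a CRC-32 for it."""
    archive = stream_path.with_name(stream_path.name + ".zip")
    # ZIP_STORED: no recompression; allowZip64 permits members larger than 4 GiB.
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_STORED,
                         allowZip64=True) as zf:
        zf.write(stream_path, arcname=stream_path.name)
    return archive


def first_corrupt_member(archive: Path) -> Optional[str]:
    """Return the name of the first member failing its CRC check, or None."""
    with zipfile.ZipFile(archive) as zf:
        return zf.testzip()
```

Running `first_corrupt_member()` before feeding a stream into `zfs receive` would at least catch silent corruption of the file on the non-ZFS storage.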

I guess it can also be problematic to identify which snapshot goes after which, other than trusting e.g. alphanumeric sorting of filenames (rather than ZFS metadata and its core code doing the magic to mix and match).
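One way to avoid relying on filename sorting would be to record, for each stream file, which snapshot it starts from and which it produces, and then rebuild the order from those links. The sketch below assumes such (base, target) pairs are available from a hypothetical manifest or naming scheme; nothing like this exists in znapzend.

```python
from typing import Dict, List, Optional, Tuple

# (base snapshot, target snapshot); base is None for a full stream.
Link = Tuple[Optional[str], str]


def order_chain(links: List[Link]) -> List[Link]:
    """Order streams so that each one starts where the previous one ended."""
    by_base: Dict[Optional[str], Link] = {}
    for link in links:
        if link[0] in by_base:
            raise ValueError(f"two streams share the base {link[0]!r}")
        by_base[link[0]] = link
    if None not in by_base:
        raise ValueError("no full stream to start the chain from")

    ordered: List[Link] = []
    current: Optional[str] = None
    while current in by_base:
        link = by_base.pop(current)  # pop so a malformed cycle cannot loop forever
        ordered.append(link)
        current = link[1]

    if by_base:
        orphans = sorted(target for _, target in by_base.values())
        raise ValueError(f"streams not reachable from the full stream: {orphans}")
    return ordered
```

The error for unreachable streams also shows why a single missing or damaged increment makes every later file in the chain unusable.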

Likewise, snapshot removal via files is not really a thing, so there is no pruning of such backups. You really would have to store each and every increment as immutable, in case you'd want to re-apply them to a pool, and maybe prune intermediate snapshots on the receiver after the restoration (or during it, if short on space). It can however benefit from the relatively recent feature of tracking which snapshot was the last one sent to a certain destination, so such snapshots are exempt from the default automatic "cleanup" on the source pool (so there is a starting point for future syncs). This helps with not-always-online remote destinations, rotating several mechanically-plugged backup disks, etc.

I suppose znapzend could be extended with a way to store snapshot contents in a local or remote file (maybe even piped into zip or some such). PRs would be welcome, but it does not seem likely that the maintainers would implement that use-case themselves :)

All that said, *.zfs images of root filesystems are still quite a thing for quick OS and zone installations. Likewise, in the early days of ZFS I did use files with chains of snapshots, both for backups on other filesystems and particularly to rsync those files over flaky links to remote customers etc., to be applied on the other side. Possibly holds, bookmarks and resume tokens now address the latter use-case, supposedly allowing large initial or incremental snapshots to be transferred across several connection retries.
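For reference, the resume-token workflow mentioned above can be sketched as a few command builders. This assumes an OpenZFS version with resumable send/receive (`zfs receive -s`, the `receive_resume_token` property, and `zfs send -t`); the Python helper names are hypothetical, and the sketch requires a ZFS pool on the receiving side, so it does not by itself solve the file-only destination case.

```python
from typing import List, Optional


def resumable_receive_command(dataset: str) -> List[str]:
    """Receive while keeping partial state if the transfer is interrupted."""
    return ["zfs", "receive", "-s", dataset]


def resume_token_query(dataset: str) -> List[str]:
    """Ask the receiving side for the token of an interrupted receive."""
    return ["zfs", "get", "-H", "-o", "value", "receive_resume_token", dataset]


def parse_resume_token(output: str) -> Optional[str]:
    """zfs get prints '-' when the property is unset, i.e. nothing to resume."""
    token = output.strip()
    return None if token in ("", "-") else token


def resume_send_command(token: str) -> List[str]:
    """Restart the send from the point recorded in the token."""
    return ["zfs", "send", "-t", token]
```

After a dropped connection, the sender would run the query on the destination, and if `parse_resume_token()` returns a token, pipe `zfs send -t <token>` into another `zfs receive -s` instead of starting over.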

stale[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.