Closed DvdGiessen closed 11 months ago
Yeah, I don't think this is an actual problem in practical use cases.
Yes, you are right that a dataset created in the time between selecting and snapshotting will be ignored. But I think this is expected and not a real problem. Also, a dataset destroyed between selecting and snapshotting will make zfs-autobackup exit with a fatal error (as it should).
If you have a special use case where the time between selecting and snapshotting is huge and you create lots of new datasets all the time, then yes, it might be a problem. However, in that case I think it's better for the admin to do the actual snapshotting and let zfs-autobackup only do the syncing.
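As a rough illustration of the "admin does the snapshotting" route, here is a minimal sketch that only builds the command line for one recursive snapshot. The function name, the dataset `tank/data`, and the snapshot name format are assumptions made for this example; whether zfs-autobackup then picks up and syncs snapshots named this way depends on how it is configured, which is not shown here.

```python
from datetime import datetime, timezone


def recursive_snapshot_command(root_dataset, label, now=None):
    """Build an argv list for a single `zfs snapshot -r` call on root_dataset.

    Hypothetical helper: the snapshot name format is an assumption for
    illustration, not the format zfs-autobackup itself uses.
    """
    now = now or datetime.now(timezone.utc)
    snapshot_name = f"{label}-{now.strftime('%Y%m%d%H%M%S')}"
    # -r snapshots the dataset and all of its descendants in one operation.
    return ["zfs", "snapshot", "-r", f"{root_dataset}@{snapshot_name}"]


if __name__ == "__main__":
    # Only print the command; running it requires ZFS and privileges.
    print(" ".join(recursive_snapshot_command("tank/data", "manual")))
```

The argv list could be passed to `subprocess.run(..., check=True)` from a cron job or similar, before the syncing step runs.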
Your proposal for the improvement is interesting, but the downsides are significant as you've mentioned.
I think if you need maximum real-time performance, also have a look at https://zrepl.github.io/. My guess is they handle these super-heavy use cases better, but at the cost of a lot more complexity (they use libzfs instead of simple zfs commands, and they monitor for zfs events, I think).
Hmm, it seems zrepl isn't even using atomic snapshots? https://github.com/zrepl/zrepl/issues/634
Probably because of the dynamic nature of that project? (Everything happens async.)
Indeed:
zrepl itself currently does not even guarantee atomicity of snapshots within the same pool as it doesn't use the multi-snapshot zfs snapshot syntax.
https://github.com/zrepl/zrepl/discussions/632#discussioncomment-3726490
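For reference, the "multi-snapshot syntax" mentioned in the quote means passing several `dataset@snapname` arguments to a single `zfs snapshot` invocation. Below is a minimal sketch that groups datasets by pool and builds one command per pool, since the quote only talks about atomicity within the same pool. The function name and the dataset names are made up for illustration; this is not how zrepl or zfs-autobackup is implemented.

```python
from collections import defaultdict


def snapshot_commands_per_pool(datasets, snapshot_name):
    """Return one `zfs snapshot` argv list per pool (hypothetical helper).

    The pool is taken to be the part of the dataset name before the first "/".
    """
    by_pool = defaultdict(list)
    for dataset in datasets:
        pool = dataset.split("/", 1)[0]
        by_pool[pool].append(f"{dataset}@{snapshot_name}")
    # Sort by pool name so the output order is deterministic.
    return [["zfs", "snapshot", *targets] for _, targets in sorted(by_pool.items())]


if __name__ == "__main__":
    for argv in snapshot_commands_per_pool(["tank/a", "backup/x", "tank/b"], "s1"):
        print(" ".join(argv))
```

Datasets in different pools still end up in separate commands, so they are not snapshotted at the same moment.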
Snapshots are not as atomic as they could be because selecting and snapshotting are not a single atomic operation. Thus, if new datasets are being created while zfs-autobackup is running, we could end up with a snapshot that contains data that references a different dataset that wasn't snapshotted because it did not yet exist at selection time.
The zfs snapshot -r command guarantees that a recursive snapshot is made atomically; however, we're not using the -r option but instead every dataset is specified separately.
How it works right now has a good reason: we do not necessarily always want a full recursive snapshot; for example because of the autobackup: property of children, or because by default (unless --allow-empty is given) we also skip snapshotting datasets without changes.
Perhaps a possible improvement could be to do a 3-step process?
[...] autobackup: property.
[...] zfs snapshot -r which guarantees atomicity).
Downside would be that this temporarily creates more snapshots than needed, with all kinds of overhead and side-effects. It would also increase code complexity. And perhaps some other complications I haven't thought of yet.
EDIT: A use case where this might be relevant is software that automatically manages, and thus creates, datasets. For example, Docker uses ZFS datasets to store image layers; thus a busy server with ongoing image builds, container deployments, etc. could conceivably end up with an incomplete snapshot.
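To make the race concrete, here is a small sketch that compares the dataset list seen at selection time with the list that exists at snapshot time. Both lists are assumed to come from something like `zfs list -H -o name`; the function names and the Docker-style dataset names are hypothetical and only serve the example.

```python
def missed_datasets(selected, existing_at_snapshot):
    """Datasets that exist at snapshot time but were not selected earlier."""
    selected_set = set(selected)
    return sorted(d for d in existing_at_snapshot if d not in selected_set)


def vanished_datasets(selected, existing_at_snapshot):
    """Datasets that were selected but no longer exist at snapshot time."""
    existing_set = set(existing_at_snapshot)
    return sorted(d for d in selected if d not in existing_set)


if __name__ == "__main__":
    selected = ["tank/docker", "tank/docker/layer1", "tank/docker/layer2"]
    # Between selection and snapshotting, layer3 appeared and layer1 was destroyed.
    existing = ["tank/docker", "tank/docker/layer2", "tank/docker/layer3"]
    print("missed:", missed_datasets(selected, existing))
    print("vanished:", vanished_datasets(selected, existing))
```

A dataset in the first list is the case described above (it would be absent from the snapshot set); one in the second list is the case where the snapshot command would fail.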
Note: Probably not a high priority because it's not a very common scenario. For me it's purely theoretical; haven't actually run into problems with this.