Thinking about it a bit more over the past week of digging in this code, a safe path forward seems to be:
* detect that a configured dataset (one of the `backupSet` array entries) is a root of a "straightforward" tree to back up completely (the plan's "recursive" option is "on", and no descendant has an "enabled=off" setting... and maybe also no descendant has a "source=local" backup plan of its own)
* also consider `--autoCreation=on` behavior if some target datasets are missing (BTW... introduce a backup plan option for that, if needed? => #463 says it was done... later; using a `zfs send -R` would create them too - but beware that a `zfs send -R ... | zfs recv -F` would delete destination children currently missing on the source); ignore this point if the trees are equivalent
* try a single `zfs send -R | mbuffer | zfs recv` and let ZFS figure out what to accept and what to ignore; different vendors' and releases' builds of ZFS are differently capable in this regard (e.g. some might send the whole stream to be ignored in vain, while others might perhaps agree to skip it and begin sending another one more quickly)
* if the `zfs send -R` attempt failed, fall back to what we do now, sending each individual dataset and each of their enabled descendants, one by one

Hopefully this approach is backwards-compatible for complex configurations, while simpler ones benefit from the faster single-command-per-tree send/receive; in particular I hope it would let the `zfs recv` half of the equation not start hundreds of times, each with its own evaluation of the pools and such, and so would cut days-long runs of small but numerous backups into something a lot shorter.
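If that flow holds, a minimal shell sketch of it could look like the following (dataset and snapshot names are placeholders; znapzend itself would implement this in Perl):

```sh
# Hypothetical fast path, assuming a "straightforward" subtree was
# detected: one recursive incremental stream for the whole tree, with
# a fallback to today's per-dataset loop. All names are illustrative.
SRC=rpool/SHARED/var/mail
DST=naspool/snapshots/rpool/SHARED/var/mail
OLD=znapzend-auto-old                  # placeholder snapshot names
NEW=znapzend-auto-new

if zfs send -R -I "$SRC@$OLD" "$SRC@$NEW" \
     | mbuffer -q -s 256k -m 1G \
     | zfs recv -u "$DST"
then
  echo "whole tree replicated in one stream"
else
  echo "recursive send failed, falling back to per-dataset sends" >&2
  # ... the existing one-by-one send/receive logic would run here ...
fi
```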
I like the idea of adding a 'cut through' path for 'the normal case'. I am not sure about the opportunistic receive, though ... it is a backup, so I would like znapzend to complain if things do not work out.
Looking forward to testing this, as I currently have a number of filesystems, $HOME for example, where each user has a filesystem under the parent directory; pool/home might have 500 or 600 (and growing) child filesystems in it. There's no interest in allowing children any distinct settings, so even a simple option that just ignored any child settings and forced the -R attempt (or failed) would be perfect for this use case. Currently we have to keep backing off our snapshot/replication frequency as the number of users/directories grows. But we very much want to keep the individual user filesystems/snapshots, as occasionally we have to go in and drop a person's snaps if they've done something that requires purging $HOME, and we want that activity confined to the single affected user.
As a stressing reminder to self, and just as a data-point: there is a use-case for dataset trees with some branches pruned via `zfs set org.znapzend:enabled=off backupSet-with-policy/excludedChild`, and I've just checked that this is honored if I only define this one custom attribute in the child dataset:
```
### rpool/SHARED/var/mail is a backupSet with policy
:; zfs create rpool/SHARED/var/mail/test
:; zfs set org.znapzend:enabled=off rpool/SHARED/var/mail/test
:; znapzend --runonce=rpool/SHARED/var/mail
```
On the destination I've got new snaps of the parent dataset, but no mention of the excluded child. (In the debug log, the source dataset was recursively snapshotted and then, for the excluded children, that snapname was removed, as expected.)
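Reconstructed from that debug log (this describes the observable behavior, not a quote of the code), the sequence amounts to:

```sh
# Recursive snapshot of the whole backupSet, then removal of that
# snapshot on the excluded "enabled=off" child before sending.
# Snapshot name taken from the later log excerpt for illustration.
zfs snapshot -r rpool/SHARED/var/mail@znapzend-auto-2020-09-13T10:22:45Z
zfs destroy rpool/SHARED/var/mail/test@znapzend-auto-2020-09-13T10:22:45Z
```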
This is something that operations with recursive send might have to take into account. For a first shot maybe bluntly - only do `zfs send -R` if there are no excluded children under some branch (or no children with a different/incompatible explicitly defined retention policy) - which may be not the backupSet root but a deeper point in its sub-tree; and handle the more complicated cases with the same approach as now, dataset by dataset. A possible shape of that check is sketched below.
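In shell terms, the "no excluded children" part of that precondition could be checked roughly like this (the property name is the one znapzend really uses; the branch logic is just an illustration, and it does not cover the incompatible-retention-policy case):

```sh
# The subtree under a candidate root qualifies for zfs send -R only if
# no descendant locally overrides org.znapzend:enabled to "off".
ROOT=rpool/SHARED/var/mail
if zfs get -r -s local -H -o value org.znapzend:enabled "$ROOT" \
     | grep -qx off
then
  echo "excluded children found: handle this subtree dataset by dataset"
else
  echo "uniform subtree: a single zfs send -R could cover it"
fi
```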
UPDATE: I checked what happens if I make a further child, `.../test/sub`, which is enabled for replication. Currently this fails, since there is no parent dataset on the destination unless the admin makes one manually, but otherwise it tries:
```
:; zfs create rpool/SHARED/var/mail/test/sub
:; zfs set org.znapzend:enabled=on rpool/SHARED/var/mail/test/sub
:; znapzend --runonce=rpool/SHARED/var/mail
...
# zfs send -I 'rpool/SHARED/var/mail@znapzend-auto-2020-09-13T10:13:25Z' 'rpool/SHARED/var/mail@znapzend-auto-2020-09-13T10:22:45Z'|/opt/csw/bin/amd64/mbuffer -q -s 256k -W 600 -m 1G|zfs recv -u -F naspool/snapshots/rpool/SHARED/var/mail
# zfs list -H -o name -t snapshot rpool/SHARED/var/mail@znapzend-auto-2020-09-13T10:22:45Z
# zfs set org.znapzend:dst_0=naspool/snapshots/rpool/SHARED/var/mail rpool/SHARED/var/mail@znapzend-auto-2020-09-13T10:22:45Z
# zfs set org.znapzend:dst_0_synced=1 rpool/SHARED/var/mail@znapzend-auto-2020-09-13T10:22:45Z
[Sun Sep 13 14:23:09 2020] [debug] sending snapshots from rpool/SHARED/var/mail/test to naspool/snapshots/rpool/SHARED/var/mail/test
[Sun Sep 13 14:23:09 2020] [debug] Are we sending "--since"? since=="0", skipIntermediates=="0", forbidDestRollback=="0", justCreated=="false"
# zfs list -H -o name -t snapshot -s creation -d 1 rpool/SHARED/var/mail/test
# zfs list -H -o name -t snapshot -s creation -d 1 naspool/snapshots/rpool/SHARED/var/mail/test
cannot open 'naspool/snapshots/rpool/SHARED/var/mail/test': dataset does not exist
[Sun Sep 13 14:23:09 2020] [debug] sending snapshots from rpool/SHARED/var/mail/test/sub to naspool/snapshots/rpool/SHARED/var/mail/test/sub
[Sun Sep 13 14:23:09 2020] [debug] Are we sending "--since"? since=="0", skipIntermediates=="0", forbidDestRollback=="0", justCreated=="false"
# zfs list -H -o name -t snapshot -s creation -d 1 rpool/SHARED/var/mail/test/sub
# zfs list -H -o name -t snapshot -s creation -d 1 naspool/snapshots/rpool/SHARED/var/mail/test/sub
cannot open 'naspool/snapshots/rpool/SHARED/var/mail/test/sub': dataset does not exist
# zfs send 'rpool/SHARED/var/mail/test/sub@znapzend-auto-2020-09-13T10:22:45Z'|/opt/csw/bin/amd64/mbuffer -q -s 256k -W 600 -m 1G|zfs recv -u -F naspool/snapshots/rpool/SHARED/var/mail/test/sub
cannot create 'naspool/snapshots/rpool/SHARED/var/mail/test/sub@znapzend-auto-2020-09-13T10:22:45Z': parent does not exist
mbuffer: error: outputThread: error writing to <stdout> at offset 0x0: Broken pipe
mbuffer: warning: error during output to <stdout>: Broken pipe
[Sun Sep 13 14:24:20 2020] [warn] ERROR: cannot send snapshots to naspool/snapshots/rpool/SHARED/var/mail/test/sub
[Sun Sep 13 14:24:20 2020] [warn] ERROR: suspending cleanup source dataset rpool/SHARED/var/mail because 1 send task(s) failed:
[Sun Sep 13 14:24:20 2020] [warn] +--> ERROR: cannot send snapshots to naspool/snapshots/rpool/SHARED/var/mail/test/sub
[Sun Sep 13 14:24:20 2020] [info] done with backupset rpool/SHARED/var/mail in 94 seconds
[Sun Sep 13 14:24:20 2020] [debug] send/receive worker for rpool/SHARED/var/mail done (9435)
```
After I created an empty `naspool/snapshots/rpool/SHARED/var/mail/test` manually, the next znapzend run succeeded in sending over the `sub`.
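For the record, the manual step that unblocked it is just the following (presumably what an auto-creation of intermediate parents could do for us; `-p` is the stock `zfs create` option for creating missing ancestors in one go):

```sh
# Pre-create the missing intermediate dataset on the destination so the
# grandchild's zfs recv has a parent to attach to.
zfs create naspool/snapshots/rpool/SHARED/var/mail/test
```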
@griznog : Your use-case with individual homedir datasets makes sense (also for quotas, reservations, access to snapshots via CIFS Shadow Volumes or TimeMachine, various other zfs settings).
I believe the goal you are after, to have the growing populace replicated automatically, should already be handled by the `--autoCreation` parameter passed to the daemon in your (customized) service definition.
@jimklimov we do use `--autoCreation` and it works great; the issue is that replication of home requires 1000+ ssh connections, each of which takes time. From watching logs it's at least 5-10 seconds per filesystem. We get some speedup by using a ControlMaster setup in ssh (roughly as sketched below), but still, as the number of child filesystems goes up, we have to adjust the replication interval to accommodate doing this serially.
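For reference, the connection-reuse setup looks roughly like this (host name and timeout are illustrative; the same options can live in ~/.ssh/config instead):

```sh
# Keep one master ssh connection open and multiplex the many per-child
# zfs send pipelines over it instead of a fresh handshake each time.
ssh -o ControlMaster=auto \
    -o ControlPath='~/.ssh/cm-%r@%h:%p' \
    -o ControlPersist=10m \
    backuphost true   # first call opens the master; later calls reuse it
```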
@griznog : in my experience, there is a large lag of `zfs recv` looking through the target pool's snapshot guids before it decides how to receive each new snapshot. That overhead is relatively negligible for large snapshots, but an annoying one for empty ones still taking many seconds to receive 324 bytes or so. And alas, it happens for manual `zfs send -R` activities too, though maybe with a bit less overhead than for many individual sends, maybe not... I'm not sure anymore :-\
I wonder if implementing an opposite option, with configurably-parallel sends of child datasets of one configured backupSet schedule (extending the original mode where we `zfs send` stuff one by one anyway, and only parallelize different schedules), could be a suitable approach - to do more bulk data transfers in any particular second and spend less time doing ONLY those guid lookups or whatever the receiver waits on initially...
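Very roughly, in shell terms (znapzend itself is Perl; the job count, host, and snapshot name below are placeholders for illustration, not real znapzend options):

```sh
# Hypothetical configurably-parallel child sends: run N pipelines at
# once so receiver-side guid lookups overlap with actual data transfer.
JOBS=4
zfs list -H -o name -r -t filesystem pool/home | tail -n +2 \
  | xargs -P "$JOBS" -I{} sh -c \
      'zfs send "{}@znapzend-auto-snap" | ssh backuphost zfs recv -u "naspool/{}"'
```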
We aren't very particular about how we get a speedup, just hoping that we can get something faster 😄 A quick test of a manual `zfs send -R` on our $HOME shows it is indeed not a silver bullet for this problem: it takes several minutes before it even starts sending any data.
Looking at code around https://github.com/oetiker/znapzend/blob/66baf324d60526ab565844e11c363e5b39cb4c19/lib/ZnapZend/ZFS.pm#L339 (and at processes actually running on my system), I think znapzend only sends an incremental or full update for the `lastSnapshot` of one dataset. So for a fairly zfs-treeish system with hundreds of datasets, it makes (eventually, not at once) hundreds of `zfs send|mbuffer|(ssh|)zfs recv` forks, each with its latency hit of `zfs` processes talking to the kernel for several seconds or minutes before beginning actual I/O (related to #104 - a similar use-case).

If we have a `recursive = on` setting for snapshot creation and cleaning, why not use it for sending? At least, if it does succeed, we are done more quickly. If it fails, we can re-evaluate which datasets and snapshots exist on both sides, and catch up one-by-one to get at the specific errors.

The #104 issue discussion goes further into the land of wanna-woulda, with different retention policies and thus sending-schedules (as well as, in my earlier posted wishlist, exclusion support - so e.g. I set up the pool retention rules once, but point-exclude the swap/dump/... datasets). Both of these directions call for a more sophisticated schedule calculator than the current all-or-nothing, but it is doable. Those features would be great, and with the scheduler mentioned above the computer could calculate the work scope needed (with e.g. differently-scheduled `zfs send -R` calls for different sub-trees with the same settings under some point) rather than admins doing such grunt work; having recursive actions and different policies would then not contradict each other.

Note that `zfs send -R` comes with caveats - such as that receiving with `zfs recv -F` would remove snapshots and datasets absent on the source, so this flag should probably not be used as the default and only reserved for certain cases (e.g. someone modified the destination dataset, or it was listed with `atime=on` or some such -- do a single old-style sync of each such dataset with rollback, as we do now, to make incremental sends work again, and then do the massive replication send without rollbacks).
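To make that caveat concrete (dataset and snapshot names are illustrative):

```sh
# Without -F, an incremental recursive receive never rolls back or
# destroys anything on the destination; it just fails if it diverged.
zfs send -R -I pool/home@monday pool/home@tuesday | zfs recv -u backup/home

# With -F, the same stream also destroys destination snapshots and
# child datasets that no longer exist on the source - handy for strict
# mirroring, dangerous as a blanket default for backups.
zfs send -R -I pool/home@monday pool/home@tuesday | zfs recv -u -F backup/home
```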