I am debugging my setup with partially-disabled trees of datasets (home dir with setup of a build agent should be backed up, but scratch working areas and caches should not).
Currently znapzend logs every dataset, including disabled ones, as "sending snapshots from...", which is a bit misleading.
With this PR I externalize listDisabledSourceDescendants() so that it applies not only to createSnapshot() but also to e.g. sendRecvCleanup(). After some back and forth, the array of known-disabled local (source) descendant dataset names is now queried from ZFS once and attached to $backupSet as @{$backupSet->{srcDisabledDescendants}}, so it can be reused quickly in different places.
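To illustrate the "query once, attach to the backup set, reuse" idea, here is a minimal sketch in Python (znapzend itself is Perl, so this is not the PR's code). The helper names, the `src` key, and the exact `zfs get` invocation are assumptions made for the example; only `srcDisabledDescendants` and the `org.znapzend:enabled` property come from the text above.

```python
import subprocess

ENABLED_PROP = "org.znapzend:enabled"


def parse_disabled(zfs_get_output):
    """Return dataset names whose reported value is 'off'.

    Expects tab-separated "name<TAB>value" lines, as printed by
    `zfs get -H -o name,value`.
    """
    disabled = []
    for line in zfs_get_output.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition("\t")
        if value.strip() == "off":
            disabled.append(name)
    return disabled


def query_zfs(dataset):
    # Assumed command line: list only locally-set values, recursively.
    result = subprocess.run(
        ["zfs", "get", "-H", "-r", "-s", "local",
         "-o", "name,value", ENABLED_PROP, dataset],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def disabled_descendants(backup_set, query=query_zfs):
    """Query ZFS at most once per backup set and cache the answer on it."""
    if "srcDisabledDescendants" not in backup_set:
        src = backup_set["src"]  # hypothetical key holding the source dataset
        names = parse_disabled(query(src))
        # Keep descendants only; the backup set root itself is not one.
        backup_set["srcDisabledDescendants"] = [n for n in names if n != src]
    return backup_set["srcDisabledDescendants"]
```

Any later caller, whether snapshot creation or cleanup, can then ask `disabled_descendants(backup_set)` without triggering another `zfs` call.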
With this change, the log reads as follows:
[2024-01-15 10:08:05.52188] [2405200] [info] refreshing backup plans for dataset "rpool/home/abuild" ...
[2024-01-15 10:08:05.87011] [2405200] [info] checking for explicitly excluded ZFS dependent datasets under 'rpool/home/abuild'
[2024-01-15 10:08:06.29950] [2405200] [info] Found disabled sub: rpool/home/abuild/.ccache
[2024-01-15 10:08:06.29973] [2405200] [info] Found disabled sub: rpool/home/abuild/jenkins-nut
[2024-01-15 10:08:06.29977] [2405200] [info] Found disabled sub: rpool/home/abuild/jenkins-nut-altroots
[2024-01-15 10:08:06.29981] [2405200] [info] Found disabled sub: rpool/home/abuild/jenkins-nut-doc
[2024-01-15 10:08:06.30024] [2405200] [info] found a valid backup plan for rpool/home/abuild...
...
[2024-01-15 10:13:25.93202] [2410053] [debug] sending snapshots from rpool/home/abuild to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild
[2024-01-15 10:13:42.82858] [2410053] [debug] sending snapshots from rpool/home/abuild/.ccache to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/.ccache: not enabled, should be skipped
...
Barring any bugs, this PR should not change the znapzend end-user behavior beyond such cosmetics.
Looking at some further work ahead, I see a couple of issues with the existing logic (log excerpts below):
Currently each dataset not slated for retention must have an explicit org.znapzend:enabled=off, and this is not inherited by its descendants. (ZFS does report the property on the descendants with source "inherited", but znapzend only honors values whose source is "local", so it ignores them.) This is by design so far, but it is cumbersome for large setups where I'd want a whole tree pruned, including whatever datasets appear there over time. So I propose to add handling for datasets that can optionally declare both enabled='on|off' and recursion=on for this purpose. Currently there is special handling only for datasets that declare exactly one property, and that property is enabled.
[ ] @oetiker : WDYT? :) UPDATE: A proposal about this is tackled in #625
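One possible reading of the proposed semantics, as a minimal Python sketch rather than znapzend's actual (Perl) logic: a dataset's own local `enabled` value wins; otherwise the nearest ancestor that declares both `enabled` and `recursion=on` decides. The function name, the dictionary layout, and the short property names are assumptions for illustration; the final design is whatever #625 settles on.

```python
def effective_enabled(dataset, local_props):
    """Decide whether `dataset` should be backed up.

    `local_props` maps dataset names to their locally-set properties,
    using the short names 'enabled' and 'recursion' (hypothetical layout).
    """
    own = local_props.get(dataset, {})
    if "enabled" in own:
        # An explicit local setting always wins for the dataset itself.
        return own["enabled"] == "on"

    parent = dataset
    while "/" in parent:
        parent = parent.rsplit("/", 1)[0]
        props = local_props.get(parent, {})
        # Only ancestors that opted in to recursion affect their descendants.
        if "enabled" in props and props.get("recursion") == "on":
            return props["enabled"] == "on"

    # No ancestor claims this dataset; assumed default is "enabled".
    return True
```

With `.ccache` declared as `enabled=off, recursion=on`, every dataset created under it later would be skipped, while a dataset that only sets `enabled=off` keeps today's behaviour: its children are still governed by the backup set root.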
The logic for these disablements is that a recursive snapshot of the backupSet (the one with a full znapzendzetup schedule) is made atomically, the data is sent, and then the snapshots of disabled datasets get removed locally and remotely.
I believe that in non-oracleMode this is handled as one recursive send, hence the trickery. With oracleMode in place, each dataset is processed one by one, so disabled ones might be skipped cleanly, especially now that we have a way to know which they are.
[ ] Maybe for backupSets with recursion and some not-enabled descendants, we should fall back to oracleMode even if it is not asked for in config (so that exclusions are quickly skipped from sending)? @oetiker : WDYT? :) UPDATE: A proposal about this quick skip is tackled in #626
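The fallback and quick skip suggested above could look roughly like this. It is a Python sketch under stated assumptions, not znapzend's implementation: both function names and their parameters are hypothetical, and the real behaviour is left to #626.

```python
def should_send_per_dataset(oracle_mode, recursive, disabled):
    """Fall back to dataset-by-dataset sending when exclusions exist."""
    return bool(oracle_mode or (recursive and len(disabled) > 0))


def plan_sends(datasets, disabled):
    """Split datasets into those to send and those to skip outright.

    Order is preserved so parents are still handled before children.
    """
    disabled_set = set(disabled)
    to_send, skipped = [], []
    for ds in datasets:
        if ds in disabled_set:
            skipped.append(ds)   # never reaches `zfs send`
        else:
            to_send.append(ds)
    return to_send, skipped
```

The point of the split is that a disabled dataset is dropped before any snapshot listing or send is attempted, instead of being sent and cleaned up afterwards.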
In some but not all cases I see that it also tries to send the snapshots of disabled datasets: e.g. rpool/home/abuild/.ccache is sent below while rpool/home/abuild/jenkins-nut-altroots is not, although both are disabled. In fact it seems that the sending (with an attempt to unmount and redefine the dataset?) happens if there is no snapshot on the destination (backup pool), and does not happen if there are snapshots (maybe it also helps that they are compatible between the two hosts).
[2024-01-15 10:19:14.48192] [2414199] [info] starting work on backupSet rpool/home/abuild
[2024-01-15 10:19:14.51896] [2414199] [debug] sending snapshots from rpool/home/abuild to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild
[2024-01-15 10:19:19.43557] [2414199] [debug] sending snapshots from rpool/home/abuild/.ccache to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/.ccache: not enabled, should be skipped
cannot unmount '/srv/libvirt/abuild/.ccache': permission denied
warning: cannot send 'rpool/home/abuild/.ccache@znapzend-auto-2024-01-15T10:11:47Z': signal received
[2024-01-15 10:19:20.09820] [2414199] [warn] ERROR: cannot send snapshots to pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/.ccache on znapzend
[2024-01-15 10:19:20.09856] [2414199] [debug] sending snapshots from rpool/home/abuild/jenkins-nut to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut: not enabled, should be skipped
[2024-01-15 10:19:21.43644] [2414199] [debug] sending snapshots from rpool/home/abuild/jenkins-nut-altroots to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut-altroots: not enabled, should be skipped
[2024-01-15 10:19:21.84354] [2414199] [debug] sending snapshots from rpool/home/abuild/jenkins-nut-altroots/ci-debian-altroot--jenkins-debian11-s390x-ssh to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut-altroots/ci-debian-altroot--jenkins-debian11-s390x-ssh
[2024-01-15 10:19:42.23397] [2414199] [debug] sending snapshots from rpool/home/abuild/jenkins-nut-altroots/jenkins-archlinux-amd64 to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut-altroots/jenkins-archlinux-amd64
...
Or in more detail:
...
[2024-01-15 10:33:02.29285] [2423768] [info] starting work on backupSet rpool/home/abuild
# zfs list -H -r -o name -t filesystem,volume rpool/home/abuild
[2024-01-15 10:33:02.33838] [2423768] [debug] sending snapshots from rpool/home/abuild to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild
[2024-01-15 10:33:02.33871] [2423768] [debug] Are we sending "--since"? since=="0", skipIntermediates=="0", forbidDestRollback=="0", justCreated=="false"
# zfs list -H -o name -t snapshot -s creation -d 1 rpool/home/abuild
# ssh -o batchMode=yes -o ConnectTimeout=30 znapzend zfs list -H -o name -t snapshot -s creation -d 1 pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild
# zfs send -Lce -I 'rpool/home/abuild@znapzend-auto-2024-01-15T10:19:14Z' 'rpool/home/abuild@znapzend-auto-2024-01-15T10:33:01Z'|ssh -o batchMode=yes -o ConnectTimeout=30 znapzend 'zfs recv -u -F pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild'
# zfs list -H -o name -t snapshot rpool/home/abuild@znapzend-auto-2024-01-15T10:33:01Z
# zfs set org.znapzend:dst_0=znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild rpool/home/abuild@znapzend-auto-2024-01-15T10:33:01Z
# zfs set org.znapzend:dst_0_synced=1 rpool/home/abuild@znapzend-auto-2024-01-15T10:33:01Z
[2024-01-15 10:33:07.86728] [2423768] [debug] sending snapshots from rpool/home/abuild/.ccache to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/.ccache: not enabled, should be skipped
[2024-01-15 10:33:07.86772] [2423768] [debug] Are we sending "--since"? since=="0", skipIntermediates=="0", forbidDestRollback=="0", justCreated=="false"
# zfs list -H -o name -t snapshot -s creation -d 1 rpool/home/abuild/.ccache
# ssh -o batchMode=yes -o ConnectTimeout=30 znapzend zfs list -H -o name -t snapshot -s creation -d 1 'pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/.ccache'
# zfs send -Lce 'rpool/home/abuild/.ccache@znapzend-auto-2024-01-15T10:11:47Z'|ssh -o batchMode=yes -o ConnectTimeout=30 znapzend 'zfs recv -u -F '"'"'pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/.ccache'"'"''
cannot unmount '/srv/libvirt/abuild/.ccache': permission denied
warning: cannot send 'rpool/home/abuild/.ccache@znapzend-auto-2024-01-15T10:11:47Z': signal received
[2024-01-15 10:33:08.50054] [2423768] [warn] ERROR: cannot send snapshots to pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/.ccache on znapzend
[2024-01-15 10:33:08.50091] [2423768] [debug] sending snapshots from rpool/home/abuild/jenkins-nut to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut: not enabled, should be skipped
[2024-01-15 10:33:08.50104] [2423768] [debug] Are we sending "--since"? since=="0", skipIntermediates=="0", forbidDestRollback=="0", justCreated=="false"
# zfs list -H -o name -t snapshot -s creation -d 1 rpool/home/abuild/jenkins-nut
# ssh -o batchMode=yes -o ConnectTimeout=30 znapzend zfs list -H -o name -t snapshot -s creation -d 1 pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut
# zfs list -H -o name -t snapshot rpool/home/abuild/jenkins-nut@znapzend-auto-2024-01-15T10:11:47Z
# zfs set org.znapzend:dst_0=znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut rpool/home/abuild/jenkins-nut@znapzend-auto-2024-01-15T10:11:47Z
# zfs set org.znapzend:dst_0_synced=1 rpool/home/abuild/jenkins-nut@znapzend-auto-2024-01-15T10:11:47Z
[2024-01-15 10:33:09.36190] [2423768] [debug] sending snapshots from rpool/home/abuild/jenkins-nut-altroots to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut-altroots: not enabled, should be skipped
[2024-01-15 10:33:09.36220] [2423768] [debug] Are we sending "--since"? since=="0", skipIntermediates=="0", forbidDestRollback=="0", justCreated=="false"
# zfs list -H -o name -t snapshot -s creation -d 1 rpool/home/abuild/jenkins-nut-altroots
# ssh -o batchMode=yes -o ConnectTimeout=30 znapzend zfs list -H -o name -t snapshot -s creation -d 1 pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut-altroots
# zfs list -H -o name -t snapshot rpool/home/abuild/jenkins-nut-altroots@znapzend-auto-2024-01-15T10:11:47Z
# zfs set org.znapzend:dst_0_synced=1 rpool/home/abuild/jenkins-nut-altroots@znapzend-auto-2024-01-15T10:11:47Z
# zfs set org.znapzend:dst_0=znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut-altroots rpool/home/abuild/jenkins-nut-altroots@znapzend-auto-2024-01-15T10:11:47Z
[2024-01-15 10:33:09.80435] [2423768] [debug] sending snapshots from rpool/home/abuild/jenkins-nut-altroots/ci-debian-altroot--jenkins-debian11-s390x-ssh to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut-altroots/ci-debian-altroot--jenkins-debian11-s390x-ssh
[2024-01-15 10:33:09.80467] [2423768] [debug] Are we sending "--since"? since=="0", skipIntermediates=="0", forbidDestRollback=="0", justCreated=="false"
# zfs list -H -o name -t snapshot -s creation -d 1 rpool/home/abuild/jenkins-nut-altroots/ci-debian-altroot--jenkins-debian11-s390x-ssh
# ssh -o batchMode=yes -o ConnectTimeout=30 znapzend zfs list -H -o name -t snapshot -s creation -d 1 pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut-altroots/ci-debian-altroot--jenkins-debian11-s390x-ssh
# zfs send -Lce -I 'rpool/home/abuild/jenkins-nut-altroots/ci-debian-altroot--jenkins-debian11-s390x-ssh@znapzend-auto-2024-01-15T10:19:14Z' 'rpool/home/abuild/jenkins-nut-altroots/ci-debian-altroot--jenkins-debian11-s390x-ssh@znapzend-auto-2024-01-15T10:33:01Z'|ssh -o batchMode=yes -o ConnectTimeout=30 znapzend 'zfs recv -u -F pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut-altroots/ci-debian-altroot--jenkins-debian11-s390x-ssh'
...
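The three outcomes visible in the detailed log (an incremental `zfs send -I`, a full `zfs send` with no base, and no send at all) are consistent with a decision based on which snapshots the destination already holds. The Python sketch below encodes that hypothesis only; it is not taken from znapzend's source, the function names are made up, and which snapshot a full send would start from is an assumption.

```python
def snap_name(full_name):
    """'pool/ds@snap' -> 'snap' (source and destination prefixes differ)."""
    return full_name.split("@", 1)[1]


def choose_send(src_snaps, dst_snaps):
    """Return (mode, base, target) for snapshots ordered oldest to newest.

    mode is 'nothing', 'full', 'incremental' or 'skip'.
    """
    src = [snap_name(s) for s in src_snaps]
    dst = {snap_name(s) for s in dst_snaps}
    if not src:
        return ("nothing", None, None)
    latest = src[-1]
    common = [s for s in src if s in dst]
    if not common:
        # Destination has nothing usable: full send (assumed: newest snapshot).
        return ("full", None, latest)
    if common[-1] == latest:
        # Destination already has the newest snapshot: no send needed.
        return ("skip", latest, latest)
    return ("incremental", common[-1], latest)
```

Under this reading, .ccache falls into the "full" branch because its destination has no snapshots, while jenkins-nut and jenkins-nut-altroots fall into "skip", which would match only their `dst_0_synced` properties being set.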