jimklimov closed this 3 years ago.
From testing `nvpool/SHARED`, which is a policy root:

```
:; ./bin/znapzend --nodestroy --features=zfsGetType,recvu,compressed,skipIntermediates --autoCreation --debug --runonce=nvpool/SHARED/var
...
[Mon Aug 3 17:12:55 2020] [debug] sending snapshots from nvpool/SHARED/opt/glassfish-4.0 to backup-adata/snapshots/nvpool/SHARED/opt/glassfish-4.0
...
[Mon Aug 3 17:17:32 2020] [debug] got nothing to clean in children of source nvpool/SHARED
[Mon Aug 3 17:17:34 2020] [debug] not cleaning up source nvpool/SHARED/opt/glassfish-4.0@znapzend-auto-2020-08-03T15:08:18Z because it is needed by backup-adata/snapshots/nvpool/SHARED/opt/glassfish-4.0
[Mon Aug 3 17:17:34 2020] [debug] got nothing to clean in children of source nvpool/SHARED
...
```
From the above I see that it:

* successfully sets the `dst_X_synced=1` property,
* honors this property by not cleaning the just-created snapshot,
  * ...which it should not have cleaned anyway; so I suppose it should have gone looking to keep the older, second-newest snapshot instead? Or at least not inspected/logged this one?
* logs the verbatim same mantra "got nothing to clean in children of source nvpool/SHARED" many times (once for each child and their children) (TODO: check the loop and message). UPDATE: fixed
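The cleanup decision described above can be sketched roughly like this (a Python illustration of the logic, not znapzend's actual Perl; `plan_cleanup` and its arguments are hypothetical names, and the retention plan itself is left out):

```python
# Hypothetical sketch of the source-cleanup decision: destroy expired
# snapshots, but never one that a destination still needs, and always
# keep the newest snapshot since it is the replication base.
def plan_cleanup(snapshots, needed_by_dst):
    """snapshots: snapshot names sorted oldest-first.
    needed_by_dst: set of names some destination still depends on."""
    protected = set(needed_by_dst)
    if snapshots:
        protected.add(snapshots[-1])  # newest one was just replicated
    return [s for s in snapshots if s not in protected]

snaps = [
    "znapzend-auto-2020-08-03T14:00:00Z",
    "znapzend-auto-2020-08-03T15:00:00Z",
    "znapzend-auto-2020-08-03T15:08:18Z",
]
print(plan_cleanup(snaps, {"znapzend-auto-2020-08-03T15:08:18Z"}))
# the two older snapshots are candidates; the newest/needed one survives
```

This matches the behavior in the log: the just-created snapshot is both the newest and the one a destination needs, so it is doubly protected.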
For an inherited mode:

```
[Mon Aug 3 19:41:44 2020] [debug] now will look if there is anything to clean in children of destination backup-adata/snapshots/nvpool/SHARED/var/adm
# zfs list -H -o name -t snapshot -s creation -d 1 nvpool/SHARED/var/adm
# zfs list -H -o name -t snapshot -s creation -d 1 nvpool/SHARED/var/adm
# zfs get -H -s local,inherited -t snapshot -r -o property,value all nvpool/SHARED/var/adm@znapzend-auto-2020-08-03T17:41:40Z
### Here it is recursing into the parent (hm, should somehow make snapshotExists quieter in case of a missing snapshot):
# zfs list -H -o name -t snapshot nvpool/SHARED/var@znapzend-auto-2020-08-03T17:41:40Z
cannot open 'nvpool/SHARED/var@znapzend-auto-2020-08-03T17:41:40Z': dataset does not exist
[Mon Aug 3 19:41:45 2020] [debug] not cleaning up source nvpool/SHARED/var/adm@znapzend-auto-2020-08-03T17:41:40Z recursively because it is needed by backup-adata/snapshots/nvpool/SHARED/var/adm
[Mon Aug 3 19:41:45 2020] [debug] cleaning up source snapshots recursively under nvpool/SHARED/var/adm
# WOULD # zfs destroy -r nvpool/SHARED/var/adm@znapzend-auto-2020-08-03T16:37:20Z,znapzend-auto-2020-08-03T16:51:50Z,znapzend-auto-2020-08-03T16:58:52Z,znapzend-auto-2020-08-03T17:27:57Z,znapzend-auto-2020-08-03T17:33:11Z,znapzend-auto-2020-08-03T17:34:49Z
[Mon Aug 3 19:41:45 2020] [debug] now will look if there is anything to clean in children of source nvpool/SHARED/var/adm
[Mon Aug 3 19:41:45 2020] [info] done with backupset nvpool/SHARED/var/adm in 5 seconds
[Mon Aug 3 19:41:45 2020] [debug] send/receive worker for nvpool/SHARED/var/adm done (17830)
```
So it seems to work and tries to protect the snapshots now. Usually, however, this fast-tracks to the older `lastAndCommonSnapshots()` when both source and destination are up, instead of looping to call the new `getSnapshotProperties()` in `mostRecentCommonSnapshot()`. To test the latter, I had to neuter the clause for `lastAndCommonSnapshots()` (wrapping it in `if (0) {...}`).
In either case it seems to protect the newest snapshot made (which is reasonable, since it was replicated after all) and can consider dropping older snapshots (e.g. if I want one per hour, a 3-minute-old one from an earlier test is not needed anymore).

I did not yet manage to get it to try to delete a snapshot made earlier in the backupSet's (`nvpool/SHARED`) history; maybe its scheduled time has not come yet.
Sometimes it ends up with this error, however, when looking at a missing snapshot -- probably the hashes should be merged differently:

```
# zfs list -H -o name -t snapshot nvpool@znapzend-auto-2020-08-03T17:45:53Z
cannot open 'nvpool@znapzend-auto-2020-08-03T17:45:53Z': dataset does not exist
Reference found where even-sized list expected at /export/home/jim/shared/znapzend/bin/../lib/ZnapZend/ZFS.pm line 1035.
Reference found where even-sized list expected at /export/home/jim/shared/znapzend/bin/../lib/ZnapZend/ZFS.pm line 1035.
```
We should also find a way not to spew "dataset does not exist" to stderr where we anticipate the snapshot may not exist.
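A minimal sketch of that idea (in Python rather than the project's Perl; the helper names are hypothetical, and the `zfs list` arguments mirror the probes in the logs above): route the probe's stderr to /dev/null when the caller anticipates the snapshot may be absent.

```python
import subprocess

def run_probe(argv, quiet=True):
    """Run a command whose failure is anticipated; hide stderr when quiet."""
    result = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        # This is the point of the exercise: no "dataset does not exist"
        # noise on stderr when we merely probe for existence.
        stderr=subprocess.DEVNULL if quiet else None,
    )
    return result.returncode == 0

def snapshot_exists(snapshot, quiet=True):
    # Hypothetical analogue of znapzend's snapshotExists() probe.
    return run_probe(
        ["zfs", "list", "-H", "-o", "name", "-t", "snapshot", snapshot], quiet)
```

The caller's `$quiet==1` intent from the discussion maps onto the `quiet` flag here.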
There are algorithmic flaws, and not all of them were introduced by this work.
Key points: when we do get into the mode of looking for older `dst_X_synced` property flags (the destination is offline, so the just-made snapshot is not the newest commonly known one), we now iterate into parent datasets to inspect their same-named snapshots and newly check whether they have the property defined (and it seems the property is not inheritable into same-named snapshots of child datasets).

The check can optionally be done first in recursive mode, so `zfs` can destroy all same-named snapshots in a tree starting from the current backupSet->src root; then we follow up with the legacy mode, inspecting the snapshots that remain in child datasets to see if any of those should be cleared (e.g. a dataset had earlier run in inherited mode and has obsolete snapshots that are not defined in the root backupSet).

In this legacy mode (e.g. without recursive snapshot deletion support) we immediately look at the backupSet and each of its sub-datasets, and for each one we look at their snapshots individually. If we search for some parent which maybe is setting the `dst_X_synced` flag, there is quite a bit of recursing up into parents whose values were already requested. More so, if we currently iterate from parents to children (which seems to be the case per the code pasted below, though we can reverse-sort this for certainty) and destroy the targets we see first, there will be no parent value to see as inherited unless we cache it:
```perl
# get all sub-datasets of the source filesystem; need to send them all individually if recursive
my $srcSubDataSets = $backupSet->{recursive} eq 'on'
    ? $self->zZfs->listSubDataSets($backupSet->{src})
    : [ $backupSet->{src} ];
```
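The reverse-sort mentioned above could look like this (a hypothetical Python illustration, not the Perl code): ordering datasets deepest-first guarantees children are visited before the parents whose inherited property values they may still need to observe.

```python
# Sort datasets so that children come before their parents (deepest first);
# ties are broken by name for a deterministic order.
def children_first(datasets):
    return sorted(datasets, key=lambda d: (-d.count("/"), d))

tree = ["nvpool/SHARED", "nvpool/SHARED/var/adm", "nvpool/SHARED/var"]
print(children_first(tree))
# -> ['nvpool/SHARED/var/adm', 'nvpool/SHARED/var', 'nvpool/SHARED']
```

With this order, destroying snapshots at a parent can no longer race ahead of a child's lookup of that parent's values.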
Take-aways:

* we could look for `dst_X_synced` only locally if the current dataset is already the real backupSet (it has the policy defined directly, so we are not using the "--inherited" mode), and so not waste processing time in the default mode for most users
* we should keep relying on an already-found `dst_X_synced` value for as long as possible (during the cleanup loop, or if we break out of it/crash/...). UPDATE: fixed
* currently each inspection runs a `zfs get` to look for `dst_X_synced`, and so re-parses the same data over and over (mostly finding nothing). While this is pedantically correct, it can easily spiral to be too costly in I/O, forking, processing... so there should be some safe and reliable way to cache the results and quickly see that "from this dataset@snapname upwards we know this property is defined to X, or is definitely not set", requesting each such datum only once. And somehow not eat all RAM (fall back to the current mode if `lowMemRecursion`?)...
  * at least, we should not go above the current backupSet to request the same items more than once while inspecting its children
  * also, while looking for properties set by `znapzend`, we should not go up above a root backupSet (one that is a filesystem/volume dataset defining the properties locally; it may be the one reported by `zfs get` in the SOURCE column for values "inherited from ...")
  * we could ask for `zfs get propname1,propname2,... dataset@snap` right away, fetching several properties in one call
  * if we know we are looking for one particular attribute and have a hit (from some cache we might make, or from the OS), we should not iterate up into parents. UPDATE: fixed and fixed
* `mostRecentCommonSnapshot()` currently always goes to `zfs` for all dataset discovery and interrogation; it does not consult the `@snapshots` list that we could pre-filter (and which is already a filtered superset of our list of what we are considering `toDestroy`, and so is less to look at than the whole ZFS snapshot history in the dataset)
  * on the up-side, this allows the routine to find common snapshots which are not part of the `@toDestroy` and/or `@snapshots` lists; on the down-side, with the current logic this might not matter, since we were not going to delete such snapshots anyway...
  * Oh wait, my original point is somewhat moot and I'll record it as such: we actually use this for double-checking that, when the destination was offline (so there was a replication error), we have at least some recorded common snapshot so we can proceed with the clean-up and not bail. Maybe we could optimize by first checking for such a hit in the pre-selected constrained list of `@snapshots` and then fall back to checking the other snapshots (all present except those already checked), but I am not sure it is worth the complexity, and it may lead `znapzend` into keeping snapshots it could otherwise safely delete, which is a pathology we are fixing per #384.

Seems we have a nice bonus for the `dst_X_synced` use-case: the code also stores `dst_X` as a property of the snapshot that was the current backupSet source (e.g. for inherited ones too), which allows us to cut iterations short when we hit the true parent of a snapname:
```
# zfs get all nvpool/SHARED/var/adm@znapzend-auto-2020-08-03T16:51:50Z | grep local
nvpool/SHARED/var/adm@znapzend-auto-2020-08-03T16:51:50Z  org.znapzend:dst_0         backup-adata/snapshots/nvpool/SHARED/var/adm  local
nvpool/SHARED/var/adm@znapzend-auto-2020-08-03T16:51:50Z  org.znapzend:dst_0_synced  1  local

:; zfs get all nvpool/SHARED/var/adm | grep local
:; zfs get all nvpool/SHARED/var | grep local
:; zfs get all nvpool/SHARED | grep local
nvpool/SHARED  org.znapzend:dst_0_plan     1weeks=>30minutes,2months=>1days,4months=>1weeks,10years=>1months  local
nvpool/SHARED  org.znapzend:dst_0          backup-adata/snapshots/nvpool/SHARED  local
nvpool/SHARED  org.znapzend:mbuffer_size   1G  local
nvpool/SHARED  org.znapzend:src_plan       1weeks=>30minutes,2months=>1days,4months=>1weeks,10years=>1months  local
nvpool/SHARED  org.znapzend:zend_delay     0  local
nvpool/SHARED  org.znapzend:tsformat       znapzend-auto-%Y-%m-%dT%H:%M:%SZ  local
nvpool/SHARED  org.znapzend:post_znap_cmd  off  local
nvpool/SHARED  org.znapzend:mbuffer        /usr/bin/amd64/mbuffer  local
nvpool/SHARED  org.znapzend:pre_znap_cmd   off  local
nvpool/SHARED  org.znapzend:recursive      on  local
nvpool/SHARED  org.znapzend:enabled        on  local
```
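The iteration-shortcut and caching ideas above could be sketched like this (a Python illustration with hypothetical names; znapzend's actual code is Perl and its lookup helpers differ): walk from `dataset@snapname` up through the parents' same-named snapshots, memoize each `zfs get`-style lookup, and stop at the first local hit or at the backupSet root.

```python
# Walk dataset@snapname up through parent datasets, looking for a locally
# set property; memoize lookups so each snapshot is queried at most once,
# and never climb above the backupSet root.
def find_property(dataset, snapname, prop, get_local_props, root, cache):
    while True:
        key = f"{dataset}@{snapname}"
        if key not in cache:
            cache[key] = get_local_props(key)  # one real `zfs get` per snapshot
        value = cache[key].get(prop)
        if value is not None:
            return value                       # local hit: stop iterating upward
        if dataset == root or "/" not in dataset:
            return None                        # boundary: do not go above the root
        dataset = dataset.rsplit("/", 1)[0]

# Demo with faked-up property data instead of forking `zfs`:
props = {"nvpool/SHARED@s1": {"org.znapzend:dst_0_synced": "1"}}
calls = []
def fake_get(key):
    calls.append(key)
    return props.get(key, {})

cache = {}
v = find_property("nvpool/SHARED/var/adm", "s1",
                  "org.znapzend:dst_0_synced", fake_get,
                  root="nvpool/SHARED", cache=cache)
print(v, len(calls))   # found at the root after three lookups
# A second query for a sibling dataset reuses the cached parent results:
find_property("nvpool/SHARED/var", "s1",
              "org.znapzend:dst_0_synced", fake_get,
              root="nvpool/SHARED", cache=cache)
print(len(calls))      # no new lookups needed
```

A shared cache like this is exactly what keeps the "re-parses the same data over and over" cost from the take-aways list bounded by the number of distinct snapshots, not by the number of inspection passes.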
At the moment it looks pretty neat and functional. There may be some points of improvement, as listed in detail above, primarily:

* avoid running a `zfs` command to get data we had seen recently
  * stop recursing into parents when we pass a border (the root backupSet, or the initial backupSet this run is iterating from), even if we do not find a way to cache all seen results
* `snapshotExists()` probing a snapshot that does not exist when we anticipate it is not there -- find a way to hide its STDERR into /dev/null if we asked for `$quiet==1`
But even so, performance does not seem to take extreme hits on reasonably small amounts of datasets, when the destination is online, and when we are not doing "runonce inherit" (the default daemon mode being the primary mode of operation) -- so these and other improvements can happen in other PRs and by other authors ;)

Maybe the positive result is also due to a balance between the slow-downs added in cases where this code has to run, and the speed-ups in `createSnapshot()` (no longer listing all datasets known to `zfs`) and in not re-cleaning child source snapshots that were already seen in recursive cleanup mode.
Another TODO (now, or in another PR so as not to stall this one): if the idea of not revisiting snapshot cleanup for tags already considered in recursive mode seems okay, it can be replicated into the destination clean-ups too.

Hm, it seems we say the snapshots seen in recursive mode would not be revisited in the `toDestroy` list, but then the list still contains them. Gotta check that this grep works...
```
[Fri Aug 7 13:25:23 2020] [debug] checking to clean up snapshots of source nvpool/SHARED/var/adm
# zfs list -H -o name -t snapshot -s creation -d 1 nvpool/SHARED/var/adm
[Fri Aug 7 13:25:23 2020] [debug] not considering whether to clean source nvpool/SHARED/var/adm@znapzend-auto-2020-07-30T19:25:08Z as it was already processed in recursive mode
[Fri Aug 7 13:25:23 2020] [debug] not considering whether to clean source nvpool/SHARED/var/adm@znapzend-auto-2020-07-30T20:00:45Z as it was already processed in recursive mode
[Fri Aug 7 13:25:23 2020] [debug] not considering whether to clean source nvpool/SHARED/var/adm@znapzend-auto-2020-08-06T11:11:50Z as it was already processed in recursive mode
[Fri Aug 7 13:25:23 2020] [debug] not considering whether to clean source nvpool/SHARED/var/adm@znapzend-auto-2020-08-06T11:51:36Z as it was already processed in recursive mode
[Fri Aug 7 13:25:23 2020] [debug] not considering whether to clean source nvpool/SHARED/var/adm@znapzend-auto-2020-08-07T11:21:34Z as it was already processed in recursive mode
# zfs list -H -o name -t snapshot -s creation -d 1 nvpool/SHARED/var/adm
# zfs list -H -o name -t snapshot -s creation -d 1 backup-adata/snapshots/nvpool/SHARED/var/adm
[Fri Aug 7 13:25:23 2020] [debug] not cleaning up source nvpool/SHARED/var/adm@znapzend-auto-2020-08-07T11:24:07Z because it is needed by backup-adata/snapshots/nvpool/SHARED/var/adm
[Fri Aug 7 13:25:23 2020] [debug] cleaning up 34 snapshots on source nvpool/SHARED/var/adm
[Fri Aug 7 13:25:23 2020] [debug] $VAR1 = [
          'nvpool/SHARED/var/adm@znapzend-auto-2020-07-30T19:25:08Z',
          'nvpool/SHARED/var/adm@znapzend-auto-2020-07-30T20:00:45Z',
          'nvpool/SHARED/var/adm@znapzend-auto-2020-08-05T22:39:05Z',
          ...
          'nvpool/SHARED/var/adm@znapzend-auto-2020-08-06T10:55:44Z',
          'nvpool/SHARED/var/adm@znapzend-auto-2020-08-06T10:58:52Z',
          'nvpool/SHARED/var/adm@znapzend-auto-2020-08-06T10:59:48Z',
          'nvpool/SHARED/var/adm@znapzend-auto-2020-08-06T11:11:50Z',
          'nvpool/SHARED/var/adm@znapzend-auto-2020-08-06T11:51:36Z',
          'nvpool/SHARED/var/adm@znapzend-auto-2020-08-07T11:21:34Z'
        ];
```
So the FQSNs from "not considering whether to clean source dataset@snapname as it was already processed in recursive mode" are still in the list, but the one from "not cleaning up source dataset@snapname because it is needed by Dst" is removed :\

UPDATE: Argh, the regex got lost in copy-pasting: `grep { $snapname ne $_ }`. Looks better now: almost no snaps to clean in the child phase, because they were looked at in the recursive phase and deliberately left to stick around (including the case where they would have been deleted, and thus not seen here in the first place, except that `--nodestroy` mode was enabled).
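The restored filter corresponds to something like this (a Python sketch of the Perl `grep { $snapname ne $_ }` idea applied over all already-processed snapshots; names are illustrative):

```python
# Drop from the per-child destroy candidates any snapshot that was already
# handled during the recursive phase, so it is neither re-inspected nor
# re-listed for destruction.
def drop_processed(to_destroy, processed):
    return [snap for snap in to_destroy if snap not in processed]

to_destroy = [
    "nvpool/SHARED/var/adm@znapzend-auto-2020-07-30T19:25:08Z",
    "nvpool/SHARED/var/adm@znapzend-auto-2020-08-05T22:39:05Z",
]
processed = {"nvpool/SHARED/var/adm@znapzend-auto-2020-07-30T19:25:08Z"}
print(drop_processed(to_destroy, processed))
# only the snapshot not seen in recursive mode remains
```

This is the behavior the debug log should show once the grep works: the "already processed in recursive mode" names no longer reappear in the `$VAR1` dump.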
Interesting; I think I over-engineered my worries for now. In manual experiments I was using `zfs send -R -I ... | zfs recv` for quicker wholesale replications -- something I had intended or asked about long ago (#438) to use in `znapzend` for first-stage sendings (like the recursive clean-ups, which might not need to be followed by child-by-child cleanups if all went well) -- so I assumed somehow this was already the mode it used, in which case only the topmost parent dataset of a replication run would be marked with `dst_X_synced` flags in `sendRecvSnapshot()`.

Well, we're saved by the bell: that mode is not in the codebase yet, so every dataset in a subtree is replicated individually and marked as synced individually, and the iteration to get the snapshot properties involved does not go far or cost much, even without us adding caching and tree-pruning :)
Applied a few patches from #497 that do fix issues and would otherwise conflict with the remaining PR after merging one of them to master first.
thanks for your contribution
This follows up from #384 and #506 to hopefully avoid a potential issue with "inherit" mode ignoring the parent's `dst_X_synced` values.

Testing privately; review is welcome to clear this for merging. I hope it won't eat our data :)