oetiker / znapzend

zfs backup with remote capabilities and mbuffer integration.
www.znapzend.org
GNU General Public License v3.0

Introduce feature "zfsGetType" for "zfs get -t" support #452

Closed jimklimov closed 4 years ago

jimklimov commented 4 years ago

Found an issue: on my OI system I needed to filter the recursive zfs get output (it listed all snapshots as well), so I introduced zfs get -t to do the filtering. But I then noticed that on my Solaris 10 systems this flag is not supported.

CC @oetiker and @lotheac for a better look :)

coveralls commented 4 years ago

Coverage Status

Coverage increased (+1.4%) to 91.54% when pulling 6302ab41af6b1d3916797fb61bfa35be43d01b86 on jimklimov:zfs-get-t into 37a001878e69dd91e147ccbf59b5e3b1f86ab6ab on oetiker:master.

jimklimov commented 4 years ago

Caveat found: with the current design of znapzendzetup, the "root options" (including --features) must immediately follow the $mainOpt (e.g. list), and only then come the CLI args relevant for that mainOpt.

So znapzendzetup list --sudo --features=zfsGetType --debug --recursive rpool/SHARED works as expected, while znapzendzetup list --debug --recursive --features=zfsGetType rpool/SHARED does not (it leaves zfsGetType unset).
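
This ordering behaviour is consistent with a two-pass CLI parser that reads the root options right after the subcommand and stops at the first option it does not recognize. A minimal Getopt::Long sketch of that pattern (purely illustrative, not znapzendzetup's actual code; which options belong to which pass is assumed here):

#!/usr/bin/perl
# Illustrative sketch of a two-pass option parser that reproduces the
# ordering caveat above: "root options" are read right after the
# subcommand, and that pass stops at the first option it does not know.
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray :config require_order pass_through);

my @args    = @ARGV;        # e.g. qw(list --sudo --features=zfsGetType --debug --recursive rpool/SHARED)
my $mainOpt = shift @args;  # the subcommand, e.g. "list"

# Pass 1: root options. With require_order + pass_through, parsing stops
# at the first unrecognized option, so a --features placed after --debug
# is never seen by this pass.
my (%features, $sudo);
GetOptionsFromArray(\@args,
    'features=s' => sub { $features{$_} = 1 for split /,/, $_[1] },
    'sudo'       => \$sudo,
);

# Pass 2: options belonging to the subcommand itself.
my ($debug, $recursive);
GetOptionsFromArray(\@args,
    'debug'     => \$debug,
    'recursive' => \$recursive,
);

printf "mainOpt=%s features=[%s] remaining=[%s]\n",
    $mainOpt, join(',', sort keys %features), join(' ', @args);

# list --sudo --features=zfsGetType --debug --recursive rpool/SHARED
#     -> features=[zfsGetType]
# list --debug --recursive --features=zfsGetType rpool/SHARED
#     -> pass 1 stops at --debug, so features=[] (zfsGetType stays unset)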

jimklimov commented 4 years ago

For a feel of the (kernel) time and text-size trade-offs of these different options, on a moderately recent OI box (SunOS jimoi 5.11 illumos-052042909d i86pc i386 i86pc illumos), after priming the ARC with the data (running the selections a few times over so the timing is consistent - no storage I/O involved):

* No constraints on dataset type or property source (the plain recursive `zfs get ... all` over nvpool/SHARED) - the `zfs` command walks every snapshot and is by far the slowest:

real    0m31.096s
user    0m2.300s
sys     0m28.383s


* Constrain to only looking at non-snapshot datasets - `zfs` command completes a lot faster, and outputs less text:

root@jimoi:/root# time zfs get -H -t filesystem,volume -r -o name,property,value,source all nvpool/SHARED | wc -l
    2301

real    0m0.186s
user    0m0.028s
sys     0m0.169s


* Constrain to only looking at non-snapshot datasets AND the "local" source of the attribute, as znapzend does - `zfs` command completes about as fast, and finds just the one dataset which has the backup plan (11 lines):

root@jimoi:/root# time zfs get -H -s local -t filesystem,volume -r -o name,property,value,source all nvpool/SHARED | wc -l
      11

real    0m0.208s
user    0m0.025s
sys     0m0.200s


* After all of the above, re-run with the constraint to "local" source but no constraint on dataset type - about as slow as the topmost no-constraints case (a bit faster, since less text is pushed around):

root@jimoi:/root# time zfs get -H -s local -r -o name,property,value,source all nvpool/SHARED | wc -l
      11

real    0m28.466s
user    0m1.966s
sys     0m26.428s


ZFS-tree-wise, here are the counts of the different dataset types involved:

root@jimoi:/root# time zfs list -Honame -t snapshot -r nvpool/SHARED | wc -l
    8925

real    0m10.770s
user    0m1.690s
sys     0m9.055s

root@jimoi:/root# time zfs list -Honame -t volume -r nvpool/SHARED | wc -l
       0

real    0m0.136s
user    0m0.025s
sys     0m0.127s

root@jimoi:/root# time zfs list -Honame -t filesystem -r nvpool/SHARED | wc -l
      31

real    0m0.125s
user    0m0.017s
sys     0m0.118s
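
For reference, the crux of the feature is exactly that dataset-type filter: add -t filesystem,volume to the zfs get call when the platform's zfs supports it, and fall back to the old unfiltered call otherwise. A rough Perl sketch of assembling such a command line (illustrative only, not necessarily this PR's exact code; $featureZfsGetType, $recursive and $dataset are stand-in variables):

#!/usr/bin/perl
# Sketch only: build the "zfs get" argument list, adding the dataset-type
# filter when the (hypothetical) $featureZfsGetType toggle is on; older
# zfs implementations such as Solaris 10u8 reject "get -t", hence opt-in.
use strict;
use warnings;

my $featureZfsGetType = 1;                # assumed feature toggle
my $recursive         = 1;
my $dataset           = 'nvpool/SHARED';

my @cmd = qw(zfs get -H -s local);
push @cmd, '-t', 'filesystem,volume' if $featureZfsGetType;
push @cmd, '-r' if $recursive;
push @cmd, '-o', 'name,property,value,source', 'all', $dataset;

open my $props, '-|', @cmd or die "cannot run zfs get: $!";
while (my $line = <$props>) {
    chomp $line;
    my ($name, $prop, $value, $source) = split /\t/, $line, 4;
    # ... collect the znapzend backup-plan properties here ...
}
close $props;

With the filter in place, the recursive property walk above drops from roughly half a minute of kernel time to a fraction of a second on this pool.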

jimklimov commented 4 years ago

Similar timing on Solaris 10u8, where there is no filtering by dataset type (the crux of this PR):

[root@x4100 ~]# time zfs get -H  -r -o name,property,value,source all rpool/SHARED | wc -l
    9219

real    0m4.590s
user    0m0.109s
sys     0m1.638s

[root@x4100 ~]# time zfs get -H  -s local -r -o name,property,value,source all rpool/SHARED | wc -l
      12

real    0m3.335s
user    0m0.092s
sys     0m1.217s

[root@x4100 ~]# time zfs get -H  -t filesystem,volume -r -o name,property,value,source all rpool/SHARED | wc -l
invalid option 't'
usage:
        get [-rHp] [-d max] [-o field[,...]] [-s source[,...]]
            <"all" | property[,...]> [filesystem|volume|snapshot] ...

The following properties are supported:
...

Counts of the datasets involved:

[root@x4100 ~]# time zfs list -Honame -t snapshot -r rpool/SHARED | wc -l
     469

real    0m0.347s
user    0m0.071s
sys     0m0.092s

[root@x4100 ~]# time zfs list -Honame -t filesystem -r rpool/SHARED | wc -l
       7

real    0m0.014s
user    0m0.005s
sys     0m0.010s

[root@x4100 ~]# time zfs list -Honame -t volume -r rpool/SHARED | wc -l
       0

real    0m0.013s
user    0m0.005s
sys     0m0.010s

Filtering with egrep to keep only the "live" (non-snapshot) datasets leaves a lot less text to parse:

[root@x4100 ~]# zfs get -H  -r -o name,property,value,source all rpool/SHARED | egrep '^[^@]*[ \t]' | wc -l
     270
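
The same filtering can be done on the znapzend side when zfs get -t is unavailable: drop every property line whose dataset name contains '@' before any further parsing. A minimal Perl sketch of such a fallback (illustrative, not the literal code of this PR):

# Fallback filter for platforms whose zfs lacks "get -t": skip property
# lines that belong to snapshots (name contains '@') so that only "live"
# filesystem/volume lines are parsed further.
use strict;
use warnings;

while (my $line = <STDIN>) {          # pipe "zfs get -H -r ..." output in here
    my ($name) = split /\t/, $line, 2;
    next if index($name, '@') >= 0;   # snapshot property line: ignore it
    print $line;                      # keep filesystem/volume lines
}

This mirrors the egrep above, but keeps the filtering inside Perl so no extra process is needed.
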
jimklimov commented 4 years ago

And timing of this PR's code on OI:

* With the new zfsGetType feature enabled - fast:

NOTE: if you have modified your configuration, send a HUP signal (pkill -HUP znapzend) to your znapzend daemon for it to notice the change.
      14

real    0m0.441s
user    0m0.189s
sys     0m0.260s


* Backwards-compatible by default, but slower:

root@jimoi:/root# time ~jim/shared/znapzend/bin/znapzendzetup list -r nvpool/SHARED | wc -l
cannot open 'backup-adata/snapshots/nvpool/SHARED': dataset does not exist
WARNING: destination 'backup-adata/snapshots/nvpool/SHARED' does not exist, will be ignored!
NOTE: if you have modified your configuration, send a HUP signal (pkill -HUP znapzend) to your znapzend daemon for it to notice the change.
      14

real    0m29.772s
user    0m2.200s
sys     0m27.498s



Finally, note that this timing difference is only expected to impact the recently added (and explicitly CLI-requested) recursive search for backup plans.

The other recently added mode, which only looks at the specified dataset(s), inspects them directly without recursing further (though it may suffer on Solaris 10 with the master branch before this PR, since that code uses `get -t` unconditionally). The very default mode of looking at all filesystem or volume datasets is similar: it lists them first and then looks at the list items one by one, as sketched below. I timed these to perform similarly (~19s on the OI system above for the full search, with and without the new feature).
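
For comparison, that default list-then-inspect approach boils down to something like the following (a hedged sketch of the general idea, not znapzend's literal code):

# Sketch of the default approach: enumerate the "live" datasets first with
# "zfs list" (its -t option works even on the old zfs here, see the
# Solaris 10u8 listings above), then inspect each dataset individually,
# so snapshots never enter the picture.
use strict;
use warnings;

open my $list, '-|', qw(zfs list -H -o name -t filesystem,volume)
    or die "cannot run zfs list: $!";
chomp(my @datasets = <$list>);
close $list;

for my $ds (@datasets) {
    # one "zfs get" per dataset; no "get -t" needed here
    system('zfs', 'get', '-H', '-s', 'local',
           '-o', 'name,property,value,source', 'all', $ds);
}
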
oetiker commented 4 years ago

how about adding a note to the documentation about the motivation ... the performance gain from this option.

jimklimov commented 4 years ago

Hm... my earlier posting stands corrected: Solaris 10u8 does list the properties of snapshots with the zfs get -H -r -o name,property,value,source all rpool/SHARED request above, yet it does not allow filtering them out right in the zfs tool.

jimklimov commented 4 years ago

This research also led to an improvement proposal: if our zfs get call cannot filter away snapshots itself, filter them out as early as possible, to spend fewer cycles processing those lines. It seems this can also prevent errors (passing snapshot names to routines that expect a "live" dataset), if nothing else.

From cursory manual testing, this filtering seems relevant not only when recursing into datasets, but also when asked to look at inherited properties.

jimklimov commented 4 years ago

The latest red-cross in CI is just ETOOFEWMOCKS ... I'm not gonna fight this bit in t/zfs and the tests (to have zfs get return some snapshots too) today, sorry ;)

jimklimov commented 4 years ago

Trying a fix for the lack of mocks. Found a small omission that acts like a bug, too.

jimklimov commented 4 years ago

Hm... the latest improvements to tests and code broke something... again :(

jimklimov commented 4 years ago

Marking as "WIP" to figure out the test fails...

jimklimov commented 4 years ago

Why oh why?..

In the test I set the envvar in the %ENV hash, then undef it. From the test program it is no longer defined, but from the called zfs mock it is defined, just without a value :\

So where it is intended, it works:

znapzend.t: Now ZNAPZENDTEST_ZFS_GET_TYPE_UNHANDLED = 1
znapzend.t: Is defined? Y
[Fri Nov 22 17:36:59 2019] [info] znapzend (PID=6311) starting up ...
[Fri Nov 22 17:36:59 2019] [info] refreshing backup plans for dataset "tank/source" ...
# zfs get -H -s local -t filesystem,volume -o name,property,value,source all tank/source
zfs: Now ZNAPZENDTEST_ZFS_GET_TYPE_UNHANDLED = 1
zfs: Is defined? Y
invalid option 't'
usage:
...

When it is not intended... oh well... :

ok 3 - znapzend --features=zfsGetType --runonce=tank/source fails with old ZFS
Use of uninitialized value $ENV{"ZNAPZENDTEST_ZFS_GET_TYPE_UNHAND"...} in concatenation (.) or string at ./t/znapzend.t line 188.
znapzend.t: Now ZNAPZENDTEST_ZFS_GET_TYPE_UNHANDLED = 
znapzend.t: Is defined? N
[Fri Nov 22 17:36:59 2019] [info] znapzend (PID=6311) starting up ...
[Fri Nov 22 17:36:59 2019] [info] refreshing backup plans for dataset "tank/source" ...
# zfs get -H -s local -t filesystem,volume -o name,property,value,source all tank/source
zfs: Now ZNAPZENDTEST_ZFS_GET_TYPE_UNHANDLED = 
zfs: Is defined? Y
invalid option 't'
usage:
        get [-rHp] [-d max] [-o "all" | field[,...]] [-s source[,...]]
            <"all" | property[,...]> [filesystem|volume|snapshot] ...

EXCEPTION: No backup set defined or enabled, yet. run 'znapzendzetup' to setup znapzend

not ok 4 - znapzend --features=zfsGetType --runonce=tank/source succeeds with new ZFS when it lists snapshots

Printing done by instrumenting this into both files:

print STDERR "znapzend.t: Now ZNAPZENDTEST_ZFS_GET_TYPE_UNHANDLED = " . $ENV{'ZNAPZENDTEST_ZFS_GET_TYPE_UNHANDLED'} . "\n";
print STDERR "znapzend.t: Is defined? " . ( (defined $ENV{'ZNAPZENDTEST_ZFS_GET_TYPE_UNHANDLED'}) ? "Y" : "N" ) . "\n";
jimklimov commented 4 years ago

Hopefully changing such envvar handling to check for a non-empty value (or specifically the value 1) will fix this.
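
For instance, the mock zfs could key its behaviour off the variable's value rather than its mere presence, along these lines (a sketch, not the exact t/ code):

# Sketch of a value-based check: requiring the value "1" means a leftover
# empty variable in a child process no longer flips the mock's behaviour.
if (($ENV{ZNAPZENDTEST_ZFS_GET_TYPE_UNHANDLED} // '') eq '1') {
    # emulate an old zfs that rejects "get -t"
    print STDERR "invalid option 't'\n";
    exit 2;
}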

jimklimov commented 4 years ago

Green and covered, is there something more to fix here? :)

oetiker commented 4 years ago

Green and covered, is there something more to fix here? :)

if you think a PR is done, just click on the review button ... :)