oetiker / znapzend

zfs backup with remote capabilities and mbuffer integration.
www.znapzend.org
GNU General Public License v3.0

Fix parser for root (only) dataset names - investigate #619

Closed: jimklimov closed this 5 months ago

jimklimov commented 5 months ago

Follows up on #585 and a Gitter discussion at https://matrix.to/#/!XZJhhFzueFpkcSXLHR:gitter.im/$gVcghYo80LIz2zIdyJQyfxRYJkyb07mQjoB5bGH5zTU?via=gitter.im&via=inf.ethz.ch&via=matrix.org (and later posts during that day).

The crux of it is that znapzend v0.21.2 (current release) got confused when a backup schedule was defined on a root dataset of a pool AND the custom tsformat included colons to separate hours-minutes-seconds:

[2024-01-08 09:57:54.87309] [154415] [debug] === getDataSetProperties():
        Collected: $VAR1 = {
          'pre_znap_cmd' => 'off',
          'mbuffer' => 'off',
          'enabled' => 'on',
          'src_plan' => '1months=>1weeks,1years=>1months,10years=>6months',
          'mbuffer_size' => '128M',
          'zend_delay' => '0',
          'post_znap_cmd' => 'off',
          'src' => 'bpool',
          'dst_dst_0_plan' => '1months=>1week,1years=>1month,10years=>6months',
          'recursive' => 'on',
          'tsformat' => 'znapzend-auto-%Y-%m-%dT%H:%M:%SZ',
          'dst_dst_0' => 'znapzend:pond/export/DUMP/ci-deb/bpool'
        };

...

[2024-01-08 09:57:54.89115] [154583] [info] creating recursive snapshot on bpool
# ssh -o batchMode=yes -o ConnectTimeout=30 bpool@znapzend-auto-2024-01-08T10 zfs snapshot -r '22:13Z'
ssh: Could not resolve hostname znapzend-auto-2024-01-08t10: Name or service not known
# ssh -o batchMode=yes -o ConnectTimeout=30 22 zfs list -H -o name -t snapshot 13Z

So a request to snapshot the local bpool dataset as bpool@znapzend-auto-2024-01-08T10:22:13Z was interpreted by the generic routine as user=bpool, host=znapzend-auto-2024-01-08T10, dsname=22:13Z, with no snapshot name. It seems that this dsname was then parsed again for the subsequent command, into host=22 and dsname=13Z.
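To make the failure mode concrete, here is a minimal Python sketch that reproduces the same misparse. The pattern `NAIVE` and the function `naive_split` are hypothetical and deliberately simplistic; they are not the code from ZFS.pm (which is Perl and not shown in this thread), only an illustration of how an "optional remote before the first colon" rule behaves on these inputs.

```python
import re

# Hypothetical, deliberately naive pattern for "[[user@]host:]dsname[@snap]".
# NOT the regex used in ZFS.pm; it only mimics the behaviour seen in the log.
NAIVE = re.compile(
    r"^(?:(?P<remote>[^:/]+):)?"   # optional remote: anything without ':' or '/'
    r"(?P<dataset>[^@]+)"          # dataset name: everything up to an '@'
    r"(?:@(?P<snap>.+))?$"         # optional snapshot name
)

def naive_split(spec):
    """Return (remote, dataset, snap); absent parts are None."""
    m = NAIVE.match(spec)
    if m is None:
        raise ValueError(f"cannot parse {spec!r}")
    return m.group("remote"), m.group("dataset"), m.group("snap")

for spec in (
    "bpool@znapzend-auto-2024-01-08T10:22:13Z",       # root dataset, colons in snap
    "22:13Z",                                         # leftover parsed a second time
    "bpool/ROOT@znapzend-auto-2024-01-08T10:22:13Z",  # child dataset
):
    print(spec, "->", naive_split(spec))
```

In this sketch the first string yields a "remote" of `bpool@znapzend-auto-2024-01-08T10` and a dataset of `22:13Z`, and the second yields remote `22` and dataset `13Z`, matching the two ssh commands in the log. The child dataset is split correctly only because the slash stops the remote match, which would be consistent with the problem showing up for root datasets only; whether the real parser behaves this way for the same reason is an assumption.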

This visibly affects sub createSnapshot {...}, sub destroySnapshots {...} and probably other consumers of sub $splitHostDataSet.

If the "DST" definition remains (per help/man) [[user@]host:]dataset, where dataset is a plain dataset name for DST but may be dsname[@snap] in general parsing (and snap may contain some, but not all, sorts of funny characters), then in some cases we seem to have no good criteria (in a regex or beside it) to tell a user@host prefix apart from part of a dataset@snap string.
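The ambiguity can be shown by enumerating every reading that such a grammar permits. The Python sketch below does this under simplified assumptions: the character classes `USER`, `HOST` and `DSNAME` are rough approximations chosen for illustration (dataset components are assumed to allow letters, digits and `_ . : -`), snapshot names are not validated at all, and `candidate_parses` is a hypothetical helper, not something znapzend provides.

```python
import re

# Simplified character classes; assumptions for illustration only.
USER = re.compile(r"^[A-Za-z0-9._-]+$")
HOST = re.compile(r"^[A-Za-z0-9.-]+$")
DSNAME = re.compile(r"^[A-Za-z0-9_.:-]+(?:/[A-Za-z0-9_.:-]+)*$")

def split_dataset(text):
    """Split 'dsname[@snap]' at the first '@'; None if it does not fit."""
    dsname, sep, snap = text.partition("@")
    if not DSNAME.match(dsname) or (sep and not snap):
        return None
    return dsname, (snap if sep else None)

def candidate_parses(spec):
    """Return every (user, host, dsname, snap) reading of spec."""
    readings = []
    local = split_dataset(spec)          # reading with no remote part
    if local:
        readings.append((None, None) + local)
    for i, ch in enumerate(spec):        # try each ':' as the remote separator
        if ch != ":":
            continue
        remote, rest = spec[:i], spec[i + 1:]
        user, sep, host = remote.rpartition("@")
        if sep and not USER.match(user):
            continue
        if not HOST.match(host):
            continue
        parsed = split_dataset(rest)
        if parsed:
            readings.append((user if sep else None, host) + parsed)
    return readings

for reading in candidate_parses("bpool@znapzend-auto-2024-01-08T10:22:13Z"):
    print(reading)
```

Under these assumptions the snapshot string from the log has two valid readings, a local `bpool` dataset with a snapshot name and a remote `bpool@znapzend-auto-2024-01-08T10` with dataset `22:13Z`, so syntax alone cannot pick the intended one.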

There are various ideas in that thread that can be pursued as separate PRs. This one lays the foundations for such pursuits by adding a few run-time sanity checks (e.g. to avoid destructive actions with bogus values) and some self-test code to gauge the success of different solution attempts. A major helper in this effort is the t/znapzend-lib-splitter.t script, which calls whatever implementations we have in ZFS.pm and runs them against a matrix of known remote, dataset and snapname strings (concatenated into the forms that can be seen in production from configs and ZFS queries), to see whether they get parsed back properly.
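The round-trip idea behind such a test can be sketched in a few lines of Python. This is not the content of t/znapzend-lib-splitter.t (a Perl test not reproduced in this thread): the sample values, the `join_spec` and `round_trip_failures` helpers and the stand-in `naive_split` implementation are all hypothetical.

```python
import itertools
import re

# Hypothetical naive splitter standing in for an implementation under test.
NAIVE = re.compile(
    r"^(?:(?P<remote>[^:/]+):)?(?P<dataset>[^@]+)(?:@(?P<snap>.+))?$"
)

def naive_split(spec):
    m = NAIVE.match(spec)
    return (m.group("remote"), m.group("dataset"), m.group("snap")) if m else None

# Sample matrix; None means the component is absent. Values are illustrative.
REMOTES = [None, "host", "user@host"]
DATASETS = ["bpool", "bpool/ROOT"]
SNAPNAMES = [None, "snap1", "znapzend-auto-2024-01-08T10:22:13Z"]

def join_spec(remote, dataset, snap):
    """Concatenate the parts the way configs and ZFS listings present them."""
    spec = dataset if snap is None else f"{dataset}@{snap}"
    return spec if remote is None else f"{remote}:{spec}"

def round_trip_failures(splitter):
    """Return (spec, expected, got) for each combination parsed back wrongly."""
    failures = []
    for expected in itertools.product(REMOTES, DATASETS, SNAPNAMES):
        spec = join_spec(*expected)
        got = splitter(spec)
        if got != expected:
            failures.append((spec, expected, got))
    return failures

for spec, expected, got in round_trip_failures(naive_split):
    print(f"FAIL {spec}: expected {expected}, got {got}")
```

Passing a different splitter function to `round_trip_failures` gives a quick way to compare candidate fixes against the same matrix; with the naive stand-in, the only failing combination in this sample is the local root dataset with a colon-bearing snapshot name.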

github-actions[bot] commented 5 months ago

@check-spelling-bot Report

Unrecognized words, please review:

Previously acknowledged words that are now absent: aix Autotools bashisms CBuilder Cwd cygwin DBD ev Fcntl fh forkcall gh Gregy gz Ip JB JBERGER LEONT Mkbootstrap nf nh oi Pipely qq qw RCAPUTO README rr rw SUBDIRS SZ Ubuntu ve VOS wu wx xargs xf yy ZL

Some files were automatically ignored. These sample patterns would exclude them:
```
^AUTHORS$
^debian/znapzend\.links\.in$
```
You should consider adding them to:
```
.github/workflows//spelling/excludes.txt
```
File matching is via Perl regular expressions. To check these files, more of their words need to be in the dictionary than not. You can use `patterns.txt` to exclude portions, add items to the dictionary (e.g. by adding them to `allow.txt`), or fix typos.