oetiker / znapzend

zfs backup with remote capabilities and mbuffer integration.
www.znapzend.org
GNU General Public License v3.0
604 stars 136 forks source link

Fix parser for root (only) dataset names #621

Closed jimklimov closed 5 months ago

jimklimov commented 5 months ago

Includes PRs #619 (see for most context of the initial problem) and #620 (so also addresses spellchecker passing after these changes).

After some tinkering with the splitter logic vs. conjured-up test cases, just one case remained where we can only define our preference (at least, lacking some other method arguments or wholly a new class to convey the information explicitly): how to treat X@Y:Z strings - are they a remote user@host:rootds or a local rootds@funny:snapname?

So the self-test defines this one use-case as a "bogus" one with somewhat reasonably "hard-coded" specific expectations:

NOTE: In absence of any hints we can not reliably discern below a task='poolrootfs@snap-2:3' vs. task='username@hostname:poolrootfs' which one is a local pool's root dataset with a funny but legal snapshot name, and which one is a remote user@host spec with a remote pool's root dataset. For practical purposes, we proclaim preference for the former: we are more likely to look at funny local snapshot names, than to back up to (or otherwise care about) remote pools' ROOT datasets.

...and so the self-tests for the parser are enabled as part of standard test runs here.

Also this uncovered deficiencies in ZnapZend::Config variant of the parser which got fixed now by the regex proposed in #585 (and improved upon here for ZnapZend::ZFS). It does not have to care about snapshots, so is a lot simpler.

github-actions[bot] commented 5 months ago

@check-spelling-bot Report

Unrecognized words, please review:

Previously acknowledged words that are now absent aix Autotools bashisms CBuilder Cwd cygwin DBD ev Fcntl fh forkcall gh Gregy gz Ip JB JBERGER LEONT Mkbootstrap nf nh oi Pipely qq qw RCAPUTO README rr rw SUBDIRS SZ Ubuntu ve VOS wu wx xargs xf yy ZL
Some files were were automatically ignored These sample patterns would exclude them: ``` ^AUTHORS$ ^debian/znapzend\.links\.in$ ``` You should consider adding them to: ``` .github/workflows//spelling/excludes.txt ``` File matching is via Perl regular expressions. To check these files, more of their words need to be in the dictionary than not. You can use `patterns.txt` to exclude portions, add items to the dictionary (e.g. by adding them to `allow.txt`), or fix typos.
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands ... in a clone of the [null](null) repository on the `master` branch: ``` update_files() { perl -e ' my @expect_files=qw('".github/workflows//spelling/whitelist.txt"'); @ARGV=@expect_files; my @stale=qw('"$patch_remove"'); my $re=join "|", @stale; my $suffix=".".time(); my $previous=""; sub maybe_unlink { unlink($_[0]) if $_[0]; } while (<>) { if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; } next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print; }; maybe_unlink($previous);' perl -e ' my $new_expect_file=".github/workflows//spelling/whitelist.txt"; use File::Path qw(make_path); use File::Basename qw(dirname); make_path (dirname($new_expect_file)); open FILE, q{<}, $new_expect_file; chomp(my @words = ); close FILE; my @add=qw('"$patch_add"'); my %items; @items{@words} = @words x (1); @items{@add} = @add x (1); @words = sort {lc($a)."-".$a cmp lc($b)."-".$b} keys %items; open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; }; close FILE; system("git", "add", $new_expect_file); ' (cat '.github/workflows//spelling/excludes.txt' - < '.github/workflows//spelling/excludes.txt.temp' && mv '.github/workflows//spelling/excludes.txt.temp' '.github/workflows//spelling/excludes.txt' } comment_json=$(mktemp) curl -L -s -S \ --header "Content-Type: application/json" \ "https://api.github.com/repos/oetiker/znapzend/issues/comments/1881836709" > "$comment_json" comment_body=$(mktemp) jq -r .body < "$comment_json" > $comment_body rm $comment_json patch_remove=$(perl -ne 'next unless s{^(.*)
$}{$1}; print' < "$comment_body") patch_add=$(perl -e '$/=undef; $_=<>; s{
.*}{}s; s{^#.*}{}; s{\n##.*}{}; s{(?:^|\n)\s*\*}{}g; s{\s+}{ }g; print' < "$comment_body") should_exclude_patterns=$(perl -e '$/=undef; $_=<>; exit unless s{(?:You should consider excluding directory paths|You should consider adding them to).*}{}s; s{.*These sample patterns would exclude them:}{}s; s{.*\`\`\`([^`]*)\`\`\`.*}{$1}m; print' < "$comment_body" | grep . || true) update_files rm $comment_body git add -u ```
oetiker commented 5 months ago

how about assuming case user@host:destds unless user actually is a local rootds ?

jimklimov commented 5 months ago

how about assuming case user@host:destds unless user actually is a local rootds ?

For practical purposes, we proclaim preference for the former: we are more likely to look at funny local snapshot names, than to back up to (or otherwise care about) remote pools' ROOT datasets.

Yep, so we declare that user@host:destds means local pool named user (and its root dataset), and its snapshot host:destds.

oetiker commented 5 months ago

what I mean is that the code could actually test(!) if a given string actually is a local root dataset

jimklimov commented 5 months ago

Ah, must have misread your earlier comment. Yes, I've had such thoughts too - just did not get to implementing them :)

Instead, wondered whether we should be concerned about:

On the opposite side, where can such string with user@host:rootds meaning a remote user at a remote host pop up? Only in DST schedules, or something else? We can just document that backing up to root datasets of remote pools is not supported, a child dataset tree for this backed-up environment is recommended (so the same target can store backups of other systems), and that's it... right?

oetiker commented 5 months ago

you are right this might be fragile ... another idea is this:

we know the actual src and destination configurations ... they do not include snapshots, so at that time things are still clear ...

so I guess we just need to alter the parsing flow

jimklimov commented 5 months ago

I guess we just need to alter the parsing flow

As noted earlier, for the least intrusive change, I am currently inclined in favor of further arguments (caller knows if the context deals with possible snapshots or strictly "live" dataset names), maybe wrapped as separate method names for the two use-cases and minimal syntax changes for callers.

Not sure if such refinement has to be part of this PR, or if you deem this one as an improvement with its own merit already (less-broken state of codebase) and it can be merged as a stepping-block to someone (else) making and diligently testing that change. Coding the change does not feel that hard, but testing and chizeling would likely need more time than I could currently share.

jimklimov commented 5 months ago

It may also be worthwhile to use ZnapZend::Config definitions of the splitting method for strings coming from the config, as opposed to those generally juggled by ZnapZend::ZFS. OOP-wise, these strings reflect sort of different object classes. But other than such an idea, I have not much more about implementing something of the sort.

jimklimov commented 5 months ago

Cheers, is the update okay? :)