jimklimov commented 5 months ago

Includes PRs #619 (see it for most of the context of the initial problem) and #620 (so it also gets the spellchecker passing after these changes).

After some tinkering with the splitter logic against conjured-up test cases, just one case remained where we can only declare a preference (at least without extra method arguments or a whole new class to convey the information explicitly): how to treat X@Y:Z strings. Are they a remote user@host:rootds or a local rootds@funny:snapname?

So the self-test defines this one use-case as a "bogus" one with somewhat reasonably "hard-coded" specific expectations:

NOTE: In absence of any hints we can not reliably discern below a task='poolrootfs@snap-2:3' vs. task='username@hostname:poolrootfs' which one is a local pool's root dataset with a funny but legal snapshot name, and which one is a remote user@host spec with a remote pool's root dataset. For practical purposes, we proclaim preference for the former: we are more likely to look at funny local snapshot names, than to back up to (or otherwise care about) remote pools' ROOT datasets.

...and so the self-tests for the parser are enabled as part of standard test runs here.
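To make the ambiguity concrete, here is a minimal Python sketch (not the project's Perl code) that lists every plausible reading of a task string. The function name `readings`, the regex, and the simplifying assumption that a local pool or dataset name contains no `:` are all hypothetical and only serve the illustration.

```python
import re
from typing import List, Optional, Tuple

# (remote, dataset, snapshot); remote is None for a local reading.
Reading = Tuple[Optional[str], str, Optional[str]]

# Hypothetical pattern: optional "user@", then "host", then ":" and the rest.
_REMOTE = re.compile(r"^((?:[^@:/]+@)?[^@:/]+):(.+)$")


def readings(task: str) -> List[Reading]:
    """Return every plausible reading of task, the local one first."""
    found: List[Reading] = []

    # Local reading: dataset[@snapshot]. Assumption for this sketch only:
    # the dataset part has no ":" (real ZFS names may contain one).
    dataset, sep, snap = task.partition("@")
    if ":" not in dataset and "@" not in snap:
        found.append((None, dataset, snap if sep else None))

    # Remote reading: [user@]host:dataset[@snapshot].
    match = _REMOTE.match(task)
    if match:
        remote, rest = match.groups()
        dataset, sep, snap = rest.partition("@")
        if "@" not in snap:
            found.append((remote, dataset, snap if sep else None))

    return found
```

With this sketch, `username@hostname:poolrootfs` and `poolrootfs@snap-2:3` each yield two readings, while `pool/ds@snap:1` yields only the local one. The preference declared in the NOTE above corresponds to taking the first (local) entry.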

This also uncovered deficiencies in the ZnapZend::Config variant of the parser, now fixed with the regex proposed in #585 (and improved upon here for ZnapZend::ZFS). The Config variant does not have to care about snapshots, so it is a lot simpler.

github-actions[bot] commented 5 months ago

@check-spelling-bot Report

Unrecognized words, please review:

a'u
aaaa
aaaf
aab
aabee
aabf
aac
aacba
aacc
aacea
aacf
aad
aaddeab
aadf
aae
aaea
aaeaececace
aaec
aaed
aaf
aafd
aaff
abbae
abcc
abcd
abdbae
abe
abeda
abee
abf
abfc
abff
aca
acaa
acaafa
acab
acaf
acb
acbba
acbce
acbef
accabf
accb
accbf
accd
acd
acdd
acddf
acea
acebe
acecd
acf
ada
adab
adacfa
adb
adbaa
adbb
adbc
adbe
adcbbd
adcce
addc
addes
adedd
adeea
adf
adfd
aea
aeaba
aeabccdf
aeac
aeaefea
aeb
aebcb
aebdf
aec
aeca
aecf
aed
aeddafc
aee
aeeecb
aef
aefb
aefbba
afad
afafb
afafba
afbb
afbd
afc
afcabf
afcb
afdcfd
afe
affaff
albundy
amd
amoser
antoneliasson
Aqs
arglist
asciidoc
assignements
atime
atj
autocreation
baaa
babc
bace
badb
baf
bafdc
bafe
baffde
bashism
bashrc
bba
bbac
bbb
bbba
bbbdf
bbbe
bbc
bbcb
bbce
bbd
bbdb
bbdc
bbdeb
bbe
bbeb
bbf
bbfc
bbfe
bbff
bca
bcabc
bcadf
bcaf
bcafdd
bcb
bcc
bccaed
bccb
bcccb
bccd
bcda
bcdd
bceb
bcecb
bcee
bcfccc
bda
bdaa
bdacb
bdaebf
bdaf
bdb
bdbbf
bdc
bdce
bdcf
bdd
bdda
bddcf
bddd
bdeff
bdf
bdfcfe
bdfd
bdfdd
bdfece
bea
beabbdd
beb
bebb
beced
bedbf
bedc
beddf
beeb
beec
beed
befa
befbaa
Benter
bentertain
bfa
bfab
bfad
bfb
bfbb
bfbf
bfbfc
bfc
bfca
bfcc
bfce
bfd
bfdfe
bfdff
bfe
bff
bffe
bfff
blockquotes
booleanish
bossert
buffersize
bugfix
bugfixes
buildpackage
bulletpoints
caa
caab
cabf
cac
cacb
cacbb
cadb
caddb
cae
caebfe
caee
cafc
cafcfab
cafdd
cafddfe
cafeccaeb
cange
canmount
cba
cbaac
cbae
cbaec
cbb
cbbd
cbbe
cbc
cbd
cbdc
cbdd
cbe
cbea
cbeb
cbee
cbefa
cbefc
cbfc
cbfd
cbfe
cca
ccab
ccac
ccae
ccba
ccbe
ccc
cccd
cccf
ccd
ccdb
ccdc
ccdcc
ccdfd
cce
cced
ccf
ccff
ccffef
cda
cdaaa
cdab
cdad
cdaeb
cdb
cdbcdeea
cdbfe
cdc
cdcb
cdce
cdd
cdda
cddb
cddd
cdde
cddf
cde
cdec
cdee
cdefe
cdfaafc
cdfe
cdff
cdffaed
cea
ceaf
ceafe
ceb
cec
ced
cedde
ceeb
ceebab
ceecf
ceef
cef
cefae
cefba
cfa
cfabe
cfafde
cfb
cfbd
cfbf
cfc
cfcbcfd
cfe
cfed
cff
cffc
cfff
changelog
chrigel
chrisridd
christo
chroot
cmdfail
cmds
codebase
concating
conmplete
coprs
copypasted
coredumps
cpanminus
crfl
crlf
cron
CVS
daa
daad
daae
daba
dabc
dabf
dac
daccb
dadb
dadc
daded
dadfbd
daed
daf
dafb
dafeeee
datarootdir
datastream
dba
dbab
dbacc
dbacf
dbad
dbb
dbba
dbbb
dbbff
dbc
dbca
dbcb
dbcbcc
dbce
dbdbe
dbdd
dbe
dbecd
dbf
dbfa
dbfb
dbfcdacf
dcaa
dcac
dcafee
dcbe
dcc
dccba
dccc
dcccbb
dccdcc
dccf
dcd
dcda
dcdaa
dcdc
dcdcae
dce
dceb
dcebde
dcee
dcf
dcfcdf
ddad
ddae
ddb
ddc
ddcc
ddcf
ddd
dddacf
dddb
dddd
ddddfef
dddf
dde
ddeb
ddec
ddf
ddfaddc
ddfd
ddfe
deaa
deaab
deac
debb
debd
debhelper
decf
decfcd
ded
dede
dedup
deeaefda
defbf
defcf
deffd
definedness
dependecy
dependeny
deref
destorying
dfa
dfaa
dfacff
dfb
dfbde
dfc
dfca
dfcd
dfd
dfda
dfde
dfe
dfeba
dfec
dfed
dfee
dff
dffd
dglushenok
dickson
dirs
DISTRIBUTIONNAME
docu
dominik
domnik
dpkg
dse
DTDs
Dungen
dylan
eaa
eaab
eab
eabe
eabedbaeacc
eac
eacaf
eade
eae
eaedb
eaf
eafae
eafd
eafff
eba
ebad
ebbad
ebbb
ebbc
ebbf
ebbfe
ebc
ebca
ebcb
ebd
ebdbcfc
ebdd
ebe
ebea
ebebe
ebf
ebfe
eca
ecaac
ecae
ecbc
ecc
eccc
ecd
ecdceef
ecde
ecdsa
ece
eced
eceeb
ecf
eda
edaad
edac
edaf
edb
edbaac
edbc
edbd
edbe
edc
edcfa
edd
eddac
eddec
eddfe
ede
edeba
edec
ededad
edede
edef
edf
edfa
edffb
edffd
edouard
edu
eea
eeabba
eeac
eead
eeaf
eeb
eebae
eec
eeca
eecaa
eecbcccc
eed
eedb
eedbe
eede
eee
eeea
eeebfcc
eeefc
eef
eefb
efa
efaa
efac
efadde
efaf
efb
efbade
efbd
efbdc
efbf
efc
efcbf
efcd
efd
efdd
efddbf
efe
efebb
effc
effec
Eliasson
Elmar
ENOMEM
environement
eol
epruesse
erroring
extist
faa
faab
faac
faad
fabbc
fabd
fabdbf
fabe
fabeffc
faca
fadfb
fadfde
faeb
faecf
faef
faf
fafd
failsafes
fba
fbae
fbb
fbba
fbbb
fbc
fbca
fbcb
fbcf
fbd
fbde
fbdffce
fbe
fbea
fbed
fbf
fbfc
fbfcaa
fca
fcaba
fcb
fcbee
fcbf
fcc
fccb
fccd
fcd
fcddf
fce
fceba
fcec
fcef
fcf
fcff
fda
fdadb
fdadf
fdae
fdb
fdbbc
fdbd
fdc
fdcbeca
fdcc
fdd
fddc
fddccfab
fddea
fddfda
fde
fdeb
fdeccfcf
fdef
fdefcb
fdf
fdfb
fdfbf
fdfc
fea
fead
feaf
feb
fecbcc
fedc
fedd
fede
fedoraproject
feeaf
feedad
feeeeb
feefb
fefd
fefdcbfa
ffaee
ffb
ffbf
ffbfa
ffc
ffca
ffcc
ffcdc
ffd
ffdb
ffde
ffdea
ffdf
ffdfba
ffe
ffea
ffed
fff
fffaf
fffcdbef
fffd
fffe
flaged
flixman
freenode
FRONTEND
frubar
generatable
ghanima
grantwwu
greggbg
griffith
guyz
HAARG
hardcoded
hardcoding
hashpointers
haystask
healthian
homedir
hotfix
howtogeek
I'u
ico
iki
ilm
implem
incrementals
informatique
inhmode
initialising
instanciating
invokations
irc
issuecomment
jamesmarsh
jenkins
jimkilmov
JMo
jsoref
justinscholz
karssen
kauffman
keygen
kngnt
Kuzmarski
lauri
lckarssen
Lennart
leoj
logbias
lotheac
machanics
Makefiles
malc
manpade
manuel
metaworx
morphsen
nahall
nameing
noreply
Nyman
Oostendorp
oss
OSX
parseable
Phlogi
png
pobox
polyomica
poolname
prebuild
previosuly
primarycache
Proxmox
Pruesse
pullrequests
pulsewidth
rageltman
rbash
rczei
READNE
refactor
refquota
refreservation
regen
regexes
regexp
remotehost
renard
repoen
respinn
Ridd
rsa
rsync
rueegg
rwilkey
schould
secondarycache
sempervictus
shapshot
shaun
shess
shlibs
simplifie
smv
snaptime
snyman
softprops
somecommand
Soref
spashot
spellcheck
spellchecker
spikings
stringify
subdir
subfolder
substvars
supress
svn
sylvain
symlinking
syslogstyle
testbird
testmode
Thu
timewarp
Tirkkonen
tisc
tiscarabee
trialen
truobleshooting
tuxera
uchicago
unneccessary
usecase
useradd
usermod
whitelist
Wiedenroth
wiedi
wip
workaround
wouter
xtrue
yandex
Zends
zet
zfsonlinux
zie
znapdest
znapzendztats
ztatz

Previously acknowledged words that are now absent

aix Autotools bashisms CBuilder Cwd cygwin DBD ev Fcntl fh forkcall gh Gregy gz Ip JB JBERGER LEONT Mkbootstrap nf nh oi Pipely qq qw RCAPUTO README rr rw SUBDIRS SZ Ubuntu ve VOS wu wx xargs xf yy ZL

Some files were automatically ignored

These sample patterns would exclude them:

```
^AUTHORS$
^debian/znapzend\.links\.in$
```

You should consider adding them to:

```
.github/workflows//spelling/excludes.txt
```

File matching is via Perl regular expressions. To check these files, more of their words need to be in the dictionary than not. You can use `patterns.txt` to exclude portions, add items to the dictionary (e.g. by adding them to `allow.txt`), or fix typos.

To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands

... in a clone of the [null](null) repository on the `master` branch: ``` update_files() { perl -e ' my @expect_files=qw('".github/workflows//spelling/whitelist.txt"'); @ARGV=@expect_files; my @stale=qw('"$patch_remove"'); my $re=join "|", @stale; my $suffix=".".time(); my $previous=""; sub maybe_unlink { unlink($_[0]) if $_[0]; } while (<>) { if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; } next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print; }; maybe_unlink($previous);' perl -e ' my $new_expect_file=".github/workflows//spelling/whitelist.txt"; use File::Path qw(make_path); use File::Basename qw(dirname); make_path (dirname($new_expect_file)); open FILE, q{<}, $new_expect_file; chomp(my @words = ); close FILE; my @add=qw('"$patch_add"'); my %items; @items{@words} = @words x (1); @items{@add} = @add x (1); @words = sort {lc($a)."-".$a cmp lc($b)."-".$b} keys %items; open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; }; close FILE; system("git", "add", $new_expect_file); ' (cat '.github/workflows//spelling/excludes.txt' - < '.github/workflows//spelling/excludes.txt.temp' && mv '.github/workflows//spelling/excludes.txt.temp' '.github/workflows//spelling/excludes.txt' } comment_json=$(mktemp) curl -L -s -S \ --header "Content-Type: application/json" \ "https://api.github.com/repos/oetiker/znapzend/issues/comments/1881836709" > "$comment_json" comment_body=$(mktemp) jq -r .body < "$comment_json" > $comment_body rm $comment_json patch_remove=$(perl -ne 'next unless s{^(.*)

$}{$1}; print' < "$comment_body") patch_add=$(perl -e '$/=undef; $_=<>; s{

.*}{}s; s{^#.*}{}; s{\n##.*}{}; s{(?:^|\n)\s*\*}{}g; s{\s+}{ }g; print' < "$comment_body") should_exclude_patterns=$(perl -e '$/=undef; $_=<>; exit unless s{(?:You should consider excluding directory paths|You should consider adding them to).*}{}s; s{.*These sample patterns would exclude them:}{}s; s{.*\`\`\`([^`]*)\`\`\`.*}{$1}m; print' < "$comment_body" | grep . || true) update_files rm $comment_body git add -u ```

oetiker commented 5 months ago

How about assuming the user@host:destds case unless user actually is a local rootds?

jimklimov commented 5 months ago

> How about assuming the user@host:destds case unless user actually is a local rootds?

> For practical purposes, we proclaim preference for the former: we are more likely to look at funny local snapshot names, than to back up to (or otherwise care about) remote pools' ROOT datasets.

Yep, so we declare that user@host:destds means a local pool named user (and its root dataset) with a snapshot named host:destds.

oetiker commented 5 months ago

what I mean is that the code could actually test(!) if a given string actually is a local root dataset

jimklimov commented 5 months ago

Ah, must have misread your earlier comment. Yes, I've had such thoughts too - just did not get to implementing them :)

Instead, I wondered whether we should be concerned about:

- reliability:
  - do we have some context where such a pool does not currently exist, so we cannot reliably check whether it is a pool?
  - e.g. a pluggable HDD/flash seems likely; rotation of several normally-vaulted media devices is a good practice for redundancy
  - pre-emptive setup to create a pool (more so via znapzend) seems unlikely
  - we can then check if the host part can be resolved, but for systems with intermittent networking (VPN?) this may also not be bullet-proof
- a chicken-and-egg problem:
  - our generic ZFS calls IIRC use this helper method to decide if they should go over SSH :)
  - should we go for a custom call-out to a local zfs tool and/or special tweaks to the common method?
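As a rough illustration of the "test whether X is a local pool" idea and of the caveats listed above, here is a hedged Python sketch (not the project's Perl code). The helper names are hypothetical. It calls `zpool list -H -o name` directly instead of going through a generic wrapper, which sidesteps the chicken-and-egg concern, and it accepts an injected pool list so the decision logic can be exercised without ZFS installed.

```python
import subprocess
from typing import Iterable, Optional, Set


def imported_pool_names() -> Set[str]:
    """Names of currently imported pools, one per output line."""
    result = subprocess.run(
        ["zpool", "list", "-H", "-o", "name"],
        capture_output=True, text=True, check=True,
    )
    return {line for line in result.stdout.splitlines() if line}


def prefix_is_local_pool(spec: str,
                         pools: Optional[Iterable[str]] = None) -> bool:
    """True when the text before the first '@' names an imported pool.

    Caveat from the discussion: a pool on removable media that is not
    imported right now is reported as absent, so a False result does
    not prove that the prefix is a remote user name.
    """
    head, sep, _tail = spec.partition("@")
    if not sep:
        return False  # no '@' at all, so nothing to disambiguate
    known = set(pools) if pools is not None else imported_pool_names()
    return head in known
```

For example, with `pools=["tank", "rpool"]` the spec `tank@host:backup` would be read as a local snapshot, and `user@host:backup` would fall through to the remote reading.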

On the other hand, where can such a string, with user@host:rootds meaning a remote user at a remote host, pop up? Only in DST schedules, or somewhere else? We can just document that backing up to root datasets of remote pools is not supported, that a child dataset tree for this backed-up environment is recommended (so the same target can store backups of other systems), and that's it... right?

oetiker commented 5 months ago

you are right this might be fragile ... another idea is this:

we know the actual src and destination configurations ... they do not include snapshots, so at that time things are still clear ...

so I guess we just need to alter the parsing flow

jimklimov commented 5 months ago

> I guess we just need to alter the parsing flow

As noted earlier, for the least intrusive change, I am currently inclined in favor of further arguments (caller knows if the context deals with possible snapshots or strictly "live" dataset names), maybe wrapped as separate method names for the two use-cases and minimal syntax changes for callers.
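A minimal Python sketch of that idea (not the project's Perl code): one shared helper takes a flag for whether snapshots are possible in the caller's context, and two thin wrappers keep call sites simple. All names and the regex are hypothetical, and the tie-break for X@Y:Z follows the preference stated in this PR.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: optional "user@", then "host", then ":" and the rest.
_REMOTE = re.compile(r"^((?:[^@:/]+@)?[^@:/]+):(.+)$")


def _split(spec: str, may_have_snapshot: bool) -> Tuple[Optional[str], str]:
    """Return (remote, dataset_spec); remote is None for local specs."""
    match = _REMOTE.match(spec)
    if match is None:
        return None, spec
    remote, rest = match.groups()
    # Ambiguous X@Y:Z shape: only when snapshots are possible do we
    # prefer the local "rootds@funny:snapname" reading.
    if may_have_snapshot and "@" in remote and "/" not in rest and "@" not in rest:
        return None, spec
    return remote, rest


def split_live_dataset(spec: str) -> Tuple[Optional[str], str]:
    """For callers that only handle live dataset names (e.g. config values)."""
    return _split(spec, may_have_snapshot=False)


def split_dataset_or_snapshot(spec: str) -> Tuple[Optional[str], str]:
    """For callers whose strings may name snapshots."""
    return _split(spec, may_have_snapshot=True)
```

Under these assumptions, `split_live_dataset("username@hostname:poolrootfs")` yields the remote reading, while `split_dataset_or_snapshot` on the same string keeps it local.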

Not sure if such a refinement has to be part of this PR, or if you deem this one an improvement with its own merit already (a less-broken state of the codebase), so that it can be merged as a stepping stone for someone (else) making and diligently testing that change. Coding the change does not feel that hard, but testing and chiseling would likely need more time than I could currently share.

jimklimov commented 5 months ago

It may also be worthwhile to use the ZnapZend::Config definition of the splitting method for strings coming from the config, as opposed to those generally juggled by ZnapZend::ZFS. OOP-wise, these strings reflect somewhat different object classes. But beyond this idea, I do not have much more to offer on implementing something of the sort.

jimklimov commented 5 months ago

Cheers, is the update okay? :)

oetiker / znapzend

Fix parser for root (only) dataset names #621
