oetiker / znapzend

zfs backup with remote capabilities and mbuffer integration.
www.znapzend.org
GNU General Public License v3.0

Problem: failed send cancels cleanup, hard to debug/troubleshoot #447

Closed (jimklimov closed this 4 years ago)

jimklimov commented 4 years ago

Solution: at the end, repeat the warnings/errors that were logged when a send was flagged as failed, to help find the needles in the haystack. The previously boolean-ish flag $sendFailed is reused as an array of such messages, @sendFailed (usually one line per sending error).
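
For readers unfamiliar with the idiom, here is a minimal sketch of turning a boolean-ish flag into an array of messages. It is not the actual znapzend code: `try_send` and the destination list are hypothetical stand-ins, and only the names `$sendFailed`/`@sendFailed` and the message text come from this thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for one send attempt (not znapzend's real code):
# returns an error message on failure, or undef on success.
sub try_send {
    my ($dst, $online) = @_;
    return undef if $online;
    return "destination '$dst' does not exist or is offline. ignoring it for this round...";
}

# Previously a boolean-ish $sendFailed; now one message per failed send.
my @sendFailed;

# Made-up destinations: [name, reachable?]
my @destinations = (
    ['backup/tank/anothersource', 0],
    ['backup/ok',                 1],
    ['backup/destfail',           0],
);

for my $dst (@destinations) {
    my $err = try_send(@$dst);
    push @sendFailed, $err if defined $err;
}

# A non-empty array is true in boolean context, so checks that used to
# read "if ($sendFailed)" carry over; scalar() yields the failure count.
if (@sendFailed) {
    printf "%d send task(s) failed\n", scalar @sendFailed;
}
```

The array keeps the old "did anything fail?" semantics while also retaining the text needed for a summary later.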

Signed-off-by: Jim Klimov <jimklimov@gmail.com>

coveralls commented 4 years ago


Coverage increased (+1.09%) to 89.822% when pulling 6dbc1e1c26b9473a40fd19bb1e8cbeb397908db4 on jimklimov:troubleshoot-failed-send into aed9c61325103b377a7f4996b612586a190d02b8 on oetiker:master.

jimklimov commented 4 years ago

The report (as seen in Travis at least) resulted in strange markup:

[Thu Oct 24 12:05:38 2019] [warn] ERROR: suspending cleanup source dataset because 3 send task(s) failed:\ndestination 'backup/tank/anothersource' does not exist or is offline. ignoring it for this round...\n\tdestination 'root@remote:remote/tank/anothersource' does not exist or is offline. ignoring it for this round...\n\tdestination 'backup/destfail' does not exist or is offline. ignoring it for this round...\n

probably some qw inside? Should I change this to a loop of warn() calls for each array element?

jimklimov commented 4 years ago

Ok, with a loop it looks better:

[Thu Oct 24 12:54:26 2019] [warn] ERROR: suspending cleanup source dataset because 3 send task(s) failed:
[Thu Oct 24 12:54:26 2019] [warn]  +-->   destination 'backup/tank/anothersource' does not exist or is offline. ignoring it for this round...
[Thu Oct 24 12:54:26 2019] [warn]  +-->   destination 'root@remote:remote/tank/anothersource' does not exist or is offline. ignoring it for this round...
[Thu Oct 24 12:54:26 2019] [warn]  +-->   destination 'backup/destfail' does not exist or is offline. ignoring it for this round...
[Thu Oct 24 12:54:26 2019] [info] done with backupset tank/anothersource in 1 seconds

or (hey, typo!)

[Thu Oct 24 12:54:41 2019] [warn] ERROR: suspending cleanup source dataset because 2 send task(s) failed:
[Thu Oct 24 12:54:41 2019] [warn]  +-->   skipping backup/destinationdue to pre-command failure
[Thu Oct 24 12:54:41 2019] [warn]  +-->   destination 'backup/destfail' does not exist or is offline. ignoring it for this round...

or

[Thu Oct 24 12:55:01 2019] [warn] ERROR: suspending cleanup source dataset because 2 send task(s) failed:
[Thu Oct 24 12:55:01 2019] [warn]  +-->   ERROR: cannot send snapshots to backup/destination
[Thu Oct 24 12:55:01 2019] [warn]  +-->   destination 'backup/destfail' does not exist or is offline. ignoring it for this round...
[Thu Oct 24 12:55:01 2019] [info] done with backupset tank/source in 3 seconds
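
The per-message layout in the logs above could be produced by a loop roughly like the following. This is only a sketch under assumptions: `failure_report` is a hypothetical helper, and Perl's built-in `warn` stands in for whatever logger znapzend actually calls (which is what adds the timestamp and `[warn]` prefix). The header text and the ` +-->` prefix are copied from the log output in this thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns one string per intended log call,
# a header line followed by one " +-->" line per failure message.
sub failure_report {
    my (@sendFailed) = @_;
    return () unless @sendFailed;
    my @lines = (
        'ERROR: suspending cleanup source dataset because '
            . scalar(@sendFailed) . ' send task(s) failed:'
    );
    push @lines, " +-->   $_" for @sendFailed;
    return @lines;
}

my @sendFailed = (
    'ERROR: cannot send snapshots to backup/destination',
    "destination 'backup/destfail' does not exist or is offline. ignoring it for this round...",
);

# One call per line, so no embedded newline has to survive the logger.
# The trailing "\n" stops warn from appending "at ... line N.".
warn "$_\n" for failure_report(@sendFailed);
```

Emitting each element separately sidesteps the question of how the logger treats embedded newlines and tabs, at the cost of a few more log calls.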

jimklimov commented 4 years ago

Cleaned up (hopefully) the rebase of many masters and PR iterations of the day... now.

jimklimov commented 4 years ago

I guess soon it will be hard for PRs to not break coverage percentage :-)

oetiker commented 4 years ago

thanks

jimklimov commented 4 years ago

Welcome! :)