oetiker / znapzend

zfs backup with remote capabilities and mbuffer integration.
www.znapzend.org
GNU General Public License v3.0

Optionally email the admin if a send task failed and we suspend the source cleanup #499

Closed: jimklimov closed this 3 years ago

jimklimov commented 4 years ago

A recurring problem at one of the sites I help manage is that, for whatever reason, they occasionally lose the connection to the NAS that hosts a backup pool, so on the server side this pool becomes read-only. Recovery depends on the situation, but it is something an admin has to do anyway. Until then, regular snapshots of the original system keep accumulating on the source storage until ZFS performance collapses because the scarce remaining free space gets fragmented. One practical way out is to learn about the problem in time, and untangle the issues with the destination pool before the overfilled source pool impacts the users. Of course there may be proper reactive monitoring setups, syslog diggers and the like, but the subsystem that actually reports the problem (and in a way causes it) is also a good place to proactively inform the admins.

Earlier I extended the "cleanup source" error handling to (re-)list all problematic datasets in one place when we skip the cleanup (rather than just stating that we had a problem), with the text ending up on the console or in syslog; see https://github.com/oetiker/znapzend/blame/1caafdbb48b807ab8d4c2539f7987ea0477af9c5/lib/ZnapZend.pm#L458 . So for me the "obvious idea" is to add an optional CLI argument for an admin e-mail address and, if that is configured, try to use the system mailx or a similar CLI utility to submit essentially the same text. It is then up to the local mail submission program and relaying setup (outside znapzend's concerns and complexity) to deliver it if it should end up in a mailbox on another host. In other words, I do not want to teach znapzend to bother with networked SMTP, AUTH, STARTTLS and whatever else comes with e-mail these days.
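For illustration only, here is a minimal Python sketch of the proposed flow: collect the datasets whose source cleanup was skipped, format the same kind of summary text, and hand it to a local mail submission program. It assumes a POSIX-style `mailx` that takes `-s subject recipient` and reads the message body from stdin. znapzend itself is written in Perl, so the helper names `build_alert` and `mail_admin`, the message wording and the `mailer` parameter are hypothetical and are not znapzend's actual options or code.

```python
import subprocess
import sys
from typing import Sequence, Tuple

# Hypothetical helpers sketching the idea; not part of znapzend.


def build_alert(host: str, failed_datasets: Sequence[str]) -> Tuple[str, str]:
    """Return (subject, body) listing datasets whose source cleanup was skipped."""
    subject = f"znapzend on {host}: send failed, source cleanup suspended"
    lines = [
        "Sending to a destination failed for the datasets below, so their",
        "source snapshots were NOT cleaned up and will keep accumulating:",
        "",
    ]
    lines += [f"  {ds}" for ds in failed_datasets]
    return subject, "\n".join(lines) + "\n"


def mail_admin(recipient: str, subject: str, body: str,
               mailer: Sequence[str] = ("mailx",)) -> bool:
    """Pipe the body to a local mailer; relaying is the mailer's job.

    Returns True if the mailer exited with status 0. Failures are reported
    on stderr but never raised, so a broken mail setup cannot abort the
    backup run itself.
    """
    cmd = [*mailer, "-s", subject, recipient]
    try:
        result = subprocess.run(cmd, input=body, text=True,
                                capture_output=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # e.g. FileNotFoundError when no mailx is installed
        print(f"could not run mailer {mailer[0]!r}: {exc}", file=sys.stderr)
        return False
    return result.returncode == 0
```

A caller would only invoke this when the (hypothetical) admin e-mail option is set, e.g. `subject, body = build_alert("nas1", ["pool/home"])` followed by `mail_admin("root@localhost", subject, body)`. Keeping SMTP, AUTH and STARTTLS out of the tool and delegating to the local mailer matches the intent stated above.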

Are there any apparent issues with such a plan?

jimklimov commented 4 years ago

I saw #221 after posting this suggestion, but after re-reading that PR I am not sure this would be a duplicate :)

PR #221 seems to add the ability to log everything emitted to a file or pipe, as is already done for the console (stderr/stdout) or syslog, whereas #499 is about alerting when a send fails and znapzend therefore refuses to clean up the origin (which may lead to it getting too full).

jimklimov commented 3 years ago

Goal achieved by the PR above (and a follow-up fix in #524) :)