oetiker / znapzend

zfs backup with remote capabilities and mbuffer integration.
www.znapzend.org
GNU General Public License v3.0

mbuffer watchdog timer too short #341

Closed (nahall closed this issue 6 years ago)

nahall commented 6 years ago

Recently, the server that znapzend sends my snapshots to over ssh was down for a few days. After it came back online, its snapshots stayed behind, even several days later.

I ran znapzend with the "-d" option and found in the logs:

mbuffer: error: watchdog timeout: output stalled; sending SIGINT
Assertion failed: err == 0, file mbuffer.c, line 719
mbuffer: error: watchdog timeout: output stalled; sending SIGINT
mbuffer: error: watchdog timeout: output stalled; sending SIGINT
mbuffer: error: watchdog timeout: output stalled; sending SIGINT
mbuffer: error: watchdog timeout: output stalled; sending SIGINT
mbuffer: warning: error during output to <stdout>: canceled
cannot receive incremental stream: checksum mismatch or incomplete stream
cannot receive: failed to read from stream
warning: cannot send 'zones/data/backups@2018-04-24-000500': signal received
warning: cannot send 'zones/data/backups@2018-04-24-001000': Broken pipe
....

While it was running I could see snapshots coming over, but mbuffer's watchdog timer kept triggering and cancelling the data stream. I edited znapzend's code to change the "-W" (watchdog timeout) option passed to mbuffer from 60 to 600, which gave it enough time to complete the snapshot transfers and get caught up.
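
To illustrate the change described above, here is a minimal Python sketch of building an mbuffer command line with a configurable watchdog timeout. This is not znapzend's code (znapzend is written in Perl); the helper name `mbuffer_command`, its parameters, and the default block and buffer sizes are hypothetical. It assumes mbuffer's watchdog flag is spelled `-W` and that its value is in seconds.

```python
import shlex


def mbuffer_command(watchdog_seconds=600, block_size="128k", buffer_size="1G"):
    """Build an mbuffer argument list (hypothetical helper, not znapzend code)."""
    if watchdog_seconds <= 0:
        raise ValueError("watchdog_seconds must be positive")
    return [
        "mbuffer",
        "-q",                         # quiet: suppress the status display
        "-s", block_size,             # block size (assumed default)
        "-W", str(watchdog_seconds),  # watchdog timeout, assumed to be seconds
        "-m", buffer_size,            # total buffer memory (assumed default)
    ]


# The old behaviour (60) versus the longer timeout that let the transfer finish (600).
print(shlex.join(mbuffer_command(watchdog_seconds=60)))
print(shlex.join(mbuffer_command(watchdog_seconds=600)))
```

Keeping the timeout as a parameter rather than a hard-coded constant would let a slow or recently recovered receiver be accommodated without editing the source.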

oetiker commented 6 years ago

neat ... could you make this into a PR, please?

nahall commented 6 years ago

Sure, here is the PR: https://github.com/oetiker/znapzend/pull/342