psy0rz / zfs_autobackup

ZFS autobackup is used to periodicly backup ZFS filesystems to other locations. Easy to use and very reliable.
https://github.com/psy0rz/zfs_autobackup
GNU General Public License v3.0
601 stars 63 forks source link

bug: integer overflow in fail_count signals backup success in spite of fatal errors #153

Closed oddlama closed 2 years ago

oddlama commented 2 years ago

Due to a bad remote pool configuration I recently had the pleasure of failing an autobackup of exactly 256 datasets. This resulted in fail_count = 256 in ZfsAutobackup. When this was passed to sys.exit() it is equivalent to sys.exit(0) due to an integer overflow (exit codes in linux are stored as a uint8_t).

I'd argue against returning the number of failed snapshots via the exit code, as this will report wrong information if many datasets are involved (anything >=255 seems problematic). It might be better to present parseable information in the output in such a case and dedicate an exit code (2 ?) for the general failure condition.

psy0rz commented 2 years ago

thanks! ill fix it right away. usually i send the exit code to my zabbix server.

errors are send to stderr: so when everything goes perfectly there should be no output, maybe you can use that?

oddlama commented 2 years ago

Thanks for the swift response and fix! Don't worry about my specific case, that was a one-off error caused by me anyway. I was just wondering why the error didn't trigger an email from the source host (which I send based on failed exit code), I personally don't actually need to know the amount of failed datasets. Still glad that it's fixed now :)

psy0rz commented 2 years ago

I know, but i treat it as very a serious bug: an exit 0 should be 100% reliable.

Correct and reliable error handling in zfs-autobackup has the highest priority, as it should be in a backup tool.