Closed saurabhnanda closed 3 years ago
I would suggest monitoring the destination side with a simple bash script that lists all ZFS datasets and determines the latest snapshot that exists for each. Then compare its age with the maximum age allowed and warn you if needed.
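A minimal sketch of such a check, assuming `zfs` is available on the destination host (the `MAX_AGE_SECONDS` threshold and the awk-based parsing are illustrative, not prescriptive; it relies on `zfs list -p` printing the `creation` property as a Unix timestamp):

```shell
#!/usr/bin/env bash
# Hypothetical sketch: warn when a dataset's newest snapshot on the
# destination is older than MAX_AGE_SECONDS. Threshold is made up.
MAX_AGE_SECONDS=$((24 * 3600))

# Returns 0 if the given creation time (Unix epoch) is recent enough.
snapshot_is_fresh() {
  local creation=$1 now
  now=$(date +%s)
  (( now - creation <= MAX_AGE_SECONDS ))
}

check_all_datasets() {
  # -H: no header, -p: numeric creation time, -s creation: sort ascending,
  # so the last line seen per dataset is its newest snapshot.
  zfs list -H -t snapshot -p -o name,creation -s creation |
  awk -F'@' '{ split($2, a, "\t"); newest[$1] = a[2] }
       END  { for (d in newest) print d, newest[d] }' |
  while read -r dataset creation; do
    if ! snapshot_is_fresh "$creation"; then
      echo "WARNING: $dataset newest snapshot older than ${MAX_AGE_SECONDS}s" >&2
    fi
  done
}
```

This runs entirely on the receiving host, so it keeps working even when the sender misreports its own status.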
This kind of solution has more advantages than just relying on what the sender side says (lies?).
If I'm not mistaken, I've seen some ZFS Zabbix templates with discovery and triggers that analyze dataset sizes. I'm pretty sure such a template could easily be extended to store the time (secs/mins/hours/days) passed since the latest snapshot for each dataset.
I've modified an existing Zabbix template to monitor snapshot age: GabrieleV/zabbix-zfs-on-linux
ZFS-autobackup does very consistent and strict error reporting. Its exit codes are also very reliable; check it out at https://github.com/psy0rz/zfs_autobackup
There is also an example of how to monitor it with Zabbix.
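As a rough illustration of leaning on those exit codes (the `run_backup` wrapper and the example invocation are assumptions for this sketch, not part of zfs-autobackup itself):

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: treat any non-zero exit code from the backup
# command as a failure worth alerting on, since zfs-autobackup exits
# non-zero on errors.
run_backup() {
  if "$@"; then
    echo "backup OK"
  else
    local rc=$?
    echo "backup FAILED (exit code $rc)" >&2
    return "$rc"
  fi
}

# Illustrative invocation (target name and dataset are made up):
# run_backup zfs-autobackup offsite1 backuppool/backups
```

Wiring the wrapper's output into cron mail or a monitoring agent then gives a trustworthy success/failure signal without log parsing.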
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I'm building a Grafana dashboard to make sure that the following tasks are running as per schedule:
All znapzend logs are being pushed to a DB and I'm looking for the following patterns to ascertain whether these tasks were successful or not:
However, even if the send/receive fails, the following line is emitted to the logs:
send/receive worker for $DATASET done
which makes detecting errors very difficult. The only way to detect errors via logs is to look for the following pattern:
But this is not ideal, because it is unable to catch the case where znapzend doesn't even run.
This, coupled with https://github.com/oetiker/znapzend/issues/367, makes monitoring znapzend very difficult in production.
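One way to catch the "znapzend never ran" case is to alert on staleness rather than on error patterns: record when the last matching log line was seen and alert if it is older than the schedule interval. This fires when znapzend doesn't run at all (it does not fix the misleading "done" line discussed above). A rough sketch, assuming log lines start with an ISO-8601 timestamp and GNU `date` is available; the log path, pattern, and interval are made-up examples:

```shell
#!/usr/bin/env bash
# Hypothetical staleness check: alert when the newest matching log line
# is older than the expected schedule interval -- this also fires when
# znapzend never ran at all.
MAX_INTERVAL=$((15 * 60))   # illustrative: expect a run every 15 minutes

# Returns 0 (stale) when the given epoch is older than MAX_INTERVAL.
is_stale() {
  local last=$1 now
  now=$(date +%s)
  (( now - last > MAX_INTERVAL ))
}

last_run_epoch() {
  # Assumes lines like:
  #   2021-01-01T10:00:00 ... send/receive worker for tank/data done
  # Returns non-zero if no line matched at all.
  local line
  line=$(grep 'send/receive worker for .* done' "$1" | tail -n 1)
  [ -n "$line" ] || return 1
  date -d "${line%% *}" +%s
}

# Illustrative usage (log path is made up):
# if last=$(last_run_epoch /var/log/znapzend.log) && ! is_stale "$last"; then
#   echo "znapzend ran recently"
# else
#   echo "ALERT: no znapzend run seen within ${MAX_INTERVAL}s" >&2
# fi
```

The same "age of last success" idea maps directly onto a Grafana alert over the log DB: alert when `now() - max(timestamp)` exceeds the schedule interval, rather than searching for failure strings.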