yboetz / pyznap

ZFS snapshot tool written in python
GNU General Public License v3.0
197 stars 35 forks source link

Uncaught IndexError exception #104

Open namniav opened 8 months ago

namniav commented 8 months ago

Today I found this log:

/var/log/pyznap.log.1

[...]
Jan 01 00:00:01 INFO: Starting pyznap...
[...]
Jan 01 00:25:11 INFO: Deleting snapshot rpool/var/lib/docker/wbuu1gj1afus39gfy6arms4sx@pyznap_2023-12-21_02:11:01_monthly...
[...]

/var/log/pyznap.log

Jan 01 00:15:01 INFO: Starting pyznap...
[...]
Jan 01 00:25:11 INFO: Deleting snapshot rpool/var/lib/docker/wbuu1gj1afus39gfy6arms4sx@pyznap_2023-12-21_02:11:01_monthly...
Traceback (most recent call last):
File "/opt/pyznap/venv/bin/pyznap", line 8, in <module>
sys.exit(main())
File "/opt/pyznap/venv/lib/python3.10/site-packages/pyznap/main.py", line 170, in main
return _main()
File "/opt/pyznap/venv/lib/python3.10/site-packages/pyznap/main.py", line 115, in _main
clean_config(config)
File "/opt/pyznap/venv/lib/python3.10/site-packages/pyznap/clean.py", line 165, in clean_config
clean_filesystem(child, conf)
File "/opt/pyznap/venv/lib/python3.10/site-packages/pyznap/clean.py", line 85, in clean_filesystem
clean_snap(snap)
File "/opt/pyznap/venv/lib/python3.10/site-packages/pyznap/clean.py", line 33, in clean_snap
snap.destroy()
File "/opt/pyznap/venv/lib/python3.10/site-packages/pyznap/pyzfs.py", line 249, in destroy
check_output(cmd, ssh=self.ssh)
File "/opt/pyznap/venv/lib/python3.10/site-packages/pyznap/process.py", line 105, in check_output
ret.check_returncode()
File "/opt/pyznap/venv/lib/python3.10/site-packages/pyznap/process.py", line 56, in check_returncode
match = re.search(pattern, self.stderr.splitlines()[0])

It happened when two cron jobs was running at the same time. Before the first job managed to clean a lot of snapshots of docker containers/volumes which need long time, next job started deleting snapshots at the same time.

I don't know why the zfs command of second job failed with empty stderr. I checked syslog but no related log. I have been using pyznap for about 1.5 year with no error except this one.

It can be prevented by modifying /etc/cron.d/pyznap to use flock: flock -w 600 /var/lock/pyznap.lock /opt/pyznap/venv/bin/pyznap snap >> /var/log/pyznap.log 2>&1.