oetiker / znapzend

zfs backup with remote capabilities and mbuffer integration.
www.znapzend.org
GNU General Public License v3.0

znapzend stops creating snaps on one pool #392

Closed: mpicjohn closed this issue 5 years ago

mpicjohn commented 5 years ago

I have a file server here with two pools (ssd_pool and fast_pool). Suddenly znapzend stopped creating snapshots (locally) for one of the pools; the other one works as expected. Znapzend runs as a service on OpenIndiana.

Starting command line:

/opt/znapzend/bin/znapzend --daemonize --pidfile=/dev/null --autoCreation --connectTimeout=200 --logto=/var/log/znapzend.log --loglevel=debug --features=recvu

Znapzend configuration for the affected pool:

```
/opt/znapzend/bin/znapzendzetup list ssd_pool

backup plan: ssd_pool
dst_1 = root@zfs-backup-1-binf:data_pool/zfs-mirror/cluster-filer/ssd_pool
dst_1_plan = 3weeks=>1day,3months=>1week
enabled = on
mbuffer = /usr/bin/mbuffer
mbuffer_size = 4G
post_znap_cmd = off
pre_znap_cmd = off
recursive = on
src = ssd_pool
src_plan = 1day=>1hour,14days=>1day
tsformat = %Y-%m-%d-%H%M%S
zend_delay = 0
```

Relevant log entries:

```
[Thu Nov 1 08:00:00 2018] [debug] snapshot worker for slow_pool spawned (21495)
[Thu Nov 1 08:00:00 2018] [info] creating recursive snapshot on slow_pool
[Thu Nov 1 08:00:00 2018] [debug] snapshot worker for ssd_pool/reserved_space spawned (21497)
[Thu Nov 1 08:00:00 2018] [info] creating snapshot on ssd_pool/reserved_space
[Thu Nov 1 08:00:00 2018] [debug] snapshot worker for ssd_pool spawned (21499)
[Thu Nov 1 08:00:00 2018] [info] creating recursive snapshot on ssd_pool
[Thu Nov 1 08:00:00 2018] [debug] snapshot worker for ssd_pool/reserved_space done (21497)
[Thu Nov 1 08:00:00 2018] [debug] send/receive worker for ssd_pool/reserved_space spawned (21501)
[Thu Nov 1 08:00:00 2018] [info] starting work on backupSet ssd_pool/reserved_space
[Thu Nov 1 08:00:00 2018] [debug] sending snapshots from ssd_pool/reserved_space to root@zfs-backup-1-binf:data_pool/zfs-mirror/cluster-filer/ssd_pool
[Thu Nov 1 08:00:00 2018] [warn] ERROR: snapshot(s) exist on destination, but no common found on source and destination clean up destination root@zfs-backup-1-binf:data_pool/zfs-mirror/cluster-filer/ssd_pool (i.e. destroy existing snapshots)
[Thu Nov 1 08:00:00 2018] [warn] ERROR: suspending cleanup source dataset because at least one send task failed
[Thu Nov 1 08:00:00 2018] [info] done with backupset ssd_pool/reserved_space in 0 seconds
[Thu Nov 1 08:00:00 2018] [debug] send/receive worker for ssd_pool/reserved_space done (21501)
[Thu Nov 1 08:00:00 2018] [warn] taking snapshot on ssd_pool failed: ERROR: cannot create snapshot ssd_pool@2018-11-01-080000
[Thu Nov 1 08:00:00 2018] [debug] snapshot worker for ssd_pool done (21499)
[Thu Nov 1 08:00:00 2018] [debug] send/receive worker for ssd_pool spawned (21505)
[Thu Nov 1 08:00:00 2018] [info] starting work on backupSet ssd_pool
```

Obviously the line `ERROR: cannot create snapshot ssd_pool@2018-11-01-080000` shows the problem.

When running manually (--runonce) I get the following in the logfile:

```
taking snapshot on ssd_pool failed: ERROR: cannot create snapshot ssd_pool@2018-11-01-091559
[Thu Nov 1 09:16:00 2018] [debug] snapshot worker for ssd_pool done (12246)
```

and at the command line:

cannot open 'ssd_pool@2018-11-01-091559': dataset does not exist

Something is terribly screwed up, resulting in a broken backup...

Any help is really appreciated.

thx

Carsten

moetiker commented 5 years ago

What is the shell output if you run `/opt/znapzend/bin/znapzend -d --runonce=ssd_pool`?
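It can also help to try the same snapshot by hand, so you see the raw zfs error instead of znapzend's summary. A minimal sketch (the test snapshot name below is made up):

```sh
# Attempt the same recursive snapshot znapzend would take
# ("znapzend-test" is just a throwaway name for this check)
zfs snapshot -r ssd_pool@znapzend-test

# If it fails, the zfs error message usually points at the real cause
# (quota, permissions, a child dataset in a bad state, ...).

# See which snapshots actually exist directly on the pool dataset
zfs list -d 1 -t snapshot -o name ssd_pool

# Clean up the test snapshot again
zfs destroy -r ssd_pool@znapzend-test
```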

mpicjohn commented 5 years ago

Just recreated the configuration for the pool with znapzendzetup. It seems to work now: snapshots are taken and sent to the remote server. Perhaps I had messed around with the recursive setting of some datasets in the past to fix a similar hang. I need to wait now for the run to complete (the two pools hold some 190 TB).
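In case it is useful for checking whether leftover per-dataset settings are still lingering: znapzend keeps its configuration in ZFS user properties (the org.znapzend namespace, as far as I know), so a check along these lines should list every dataset under the pool that carries its own locally set plan (a sketch):

```sh
# Show datasets below ssd_pool with locally set (not inherited) znapzend properties
zfs get -r -s local -o name,property,value all ssd_pool | grep org.znapzend
```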

BTW, what is the exact meaning of "recursive"? Does it only affect the creation of the config, or the actual processing of the snaps?

moetiker commented 5 years ago

It means that all "sub" ZFS datasets of the dataset carrying the znapzend plan are snapshotted as well, with the same plan.
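For illustration, with recursive = on for ssd_pool a single run creates snapshots with one common timestamp on the parent and on every child (the timestamp and the last child name below are hypothetical):

```sh
# List all snapshots below ssd_pool after one run
zfs list -r -t snapshot -o name ssd_pool
# ssd_pool@2018-11-01-120000
# ssd_pool/reserved_space@2018-11-01-120000
# ssd_pool/projects@2018-11-01-120000    (hypothetical child dataset)
```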

mpicjohn commented 5 years ago

... regardless of specific settings in one of the "subs", right?

mpicjohn commented 5 years ago

OK, after deleting the configurations for each sub-zfs individually and recreating the plan recursively from the root zfs, everything is back to normal now. Sorry for the noise...
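For anyone running into the same thing, a sketch of those two steps based on the plan shown earlier (the child dataset and the DST label are examples; check the exact options against the znapzendzetup documentation before using):

```sh
# Remove the per-child plans (repeat for every child that had its own configuration;
# ssd_pool/reserved_space appears in the logs above, other children may exist)
znapzendzetup delete ssd_pool/reserved_space

# Recreate a single recursive plan on the pool root with the settings listed earlier
znapzendzetup create --recursive \
    --mbuffer=/usr/bin/mbuffer --mbuffersize=4G \
    --tsformat='%Y-%m-%d-%H%M%S' \
    SRC '1day=>1hour,14days=>1day' ssd_pool \
    DST:1 '3weeks=>1day,3months=>1week' \
        root@zfs-backup-1-binf:data_pool/zfs-mirror/cluster-filer/ssd_pool
```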