maniac0s closed this issue 3 years ago
Do you think this has anything to do with znapzend? The best way to find out is to run the commands that znapzend uses manually in a terminal and check whether you can reproduce the behavior.
With znapzend -d --runonce=/pool/storage you can see which commands it runs.
PS: are you sure it's "znapzend -d --runonce=/pool/storage"? That gives "ERROR: filesystem /pool/storage does not exist". "--runonce=pool/storage" (without the leading /) works, however.
Well, that was quicker than expected. I recreated the pool and the datasets from scratch, and it just vanished again:
root@backup:~# zfs list
NAME USED AVAIL REFER MOUNTPOINT
pool 2.10T 1.41T 104K /pool
pool/db 100G 100G 96K /pool/db
pool/storage 2T 2.00T 96K /pool/storage
root@backup:~# mount | grep pool
pool on /pool type zfs (rw,xattr,noacl)
pool/db on /pool/db type zfs (rw,xattr,noacl)
pool/storage on /pool/storage type zfs (rw,xattr,noacl)
After starting znapzend --runonce, it disappeared again:
root@backup:~# mount | grep pool
pool on /pool type zfs (rw,xattr,noacl)
pool/db on /pool/db type zfs (rw,xattr,noacl)
zpool history shows nothing unusual either...
root@backup:~# zpool history pool
History for 'pool':
2019-09-16.12:21:50 zpool create pool raidz /dev/sda /dev/sdb
2019-09-16.12:22:43 zfs create pool/db
2019-09-16.12:22:48 zfs create pool/storage
2019-09-16.12:23:11 zfs set refreservation=100G pool/db
2019-09-16.12:23:11 zfs set refreservation=2.0T pool/storage
2019-09-16.12:23:11 zfs set refquota=2.0T pool/storage
2019-09-16.12:23:16 zfs set refquota=100G pool/db
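For reference, the USED column in the zfs list above is almost entirely reservation, not data: both children REFER only 96K. A small sketch that parses the quoted listing to show what each dataset pins via its refreservation (the input is the literal output from above, so it runs without a pool):

```shell
#!/bin/sh
# The zfs list output quoted above, inlined verbatim (NAME USED AVAIL REFER MOUNTPOINT,
# header and mountpoint columns trimmed to the fields we need).
zfs_list='pool 2.10T 1.41T 104K /pool
pool/db 100G 100G 96K /pool/db
pool/storage 2T 2.00T 96K /pool/storage'

# For each child dataset, USED ($2) is essentially its refreservation,
# since REFER ($4) is only 96K of actual data.
result=$(echo "$zfs_list" | awk '$1 != "pool" { printf "%s pins %s via its reservation\n", $1, $2 }')
echo "$result"
```

With 2.0T plus 100G reserved, the pool's 1.41T AVAIL is what remains for everything else, which is worth keeping in mind when a snapshot later fails with "out of space".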
Edit: Output of znapzend:
root@server:~# znapzend -d --runonce=pool/storage
[Mon Sep 16 12:25:18 2019] [info] znapzend (PID=15732) starting up ...
[Mon Sep 16 12:25:18 2019] [info] refreshing backup plans...
[Mon Sep 16 12:25:20 2019] [info] found a valid backup plan for pool/storage...
[Mon Sep 16 12:25:20 2019] [info] znapzend (PID=15732) initialized -- resuming normal operations.
[Mon Sep 16 12:25:20 2019] [debug] snapshot worker for pool/storage spawned (15884)
[Mon Sep 16 12:25:20 2019] [info] creating recursive snapshot on pool/storage
# zfs snapshot -r pool/storage@2019-09-16-122520
cannot create snapshot 'pool/storage@2019-09-16-122520': out of space
no snapshots were created
# zfs list -H -o name -t snapshot pool/storage@2019-09-16-122520
cannot open 'pool/storage@2019-09-16-122520': dataset does not exist
[Mon Sep 16 12:25:31 2019] [warn] taking snapshot on pool/storage failed: ERROR: cannot create snapshot pool/storage@2019-09-16-122520
[Mon Sep 16 12:25:31 2019] [debug] snapshot worker for pool/storage done (15884)
[Mon Sep 16 12:25:31 2019] [debug] send/receive worker for pool/storage spawned (17838)
[Mon Sep 16 12:25:31 2019] [info] starting work on backupSet pool/storage
# zfs list -H -r -o name -t filesystem,volume pool/storage
[Mon Sep 16 12:25:31 2019] [debug] sending snapshots from pool/storage to root@192.168.1.11:pool/storage
# zfs list -H -o name -t snapshot -s creation -d 1 pool/storage
# ssh -o batchMode=yes -o ConnectTimeout=30 root@192.168.1.11 zfs list -H -o name -t snapshot -s creation -d 1 pool/storage
# zfs send pool/storage@2019-08-29-210000|ssh -o batchMode=yes -o ConnectTimeout=30 'root@192.168.1.11' 'zfs recv -F pool/storage'
It says out of space, but I don't know whether that happens before or after the dataset disappears.
I just recreated the whole pool once again and ran rsync on /pool/storage to the backup server instead of znapzend, and there were no issues. So I guess the issue is either in znapzend or in the underlying zfs send mechanism?
Didn't notice this one before. In most of the posts and "screenshots" you mention that it vanishes, but seemingly in the context of being mounted or not. It does not help that your source and backup pools seem to be named the same and are used interchangeably in the messages above ;)
If it remains in the ZFS dataset tree (per zfs list -r pool and disk space usage, as seems to be the case in the original post for the original server), then its unmounting but not deletion on the backup server may be due to the zfs recv -u flag, which is used to not mount destination datasets after receiving. Maybe ZFS also has to roll the destination back to the latest snapshot to drop changes (e.g. a small mess due to atime=on by default) before receiving an incremental snapshot; perhaps ZoL unmounts it to do so and never remounts it because of -u.
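The unmounted-vs-destroyed distinction can be checked mechanically by diffing zfs list against the mount table. A sketch with the sample outputs from this thread inlined so the logic runs without a pool; on the real backup server the two variables would come from zfs list -H -o name and from mount:

```shell
#!/bin/sh
# Find datasets that are still in the ZFS dataset tree but not mounted.
# Sample data: the zfs list and mount outputs reported above, where
# pool/storage had vanished from mount but not from zfs list.
zfs_list="pool
pool/db
pool/storage"
mounted="pool
pool/db"

# Any dataset in zfs_list that has no exact-line match in mounted
# is present but unmounted (not destroyed).
unmounted=""
for ds in $zfs_list; do
  echo "$mounted" | grep -qx "$ds" || unmounted="$unmounted $ds"
done
echo "not mounted:$unmounted"
```

If a dataset shows up here, zfs mount of that dataset (or checking its mounted and canmount properties) would be the next step, rather than looking for a destroy in zpool history.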
Unmounting on the source server is not something I've seen on Solaris/illumos systems, even when a pool or (quota'd) dataset runs out of space and can neither make snapshots nor delete files from a "live" dataset (when older snapshots reference those files and ZFS cannot write the block reassignment), but maybe ZoL is different in this regard and can unmount datasets on error.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
My storage server on Ubuntu 18.04.3 LTS has now been running for half a year, quite stably. It uses zfs (raidz) for the storage partition.
The server acts as a backup mirror for another server that uses znapzend to send over snapshots frequently. (I have the feeling the pool disappears when znapzend starts sending, but why only the storage dataset and not the database one, I don't know.) Both servers should have the same zfs pool setup if I remember correctly, including snapshots and quotas. Both servers recently got a complete system update. The "main" server shows no issues with the datasets; they are not disappearing there.
The pools on the backup have refreservation and refquota set to keep enough space for snapshots.
After the latest update of the whole Ubuntu system, pool/storage keeps disappearing from the filesystem but is still listed in zfs.
My monitoring reports pool/storage returning to the filesystem and then disappearing again several times a day:
Report from 5:21, dataset came back:
Disappears again at 6:07:
I don't see anything wrong in zpool status either.
What's going on here? What can I do to investigate into this behavior?
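One low-effort way to investigate is to make the monitoring record whether the dataset is merely unmounted (still in zfs list) rather than gone. A sketch, assuming Linux's /proc/mounts and the mountpoint from the report above:

```shell
#!/bin/sh
# Monitoring-style check: alert when /pool/storage is absent from the mount
# table. A ZFS mount line looks like "pool/storage /pool/storage zfs ...",
# so matching the path surrounded by spaces finds the mountpoint field.
ds=/pool/storage
if grep -q " $ds " /proc/mounts 2>/dev/null; then
  state="mounted"
else
  state="not mounted"
fi
echo "$ds is $state"
```

Running this from cron and logging the timestamps alongside znapzend's log would show whether the disappearance coincides with the send/receive runs.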