0.7.0-97_g5f88d2c8a (that is zfs with encryption from tcaputi, with cytrinox's work on debian packaging github.com/cytrinox/zfs )
SPL Version
0.7.0-15_g275146c ditto see above
Describe the problem you're observing
ZFS 'just' hang. As in no operations where possible, no (kernel-) threads done any work, no disk I/O.
Issuing the command line tools (zpool / zfs / zdb) stuck as well. The Kernel threads eventually got a 'hung task timeout' warning in the kernel. logs and dmesg show nothing (except the hung task info).
Describe how to reproduce the problem
I had a rather complex situation here, likely not (easily) reproducible. Eventually I rebooted the system. I am now trying to reproduce the problem.
A list and annotations about what was going on:
everything below on encrypted datasets/zvols
resilvering to a striped md devices (2x4TB disks as one 8TB)
git annex was running checksumming a media library
the zfs-auto-snapshot service was running but the zvol's are excluded
created a zvol
qemu-img convert from one old .qcow2 to a zvol
The first image conversion skyrocketed the system load, first one was at load 30-40 but eventually completed over night.
The next image brought the ZFS down, load gone up to 70+ and no progress/effects as
described above.
I've noticed that there was a lot pointless write load to the l2arc, i tried to offline them around the time the filesystem got stuck, load was already above 70, possibly the filesystem hang already.
Trying to reproduce with no success so far:
the high load condition with 'qemu-img' is reproducible
online/offline the cache devices a few times works.
pool configuration:
pool: data
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Wed Oct 4 14:58:57 2017
6,84T scanned out of 16,3T at 13,9M/s, 198h44m to go
1,37T resilvered, 41,89% done
config:
System information
Describe the problem you're observing
ZFS 'just' hang. As in no operations where possible, no (kernel-) threads done any work, no disk I/O. Issuing the command line tools (zpool / zfs / zdb) stuck as well. The Kernel threads eventually got a 'hung task timeout' warning in the kernel. logs and dmesg show nothing (except the hung task info).
Describe how to reproduce the problem
I had a rather complex situation here, likely not (easily) reproducible. Eventually I rebooted the system. I am now trying to reproduce the problem.
A list and annotations about what was going on:
Trying to reproduce with no success so far:
pool configuration:
pool: data state: DEGRADED status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state. action: Wait for the resilver to complete. scan: resilver in progress since Wed Oct 4 14:58:57 2017 6,84T scanned out of 16,3T at 13,9M/s, 198h44m to go 1,37T resilvered, 41,89% done config:
Include any warning/errors/backtraces from the system logs
yeah, sorry, no nothing logged
note: Looks to me like some rare race/deadlock problem under high load.