openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

ZFS 'just' hangs #6725

Closed cehteh closed 4 years ago

cehteh commented 7 years ago

System information

Type                  Version/Name
Distribution Name     Debian
Distribution Version  9.1
Linux Kernel          4.13.4
Architecture          x86_64
ZFS Version           0.7.0-97_g5f88d2c8a (ZFS with encryption from tcaputi, plus cytrinox's Debian packaging work, github.com/cytrinox/zfs)
SPL Version           0.7.0-15_g275146c (same source as above)

Describe the problem you're observing

ZFS 'just' hangs: no operations were possible, no (kernel) threads did any work, and there was no disk I/O. The command-line tools (zpool / zfs / zdb) got stuck as well. The kernel threads eventually triggered 'hung task timeout' warnings; apart from those, the logs and dmesg show nothing.

Describe how to reproduce the problem

I had a rather complex situation here, likely not (easily) reproducible. Eventually I rebooted the system. I am now trying to reproduce the problem.

A list of what was going on, with annotations:

Trying to reproduce with no success so far:

Pool configuration:

  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Oct 4 14:58:57 2017
        6,84T scanned out of 16,3T at 13,9M/s, 198h44m to go
        1,37T resilvered, 41,89% done
config:

NAME                                               STATE     READ WRITE CKSUM
data                                               DEGRADED     0     0     0
  raidz2-0                                         DEGRADED     0     0     0
    replacing-0                                    DEGRADED     0     0     0
      /root/spare1                                 OFFLINE      0     0     0
      md-uuid-97addc40:1ac606c6:bbc357af:9c81950d  ONLINE       0     0     0  (resilvering)
    /root/spare2                                   OFFLINE      0     0     0
    sdf                                            ONLINE       0     0     0
    sdg                                            ONLINE       0     0     0
    sdh                                            ONLINE       0     0     0
logs
  mirror-1                                         ONLINE       0     0     0
    nvme0n1p5                                      ONLINE       0     0     0
    nvme1n1p5                                      ONLINE       0     0     0
cache
  nvme0n1p6                                        OFFLINE      0     0     0
  nvme1n1p6                                        OFFLINE      0     0     0
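As a rough sanity check on the scan line in the status output, the remaining resilver time can be recomputed from the rounded figures zpool prints. This is a minimal sketch, assuming that "T" and "M" are binary units (TiB, MiB) and that "," is the locale's decimal separator; `resilver_estimate` and `parse_decimal` are hypothetical helpers, not part of any ZFS tool. Because the printed inputs are rounded, the result only approximates zpool's own estimate.

```python
# Hypothetical helpers: recompute the resilver estimate from the rounded
# figures in the `zpool status` scan line. Assumes "T"/"M" are binary
# units (TiB/MiB) and that "," is the locale's decimal separator.

TIB = 2 ** 40
MIB = 2 ** 20


def parse_decimal(text: str) -> float:
    """Parse a locale-formatted number such as '6,84' into a float."""
    return float(text.replace(",", "."))


def resilver_estimate(scanned_tib: float, total_tib: float, rate_mib_s: float):
    """Return (percent scanned, hours remaining) assuming a constant scan rate."""
    remaining_bytes = (total_tib - scanned_tib) * TIB
    seconds_left = remaining_bytes / (rate_mib_s * MIB)
    percent = 100.0 * scanned_tib / total_tib
    return percent, seconds_left / 3600.0


if __name__ == "__main__":
    pct, hours = resilver_estimate(
        parse_decimal("6,84"), parse_decimal("16,3"), parse_decimal("13,9")
    )
    # Roughly 42% scanned and about 198 hours left, in line with the
    # "41,89% done" and "198h44m to go" reported (inputs are rounded).
    print(f"{pct:.1f}% scanned, about {hours:.0f} h remaining")
```

At a constant 13,9M/s, the remaining ~9,5T alone accounts for more than eight days of resilvering, which shows how slowly the pool was progressing at the time.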

Include any warning/errors/backtraces from the system logs

Yeah, sorry, nothing was logged.

Note: this looks to me like a rare race/deadlock under high load.

gmelikov commented 4 years ago

Closed as stale.