laf070810 closed this 1 week ago
Closing this issue as it seems to be a device-specific problem. The issue disappeared after banning a specific NFS client, though we have no idea what is wrong with that machine.
[<0>] __cv_timedwait_common+0x12d/0x170 [spl]
[<0>] __cv_timedwait_io+0x15/0x20 [spl]
[<0>] zio_wait+0x130/0x290 [zfs]
[<0>] dmu_buf_hold+0x5f/0x90 [zfs]
[<0>] zap_lockdir+0x4e/0xc0 [zfs]
It looks like it is waiting for a disk read, but I can't say why it got stuck there. When you said "device-specific", my first thought was of your storage device, not the client.
System information
Describe the problem you're observing
Server A is the NFS server sharing a ZFS pool. Servers B, C, D, ... are NFS clients mounting the share from server A. When the NFS clients put some load on the NFS share, the updatedb and some or all of the nfsd processes on the NFS server may hang in D state forever, and some or all of the clients may hang forever on any command involving the NFS share, even when the NFS mount is a soft mount rather than a hard mount. I guess the cause might be a deadlock in ZFS like #11003.
Describe how to reproduce the problem
No exact condition that reproduces the problem has been found, but high I/O load on the NFS share is suspected. This issue has happened several times in our case.
Include any warning/errors/backtraces from the system logs
The updatedb and some nfsd processes are hung in D state:
The stack trace of the hung processes:
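For anyone who wants to collect the same kind of information on their own server, here is a minimal sketch (not the exact commands used for this report). It assumes a Linux host, scans `/proc` for tasks in D state, and reads `/proc/<pid>/stack`, which normally requires root. The helper names `parse_stat` and `hung_tasks` are made up for illustration.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    head, _, rest = text.partition("(")
    comm, _, tail = rest.rpartition(")")
    return int(head), comm, tail.split()[0]


def hung_tasks(proc_root="/proc"):
    """Yield (pid, comm, kernel_stack) for every process in D state."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state != "D":
            continue
        try:
            with open(os.path.join(proc_root, entry, "stack")) as f:
                stack = f.read()
        except OSError:
            stack = "(kernel stack unavailable; reading it usually needs root)\n"
        yield pid, comm, stack


if __name__ == "__main__":
    for pid, comm, stack in hung_tasks():
        print(f"{pid} {comm} (D state)")
        print(stack)
```

This only walks the top-level PID directories, so individual threads of a multi-threaded user process (under `/proc/<pid>/task/`) are not inspected.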