Attaching dmesg output of echo t > /proc/sysrq-trigger, taken a few hours after the zpool's disks disappeared, while zpool status still thinks everything is fine.
kea2.trace.gz
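For reference, roughly how this dump was captured (a sketch; assumes sysrq is permitted on the host):
# allow all sysrq functions, then ask the kernel to dump every task's state
echo 1 > /proc/sys/kernel/sysrq
echo t > /proc/sysrq-trigger
# the task dump lands in the kernel ring buffer
dmesg > kea2.trace && gzip kea2.trace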
After a few more occurrences and further investigation, I think this may be related to https://github.com/zfsonlinux/zfs/issues/7057, and I suspect it is triggered by a flaky cable from the server to the NVMe JBOF (single port).
I see the following stuck for over 24 hours now:
crash> bt 20331
PID: 20331 TASK: ffffa10e5e6b81c0 CPU: 65 COMMAND: "z_wr_iss"
#0 [ffffbd3ca42f7b58] __schedule at ffffffffa4683340
#1 [ffffbd3ca42f7be8] schedule at ffffffffa46837a2
#2 [ffffbd3ca42f7bf8] rwsem_down_write_failed at ffffffffa4686530
#3 [ffffbd3ca42f7c98] call_rwsem_down_write_failed at ffffffffa4679723
#4 [ffffbd3ca42f7cd8] down_write at ffffffffa4685d10
#5 [ffffbd3ca42f7ce0] dbuf_write_ready at ffffffffc0acb1ff [zfs]
#6 [ffffbd3ca42f7d18] arc_write_ready at ffffffffc0ac5efb [zfs]
#7 [ffffbd3ca42f7d68] zio_ready at ffffffffc0b7c7a7 [zfs]
#8 [ffffbd3ca42f7dc0] zio_execute at ffffffffc0b776e7 [zfs]
#9 [ffffbd3ca42f7de8] taskq_thread at ffffffffc05f7209 [spl]
#10 [ffffbd3ca42f7f10] kthread at ffffffffa409394c
#11 [ffffbd3ca42f7f50] ret_from_fork at ffffffffa48001f5
crash> bt 20693
PID: 20693 TASK: ffffa14e3cf44900 CPU: 64 COMMAND: "txg_sync"
#0 [ffffbd3ca56a3be0] __schedule at ffffffffa4683340
#1 [ffffbd3ca56a3c70] schedule at ffffffffa46837a2
#2 [ffffbd3ca56a3c80] io_schedule at ffffffffa409ee02
#3 [ffffbd3ca56a3c90] cv_wait_common at ffffffffc05fa9dc [spl]
#4 [ffffbd3ca56a3ce8] zio_wait at ffffffffc0b79e20 [zfs]
#5 [ffffbd3ca56a3d18] dsl_pool_sync at ffffffffc0afc248 [zfs]
#6 [ffffbd3ca56a3d80] spa_sync at ffffffffc0b18d37 [zfs]
#7 [ffffbd3ca56a3e38] txg_sync_thread at ffffffffc0b2ac36 [zfs]
#8 [ffffbd3ca56a3ee8] thread_generic_wrapper at ffffffffc05f61ff [spl]
#9 [ffffbd3ca56a3f10] kthread at ffffffffa409394c
#10 [ffffbd3ca56a3f50] ret_from_fork at ffffffffa48001f5
I even removed the disks from the system and the zpool is now happily running on another server, yet this server remains totally stuck on the above kernel tasks, still reporting the zpool as available and healthy (per zpool list and zfs list).
It seems as if ZFS is stuck eternally in txg_sync_thread. I would expect it to realize that the disks are gone/failed and eventually return an error (and perhaps retry?), but instead it just hangs forever and takes the whole ZFS subsystem with it.
failmode=wait|continue|panic
Controls the system behavior in the event of catastrophic pool failure. This condition is typically a result of a loss of connectivity to the underlying storage device(s) or a failure of all devices within the pool.
The behavior of such an event is determined as follows:
wait Blocks all I/O access until the device connectivity is recovered and the errors are cleared. This is the default behavior.
continue Returns EIO to any new write I/O requests but allows reads to any of the remaining healthy devices. Any write requests that have yet to be committed to disk would be blocked.
panic Prints out a message to the console and generates a system crash dump.
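For reference, a minimal sketch of checking and changing the property ('zpool' is the pool name used in this report):
# show the current failure mode (default: wait)
zpool get failmode zpool
# return EIO on new writes instead of blocking indefinitely
zpool set failmode=continue zpool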
My understanding is that for a different 'failmode' to help, I'd have to see errors in zpool status, right?
Right now status looks clean even though the disks have been gone for over 24h:
root@kea2:~# zpool status
pool: zpool
state: ONLINE
scan: scrub repaired 0B in 2h43m with 0 errors on Sun Aug 12 03:07:52 2018
config:
NAME STATE READ WRITE CKSUM
zpool ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
nvme12n1 ONLINE 0 0 0
nvme13n1 ONLINE 0 0 0
nvme14n1 ONLINE 0 0 0
nvme15n1 ONLINE 0 0 0
nvme0n1 ONLINE 0 0 0
nvme1n1 ONLINE 0 0 0
nvme2n1 ONLINE 0 0 0
nvme3n1 ONLINE 0 0 0
nvme4n1 ONLINE 0 0 0
nvme5n1 ONLINE 0 0 0
nvme6n1 ONLINE 0 0 0
nvme7n1 ONLINE 0 0 0
nvme8n1 ONLINE 0 0 0
nvme9n1 ONLINE 0 0 0
nvme10n1 ONLINE 0 0 0
nvme11n1 ONLINE 0 0 0
errors: No known data errors
root@kea2:~# nvme list
root@kea2:~#
After upgrading to zfs 0.8.0-rc3_126_g5dbf8b4ed (compiled from master), I'm still seeing the same/similar issue on kernel 4.14.97.
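For completeness, the from-master build was the usual from-source procedure, roughly (the commit is taken from the version string above):
git clone https://github.com/zfsonlinux/zfs.git
cd zfs && git checkout 5dbf8b4ed
./autogen.sh && ./configure && make -s -j$(nproc)
make install && depmod -a   # then reload the spl/zfs modules
A fresh stack trace: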
[ 4669.240201] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4669.247068] nfsd D 0 2742 2 0x80000000
[ 4669.247069] Call Trace:
[ 4669.247071] ? __schedule+0x3b4/0x7e0
[ 4669.247072] ? packet_rcv+0x41/0x430
[ 4669.247073] schedule+0x32/0x80
[ 4669.247074] schedule_preempt_disabled+0xa/0x10
[ 4669.247075] __mutex_lock.isra.2+0x279/0x4a0
[ 4669.247077] ? spl_kmem_cache_alloc+0xfe/0x860 [spl]
[ 4669.247079] ? sch_direct_xmit+0x62/0x1a0
[ 4669.247105] ? zfs_znode_hold_enter+0x107/0x150 [zfs]
[ 4669.247132] zfs_znode_hold_enter+0x107/0x150 [zfs]
[ 4669.247136] ? bond_start_xmit+0x168/0x3d0 [bonding]
[ 4669.247139] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[ 4669.247162] zfs_zget+0xfb/0x220 [zfs]
[ 4669.247164] ? __dev_queue_xmit+0x283/0x630
[ 4669.247165] ? getboottime64+0x22/0x30
[ 4669.247168] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[ 4669.247193] zfs_vget+0x1fc/0x390 [zfs]
[ 4669.247197] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[ 4669.247222] zpl_fh_to_dentry+0x58/0xa0 [zfs]
[ 4669.247226] exportfs_decode_fh+0x49/0x290
[ 4669.247230] ? exp_find_key+0x6a/0xa0 [nfsd]
[ 4669.247231] ? kmem_cache_alloc+0xff/0x510
[ 4669.247235] ? exp_find+0x5b/0xb0 [nfsd]
[ 4669.247238] fh_verify+0x2a6/0x5e0 [nfsd]
[ 4669.247241] nfsd_open+0x2c/0x180 [nfsd]
[ 4669.247244] nfsd_write+0x4e/0x1e0 [nfsd]
[ 4669.247248] nfsd3_proc_write+0x8d/0x110 [nfsd]
[ 4669.247251] nfsd_dispatch+0xb1/0x240 [nfsd]
[ 4669.247257] svc_process_common+0x395/0x6e0 [sunrpc]
[ 4669.247263] svc_process+0xf6/0x1a0 [sunrpc]
[ 4669.247266] nfsd+0xe3/0x150 [nfsd]
[ 4669.247267] kthread+0xff/0x130
[ 4669.247270] ? nfsd_destroy+0x60/0x60 [nfsd]
[ 4669.247271] ? kthread_create_on_node+0x40/0x40
[ 4669.247272] ret_from_fork+0x35/0x40
In the process list we have z_wr_iss, txg_sync and nfsd in D state; here are some traces from them:
crash> bt 3002
PID: 3002 TASK: ffff9396f6d81840 CPU: 9 COMMAND: "nfsd"
#0 [ffffac47e40df818] __schedule at ffffffff89694074
#1 [ffffac47e40df8a8] schedule at ffffffff896944d2
#2 [ffffac47e40df8b8] rwsem_down_read_failed at ffffffff8969772b
#3 [ffffac47e40df938] call_rwsem_down_read_failed at ffffffff8968a314
#4 [ffffac47e40df980] down_read at ffffffff89696a33
#5 [ffffac47e40df988] dnode_hold_impl at ffffffffc70e71e9 [zfs]
#6 [ffffac47e40dfa30] dmu_buf_hold_noread at ffffffffc70d1dd7 [zfs]
#7 [ffffac47e40dfa60] dmu_buf_hold at ffffffffc70d1ec7 [zfs]
#8 [ffffac47e40dfa90] zap_lockdir at ffffffffc715ef87 [zfs]
#9 [ffffac47e40dfad0] zap_lookup_norm at ffffffffc715f3b0 [zfs]
#10 [ffffac47e40dfb20] zap_lookup at ffffffffc715f422 [zfs]
#11 [ffffac47e40dfb48] zfs_dirent_lock at ffffffffc716cb2b [zfs]
#12 [ffffac47e40dfbc0] zfs_dirlook at ffffffffc716cbc7 [zfs]
#13 [ffffac47e40dfc20] zfs_lookup at ffffffffc718583d [zfs]
#14 [ffffac47e40dfc78] zpl_lookup at ffffffffc71aa5b9 [zfs]
#15 [ffffac47e40dfcf8] __lookup_hash at ffffffff891fb762
#16 [ffffac47e40dfd28] lookup_one_len at ffffffff891fb876
#17 [ffffac47e40dfd60] nfsd_lookup_dentry at ffffffffc07dae5d [nfsd]
#18 [ffffac47e40dfdd0] nfsd_lookup at ffffffffc07db18b [nfsd]
#19 [ffffac47e40dfe18] nfsd3_proc_lookup at ffffffffc07e3220 [nfsd]
#20 [ffffac47e40dfe48] nfsd_dispatch at ffffffffc07d7151 [nfsd]
#21 [ffffac47e40dfe78] svc_process_common at ffffffffc0637525 [sunrpc]
#22 [ffffac47e40dfed0] svc_process at ffffffffc06386b6 [sunrpc]
#23 [ffffac47e40dfef0] nfsd at ffffffffc07d6c03 [nfsd]
#24 [ffffac47e40dff10] kthread at ffffffff8909680f
#25 [ffffac47e40dff50] ret_from_fork at ffffffff898001f5
crash> bt 3001
PID: 3001 TASK: ffff9396f6c81800 CPU: 51 COMMAND: "nfsd"
#0 [ffffac47e40c7818] __schedule at ffffffff89694074
#1 [ffffac47e40c78a8] schedule at ffffffff896944d2
#2 [ffffac47e40c78b8] rwsem_down_read_failed at ffffffff8969772b
#3 [ffffac47e40c7938] call_rwsem_down_read_failed at ffffffff8968a314
#4 [ffffac47e40c7980] down_read at ffffffff89696a33
#5 [ffffac47e40c7988] dnode_hold_impl at ffffffffc70e71e9 [zfs]
#6 [ffffac47e40c7a30] dmu_object_alloc_impl at ffffffffc70d3df3 [zfs]
#7 [ffffac47e40c7ad0] dmu_object_alloc_dnsize at ffffffffc70d401c [zfs]
#8 [ffffac47e40c7af8] zfs_mknode at ffffffffc7191460 [zfs]
#9 [ffffac47e40c7c28] zfs_create at ffffffffc7185edc [zfs]
#10 [ffffac47e40c7ce8] zpl_create at ffffffffc71aaed4 [zfs]
#11 [ffffac47e40c7d48] vfs_create at ffffffff891fc79c
#12 [ffffac47e40c7d80] do_nfsd_create at ffffffffc07dd817 [nfsd]
#13 [ffffac47e40c7df8] nfsd3_proc_create at ffffffffc07e3978 [nfsd]
#14 [ffffac47e40c7e48] nfsd_dispatch at ffffffffc07d7151 [nfsd]
#15 [ffffac47e40c7e78] svc_process_common at ffffffffc0637525 [sunrpc]
#16 [ffffac47e40c7ed0] svc_process at ffffffffc06386b6 [sunrpc]
#17 [ffffac47e40c7ef0] nfsd at ffffffffc07d6c03 [nfsd]
#18 [ffffac47e40c7f10] kthread at ffffffff8909680f
#19 [ffffac47e40c7f50] ret_from_fork at ffffffff898001f5
crash> bt 3000
PID: 3000 TASK: ffff9396f73457c0 CPU: 43 COMMAND: "nfsd"
#0 [ffffac47e409f888] __schedule at ffffffff89694074
#1 [ffffac47e409f918] schedule at ffffffff896944d2
#2 [ffffac47e409f928] rwsem_down_read_failed at ffffffff8969772b
#3 [ffffac47e409f9a8] call_rwsem_down_read_failed at ffffffff8968a314
#4 [ffffac47e409f9f0] down_read at ffffffff89696a33
#5 [ffffac47e409f9f8] dbuf_read at ffffffffc70c6486 [zfs]
#6 [ffffac47e409fa60] dnode_hold_impl at ffffffffc70e7241 [zfs]
#7 [ffffac47e409fb08] dmu_bonus_hold at ffffffffc70cf812 [zfs]
#8 [ffffac47e409fb30] zfs_zget at ffffffffc7193991 [zfs]
#9 [ffffac47e409fbc0] zfs_vget at ffffffffc718236c [zfs]
#10 [ffffac47e409fc08] zpl_fh_to_dentry at ffffffffc71a8fa8 [zfs]
#11 [ffffac47e409fc28] exportfs_decode_fh at ffffffff892cfa99
#12 [ffffac47e409fd70] fh_verify at ffffffffc07da536 [nfsd]
#13 [ffffac47e409fdd0] nfsd_lookup at ffffffffc07db15f [nfsd]
#14 [ffffac47e409fe18] nfsd3_proc_lookup at ffffffffc07e3220 [nfsd]
#15 [ffffac47e409fe48] nfsd_dispatch at ffffffffc07d7151 [nfsd]
#16 [ffffac47e409fe78] svc_process_common at ffffffffc0637525 [sunrpc]
#17 [ffffac47e409fed0] svc_process at ffffffffc06386b6 [sunrpc]
#18 [ffffac47e409fef0] nfsd at ffffffffc07d6c03 [nfsd]
#19 [ffffac47e409ff10] kthread at ffffffff8909680f
#20 [ffffac47e409ff50] ret_from_fork at ffffffff898001f5
crash>
crash> bt 2971
PID: 2971 TASK: ffff9396f8db90c0 CPU: 61 COMMAND: "nfsd"
#0 [ffffac47e02778a8] __schedule at ffffffff89694074
#1 [ffffac47e0277938] schedule at ffffffff896944d2
#2 [ffffac47e0277948] rwsem_down_read_failed at ffffffff8969772b
#3 [ffffac47e02779c0] call_rwsem_down_read_failed at ffffffff8968a314
#4 [ffffac47e0277a08] down_read at ffffffff89696a33
#5 [ffffac47e0277a10] dnode_hold_impl at ffffffffc70e71e9 [zfs]
#6 [ffffac47e0277ab8] dmu_bonus_hold at ffffffffc70cf812 [zfs]
#7 [ffffac47e0277ae0] zfs_zget at ffffffffc7193991 [zfs]
#8 [ffffac47e0277b70] zfs_vget at ffffffffc718236c [zfs]
#9 [ffffac47e0277bb8] zpl_fh_to_dentry at ffffffffc71a8fa8 [zfs]
#10 [ffffac47e0277bd8] exportfs_decode_fh at ffffffff892cfa99
#11 [ffffac47e0277d20] fh_verify at ffffffffc07da536 [nfsd]
#12 [ffffac47e0277d80] do_nfsd_create at ffffffffc07dd4e5 [nfsd]
#13 [ffffac47e0277df8] nfsd3_proc_create at ffffffffc07e3978 [nfsd]
#14 [ffffac47e0277e48] nfsd_dispatch at ffffffffc07d7151 [nfsd]
#15 [ffffac47e0277e78] svc_process_common at ffffffffc0637525 [sunrpc]
#16 [ffffac47e0277ed0] svc_process at ffffffffc06386b6 [sunrpc]
#17 [ffffac47e0277ef0] nfsd at ffffffffc07d6c03 [nfsd]
#18 [ffffac47e0277f10] kthread at ffffffff8909680f
#19 [ffffac47e0277f50] ret_from_fork at ffffffff898001f5
crash> bt 2893
PID: 2893 TASK: ffff9396fde49a40 CPU: 4 COMMAND: "nfsd"
#0 [ffffac47cfa3b8a8] __schedule at ffffffff89694074
#1 [ffffac47cfa3b938] schedule at ffffffff896944d2
#2 [ffffac47cfa3b948] rwsem_down_read_failed at ffffffff8969772b
#3 [ffffac47cfa3b9c0] call_rwsem_down_read_failed at ffffffff8968a314
#4 [ffffac47cfa3ba08] down_read at ffffffff89696a33
#5 [ffffac47cfa3ba10] dnode_hold_impl at ffffffffc70e71e9 [zfs]
#6 [ffffac47cfa3bab8] dmu_bonus_hold at ffffffffc70cf812 [zfs]
#7 [ffffac47cfa3bae0] zfs_zget at ffffffffc7193991 [zfs]
#8 [ffffac47cfa3bb70] zfs_vget at ffffffffc718236c [zfs]
#9 [ffffac47cfa3bbb8] zpl_fh_to_dentry at ffffffffc71a8fa8 [zfs]
#10 [ffffac47cfa3bbd8] exportfs_decode_fh at ffffffff892cfa99
#11 [ffffac47cfa3bd20] fh_verify at ffffffffc07da536 [nfsd]
#12 [ffffac47cfa3bd80] do_nfsd_create at ffffffffc07dd4e5 [nfsd]
#13 [ffffac47cfa3bdf8] nfsd3_proc_create at ffffffffc07e3978 [nfsd]
#14 [ffffac47cfa3be48] nfsd_dispatch at ffffffffc07d7151 [nfsd]
#15 [ffffac47cfa3be78] svc_process_common at ffffffffc0637525 [sunrpc]
#16 [ffffac47cfa3bed0] svc_process at ffffffffc06386b6 [sunrpc]
#17 [ffffac47cfa3bef0] nfsd at ffffffffc07d6c03 [nfsd]
#18 [ffffac47cfa3bf10] kthread at ffffffff8909680f
#19 [ffffac47cfa3bf50] ret_from_fork at ffffffff898001f5
crash> bt 10127
PID: 10127 TASK: ffff9396dd2cc740 CPU: 21 COMMAND: "txg_sync"
#0 [ffffac47e976fb38] __schedule at ffffffff89694074
#1 [ffffac47e976fbc8] schedule at ffffffff896944d2
#2 [ffffac47e976fbd8] schedule_timeout at ffffffff89697d12
#3 [ffffac47e976fc60] io_schedule_timeout at ffffffff89694879
#4 [ffffac47e976fc78] __cv_timedwait_common at ffffffffc09d53d2 [spl]
#5 [ffffac47e976fcd8] zio_wait at ffffffffc719f684 [zfs]
#6 [ffffac47e976fd28] dsl_pool_sync at ffffffffc7102c59 [zfs]
#7 [ffffac47e976fda0] spa_sync at ffffffffc7126a20 [zfs]
#8 [ffffac47e976fe40] txg_sync_thread at ffffffffc713b4e1 [zfs]
#9 [ffffac47e976fee8] thread_generic_wrapper at ffffffffc09dc1af [spl]
#10 [ffffac47e976ff10] kthread at ffffffff8909680f
#11 [ffffac47e976ff50] ret_from_fork at ffffffff898001f5
crash> bt 9335
PID: 9335 TASK: ffff9356ae530880 CPU: 19 COMMAND: "z_wr_iss"
#0 [ffffac47e70cfb48] __schedule at ffffffff89694074
#1 [ffffac47e70cfbd8] schedule at ffffffff896944d2
#2 [ffffac47e70cfbe8] rwsem_down_write_failed at ffffffff896972a0
#3 [ffffac47e70cfc88] call_rwsem_down_write_failed at ffffffff8968a343
#4 [ffffac47e70cfcc8] down_write at ffffffff89696a70
#5 [ffffac47e70cfcd0] dbuf_write_ready at ffffffffc70c4c81 [zfs]
#6 [ffffac47e70cfd08] arc_write_ready at ffffffffc70bdfc4 [zfs]
#7 [ffffac47e70cfd60] zio_ready at ffffffffc71a17b5 [zfs]
#8 [ffffac47e70cfdb8] zio_execute at ffffffffc719c53d [zfs]
#9 [ffffac47e70cfde8] taskq_thread at ffffffffc09db479 [spl]
#10 [ffffac47e70cff10] kthread at ffffffff8909680f
#11 [ffffac47e70cff50] ret_from_fork at ffffffff898001f5
crash> bt 9331
PID: 9331 TASK: ffff9356ae520780 CPU: 60 COMMAND: "z_wr_iss"
#0 [ffffac47e70afb48] __schedule at ffffffff89694074
#1 [ffffac47e70afbd8] schedule at ffffffff896944d2
#2 [ffffac47e70afbe8] rwsem_down_write_failed at ffffffff896972a0
#3 [ffffac47e70afc88] call_rwsem_down_write_failed at ffffffff8968a343
#4 [ffffac47e70afcc8] down_write at ffffffff89696a70
#5 [ffffac47e70afcd0] dbuf_write_ready at ffffffffc70c4c81 [zfs]
#6 [ffffac47e70afd08] arc_write_ready at ffffffffc70bdfc4 [zfs]
#7 [ffffac47e70afd60] zio_ready at ffffffffc71a17b5 [zfs]
#8 [ffffac47e70afdb8] zio_execute at ffffffffc719c53d [zfs]
#9 [ffffac47e70afde8] taskq_thread at ffffffffc09db479 [spl]
#10 [ffffac47e70aff10] kthread at ffffffff8909680f
#11 [ffffac47e70aff50] ret_from_fork at ffffffff898001f5
crash> bt 9293
PID: 9293 TASK: ffff9356aea35b80 CPU: 22 COMMAND: "z_wr_iss"
#0 [ffffac47e50ffb48] __schedule at ffffffff89694074
#1 [ffffac47e50ffbd8] schedule at ffffffff896944d2
#2 [ffffac47e50ffbe8] rwsem_down_write_failed at ffffffff896972a0
#3 [ffffac47e50ffc88] call_rwsem_down_write_failed at ffffffff8968a343
#4 [ffffac47e50ffcc8] down_write at ffffffff89696a70
#5 [ffffac47e50ffcd0] dbuf_write_ready at ffffffffc70c4c81 [zfs]
#6 [ffffac47e50ffd08] arc_write_ready at ffffffffc70bdfc4 [zfs]
#7 [ffffac47e50ffd60] zio_ready at ffffffffc71a17b5 [zfs]
#8 [ffffac47e50ffdb8] zio_execute at ffffffffc719c53d [zfs]
#9 [ffffac47e50ffde8] taskq_thread at ffffffffc09db479 [spl]
#10 [ffffac47e50fff10] kthread at ffffffff8909680f
#11 [ffffac47e50fff50] ret_from_fork at ffffffff898001f5
Now I also know how to reproduce it: running parallel mdtest from a few nodes at the same time, e.g. from 5 nodes running:
time mpirun -np 32 mdtest -d /mnt/zpool/mdtest-run-$HOSTNAME -i 2 -I 1024 -z 1 -b 512 -L -u -F
Where /mnt/zpool is a ZFS dataset exported with sharenfs=rw=@10.0.0.0/255.0.0.0,async,no_root_squash
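The clients mount the export over NFSv3 (matching the nfsd3_* frames in the traces); a sketch of the client side, where the server name and mount options are assumptions:
# on each client node
mount -t nfs -o vers=3 kea2:/mnt/zpool /mnt/zpool
Then run the mpirun/mdtest command above in parallel from all nodes.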
With 0.7.13 it would get stuck (with another error) early in the parallel file-creation phase. With 0.8 it looks like it gets stuck once it starts removing files.
Is there any other useful information I can provide to help debug this issue?
Now, with 0.8 and the deadman enabled, there are some zpool events before the hang, and it took a lot longer (many hours) for ZFS to get stuck while running parallel mdtest on top of some regular, concurrent read/write activity.
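('Deadman enabled' refers to the module parameters visible in the tuning dump further down; roughly, something along these lines:)
# values match the Tunables section below
echo 1 > /sys/module/zfs/parameters/zfs_deadman_enabled
echo continue > /sys/module/zfs/parameters/zfs_deadman_failmode
echo 1 > /sys/module/zfs/parameters/zio_deadman_log_all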
Here's the dmesg trace:
[Sun Mar 31 02:45:50 2019] INFO: task nfsd:2845 blocked for more than 120 seconds.
[Sun Mar 31 02:45:50 2019] Tainted: P O 4.14.97.hrtdev #1
[Sun Mar 31 02:45:50 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Mar 31 02:45:50 2019] nfsd D 0 2845 2 0x80000000
[Sun Mar 31 02:45:50 2019] Call Trace:
[Sun Mar 31 02:45:50 2019] ? __schedule+0x3b4/0x7e0
[Sun Mar 31 02:45:50 2019] ? packet_rcv+0x41/0x430
[Sun Mar 31 02:45:50 2019] schedule+0x32/0x80
[Sun Mar 31 02:45:50 2019] schedule_preempt_disabled+0xa/0x10
[Sun Mar 31 02:45:50 2019] __mutex_lock.isra.2+0x279/0x4a0
[Sun Mar 31 02:45:50 2019] ? spl_kmem_cache_alloc+0xfe/0x860 [spl]
[Sun Mar 31 02:45:50 2019] ? sch_direct_xmit+0x62/0x1a0
[Sun Mar 31 02:45:50 2019] ? zfs_znode_hold_enter+0x107/0x150 [zfs]
[Sun Mar 31 02:45:50 2019] zfs_znode_hold_enter+0x107/0x150 [zfs]
[Sun Mar 31 02:45:50 2019] ? bond_start_xmit+0x168/0x3d0 [bonding]
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zfs_zget+0xfb/0x220 [zfs]
[Sun Mar 31 02:45:50 2019] ? __dev_queue_xmit+0x283/0x630
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zfs_vget+0x1fc/0x390 [zfs]
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zpl_fh_to_dentry+0x58/0xa0 [zfs]
[Sun Mar 31 02:45:50 2019] exportfs_decode_fh+0x49/0x290
[Sun Mar 31 02:45:50 2019] ? exp_find_key+0x6a/0xa0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? kmem_cache_alloc+0xff/0x510
[Sun Mar 31 02:45:50 2019] ? exp_find+0x5b/0xb0 [nfsd]
[Sun Mar 31 02:45:50 2019] fh_verify+0x2a6/0x5e0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] nfsd_open+0x2c/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_write+0x4e/0x1e0 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd3_proc_write+0x8d/0x110 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_dispatch+0xb1/0x240 [nfsd]
[Sun Mar 31 02:45:50 2019] svc_process_common+0x395/0x6e0 [sunrpc]
[Sun Mar 31 02:45:50 2019] svc_process+0xf6/0x1a0 [sunrpc]
[Sun Mar 31 02:45:50 2019] nfsd+0xe3/0x150 [nfsd]
[Sun Mar 31 02:45:50 2019] kthread+0xff/0x130
[Sun Mar 31 02:45:50 2019] ? nfsd_destroy+0x60/0x60 [nfsd]
[Sun Mar 31 02:45:50 2019] ? kthread_create_on_node+0x40/0x40
[Sun Mar 31 02:45:50 2019] ret_from_fork+0x35/0x40
[Sun Mar 31 02:45:50 2019] INFO: task nfsd:2846 blocked for more than 120 seconds.
[Sun Mar 31 02:45:50 2019] Tainted: P O 4.14.97.hrtdev #1
[Sun Mar 31 02:45:50 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Mar 31 02:45:50 2019] nfsd D 0 2846 2 0x80000000
[Sun Mar 31 02:45:50 2019] Call Trace:
[Sun Mar 31 02:45:50 2019] ? __schedule+0x3b4/0x7e0
[Sun Mar 31 02:45:50 2019] ? dev_hard_start_xmit+0xa0/0x1e0
[Sun Mar 31 02:45:50 2019] schedule+0x32/0x80
[Sun Mar 31 02:45:50 2019] rwsem_down_read_failed+0xfb/0x140
[Sun Mar 31 02:45:50 2019] ? call_rwsem_down_read_failed+0x14/0x30
[Sun Mar 31 02:45:50 2019] ? spl_kmem_cache_alloc+0xfe/0x860 [spl]
[Sun Mar 31 02:45:50 2019] call_rwsem_down_read_failed+0x14/0x30
[Sun Mar 31 02:45:50 2019] ? zpl_encode_fh+0xa0/0xa0 [zfs]
[Sun Mar 31 02:45:50 2019] down_read+0x13/0x30
[Sun Mar 31 02:45:50 2019] dnode_hold_impl+0xc9/0xc50 [zfs]
[Sun Mar 31 02:45:50 2019] ? zfs_znode_hold_enter+0x107/0x150 [zfs]
[Sun Mar 31 02:45:50 2019] dmu_bonus_hold+0x22/0x60 [zfs]
[Sun Mar 31 02:45:50 2019] zfs_zget+0x111/0x220 [zfs]
[Sun Mar 31 02:45:50 2019] ? sch_direct_xmit+0x62/0x1a0
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zfs_vget+0x1fc/0x390 [zfs]
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zpl_fh_to_dentry+0x58/0xa0 [zfs]
[Sun Mar 31 02:45:50 2019] exportfs_decode_fh+0x49/0x290
[Sun Mar 31 02:45:50 2019] ? exp_find_key+0x6a/0xa0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? kmem_cache_alloc+0xff/0x510
[Sun Mar 31 02:45:50 2019] ? exp_find+0x5b/0xb0 [nfsd]
[Sun Mar 31 02:45:50 2019] fh_verify+0x2a6/0x5e0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] nfsd_open+0x2c/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_write+0x4e/0x1e0 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd3_proc_write+0x8d/0x110 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_dispatch+0xb1/0x240 [nfsd]
[Sun Mar 31 02:45:50 2019] svc_process_common+0x395/0x6e0 [sunrpc]
[Sun Mar 31 02:45:50 2019] svc_process+0xf6/0x1a0 [sunrpc]
[Sun Mar 31 02:45:50 2019] nfsd+0xe3/0x150 [nfsd]
[Sun Mar 31 02:45:50 2019] kthread+0xff/0x130
[Sun Mar 31 02:45:50 2019] ? nfsd_destroy+0x60/0x60 [nfsd]
[Sun Mar 31 02:45:50 2019] ? kthread_create_on_node+0x40/0x40
[Sun Mar 31 02:45:50 2019] ret_from_fork+0x35/0x40
[Sun Mar 31 02:45:50 2019] INFO: task nfsd:2847 blocked for more than 120 seconds.
[Sun Mar 31 02:45:50 2019] Tainted: P O 4.14.97.hrtdev #1
[Sun Mar 31 02:45:50 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Mar 31 02:45:50 2019] nfsd D 0 2847 2 0x80000000
[Sun Mar 31 02:45:50 2019] Call Trace:
[Sun Mar 31 02:45:50 2019] ? __schedule+0x3b4/0x7e0
[Sun Mar 31 02:45:50 2019] ? packet_rcv+0x41/0x430
[Sun Mar 31 02:45:50 2019] schedule+0x32/0x80
[Sun Mar 31 02:45:50 2019] schedule_preempt_disabled+0xa/0x10
[Sun Mar 31 02:45:50 2019] __mutex_lock.isra.2+0x279/0x4a0
[Sun Mar 31 02:45:50 2019] ? spl_kmem_cache_alloc+0xfe/0x860 [spl]
[Sun Mar 31 02:45:50 2019] ? sch_direct_xmit+0x62/0x1a0
[Sun Mar 31 02:45:50 2019] ? zfs_znode_hold_enter+0x107/0x150 [zfs]
[Sun Mar 31 02:45:50 2019] zfs_znode_hold_enter+0x107/0x150 [zfs]
[Sun Mar 31 02:45:50 2019] ? bond_start_xmit+0x168/0x3d0 [bonding]
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zfs_zget+0xfb/0x220 [zfs]
[Sun Mar 31 02:45:50 2019] ? __dev_queue_xmit+0x283/0x630
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zfs_vget+0x1fc/0x390 [zfs]
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zpl_fh_to_dentry+0x58/0xa0 [zfs]
[Sun Mar 31 02:45:50 2019] exportfs_decode_fh+0x49/0x290
[Sun Mar 31 02:45:50 2019] ? exp_find_key+0x6a/0xa0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? tcp_v4_do_rcv+0x114/0x1e0
[Sun Mar 31 02:45:50 2019] ? __release_sock+0x91/0xd0
[Sun Mar 31 02:45:50 2019] ? kmem_cache_alloc+0xff/0x510
[Sun Mar 31 02:45:50 2019] ? exp_find+0x5b/0xb0 [nfsd]
[Sun Mar 31 02:45:50 2019] fh_verify+0x2a6/0x5e0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] nfsd_open+0x2c/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_write+0x4e/0x1e0 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd3_proc_write+0x8d/0x110 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_dispatch+0xb1/0x240 [nfsd]
[Sun Mar 31 02:45:50 2019] svc_process_common+0x395/0x6e0 [sunrpc]
[Sun Mar 31 02:45:50 2019] svc_process+0xf6/0x1a0 [sunrpc]
[Sun Mar 31 02:45:50 2019] nfsd+0xe3/0x150 [nfsd]
[Sun Mar 31 02:45:50 2019] kthread+0xff/0x130
[Sun Mar 31 02:45:50 2019] ? nfsd_destroy+0x60/0x60 [nfsd]
[Sun Mar 31 02:45:50 2019] ? kthread_create_on_node+0x40/0x40
[Sun Mar 31 02:45:50 2019] ret_from_fork+0x35/0x40
[Sun Mar 31 02:45:50 2019] INFO: task nfsd:2848 blocked for more than 120 seconds.
[Sun Mar 31 02:45:50 2019] Tainted: P O 4.14.97.hrtdev #1
[Sun Mar 31 02:45:50 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Mar 31 02:45:50 2019] nfsd D 0 2848 2 0x80000000
[Sun Mar 31 02:45:50 2019] Call Trace:
[Sun Mar 31 02:45:50 2019] ? __schedule+0x3b4/0x7e0
[Sun Mar 31 02:45:50 2019] ? packet_rcv+0x41/0x430
[Sun Mar 31 02:45:50 2019] schedule+0x32/0x80
[Sun Mar 31 02:45:50 2019] schedule_preempt_disabled+0xa/0x10
[Sun Mar 31 02:45:50 2019] __mutex_lock.isra.2+0x279/0x4a0
[Sun Mar 31 02:45:50 2019] ? spl_kmem_cache_alloc+0xfe/0x860 [spl]
[Sun Mar 31 02:45:50 2019] ? sch_direct_xmit+0x62/0x1a0
[Sun Mar 31 02:45:50 2019] ? zfs_znode_hold_enter+0x107/0x150 [zfs]
[Sun Mar 31 02:45:50 2019] zfs_znode_hold_enter+0x107/0x150 [zfs]
[Sun Mar 31 02:45:50 2019] ? bond_start_xmit+0x168/0x3d0 [bonding]
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zfs_zget+0xfb/0x220 [zfs]
[Sun Mar 31 02:45:50 2019] ? __dev_queue_xmit+0x283/0x630
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zfs_vget+0x1fc/0x390 [zfs]
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zpl_fh_to_dentry+0x58/0xa0 [zfs]
[Sun Mar 31 02:45:50 2019] exportfs_decode_fh+0x49/0x290
[Sun Mar 31 02:45:50 2019] ? exp_find_key+0x6a/0xa0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? kmem_cache_alloc+0xff/0x510
[Sun Mar 31 02:45:50 2019] ? exp_find+0x5b/0xb0 [nfsd]
[Sun Mar 31 02:45:50 2019] fh_verify+0x2a6/0x5e0 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_open+0x2c/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_write+0x4e/0x1e0 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd3_proc_write+0x8d/0x110 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_dispatch+0xb1/0x240 [nfsd]
[Sun Mar 31 02:45:50 2019] svc_process_common+0x395/0x6e0 [sunrpc]
[Sun Mar 31 02:45:50 2019] svc_process+0xf6/0x1a0 [sunrpc]
[Sun Mar 31 02:45:50 2019] nfsd+0xe3/0x150 [nfsd]
[Sun Mar 31 02:45:50 2019] kthread+0xff/0x130
[Sun Mar 31 02:45:50 2019] ? nfsd_destroy+0x60/0x60 [nfsd]
[Sun Mar 31 02:45:50 2019] ? kthread_create_on_node+0x40/0x40
[Sun Mar 31 02:45:50 2019] ret_from_fork+0x35/0x40
[Sun Mar 31 02:45:50 2019] INFO: task nfsd:2849 blocked for more than 120 seconds.
[Sun Mar 31 02:45:50 2019] Tainted: P O 4.14.97.hrtdev #1
[Sun Mar 31 02:45:50 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Mar 31 02:45:50 2019] nfsd D 0 2849 2 0x80000000
[Sun Mar 31 02:45:50 2019] Call Trace:
[Sun Mar 31 02:45:50 2019] ? __schedule+0x3b4/0x7e0
[Sun Mar 31 02:45:50 2019] ? packet_rcv+0x41/0x430
[Sun Mar 31 02:45:50 2019] schedule+0x32/0x80
[Sun Mar 31 02:45:50 2019] schedule_preempt_disabled+0xa/0x10
[Sun Mar 31 02:45:50 2019] __mutex_lock.isra.2+0x279/0x4a0
[Sun Mar 31 02:45:50 2019] ? spl_kmem_cache_alloc+0xfe/0x860 [spl]
[Sun Mar 31 02:45:50 2019] ? sch_direct_xmit+0x62/0x1a0
[Sun Mar 31 02:45:50 2019] ? zfs_znode_hold_enter+0x107/0x150 [zfs]
[Sun Mar 31 02:45:50 2019] zfs_znode_hold_enter+0x107/0x150 [zfs]
[Sun Mar 31 02:45:50 2019] ? bond_start_xmit+0x168/0x3d0 [bonding]
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zfs_zget+0xfb/0x220 [zfs]
[Sun Mar 31 02:45:50 2019] ? __dev_queue_xmit+0x283/0x630
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zfs_vget+0x1fc/0x390 [zfs]
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zpl_fh_to_dentry+0x58/0xa0 [zfs]
[Sun Mar 31 02:45:50 2019] exportfs_decode_fh+0x49/0x290
[Sun Mar 31 02:45:50 2019] ? exp_find_key+0x6a/0xa0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? tcp_v4_do_rcv+0x114/0x1e0
[Sun Mar 31 02:45:50 2019] ? __release_sock+0x91/0xd0
[Sun Mar 31 02:45:50 2019] ? kmem_cache_alloc+0xff/0x510
[Sun Mar 31 02:45:50 2019] ? exp_find+0x5b/0xb0 [nfsd]
[Sun Mar 31 02:45:50 2019] fh_verify+0x2a6/0x5e0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] nfsd_open+0x2c/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_write+0x4e/0x1e0 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd3_proc_write+0x8d/0x110 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_dispatch+0xb1/0x240 [nfsd]
[Sun Mar 31 02:45:50 2019] svc_process_common+0x395/0x6e0 [sunrpc]
[Sun Mar 31 02:45:50 2019] svc_process+0xf6/0x1a0 [sunrpc]
[Sun Mar 31 02:45:50 2019] nfsd+0xe3/0x150 [nfsd]
[Sun Mar 31 02:45:50 2019] kthread+0xff/0x130
[Sun Mar 31 02:45:50 2019] ? nfsd_destroy+0x60/0x60 [nfsd]
[Sun Mar 31 02:45:50 2019] ? kthread_create_on_node+0x40/0x40
[Sun Mar 31 02:45:50 2019] ret_from_fork+0x35/0x40
[Sun Mar 31 02:45:50 2019] INFO: task nfsd:2850 blocked for more than 120 seconds.
[Sun Mar 31 02:45:50 2019] Tainted: P O 4.14.97.hrtdev #1
[Sun Mar 31 02:45:50 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Mar 31 02:45:50 2019] nfsd D 0 2850 2 0x80000000
[Sun Mar 31 02:45:50 2019] Call Trace:
[Sun Mar 31 02:45:50 2019] ? __schedule+0x3b4/0x7e0
[Sun Mar 31 02:45:50 2019] ? packet_rcv+0x41/0x430
[Sun Mar 31 02:45:50 2019] schedule+0x32/0x80
[Sun Mar 31 02:45:50 2019] schedule_preempt_disabled+0xa/0x10
[Sun Mar 31 02:45:50 2019] __mutex_lock.isra.2+0x279/0x4a0
[Sun Mar 31 02:45:50 2019] ? spl_kmem_cache_alloc+0xfe/0x860 [spl]
[Sun Mar 31 02:45:50 2019] ? sch_direct_xmit+0x62/0x1a0
[Sun Mar 31 02:45:50 2019] ? zfs_znode_hold_enter+0x107/0x150 [zfs]
[Sun Mar 31 02:45:50 2019] zfs_znode_hold_enter+0x107/0x150 [zfs]
[Sun Mar 31 02:45:50 2019] ? bond_start_xmit+0x168/0x3d0 [bonding]
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zfs_zget+0xfb/0x220 [zfs]
[Sun Mar 31 02:45:50 2019] ? __dev_queue_xmit+0x283/0x630
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zfs_vget+0x1fc/0x390 [zfs]
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zpl_fh_to_dentry+0x58/0xa0 [zfs]
[Sun Mar 31 02:45:50 2019] exportfs_decode_fh+0x49/0x290
[Sun Mar 31 02:45:50 2019] ? exp_find_key+0x6a/0xa0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? tcp_v4_do_rcv+0x114/0x1e0
[Sun Mar 31 02:45:50 2019] ? __release_sock+0x91/0xd0
[Sun Mar 31 02:45:50 2019] ? kmem_cache_alloc+0xff/0x510
[Sun Mar 31 02:45:50 2019] ? exp_find+0x5b/0xb0 [nfsd]
[Sun Mar 31 02:45:50 2019] fh_verify+0x2a6/0x5e0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] nfsd_open+0x2c/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_write+0x4e/0x1e0 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd3_proc_write+0x8d/0x110 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_dispatch+0xb1/0x240 [nfsd]
[Sun Mar 31 02:45:50 2019] svc_process_common+0x395/0x6e0 [sunrpc]
[Sun Mar 31 02:45:50 2019] svc_process+0xf6/0x1a0 [sunrpc]
[Sun Mar 31 02:45:50 2019] nfsd+0xe3/0x150 [nfsd]
[Sun Mar 31 02:45:50 2019] kthread+0xff/0x130
[Sun Mar 31 02:45:50 2019] ? nfsd_destroy+0x60/0x60 [nfsd]
[Sun Mar 31 02:45:50 2019] ? kthread_create_on_node+0x40/0x40
[Sun Mar 31 02:45:50 2019] ret_from_fork+0x35/0x40
[Sun Mar 31 02:45:50 2019] INFO: task nfsd:2851 blocked for more than 120 seconds.
[Sun Mar 31 02:45:50 2019] Tainted: P O 4.14.97.hrtdev #1
[Sun Mar 31 02:45:50 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Mar 31 02:45:50 2019] nfsd D 0 2851 2 0x80000000
[Sun Mar 31 02:45:50 2019] Call Trace:
[Sun Mar 31 02:45:50 2019] ? __schedule+0x3b4/0x7e0
[Sun Mar 31 02:45:50 2019] ? packet_rcv+0x41/0x430
[Sun Mar 31 02:45:50 2019] schedule+0x32/0x80
[Sun Mar 31 02:45:50 2019] schedule_preempt_disabled+0xa/0x10
[Sun Mar 31 02:45:50 2019] __mutex_lock.isra.2+0x279/0x4a0
[Sun Mar 31 02:45:50 2019] ? spl_kmem_cache_alloc+0xfe/0x860 [spl]
[Sun Mar 31 02:45:50 2019] ? sch_direct_xmit+0x62/0x1a0
[Sun Mar 31 02:45:50 2019] ? zfs_znode_hold_enter+0x107/0x150 [zfs]
[Sun Mar 31 02:45:50 2019] zfs_znode_hold_enter+0x107/0x150 [zfs]
[Sun Mar 31 02:45:50 2019] ? bond_start_xmit+0x168/0x3d0 [bonding]
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zfs_zget+0xfb/0x220 [zfs]
[Sun Mar 31 02:45:50 2019] ? __dev_queue_xmit+0x283/0x630
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zfs_vget+0x1fc/0x390 [zfs]
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zpl_fh_to_dentry+0x58/0xa0 [zfs]
[Sun Mar 31 02:45:50 2019] exportfs_decode_fh+0x49/0x290
[Sun Mar 31 02:45:50 2019] ? exp_find_key+0x6a/0xa0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? tcp_v4_do_rcv+0x114/0x1e0
[Sun Mar 31 02:45:50 2019] ? __release_sock+0x91/0xd0
[Sun Mar 31 02:45:50 2019] ? kmem_cache_alloc+0xff/0x510
[Sun Mar 31 02:45:50 2019] ? exp_find+0x5b/0xb0 [nfsd]
[Sun Mar 31 02:45:50 2019] fh_verify+0x2a6/0x5e0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] nfsd_open+0x2c/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_write+0x4e/0x1e0 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd3_proc_write+0x8d/0x110 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_dispatch+0xb1/0x240 [nfsd]
[Sun Mar 31 02:45:50 2019] svc_process_common+0x395/0x6e0 [sunrpc]
[Sun Mar 31 02:45:50 2019] svc_process+0xf6/0x1a0 [sunrpc]
[Sun Mar 31 02:45:50 2019] nfsd+0xe3/0x150 [nfsd]
[Sun Mar 31 02:45:50 2019] kthread+0xff/0x130
[Sun Mar 31 02:45:50 2019] ? nfsd_destroy+0x60/0x60 [nfsd]
[Sun Mar 31 02:45:50 2019] ? kthread_create_on_node+0x40/0x40
[Sun Mar 31 02:45:50 2019] ret_from_fork+0x35/0x40
[Sun Mar 31 02:45:50 2019] INFO: task nfsd:2852 blocked for more than 120 seconds.
[Sun Mar 31 02:45:50 2019] Tainted: P O 4.14.97.hrtdev #1
[Sun Mar 31 02:45:50 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Mar 31 02:45:50 2019] nfsd D 0 2852 2 0x80000000
[Sun Mar 31 02:45:50 2019] Call Trace:
[Sun Mar 31 02:45:50 2019] ? __schedule+0x3b4/0x7e0
[Sun Mar 31 02:45:50 2019] ? packet_rcv+0x41/0x430
[Sun Mar 31 02:45:50 2019] schedule+0x32/0x80
[Sun Mar 31 02:45:50 2019] schedule_preempt_disabled+0xa/0x10
[Sun Mar 31 02:45:50 2019] __mutex_lock.isra.2+0x279/0x4a0
[Sun Mar 31 02:45:50 2019] ? spl_kmem_cache_alloc+0xfe/0x860 [spl]
[Sun Mar 31 02:45:50 2019] ? sch_direct_xmit+0x62/0x1a0
[Sun Mar 31 02:45:50 2019] ? zfs_znode_hold_enter+0x107/0x150 [zfs]
[Sun Mar 31 02:45:50 2019] zfs_znode_hold_enter+0x107/0x150 [zfs]
[Sun Mar 31 02:45:50 2019] ? bond_start_xmit+0x168/0x3d0 [bonding]
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zfs_zget+0xfb/0x220 [zfs]
[Sun Mar 31 02:45:50 2019] ? __dev_queue_xmit+0x283/0x630
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zfs_vget+0x1fc/0x390 [zfs]
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zpl_fh_to_dentry+0x58/0xa0 [zfs]
[Sun Mar 31 02:45:50 2019] exportfs_decode_fh+0x49/0x290
[Sun Mar 31 02:45:50 2019] ? exp_find_key+0x6a/0xa0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? kmem_cache_alloc+0xff/0x510
[Sun Mar 31 02:45:50 2019] ? exp_find+0x5b/0xb0 [nfsd]
[Sun Mar 31 02:45:50 2019] fh_verify+0x2a6/0x5e0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] nfsd_open+0x2c/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_write+0x4e/0x1e0 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd3_proc_write+0x8d/0x110 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_dispatch+0xb1/0x240 [nfsd]
[Sun Mar 31 02:45:50 2019] svc_process_common+0x395/0x6e0 [sunrpc]
[Sun Mar 31 02:45:50 2019] svc_process+0xf6/0x1a0 [sunrpc]
[Sun Mar 31 02:45:50 2019] nfsd+0xe3/0x150 [nfsd]
[Sun Mar 31 02:45:50 2019] kthread+0xff/0x130
[Sun Mar 31 02:45:50 2019] ? nfsd_destroy+0x60/0x60 [nfsd]
[Sun Mar 31 02:45:50 2019] ? kthread_create_on_node+0x40/0x40
[Sun Mar 31 02:45:50 2019] ret_from_fork+0x35/0x40
[Sun Mar 31 02:45:50 2019] INFO: task nfsd:2853 blocked for more than 120 seconds.
[Sun Mar 31 02:45:50 2019] Tainted: P O 4.14.97.hrtdev #1
[Sun Mar 31 02:45:50 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Mar 31 02:45:50 2019] nfsd D 0 2853 2 0x80000000
[Sun Mar 31 02:45:50 2019] Call Trace:
[Sun Mar 31 02:45:50 2019] ? __schedule+0x3b4/0x7e0
[Sun Mar 31 02:45:50 2019] ? packet_rcv+0x41/0x430
[Sun Mar 31 02:45:50 2019] schedule+0x32/0x80
[Sun Mar 31 02:45:50 2019] schedule_preempt_disabled+0xa/0x10
[Sun Mar 31 02:45:50 2019] __mutex_lock.isra.2+0x279/0x4a0
[Sun Mar 31 02:45:50 2019] ? spl_kmem_cache_alloc+0xfe/0x860 [spl]
[Sun Mar 31 02:45:50 2019] ? sch_direct_xmit+0x62/0x1a0
[Sun Mar 31 02:45:50 2019] ? zfs_znode_hold_enter+0x107/0x150 [zfs]
[Sun Mar 31 02:45:50 2019] zfs_znode_hold_enter+0x107/0x150 [zfs]
[Sun Mar 31 02:45:50 2019] ? bond_start_xmit+0x168/0x3d0 [bonding]
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zfs_zget+0xfb/0x220 [zfs]
[Sun Mar 31 02:45:50 2019] ? __dev_queue_xmit+0x283/0x630
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zfs_vget+0x1fc/0x390 [zfs]
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zpl_fh_to_dentry+0x58/0xa0 [zfs]
[Sun Mar 31 02:45:50 2019] exportfs_decode_fh+0x49/0x290
[Sun Mar 31 02:45:50 2019] ? exp_find_key+0x6a/0xa0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? tcp_v4_do_rcv+0x114/0x1e0
[Sun Mar 31 02:45:50 2019] ? __release_sock+0x91/0xd0
[Sun Mar 31 02:45:50 2019] ? kmem_cache_alloc+0xff/0x510
[Sun Mar 31 02:45:50 2019] ? exp_find+0x5b/0xb0 [nfsd]
[Sun Mar 31 02:45:50 2019] fh_verify+0x2a6/0x5e0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] nfsd_open+0x2c/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_write+0x4e/0x1e0 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd3_proc_write+0x8d/0x110 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_dispatch+0xb1/0x240 [nfsd]
[Sun Mar 31 02:45:50 2019] svc_process_common+0x395/0x6e0 [sunrpc]
[Sun Mar 31 02:45:50 2019] svc_process+0xf6/0x1a0 [sunrpc]
[Sun Mar 31 02:45:50 2019] nfsd+0xe3/0x150 [nfsd]
[Sun Mar 31 02:45:50 2019] kthread+0xff/0x130
[Sun Mar 31 02:45:50 2019] ? nfsd_destroy+0x60/0x60 [nfsd]
[Sun Mar 31 02:45:50 2019] ? kthread_create_on_node+0x40/0x40
[Sun Mar 31 02:45:50 2019] ret_from_fork+0x35/0x40
[Sun Mar 31 02:45:50 2019] INFO: task nfsd:2854 blocked for more than 120 seconds.
[Sun Mar 31 02:45:50 2019] Tainted: P O 4.14.97.hrtdev #1
[Sun Mar 31 02:45:50 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Mar 31 02:45:50 2019] nfsd D 0 2854 2 0x80000000
[Sun Mar 31 02:45:50 2019] Call Trace:
[Sun Mar 31 02:45:50 2019] ? __schedule+0x3b4/0x7e0
[Sun Mar 31 02:45:50 2019] schedule+0x32/0x80
[Sun Mar 31 02:45:50 2019] rwsem_down_read_failed+0xfb/0x140
[Sun Mar 31 02:45:50 2019] ? call_rwsem_down_read_failed+0x14/0x30
[Sun Mar 31 02:45:50 2019] ? spl_kmem_cache_alloc+0xfe/0x860 [spl]
[Sun Mar 31 02:45:50 2019] call_rwsem_down_read_failed+0x14/0x30
[Sun Mar 31 02:45:50 2019] down_read+0x13/0x30
[Sun Mar 31 02:45:50 2019] dnode_hold_impl+0xc9/0xc50 [zfs]
[Sun Mar 31 02:45:50 2019] ? zfs_znode_hold_enter+0x107/0x150 [zfs]
[Sun Mar 31 02:45:50 2019] dmu_bonus_hold+0x22/0x60 [zfs]
[Sun Mar 31 02:45:50 2019] zfs_zget+0x111/0x220 [zfs]
[Sun Mar 31 02:45:50 2019] ? __dev_queue_xmit+0x283/0x630
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zfs_vget+0x1fc/0x390 [zfs]
[Sun Mar 31 02:45:50 2019] ? nfsd_proc_setattr+0x180/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] zpl_fh_to_dentry+0x58/0xa0 [zfs]
[Sun Mar 31 02:45:50 2019] exportfs_decode_fh+0x49/0x290
[Sun Mar 31 02:45:50 2019] ? exp_find_key+0x6a/0xa0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? kmem_cache_alloc+0xff/0x510
[Sun Mar 31 02:45:50 2019] ? exp_find+0x5b/0xb0 [nfsd]
[Sun Mar 31 02:45:50 2019] fh_verify+0x2a6/0x5e0 [nfsd]
[Sun Mar 31 02:45:50 2019] ? getboottime64+0x22/0x30
[Sun Mar 31 02:45:50 2019] nfsd_open+0x2c/0x180 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_write+0x4e/0x1e0 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd3_proc_write+0x8d/0x110 [nfsd]
[Sun Mar 31 02:45:50 2019] nfsd_dispatch+0xb1/0x240 [nfsd]
[Sun Mar 31 02:45:50 2019] svc_process_common+0x395/0x6e0 [sunrpc]
[Sun Mar 31 02:45:50 2019] svc_process+0xf6/0x1a0 [sunrpc]
[Sun Mar 31 02:45:50 2019] nfsd+0xe3/0x150 [nfsd]
[Sun Mar 31 02:45:50 2019] kthread+0xff/0x130
[Sun Mar 31 02:45:50 2019] ? nfsd_destroy+0x60/0x60 [nfsd]
[Sun Mar 31 02:45:50 2019] ? kthread_create_on_node+0x40/0x40
[Sun Mar 31 02:45:50 2019] ret_from_fork+0x35/0x40
And some of the zpool events:
Mar 31 2019 09:01:53.420546561 ereport.fs.zfs.deadman
class = "ereport.fs.zfs.deadman"
ena = 0xd2ea468c63300001
detector = (embedded nvlist)
version = 0x0
scheme = "zfs"
pool = 0x4ddb8a5d6c54d028
(end detector)
pool = "zpool"
pool_guid = 0x4ddb8a5d6c54d028
pool_state = 0x0
pool_context = 0x0
pool_failmode = "wait"
zio_err = 0x0
zio_flags = 0x0
zio_stage = 0x10
zio_pipeline = 0x17980f4
zio_delay = 0x0
zio_timestamp = 0x0
zio_delta = 0x0
zio_objset = 0x108
zio_object = 0x0
zio_level = 0x1
zio_blkid = 0x39b
time = 0x5ca0ba41 0x19110801
eid = 0xe5a
Mar 31 2019 09:01:53.420546561 ereport.fs.zfs.deadman
class = "ereport.fs.zfs.deadman"
ena = 0xd2ea46963c200001
detector = (embedded nvlist)
version = 0x0
scheme = "zfs"
pool = 0x4ddb8a5d6c54d028
(end detector)
pool = "zpool"
pool_guid = 0x4ddb8a5d6c54d028
pool_state = 0x0
pool_context = 0x0
pool_failmode = "wait"
zio_err = 0x0
zio_flags = 0x0
zio_stage = 0x10
zio_pipeline = 0x17980f4
zio_delay = 0x0
zio_timestamp = 0x0
zio_delta = 0x0
zio_objset = 0x108
zio_object = 0x0
zio_level = 0x1
zio_blkid = 0x39a
time = 0x5ca0ba41 0x19110801
eid = 0xe5b
Mar 31 2019 09:01:53.420546561 ereport.fs.zfs.deadman
class = "ereport.fs.zfs.deadman"
ena = 0xd2ea46a263100001
detector = (embedded nvlist)
version = 0x0
scheme = "zfs"
pool = 0x4ddb8a5d6c54d028
(end detector)
pool = "zpool"
pool_guid = 0x4ddb8a5d6c54d028
pool_state = 0x0
pool_context = 0x0
pool_failmode = "wait"
zio_err = 0x0
zio_flags = 0x0
zio_stage = 0x10
zio_pipeline = 0x17980f4
zio_delay = 0x0
zio_timestamp = 0x0
zio_delta = 0x0
zio_objset = 0x108
zio_object = 0x0
zio_level = 0x1
zio_blkid = 0x399
time = 0x5ca0ba41 0x19110801
eid = 0xe5c
Mar 31 2019 09:01:53.420546561 ereport.fs.zfs.deadman
class = "ereport.fs.zfs.deadman"
ena = 0xd2ea46b2e6f00001
detector = (embedded nvlist)
version = 0x0
scheme = "zfs"
pool = 0x4ddb8a5d6c54d028
(end detector)
pool = "zpool"
pool_guid = 0x4ddb8a5d6c54d028
pool_state = 0x0
pool_context = 0x0
pool_failmode = "wait"
zio_err = 0x0
zio_flags = 0x0
zio_stage = 0x10
zio_pipeline = 0x17980f4
zio_delay = 0x0
zio_timestamp = 0x0
zio_delta = 0x0
zio_objset = 0x108
zio_object = 0x0
zio_level = 0x1
zio_blkid = 0x398
time = 0x5ca0ba41 0x19110801
eid = 0xe5d
Mar 31 2019 09:01:53.420546561 ereport.fs.zfs.deadman
class = "ereport.fs.zfs.deadman"
ena = 0xd2ea46ba21900001
detector = (embedded nvlist)
version = 0x0
scheme = "zfs"
pool = 0x4ddb8a5d6c54d028
(end detector)
pool = "zpool"
pool_guid = 0x4ddb8a5d6c54d028
pool_state = 0x0
pool_context = 0x0
pool_failmode = "wait"
zio_err = 0x0
zio_flags = 0x0
zio_stage = 0x10
zio_pipeline = 0x17980f4
zio_delay = 0x0
zio_timestamp = 0x0
zio_delta = 0x0
zio_objset = 0x108
zio_object = 0x0
zio_level = 0x1
zio_blkid = 0x347
time = 0x5ca0ba41 0x19110801
eid = 0xe5e
I suspect there have been zpool events since 2:48, when ZFS got stuck, but 'zpool events' seems to show only the last few.
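(To keep the full history next time, events can be streamed to a file as they arrive; a sketch:)
# follow events verbosely and append to a log; the in-kernel buffer
# (zfs_zevent_len_max) only retains the most recent entries
zpool events -vf >> /var/log/zpool-events.log &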
txg_sync, a few z_wr_iss, and all nfsd are the processes in D state.
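(For reference, a quick way to enumerate them; a sketch:)
# list uninterruptible (D) tasks with their kernel wait channel
ps -eo pid,state,wchan:32,comm | awk '$2 == "D"'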
Here's all the ZFS tuning:
# arc_summary
------------------------------------------------------------------------
ZFS Subsystem Report Sun Mar 31 09:05:32 2019
Linux 4.14.97.hrtdev 0.8.0-rc3_126_g5dbf8b4ed
Machine: kea2 (x86_64) 0.8.0-rc3_126_g5dbf8b4ed
ARC status: HEALTHY
Memory throttle count: 0
ARC size (current): 91.1 % 411.5 GiB
Target size (adaptive): 91.1 % 411.5 GiB
Min size (hard limit): 55.6 % 251.0 GiB
Max size (high water): 1:1 451.9 GiB
Most Frequently Used (MFU) cache size: 28.5 % 116.9 GiB
Most Recently Used (MRU) cache size: 71.5 % 292.5 GiB
Metadata cache size (hard limit): 75.0 % 338.9 GiB
Metadata cache size (current): 1.7 % 5.9 GiB
Dnode cache size (hard limit): 10.0 % 33.9 GiB
Dnode cache size (current): 1.9 % 662.6 MiB
ARC hash breakdown:
Elements max: 7.1M
Elements current: 47.5 % 3.4M
Collisions: 22.1M
Chain max: 5
Chains: 81.5k
ARC misc:
Deleted: 74.5M
Mutex misses: 9.5M
Eviction skips: 1.9M
ARC total accesses (hits + misses): 70.3G
Cache hit ratio: 99.8 % 70.2G
Cache miss ratio: 0.2 % 129.8M
Actual hit ratio (MFU + MRU hits): 99.5 % 70.0G
Data demand efficiency: 99.1 % 1.3G
Data prefetch efficiency: n/a 0
Cache hits by cache type:
Most frequently used (MFU): 94.8 % 66.6G
Most recently used (MRU): 4.9 % 3.4G
Most frequently used (MFU) ghost: 0.1 % 55.9M
Most recently used (MRU) ghost: 0.1 % 62.8M
Anonymously used: 0.1 % 71.4M
Cache hits by data type:
Demand data: 1.9 % 1.3G
Demand prefetch data: 0.0 % 0
Demand metadata: 97.8 % 68.6G
Demand prefetch metadata: 0.4 % 265.6M
Cache misses by data type:
Demand data: 9.2 % 11.9M
Demand prefetch data: 0.0 % 0
Demand metadata: 79.4 % 103.0M
Demand prefetch metadata: 11.4 % 14.8M
DMU prefetch efficiency: 0
Hit ratio: n/a 0
Miss ratio: n/a 0
L2ARC not detected, skipping section
Solaris Porting Layer (SPL):
spl_hostid 0
spl_hostid_path /etc/hostid
spl_kmem_alloc_max 1048576
spl_kmem_alloc_warn 65536
spl_kmem_cache_expire 2
spl_kmem_cache_kmem_limit 2048
spl_kmem_cache_kmem_threads 4
spl_kmem_cache_magazine_size 0
spl_kmem_cache_max_size 32
spl_kmem_cache_obj_per_slab 8
spl_kmem_cache_obj_per_slab_min 1
spl_kmem_cache_reclaim 0
spl_kmem_cache_slab_limit 16384
spl_max_show_tasks 512
spl_panic_halt 0
spl_taskq_kick 0
spl_taskq_thread_bind 0
spl_taskq_thread_dynamic 1
spl_taskq_thread_priority 1
spl_taskq_thread_sequential 4
Tunables:
dbuf_cache_hiwater_pct 10
dbuf_cache_lowater_pct 10
dbuf_cache_max_bytes 8423723520
dbuf_cache_shift 5
dbuf_metadata_cache_max_bytes 4211861760
dbuf_metadata_cache_shift 6
dmu_object_alloc_chunk_shift 7
ignore_hole_birth 1
l2arc_feed_again 1
l2arc_feed_min_ms 200
l2arc_feed_secs 1
l2arc_headroom 2
l2arc_headroom_boost 200
l2arc_noprefetch 1
l2arc_norw 0
l2arc_write_boost 8388608
l2arc_write_max 8388608
metaslab_aliquot 524288
metaslab_bias_enabled 1
metaslab_debug_load 0
metaslab_debug_unload 0
metaslab_force_ganging 16777217
metaslab_fragmentation_factor_enabled 1
metaslab_lba_weighting_enabled 1
metaslab_preload_enabled 1
send_holes_without_birth_time 1
spa_asize_inflation 24
spa_config_path /etc/zfs/zpool.cache
spa_load_print_vdev_tree 0
spa_load_verify_data 1
spa_load_verify_maxinflight 10000
spa_load_verify_metadata 1
spa_slop_shift 5
vdev_removal_max_span 32768
vdev_validate_skip 0
zfetch_array_rd_sz 1048576
zfetch_max_distance 8388608
zfetch_max_streams 8
zfetch_min_sec_reap 2
zfs_abd_scatter_enabled 1
zfs_abd_scatter_max_order 10
zfs_abd_scatter_min_size 1536
zfs_admin_snapshot 0
zfs_arc_average_blocksize 8192
zfs_arc_dnode_limit 0
zfs_arc_dnode_limit_percent 10
zfs_arc_dnode_reduce_percent 10
zfs_arc_grow_retry 0
zfs_arc_lotsfree_percent 10
zfs_arc_max 485206474752
zfs_arc_meta_adjust_restarts 4096
zfs_arc_meta_limit 0
zfs_arc_meta_limit_percent 75
zfs_arc_meta_min 0
zfs_arc_meta_prune 10000
zfs_arc_meta_strategy 1
zfs_arc_min 269559152640
zfs_arc_min_prefetch_ms 0
zfs_arc_min_prescient_prefetch_ms 0
zfs_arc_p_dampener_disable 1
zfs_arc_p_min_shift 0
zfs_arc_pc_percent 0
zfs_arc_shrink_shift 0
zfs_arc_sys_free 0
zfs_async_block_max_blocks 100000
zfs_autoimport_disable 1
zfs_checksum_events_per_second 20
zfs_commit_timeout_pct 5
zfs_compressed_arc_enabled 1
zfs_condense_indirect_commit_entry_delay_ms 0
zfs_condense_indirect_vdevs_enable 1
zfs_condense_max_obsolete_bytes 1073741824
zfs_condense_min_mapping_bytes 131072
zfs_dbgmsg_enable 1
zfs_dbgmsg_maxsize 4194304
zfs_dbuf_state_index 0
zfs_ddt_data_is_special 1
zfs_deadman_checktime_ms 60000
zfs_deadman_enabled 1
zfs_deadman_failmode continue
zfs_deadman_synctime_ms 600000
zfs_deadman_ziotime_ms 300000
zfs_dedup_prefetch 0
zfs_delay_min_dirty_percent 60
zfs_delay_scale 500000
zfs_delete_blocks 20480
zfs_dirty_data_max 4294967296
zfs_dirty_data_max_max 4294967296
zfs_dirty_data_max_max_percent 25
zfs_dirty_data_max_percent 10
zfs_dirty_data_sync_percent 20
zfs_disable_ivset_guid_check 0
zfs_dmu_offset_next_sync 0
zfs_expire_snapshot 300
zfs_flags 0
zfs_free_bpobj_enabled 1
zfs_free_leak_on_eio 0
zfs_free_min_time_ms 1000
zfs_immediate_write_sz 32768
zfs_initialize_value 16045690984833335022
zfs_key_max_salt_uses 400000000
zfs_lua_max_instrlimit 100000000
zfs_lua_max_memlimit 104857600
zfs_max_missing_tvds 0
zfs_max_recordsize 1048576
zfs_metaslab_fragmentation_threshold 70
zfs_metaslab_segment_weight_enabled 1
zfs_metaslab_switch_threshold 2
zfs_mg_fragmentation_threshold 85
zfs_mg_noalloc_threshold 0
zfs_multihost_fail_intervals 10
zfs_multihost_history 0
zfs_multihost_import_intervals 20
zfs_multihost_interval 2000
zfs_multilist_num_sublists 0
zfs_no_scrub_io 0
zfs_no_scrub_prefetch 0
zfs_nocacheflush 0
zfs_nopwrite_enabled 1
zfs_object_mutex_size 64
zfs_obsolete_min_time_ms 500
zfs_override_estimate_recordsize 0
zfs_pd_bytes_max 52428800
zfs_per_txg_dirty_frees_percent 5
zfs_prefetch_disable 1
zfs_read_chunk_size 1048576
zfs_read_history 0
zfs_read_history_hits 0
zfs_reconstruct_indirect_combinations_max 4096
zfs_recover 0
zfs_recv_queue_length 16777216
zfs_removal_ignore_errors 0
zfs_removal_suspend_progress 0
zfs_remove_max_segment 16777216
zfs_resilver_disable_defer 0
zfs_resilver_min_time_ms 3000
zfs_scan_checkpoint_intval 7200
zfs_scan_fill_weight 3
zfs_scan_ignore_errors 0
zfs_scan_issue_strategy 0
zfs_scan_legacy 0
zfs_scan_max_ext_gap 2097152
zfs_scan_mem_lim_fact 20
zfs_scan_mem_lim_soft_fact 20
zfs_scan_strict_mem_lim 0
zfs_scan_suspend_progress 0
zfs_scan_vdev_limit 4194304
zfs_scrub_min_time_ms 1000
zfs_send_corrupt_data 0
zfs_send_queue_length 16777216
zfs_slow_io_events_per_second 20
zfs_spa_discard_memory_limit 16777216
zfs_sync_pass_deferred_free 2
zfs_sync_pass_dont_compress 5
zfs_sync_pass_rewrite 2
zfs_sync_taskq_batch_pct 75
zfs_txg_history 100
zfs_txg_timeout 5
zfs_unlink_suspend_progress 0
zfs_user_indirect_is_special 1
zfs_vdev_aggregation_limit 16777216
zfs_vdev_aggregation_limit_non_rotating 131072
zfs_vdev_async_read_max_active 3
zfs_vdev_async_read_min_active 1
zfs_vdev_async_write_active_max_dirty_percent 60
zfs_vdev_async_write_active_min_dirty_percent 30
zfs_vdev_async_write_max_active 10
zfs_vdev_async_write_min_active 2
zfs_vdev_cache_bshift 16
zfs_vdev_cache_max 16384
zfs_vdev_cache_size 0
zfs_vdev_default_ms_count 200
zfs_vdev_initializing_max_active 1
zfs_vdev_initializing_min_active 1
zfs_vdev_max_active 1000
zfs_vdev_min_ms_count 16
zfs_vdev_mirror_non_rotating_inc 0
zfs_vdev_mirror_non_rotating_seek_inc 1
zfs_vdev_mirror_rotating_inc 0
zfs_vdev_mirror_rotating_seek_inc 5
zfs_vdev_mirror_rotating_seek_offset 1048576
zfs_vdev_ms_count_limit 131072
zfs_vdev_queue_depth_pct 1000
zfs_vdev_raidz_impl[fastest] original scalar sse2 ssse3 avx2 avx512f avx512bw
zfs_vdev_read_gap_limit 32768
zfs_vdev_removal_max_active 2
zfs_vdev_removal_min_active 1
zfs_vdev_scheduler deadline
zfs_vdev_scrub_max_active 2
zfs_vdev_scrub_min_active 1
zfs_vdev_sync_read_max_active 10
zfs_vdev_sync_read_min_active 10
zfs_vdev_sync_write_max_active 10
zfs_vdev_sync_write_min_active 10
zfs_vdev_write_gap_limit 4096
zfs_zevent_cols 80
zfs_zevent_console 0
zfs_zevent_len_max 1152
zfs_zil_clean_taskq_maxalloc 1048576
zfs_zil_clean_taskq_minalloc 1024
zfs_zil_clean_taskq_nthr_pct 100
zil_nocacheflush 0
zil_replay_disable 0
zil_slog_bulk 786432
zio_deadman_log_all 1
zio_dva_throttle_enabled 1
zio_requeue_io_start_cut_in_line 1
zio_slow_io_ms 30000
zio_taskq_batch_pct 75
zvol_inhibit_dev 0
zvol_major 230
zvol_max_discard_blocks 16384
zvol_prefetch_bytes 131072
zvol_request_sync 0
zvol_threads 32
zvol_volmode 1
VDEV cache disabled, skipping section
ZIL committed transactions: 0
Commit requests: 0
Flushes to stable storage: 0
Transactions to SLOG storage pool: 0 Bytes 0
Transactions to non-SLOG storage pool: 0 Bytes 0
@gerardba Please post the contents of /proc/spl/kstat/zfs/dbgmsg, which will show a more detailed dump of the entire zio tree; the lines of interest begin with "slow zio[...". Also please check the output of "ps ax" to see whether any ZFS-related tasks are stuck in "R" state. Finally, post the contents of /proc/spl/taskq if it is not empty.
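(A sketch of that collection, assuming zio_deadman_log_all=1 so hung zios are logged to dbgmsg:)
# pull the zio tree dump out of the debug log
grep 'slow zio' /proc/spl/kstat/zfs/dbgmsg
# any tasks spinning in R state? (STAT is the third column of 'ps ax')
ps ax | awk '$3 ~ /^R/'
# taskq state, if non-empty
cat /proc/spl/taskq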
The only thing I can confirm at this time is that there were no ZFS-related tasks stuck in "R" state; the only process in "R" state on the system was [rcu_sched]. I had already rebooted when I saw the message; I've restarted the workload and will provide that data when it gets stuck again.
This issue happened again, and this time I could collect all the data, @dweeezil. I'm attaching a compressed text file with the output of the following commands:
Running (0) - cat /sys/module/zfs/version
Running (1) - dstat 1 2
Running (2) - free
Running (3) - uptime
Running (4) - uname -a
Running (5) - cat /etc/issue
Running (6) - zpool status
Running (7) - zfs list
Running (10) - cat /proc/spl/kstat/zfs/dbgmsg
Running (11) - cat /proc/spl/taskq
Running (12) - ps faux
Running (13) - iostat -mx 1 2
Hopefully this helps find the culprit. For now I've updated the system to 0.8.1, so if this happens again it will be on a newer version of ZFS.
I think we are hit by the same kind of problem. Brand new Dell R740xd server with an H740P PERC card in eHBA mode, 10 disks on ZFS. It hangs sometimes, with IO stalls appearing in virtual machines and a message about zvol getting stuck for more than 120s (kernel tainted).
We thought it was a zfs utils version problem, but we are on the latest version (0.8.1).
Updated our Proxmox kernel from 5.0.16 to 5.0.21, hoping it was a driver problem :(
We thought it was a zfs utils version problem, but we are on the latest version (0.8.1)
The latest version on Proxmox is 0.8.2, so I recommend updating.
This issue has not happened again. What I believe made the difference was upgrading the NVMe drives' firmware to the latest available; since then the system has been stable for over 10 months. The Intel devices I am using had a bug, fixed in a firmware upgrade, which made the drive 'eat' some commands, never returning a reply for them.
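(For anyone else chasing this, the running firmware revision is visible via nvme-cli; a sketch, with the device node a placeholder:)
# 'FW Rev' column shows the active firmware per controller
nvme list
# firmware slot log for a specific drive
nvme fw-log /dev/nvme0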
System information
Describe the problem you're observing
Kernel triggers hung task timeout. ZFS suddenly stalled: the zpool seems to be OK, but I can't get an 'ls' off it, and dmesg shows the kernel stuck in nfsd; attaching the kernel call trace.
Over an hour after failing the zpool over to another server (the single-port NVMe disk mapping changed), this server still thinks the zpool is healthy. 'zfs get all' shows the zpool information but stalls when showing the zfs dataset; attaching info below.
Any IO to the zfs dataset stalls.
This issue may be related to https://github.com/zfsonlinux/zfs/issues/7425
Describe how to reproduce the problem
This happened while exporting ZFS via NFS and compiling code on it.
A zpool scrub after failing over to another server shows no errors on the zpool.
I still have the 'stuck server' on, in case I can get any further data off it.
Include any warning/errors/backtraces from the system logs
Kernel call trace:
I tried getting a stack trace before failing over the zpool to another server:
Stuck 'zfs get all':
Stracing the 'zfs get all' shows it got stuck at:
Even after the disks are gone from the server, ZFS thinks the zpool is healthy:
And here is a full 'zfs get all':