Closed rnz closed 1 year ago
@behlendorf shouldn't this get the defect instead of performance label?
@GregorKopka indeed, this does look like a possible deadlock on a dbuf. It would be useful to know if this can be reproduced in master since some of the relevant locks have been split up.
I actually also get these hangs with 0.8.2.
Dec 23 06:31:20 spplusc-6 kernel: INFO: task nfsd:6469 blocked for more than 900 seconds.
Dec 23 06:31:20 spplusc-6 kernel: Tainted: P OE 4.19.65-1c.el7.x86_64 #1
Dec 23 06:31:20 spplusc-6 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 23 06:31:20 spplusc-6 kernel: nfsd D 0 6469 2 0x80000080
Dec 23 06:31:20 spplusc-6 kernel: Call Trace:
Dec 23 06:31:20 spplusc-6 kernel: ? __schedule+0x2ab/0x880
Dec 23 06:31:20 spplusc-6 kernel: schedule+0x32/0x80
Dec 23 06:31:20 spplusc-6 kernel: rwsem_down_read_failed+0x139/0x1c0
Dec 23 06:31:20 spplusc-6 kernel: call_rwsem_down_read_failed+0x14/0x30
Dec 23 06:31:20 spplusc-6 kernel: down_read+0x1c/0x30
Dec 23 06:31:20 spplusc-6 kernel: dmu_buf_lock_parent+0x5c/0xd0 [zfs]
Dec 23 06:31:20 spplusc-6 kernel: ? _cond_resched+0x15/0x30
Dec 23 06:31:20 spplusc-6 kernel: ? down_read+0xe/0x30
Dec 23 06:31:20 spplusc-6 kernel: dbuf_dirty+0x300/0x780 [zfs]
Dec 23 06:31:20 spplusc-6 kernel: dmu_write_uio_dnode+0x70/0x140 [zfs]
Dec 23 06:31:20 spplusc-6 kernel: dmu_write_uio_dbuf+0x4e/0x70 [zfs]
Dec 23 06:31:20 spplusc-6 kernel: zfs_write+0xb4e/0xcd0 [zfs]
Dec 23 06:31:20 spplusc-6 kernel: ? iput+0x6f/0x1d0
Dec 23 06:31:20 spplusc-6 kernel: ? __d_obtain_alias+0x32/0x80
Dec 23 06:31:20 spplusc-6 kernel: zpl_write_common_iovec+0xa9/0x120 [zfs]
Dec 23 06:31:20 spplusc-6 kernel: zpl_iter_write_common+0x98/0xc0 [zfs]
Dec 23 06:31:20 spplusc-6 kernel: zpl_iter_write+0x3f/0x70 [zfs]
Dec 23 06:31:20 spplusc-6 kernel: ? selinux_file_permission+0xe1/0x130
Dec 23 06:31:20 spplusc-6 kernel: do_iter_readv_writev+0x132/0x1b0
Dec 23 06:31:20 spplusc-6 kernel: do_iter_write+0x78/0x180
Dec 23 06:31:20 spplusc-6 kernel: nfsd_vfs_write+0xff/0x470 [nfsd]
Dec 23 06:31:20 spplusc-6 kernel: nfsd_write+0x94/0x180 [nfsd]
Dec 23 06:31:20 spplusc-6 kernel: nfsd3_proc_write+0x106/0x180 [nfsd]
Dec 23 06:31:20 spplusc-6 kernel: nfsd_dispatch+0xb7/0x250 [nfsd]
Dec 23 06:31:20 spplusc-6 kernel: svc_process_common+0x39e/0x800 [sunrpc]
Dec 23 06:31:20 spplusc-6 kernel: svc_process+0xeb/0x100 [sunrpc]
Dec 23 06:31:20 spplusc-6 kernel: nfsd+0xe3/0x150 [nfsd]
Dec 23 06:31:20 spplusc-6 kernel: kthread+0xf8/0x130
Dec 23 06:31:20 spplusc-6 kernel: ? nfsd_destroy+0x60/0x60 [nfsd]
Dec 23 06:31:20 spplusc-6 kernel: ? kthread_bind+0x10/0x10
Dec 23 06:31:20 spplusc-6 kernel: ret_from_fork+0x35/0x40
Mar 1 20:47:38 spvss-ptdca1 kernel: INFO: task nfsd:8218 blocked for more than 900 seconds.
Mar 1 20:47:38 spvss-ptdca1 kernel: Tainted: P OE 4.19.101-1c.el7.x86_64 #1
Mar 1 20:47:38 spvss-ptdca1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 1 20:47:38 spvss-ptdca1 kernel: nfsd D 0 8218 2 0x80000080
Mar 1 20:47:38 spvss-ptdca1 kernel: Call Trace:
Mar 1 20:47:38 spvss-ptdca1 kernel: ? __schedule+0x2ab/0x880
Mar 1 20:47:38 spvss-ptdca1 kernel: schedule+0x32/0x80
Mar 1 20:47:38 spvss-ptdca1 kernel: rwsem_down_read_failed+0x139/0x1c0
Mar 1 20:47:38 spvss-ptdca1 kernel: call_rwsem_down_read_failed+0x14/0x30
Mar 1 20:47:38 spvss-ptdca1 kernel: down_read+0x1c/0x30
Mar 1 20:47:38 spvss-ptdca1 kernel: dbuf_hold_impl+0x517/0x590 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: dbuf_hold_level+0x33/0x60 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: dmu_buf_hold_array_by_dnode+0xdc/0x4a0 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: dmu_write_uio_dnode+0x56/0x140 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: ? dmu_tx_try_assign+0x304/0x370 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: dmu_write_uio_dbuf+0x4e/0x70 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: zfs_write+0xb4e/0xcd0 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: zpl_write_common_iovec+0xa9/0x120 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: zpl_iter_write_common+0x98/0xc0 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: zpl_iter_write+0x3f/0x70 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: ? selinux_file_permission+0xe1/0x130
Mar 1 20:47:38 spvss-ptdca1 kernel: do_iter_readv_writev+0x132/0x1b0
Mar 1 20:47:38 spvss-ptdca1 kernel: do_iter_write+0x78/0x180
Mar 1 20:47:38 spvss-ptdca1 kernel: nfsd_vfs_write+0xff/0x470 [nfsd]
Mar 1 20:47:38 spvss-ptdca1 kernel: nfsd_write+0x94/0x180 [nfsd]
Mar 1 20:47:38 spvss-ptdca1 kernel: nfsd3_proc_write+0x106/0x180 [nfsd]
Mar 1 20:47:38 spvss-ptdca1 kernel: nfsd_dispatch+0xb7/0x250 [nfsd]
Mar 1 20:47:38 spvss-ptdca1 kernel: svc_process_common+0x39e/0x800 [sunrpc]
Mar 1 20:47:38 spvss-ptdca1 kernel: svc_process+0xeb/0x100 [sunrpc]
Mar 1 20:47:38 spvss-ptdca1 kernel: nfsd+0xe3/0x150 [nfsd]
Mar 1 20:47:38 spvss-ptdca1 kernel: kthread+0xf8/0x130
Mar 1 20:47:38 spvss-ptdca1 kernel: ? nfsd_destroy+0x60/0x60 [nfsd]
Mar 1 20:47:38 spvss-ptdca1 kernel: ? kthread_bind+0x10/0x10
Mar 1 20:47:38 spvss-ptdca1 kernel: ret_from_fork+0x35/0x40
Mar 1 20:47:38 spvss-ptdca1 kernel: INFO: task nfsd:8219 blocked for more than 900 seconds.
Mar 1 20:47:38 spvss-ptdca1 kernel: Tainted: P OE 4.19.101-1c.el7.x86_64 #1
Mar 1 20:47:38 spvss-ptdca1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 1 20:47:38 spvss-ptdca1 kernel: nfsd D 0 8219 2 0x80000080
Mar 1 20:47:38 spvss-ptdca1 kernel: Call Trace:
Mar 1 20:47:38 spvss-ptdca1 kernel: ? schedule+0x2ab/0x880
Mar 1 20:47:38 spvss-ptdca1 kernel: schedule+0x32/0x80
Mar 1 20:47:38 spvss-ptdca1 kernel: schedule_preempt_disabled+0xa/0x10
Mar 1 20:47:38 spvss-ptdca1 kernel: mutex_lock.isra.11+0x21b/0x4e0
Mar 1 20:47:38 spvss-ptdca1 kernel: ? cityhash4+0x78/0xa0 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: dbuf_find+0xb8/0x190 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: dbuf_hold_impl+0x62/0x590 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: dbuf_hold_level+0x33/0x60 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: dmu_tx_check_ioerr+0x32/0xc0 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: dmu_tx_count_write+0xdd/0x190 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: dmu_tx_hold_write_by_dnode+0x35/0x50 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: zfs_write+0x516/0xcd0 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: zpl_write_common_iovec+0xa9/0x120 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: zpl_iter_write_common+0x98/0xc0 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: zpl_iter_write+0x3f/0x70 [zfs]
Mar 1 20:47:38 spvss-ptdca1 kernel: ? selinux_file_permission+0xe1/0x130
Mar 1 20:47:38 spvss-ptdca1 kernel: do_iter_readv_writev+0x132/0x1b0
Mar 1 20:47:38 spvss-ptdca1 kernel: do_iter_write+0x78/0x180
Mar 1 20:47:38 spvss-ptdca1 kernel: nfsd_vfs_write+0xff/0x470 [nfsd]
Mar 1 20:47:38 spvss-ptdca1 kernel: nfsd_write+0x94/0x180 [nfsd]
Mar 1 20:47:38 spvss-ptdca1 kernel: nfsd3_proc_write+0x106/0x180 [nfsd]
Mar 1 20:47:38 spvss-ptdca1 kernel: nfsd_dispatch+0xb7/0x250 [nfsd]
Mar 1 20:47:38 spvss-ptdca1 kernel: svc_process_common+0x39e/0x800 [sunrpc]
Mar 1 20:47:38 spvss-ptdca1 kernel: svc_process+0xeb/0x100 [sunrpc]
Mar 1 20:47:38 spvss-ptdca1 kernel: nfsd+0xe3/0x150 [nfsd]
Mar 1 20:47:38 spvss-ptdca1 kernel: kthread+0xf8/0x130
Mar 1 20:47:38 spvss-ptdca1 kernel: ? nfsd_destroy+0x60/0x60 [nfsd]
Mar 1 20:47:38 spvss-ptdca1 kernel: ? kthread_bind+0x10/0x10
Mar 1 20:47:38 spvss-ptdca1 kernel: ret_from_fork+0x35/0x40
Hit this again today, and all I/O was hung afterwards. Also of note: this is 0.8.3 code running on a kernel that was released a few weeks ago.
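When the hang recurs, it helps to capture blocked-task stacks and ZFS debug state before rebooting. A minimal sketch of what could be collected (the `DRY_RUN` wrapper and file paths under `/tmp` are my additions; by default it only prints the commands, since they need root and a live ZFS system):

```shell
#!/bin/sh
# Diagnostics sketch for the next occurrence of the hang.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to run them as root.
: "${DRY_RUN:=1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Dump stacks of all uninterruptible (D-state) tasks into the kernel log.
run sh -c 'echo w > /proc/sysrq-trigger'
# Save the resulting traces plus any earlier hung-task reports.
run sh -c 'dmesg > /tmp/hang-dmesg.txt'
# ZFS internal debug log, if the module exposes it.
run cat /proc/spl/kstat/zfs/dbgmsg
# Per-pool TXG history shows whether transaction groups are still advancing.
run cat /proc/spl/kstat/zfs/*/txgs
```

With the hung-task timeout set to 900 seconds as in the logs above, `echo w` produces the same style of trace on demand instead of waiting 15 minutes for the watchdog.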
@bgly, is this issue still current for you? Did you find a way around these hangs?
@glztmf I still experience these, sometimes more prevalent than others. Do you have the same issue?
@bgly, I had the same issue with zfs 0.8.1 (PVE 6) and stayed on zfs 0.7.x.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
@behlendorf shouldn't the stale bot ignore issues marked as 'defect'?
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
System information
Hardware
Describe the problem you're observing
I/O hangs randomly. The ZFS volume (zvol) becomes inaccessible, while ZFS filesystems remain accessible.
Describe how to reproduce the problem
1) Create two ZFS volumes (zvols)
2) Set compression=on (lz4) on one of them
3) Generate heavy load on both zvols
4) Wait until a zvol hangs
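The steps above can be sketched as a script. The pool name, volume sizes, and fio parameters are assumptions (the report does not give them), and fio is just one way to generate the load; by default the script only prints the commands, since they need root and a real pool:

```shell
#!/bin/sh
# Hypothetical reproduction sketch for the steps above.
# POOL, sizes, and fio parameters are assumptions, not taken from the report.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to run as root.
: "${DRY_RUN:=1}"
POOL="${POOL:-tank}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# 1) Create two zvols.
run zfs create -V 20G "$POOL/zvol-plain"
run zfs create -V 20G "$POOL/zvol-lz4"
# 2) Enable lz4 compression on one of them.
run zfs set compression=lz4 "$POOL/zvol-lz4"
# 3) Generate heavy parallel write load on both zvols.
for dev in zvol-plain zvol-lz4; do
    run fio --name="stress-$dev" --filename="/dev/zvol/$POOL/$dev" \
        --rw=randwrite --bs=128k --iodepth=32 --ioengine=libaio \
        --direct=1 --time_based --runtime=3600 &
done
wait
# 4) Watch dmesg for D-state tasks / hung-task reports while the load runs.
```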
Include any warning/errors/backtraces from the system logs
Temporary workarounds
Reboot (via hard reset or sysrq-trigger)
Move one ZFS volume to another host
Disable compression on the affected volume
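The workarounds above, sketched as commands. The dataset name `tank/affected-zvol` and the host `otherhost` are placeholders, and note that `compression=off` only affects newly written blocks; by default the script only prints the commands:

```shell
#!/bin/sh
# Workaround sketch; dataset and host names are placeholders.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to run as root.
: "${DRY_RUN:=1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Disable compression on the affected zvol (applies to new writes only).
run zfs set compression=off tank/affected-zvol
# Move the zvol to another host with snapshot + send/receive.
run sh -c 'zfs snapshot tank/affected-zvol@move &&
           zfs send tank/affected-zvol@move | ssh otherhost zfs receive tank/affected-zvol'
# Last resort when the system is otherwise unresponsive: force an immediate reboot.
run sh -c 'echo b > /proc/sysrq-trigger'
```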