openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other
10.42k stars 1.73k forks source link

Sometimes writes hang for long periods without IO or CPU on i386 #12035

Open rincebrain opened 3 years ago

rincebrain commented 3 years ago

System information

Type Version/Name
Distribution Name Debian
Distribution Version 10 (buster)
Linux Kernel 4.19.0-16-686-pae
Architecture x86
ZFS Version 2.1.99-160_g38c6d6ced

Describe the problem you're observing

Somewhat randomly, during ZTS runs, all ZFS IO will grind to a halt - iostat reports nothing going on, CPU is over 99% idle, but some ZFS or ZFS-accessing task is in iowait. Usually they eventually continue, but sometimes I've let them wait 30-60 minutes without any progress.

mkfile seems especially good at triggering this.

Describe how to reproduce the problem

Run the "sanity" runfile for ZTS on i386 (probably with the test cases mentioned in #12029 removed, because nobody likes a BUG_ON) - I've yet to get one to run to completion without this happening.

Include any warning/errors/backtraces from the system logs

Sometimes there will be one or more log messages like the following:

[  605.404255] INFO: task txg_quiesce:5579 blocked for more than 120 seconds.
[  605.405589]       Tainted: P           OE     4.19.0-16-686-pae #1 Debian 4.19.181-1
[  605.406921] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  605.408356] txg_quiesce     D    0  5579      2 0x80000000
[  605.408359] Call Trace:
[  605.408367]  __schedule+0x298/0x900
[  605.408370]  schedule+0x27/0x70
[  605.408375]  cv_wait_common+0xd5/0x110 [spl]
[  605.408378]  ? finish_wait+0x70/0x70
[  605.408381]  __cv_wait+0x14/0x20 [spl]
[  605.408414]  txg_quiesce_thread+0x29a/0x3c0 [zfs]
[  605.408444]  ? txg_sync_thread+0x460/0x460 [zfs]
[  605.408449]  thread_generic_wrapper+0x5c/0x70 [spl]
[  605.408455]  kthread+0xf0/0x110
[  605.408460]  ? __thread_exit+0x20/0x20 [spl]
[  605.408466]  ? kthread_bind+0x30/0x30
[  605.408469]  ret_from_fork+0x2e/0x38

echo 'w' | sudo tee /proc/sysrq-trigger while this was happening produced:

[ 1868.367515] sysrq: Show Blocked State
[ 1868.368032]   task                PC stack   pid father
[ 1868.368071] z_wr_iss        D    0 28181      2 0x80000000
[ 1868.368073] Call Trace:
[ 1868.368079]  __schedule+0x298/0x900
[ 1868.368081]  ? __wake_up_common_lock+0x82/0xb0
[ 1868.368083]  schedule+0x27/0x70
[ 1868.368084]  schedule_timeout+0x139/0x380
[ 1868.368087]  ? __next_timer_interrupt+0xc0/0xc0
[ 1868.368091]  spl_kmem_cache_alloc+0x4fb/0x710 [spl]
[ 1868.368106]  ? dbuf_cons+0xb0/0xb0 [zfs]
[ 1868.368107]  ? finish_wait+0x70/0x70
[ 1868.368123]  zio_buf_alloc+0x28/0x60 [zfs]
[ 1868.368137]  zio_write_compress+0x4eb/0x7c0 [zfs]
[ 1868.368151]  zio_execute+0x9b/0x100 [zfs]
[ 1868.368153]  taskq_thread+0x295/0x4f0 [spl]
[ 1868.368156]  ? wake_up_q+0x70/0x70
[ 1868.368170]  ? zio_execute_stack_check+0x80/0x80 [zfs]
[ 1868.368172]  kthread+0xf0/0x110
[ 1868.368173]  ? taskq_thread_spawn+0x50/0x50 [spl]
[ 1868.368175]  ? kthread_bind+0x30/0x30
[ 1868.368176]  ret_from_fork+0x2e/0x38
[ 1868.368190] txg_sync        D    0 28231      2 0x80000000
[ 1868.368192] Call Trace:
[ 1868.368194]  __schedule+0x298/0x900
[ 1868.368195]  schedule+0x27/0x70
[ 1868.368197]  schedule_timeout+0x139/0x380
[ 1868.368199]  ? __next_timer_interrupt+0xc0/0xc0
[ 1868.368200]  io_schedule_timeout+0x1a/0x40
[ 1868.368202]  __cv_timedwait_common+0xf5/0x130 [spl]
[ 1868.368203]  ? finish_wait+0x70/0x70
[ 1868.368205]  __cv_timedwait_io+0x11/0x20 [spl]
[ 1868.368219]  zio_wait+0x140/0x2a0 [zfs]
[ 1868.368220]  ? _cond_resched+0x17/0x30
[ 1868.368237]  dsl_pool_sync+0xe4/0x5b0 [zfs]
[ 1868.368239]  ? __raw_callee_save___pv_queued_spin_unlock+0x9/0x10
[ 1868.368258]  spa_sync+0x51d/0x1090 [zfs]
[ 1868.368259]  ? _cond_resched+0x17/0x30
[ 1868.368261]  ? __raw_callee_save___pv_queued_spin_unlock+0x9/0x10
[ 1868.368280]  txg_sync_thread+0x2a7/0x460 [zfs]
[ 1868.368292]  ? kfree+0x135/0x160
[ 1868.368310]  ? txg_thread_exit.isra.9+0x50/0x50 [zfs]
[ 1868.368312]  thread_generic_wrapper+0x5c/0x70 [spl]
[ 1868.368314]  kthread+0xf0/0x110
[ 1868.368315]  ? __thread_exit+0x20/0x20 [spl]
[ 1868.368316]  ? kthread_bind+0x30/0x30
[ 1868.368318]  ret_from_fork+0x2e/0x38
[ 1868.368332] zfs             D    0 29662  29098 0x00000000
[ 1868.368333] Call Trace:
[ 1868.368335]  __schedule+0x298/0x900
[ 1868.368336]  ? __wake_up_common+0x73/0x170
[ 1868.368338]  ? __raw_callee_save___pv_queued_spin_unlock+0x9/0x10
[ 1868.368339]  schedule+0x27/0x70
[ 1868.368341]  io_schedule+0x15/0x40
[ 1868.368342]  cv_wait_common+0x90/0x110 [spl]
[ 1868.368343]  ? finish_wait+0x70/0x70
[ 1868.368345]  __cv_wait_io+0x14/0x20 [spl]
[ 1868.368361]  txg_wait_synced_impl+0xe5/0x130 [zfs]
[ 1868.368376]  txg_wait_synced+0x10/0x50 [zfs]
[ 1868.368393]  dsl_sync_task_common+0x16a/0x240 [zfs]
[ 1868.368408]  ? dsl_dataset_hold+0x20/0x20 [zfs]
[ 1868.368422]  ? dsl_dataset_snapshot_sync_impl+0x740/0x740 [zfs]
[ 1868.368436]  ? dsl_dataset_hold+0x20/0x20 [zfs]
[ 1868.368449]  ? dsl_dataset_snapshot_sync_impl+0x740/0x740 [zfs]
[ 1868.368465]  dsl_sync_task+0x1a/0x20 [zfs]
[ 1868.368479]  dsl_dataset_snapshot+0x113/0x370 [zfs]
[ 1868.368483]  ? kvmalloc_node+0x69/0x80
[ 1868.368485]  ? spl_kvmalloc+0xac/0xc0 [spl]
[ 1868.368486]  ? spl_kmem_alloc_impl+0x105/0x120 [spl]
[ 1868.368489]  ? nvt_remove_nvpair+0xd0/0x130 [znvpair]
[ 1868.368491]  ? _cond_resched+0x17/0x30
[ 1868.368492]  ? __kmalloc+0x16f/0x230
[ 1868.368494]  ? kvmalloc_node+0x69/0x80
[ 1868.368495]  ? kvmalloc_node+0x69/0x80
[ 1868.368497]  ? kvmalloc_node+0x69/0x80
[ 1868.368498]  ? nvt_lookup_name_type+0x7a/0xb0 [znvpair]
[ 1868.368500]  ? nvlist_lookup_common+0x46/0x80 [znvpair]
[ 1868.368502]  ? nvlist_lookup_nvlist+0x15/0x20 [znvpair]
[ 1868.368518]  zfs_ioc_snapshot+0x224/0x2f0 [zfs]
[ 1868.368533]  ? zfs_secpolicy_hold+0xb0/0xb0 [zfs]
[ 1868.368547]  zfsdev_ioctl_common+0x3fe/0x7b0 [zfs]
[ 1868.368562]  zfsdev_ioctl+0x4c/0xb0 [zfs]
[ 1868.368575]  ? zfsdev_release+0x20/0x20 [zfs]
[ 1868.368578]  do_vfs_ioctl+0x9a/0x6c0
[ 1868.368580]  ksys_ioctl+0x56/0x80
[ 1868.368582]  sys_ioctl+0x16/0x20
[ 1868.368584]  do_fast_syscall_32+0x81/0x1a0
[ 1868.368586]  entry_SYSENTER_32+0x68/0xbb
[ 1868.368587] EIP: 0xb7f34d61
[ 1868.368589] Code: Bad RIP value.
[ 1868.368590] EAX: ffffffda EBX: 00000004 ECX: 00005a23 EDX: bfc1dc10
[ 1868.368591] ESI: 00005a23 EDI: bfc1dc10 EBP: bfc1dbd8 ESP: bfc1dba8
[ 1868.368591] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00200292
jxdking commented 3 years ago

I recent read through the source code. I found tx->tx_open_txg is often accessed without sufficient locks, see txg_wait_synced_impl(), which is fine in 64bit system. In 32bit system, you may get partially updated value. It may cause deadlock you experienced.

stale[bot] commented 2 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

TimKritikos commented 3 months ago

I'm having the exact same issue especially when trying to compile Linux to bootstrap a Gentoo system. Unless I do some seemingly unrelated activity like some ls or find, kernel compilation will halt seemingly indefinitely ( at least 5-6h ) but as soon as I mess around on the filesystem on a different tty it continues as normal. Tested against 2.2.3 and 2.2.4 with the same results I use a tiny intel atom Sony Vaio VGN-P25G