openzfsonwindows / openzfs


zfs_deadman_failmode = continue can panic in zfs_refcount_destroy_many #378

Open lundman opened 1 month ago

lundman commented 1 month ago

To trigger the bug referenced in https://github.com/openzfsonwindows/openzfs/issues/364#issuecomment-2101862948 and probably also https://github.com/openzfs/zfs/issues/15802

* set zfs_deadman_failmode = continue
* set zfs_deadman_synctime_ms to something low (60000 -> 1000; 60s down to 1s).
* create a zvol with compression=zstd-19
* copy a large (1 GB+) text file to the zvol (format, mount, whatever)

Will see messages like:

zio_deadman_impl(): slow zio[12]
zio_deadman(): zio_wait restarting hung I/O for pool 'tank'

I can confirm that https://github.com/openzfs/zfs/blob/master/module/zfs/zio.c#L2205-L2208

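        /*
         * (annotation) failmode=continue: if the hung zio is not already
         * queued on a taskq, re-dispatch it via zio_interrupt().
         */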
        if (failmode == ZIO_FAILURE_MODE_CONTINUE &&
            taskq_empty_ent(&pio->io_tqent)) {
            zio_interrupt(pio);
        }

is hit a few times, until eventually

https://github.com/openzfsonwindows/openzfs/blob/development/module/zfs/refcount.c#L98

VERIFY3( rc->rc_count == number ) failed (5120 == 0)
Break instruction exception - code 80000003 (first chance)
OpenZFS!zfs_refcount_destroy_many+0x90:

With the stack:

>   OpenZFS!zfs_refcount_destroy_many+0x90 [C:\src\openzfs\module\zfs\refcount.c @ 98]  C/C++/ASM
    OpenZFS!zfs_refcount_destroy+0x17 [C:\src\openzfs\module\zfs\refcount.c @ 113]  C/C++/ASM
    OpenZFS!abd_fini_struct+0x68 [C:\src\openzfs\module\zfs\abd.c @ 160]    C/C++/ASM
    OpenZFS!abd_free+0xdd [C:\src\openzfs\module\zfs\abd.c @ 320]   C/C++/ASM
    OpenZFS!zio_pop_transforms+0x76 [C:\src\openzfs\module\zfs\zio.c @ 452]     C/C++/ASM
    OpenZFS!zio_done+0x1392 [C:\src\openzfs\module\zfs\zio.c @ 4795]    C/C++/ASM
    OpenZFS!__zio_execute+0x2f4 [C:\src\openzfs\module\zfs\zio.c @ 2363]    C/C++/ASM
    OpenZFS!zio_execute+0x30f [C:\src\openzfs\module\zfs\zio.c @ 2274]  C/C++/ASM
    OpenZFS!vdev_mirror_io_start+0x4d3 [C:\src\openzfs\module\zfs\vdev_mirror.c @ 694]  C/C++/ASM
    OpenZFS!zio_vdev_io_start+0x12b [C:\src\openzfs\module\zfs\zio.c @ 3946]    C/C++/ASM
    OpenZFS!__zio_execute+0x2f4 [C:\src\openzfs\module\zfs\zio.c @ 2363]    C/C++/ASM
    OpenZFS!zio_execute+0x30f [C:\src\openzfs\module\zfs\zio.c @ 2274]  C/C++/ASM
    OpenZFS!taskq_thread+0x51a [C:\src\openzfs\module\os\windows\spl\spl-taskq.c @ 2083]    C/C++/ASM

However, note that

1: kd> dt rc
Local var @ 0xffff978c7383e548 Type refcount*
0xffff878f`487195a0 
   +0x000 rc_count         : 0

So we die in the VERIFY because rc_count is 5120, but when I inspect rc_count it is now 0. Certainly feels like a race. I additionally had a breakpoint at: https://github.com/openzfsonwindows/openzfs/blob/development/module/zfs/zio.c#L2420 but it did not get triggered.
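
To make the suspected interleaving concrete, here is a minimal user-space sketch (not ZFS code; every name in it is invented) of how the failed check can report 5120 while a later debugger inspection of the same counter reads 0: the teardown runs while another path still holds a byte-counted reference, and that path releases it moments later.

```c
/*
 * Toy user-space model (NOT ZFS code; all names invented) of the suspected
 * ordering: a "restarted" completion path tears down a byte-counted refcount
 * while the original, still-running path holds an outstanding 5120-byte
 * reference.  The failed check snapshots 5120; the original path releases
 * shortly afterwards, so a later inspection of the structure reads 0.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

typedef struct {
	pthread_mutex_t	rc_mtx;
	long		rc_count;	/* byte-counted, like abd_children */
} toy_refcount_t;

static toy_refcount_t parent_children = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* Original completion path: still holds a 5120-byte child reference. */
static void *
original_path(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&parent_children.rc_mtx);
	parent_children.rc_count += 5120;		/* child's size */
	pthread_mutex_unlock(&parent_children.rc_mtx);

	sleep(2);					/* I/O still in flight */

	pthread_mutex_lock(&parent_children.rc_mtx);
	parent_children.rc_count -= 5120;		/* child finally released */
	pthread_mutex_unlock(&parent_children.rc_mtx);
	return (NULL);
}

/* "Restarted" path: checks/destroys the refcount too early. */
static void *
restarted_path(void *arg)
{
	(void) arg;
	sleep(1);					/* deadman re-dispatch */
	pthread_mutex_lock(&parent_children.rc_mtx);
	long seen = parent_children.rc_count;
	pthread_mutex_unlock(&parent_children.rc_mtx);
	if (seen != 0)
		printf("VERIFY3(rc_count == 0) failed (%ld == 0)\n", seen);
	/* A real VERIFY stops the machine here; the other thread keeps going. */
	return (NULL);
}

int
main(void)
{
	pthread_t a, b;
	pthread_create(&a, NULL, original_path, NULL);
	pthread_create(&b, NULL, restarted_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	/* The "debugger" view, after the fact: back to 0. */
	printf("later inspection: rc_count = %ld\n", parent_children.rc_count);
	return (0);
}
```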

lundman commented 1 month ago
Adjacent (non-boring) stacks:

```
4.003894 ffff878f3b5a8040 0000000 Blocked
    nt!KiSwapContext+0x76 nt!KiSwapThread+0xab5 nt!KiCommitThreadWait+0x137 nt!KeWaitForSingleObject+0x256 OpenZFS!spl_mutex_enter+0x11f OpenZFS!spl_cv_wait+0xf8 OpenZFS!txg_wait_synced_impl+0x290 OpenZFS!txg_wait_synced+0x29 OpenZFS!dmu_tx_wait+0x2ac OpenZFS!dmu_tx_assign+0x17e OpenZFS!zvol_os_write_zv+0x3c9 OpenZFS!wzvol_WkRtn+0x54e OpenZFS!wzvol_GeneralWkRtn+0x7b nt!IopProcessWorkItem+0x13c nt!ExpWorkerThread+0x155 nt!PspSystemThreadStartup+0x57 nt!KiStartSystemThread+0x34

4.0029fc ffff878f36f75040 0000938 Blocked
    nt!KiSwapContext+0x76 nt!KiSwapThread+0xab5 nt!KiCommitThreadWait+0x137 nt!KeRemovePriQueue+0x1ff nt!ExpWorkerThread+0xd3 nt!PspSystemThreadStartup+0x57 nt!KiStartSystemThread+0x34

4.003490 ffff878f39963040 0000000 Blocked
    nt!KiSwapContext+0x76 nt!KiSwapThread+0xab5 nt!KiCommitThreadWait+0x137 nt!KeWaitForSingleObject+0x256 OpenZFS!spl_mutex_enter+0x11f OpenZFS!spl_cv_wait+0xf8 OpenZFS!txg_wait_synced_impl+0x290 OpenZFS!txg_wait_synced+0x29 OpenZFS!dmu_tx_wait+0x2ac OpenZFS!dmu_tx_assign+0x17e OpenZFS!zvol_os_write_zv+0x3c9 OpenZFS!wzvol_WkRtn+0x54e OpenZFS!wzvol_GeneralWkRtn+0x7b nt!IopProcessWorkItem+0x13c

4.003260 ffff878f3b02c040 0000003 READY
    nt!KiSwapContext+0x76 nt!KeYieldExecution+0xf11 nt!KeDelayExecutionThread+0x147 OpenZFS!kpreempt+0x2f OpenZFS!kmem_magazine_destroy+0x1ce OpenZFS!kmem_depot_ws_reap+0x1ca OpenZFS!kmem_cache_reap+0x122 OpenZFS!taskq_thread+0x51a

4.000cf4 ffff878f3f6d6080 0000000 RUNNING
    OpenZFS!zfs_refcount_destroy_many+0x90 OpenZFS!zfs_refcount_destroy+0x17 OpenZFS!abd_fini_struct+0x68 OpenZFS!abd_free+0xdd OpenZFS!zio_pop_transforms+0x76 OpenZFS!zio_done+0x1392 OpenZFS!zio_execute+0x30f OpenZFS!zio_execute+0x30f OpenZFS!vdev_mirror_io_start+0x4d3 OpenZFS!zio_vdev_io_start+0x12b OpenZFS!zio_execute+0x30f OpenZFS!zio_execute+0x30f OpenZFS!taskq_thread+0x51a

4.002f34 ffff878f43491080 0000000 READY
    nt!KiSwapContext+0x76 nt!KiDispatchInterrupt+0x67e nt!KiDpcInterrupt+0x350 nt!KeReleaseSpinLock+0x2d OpenZFS!spl_mutex_exit+0xfd OpenZFS!__zfs_dbgmsg+0xba OpenZFS!__dprintf+0x29e OpenZFS!zio_deadman+0x13c OpenZFS!zio_wait+0x69e OpenZFS!dsl_pool_sync+0x2f9 OpenZFS!spa_sync_iterate_to_convergence+0x109 OpenZFS!spa_sync+0x66d OpenZFS!txg_sync_thread+0x45b nt!PspSystemThreadStartup+0x57

4.000d3c ffff878f3a48d080 0000000 Blocked
    nt!KiSwapContext+0x76 nt!KiSwapThread+0xab5 nt!KiCommitThreadWait+0x137 nt!KeWaitForSingleObject+0x256 OpenZFS!spl_mutex_enter+0x11f OpenZFS!spl_cv_wait+0xf8 OpenZFS!txg_wait_synced_impl+0x290 OpenZFS!txg_wait_synced+0x29 OpenZFS!dmu_tx_wait+0x2ac OpenZFS!dmu_tx_assign+0x17e OpenZFS!zvol_os_write_zv+0x3c9 OpenZFS!wzvol_WkRtn+0x54e OpenZFS!wzvol_GeneralWkRtn+0x7b nt!IopProcessWorkItem+0x13c

4.002908 ffff878f3e6f62c0 0000000 READY
    nt!KiSwapContext+0x76 nt!KiSwapThread+0xab5 nt!KiCommitThreadWait+0x137 nt!KeWaitForSingleObject+0x256 OpenZFS!spl_mutex_enter+0x11f OpenZFS!spl_cv_wait+0xf8 OpenZFS!txg_wait_synced_impl+0x290 OpenZFS!txg_wait_synced+0x29 OpenZFS!dmu_tx_wait+0x2ac OpenZFS!dmu_tx_assign+0x17e OpenZFS!zvol_os_write_zv+0x3c9 OpenZFS!wzvol_WkRtn+0x54e OpenZFS!wzvol_GeneralWkRtn+0x7b nt!IopProcessWorkItem+0x13c

4.00325c ffff878f3a126040 0000000 STANDBY
    nt!KiSwapContext+0x76 nt!KiSwapThread+0xab5 nt!KiCommitThreadWait+0x137 nt!KeWaitForSingleObject+0x256 OpenZFS!spl_mutex_enter+0x11f OpenZFS!spl_cv_wait+0xf8 OpenZFS!txg_wait_synced_impl+0x290 OpenZFS!txg_wait_synced+0x29 OpenZFS!dmu_tx_wait+0x2ac OpenZFS!dmu_tx_assign+0x17e OpenZFS!zvol_os_write_zv+0x3c9 OpenZFS!wzvol_WkRtn+0x54e OpenZFS!wzvol_GeneralWkRtn+0x7b nt!IopProcessWorkItem+0x13c nt!ExpWorkerThread+0x155 nt!PspSystemThreadStartup+0x57 nt!KiStartSystemThread+0x34

4.002c90 ffff878f38f06040 0000000 Blocked
    nt!KiSwapContext+0x76 nt!KiSwapThread+0xab5 nt!KiCommitThreadWait+0x137 nt!KeWaitForSingleObject+0x256 OpenZFS!spl_mutex_enter+0x11f OpenZFS!spl_cv_wait+0xf8 OpenZFS!txg_wait_synced_impl+0x290 OpenZFS!txg_wait_synced+0x29 OpenZFS!dmu_tx_wait+0x2ac OpenZFS!dmu_tx_assign+0x17e OpenZFS!zvol_os_write_zv+0x3c9 OpenZFS!wzvol_WkRtn+0x54e OpenZFS!wzvol_GeneralWkRtn+0x7b nt!IopProcessWorkItem+0x13c

4.001bd4 ffff878f3b0c6580 0000000 Blocked
    nt!KiSwapContext+0x76 nt!KiSwapThread+0xab5 nt!KiCommitThreadWait+0x137 nt!KeWaitForSingleObject+0x256 OpenZFS!spl_mutex_enter+0x11f OpenZFS!spl_cv_wait+0xf8 OpenZFS!txg_wait_synced_impl+0x290 OpenZFS!txg_wait_synced+0x29 OpenZFS!dmu_tx_wait+0x2ac OpenZFS!dmu_tx_assign+0x17e OpenZFS!zvol_os_write_zv+0x3c9 OpenZFS!wzvol_WkRtn+0x54e OpenZFS!wzvol_GeneralWkRtn+0x7b nt!IopProcessWorkItem+0x13c nt!ExpWorkerThread+0x155 nt!PspSystemThreadStartup+0x57 nt!KiStartSystemThread+0x34

4.0013ec ffff878f3b0b45c0 0000000 Blocked
    nt!KiSwapContext+0x76 nt!KiSwapThread+0xab5 nt!KiCommitThreadWait+0x137 nt!KeWaitForSingleObject+0x256 OpenZFS!spl_mutex_enter+0x11f OpenZFS!spl_cv_wait+0xf8 OpenZFS!txg_wait_synced_impl+0x290 OpenZFS!txg_wait_synced+0x29 OpenZFS!dmu_tx_wait+0x2ac OpenZFS!dmu_tx_assign+0x17e OpenZFS!zvol_os_write_zv+0x3c9 OpenZFS!wzvol_WkRtn+0x54e OpenZFS!wzvol_GeneralWkRtn+0x7b nt!IopProcessWorkItem+0x13c

4.001704 ffff878f42a13080 0000000 Blocked
    nt!KiSwapContext+0x76 nt!KiSwapThread+0xab5 nt!KiCommitThreadWait+0x137 nt!KeWaitForSingleObject+0x256 OpenZFS!spl_mutex_enter+0x11f OpenZFS!spl_cv_wait+0xf8 OpenZFS!txg_wait_synced_impl+0x290 OpenZFS!txg_wait_synced+0x29 OpenZFS!dmu_tx_wait+0x2ac OpenZFS!dmu_tx_assign+0x17e OpenZFS!zvol_os_write_zv+0x3c9 OpenZFS!wzvol_WkRtn+0x54e OpenZFS!wzvol_GeneralWkRtn+0x7b nt!IopProcessWorkItem+0x13c nt!ExpWorkerThread+0x155 nt!PspSystemThreadStartup+0x57 nt!KiStartSystemThread+0x34

4.0036e8 ffff878f3fbd6080 0000000 RUNNING
    nt!write_char+0x46 nt!write_string+0x48 nt!output_l+0x5cc nt!snprintf+0x69 OpenZFS!__dprintf+0x1ee OpenZFS!txg_wait_synced_impl+0x200 OpenZFS!txg_wait_synced+0x29 OpenZFS!dmu_tx_wait+0x2ac OpenZFS!dmu_tx_assign+0x17e OpenZFS!zvol_os_write_zv+0x3c9 OpenZFS!wzvol_WkRtn+0x54e OpenZFS!wzvol_GeneralWkRtn+0x7b nt!IopProcessWorkItem+0x13c
```

lundman commented 1 month ago

Applying https://github.com/openzfs/zfs/pull/16216 and running through the test above, the ASSERT has not been triggered. The breakpoint at "zio_wait restarting hung I/O for pool 'tank'" did kick in, so I would have expected the panic to follow. So the change does appear to help with this problem, even if it isn't obvious to me why that is.

lundman commented 1 month ago

Woo commenting never fails!

VERIFY3( rc->rc_count == number ) failed (5120 == 0)
Break instruction exception - code 80000003 (first chance)
OpenZFS!zfs_refcount_destroy_many+0x90:

So we are in fact tracking two problems in https://github.com/openzfs/zfs/issues/15802

1: kd> dt rc
Local var @ 0xffff978c74d95748 Type refcount*
0xffff878f`85da9ba0 
   +0x000 rc_count         : 0
   +0x008 rc_mtx           : kmutex
   +0x038 rc_tree          : avl_tree
   +0x058 rc_removed       : list
   +0x078 rc_removed_count : 0
   +0x07c rc_tracked       : 0 ( B_FALSE )

Same thing: rc_count is actually 0 by the time I inspect it, even though the ASSERT print says otherwise.

Debug buffer:

FFFF878F3A7C3080: zio.c:2236:zio_deadman(): zio_wait restarting hung I/O for pool 'BOOM'
FFFF878F3E55C080: dprintf: dnode.c:2470:dnode_diduse_space(): ds=BOOM/vol obj=1 dn=FFFF878F49F435E0 dnp=FFFF878F49F24200 used=586098688 delta=2048
FFFF878F3E55C080: SPL: vmem_freelist_insert_sort_by_time: at marker (bucket_4096)(steps: 0) p->vs_start, end == 0, 4096
FFFF878F3E55C080: SPL: vmem_freelist_insert_sort_by_time: at marker (bucket_heap)(steps: 0) p->vs_start, end == 0, 4096
FFFF878F3E55C080: SPL: vmem_freelist_insert_sort_by_time: at marker (bucket_4096)(steps: 0) p->vs_start, end == 0, 4096
FFFF878F3E55C080: SPL: vmem_freelist_insert_sort_by_time: at marker (bucket_heap)(steps: 0) p->vs_start, end == 0, 4096
FFFF878F3BDFB080: dprintf: dsl_dataset.c:291:dsl_dataset_block_kill(): freeing ds=81 DVA[0]=<0:421e7a00:c00> [L0 zvol object] fletcher4 zstd unencrypted LE contiguous unique single size=4000L/c00P birth=81L/81P fill=1 cksum=0000013e082cd2e0:00022b37cff68f1a:023e5ed0cc6a6648:b6f79395539380b9
FFFF878F3BDFB080: dprintf: dsl_dataset.c:145:dsl_dataset_block_born(): ds=FFFF878F49BF1000 DVA[0]=<0:ac8e2c00:1400> [L0 zvol object] fletcher4 zstd unencrypted LE contiguous unique single size=4000L/1400P birth=105L/105P fill=1 cksum=0000025d95e0c23b:000641ab5cf96f7c:0a80ce5969bbb04a:37e02aab71c09ebd
FFFF878F3A7C3080: zio.c:2236:zio_deadman(): zio_wait restarting hung I/O for pool 'BOOM'
FFFF878F3A7C3080: zio.c:2236:zio_deadman(): zio_wait restarting hung I/O for pool 'BOOM'
FFFF878F3A7C3080: zio.c:2236:zio_deadman(): zio_wait restarting hung I/O for pool 'BOOM'
FFFF878F3A7C3080: zio.c:2236:zio_deadman(): zio_wait restarting hung I/O for pool 'BOOM'
FFFF878F3E55C080: dprintf: dnode.c:2470:dnode_diduse_space(): ds=BOOM/vol obj=1 dn=FFFF878F49F435E0 dnp=FFFF878F49F24200 used=586100736 delta=-1024
FFFF878F3E55C080: SPL: vmem_freelist_insert_sort_by_time: at marker (bucket_4096)(steps: 0) p->vs_start, end == 0, 4096
FFFF878F3E55C080: SPL: vmem_freelist_insert_sort_by_time: at marker (bucket_heap)(steps: 0) p->vs_start, end == 0, 4096
FFFF878F3BDFB080: dprintf: dsl_dataset.c:291:dsl_dataset_block_kill(): freeing ds=81 DVA[0]=<0:421e9400:1000> [L0 zvol object] fletcher4 zstd unencrypted LE contiguous unique single size=4000L/1000P birth=81L/81P fill=1 cksum=000001d898ceee0b:000405240751c14a:056995fd2da433e0:6e23f7873626b125
FFFF878F3BDFB080: dprintf: dsl_dataset.c:145:dsl_dataset_block_born(): ds=FFFF878F49BF1000 DVA[0]=<0:ac8e4000:c00> [L0 zvol object] fletcher4 zstd unencrypted LE contiguous unique single size=4000L/c00P birth=105L/105P fill=1 cksum=000001536f6b6065:000221f16aedc85e:0224ef65e5714350:9b9edd3b7e772116
FFFF878F3A7C3080: zio.c:2236:zio_deadman(): zio_wait restarting hung I/O for pool 'BOOM'
FFFF878F3E55C080: dprintf: dnode.c:2470:dnode_diduse_space(): ds=BOOM/vol obj=1 dn=FFFF878F49F435E0 dnp=FFFF878F49F24200 used=586099712 delta=512
FFFF878F3E55C080: SPL: vmem_freelist_insert_sort_by_time: at marker (bucket_4096)(steps: 0) p->vs_start, end == 0, 4096
FFFF878F3E55C080: SPL: vmem_freelist_insert_sort_by_time: at marker (bucket_heap)(steps: 0) p->vs_start, end == 0, 4096
FFFF878F3E55C080: SPL: vmem_freelist_insert_sort_by_time: at marker (bucket_4096)(steps: 0) p->vs_start, end == 0, 4096
FFFF878F3E55C080: SPL: vmem_freelist_insert_sort_by_time: at marker (bucket_heap)(steps: 0) p->vs_start, end == 0, 4096
FFFF878F3A7C3080: zio.c:2197:zio_deadman_impl(): slow zio[11]: zio=FFFF878F636B5E00x timestamp=10539098126100 delta=42800 queued=0 io=10539098127900 path=/dev/physicaldrive0 last=10539077025100 type=2 priority=3 flags=0x184080 stage=0x200000 pipeline=0x2e00000 pipeline-trace=0x200001 objset=81 object=1 level=0 blkid=72629 offset=2899201024 size=5120 error=0
VERIFY3( rc->rc_count == number ) failed (5120 == 0)

size and rc_count both being 5120 is curious.
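
If I read abd.c right, that is probably not a coincidence: abd_children is a byte-counted refcount (a holder adds its size via zfs_refcount_add_many() and removes the same amount when it lets go), so rc_count == 5120 would correspond to exactly one outstanding 5120-byte holder at the moment the parent abd is torn down. A minimal sketch of that byte-counted pattern, with invented names rather than the actual abd.c code:

```c
/* Hypothetical sketch of a byte-counted child refcount (invented names). */
#include <stdio.h>

typedef struct {
	long	rc_count;	/* sum of holder sizes, not number of holders */
} toy_refcount_t;

typedef struct toy_abd {
	size_t		abd_size;
	struct toy_abd	*abd_parent;
	toy_refcount_t	abd_children;
} toy_abd_t;

/* A child references part of the parent's buffer: count it in bytes. */
static void
toy_hold(toy_abd_t *parent, toy_abd_t *child)
{
	child->abd_parent = parent;
	parent->abd_children.rc_count += (long)child->abd_size;
}

static void
toy_release(toy_abd_t *child)
{
	child->abd_parent->abd_children.rc_count -= (long)child->abd_size;
	child->abd_parent = NULL;
}

int
main(void)
{
	toy_abd_t parent = { .abd_size = 16384 };
	toy_abd_t child = { .abd_size = 5120 };

	toy_hold(&parent, &child);
	/* Destroying the parent here would trip the same style of check: */
	printf("abd_children.rc_count = %ld (destroy expects 0)\n",
	    parent.abd_children.rc_count);
	toy_release(&child);
	printf("after release: %ld\n", parent.abd_children.rc_count);
	return (0);
}
```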

amotin commented 1 month ago

I haven't looked at what that I/O is, but it seems reasonable to me that if some ZIO hasn't completed, its buffer is likely not freed. "failmode=continue" is a direct request for trouble.

lundman commented 1 month ago

Yeah, I assume the IO was scheduled but not yet completed - the deadman triggers and the IO is restarted. It would seem restarting isn't as simple as the code makes it look. I would prefer it didn't panic, of course. This can be triggered on a system without any tuning, it is just harder. If we don't want to support failmode=continue then we should probably remove it.