openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

`import -f` hangs, `import -f -o readonly=on` works #12980

Open skull-squadron opened 2 years ago

skull-squadron commented 2 years ago

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Ubuntu |
| Distribution Version | 20.04.3 LTS (server) |
| Kernel Version | 5.4.0-94-generic |
| Architecture | x86_64 |
| OpenZFS Version | 2.1.2 |

Describe the problem you're observing

Imported using `-d /dev/disk/by-partlabel/zfs-*` (all 4 data drives).

The 4-drive pool refuses to import read-write; read-only import works fine.

The pool had a ZIL (separate log device) on an Intel SSD, available as /dev/sdb (sdb1, sdb9).

It also had an L2ARC on another drive, but I removed it because benchmarks showed no real performance gain with 512 GiB of RAM.
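For illustration, the two import attempts being compared look roughly like this (a sketch assuming the pool name `tank` from the status output below and the by-partlabel device directory mentioned above):

```sh
# Read-write import: the variant that hangs and ends with the pool suspended
zpool import -f -d /dev/disk/by-partlabel tank

# Read-only import: completes, and the user data is readable
zpool import -f -o readonly=on -d /dev/disk/by-partlabel tank
```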

Describe how to reproduce the problem

Come over and slot-in all 4 drives and the SSD. :)

Include any warning/errors/backtraces from the system logs

(No error messages observed from utils.)

Note: The first log line below doesn't make sense, because the user data is readable read-only. All the drives should be fine mechanically and electrically: they were previously mounted under a Qubes (Xen) VM before I switched to VMware ESXi and attached them to an Ubuntu guest using RDMs.

[  658.602488] WARNING: Pool 'tank' has encountered an uncorrectable I/O failure and has been suspended.

[  846.932501] INFO: task agents:1214 blocked for more than 120 seconds.
[  846.932557]       Tainted: P           OE     5.4.0-94-generic #106-Ubuntu
[  846.932599] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  846.932647] agents          D    0  1214      1 0x00000000
[  846.932650] Call Trace:
[  846.932686]  __schedule+0x2e3/0x740
[  846.932690]  ? __kmalloc_node+0x213/0x330
[  846.932703]  schedule+0x42/0xb0
[  846.932705]  schedule_preempt_disabled+0xe/0x10
[  846.932708]  __mutex_lock.isra.0+0x172/0x4c0
[  846.932721]  ? spl_vmem_alloc+0x19/0x20 [spl]
[  846.932737]  ? nv_mem_zalloc.isra.0+0x2f/0x40 [znvpair]
[  846.932742]  ? nvlist_xalloc.part.0+0x50/0xb0 [znvpair]
[  846.932745]  __mutex_lock_slowpath+0x13/0x20
[  846.932748]  mutex_lock+0x2e/0x40
[  846.932830]  spa_all_configs+0x4a/0x120 [zfs]
[  846.932908]  zfs_ioc_pool_configs+0x1c/0x70 [zfs]
[  846.932979]  zfsdev_ioctl_common+0x5ff/0x8e0 [zfs]
[  846.932984]  ? kvmalloc_node+0x4e/0x90
[  846.933055]  zfsdev_ioctl+0x57/0xe0 [zfs]
[  846.933062]  do_vfs_ioctl+0x407/0x670
[  846.933066]  ? do_user_addr_fault+0x216/0x450
[  846.933069]  ksys_ioctl+0x67/0x90
[  846.933072]  __x64_sys_ioctl+0x1a/0x20
[  846.933076]  do_syscall_64+0x57/0x190
[  846.933079]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  846.933086] RIP: 0033:0x7f94b4eb850b
[  846.933096] Code: Bad RIP value.
[  846.933097] RSP: 002b:00007f94b3ee3608 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  846.933099] RAX: ffffffffffffffda RBX: 000056377761ff00 RCX: 00007f94b4eb850b
[  846.933101] RDX: 00007f94b3ee3640 RSI: 0000000000005a04 RDI: 0000000000000009
[  846.933102] RBP: 00007f94b3ee6c30 R08: 00007f94a4000d00 R09: 0000000000021000
[  846.933102] R10: 00007f94a4021000 R11: 0000000000000246 R12: 0000000000000000
[  846.933103] R13: 00007f94b3ee3640 R14: 0000563777211830 R15: 00007f94b3ee6d00
[  846.933118] INFO: task zpool:4295 blocked for more than 120 seconds.
[  846.933157]       Tainted: P           OE     5.4.0-94-generic #106-Ubuntu
[  846.933198] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  846.933245] zpool           D    0  4295   1656 0x80004002
[  846.933247] Call Trace:
[  846.933250]  __schedule+0x2e3/0x740
[  846.933254]  ? default_wake_function+0x12/0x20
[  846.933257]  schedule+0x42/0xb0
[  846.933259]  io_schedule+0x16/0x40
[  846.933265]  cv_wait_common+0xb1/0x130 [spl]
[  846.933268]  ? wait_woken+0x80/0x80
[  846.933274]  __cv_wait_io+0x18/0x20 [spl]
[  846.933346]  txg_wait_synced_impl+0xd8/0x120 [zfs]
[  846.933410]  txg_wait_synced+0x10/0x40 [zfs]
[  846.933481]  spa_config_update+0x130/0x180 [zfs]
[  846.933543]  spa_import+0x5e6/0x680 [zfs]
[  846.933549]  ? nvpair_value_common.part.0+0xc2/0x140 [znvpair]
[  846.933622]  zfs_ioc_pool_import+0x138/0x150 [zfs]
[  846.933690]  zfsdev_ioctl_common+0x5ff/0x8e0 [zfs]
[  846.933693]  ? kvmalloc_node+0x4e/0x90
[  846.933762]  zfsdev_ioctl+0x57/0xe0 [zfs]
[  846.933765]  do_vfs_ioctl+0x407/0x670
[  846.933768]  ? do_user_addr_fault+0x216/0x450
[  846.933771]  ksys_ioctl+0x67/0x90
[  846.933773]  __x64_sys_ioctl+0x1a/0x20
[  846.933776]  do_syscall_64+0x57/0x190
[  846.933778]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  846.933780] RIP: 0033:0x7f384e8d050b
[  846.933782] Code: Bad RIP value.
[  846.933783] RSP: 002b:00007ffebed46738 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  846.933785] RAX: ffffffffffffffda RBX: 00007ffebed467b0 RCX: 00007f384e8d050b
[  846.933786] RDX: 00007ffebed467b0 RSI: 0000000000005a02 RDI: 0000000000000003
[  846.933787] RBP: 00007ffebed4a6a0 R08: 00005561bba02d80 R09: 00007f384e9a5370
[  846.933788] R10: 00005561bba15000 R11: 0000000000000246 R12: 00005561bb9a8570
[  846.933789] R13: 00005561bb9ae070 R14: 0000000000000000 R15: 0000000000000000
[  846.933802] INFO: task txg_sync:4568 blocked for more than 120 seconds.
[  846.933842]       Tainted: P           OE     5.4.0-94-generic #106-Ubuntu
[  846.933884] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  846.933930] txg_sync        D    0  4568      2 0x80004000
[  846.933932] Call Trace:
[  846.933935]  __schedule+0x2e3/0x740
[  846.933939]  ? __internal_add_timer+0x2d/0x40
[  846.933942]  schedule+0x42/0xb0
[  846.933944]  schedule_timeout+0x8a/0x160
[  846.933946]  ? _cond_resched+0x19/0x30
[  846.933948]  ? __next_timer_interrupt+0xe0/0xe0
[  846.933951]  io_schedule_timeout+0x1e/0x50
[  846.933956]  __cv_timedwait_common+0x137/0x160 [spl]
[  846.933959]  ? wait_woken+0x80/0x80
[  846.933964]  __cv_timedwait_io+0x19/0x20 [spl]
[  846.934034]  zio_wait+0x137/0x280 [zfs]
[  846.934090]  dbuf_read+0x2a0/0x580 [zfs]
[  846.934144]  ? dmu_buf_hold_array_by_dnode+0x192/0x510 [zfs]
[  846.934196]  dmu_buf_will_dirty_impl+0xb6/0x170 [zfs]
[  846.934246]  dmu_buf_will_dirty+0x16/0x20 [zfs]
[  846.934299]  dmu_write_impl+0x42/0xd0 [zfs]
[  846.934351]  dmu_write.part.0+0x6a/0xd0 [zfs]
[  846.934402]  dmu_write+0x14/0x20 [zfs]
[  846.934470]  spa_history_write+0x194/0x1e0 [zfs]
[  846.934531]  spa_history_log_sync+0x18f/0x7d0 [zfs]
[  846.934596]  dsl_sync_task_sync+0xb6/0x100 [zfs]
[  846.934657]  dsl_pool_sync+0x3d6/0x4f0 [zfs]
[  846.934721]  spa_sync+0x562/0xff0 [zfs]
[  846.934724]  ? mutex_lock+0x13/0x40
[  846.934792]  ? spa_txg_history_init_io+0x106/0x110 [zfs]
[  846.934857]  txg_sync_thread+0x26d/0x3f0 [zfs]
[  846.934918]  ? txg_thread_exit.isra.0+0x60/0x60 [zfs]
[  846.934925]  thread_generic_wrapper+0x79/0x90 [spl]
[  846.934930]  kthread+0x104/0x140
[  846.934935]  ? __thread_exit+0x20/0x20 [spl]
[  846.934937]  ? kthread_park+0x90/0x90
[  846.934940]  ret_from_fork+0x22/0x40
[  967.762286] INFO: task agents:1214 blocked for more than 241 seconds.
[  967.762337]       Tainted: P           OE     5.4.0-94-generic #106-Ubuntu
[  967.762378] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  967.762425] agents          D    0  1214      1 0x00000000
[  967.762429] Call Trace:
[  967.762439]  __schedule+0x2e3/0x740
[  967.762443]  ? __kmalloc_node+0x213/0x330
[  967.762448]  schedule+0x42/0xb0
[  967.762451]  schedule_preempt_disabled+0xe/0x10
[  967.762454]  __mutex_lock.isra.0+0x172/0x4c0
[  967.762463]  ? spl_vmem_alloc+0x19/0x20 [spl]
[  967.762471]  ? nv_mem_zalloc.isra.0+0x2f/0x40 [znvpair]
[  967.762476]  ? nvlist_xalloc.part.0+0x50/0xb0 [znvpair]
[  967.762479]  __mutex_lock_slowpath+0x13/0x20
[  967.762481]  mutex_lock+0x2e/0x40
[  967.762562]  spa_all_configs+0x4a/0x120 [zfs]
[  967.762644]  zfs_ioc_pool_configs+0x1c/0x70 [zfs]
[  967.762727]  zfsdev_ioctl_common+0x5ff/0x8e0 [zfs]
[  967.762731]  ? kvmalloc_node+0x4e/0x90
[  967.762814]  zfsdev_ioctl+0x57/0xe0 [zfs]
[  967.762819]  do_vfs_ioctl+0x407/0x670
[  967.762823]  ? do_user_addr_fault+0x216/0x450
[  967.762826]  ksys_ioctl+0x67/0x90
[  967.762830]  __x64_sys_ioctl+0x1a/0x20
[  967.762833]  do_syscall_64+0x57/0x190
[  967.762836]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  967.762840] RIP: 0033:0x7f94b4eb850b
[  967.762846] Code: Bad RIP value.
[  967.762847] RSP: 002b:00007f94b3ee3608 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  967.762850] RAX: ffffffffffffffda RBX: 000056377761ff00 RCX: 00007f94b4eb850b
[  967.762851] RDX: 00007f94b3ee3640 RSI: 0000000000005a04 RDI: 0000000000000009
[  967.762852] RBP: 00007f94b3ee6c30 R08: 00007f94a4000d00 R09: 0000000000021000
[  967.762853] R10: 00007f94a4021000 R11: 0000000000000246 R12: 0000000000000000
[  967.762854] R13: 00007f94b3ee3640 R14: 0000563777211830 R15: 00007f94b3ee6d00
[  967.762881] INFO: task zpool:4295 blocked for more than 241 seconds.
[  967.762926]       Tainted: P           OE     5.4.0-94-generic #106-Ubuntu
[  967.762973] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  967.763027] zpool           D    0  4295   1656 0x80004002
[  967.763029] Call Trace:
[  967.763033]  __schedule+0x2e3/0x740
[  967.763037]  ? default_wake_function+0x12/0x20
[  967.763041]  schedule+0x42/0xb0
[  967.763043]  io_schedule+0x16/0x40
[  967.763050]  cv_wait_common+0xb1/0x130 [spl]
[  967.763054]  ? wait_woken+0x80/0x80
[  967.763060]  __cv_wait_io+0x18/0x20 [spl]
[  967.763146]  txg_wait_synced_impl+0xd8/0x120 [zfs]
[  967.763215]  txg_wait_synced+0x10/0x40 [zfs]
[  967.763282]  spa_config_update+0x130/0x180 [zfs]
[  967.763344]  spa_import+0x5e6/0x680 [zfs]
[  967.763350]  ? nvpair_value_common.part.0+0xc2/0x140 [znvpair]
[  967.763422]  zfs_ioc_pool_import+0x138/0x150 [zfs]
[  967.763490]  zfsdev_ioctl_common+0x5ff/0x8e0 [zfs]
[  967.763493]  ? kvmalloc_node+0x4e/0x90
[  967.763562]  zfsdev_ioctl+0x57/0xe0 [zfs]
[  967.763565]  do_vfs_ioctl+0x407/0x670
[  967.763567]  ? do_user_addr_fault+0x216/0x450
[  967.763570]  ksys_ioctl+0x67/0x90
[  967.763572]  __x64_sys_ioctl+0x1a/0x20
[  967.763575]  do_syscall_64+0x57/0x190
[  967.763577]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  967.763579] RIP: 0033:0x7f384e8d050b
[  967.763581] Code: Bad RIP value.
[  967.763582] RSP: 002b:00007ffebed46738 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  967.763584] RAX: ffffffffffffffda RBX: 00007ffebed467b0 RCX: 00007f384e8d050b
[  967.763585] RDX: 00007ffebed467b0 RSI: 0000000000005a02 RDI: 0000000000000003
[  967.763586] RBP: 00007ffebed4a6a0 R08: 00005561bba02d80 R09: 00007f384e9a5370
[  967.763587] R10: 00005561bba15000 R11: 0000000000000246 R12: 00005561bb9a8570
[  967.763588] R13: 00005561bb9ae070 R14: 0000000000000000 R15: 0000000000000000
[  967.763609] INFO: task txg_sync:4568 blocked for more than 241 seconds.
[  967.763650]       Tainted: P           OE     5.4.0-94-generic #106-Ubuntu
[  967.763690] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  967.763736] txg_sync        D    0  4568      2 0x80004000
[  967.763738] Call Trace:
[  967.763741]  __schedule+0x2e3/0x740
[  967.763745]  ? __internal_add_timer+0x2d/0x40
[  967.763747]  schedule+0x42/0xb0
[  967.763749]  schedule_timeout+0x8a/0x160
[  967.763751]  ? _cond_resched+0x19/0x30
[  967.763754]  ? __next_timer_interrupt+0xe0/0xe0
[  967.763756]  io_schedule_timeout+0x1e/0x50
[  967.763761]  __cv_timedwait_common+0x137/0x160 [spl]
[  967.763764]  ? wait_woken+0x80/0x80
[  967.763769]  __cv_timedwait_io+0x19/0x20 [spl]
[  967.763839]  zio_wait+0x137/0x280 [zfs]
[  967.763891]  dbuf_read+0x2a0/0x580 [zfs]
[  967.763945]  ? dmu_buf_hold_array_by_dnode+0x192/0x510 [zfs]
[  967.763997]  dmu_buf_will_dirty_impl+0xb6/0x170 [zfs]
[  967.764047]  dmu_buf_will_dirty+0x16/0x20 [zfs]
[  967.764100]  dmu_write_impl+0x42/0xd0 [zfs]
[  967.764152]  dmu_write.part.0+0x6a/0xd0 [zfs]
[  967.764203]  dmu_write+0x14/0x20 [zfs]
[  967.764271]  spa_history_write+0x194/0x1e0 [zfs]
[  967.764331]  spa_history_log_sync+0x18f/0x7d0 [zfs]
[  967.764396]  dsl_sync_task_sync+0xb6/0x100 [zfs]
[  967.764457]  dsl_pool_sync+0x3d6/0x4f0 [zfs]
[  967.764521]  spa_sync+0x562/0xff0 [zfs]
[  967.764524]  ? mutex_lock+0x13/0x40
[  967.764592]  ? spa_txg_history_init_io+0x106/0x110 [zfs]
[  967.764662]  txg_sync_thread+0x26d/0x3f0 [zfs]
[  967.764732]  ? txg_thread_exit.isra.0+0x60/0x60 [zfs]
[  967.764741]  thread_generic_wrapper+0x79/0x90 [spl]
[  967.764745]  kthread+0x104/0x140
[  967.764752]  ? __thread_exit+0x20/0x20 [spl]
[  967.764755]  ? kthread_park+0x90/0x90
[  967.764757]  ret_from_fork+0x22/0x40
skull-squadron commented 2 years ago

Appears related to #8234, but no solution for repair is indicated there.

(mounted RO)
  pool: tank
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
    The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 10:32:30 with 0 errors on Wed Aug 25 03:10:56 2021
config:

    NAME                      STATE     READ WRITE CKSUM
    tank                      ONLINE       0     0     0
      raidz1-0                ONLINE       0     0     0
        zfs-e7334676685ebfb1  ONLINE       0     0     0
        zfs-2cdde638274363a0  ONLINE       0     0     0
        zfs-0bcbaa5f63217fe4  ONLINE       0     0     0
        zfs-d4c0e7fd00f938a4  ONLINE       0     0     0

errors: No known data errors
skull-squadron commented 2 years ago

The `hostname:` and each child's `path:` should be overwritten on `-f`, yes?

root@zfs:~# zpool import
   pool: tank
     id: 13413710004025204341
  state: ONLINE
status: Some supported features are not enabled on the pool.
    (Note that they may be intentionally disabled if the
    'compatibility' property is set.)
 action: The pool can be imported using its name or numeric identifier, though
    some features will not be available without an explicit 'zpool upgrade'.
 config:

    tank                                  ONLINE
      raidz1-0                            ONLINE
        ata-WDC_WUH721414ALE6L4_9JGBRDWT  ONLINE
        ata-WDC_WUH721414ALE6L4_9JGBW7YT  ONLINE
        ata-WDC_WUH721414ALE6L4_9JGBUB7T  ONLINE
        ata-WDC_WUH721414ALE6L4_9JGB387T  ONLINE
root@zfs:~# zdb -l /dev/disk/by-label/tank
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'tank'
    state: 0
    txg: 5124276
    pool_guid: 13413710004025204341
    errata: 0
    hostname: 'fedora'
    top_guid: 17742213968895681915
    guid: 7683140127223500668
    hole_array[0]: 1
    vdev_children: 2
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 17742213968895681915
        nparity: 1
        metaslab_array: 134
        metaslab_shift: 34
        ashift: 12
        asize: 56002017755136
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 4943082563016739590
            path: '/dev/xvdi1'
            devid: 'ata-WDC_WUH721414ALE6L4_9JGBRDWT-part1'
            phys_path: 'pci-0000:47:00.0-ata-5'
            whole_disk: 1
            DTL: 17787
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 15336595513193204854
            path: '/dev/xvdj1'
            devid: 'ata-WDC_WUH721414ALE6L4_9JGBW7YT-part1'
            phys_path: 'pci-0000:47:00.0-ata-6'
            whole_disk: 1
            DTL: 17786
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 6040381479544144918
            path: '/dev/xvdk1'
            devid: 'ata-WDC_WUH721414ALE6L4_9JGBUB7T-part1'
            phys_path: 'pci-0000:47:00.0-ata-7'
            whole_disk: 1
            DTL: 17785
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 7683140127223500668
            path: '/dev/xvdl1'
            devid: 'ata-WDC_WUH721414ALE6L4_9JGB387T-part1'
            phys_path: 'pci-0000:47:00.0-ata-8'
            whole_disk: 1
            DTL: 17784
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3
skull-squadron commented 2 years ago

`zpool status` hangs after `import -f`. Seems like a ZFS kernel task is deadlocked. ):

skull-squadron commented 2 years ago

For grins, I tried importing on current stable TrueNAS (which, IIRC, now shares much of its ZFS codebase with ZoL on FreeBSD); similar error message:

Solaris: WARNING: Pool 'tank' has encountered an uncorrectable I/O failure and has been suspended.

zpool status also hangs on TrueNAS/FreeBSD.

Methinks the intended behavior is that it should never hang, regardless of the condition of any individual disk or pool. As it stands, this means a hard reboot of the system. ):

rincebrain commented 2 years ago

There are so many moving parts here it's hard to guess what might be going awry.

If you run zpool events -vf before trying the import r/w, does it print anything interesting?

Does /proc/spl/kstat/zfs/dbgmsg have anything interesting to say?
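A minimal sketch of running those two checks around the import attempt (commands and paths as named above; the pool name tank is assumed):

```sh
# Terminal 1: stream ZFS events verbosely before starting the import
zpool events -vf

# Terminal 2: attempt the read-write import (the step that hangs)
zpool import -f tank

# Terminal 3, once it hangs: dump the kernel module's internal debug log
cat /proc/spl/kstat/zfs/dbgmsg
```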

By default, the failure mode on a pool being suspended is "wait forever until it comes back", rather than returning errors when things try to access it. I suspect you're encountering a consequence of that.
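For reference, that "wait until it comes back" behavior corresponds to the pool's failmode property (default wait). A hedged sketch of inspecting or changing it, which of course only helps once an import actually succeeds:

```sh
# Show the current failure mode for the pool (wait is the default: block I/O until recovery)
zpool get failmode tank

# failmode=continue makes a suspended pool return errors (EIO) on new writes instead of
# blocking everything; it can be set temporarily at import time
zpool import -o failmode=continue tank
```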

Without knowing anything about why it's complaining, it's hard to speculate what might be a good or terrible idea to help it.

And no, it's perfectly plausible for it to hit an uncorrectable error when importing read-write but not read-only - as the term implies, importing read/write involves writing to the pool, and if it's having problems doing that for some reason, [...].

How, exactly, are the disks connected to the host that can't import it r/w? It sounds like raw device passthrough on a VMware host to a guest?

Could you share the entirety of dmesg from either the Ubuntu VM or the FreeBSD VM on one of these times you tried the import?

ybh1998 commented 2 years ago

I've encountered the same issue: `import tank` hangs while `import -o readonly=on tank` works.

Before this happened, I was benchmarking with fio. It hung while creating the test file (iostat showed heavy writes to the hard drives, and fio couldn't be killed with kill -SIGKILL). I hard-reset the system, and since then it hangs at ZFS import.

I can share my logs. If I run zpool events -vf and then import tank in another terminal, nothing shows up in zpool events -vf. iostat shows that ZFS keeps reading from the hard drives.

/proc/spl/kstat/zfs/dbgmsg:

timestamp    message 
1644170557   spa.c:6232:spa_tryimport(): spa_tryimport: importing tank
1644170557   spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADING
1644170557   vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/zfs_tank_special_0_1': best uberblock found for spa $import. txg 1412295
1644170557   spa_misc.c:418:spa_load_note(): spa_load($import, config untrusted): using uberblock with txg=1412295
1644170557   spa.c:8337:spa_async_request(): spa=$import async request task=4096
1644170557   spa.c:8337:spa_async_request(): spa=$import async request task=4096
1644170557   spa.c:8337:spa_async_request(): spa=$import async request task=2048
1644170557   spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADED
1644170557   spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): UNLOADING
1644170569   spa.c:6232:spa_tryimport(): spa_tryimport: importing tank
1644170569   spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADING
1644170569   vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/zfs_tank_special_0_1': best uberblock found for spa $import. txg 1412295
1644170569   spa_misc.c:418:spa_load_note(): spa_load($import, config untrusted): using uberblock with txg=1412295
1644170569   spa.c:8337:spa_async_request(): spa=$import async request task=4096
1644170569   spa.c:8337:spa_async_request(): spa=$import async request task=4096
1644170569   spa.c:8337:spa_async_request(): spa=$import async request task=2048
1644170569   spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADED
1644170569   spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): UNLOADING
1644170573   spa.c:6232:spa_tryimport(): spa_tryimport: importing tank
1644170573   spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADING
1644170573   vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/zfs_tank_special_0_1': best uberblock found for spa $import. txg 1412295
1644170573   spa_misc.c:418:spa_load_note(): spa_load($import, config untrusted): using uberblock with txg=1412295
1644170573   spa.c:8337:spa_async_request(): spa=$import async request task=4096
1644170573   spa.c:8337:spa_async_request(): spa=$import async request task=4096
1644170574   spa.c:8337:spa_async_request(): spa=$import async request task=2048
1644170574   spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADED
1644170574   spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): UNLOADING
1644170574   spa.c:6084:spa_import(): spa_import: importing tank
1644170574   spa_misc.c:418:spa_load_note(): spa_load(tank, config trusted): LOADING
1644170574   vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/zfs_tank_special_0_1': best uberblock found for spa tank. txg 1412295
1644170574   spa_misc.c:418:spa_load_note(): spa_load(tank, config untrusted): using uberblock with txg=1412295
1644170574   spa.c:8337:spa_async_request(): spa=tank async request task=4096
1644170574   spa.c:8337:spa_async_request(): spa=tank async request task=4096
1644170574   spa_misc.c:418:spa_load_note(): spa_load(tank, config trusted): read 534 log space maps (542 total blocks - blksz = 131072 bytes) in 109 ms
1644170574   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 0, spa tank, vdev_id 1, ms_id 23, smp_length 644536, unflushed_allocs 455041024, unflushed_frees 33660928, freed 0, defer 0 + 0, unloaded time 202589 ms, loading_time 24 ms, ms_max_size 2147483648, max size error 2145386496, old_weight 7c0000000000001, new_weight 7c0000000000001
1644170574   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 0, spa tank, vdev_id 0, ms_id 64, smp_length 24088, unflushed_allocs 508084224, unflushed_frees 24576, freed 0, defer 0 + 0, unloaded time 202656 ms, loading_time 0 ms, ms_max_size 7415701504, max size error 7415685120, old_weight 800000000000001, new_weight 800000000000001
1644170579   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 0, spa tank, vdev_id 5, ms_id 0, smp_length 11576, unflushed_allocs 172032, unflushed_frees 147456, freed 0, defer 0 + 0, unloaded time 207332 ms, loading_time 6 ms, ms_max_size 497016832, max size error 496885760, old_weight 700000000000001, new_weight 700000000000001
1644170586   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 0, spa tank, vdev_id 1, ms_id 64, smp_length 41376, unflushed_allocs 117465088, unflushed_frees 432889856, freed 0, defer 0 + 0, unloaded time 214437 ms, loading_time 8 ms, ms_max_size 4557053952, max size error 4555571200, old_weight 800000000000001, new_weight 800000000000001
1644170634   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 0, spa tank, vdev_id 0, ms_id 31, smp_length 113640, unflushed_allocs 113827840, unflushed_frees 55058432, freed 0, defer 0 + 0, unloaded time 262514 ms, loading_time 9 ms, ms_max_size 7593730048, max size error 7592673280, old_weight 800000000000001, new_weight 800000000000001
1644170658   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 0, spa tank, vdev_id 1, ms_id 24, smp_length 514872, unflushed_allocs 0, unflushed_frees 73728, freed 0, defer 0 + 0, unloaded time 286432 ms, loading_time 25 ms, ms_max_size 2147672064, max size error 2147598336, old_weight 7c0000000000001, new_weight 7c0000000000001

iostat -h 3 while `zpool import tank` hangs:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.0%    0.0%    0.5%    4.8%    0.0%   94.7%

      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     0.00         0.0k         0.0k         0.0k       0.0k       0.0k       0.0k nvme0n1
     0.00         0.0k         0.0k         0.0k       0.0k       0.0k       0.0k nvme1n1
    74.67        74.7M         0.0k         0.0k     224.0M       0.0k       0.0k sda
    77.67        77.7M         0.0k         0.0k     233.0M       0.0k       0.0k sdb
    74.67        74.7M         0.0k         0.0k     224.0M       0.0k       0.0k sdc
    77.67        77.7M         0.0k         0.0k     233.0M       0.0k       0.0k sdd

I can't see any errors related to ZFS in dmesg.

System information:

Linux version 5.10.0-11-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.92-1 (2022-01-18)
ZFS: Loaded module v2.1.2-1~bpo11+1, ZFS pool version 5000, ZFS filesystem version 5
sveitser commented 1 year ago

I think I encountered the same issue (able to import read-only, r/w import hangs, no I/O after a few seconds, no evidence of errors found anywhere) after I had to reset my computer because a Docker image build froze it. I was also rather low on disk space (10 GB free out of 1.8 TB) at the time.

After running

echo 1 > /sys/module/zfs/parameters/zfs_recover
zdb -e -bcsvL tank

I was able to import the pool r/w again. zdb also didn't report anything that seemed indicative of any problems.

I got the idea from https://www.reddit.com/r/zfs/comments/fcacws/using_zdb_to_repair_errors/
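For reference, an annotated version of that sequence (the zdb traversal itself is read-only; zfs_recover is a last-resort module parameter that relaxes certain fatal checks):

```sh
# Relax certain fatal assertions in the ZFS module (persists until reboot or until set back to 0)
echo 1 > /sys/module/zfs/parameters/zfs_recover

# Walk the exported pool with zdb:
#   -e  operate on an exported pool   -b  traverse and count blocks
#   -c  verify checksums              -s  report zdb I/O statistics
#   -v  verbose                       -L  skip leak detection and space-map loading
zdb -e -bcsvL tank

# After this, the read-write import reportedly succeeded again
zpool import tank
```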

rincebrain commented 1 year ago

As someone said in that reddit thread, zdb should not be writing anything, and if it does, that's a bug - so I would expect it's only the zfs_recover setting that made the difference here.


sveitser commented 1 year ago

Ok. I didn't try importing after only enabling zfs_recover. In the hopefully unlikely event that I encounter the problem again, I will try that.

zehro commented 2 months ago

Hello! I'm experiencing the same thing as @ybh1998. After importing my pool as read-only, I see in `zpool status -v` that a stuck scrub is in progress. How can I cancel it? I thought this wasn't possible / that scrubs were supposed to be cancelled automatically. zpool events also shows that there's an I/O error.

Another thing to note is that the pool is healthy/online, but there is a corrupted metadata file <0x40>. Is there a way to resolve this as well? It doesn't seem related to the I/O error, but for context: I ran a scrub beforehand to see if it could auto-correct the issue, then ran a second scrub, during which it hit the I/O suspension.

I'm kind of new to this, so any tips on troubleshooting?

amotin commented 2 months ago

@zehro You cannot cancel or pause a started scrub while the pool is imported read-only, since that requires a metadata write, which is impossible. The scrub cannot progress in that case, but you can't do anything about it either. I tried to find some magic module parameter that would cancel the scrub during import without restarting it, but didn't find anything quickly; maybe I was looking in the wrong places.

If the pool was already suspended before you started the scrub, and the suspension is not caused by a hardware problem you can fix by external means (a power cycle, reattaching a fallen-off disk, etc.), then the scrub could never have done anything about it in the first place. Scrub is not magic: it just verifies all checksums and repairs data from copies that are still recoverable. Exactly the same recovery happens during the normal read path, and if that can't fix it and the error is not transient, then there is nothing more to be done. You should probably import the pool read-only and evacuate the data.
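For completeness, a hedged sketch of both options (pool name tank and the default mountpoint /tank assumed; stopping the scrub only works once a read-write import succeeds):

```sh
# Stop (-s) or pause (-p) a running scrub -- only possible on a pool imported read-write
zpool scrub -s tank

# Otherwise: import read-only and copy the data off, e.g. with rsync
zpool import -o readonly=on tank
rsync -a /tank/ /path/to/backup/
```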

zehro commented 2 months ago

@amotin I appreciate the response. I suspected as much and am prepping to evacuate the data and recreate the pool. I'm trying a few other potential solutions before committing to that (30+ TB of data takes a while), and I'm still open to suggestions.