openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other
10.64k stars 1.75k forks source link

VERIFY(zio->io_children[c][w] == 0) failed with 7.0-RC2 #5429

Closed tuomari closed 7 years ago

tuomari commented 7 years ago

System is same as in #5226 Kernel version: 4.7.6

Nov 22 10:31:46 helvi kernel: [   19.528130] ZFS: Loaded module v0.7.0-rc2_43_gce4197c (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Nov 22 10:31:46 helvi kernel: [   17.858508] SPL: Loaded module v0.7.0-rc2_2_gcbba714 (DEBUG mode)
[Sat Nov 26 22:49:00 2016] VERIFY(zio->io_children[c][w] == 0) failed
[Sat Nov 26 22:49:00 2016] PANIC at zio.c:3662:zio_done()
[Sat Nov 26 22:49:00 2016] Showing stack for process 1349
[Sat Nov 26 22:49:00 2016] CPU: 4 PID: 1349 Comm: z_wr_iss Tainted: P           O    4.7.6.kvm.ovs.1 #2
[Sat Nov 26 22:49:00 2016] Hardware name: System manufacturer System Product Name/Z8NA-D6(C), BIOS 1303    05/10/2012
[Sat Nov 26 22:49:00 2016]  0000000000000000 ffff8808263eb708 ffffffff8131012b ffffffffa0e6b994
[Sat Nov 26 22:49:00 2016]  0000000000000e4e ffff8808263eb718 ffffffffa0316a5f ffff8808263eb898
[Sat Nov 26 22:49:00 2016]  ffffffffa0316b17 0236ac5b3fc16940 ffff880800000028 ffff8808263eb8a8
[Sat Nov 26 22:49:00 2016] Call Trace:
[Sat Nov 26 22:49:00 2016]  [<ffffffff8131012b>] dump_stack+0x4d/0x72
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0316a5f>] spl_dumpstack+0x3f/0x50 [spl]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0316b17>] spl_panic+0xa7/0xd0 [spl]
[Sat Nov 26 22:49:00 2016]  [<ffffffff816aceb9>] ? _raw_spin_unlock_irq+0x9/0x10
[Sat Nov 26 22:49:00 2016]  [<ffffffff810c622a>] ? finish_task_switch+0x7a/0x200
[Sat Nov 26 22:49:00 2016]  [<ffffffff816a82f6>] ? __schedule+0x1d6/0x680
[Sat Nov 26 22:49:00 2016]  [<ffffffff816aaef9>] ? __mutex_unlock_slowpath+0xa9/0x140
[Sat Nov 26 22:49:00 2016]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0d9f33b>] zio_done+0xb8b/0x18d0 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0da02b3>] ? zio_ready+0x233/0x5e0 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0d9b84e>] zio_nowait+0x10e/0x310 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0cbf484>] dbuf_prefetch+0x4d4/0x610 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0ccab7c>] dmu_prefetch+0x21c/0x240 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0d347d6>] space_map_load+0xe6/0x6c0 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0d0f813>] metaslab_load+0x63/0x1a0 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0d0fbfd>] metaslab_activate+0xcd/0x150 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0d12342>] metaslab_alloc+0x7c2/0x12d0 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0d9bd00>] zio_dva_allocate+0x170/0x900 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[Sat Nov 26 22:49:00 2016]  [<ffffffffa031a3a0>] ? tsd_hash_search.isra.0+0x70/0x90 [spl]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa031a48d>] ? tsd_get_by_thread+0x3d/0x90 [spl]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0312af3>] ? taskq_member+0x13/0x20 [spl]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0d95d73>] zio_execute+0x103/0x320 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0313fe9>] taskq_thread+0x299/0x590 [spl]
[Sat Nov 26 22:49:00 2016]  [<ffffffff810c9fb0>] ? wake_up_q+0x70/0x70
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0313d50>] ? taskq_thread_should_stop+0x80/0x80 [spl]
[Sat Nov 26 22:49:00 2016]  [<ffffffff810c0b98>] kthread+0xe8/0x100
[Sat Nov 26 22:49:00 2016]  [<ffffffff816ad30f>] ret_from_fork+0x1f/0x40
[Sat Nov 26 22:49:00 2016]  [<ffffffff810c0ab0>] ? kthread_create_on_node+0x190/0x190
tuomari commented 7 years ago

I hit this again with newer ZFS version:

ZFS: Loaded module v0.7.0-rc2_68_ge8a2014 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
SPL: Loaded module v0.7.0-rc2_3_g4934925 (DEBUG mode)
Dec 28 09:07:50 helvi kernel: [485283.704908] VERIFY(zio->io_children[c][w] == 0) failed
Dec 28 09:07:50 helvi kernel: [485283.704962] PANIC at zio.c:3719:zio_done()
Dec 28 09:07:50 helvi kernel: [485283.705018] Showing stack for process 1367
Dec 28 09:07:50 helvi kernel: [485283.705021] CPU: 6 PID: 1367 Comm: z_wr_iss Tainted: P           O    4.7.6.remod.kvm.ovs.1 #2
Dec 28 09:07:50 helvi kernel: [485283.705023] Hardware name: System manufacturer System Product Name/Z8NA-D6(C), BIOS 1303    05/10/2012
Dec 28 09:07:50 helvi kernel: [485283.705025]  0000000000000000 ffff88082caef6f8 ffffffff8131012b ffffffffa0ec3e7c
Dec 28 09:07:50 helvi kernel: [485283.705029]  0000000000000e87 ffff88082caef708 ffffffffa03b4a6f ffff88082caef888
Dec 28 09:07:50 helvi kernel: [485283.705031]  ffffffffa03b4b27 ffff88083fc96940 ffff880800000028 ffff88082caef898
Dec 28 09:07:50 helvi kernel: [485283.705034] Call Trace:
Dec 28 09:07:50 helvi kernel: [485283.705044]  [<ffffffff8131012b>] dump_stack+0x4d/0x72
Dec 28 09:07:50 helvi kernel: [485283.705062]  [<ffffffffa03b4a6f>] spl_dumpstack+0x3f/0x50 [spl]
Dec 28 09:07:50 helvi kernel: [485283.705068]  [<ffffffffa03b4b27>] spl_panic+0xa7/0xd0 [spl]
Dec 28 09:07:50 helvi kernel: [485283.705074]  [<ffffffff816aceb9>] ? _raw_spin_unlock_irq+0x9/0x10
Dec 28 09:07:50 helvi kernel: [485283.705077]  [<ffffffff810c622a>] ? finish_task_switch+0x7a/0x200
Dec 28 09:07:50 helvi kernel: [485283.705080]  [<ffffffff816a82f6>] ? __schedule+0x1d6/0x680
Dec 28 09:07:50 helvi kernel: [485283.705082]  [<ffffffff816aaef9>] ? __mutex_unlock_slowpath+0xa9/0x140
Dec 28 09:07:50 helvi kernel: [485283.705084]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
Dec 28 09:07:50 helvi kernel: [485283.705161]  [<ffffffffa0de6c23>] zio_done+0xbd3/0x1920 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705163]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
Dec 28 09:07:50 helvi kernel: [485283.705213]  [<ffffffffa0de7ba3>] ? zio_ready+0x233/0x5e0 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705215]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
Dec 28 09:07:50 helvi kernel: [485283.705264]  [<ffffffffa0de305e>] zio_nowait+0x10e/0x310 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705297]  [<ffffffffa0d08ed4>] dbuf_prefetch+0x4d4/0x610 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705332]  [<ffffffffa0d1460c>] dmu_prefetch+0x21c/0x240 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705381]  [<ffffffffa0d7e414>] space_map_load+0xf4/0x6d0 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705385]  [<ffffffff810c9f6d>] ? wake_up_q+0x2d/0x70
Dec 28 09:07:50 helvi kernel: [485283.705426]  [<ffffffffa0d592e3>] metaslab_load+0x63/0x1a0 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705475]  [<ffffffffa0d596cd>] metaslab_activate+0xcd/0x150 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705523]  [<ffffffffa0d5be12>] metaslab_alloc+0x7c2/0x12d0 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705583]  [<ffffffffa0ddd433>] ? zio_execute+0x103/0x320 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705638]  [<ffffffffa0de3510>] zio_dva_allocate+0x170/0x940 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705646]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
Dec 28 09:07:50 helvi kernel: [485283.705657]  [<ffffffffa03b83b0>] ? tsd_hash_search.isra.0+0x70/0x90 [spl]
Dec 28 09:07:50 helvi kernel: [485283.705669]  [<ffffffffa03b849d>] ? tsd_get_by_thread+0x3d/0x90 [spl]
Dec 28 09:07:50 helvi kernel: [485283.705681]  [<ffffffffa03b0b03>] ? taskq_member+0x13/0x20 [spl]
Dec 28 09:07:50 helvi kernel: [485283.705732]  [<ffffffffa0ddd433>] zio_execute+0x103/0x320 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705741]  [<ffffffffa03b1ff9>] taskq_thread+0x299/0x590 [spl]
Dec 28 09:07:50 helvi kernel: [485283.705744]  [<ffffffff810c9fb0>] ? wake_up_q+0x70/0x70
Dec 28 09:07:50 helvi kernel: [485283.705749]  [<ffffffffa03b1d60>] ? taskq_thread_should_stop+0x80/0x80 [spl]
Dec 28 09:07:50 helvi kernel: [485283.705752]  [<ffffffff810c0b98>] kthread+0xe8/0x100
Dec 28 09:07:50 helvi kernel: [485283.705762]  [<ffffffff816ad30f>] ret_from_fork+0x1f/0x40
Dec 28 09:07:50 helvi kernel: [485283.705770]  [<ffffffff810c0ab0>] ? kthread_create_on_node+0x190/0x190
tuomari commented 7 years ago

I seem to hit this every few days at the moment. These have all occured while resilvering the pool.

[   30.084573] SPL: Loaded module v0.7.0-rc2_8_g8e99d66 (DEBUG mode)
[   31.758485] ZFS: Loaded module v0.7.0-rc2_99_g3500a14 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Dec 12 21:42:20 2016
    16.2T scanned out of 29.5T at 4.49M/s, (scan is slow, no estimated time)
    1.10T resilvered, 54.89% done
config:

    NAME                                                       STATE     READ WRITE CKSUM
    tank                                                       DEGRADED     0     0     0
      raidz1-0                                                 DEGRADED     0     0     0
        ata-ST3000DM001-9YN166_W1F3AEZG                        ONLINE       0     0     0
        ata-ST3000DM001-1CH166_Z1F38MTB-part4                  ONLINE       0     0     0
        ata-ST3000DM001-1ER166_Z5009QEX                        ONLINE       0     0     0
        ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N0HRLESN-part4         ONLINE       0     0     0
        ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N2SED25L-part4         ONLINE       0     0     0
        spare-5                                                DEGRADED     0     0     0
          9714062319235286728                                  UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST3000DM001-1CH166_Z1F4CKB1-part1
          ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3ELL01X             ONLINE       0     0     0  (resilvering)
      raidz1-1                                                 ONLINE       0     0     0
        ata-ST3000DM001-1CH166_Z1F4REQ1-part4                  ONLINE       0     0     0
        ata-ST3000DM001-1CH166_Z1F4S6CX-part4                  ONLINE       0     0     0
        ata-ST3000DM001-1CH166_Z1F3XC4M-part4                  ONLINE       0     0     0
        ata-ST3000DM001-1CH166_Z1F4S822-part4                  ONLINE       0     0     0
        ata-ST3000DM001-1CH166_Z1F4S6D9-part4                  ONLINE       0     0     0
        ata-ST3000DM001-1CH166_Z1F4S7Z6-part4                  ONLINE       0     0     0
    logs
      mirror-2                                                 ONLINE       0     0     0
        ata-KINGSTON_SH103S3120G_50026B7334000234-part4        ONLINE       0     0     0
        ata-Kingston_SSDNow_V_Series_128GB_06P970009771-part4  ONLINE       0     0     0
    cache
      ata-KINGSTON_SH103S3120G_50026B7334000234-part5          ONLINE       0     0     0
      ata-Kingston_SSDNow_V_Series_128GB_06P970009771-part5    ONLINE       0     0     0
    spares
      ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3ELL01X                 INUSE     currently in use

errors: No known data errors
[196471.609690] VERIFY(zio->io_children[c][w] == 0) failed
[196471.609742] PANIC at zio.c:3719:zio_done()
[196471.609801] Showing stack for process 20437
[196471.609805] CPU: 12 PID: 20437 Comm: zmaudit.pl Tainted: P           O    4.7.6.remod.kvm.ovs.1 #2
[196471.609807] Hardware name: System manufacturer System Product Name/Z8NA-D6(C), BIOS 1303    05/10/2012
[196471.609809]  0000000000000000 ffff8801d8ceb8f8 ffffffff8131012b ffffffffa0ecd054
[196471.609812]  0000000000000e87 ffff8801d8ceb908 ffffffffa01f2b1f ffff8801d8ceba88
[196471.609815]  ffffffffa01f2bd7 ffff88083fd16940 ffff880100000028 ffff8801d8ceba98
[196471.609817] Call Trace:
[196471.609826]  [<ffffffff8131012b>] dump_stack+0x4d/0x72
[196471.609841]  [<ffffffffa01f2b1f>] spl_dumpstack+0x3f/0x50 [spl]
[196471.609852]  [<ffffffffa01f2bd7>] spl_panic+0xa7/0xd0 [spl]
[196471.609860]  [<ffffffff816aceb9>] ? _raw_spin_unlock_irq+0x9/0x10
[196471.609867]  [<ffffffff810c622a>] ? finish_task_switch+0x7a/0x200
[196471.609873]  [<ffffffff816a82f6>] ? __schedule+0x1d6/0x680
[196471.609880]  [<ffffffff816aaef9>] ? __mutex_unlock_slowpath+0xa9/0x140
[196471.609882]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[196471.609966]  [<ffffffffa0dea983>] zio_done+0xbd3/0x1920 [zfs]
[196471.609974]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[196471.610030]  [<ffffffffa0deb903>] ? zio_ready+0x233/0x5e0 [zfs]
[196471.610034]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[196471.610084]  [<ffffffffa0de6dbe>] zio_nowait+0x10e/0x310 [zfs]
[196471.610138]  [<ffffffffa0d0bee4>] dbuf_prefetch+0x4d4/0x610 [zfs]
[196471.610199]  [<ffffffffa0d1754c>] dmu_prefetch+0x14c/0x240 [zfs]
[196471.610285]  [<ffffffffa0dc9817>] zfs_readdir+0x2f7/0x400 [zfs]
[196471.610293]  [<ffffffff811b0cf2>] ? alloc_pages_vma+0xc2/0x260
[196471.610299]  [<ffffffff811de6bf>] ? terminate_walk+0x5f/0xd0
[196471.610307]  [<ffffffff811cd6e7>] ? mem_cgroup_commit_charge+0x67/0xa0
[196471.610310]  [<ffffffff811a1a0e>] ? page_add_new_anon_rmap+0x9e/0xd0
[196471.610313]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[196471.610319]  [<ffffffff81196e3f>] ? handle_mm_fault+0xb8f/0x1a50
[196471.610388]  [<ffffffffa0df0de7>] zpl_iterate+0x67/0x150 [zfs]
[196471.610393]  [<ffffffff811e8176>] iterate_dir+0x176/0x190
[196471.610395]  [<ffffffff811e85a4>] SyS_getdents+0x84/0xf0
[196471.610397]  [<ffffffff811e8190>] ? iterate_dir+0x190/0x190
[196471.610400]  [<ffffffff816ad11b>] entry_SYSCALL_64_fastpath+0x13/0x8f
tuomari commented 7 years ago

And again, after three days. Same versions as in the previous stack. Also resilver is still running.

[224731.486285] VERIFY(zio->io_children[c][w] == 0) failed
[224731.486312] PANIC at zio.c:3719:zio_done()
[224731.486328] Showing stack for process 1358
[224731.486333] CPU: 13 PID: 1358 Comm: z_wr_iss Tainted: P           O    4.7.6.remod.kvm.ovs.1 #2
[224731.486335] Hardware name: System manufacturer System Product Name/Z8NA-D6(C), BIOS 1303    05/10/2012
[224731.486338]  0000000000000000 ffff880825f3b6f8 ffffffff8131012b ffffffffa2106054
[224731.486343]  0000000000000e87 ffff880825f3b708 ffffffffa0292b1f ffff880825f3b888
[224731.486346]  ffffffffa0292bd7 ffff88083fd56940 ffff880800000028 ffff880825f3b898
[224731.486350] Call Trace:
[224731.486362]  [<ffffffff8131012b>] dump_stack+0x4d/0x72
[224731.486381]  [<ffffffffa0292b1f>] spl_dumpstack+0x3f/0x50 [spl]
[224731.486390]  [<ffffffffa0292bd7>] spl_panic+0xa7/0xd0 [spl]
[224731.486397]  [<ffffffff816aceb9>] ? _raw_spin_unlock_irq+0x9/0x10
[224731.486403]  [<ffffffff810c622a>] ? finish_task_switch+0x7a/0x200
[224731.486407]  [<ffffffff816a82f6>] ? __schedule+0x1d6/0x680
[224731.486505]  [<ffffffffa201d52f>] ? zio_create+0x44f/0x6d0 [zfs]
[224731.486510]  [<ffffffff816aaef9>] ? __mutex_unlock_slowpath+0xa9/0x140
[224731.486513]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[224731.486581]  [<ffffffffa2023983>] zio_done+0xbd3/0x1920 [zfs]
[224731.486586]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[224731.486652]  [<ffffffffa2024903>] ? zio_ready+0x233/0x5e0 [zfs]
[224731.486657]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[224731.486714]  [<ffffffffa201fdbe>] zio_nowait+0x10e/0x310 [zfs]
[224731.486753]  [<ffffffffa1f44ee4>] dbuf_prefetch+0x4d4/0x610 [zfs]
[224731.486790]  [<ffffffffa1f5061c>] dmu_prefetch+0x21c/0x240 [zfs]
[224731.486847]  [<ffffffffa1fba424>] space_map_load+0xf4/0x6d0 [zfs]
[224731.486851]  [<ffffffff816aaef9>] ? __mutex_unlock_slowpath+0xa9/0x140
[224731.486894]  [<ffffffffa1f952d3>] metaslab_load+0x63/0x1a0 [zfs]
[224731.486937]  [<ffffffffa1f956bd>] metaslab_activate+0xcd/0x150 [zfs]
[224731.486980]  [<ffffffffa1f97e02>] metaslab_alloc+0x7c2/0x12d0 [zfs]
[224731.487030]  [<ffffffffa201a193>] ? zio_execute+0x103/0x320 [zfs]
[224731.487081]  [<ffffffffa2020270>] zio_dva_allocate+0x170/0x940 [zfs]
[224731.487083]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[224731.487091]  [<ffffffffa0296460>] ? tsd_hash_search.isra.0+0x70/0x90 [spl]
[224731.487097]  [<ffffffffa029654d>] ? tsd_get_by_thread+0x3d/0x90 [spl]
[224731.487103]  [<ffffffffa028eb03>] ? taskq_member+0x13/0x20 [spl]
[224731.487153]  [<ffffffffa201a193>] zio_execute+0x103/0x320 [zfs]
[224731.487159]  [<ffffffffa028fff9>] taskq_thread+0x299/0x590 [spl]
[224731.487162]  [<ffffffff810c9fb0>] ? wake_up_q+0x70/0x70
[224731.487167]  [<ffffffffa028fd60>] ? taskq_thread_should_stop+0x80/0x80 [spl]
[224731.487170]  [<ffffffff810c0b98>] kthread+0xe8/0x100
[224731.487173]  [<ffffffff816ad30f>] ret_from_fork+0x1f/0x40
[224731.487175]  [<ffffffff810c0ab0>] ? kthread_create_on_node+0x190/0x190
dweeezil commented 7 years ago

Referencing #5226 due to the likelihood this involves gang blocks in some way. @tuomari The stacks involving metaslabs don't quite make sense. Are you running something newer than ce4197c now? Could you run gdb on your kernel module and list *(metaslab_alloc+0x7c2).

tuomari commented 7 years ago

It seems that although I did have zfs built with --enable-debug and kernel was compiled DEBUG_KERNEL, zfs.ko did not have debug symbols when compiled with make deb and installed from those packages.

I have now installed the module with make install and confirmed that debugging symbols exist. I will report back if I hit this again.

tuomari commented 7 years ago

@dweeezil I forgot to answer your question about the version. I was running SPL zfsonlinux/spl@8e99d66 and ZFS 3500a14

Also, if the gdb output for the recompiled module works correctly, list *(metaslab_alloc+0x7c2) gives the following:

(gdb) list *(metaslab_alloc+0x7c2)
0x72e82 is in metaslab_alloc (/usr/src2/zfs/module/zfs/metaslab.c:2295).
2290                    msp->ms_weight & ~METASLAB_ACTIVE_MASK);
2291                mutex_exit(&msp->ms_lock);
2292                continue;
2293            }
2294    
2295            if (metaslab_activate(msp, activation_weight) != 0) {
2296                mutex_exit(&msp->ms_lock);
2297                continue;
2298            }
2299    
(gdb) 
tuomari commented 7 years ago

It seems that I'm hitting this less frequently after the pool finished resilvering a disk.

This time ZFS version is 42b64e5 and SPL version zfsonlinux/spl@8e99d66

[474171.260844] VERIFY(zio->io_children[c][w] == 0) failed
[474171.260942] PANIC at zio.c:3719:zio_done()
[474171.261017] Showing stack for process 29747
[474171.261021] CPU: 6 PID: 29747 Comm: zfs Tainted: P           O    4.9.1.remod.kvm.ovs.1 #1
[474171.261023] Hardware name: System manufacturer System Product Name/Z8NA-D6(C), BIOS 1303    05/10/2012
[474171.261025]  ffffc90040727540 ffffffff813262fb ffffffffa0e214f1 0000000000000e87
[474171.261030]  ffffc90040727550 ffffffffa0275b2f ffffc900407276d0 ffffffffa0275be7
[474171.261033]  ffff88083fc98038 ffffc90000000028 ffffc900407276e0 ffffc90040727680
[474171.261036] Call Trace:
[474171.261045]  [<ffffffff813262fb>] dump_stack+0x4d/0x72
[474171.261065]  [<ffffffffa0275b2f>] spl_dumpstack+0x3f/0x50 [spl]
[474171.261074]  [<ffffffffa0275be7>] spl_panic+0xa7/0xd0 [spl]
[474171.261079]  [<ffffffff816d5829>] ? _raw_spin_unlock_irq+0x9/0x10
[474171.261086]  [<ffffffff810cd1fa>] ? finish_task_switch+0x7a/0x210
[474171.261090]  [<ffffffff816d030b>] ? __schedule+0x17b/0x610
[474171.261094]  [<ffffffff816d2eb9>] ? __mutex_unlock_slowpath+0xa9/0x140
[474171.261096]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[474171.261213]  [<ffffffffa0d42a23>] zio_done+0xbd3/0x1920 [zfs]
[474171.261216]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[474171.261283]  [<ffffffffa0d439a3>] ? zio_ready+0x233/0x5e0 [zfs]
[474171.261286]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[474171.261348]  [<ffffffffa0d3ee5e>] zio_nowait+0x10e/0x310 [zfs]
[474171.261398]  [<ffffffffa0c63ef6>] dbuf_prefetch+0x4d6/0x610 [zfs]
[474171.261451]  [<ffffffffa0c6f630>] dmu_prefetch+0x220/0x240 [zfs]
[474171.261506]  [<ffffffffa0c7d622>] receive_read_prefetch+0x162/0x1c0 [zfs]
[474171.261563]  [<ffffffffa0c808f4>] dmu_recv_stream+0xb24/0xff0 [zfs]
[474171.261625]  [<ffffffffa0d13a4c>] zfs_ioc_recv_impl+0x11c/0xd50 [zfs]
[474171.261689]  [<ffffffffa0d147fd>] zfs_ioc_recv+0x17d/0x220 [zfs]
[474171.261725]  [<ffffffffa0c61d26>] ? dbuf_rele+0x36/0x70 [zfs]
[474171.261762]  [<ffffffffa0c899cc>] ? dnode_rele_and_unlock+0x7c/0xc0 [zfs]
[474171.261809]  [<ffffffffa0d15bee>] zfsdev_ioctl+0x60e/0x720 [zfs]
[474171.261815]  [<ffffffff811a6b92>] ? handle_mm_fault+0x342/0xfc0
[474171.261819]  [<ffffffff811fad0b>] do_vfs_ioctl+0x8b/0x5b0
[474171.261822]  [<ffffffff810996d0>] ? __do_page_fault+0x220/0x430
[474171.261823]  [<ffffffff811fb2a4>] SyS_ioctl+0x74/0x80
[474171.261826]  [<ffffffff816d5aa0>] entry_SYSCALL_64_fastpath+0x13/0x94
tuomari commented 7 years ago

This time a little different path, and about the same 5 day uptime as with the previous panics

ZFS version 0.7.0-rc3_2_gbb7ffda
SPL version 0.7.0-rc3
[511461.454420] VERIFY(zio->io_children[c][w] == 0) failed
[511461.454494] PANIC at zio.c:3729:zio_done()
[511461.454553] Showing stack for process 1400
[511461.454557] CPU: 4 PID: 1400 Comm: z_wr_iss Tainted: P           O    4.9.1.remod.kvm.ovs.1 #1
[511461.454559] Hardware name: System manufacturer System Product Name/Z8NA-D6(C), BIOS 1303    05/10/2012
[511461.454561]  ffffc90005a23720 ffffffff813262fb ffffffffa0e5ce41 0000000000000e91
[511461.454566]  ffffc90005a23730 ffffffffa01c6b2f ffffc90005a238b0 ffffffffa01c6be7
[511461.454569]  0001d12be649372e 0000000000000028 ffffc90005a238c0 ffffc90005a23860
[511461.454572] Call Trace:
[511461.454582]  [<ffffffff813262fb>] dump_stack+0x4d/0x72
[511461.454598]  [<ffffffffa01c6b2f>] spl_dumpstack+0x3f/0x50 [spl]
[511461.454606]  [<ffffffffa01c6be7>] spl_panic+0xa7/0xd0 [spl]
[511461.454611]  [<ffffffff816d5829>] ? _raw_spin_unlock_irq+0x9/0x10
[511461.454618]  [<ffffffff810cd1fa>] ? finish_task_switch+0x7a/0x210
[511461.454622]  [<ffffffff816d030b>] ? __schedule+0x17b/0x610
[511461.454625]  [<ffffffff816d2eb9>] ? __mutex_unlock_slowpath+0xa9/0x140
[511461.454627]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[511461.454777]  [<ffffffffa0d7db75>] zio_done+0xbc5/0x1910 [zfs]
[511461.454780]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[511461.454848]  [<ffffffffa0d7eaf1>] ? zio_ready+0x231/0x5e0 [zfs]
[511461.454850]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[511461.454918]  [<ffffffffa0d79fae>] zio_nowait+0x10e/0x310 [zfs]
[511461.454975]  [<ffffffffa0c9def6>] dbuf_prefetch+0x4d6/0x610 [zfs]
[511461.455033]  [<ffffffffa0ca9630>] dmu_prefetch+0x220/0x240 [zfs]
[511461.455100]  [<ffffffffa0d14384>] space_map_load+0xf4/0x6d0 [zfs]
[511461.455103]  [<ffffffff816d2eb9>] ? __mutex_unlock_slowpath+0xa9/0x140
[511461.455167]  [<ffffffffa0cee717>] metaslab_load+0x67/0x1e0 [zfs]
[511461.455231]  [<ffffffffa0ceeb4a>] metaslab_activate+0xda/0x160 [zfs]
[511461.455296]  [<ffffffffa0cf18fe>] metaslab_alloc+0x82e/0x15a0 [zfs]
[511461.455302]  [<ffffffff810dba62>] ? enqueue_task_fair+0xb62/0x1260
[511461.455368]  [<ffffffffa0d7a46a>] zio_dva_allocate+0x17a/0x930 [zfs]
[511461.455371]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[511461.455379]  [<ffffffffa01ca3b0>] ? tsd_hash_search.isra.0+0x70/0x90 [spl]
[511461.455387]  [<ffffffffa01ca49d>] ? tsd_get_by_thread+0x3d/0x90 [spl]
[511461.455393]  [<ffffffffa01c2b33>] ? taskq_member+0x13/0x20 [spl]
[511461.455461]  [<ffffffffa0d74363>] zio_execute+0x103/0x320 [zfs]
[511461.455469]  [<ffffffffa01c402b>] taskq_thread+0x29b/0x590 [spl]
[511461.455472]  [<ffffffff810d1170>] ? wake_up_q+0x70/0x70
[511461.455479]  [<ffffffffa01c3d90>] ? taskq_thread_should_stop+0x80/0x80 [spl]
[511461.455482]  [<ffffffff810c7079>] kthread+0xe9/0x100
[511461.455484]  [<ffffffff810c6f90>] ? kthread_park+0x60/0x60
[511461.455487]  [<ffffffff816d5cd2>] ret_from_fork+0x22/0x30
tuomari commented 7 years ago

Same version as before.

[Thu Feb  2 17:43:22 2017] VERIFY(zio->io_children[c][w] == 0) failed
[Thu Feb  2 17:43:22 2017] PANIC at zio.c:3729:zio_done()
[Thu Feb  2 17:43:22 2017] Showing stack for process 1423
[Thu Feb  2 17:43:22 2017] CPU: 6 PID: 1423 Comm: z_wr_iss Tainted: P           O    4.9.1.remod.kvm.ovs.1 #1
[Thu Feb  2 17:43:22 2017] Hardware name: System manufacturer System Product Name/Z8NA-D6(C), BIOS 1303    05/10/2012
[Thu Feb  2 17:43:22 2017]  ffffc900062ab720 ffffffff813262fb ffffffffa1306e41 0000000000000e91
[Thu Feb  2 17:43:22 2017]  ffffc900062ab730 ffffffffa0346b2f ffffc900062ab8b0 ffffffffa0346be7
[Thu Feb  2 17:43:22 2017]  ffff88083fc98001 001da3a900000028 ffffc900062ab8c0 ffffc900062ab860
[Thu Feb  2 17:43:22 2017] Call Trace:
[Thu Feb  2 17:43:22 2017]  [<ffffffff813262fb>] dump_stack+0x4d/0x72
[Thu Feb  2 17:43:22 2017]  [<ffffffffa0346b2f>] spl_dumpstack+0x3f/0x50 [spl]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa0346be7>] spl_panic+0xa7/0xd0 [spl]
[Thu Feb  2 17:43:22 2017]  [<ffffffff816d5829>] ? _raw_spin_unlock_irq+0x9/0x10
[Thu Feb  2 17:43:22 2017]  [<ffffffff810cd1fa>] ? finish_task_switch+0x7a/0x210
[Thu Feb  2 17:43:22 2017]  [<ffffffff816d030b>] ? __schedule+0x17b/0x610
[Thu Feb  2 17:43:22 2017]  [<ffffffff816d2eb1>] ? __mutex_unlock_slowpath+0xa1/0x140
[Thu Feb  2 17:43:22 2017]  [<ffffffff816d2eb9>] ? __mutex_unlock_slowpath+0xa9/0x140
[Thu Feb  2 17:43:22 2017]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[Thu Feb  2 17:43:22 2017]  [<ffffffffa1227b75>] zio_done+0xbc5/0x1910 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[Thu Feb  2 17:43:22 2017]  [<ffffffffa1228af1>] ? zio_ready+0x231/0x5e0 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[Thu Feb  2 17:43:22 2017]  [<ffffffffa1223fae>] zio_nowait+0x10e/0x310 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa1147ef6>] dbuf_prefetch+0x4d6/0x610 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa1153630>] dmu_prefetch+0x220/0x240 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa11be384>] space_map_load+0xf4/0x6d0 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffff811cb36c>] ? __kmalloc_node+0xbc/0x2c0
[Thu Feb  2 17:43:22 2017]  [<ffffffffa1198717>] metaslab_load+0x67/0x1e0 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa1198b4a>] metaslab_activate+0xda/0x160 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa119b8fe>] metaslab_alloc+0x82e/0x15a0 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa122446a>] zio_dva_allocate+0x17a/0x930 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[Thu Feb  2 17:43:22 2017]  [<ffffffffa034a3b0>] ? tsd_hash_search.isra.0+0x70/0x90 [spl]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa034a49d>] ? tsd_get_by_thread+0x3d/0x90 [spl]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa0342b33>] ? taskq_member+0x13/0x20 [spl]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa121e363>] zio_execute+0x103/0x320 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa034402b>] taskq_thread+0x29b/0x590 [spl]
[Thu Feb  2 17:43:22 2017]  [<ffffffff810d1170>] ? wake_up_q+0x70/0x70
[Thu Feb  2 17:43:22 2017]  [<ffffffffa0343d90>] ? taskq_thread_should_stop+0x80/0x80 [spl]
[Thu Feb  2 17:43:22 2017]  [<ffffffff810c7079>] kthread+0xe9/0x100
[Thu Feb  2 17:43:22 2017]  [<ffffffff810c6f90>] ? kthread_park+0x60/0x60
[Thu Feb  2 17:43:22 2017]  [<ffffffff816d5cd2>] ret_from_fork+0x22/0x30
tuomari commented 7 years ago

Have not hit this in a while. Closing as this is most likely fixed...