openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

PANIC: blkptr at ffff88080812d048 DVA 1 has invalid VDEV 1 #4582

Closed JuliaVixen closed 3 years ago

JuliaVixen commented 8 years ago

So, I found an old hard drive from around 2009, which was apparently half of a mirror; I plugged it in and tried to import the pool, then got a kernel panic...

localhost ~ # zpool import
  pool: backup
    id: 3472166890449163768
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://zfsonlinux.org/msg/ZFS-8000-6X
config:
         backup      UNAVAIL  missing device
           mirror-0  DEGRADED
             ad16    UNAVAIL
             sdg     ONLINE
Additional devices are known to be part of this pool, though their
exact configuration cannot be determined.

localhost ~ # zpool import -o readonly=on backup
PANIC: blkptr at ffff88080812d048 DVA 1 has invalid VDEV 1

...And then the zpool process blocks on I/O forever. (In the D state)

[617879.924414] PANIC: blkptr at ffff88080812d048 DVA 1 has invalid VDEV 1
[617879.924533] Showing stack for process 27088
[617879.924535] CPU: 1 PID: 27088 Comm: zpool Tainted: P           O    4.4.6-gentoo #1
[617879.924535] Hardware name: Supermicro X10SLL-F/X10SLL-SF/X10SLL-F/X10SLL-SF, BIOS 1.0a 06/11/2013
[617879.924536]  0000000000000000 ffff8806c67336e8 ffffffff8130feed 0000000000000003
[617879.924538]  0000000000000001 ffff8806c67336f8 ffffffffa09d2aa5 ffff8806c6733828
[617879.924539]  ffffffffa09d2b3d ffff88082c224000 ffff88082c220000 61207274706b6c62
[617879.924541] Call Trace:
[617879.924545]  [<ffffffff8130feed>] dump_stack+0x4d/0x64
[617879.924550]  [<ffffffffa09d2aa5>] spl_dumpstack+0x3d/0x3f [spl]
[617879.924552]  [<ffffffffa09d2b3d>] vcmn_err+0x96/0xd3 [spl]
[617879.924555]  [<ffffffff814cf75d>] ? schedule+0x72/0x81
[617879.924556]  [<ffffffff814d2048>] ? schedule_timeout+0x24/0x15b
[617879.924559]  [<ffffffff8105b4a0>] ? __sched_setscheduler+0x56a/0x73f
[617879.924562]  [<ffffffff810e9ac7>] ? cache_alloc_refill+0x69/0x4a3
[617879.924576]  [<ffffffffa0aa8634>] zfs_panic_recover+0x4d/0x4f [zfs]
[617879.924581]  [<ffffffffa0a58897>] ? arc_space_return+0x13cd/0x249a [zfs]
[617879.924588]  [<ffffffffa0ae8de2>] zfs_blkptr_verify+0x1c2/0x2dd [zfs]
[617879.924595]  [<ffffffffa0ae8f31>] zio_read+0x34/0x147 [zfs]
[617879.924600]  [<ffffffffa0a58897>] ? arc_space_return+0x13cd/0x249a [zfs]
[617879.924604]  [<ffffffffa0a5a6da>] arc_read+0x7f7/0x83d [zfs]
[617879.924606]  [<ffffffff814cf51c>] ? __schedule+0x5fe/0x744
[617879.924613]  [<ffffffffa0a6add8>] dmu_objset_open_impl+0xf0/0x65d [zfs]
[617879.924615]  [<ffffffff810685cc>] ? add_wait_queue+0x44/0x44
[617879.924624]  [<ffffffffa0a86930>] dsl_pool_init+0x2d/0x50 [zfs]
[617879.924635]  [<ffffffffa0a9fb8b>] spa_vdev_remove+0xb5d/0x219f [zfs]
[617879.924636]  [<ffffffff810685cc>] ? add_wait_queue+0x44/0x44
[617879.924639]  [<ffffffffa09e47d2>] ? nvlist_remove_nvpair+0x2a8/0x314 [znvpair]
[617879.924641]  [<ffffffffa09f8e40>] ? zpool_get_rewind_policy+0x116/0x13c [zcommon]
[617879.924652]  [<ffffffffa0aa0cc9>] spa_vdev_remove+0x1c9b/0x219f [zfs]
[617879.924653]  [<ffffffffa09f8db2>] ? zpool_get_rewind_policy+0x88/0x13c [zcommon]
[617879.924664]  [<ffffffffa0aa1821>] spa_import+0x1d1/0x684 [zfs]
[617879.924666]  [<ffffffffa09e4f53>] ? nvlist_remove_all+0x42/0x457 [znvpair]
[617879.924674]  [<ffffffffa0acad17>] zfs_secpolicy_smb_acl+0x2aa4/0x4fce [zfs]
[617879.924681]  [<ffffffffa0acfdfa>] pool_status_check+0x6b1/0x7a1 [zfs]
[617879.924689]  [<ffffffffa0acfacf>] pool_status_check+0x386/0x7a1 [zfs]
[617879.924690]  [<ffffffff810dbfec>] ? do_brk+0x227/0x250
[617879.924693]  [<ffffffff810fca17>] do_vfs_ioctl+0x3f5/0x43d
[617879.924694]  [<ffffffff810368f2>] ? __do_page_fault+0x24e/0x367
[617879.924696]  [<ffffffff810fca98>] SyS_ioctl+0x39/0x61
[617879.924697]  [<ffffffff814d2b57>] entry_SYSCALL_64_fastpath+0x12/0x6a
dweeezil commented 8 years ago

@JuliaVixen According to this error you saw during the zpool import:

Additional devices are known to be part of this pool, though their exact configuration cannot be determined.

the pool must have had another top-level vdev (i.e. a stripe). The import, even with readonly=on, should not have gotten as far as it did; instead, it should have produced the exact same error. Since you're running Gentoo, my guess is that this might be an issue with the new API.

FYI, the pool configuration displayed is not the complete pool configuration, which is stored only in an object within the pool. Instead, it's displaying only the parts of the pool configuration which were actually discovered. The error quoted above does not refer to the missing mirror leg; it refers to a missing top-level vdev.
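
If the drive is still attached, something like this (just a sketch) will show whether the label records a top-level vdev count at all, along with the top-level guid and txg:

zdb -l /dev/sdg | grep -E 'vdev_children|top_guid|txg:'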

JuliaVixen commented 8 years ago

I plugged the drive back in to get the vdev labels... It kinda looks like there are only supposed to be two drives, if I understand this correctly.

localhost ~ # zdb -lu /dev/sdm
--------------------------------------------
LABEL 0
--------------------------------------------
    version: 13
    name: 'backup'
    state: 1
    txg: 16
    pool_guid: 3472166890449163768
    hostid: 2558674546
    hostname: 'vulpis'
    top_guid: 7962344807192492976
    guid: 15917724193338767979
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 7962344807192492976
        metaslab_array: 23
        metaslab_shift: 31
        ashift: 9
        asize: 400083648512
        is_log: 0
        children[0]:
            type: 'disk'
            id: 0
            guid: 18134448193074441749
            path: '/dev/ad16'
            whole_disk: 0
        children[1]:
            type: 'disk'
            id: 1
            guid: 15917724193338767979
            path: '/dev/ada4'
            whole_disk: 0
Uberblock[4]
    magic = 0000000000bab10c
    version = 13
    txg = 4
    guid_sum = 8593195936635763240
    timestamp = 1262001087 UTC = Mon Dec 28 11:51:27 2009
Uberblock[5]
    magic = 0000000000bab10c
    version = 13
    txg = 5
    guid_sum = 8593195936635763240
    timestamp = 1262001087 UTC = Mon Dec 28 11:51:27 2009
Uberblock[7]
    magic = 0000000000bab10c
    version = 13
    txg = 7
    guid_sum = 5830045293826664715
    timestamp = 1262001131 UTC = Mon Dec 28 11:52:11 2009
Uberblock[8]
    magic = 0000000000bab10c
    version = 13
    txg = 8
    guid_sum = 5830045293826664715
    timestamp = 1262001131 UTC = Mon Dec 28 11:52:11 2009
Uberblock[9]
    magic = 0000000000bab10c
    version = 13
    txg = 9
    guid_sum = 5830045293826664715
    timestamp = 1262001131 UTC = Mon Dec 28 11:52:11 2009
Uberblock[10]
    magic = 0000000000bab10c
    version = 13
    txg = 10
    guid_sum = 5830045293826664715
    timestamp = 1262001132 UTC = Mon Dec 28 11:52:12 2009
Uberblock[14]
    magic = 0000000000bab10c
    version = 13
    txg = 14
    guid_sum = 5830045293826664715
    timestamp = 1262001249 UTC = Mon Dec 28 11:54:09 2009
Uberblock[16]
    magic = 0000000000bab10c
    version = 13
    txg = 16
    guid_sum = 5830045293826664715
    timestamp = 1262001249 UTC = Mon Dec 28 11:54:09 2009
--------------------------------------------
LABEL 1
--------------------------------------------
    version: 13
    name: 'backup'
    state: 1
    txg: 16
    pool_guid: 3472166890449163768
    hostid: 2558674546
    hostname: 'vulpis'
    top_guid: 7962344807192492976
    guid: 15917724193338767979
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 7962344807192492976
        metaslab_array: 23
        metaslab_shift: 31
        ashift: 9
        asize: 400083648512
        is_log: 0
        children[0]:
            type: 'disk'
            id: 0
            guid: 18134448193074441749
            path: '/dev/ad16'
            whole_disk: 0
        children[1]:
            type: 'disk'
            id: 1
            guid: 15917724193338767979
            path: '/dev/ada4'
            whole_disk: 0
Uberblock[4]
    magic = 0000000000bab10c
    version = 13
    txg = 4
    guid_sum = 8593195936635763240
    timestamp = 1262001087 UTC = Mon Dec 28 11:51:27 2009
Uberblock[5]
    magic = 0000000000bab10c
    version = 13
    txg = 5
    guid_sum = 8593195936635763240
    timestamp = 1262001087 UTC = Mon Dec 28 11:51:27 2009
Uberblock[7]
    magic = 0000000000bab10c
    version = 13
    txg = 7
    guid_sum = 5830045293826664715
    timestamp = 1262001131 UTC = Mon Dec 28 11:52:11 2009
Uberblock[8]
    magic = 0000000000bab10c
    version = 13
    txg = 8
    guid_sum = 5830045293826664715
    timestamp = 1262001131 UTC = Mon Dec 28 11:52:11 2009
Uberblock[9]
    magic = 0000000000bab10c
    version = 13
    txg = 9
    guid_sum = 5830045293826664715
    timestamp = 1262001131 UTC = Mon Dec 28 11:52:11 2009
Uberblock[10]
    magic = 0000000000bab10c
    version = 13
    txg = 10
    guid_sum = 5830045293826664715
    timestamp = 1262001132 UTC = Mon Dec 28 11:52:12 2009
Uberblock[14]
    magic = 0000000000bab10c
    version = 13
    txg = 14
    guid_sum = 5830045293826664715
    timestamp = 1262001249 UTC = Mon Dec 28 11:54:09 2009
Uberblock[16]
    magic = 0000000000bab10c
    version = 13
    txg = 16
    guid_sum = 5830045293826664715
    timestamp = 1262001249 UTC = Mon Dec 28 11:54:09 2009
--------------------------------------------
LABEL 2
--------------------------------------------
    version: 13
    name: 'backup'
    state: 1
    txg: 16
    pool_guid: 3472166890449163768
    hostid: 2558674546
    hostname: 'vulpis'
    top_guid: 7962344807192492976
    guid: 15917724193338767979
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 7962344807192492976
        metaslab_array: 23
        metaslab_shift: 31
        ashift: 9
        asize: 400083648512
        is_log: 0
        children[0]:
            type: 'disk'
            id: 0
            guid: 18134448193074441749
            path: '/dev/ad16'
            whole_disk: 0
        children[1]:
            type: 'disk'
            id: 1
            guid: 15917724193338767979
            path: '/dev/ada4'
            whole_disk: 0
Uberblock[4]
    magic = 0000000000bab10c
    version = 13
    txg = 4
    guid_sum = 8593195936635763240
    timestamp = 1262001087 UTC = Mon Dec 28 11:51:27 2009
Uberblock[5]
    magic = 0000000000bab10c
    version = 13
    txg = 5
    guid_sum = 8593195936635763240
    timestamp = 1262001087 UTC = Mon Dec 28 11:51:27 2009
Uberblock[7]
    magic = 0000000000bab10c
    version = 13
    txg = 7
    guid_sum = 5830045293826664715
    timestamp = 1262001131 UTC = Mon Dec 28 11:52:11 2009
Uberblock[8]
    magic = 0000000000bab10c
    version = 13
    txg = 8
    guid_sum = 5830045293826664715
    timestamp = 1262001131 UTC = Mon Dec 28 11:52:11 2009
Uberblock[9]
    magic = 0000000000bab10c
    version = 13
    txg = 9
    guid_sum = 5830045293826664715
    timestamp = 1262001131 UTC = Mon Dec 28 11:52:11 2009
Uberblock[10]
    magic = 0000000000bab10c
    version = 13
    txg = 10
    guid_sum = 5830045293826664715
    timestamp = 1262001132 UTC = Mon Dec 28 11:52:12 2009
Uberblock[14]
    magic = 0000000000bab10c
    version = 13
    txg = 14
    guid_sum = 5830045293826664715
    timestamp = 1262001249 UTC = Mon Dec 28 11:54:09 2009
Uberblock[16]
    magic = 0000000000bab10c
    version = 13
    txg = 16
    guid_sum = 5830045293826664715
    timestamp = 1262001249 UTC = Mon Dec 28 11:54:09 2009
--------------------------------------------
LABEL 3
--------------------------------------------
    version: 13
    name: 'backup'
    state: 1
    txg: 16
    pool_guid: 3472166890449163768
    hostid: 2558674546
    hostname: 'vulpis'
    top_guid: 7962344807192492976
    guid: 15917724193338767979
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 7962344807192492976
        metaslab_array: 23
        metaslab_shift: 31
        ashift: 9
        asize: 400083648512
        is_log: 0
        children[0]:
            type: 'disk'
            id: 0
            guid: 18134448193074441749
            path: '/dev/ad16'
            whole_disk: 0
        children[1]:
            type: 'disk'
            id: 1
            guid: 15917724193338767979
            path: '/dev/ada4'
            whole_disk: 0
Uberblock[4]
    magic = 0000000000bab10c
    version = 13
    txg = 4
    guid_sum = 8593195936635763240
    timestamp = 1262001087 UTC = Mon Dec 28 11:51:27 2009
Uberblock[5]
    magic = 0000000000bab10c
    version = 13
    txg = 5
    guid_sum = 8593195936635763240
    timestamp = 1262001087 UTC = Mon Dec 28 11:51:27 2009
Uberblock[7]
    magic = 0000000000bab10c
    version = 13
    txg = 7
    guid_sum = 5830045293826664715
    timestamp = 1262001131 UTC = Mon Dec 28 11:52:11 2009
Uberblock[8]
    magic = 0000000000bab10c
    version = 13
    txg = 8
    guid_sum = 5830045293826664715
    timestamp = 1262001131 UTC = Mon Dec 28 11:52:11 2009
Uberblock[9]
    magic = 0000000000bab10c
    version = 13
    txg = 9
    guid_sum = 5830045293826664715
    timestamp = 1262001131 UTC = Mon Dec 28 11:52:11 2009
Uberblock[10]
    magic = 0000000000bab10c
    version = 13
    txg = 10
    guid_sum = 5830045293826664715
    timestamp = 1262001132 UTC = Mon Dec 28 11:52:12 2009
Uberblock[14]
    magic = 0000000000bab10c
    version = 13
    txg = 14
    guid_sum = 5830045293826664715
    timestamp = 1262001249 UTC = Mon Dec 28 11:54:09 2009
Uberblock[16]
    magic = 0000000000bab10c
    version = 13
    txg = 16
    guid_sum = 5830045293826664715
    timestamp = 1262001249 UTC = Mon Dec 28 11:54:09 2009
dweeezil commented 8 years ago

@JuliaVixen Hmm, the label is missing vdev_children, which is only possible for a pool created with a very old version of ZFS. It was added to illumos in illumos/illumos-gate@88ecc94, in the general timeframe of the txgs listed above (3 months earlier). On what OS was this pool created? It doesn't appear that it would contain much of anything; txg 4 is generally the creation txg, and the highest txg shown by your labels is 16. With only 12 txgs ever applied to the pool, it's hard to believe there's much of interest in it, or that it was ever imported on a live system for very long.
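
For what it's worth, the highest uberblock txg can be pulled straight out of the label dump; a quick sketch using the same zdb -lu output as above:

zdb -lu /dev/sdm | awk '/txg =/ { if ($3 > max) max = $3 } END { print "highest uberblock txg:", max }'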

JuliaVixen commented 8 years ago

This pool was created on FreeBSD release, um... I forgot, 7.something I think? (I still have the root fs drive, so I could theoretically check.) In 2009, I probably plugged this drive in (with a second drive too), created a zpool mirror named "backup", backed some stuff up to it, unplugged the drives, and hid them away (apparently in different locations) until I found one of them again in 2016.

I'm pretty sure there's data on it, I just can't get any information about that from zdb or any other tools.

I've probably only imported this once or twice, in 2009, just to copy data to it.

DeHackEd commented 8 years ago

I agree with dweeezil: the pool was only imported for around 2m42s, and there should be another missing disk above and beyond the ad16 listed. There may be data on it, but not much at all.

Allowing the import to proceed with a known missing top-level vdev just to promptly crash is a bug though.

dweeezil commented 8 years ago

Here's another interesting item shown by the vdev label: the guid_sum changed between txg 5 and txg 7, which means the pool configuration was changed early on.

I manually ran the vdev guid sum calculation and it doesn't match the later uberblocks. @JuliaVixen I don't know much about the various versions of ZoL available for Gentoo, but this still sounds like a regression introduced by the stable ABI which I understand is used on some Gentoo systems. The import process shouldn't get this far and, as @DeHackEd pointed out, that's the real bug here. Can you try with a more stock build of ZoL, assuming you are in fact using a version with the new stable API?
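
For reference, my understanding is that guid_sum is just the 64-bit sum of the pool guid and every vdev guid shown in the vdev tree, so the guids from the label you posted can be checked by hand:

echo '(3472166890449163768 + 7962344807192492976 + 18134448193074441749 + 15917724193338767979) % 2^64' | bc

That works out to 8593195936635763240, which matches only the txg 4/5 uberblocks; the guid_sum recorded from txg 7 onward (5830045293826664715) cannot be reproduced from the guids in this label alone, which again points at another top-level vdev having been added between txg 5 and 7.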

drescherjm commented 8 years ago

The zfs-9999, spl-9999, zfs-kmod-9999 ebuilds will have the current master without the stable API patches.

JuliaVixen commented 8 years ago
[Backup 30T of data...]

localhost ~ # echo "sys-kernel/spl ~amd64" >> /etc/portage/package.accept_keywords
localhost ~ # echo "sys-fs/zfs-kmod ~amd64" >> /etc/portage/package.accept_keywords
localhost ~ # echo "sys-fs/zfs ~amd64" >> /etc/portage/package.accept_keywords
localhost ~ # echo "=sys-kernel/spl-9999 **" >> /etc/portage/package.accept_keywords
localhost ~ # echo "=sys-fs/zfs-kmod-9999 **" >> /etc/portage/package.accept_keywords
localhost ~ # echo "=sys-fs/zfs-9999 **" >> /etc/portage/package.accept_keywords
localhost ~ # emerge =zfs-9999

[Installation of 50 packages later...]
[Reboot just to make sure the old kernel modules didn't stick around...]
[Plug old drive back in...]

localhost ~ # zdb -l /dev/sdg
--------------------------------------------
LABEL 0
--------------------------------------------
    version: 13
    name: 'backup'
    state: 1
    txg: 16
    pool_guid: 3472166890449163768
    hostid: 2558674546
[etc...]

localhost ~ # zpool import
no pools available to import
localhost ~ # zpool import -D
no pools available to import
localhost ~ # zpool import -o readonly=on backup
cannot import 'backup': no such pool available

localhost ~ # mkdir tmptmp
localhost ~ # cp -ai /dev/sdg* tmptmp
localhost ~ # zpool import -d tmptmp
   pool: backup
     id: 3472166890449163768
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
    devices and try again.
   see: http://zfsonlinux.org/msg/ZFS-8000-6X
 config:

    backup      UNAVAIL  missing device
      mirror-0  DEGRADED
        ad16    UNAVAIL
        sdg     ONLINE

    Additional devices are known to be part of this pool, though their
    exact configuration cannot be determined.

localhost ~ # zpool import -d tmptmp -o readonly=on backup

[  962.631909] PANIC: blkptr at ffff880807a7d048 DVA 1 has invalid VDEV 1
[  962.632014] Showing stack for process 4802
[  962.632016] CPU: 0 PID: 4802 Comm: zpool Tainted: P           O    4.4.6-gentoo #1
[  962.632016] Hardware name: Supermicro X10SLL-F/X10SLL-SF/X10SLL-F/X10SLL-SF, BIOS 1.0a 06/11/2013
[  962.632018]  0000000000000000 ffff8808048f3798 ffffffff8130feed 0000000000000003
[  962.632019]  0000000000000001 ffff8808048f37a8 ffffffffa0a32239 ffff8808048f38d8
[  962.632021]  ffffffffa0a322d1 ffff8808048f3818 ffffffff814cf51c 61207274706b6c62
[  962.632022] Call Trace:
[  962.632026]  [<ffffffff8130feed>] dump_stack+0x4d/0x64
[  962.632031]  [<ffffffffa0a32239>] spl_dumpstack+0x3d/0x3f [spl]
[  962.632033]  [<ffffffffa0a322d1>] vcmn_err+0x96/0xd3 [spl]
[  962.632036]  [<ffffffff814cf51c>] ? __schedule+0x5fe/0x744
[  962.632038]  [<ffffffff814cf75d>] ? schedule+0x72/0x81
[  962.632039]  [<ffffffff814d2048>] ? schedule_timeout+0x24/0x15b
[  962.632042]  [<ffffffff8105b4a0>] ? __sched_setscheduler+0x56a/0x73f
[  962.632045]  [<ffffffff810e9ac7>] ? cache_alloc_refill+0x69/0x4a3
[  962.632059]  [<ffffffffa0ef0936>] zfs_panic_recover+0x4d/0x4f [zfs]
[  962.632064]  [<ffffffffa0ea18ca>] ? arc_space_return+0x13e9/0x24ba [zfs]
[  962.632071]  [<ffffffffa0f2eb5a>] zfs_blkptr_verify+0x248/0x2a1 [zfs]
[  962.632079]  [<ffffffffa0f2ebe7>] zio_read+0x34/0x147 [zfs]
[  962.632084]  [<ffffffffa0ea18ca>] ? arc_space_return+0x13e9/0x24ba [zfs]
[  962.632089]  [<ffffffffa0ea3731>] arc_read+0x817/0x85d [zfs]
[  962.632097]  [<ffffffffa0eb372f>] dmu_objset_open_impl+0xf0/0x65d [zfs]
[  962.632098]  [<ffffffff810685cc>] ? add_wait_queue+0x44/0x44
[  962.632109]  [<ffffffffa0ecead8>] dsl_pool_init+0x2d/0x50 [zfs]
[  962.632120]  [<ffffffffa0ee823f>] spa_vdev_remove+0xb5d/0x2220 [zfs]
[  962.632122]  [<ffffffffa0a30e9a>] ? taskq_create+0x30e/0x6a5 [spl]
[  962.632124]  [<ffffffff810685cc>] ? add_wait_queue+0x44/0x44
[  962.632126]  [<ffffffffa0a897d2>] ? nvlist_remove_nvpair+0x2a8/0x314 [znvpair]
[  962.632129]  [<ffffffffa0ae8e68>] ? zpool_get_rewind_policy+0x116/0x13c [zcommon]
[  962.632140]  [<ffffffffa0ee93ee>] spa_vdev_remove+0x1d0c/0x2220 [zfs]
[  962.632141]  [<ffffffffa0ae8dda>] ? zpool_get_rewind_policy+0x88/0x13c [zcommon]
[  962.632152]  [<ffffffffa0ee9f04>] spa_import+0x191/0x640 [zfs]
[  962.632161]  [<ffffffffa0f115cc>] zfs_secpolicy_smb_acl+0x1485/0x42ac [zfs]
[  962.632169]  [<ffffffffa0f1609f>] pool_status_check+0x3b4/0x48f [zfs]
[  962.632171]  [<ffffffff810fca17>] do_vfs_ioctl+0x3f5/0x43d
[  962.632173]  [<ffffffff810368f2>] ? __do_page_fault+0x24e/0x367
[  962.632174]  [<ffffffff810fca98>] SyS_ioctl+0x39/0x61
[  962.632176]  [<ffffffff814d2b57>] entry_SYSCALL_64_fastpath+0x12/0x6a

[In another terminal]
localhost ~ # ps auwx | grep zpool
root      4802  0.0  0.0 566452  4852 pts/0    D+   23:43   0:00 zpool import -d tmptmp -o readonly=on backup

[This is the exact GIT stuff I'm currently running]

localhost ~ # grep  .  /usr/portage/distfiles/git3-src/*/*HEAD*
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:39cd90ef08bb6817dd57ac08e9de5c87af2681ed      https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:72e7de60262b8a1925e0a384a76cc1d745ea310e  not-for-merge   tag 'spl-0.4.0' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:60844c45530c731a83594162599181ab70ee3b6c  not-for-merge   tag 'spl-0.4.1' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:28d9aa198a32e3b683e741792436b69ead16de2e  not-for-merge   tag 'spl-0.4.2' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:17221eb570dea0bd581766a79656ad4c713ec759  not-for-merge   tag 'spl-0.4.3' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:46a1d3fe02e7a59242b417e12332b871285ecb2d  not-for-merge   tag 'spl-0.4.4' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:f264e472ef078b51e4856218f456e0699a4dbd62  not-for-merge   tag 'spl-0.4.5' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:6269bb7267809e221e4d63219e0960f8f6d71251  not-for-merge   tag 'spl-0.4.6' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:3dbd424037e7dcdf74f7b2caa822b840f94a6cca  not-for-merge   tag 'spl-0.4.7' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:e86467fcf952b2734ee9939ef48252437a85baea  not-for-merge   tag 'spl-0.4.8' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:5c04498004a2a00f3ccc2542cc11a3e9902a304d  not-for-merge   tag 'spl-0.4.9' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:55d0582828402420da093e2a9002c2941f43bc3e  not-for-merge   tag 'spl-0.5.0' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:5580c16c869ceed5c4c16280b87ccefcb966950e  not-for-merge   tag 'spl-0.5.1' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:48eea97fed64c0fd6bac0dbf94788d11d69aac47  not-for-merge   tag 'spl-0.5.2' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:7b85a4549f9624f69e52085f9ba72ec0845ec4e4  not-for-merge   tag 'spl-0.6.0-rc1' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:64152b9093ae98187d97051e47e8de37f6aeac4b  not-for-merge   tag 'spl-0.6.0-rc10' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:bbcd7e15c58bcc2c3ceeb031a72e03cacfcd27b5  not-for-merge   tag 'spl-0.6.0-rc11' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:1f49fc87fbea81d8e390b806f0499d6b633e5e2d  not-for-merge   tag 'spl-0.6.0-rc12' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:1cad00e56fcb7a6856ca84ffff4c3de17bcac6d4  not-for-merge   tag 'spl-0.6.0-rc13' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:085913f746de3ccd1703203b452efc7a2b6b77ad  not-for-merge   tag 'spl-0.6.0-rc14' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:de69baa886347682efd53ecfae8a3b02fd12b60f  not-for-merge   tag 'spl-0.6.0-rc2' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:70a9fde629fc58d867f9ffd3abea773b77b4b370  not-for-merge   tag 'spl-0.6.0-rc3' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:66d15f50659734e42876347ffc60a38899dd631c  not-for-merge   tag 'spl-0.6.0-rc4' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:bd8c888627201fbbbe3f5f031b4c199a4f374587  not-for-merge   tag 'spl-0.6.0-rc5' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:133c515e9962b00a391d360a19d6c911df793d21  not-for-merge   tag 'spl-0.6.0-rc6' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:658fb5fa08f10703b4d1861982dfc0d44da15db9  not-for-merge   tag 'spl-0.6.0-rc7' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:55d3957b3fdee19e6e65f851fdd83f8874d856bb  not-for-merge   tag 'spl-0.6.0-rc8' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:b9910c44f097a5387cad97b003e8b6d7403381c9  not-for-merge   tag 'spl-0.6.0-rc9' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:db4237ae0bc3fcb698c6f30962b2a67c03d2e1d7  not-for-merge   tag 'spl-0.6.1' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:3395c5fa0533ae7fc6ae89ba314d2685e6feef37  not-for-merge   tag 'spl-0.6.2' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:6ac27d4655cabe8b12193a8ae3378efa3c6d0537  not-for-merge   tag 'spl-0.6.3' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:5aafddabeae97971e05bd6f592e19c04857bf4f2  not-for-merge   tag 'spl-0.6.4' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/FETCH_HEAD:5882cac9e11b686172a1e396888aa867c48eb0fc  not-for-merge   tag 'spl-0.6.5' of https://github.com/zfsonlinux/spl
/usr/portage/distfiles/git3-src/zfsonlinux_spl.git/HEAD:ref: refs/heads/master
/usr/portage/distfiles/git3-src/zfsonlinux_zfs-images.git/FETCH_HEAD:3331601f6dc50ef2c9779c1656218701b48b276c       branch 'master' of https://github.com/zfsonlinux/zfs-images
/usr/portage/distfiles/git3-src/zfsonlinux_zfs-images.git/FETCH_HEAD:3331601f6dc50ef2c9779c1656218701b48b276c       https://github.com/zfsonlinux/zfs-images
/usr/portage/distfiles/git3-src/zfsonlinux_zfs-images.git/HEAD:ref: refs/heads/master
/usr/portage/distfiles/git3-src/zfsonlinux_zfs.git/FETCH_HEAD:bc2d809387debb95d82f47185d446f328da4d147      https://github.com/zfsonlinux/zfs
/usr/portage/distfiles/git3-src/zfsonlinux_zfs.git/HEAD:ref: refs/heads/master

localhost ~ # modinfo zfs | head
filename:       /lib/modules/4.4.6-gentoo/extra/zfs/zfs.ko
version:        0.6.5-281_gbc2d809
license:        CDDL
author:         OpenZFS on Linux
description:    ZFS
srcversion:     7B91AC3C57C8823BFB45845
depends:        spl,znvpair,zunicode,zcommon,zavl
vermagic:       4.4.6-gentoo SMP mod_unload modversions 
parm:           zvol_inhibit_dev:Do not create zvol device nodes (uint)
parm:           zvol_major:Major number for zvol device (uint)
localhost ~ # modinfo spl | head
filename:       /lib/modules/4.4.6-gentoo/extra/spl/spl.ko
version:        0.6.5-54_g39cd90e
license:        GPL
author:         OpenZFS on Linux
description:    Solaris Porting Layer
srcversion:     F46FA65506ED2842A5834D8
depends:        zlib_deflate
vermagic:       4.4.6-gentoo SMP mod_unload modversions 
parm:           spl_hostid:The system hostid. (ulong)
parm:           spl_hostid_path:The system hostid file (/etc/hostid) (charp)

localhost ~ # ls -l /lib/modules/4.4.6-gentoo/extra/zfs/zfs.ko
-rw-r--r-- 1 root root 1704200 May 18 23:24 /lib/modules/4.4.6-gentoo/extra/zfs/zfs.ko

localhost ~ # strings  /lib/modules/4.4.6-gentoo/extra/zfs/zfs.ko | grep 9999
/var/tmp/portage/sys-fs/zfs-kmod-9999/work/zfs-kmod-9999/module/zfs/arc.c
/var/tmp/portage/sys-fs/zfs-kmod-9999/work/zfs-kmod-9999/module/zfs/bplist.c
/var/tmp/portage/sys-fs/zfs-kmod-9999/work/zfs-kmod-9999/module/zfs/bpobj.c
/var/tmp/portage/sys-fs/zfs-kmod-9999/work/zfs-kmod-9999/module/zfs/dbuf.c
[etc. etc. It's the current package.]
JuliaVixen commented 8 years ago

Oh hey guess what! I just got this crash again with a drive from my old Solaris box. I know for certain that there is no missing VDEV this time; I have all of the drives plugged in.

At the console:

localhost ~ # zpool import -o readonly=on unmirrored
PANIC: blkptr at ffff880054b92048 DVA 1 has invalid VDEV 1

In the log:

[73663.722473] PANIC: blkptr at ffff880054b92048 DVA 1 has invalid VDEV 1
[73663.722576] Showing stack for process 16033
[73663.722578] CPU: 1 PID: 16033 Comm: zpool Tainted: P           O    4.4.6-gentoo #1
[73663.722578] Hardware name: Supermicro X10SLL-F/X10SLL-SF/X10SLL-F/X10SLL-SF, BIOS 1.0a 06/11/2013
[73663.722580]  0000000000000000 ffff88043e437798 ffffffff8130feed 0000000000000003
[73663.722581]  0000000000000001 ffff88043e4377a8 ffffffffa0a9a239 ffff88043e4378d8
[73663.722583]  ffffffffa0a9a2d1 ffff88043e437818 ffffffff814cf51c 61207274706b6c62
[73663.722584] Call Trace:
[73663.722589]  [<ffffffff8130feed>] dump_stack+0x4d/0x64
[73663.722594]  [<ffffffffa0a9a239>] spl_dumpstack+0x3d/0x3f [spl]
[73663.722596]  [<ffffffffa0a9a2d1>] vcmn_err+0x96/0xd3 [spl]
[73663.722598]  [<ffffffff814cf51c>] ? __schedule+0x5fe/0x744
[73663.722600]  [<ffffffff814cf75d>] ? schedule+0x72/0x81
[73663.722602]  [<ffffffff814d2048>] ? schedule_timeout+0x24/0x15b
[73663.722605]  [<ffffffff8105b4a0>] ? __sched_setscheduler+0x56a/0x73f
[73663.722608]  [<ffffffff810e9ac7>] ? cache_alloc_refill+0x69/0x4a3
[73663.722623]  [<ffffffffa1000936>] zfs_panic_recover+0x4d/0x4f [zfs]
[73663.722628]  [<ffffffffa0fb18ca>] ? arc_space_return+0x13e9/0x24ba [zfs]
[73663.722636]  [<ffffffffa103eb5a>] zfs_blkptr_verify+0x248/0x2a1 [zfs]
[73663.722643]  [<ffffffffa103ebe7>] zio_read+0x34/0x147 [zfs]
[73663.722648]  [<ffffffffa0fb18ca>] ? arc_space_return+0x13e9/0x24ba [zfs]
[73663.722653]  [<ffffffffa0fb3731>] arc_read+0x817/0x85d [zfs]
[73663.722661]  [<ffffffffa0fc372f>] dmu_objset_open_impl+0xf0/0x65d [zfs]
[73663.722663]  [<ffffffff810685cc>] ? add_wait_queue+0x44/0x44
[73663.722673]  [<ffffffffa0fdead8>] dsl_pool_init+0x2d/0x50 [zfs]
[73663.722685]  [<ffffffffa0ff823f>] spa_vdev_remove+0xb5d/0x2220 [zfs]
[73663.722687]  [<ffffffffa0a98e9a>] ? taskq_create+0x30e/0x6a5 [spl]
[73663.722688]  [<ffffffff810685cc>] ? add_wait_queue+0x44/0x44
[73663.722691]  [<ffffffffa0aac7d2>] ? nvlist_remove_nvpair+0x2a8/0x314 [znvpair]
[73663.722693]  [<ffffffffa0ac0e68>] ? zpool_get_rewind_policy+0x116/0x13c [zcommon]
[73663.722704]  [<ffffffffa0ff93ee>] spa_vdev_remove+0x1d0c/0x2220 [zfs]
[73663.722706]  [<ffffffffa0ac0dda>] ? zpool_get_rewind_policy+0x88/0x13c [zcommon]
[73663.722717]  [<ffffffffa0ff9f04>] spa_import+0x191/0x640 [zfs]
[73663.722726]  [<ffffffffa10215cc>] zfs_secpolicy_smb_acl+0x1485/0x42ac [zfs]
[73663.722734]  [<ffffffffa102609f>] pool_status_check+0x3b4/0x48f [zfs]
[73663.722736]  [<ffffffff810fca17>] do_vfs_ioctl+0x3f5/0x43d
[73663.722737]  [<ffffffff810fca98>] SyS_ioctl+0x39/0x61
[73663.722739]  [<ffffffff814d2b57>] entry_SYSCALL_64_fastpath+0x12/0x6a

Yep, it's blocked...

root 16033 0.0 0.0 47064 4592 tty4 D+ 01:47 0:00 zpool import -o readonly=on unmirrored

Here's the disk label:

localhost ~ # zdb -l /dev/sdl2
--------------------------------------------
LABEL 0
--------------------------------------------
    version: 2
    name: 'unmirrored'
    state: 0
    txg: 9297562
    pool_guid: 17787833881665718298
    top_guid: 13027374485316129798
    guid: 13027374485316129798
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 13027374485316129798
        path: '/dev/dsk/c1d0p2'
        devid: 'id1,cmdk@AST3750640AS=____________3QD06J7R/s'
        whole_disk: 0
        metaslab_array: 13
        metaslab_shift: 32
        ashift: 9
        asize: 674616049664
        DTL: 184
--------------------------------------------
LABEL 1
--------------------------------------------
    version: 2
    name: 'unmirrored'
    state: 0
    txg: 9297562
    pool_guid: 17787833881665718298
    top_guid: 13027374485316129798
    guid: 13027374485316129798
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 13027374485316129798
        path: '/dev/dsk/c1d0p2'
        devid: 'id1,cmdk@AST3750640AS=____________3QD06J7R/s'
        whole_disk: 0
        metaslab_array: 13
        metaslab_shift: 32
        ashift: 9
        asize: 674616049664
        DTL: 184
--------------------------------------------
LABEL 2
--------------------------------------------
    version: 2
    name: 'unmirrored'
    state: 0
    txg: 9297562
    pool_guid: 17787833881665718298
    top_guid: 13027374485316129798
    guid: 13027374485316129798
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 13027374485316129798
        path: '/dev/dsk/c1d0p2'
        devid: 'id1,cmdk@AST3750640AS=____________3QD06J7R/s'
        whole_disk: 0
        metaslab_array: 13
        metaslab_shift: 32
        ashift: 9
        asize: 674616049664
        DTL: 184
--------------------------------------------
LABEL 3
--------------------------------------------
    version: 2
    name: 'unmirrored'
    state: 0
    txg: 9297562
    pool_guid: 17787833881665718298
    top_guid: 13027374485316129798
    guid: 13027374485316129798
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 13027374485316129798
        path: '/dev/dsk/c1d0p2'
        devid: 'id1,cmdk@AST3750640AS=____________3QD06J7R/s'
        whole_disk: 0
        metaslab_array: 13
        metaslab_shift: 32
        ashift: 9
        asize: 674616049664
        DTL: 184
localhost ~ # hdparm -i /dev/sdl

/dev/sdl:

 Model=ST3750640AS, FwRev=3.AAC, SerialNo=3QD06J7R
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=1465149168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5,6,7

 * signifies the current active mode

This disk was the boot drive on my old Solaris 10 box... SunOS 5.10 Generic 118855-14 May 2006 Solaris 10 6/06 s10x_u2wos_09a X86

Jul  1 01:41:18 localhost kernel: scsi 1:0:0:0: Direct-Access     ATA      ST3750640AS      C    PQ: 0 ANSI: 5
Jul  1 01:41:18 localhost kernel: sd 1:0:0:0: [sdl] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
Jul  1 01:41:18 localhost kernel: sd 1:0:0:0: Attached scsi generic sg12 type 0
Jul  1 01:41:18 localhost kernel: sd 1:0:0:0: [sdl] Write Protect is off
Jul  1 01:41:18 localhost kernel: sd 1:0:0:0: [sdl] Mode Sense: 00 3a 00 00
Jul  1 01:41:18 localhost kernel: sd 1:0:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul  1 01:41:18 localhost kernel: sdl: sdl1 sdl2
Jul  1 01:41:18 localhost kernel: sdl1: <solaris: [s0] sdl5 [s1] sdl6 [s2] sdl7 [s7] sdl8 [s8] sdl9 [s9] sdl10 >
Jul  1 01:41:18 localhost kernel: sd 1:0:0:0: [sdl] Attached SCSI disk
localhost ~ # file -s /dev/sdl5
/dev/sdl5: Unix Fast File system [v1] (little-endian), last mounted on /, last written at Fri Aug 28 18:45:31 2009, clean flag 253, number of blocks 5092605, number of data blocks 5015211, number of cylinder groups 104, block size 8192, fragment size 1024, minimum percentage of free blocks 1, rotational delay 0ms, disk rotational speed 60rps, TIME optimization
localhost ~ # mount -r /dev/sdl5 /mnt/temp/
localhost ~ # ls -a /mnt/temp
.   .TTauthority  .bash_history  .dtprofile  .gconfd         .iiim            .softwareupdate  .sunw  bin   cdrom  dev      etc     format.dat  kernel  lost+found  net             opt       proc  system  unmirrored  var
..  .Xauthority   .dt            .gconf      .gstreamer-0.8  .smc.properties  .ssh             TT_DB  boot  data   devices  export  home        lib     mnt         noautoshutdown  platform  sbin  tmp     usr         vol
behlendorf commented 8 years ago

@JuliaVixen ZFS is detecting a DVA (data virtual address) which appears to be damaged because it refers to a vdev which can't exist according to the pool configuration. Since the drive is from a very old Solaris system and ZFS version (pool version 2!), and because it's not the primary DVA, I suspect the damage simply wasn't being detected on the old setup. You could try setting the zfs_recover=1 module option to make the error non-fatal and then importing the pool read-only.
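
For example (the exact parameter path may vary by version):

echo 1 > /sys/module/zfs/parameters/zfs_recover   # if the zfs module is already loaded
modprobe zfs zfs_recover=1                        # or set it at module load time
zpool import -o readonly=on unmirrored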

JuliaVixen commented 7 years ago

I haven't tried the zfs_recover=1 option yet. (Sorry, been busy with other stuff.) But while I had this FreeBSD 11.0RC3 system up, I figured I'd try importing this drive on that, just to see what happens....

Well... FreeBSD has a kernel panic too.

panic: Solaris(panic): blkptr at 0xfffff80116bcf848 DVA 1 has invalid VDEV 1
cpuid=8
KDB: stack backtrace:
[stuff...]

zfs_blkptr_verify() appears to be where this panic gets thrown.
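
(If anyone wants to poke at that check, the message should be easy to locate in a ZoL checkout, e.g.:)

grep -rn "has invalid VDEV" module/zfs/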

JuliaVixen commented 7 years ago

Ok, I booted with zfs.zfs_recover=1 and typed zpool import in a terminal... No crash yet...

localhost ~ # zpool import
   pool: unmirrored
     id: 17787833881665718298
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
    devices and try again.
   see: http://zfsonlinux.org/msg/ZFS-8000-6X
 config:

    unmirrored  UNAVAIL  missing device
      sdg2      ONLINE

    Additional devices are known to be part of this pool, though their
    exact configuration cannot be determined.

Then I typed zpool import -o readonly=on unmirrored, and that caused a kernel panic...

[   75.602065] SPL: using hostid 0x00000000
[  103.700162] WARNING: blkptr at ffff880802306048 DVA 1 has invalid VDEV 1
[  103.715520] WARNING: blkptr at ffff880802880c40 DVA 1 has invalid VDEV 1
[  103.715771] WARNING: blkptr at ffff8808027e0000 DVA 1 has invalid VDEV 1
[  103.716003] WARNING: blkptr at ffff8808027d4240 DVA 0 has invalid VDEV 1
[  103.746942] WARNING: blkptr at ffff8808027d5640 DVA 0 has invalid VDEV 1
[  103.766755] WARNING: blkptr at ffff8808027e0080 DVA 1 has invalid VDEV 1
[  103.785574] WARNING: blkptr at ffff8808027e0100 DVA 1 has invalid VDEV 1
[  103.785633] WARNING: blkptr at ffff8808023436f8 DVA 1 has invalid VDEV 1
[  103.786688] WARNING: blkptr at ffff8808023436f8 DVA 1 has invalid VDEV 1
[  103.786741] WARNING: blkptr at ffff8808023436f8 DVA 1 has invalid VDEV 1
[  103.828517] WARNING: blkptr at ffff8808023436f8 DVA 1 has invalid VDEV 1
[  103.828578] WARNING: blkptr at ffff8808023436f8 DVA 1 has invalid VDEV 1
[  103.828621] WARNING: blkptr at ffff8808023436f8 DVA 1 has invalid VDEV 1
[  103.867807] WARNING: blkptr at ffff8808023436f8 DVA 1 has invalid VDEV 1
[  103.867871] WARNING: blkptr at ffff8808023436f8 DVA 1 has invalid VDEV 1
[  103.867914] WARNING: blkptr at ffff8808023436f8 DVA 1 has invalid VDEV 1
[  103.889641] VERIFY3(rvd->vdev_children == mrvd->vdev_children) failed (1 == 2)
[  103.889728] PANIC at spa.c:1717:spa_config_valid()
[  103.889804] Showing stack for process 5396
[  103.889806] CPU: 2 PID: 5396 Comm: zpool Tainted: P           O    4.6.7B #6
[  103.889807] Hardware name: Supermicro X10SLL-F/X10SLL-SF/X10SLL-F/X10SLL-SF, BIOS 1.0a 06/11/2013
[  103.889808]  0000000000000000 ffff8808023439e8 ffffffff813cc8d6 ffffffffc1359822
[  103.889810]  ffffffffc13597f4 ffff8808023439f8 ffffffffc11469e5 ffff880802343b98
[  103.889812]  ffffffffc1146bda 0000000000000000 ffff880801036000 ffff880000000030
[  103.889814] Call Trace:
[  103.889819]  [<ffffffff813cc8d6>] dump_stack+0x4d/0x63
[  103.889828]  [<ffffffffc11469e5>] spl_dumpstack+0x3d/0x3f [spl]
[  103.889832]  [<ffffffffc1146bda>] spl_panic+0xb8/0xf6 [spl]
[  103.889872]  [<ffffffffc126577b>] ? spa_config_parse+0x22/0x100 [zfs]
[  103.889878]  [<ffffffffc1173ae8>] ? nvlist_lookup_common+0x6b/0x8d [znvpair]
[  103.889909]  [<ffffffffc126a8c1>] spa_load+0x1386/0x1c7a [zfs]
[  103.889914]  [<ffffffffc11960d8>] ? zpool_get_rewind_policy+0x116/0x13c [zcommon]
[  103.889946]  [<ffffffffc126b21e>] spa_load_best+0x69/0x251 [zfs]
[  103.889949]  [<ffffffffc119604a>] ? zpool_get_rewind_policy+0x88/0x13c [zcommon]
[  103.889982]  [<ffffffffc126bdf6>] spa_import+0x19f/0x653 [zfs]
[  103.890019]  [<ffffffffc12a5827>] zfs_ioc_pool_import+0xaf/0xec [zfs]
[  103.890056]  [<ffffffffc12ab0d9>] zfsdev_ioctl+0x40e/0x521 [zfs]
[  103.890059]  [<ffffffff811547d4>] vfs_ioctl+0x1c/0x2f
[  103.890060]  [<ffffffff81154e46>] do_vfs_ioctl+0x5cb/0x60e
[  103.890061]  [<ffffffff81154ec2>] SyS_ioctl+0x39/0x61
[  103.890064]  [<ffffffff81603a5f>] entry_SYSCALL_64_fastpath+0x17/0x93
JuliaVixen commented 7 years ago

While clearing out the rest of the stuff from my garage, I found the other half of the backup pool. Plugging both drives in, I can zpool import without error, and everything is working fine...

So, I guess to reproduce this bug, I should remove a disk. Anyway, while I have it working, here's a dump of what a "working" configuration looks like...

If I just do zpool import, it will only see /dev/sdi, and ignore the other drives for some unknown reason. If I do zpool import -d tmp_devs, then it will see all the drives in the pool, for some unknown reason.
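
As far as I can tell, -d just changes which directory gets scanned for device nodes, so another thing worth trying is pointing it at the by-id links:

zpool import -d /dev/disk/by-id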

[The partitions are left over from whatever this drive was used as before, they're not really valid partitions, but I didn't zero out the disk before I used it in this ZFS pool.]

localhost ~ # mkdir foo

localhost ~ # cp -avi /dev/sdi* /dev/sdj* foo/
'/dev/sdi' -> 'foo/sdi'
'/dev/sdj' -> 'foo/sdj'
'/dev/sdj1' -> 'foo/sdj1'
'/dev/sdj2' -> 'foo/sdj2'
'/dev/sdj3' -> 'foo/sdj3'
'/dev/sdj4' -> 'foo/sdj4'

localhost ~ # zpool import -d foo
   pool: backup
     id: 3472166890449163768
  state: DEGRADED
 status: One or more devices contains corrupted data.
 action: The pool can be imported despite missing or damaged devices.  The
    fault tolerance of the pool may be compromised if imported.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
 config:

    backup                    DEGRADED
      mirror-0                DEGRADED
        sdj                   ONLINE
        15917724193338767979  UNAVAIL
      mirror-1                DEGRADED
        sdi                   ONLINE
        2036074377517197082   UNAVAIL

localhost ~ # zpool import -d foo -o readonly=on backup

localhost ~ # zpool status
  pool: backup
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: none requested
config:

    NAME                      STATE     READ WRITE CKSUM
    backup                    DEGRADED     0     0     0
      mirror-0                DEGRADED     0     0     0
        sdj                   ONLINE       0     0     0
        15917724193338767979  UNAVAIL      0     0     0  was /dev/ada4
      mirror-1                DEGRADED     0     0     0
        sdi                   ONLINE       0     0     0
        2036074377517197082   UNAVAIL      0     0     0  was /dev/ada3

errors: No known data errors

localhost ~ # ls -l /backup
total 3
drwxr-xr-x 18 root root 57 Dec 29  2009 linux1

localhost ~ # zdb -l /dev/sdi
--------------------------------------------
LABEL 0
--------------------------------------------
    version: 22
    name: 'backup'
    state: 1
    txg: 9359
    pool_guid: 3472166890449163768
    hostid: 12756365
    hostname: 'hardy-core_installcd'
    top_guid: 10964104056227665195
    guid: 2683414997155590814
    vdev_children: 2
    vdev_tree:
        type: 'mirror'
        id: 1
        guid: 10964104056227665195
        metaslab_array: 27
        metaslab_shift: 32
        ashift: 9
        asize: 750151532544
        is_log: 0
        children[0]:
            type: 'disk'
            id: 0
            guid: 2683414997155590814
            path: '/dev/dsk/c2t0d0p0'
            devid: 'id1,sd@AST3750640AS=____________3QD0N0Q6/q'
            phys_path: '/pci@0,0/pci1043,8239@5,1/disk@0,0:q'
            whole_disk: 0
        children[1]:
            type: 'disk'
            id: 1
            guid: 2036074377517197082
            path: '/dev/ada3'
            whole_disk: 0
            not_present: 1
            DTL: 31
[etc.]
localhost ~ # zdb -l /dev/sdj
--------------------------------------------
LABEL 0
--------------------------------------------
    version: 22
    name: 'backup'
    state: 1
    txg: 9359
    pool_guid: 3472166890449163768
    hostid: 12756365
    hostname: 'hardy-core_installcd'
    top_guid: 7962344807192492976
    guid: 18134448193074441749
    vdev_children: 2
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 7962344807192492976
        metaslab_array: 23
        metaslab_shift: 31
        ashift: 9
        asize: 400083648512
        is_log: 0
        children[0]:
            type: 'disk'
            id: 0
            guid: 18134448193074441749
            path: '/dev/dsk/c3t0d0p0'
            devid: 'id1,sd@AWDC_WD4000YR-01PLB0=_____WD-WMAMY1580352/q'
            phys_path: '/pci@0,0/pci1043,8239@5,2/disk@0,0:q'
            whole_disk: 0
        children[1]:
            type: 'disk'
            id: 1
            guid: 15917724193338767979
            path: '/dev/ada4'
            whole_disk: 0
            not_present: 1
            DTL: 29
[etc.]

Nothing in dmesg...

Oh hey, I found another drive, I can import it too!

localhost ~ # zpool export backup

localhost ~ # cp -avi /dev/sdh
sdh    sdh1   sdh10  sdh2   sdh3   sdh4   sdh5   sdh6   sdh7   sdh8   sdh9   
localhost ~ # cp -avi /dev/sdh* foo/
'/dev/sdh' -> 'foo/sdh'
'/dev/sdh1' -> 'foo/sdh1'
'/dev/sdh10' -> 'foo/sdh10'
'/dev/sdh2' -> 'foo/sdh2'
'/dev/sdh3' -> 'foo/sdh3'
'/dev/sdh4' -> 'foo/sdh4'
'/dev/sdh5' -> 'foo/sdh5'
'/dev/sdh6' -> 'foo/sdh6'
'/dev/sdh7' -> 'foo/sdh7'
'/dev/sdh8' -> 'foo/sdh8'
'/dev/sdh9' -> 'foo/sdh9'

localhost ~ # zpool import -d foo
   pool: backup
     id: 3472166890449163768
  state: DEGRADED
 status: One or more devices contains corrupted data.
 action: The pool can be imported despite missing or damaged devices.  The
    fault tolerance of the pool may be compromised if imported.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
 config:

    backup                   DEGRADED
      mirror-0               ONLINE
        sdj                  ONLINE
        sdh                  ONLINE
      mirror-1               DEGRADED
        sdi                  ONLINE
        2036074377517197082  UNAVAIL

localhost ~ # zpool import -d foo -o readonly=on backup

localhost ~ # zpool status
  pool: backup
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: none requested
config:

    NAME                     STATE     READ WRITE CKSUM
    backup                   DEGRADED     0     0     0
      mirror-0               ONLINE       0     0     0
        sdj                  ONLINE       0     0     0
        sdh                  ONLINE       0     0     1
      mirror-1               DEGRADED     0     0     0
        sdi                  ONLINE       0     0     0
        2036074377517197082  UNAVAIL      0     0     0  was /dev/ada3

errors: No known data errors

localhost ~ # zdb -l /dev/sdh
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
    version: 13
    name: 'backup'
    state: 1
    txg: 16
    pool_guid: 3472166890449163768
    hostid: 2558674546
    hostname: 'vulpis'
    top_guid: 7962344807192492976
    guid: 15917724193338767979
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 7962344807192492976
        metaslab_array: 23
        metaslab_shift: 31
        ashift: 9
        asize: 400083648512
        is_log: 0
        children[0]:
            type: 'disk'
            id: 0
            guid: 18134448193074441749
            path: '/dev/ad16'
            whole_disk: 0
        children[1]:
            type: 'disk'
            id: 1
            guid: 15917724193338767979
            path: '/dev/ada4'
            whole_disk: 0

I don't know why LABEL 0 fails to unpack; labels 1, 2, and 3 are still unpackable.

Anyway, so I guess I'll pull /dev/sdi and see what happens....

JuliaVixen commented 7 years ago

I removed one drive... nothing exciting happened... But then I removed the other drive, leaving only the drive I had tried to import [GUID: 15917724193338767979] back in May, when I opened this bug report... And kernel panic!

First the boring part:

localhost ~ # rm -v foo/sdi 
removed 'foo/sdi'

localhost ~ # zpool import -d foo
   pool: backup
     id: 3472166890449163768
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
    devices and try again.
   see: http://zfsonlinux.org/msg/ZFS-8000-6X
 config:

    backup       UNAVAIL  missing device
      mirror-0   ONLINE
        sdj      ONLINE
        sdh      ONLINE

    Additional devices are known to be part of this pool, though their
    exact configuration cannot be determined.

localhost ~ # zpool import -d foo -o readonly=on backup
cannot import 'backup': one or more devices is currently unavailable

localhost ~ # zpool status
no pools available

And then this triggers the kernel panic....

localhost ~ # rm -v foo/sdj*
removed 'foo/sdj'
removed 'foo/sdj1'
removed 'foo/sdj2'
removed 'foo/sdj3'
removed 'foo/sdj4'

localhost ~ # zpool import -d foo
   pool: backup
     id: 3472166890449163768
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
    devices and try again.
   see: http://zfsonlinux.org/msg/ZFS-8000-6X
 config:

    backup      UNAVAIL  missing device
      mirror-0  DEGRADED
        ad16    UNAVAIL
        sdh     ONLINE

    Additional devices are known to be part of this pool, though their
    exact configuration cannot be determined.

localhost ~ # zpool import -d foo -o readonly=on backup

And then it blocks forever... So here's the stack dump...

[ 2166.289386] WARNING: blkptr at ffff88080340a048 DVA 1 has invalid VDEV 1
[ 2166.296987] WARNING: blkptr at ffff8800c5f94440 DVA 1 has invalid VDEV 1
[ 2166.306732] WARNING: blkptr at ffff880803172840 DVA 1 has invalid VDEV 1
[ 2166.312851] VERIFY3(rvd->vdev_children == mrvd->vdev_children) failed (1 == 2)
[ 2166.312895] PANIC at spa.c:1717:spa_config_valid()
[ 2166.312958] Showing stack for process 8978
[ 2166.312960] CPU: 3 PID: 8978 Comm: zpool Tainted: P           O    4.6.7B #6
[ 2166.312961] Hardware name: Supermicro X10SLL-F/X10SLL-SF/X10SLL-F/X10SLL-SF, BIOS 1.0a 06/11/2013
[ 2166.312962]  0000000000000000 ffff880803a1b9e8 ffffffff813cc8d6 ffffffffc1478822
[ 2166.312965]  ffffffffc14787f4 ffff880803a1b9f8 ffffffffc12659e5 ffff880803a1bb98
[ 2166.312967]  ffffffffc1265bda 0000000000000000 0000000000000001 ffff880800000030
[ 2166.312969] Call Trace:
[ 2166.312973]  [<ffffffff813cc8d6>] dump_stack+0x4d/0x63
[ 2166.312982]  [<ffffffffc12659e5>] spl_dumpstack+0x3d/0x3f [spl]
[ 2166.312986]  [<ffffffffc1265bda>] spl_panic+0xb8/0xf6 [spl]
[ 2166.313021]  [<ffffffffc13847f2>] ? spa_config_parse+0x99/0x100 [zfs]
[ 2166.313050]  [<ffffffffc13898c1>] spa_load+0x1386/0x1c7a [zfs]
[ 2166.313055]  [<ffffffffc12b50d8>] ? zpool_get_rewind_policy+0x116/0x13c [zcommon]
[ 2166.313083]  [<ffffffffc138a21e>] spa_load_best+0x69/0x251 [zfs]
[ 2166.313085]  [<ffffffffc12b504a>] ? zpool_get_rewind_policy+0x88/0x13c [zcommon]
[ 2166.313113]  [<ffffffffc138adf6>] spa_import+0x19f/0x653 [zfs]
[ 2166.313146]  [<ffffffffc13c4827>] zfs_ioc_pool_import+0xaf/0xec [zfs]
[ 2166.313181]  [<ffffffffc13ca0d9>] zfsdev_ioctl+0x40e/0x521 [zfs]
[ 2166.313184]  [<ffffffff811547d4>] vfs_ioctl+0x1c/0x2f
[ 2166.313185]  [<ffffffff81154e46>] do_vfs_ioctl+0x5cb/0x60e
[ 2166.313188]  [<ffffffff8103f21a>] ? __do_page_fault+0x35f/0x4b5
[ 2166.313189]  [<ffffffff81154ec2>] SyS_ioctl+0x39/0x61
[ 2166.313191]  [<ffffffff81603a5f>] entry_SYSCALL_64_fastpath+0x17/0x93
stale[bot] commented 4 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.