openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

pool import causes panic in arc_read_nolock / zil_parse #1092

Closed lundman closed 11 years ago

lundman commented 11 years ago

Disclaimer: this should be in sync with master as of this morning, but it does have local changes. I don't think those changes are in effect yet, but it is entirely possible that I am wrong, so this may not be relevant here. In fact, when has upgrading a pool been tested on Linux?

My finger is hovering over the "delete issue" button, but if it is something that stands out immediately, please let me know.

# zpool import
Nov 15 12:50:55 mirror kernel: [990004.454585] SPL: using hostid 0x6dc2c2ce
pool: pool1
     id: 12736322493099843831
  state: ONLINE
 status: The pool is formatted using an older on-disk version.
 action: The pool can be imported using its name or numeric identifier, though
        some features will not be available without an explicit 'zpool upgrade'.
 config:

    pool1                                ONLINE
      raidz1-0                           ONLINE
        scsi-1AMCC_RAG1S7RA929C2C00A0A0  ONLINE
        scsi-1AMCC_RAG28K9A929C2C002488  ONLINE
        scsi-1AMCC_RAG2TGYA929C2C000E17  ONLINE
        scsi-1AMCC_RAGU83ZC929C2D008F8B  ONLINE
        scsi-1AMCC_RAHGBPRD929C2C00AE98  ONLINE
        scsi-1AMCC_RAHJLUAD929C2C00A1A9  ONLINE
      raidz1-1                           ONLINE
        scsi-1AMCC_RAHJMDJD929C2C00ABD7  ONLINE
        scsi-1AMCC_RAHL0RND929C2D008BFC  ONLINE
        scsi-1AMCC_RAHL5NWD929C32000112  ONLINE
        scsi-1AMCC_RAHL8S8D929C2D002910  ONLINE
        scsi-1AMCC_RAHL9KWD929C2D003990  ONLINE
        scsi-1AMCC_RAHLB95D929C2D001E52  ONLINE
      raidz1-2                           ONLINE
        scsi-1AMCC_RAHLG27D929C2C006180  ONLINE
        scsi-1AMCC_RAHLGU4D929C2D006762  ONLINE
        scsi-1AMCC_RAHLNH0D929C2D00215B  ONLINE
        scsi-1AMCC_RAHLNK4D929C2C007078  ONLINE
        scsi-1AMCC_RAHLNKBD929C2C0066AE  ONLINE
        scsi-1AMCC_RAHLNLUD929C2C00327D  ONLINE
      raidz1-3                           ONLINE
        scsi-1AMCC_RAHLNZWD929C2D0061F9  ONLINE
        scsi-1AMCC_RAHLP10D929C32001FE1  ONLINE
        scsi-1AMCC_RAHLP18D929C2C0072CD  ONLINE
        scsi-1AMCC_RAHLP1KD929C2D00191C  ONLINE
        scsi-1AMCC_RAHLS74D929C2C000EDF  ONLINE
        scsi-1AMCC_RAHLS76D929C2D001E8C  ONLINE
      raidz1-4                           ONLINE
        scsi-1AMCC_RAHLSXLD929C2D0056BC  ONLINE
        scsi-1AMCC_RAHLSXXD929C2D009384  ONLINE
        scsi-1AMCC_RAHLSXZD929C2D00AE1D  ONLINE
        scsi-1AMCC_RAHLSZ4D929C2D0022DA  ONLINE
        scsi-1AMCC_RAHLWY5D929C2D009E9D  ONLINE
        scsi-1AMCC_RAHLWYPD929C2C001BA0  ONLINE
      raidz1-5                           ONLINE
        scsi-1AMCC_RAHLXD0D929C2D00751F  ONLINE
        scsi-1AMCC_RAHLXR3D929C2D001CC5  ONLINE
        scsi-1AMCC_RAHLY0LD929C2C0075F9  ONLINE
        scsi-1AMCC_RAHLY0SD929C2D002C1A  ONLINE
        scsi-1AMCC_RAHLY1SD929C2D0071BD  ONLINE
        scsi-1AMCC_RAHLY24D929C2C007DB0  ONLINE
    spares
      scsi-1AMCC_RAHLYV2D929C2D0067B9
      scsi-1AMCC_RAHLZ6KD929C2D008BA5
# zpool import pool1
Killed
 BUG: unable to handle kernel NULL pointer dereference at           (null)
 IP: [<ffffffffa053bdd8>]    arc_read_nolock+0x2d4/0x674 [zfs]
 PGD 118a57067 PUD 126c59067 PMD 0 
 Oops: 0000 [#1] SMP 

 Oops: 0000 [#1] SMP 
 CPU 0 
 Modules linked in: zpios(O) zfs(P) zcommon(P) zunicode(P) zavl(P) znvpair(P) spl(O) ansi_cprng vmac xcbc hmac seed rmd320 rmd256 rmd160 rmd128 cts ccm lzo ghash_generic gcm salsa20_generic salsa20_x86_64 camellia fcrypt pcbc tgr192 anubis wp512 khazad tea crc32c michael_mic arc4 cast6 cast5 deflate sha512_generic seqiv ctr xts lrw gf128mul cryptd aes_x86_64 aes_generic serpent twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common blowfish_generic blowfish_x86_64 blowfish_common sha256_generic md4 cbc des_generic ecb sha1_generic zlib zlib_deflate ext2 loop evdev snd_pcm snd_timer snd tpm_tis tpm soundcore tpm_bios snd_page_alloc psmouse i2c_i801 i2c_core serio_raw pcspkr rng_core parport_pc parport container e752x_edac button processor thermal_sys shpchp edac_core ext4 mbcache jbd2 crc16 dm_mod sg sd_mod sr_mod cdrom crc_t10dif ata_generic ata_piix libata uhci_hcd 3w_9xxx aic79xx scsi_transport_spi e1000 ehci_hcd scsi_mod floppy usbcore usb_common [last 
kernel: unloaded: spl]

  Pid: 31168, comm: zpool Tainted: P           O 3.2.0-0.bpo.3-amd64 #1 Supermicro X6DH8-XG2/X6DH8-XG2
 RIP: 0010:[<ffffffffa053bdd8>]  [<ffffffffa053bdd8>] arc_read_nolock+0x2d4/0x674 [zfs]
 RSP: 0018:ffff88009e1457d8  EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff88011ea85d80 RCX: 0000000000000013
 RDX: 0000000000000000 RSI: ffff88009e145840 RDI: ffffffffa05eb8c0
 RBP: ffff8801011b8e80 R08: 004c7217a9e14a7d R09: ffff88011ea85d80
 R10: ffff88012fc0eab0 R11: ffff88009e1457c0 R12: ffff88009e1459b8
 R13: 0000000000000000 R14: ffffffffa05390ca R15: ffff88009e145a78
 FS:  00007fa165d3bb40(0000) GS:ffff88012fc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000000000000 CR3: 000000011af1c000 CR4: 00000000000006f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process zpool (pid: 31168, threadinfo ffff88009e144000, task ffff880099b35690)

 kernel:[990019.461954] Call Trace:
  [<ffffffffa05b67d0>] ? zil_parse+0x2f8/0x55c [zfs]
  [<ffffffffa05b63c4>] ? zil_bp_tree_add+0x72/0x72 [zfs]
  [<ffffffffa05b6437>] ? zil_claim_log_block+0x73/0x73 [zfs]
  [<ffffffffa0548c54>] ? dmu_objset_from_ds+0x61/0x70 [zfs]
  [<ffffffff813651fd>] ? mutex_lock+0xd/0x2c
  [<ffffffffa05b6b5b>] ? zil_check_log_chain+0x127/0x152 [zfs]
  [<ffffffffa0546aa8>] ? dmu_objset_find_spa+0x2d8/0x2f0 [zfs]
  [<ffffffffa05465e4>] ? dmu_objset_is_snapshot+0x1a/0x1a [zfs]
  [<ffffffffa05468e1>] ? dmu_objset_find_spa+0x111/0x2f0 [zfs]
  [<ffffffffa05465e4>] ? dmu_objset_is_snapshot+0x1a/0x1a [zfs]
  [<ffffffffa0546ae4>] ? dmu_objset_find+0x24/0x29 [zfs]
  [<ffffffffa05b6a34>] ? zil_parse+0x55c/0x55c [zfs]
  [<ffffffffa0575992>] ? spa_load+0xebc/0x1125 [zfs]
  [<ffffffffa059201a>] ? zcrypt_keystore_init+0x4e/0x6d [zfs]
  [<ffffffffa0575c56>] ? spa_load_best+0x5b/0x1bb [zfs]
  [<ffffffffa0579db5>] ? spa_import+0x161/0x5b6 [zfs]
  [<ffffffffa059c544>] ? get_nvlist+0x99/0xb4 [zfs]
  [<ffffffffa059f6f8>] ? zfs_ioc_pool_import+0xb0/0x106 [zfs]
  [<ffffffffa059fb4d>] ? zfsdev_ioctl+0x114/0x16c [zfs]
  [<ffffffff81113a3f>] ? do_vfs_ioctl+0x464/0x4b1
  [<ffffffff81113ad7>] ? sys_ioctl+0x4b/0x70
  [<ffffffff8136b292>] ? system_call_fastpath+0x16/0x1b
 Code: 00 00 00 48 c7 43 08 00 00 00 00 4c 89 fe 48 c7 43 10 00 00 00 00 48 c7 43 18 00 00 00 00 48 89 ef e8 50 bd ff ff e9 69 fd ff ff <41> f6 45 00 08 74 14 48 8b 74 24 68 4c 89 fa 48 89 df e8 09 ab

behlendorf commented 11 years ago

This is a new one. Can you grab the line number of the NULL dereference and post it to the bug? You should be able to get it with gdb as follows. Do your local changes include any modifications to these arc functions?

$ gdb module/zfs/zfs.ko
Reading symbols from /home/behlendo/src/git/zfs/module/zfs/zfs.ko...done.
(gdb) list *(arc_read_nolock+0x2d4)
...
lundman commented 11 years ago

I did not know that gdb command, future-lundman thanks you!

(gdb) list *(arc_read_nolock+0x2d4)
0x5dd8 is in arc_read_nolock (module/zfs/../../module/zfs/arc.c:2976).
2971                                    buf_discard_identity(hdr);
2972                                    (void) arc_buf_remove_ref(buf, private);
2973                                    goto top; /* restart the IO request */
2974                            }
2975                            /* if this is a prefetch, we don't have a reference */
2976                            if (*arc_flags & ARC_PREFETCH) {
2977                                    (void) remove_reference(hdr, hash_lock,
2978                                        private);
2979                                    hdr->b_flags |= ARC_PREFETCH;
2980                            }

arc.c is the one file that I have not touched, but I have added maybe two calls that call into it.

behlendorf commented 11 years ago

One of your new callers is passing in NULL for arc_flags. Don't do that.

lundman commented 11 years ago

zil.c:234
    error = arc_read(NULL, zilog->zl_spa, bp, 0, arc_getbuf_func,
                     &abuf, ZIO_PRIORITY_SYNC_READ, zio_flags, NULL, &zb);

Yeah, I do! This is not relevant to master, so I will close this. Sorry for the noise.
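
For anyone who lands here with the same trace: the crash comes from passing NULL for a pointer that the callee dereferences unconditionally (the `*arc_flags & ARC_PREFETCH` test at arc.c:2976). Below is a minimal, standalone sketch of that pattern and of the usual remedy, which is to pass the address of a local flags word. Everything named `sketch_*` or `SKETCH_*` is a hypothetical stand-in and not the real ZFS API, and the NULL check exists only so the sketch can report the bad call instead of faulting; the real `arc_read_nolock()` shown above has no such guard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in flag values; NOT taken from the ZFS headers. */
#define SKETCH_ARC_WAIT     (1u << 1)
#define SKETCH_ARC_PREFETCH (1u << 3)

/*
 * Models a callee that treats `flags` as a required in/out parameter.
 * Returns 0 on success and -1 when the caller passes NULL. The guard is
 * an assumption added for the sketch; without it this would be a NULL
 * pointer dereference, as in the oops above.
 */
static int
sketch_read(uint32_t *flags, int *prefetch_seen)
{
	if (flags == NULL)
		return (-1);

	/* Same shape as the faulting line: read through the pointer. */
	*prefetch_seen = (*flags & SKETCH_ARC_PREFETCH) != 0;
	return (0);
}

/* Broken caller: mirrors a call site that passes NULL for the flags. */
static int
sketch_caller_broken(void)
{
	int seen = 0;

	return (sketch_read(NULL, &seen));
}

/* Fixed caller: owns a flags word and passes its address. */
static int
sketch_caller_fixed(void)
{
	uint32_t aflags = SKETCH_ARC_WAIT;
	int seen = 0;

	return (sketch_read(&aflags, &seen));
}
```

Calling `sketch_caller_broken()` reports the error and `sketch_caller_fixed()` succeeds. In the zil.c call quoted above, the equivalent change would presumably be to replace the NULL in the second-to-last argument with a pointer to a local `uint32_t` flags variable, but check that against the `arc_read()` prototype in your own tree.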