Closed: eqvinox closed this issue 7 years ago
Additional note: deduplication is disabled everywhere, but some snapshots/clones are in use as thin-provisioned VM containers.
@eqvinox has deduplication ever been enabled on this pool? For this crash to make sense it would need to have been enabled at some point in the past, because it crashed in the deduplication code. The crash also suggests that either the checksum type in the block pointer is invalid or there's an issue with the in-core deduplication table. Can you run zdb -DD rpool to dump the deduplication table stats?
@behlendorf let's put it like this: I've never consciously enabled it. And the pool is only a few weeks old.
# zdb -DD saspool
All DDTs are empty
Since I had to reboot the box, I've added a little bit of guard code around the section in question:
diff --git a/module/zfs/zio.c b/module/zfs/zio.c
index 0ba167c..0ee0986 100644
--- a/module/zfs/zio.c
+++ b/module/zfs/zio.c
@@ -2434,6 +2434,13 @@ zio_ddt_free(zio_t *zio)
ASSERT(BP_GET_DEDUP(bp));
ASSERT(zio->io_child_type == ZIO_CHILD_LOGICAL);
+ if (unlikely(!ddt)) {
+ printk(KERN_CRIT "PAN PAN PAN: zfs hit ddt==NULL in zio_ddt_free(%p), checksum=%d\n",
+ zio, (int)(BP_GET_CHECKSUM(bp)));
+ dump_stack();
+ return (ZIO_PIPELINE_CONTINUE);
+ }
+
ddt_enter(ddt);
freedde = dde = ddt_lookup(ddt, bp, B_TRUE);
if (dde) {
I'll see if it triggers again. (For the record, I have no clue what the code does, this is just blindly adding a safety check.)
@eqvinox OK, that's interesting. I was going to suggest a debugging patch like this, but you beat me to it. Definitely let us know if you manage to trip the debugging.
@behlendorf well, I'm not sure my debug patch actually prints any useful information (such as where the taskq was scheduled from; I'm not sure that information is even available) - if you can improve on that - :)
For now my best guess is that it's somehow related to the large background copy happening onto a zfs that has sync=disabled while, at the same time, an fsync was issued on a file in another zfs; but this is based solely on my observation that it froze at the exact instant my mail client tried to send an e-mail.
@behlendorf Heh. I decided to resume my background copy and it just triggered again, except my debug code apparently did not work (I probably loaded the unmodified version of the kernel module by accident).
Trace looks the same:
[62078.332571] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a9
[62078.332602] IP: [<ffffffffc03143f3>] ddt_enter+0x3/0x30 [zfs]
[62078.332602] PGD 0
[62078.332604] Oops: 0000 [#1] SMP
[62078.332614] Modules linked in: xfs wp512 target_core_pscsi target_core_file target_core_iblock tcm_fc libfc tcm_qla2xxx tcm_loop iscsi_target_mod ib_srpt target_core_mod intel_powerclamp mei_me mei zfs(O) zunicode(O) zcommon(O) znvpair(O) zavl(O) spl(O) mlx4_en isci mlx4_core qla2xxx scsi_transport_fc scsi_tgt efivarfs
[62078.332617] CPU: 4 PID: 8116 Comm: z_fr_iss_3/3 Tainted: G O 3.14.22-grsec-00006-g3c1449f #12
[62078.332617] Hardware name: Supermicro X9DR7/E-(J)LN4F/X9DR7/E-(J)LN4F, BIOS 3.0 07/31/2013
[62078.332618] task: ffff8808aded95d0 ti: ffff8808aded9c40 task.ti: ffff8808aded9c40
[62078.332635] RIP: 0010:[<ffffffffc03143f3>] [<ffffffffc03143f3>] ddt_enter+0x3/0x30 [zfs]
[62078.332636] RSP: 0000:ffffc9073281bd10 EFLAGS: 00010246
[62078.332637] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
[62078.332637] RDX: 00000000006c6562 RSI: 0000000000000000 RDI: 0000000000000001
[62078.332638] RBP: ffffc9073281bd38 R08: 0000000000000000 R09: 0000000000000001
[62078.332638] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88072bdd0660
[62078.332639] R13: ffff88072bdd05f0 R14: ffff8808aded95d0 R15: 0000000000000000
[62078.332640] FS: 0000000000000000(0000) GS:ffff88103e600000(0000) knlGS:0000000000000000
[62078.332640] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[62078.332641] CR2: 00000000000000a9 CR3: 000000000dad7000 CR4: 00000000001607f0
[62078.332641] Stack:
[62078.332643] ffffc9073281bd38 ffffffffc03abd1c 0000000000000400 ffff88072bdd05f0
[62078.332644] 8000000000000000 ffffc9073281bd78 ffffffffc03acd13 ffff88072bdd0ba8
[62078.332646] ffff8808adf8fc00 ffff8808adf8fc10 ffff8808aded95d0 ffff88072bdd0ba8
[62078.332646] Call Trace:
[62078.332678] [<ffffffffc03abd1c>] ? zio_ddt_free+0x2c/0xb0 [zfs]
[62078.332706] [<ffffffffc03acd13>] zio_execute+0xc3/0x170 [zfs]
[62078.332714] [<ffffffffc021bfc9>] taskq_thread+0x229/0x4a0 [spl]
[62078.332718] [<ffffffff8d12dc70>] ? wake_up_state+0x20/0x20
[62078.332723] [<ffffffffc021bda0>] ? taskq_cancel_id+0x1e0/0x1e0 [spl]
[62078.332725] [<ffffffff8d11cb2d>] kthread+0xfd/0x120
[62078.332727] [<ffffffff8d11ca30>] ? kthread_create_on_node+0x250/0x250
[62078.332730] [<ffffffff8dabc464>] ret_from_fork+0x74/0xa0
[62078.332731] [<ffffffff8d11ca30>] ? kthread_create_on_node+0x250/0x250
[62078.332746] Code: 00 00 48 89 e5 48 89 d1 48 c1 e9 27 83 e1 01 75 07 48 c1 ea 28 0f b6 c2 48 8b 84 c7 28 13 00 00 5d 48 0f ba 2c 24 3f c3 55 31 f6 <48> 8b 87 a8 00 00 00 48 89 e5 53 48 89 fb e8 9a b0 79 cd 65 48
[62078.332763] RIP [<ffffffffc03143f3>] ddt_enter+0x3/0x30 [zfs]
[62078.332763] RSP <ffffc9073281bd10>
[62078.332763] CR2: 00000000000000a9
No userspace process got stuck this time, but the txg_sync kernel thread is now in D state.
My debug code didn't trigger because ddt is not actually NULL; the pointer is 1, and the faulting address is 1 + an offset of 0xa8 = 0xa9.
ddt_enter():
15d0: 55 push %rbp
15d1: 31 f6 xor %esi,%esi
15d3: 48 8b 87 a8 00 00 00 mov 0xa8(%rdi),%rax <--------------- rdi = 1
15da: 48 89 e5 mov %rsp,%rbp
15dd: 53 push %rbx
15de: 48 89 fb mov %rdi,%rbx
15e1: e8 00 00 00 00 callq 15e6 <ddt_enter+0x16>
So now the question is: where do we get "1" as a pointer from? Any ideas?
Here we go: (just restarted the rsync from external device onto saspool once again)
(checksum=62 looks wrong to me, access beyond end of array?)
[ 1192.362404] PAN PAN PAN: zfs hit ddt==0000000000000001 in zio_ddt_free(ffff880f30eccd30), spa=ffff881007834000, bp=ffff880f30eccda0, checksum=62
[ 1192.418661] dumping spa->spa_ddt contents:
[ 1192.443936] [0] = ffffc900580e1180
[ 1192.468581] [1] = ffffc900580dafc0
[ 1192.493581] [2] = ffffc900580d4e00
[ 1192.517426] [3] = ffffc900580cec40
[ 1192.539875] [4] = ffffc900580c8a80
[ 1192.561445] [5] = ffffc900580c28c0
[ 1192.582387] [6] = ffffc900580bc700
[ 1192.604568] [7] = ffffc900580b6540
[ 1192.625026] [8] = ffffc900580b0380
[ 1192.645097] [9] = ffffc900580aa1c0
[ 1192.664453] CPU: 3 PID: 2728 Comm: z_fr_iss_4/3 Tainted: G O 3.14.22-grsec-00006-g3c1449f #12
[ 1192.704817] Hardware name: Supermicro X9DR7/E-(J)LN4F/X9DR7/E-(J)LN4F, BIOS 3.0 07/31/2013
[ 1192.730445] ffff880f30eccda0 ffffc90057f43d08 ffffffffa4aa8533 000000000000000a
[ 1192.754065] ffffc90057f43d38 ffffffffc03b4ded 0000000000000400 ffff880f30eccd30
[ 1192.777266] 8000000000000000 ffff880fe2cdc170 ffffc90057f43d78 ffffffffc03b5d83
[ 1192.799630] Call Trace:
[ 1192.816880] [<ffffffffa4aa8533>] dump_stack+0x4d/0x6c
[ 1192.837149] [<ffffffffc03b4ded>] zio_ddt_free+0xfd/0x120 [zfs]
[ 1192.857746] [<ffffffffc03b5d83>] zio_execute+0xc3/0x170 [zfs]
[ 1192.877386] [<ffffffffc0224fc9>] taskq_thread+0x229/0x4a0 [spl]
[ 1192.897233] [<ffffffffa412dc70>] ? wake_up_state+0x20/0x20
[ 1192.917302] [<ffffffffc0224da0>] ? taskq_cancel_id+0x1e0/0x1e0 [spl]
[ 1192.937420] [<ffffffffa411cb2d>] kthread+0xfd/0x120
[ 1192.955425] [<ffffffffa411ca30>] ? kthread_create_on_node+0x250/0x250
[ 1192.974638] [<ffffffffa4abc464>] ret_from_fork+0x74/0xa0
[ 1192.992181] [<ffffffffa411ca30>] ? kthread_create_on_node+0x250/0x250
code used to get this:
@@ -2434,6 +2434,21 @@ zio_ddt_free(zio_t *zio)
ASSERT(BP_GET_DEDUP(bp));
ASSERT(zio->io_child_type == ZIO_CHILD_LOGICAL);
+ if (unlikely((unsigned long)ddt < 4096)) {
+ printk(KERN_CRIT "PAN PAN PAN: zfs hit ddt==%p in zio_ddt_free(%p), spa=%p, bp=%p, checksum=%x\n",
+ ddt, zio, spa, bp, (int)(BP_GET_CHECKSUM(bp)));
+ if (unlikely((unsigned long)spa < 4096)) {
+ printk(KERN_CRIT "can't dump spa->spa_ddt, pointer invalid!\n");
+ } else {
+ int c;
+ printk(KERN_CRIT "dumping spa->spa_ddt contents:\n");
+ for (c = 0; c < ZIO_CHECKSUM_FUNCTIONS; c++)
+ printk(KERN_CRIT " [%d] = %p\n", c, spa->spa_ddt[c]);
+ }
+ dump_stack();
+ return (ZIO_PIPELINE_CONTINUE);
+ }
+
ddt_enter(ddt);
@behlendorf I think that's it on my end. If I understand correctly, this is zfs tripping up because the checksum type in some bp (whatever that is) is stored wrong on disk?
I guess safety checks against invalid input need to be added in zfs so it doesn't trip up on this, and on the other side I'm wondering if I need to somehow fix the pool? Though I'm not entirely sure how I ended up with bad data in the first place... (Memory is ECC, and the disks are encrypted, so a random mis-read would result in a value much farther off than "62".)
@eqvinox it definitely looks like the block pointer (bp) has been damaged. It would be helpful if we could inspect the entire block pointer. Can you update your debugging patch to call the snprintf_blkptr function and then print the resulting buffer? This should give us an idea of what kind of block pointer we're dealing with and how exactly it's damaged. And hopefully it will show why we ended up in the dedup code.
In the meantime, it should be safe to handle the error as you've already done.
@behlendorf sorry for the delay, was otherwise busy...
I had already added a call to snprintf_blkptr, but then I looked at what it actually does:
checksum = zio_checksum_table[BP_GET_CHECKSUM(bp)].ci_name;
which, due to the same missing error checking, would cause another kernel oops.
So, no, I'm afraid I won't do that, and I think this is about as far as I can go on debugging this issue... sorry.
@eqvinox No problem, thanks for the additional debugging.
Closing as stale.
spl de2a22fcb3e7e8d7c4f695aeb0caf9cdc8f82c9a, zfs e82cdc3acc4afe52c3ddc3fe9e4f456fdef80f41, Linux 3.14.22 (+ grsecurity)
System has 64GB of ECC memory. pool25 is being decommissioned, so it's unlikely to have been the origin of the issue. The backing devices ("Cvdev-*") for ZFS are dm-crypt (LUKS) -> lvmcache (SSD) -> LVM -> SAS disks.
The problem occurred when trying to use mutt to send an e-mail while a lot of background copying was happening (from an external device onto a zfs in saspool with sync=disabled); one zvol is also exported through a qla2462 FC card in target mode. Several processes were left hanging in the kernel, among them the IMAP server accessing the mailspool. Accessing one particular subdirectory in the maildir hangs the userspace process (for example, ls /saspool/d/m/m/e hangs in a getdents() syscall).