openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

ZFS + NFS soft lockups #825

Closed: gsaraber closed this 11 years ago

gsaraber commented 12 years ago

I'm experiencing soft lockups that appear to be caused by an NFS / ZFS interaction; the system ran for about 4 hours before locking up.

33 TB ZFS filesystem.

Linux shrapnel 3.4.4 #1 SMP Thu Jul 5 16:17:00 CDT 2012 x86_64 Intel(R) Xeon(R) CPU 5140 @ 2.33GHz GenuineIntel GNU/Linux

It may have corrupted itself, as it's refusing to mount [after reboot] right now; more to come.

--- trace ---

Jul 12 15:01:03 [kernel] [10337.508064] BUG: unable to handle kernel NULL pointer dereference at (null)
Jul 12 15:01:03 [kernel] [10337.508094] IP: [] __wake_up_common+0x24/0x90
Jul 12 15:01:03 [kernel] [10337.508116] PGD 1b6adf067 PUD 1b6b3a067 PMD 0
Jul 12 15:01:03 [kernel] [10337.508133] Oops: 0000 [#1] SMP
Jul 12 15:01:03 [kernel] [10337.508147] CPU 0
Jul 12 15:01:03 [kernel] [10337.508152] Modules linked in: bonding zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) mptsas megaraid_sas mptscsih mptbase
Jul 12 15:01:03 [kernel] [10337.508201]
Jul 12 15:01:03 [kernel] [10337.508207] Pid: 3295, comm: nfsd Tainted: P           O 3.4.4 #1 Supermicro X7DB8/X7DB8
Jul 12 15:01:03 [kernel] [10337.508231] RIP: 0010:[]  [] __wake_up_common+0x24/0x90
Jul 12 15:01:03 [kernel] [10337.508251] RSP: 0018:ffff880162e35700  EFLAGS: 00010086
Jul 12 15:01:03 [kernel] [10337.508263] RAX: 0000000000000286 RBX: ffff8801b5caae68 RCX: 0000000000000000
Jul 12 15:01:03 [kernel] [10337.508278] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff8801b5caae68
Jul 12 15:01:03 [kernel] [10337.508293] RBP: ffff880162e35740 R08: 0000000000000000 R09: 2222222222222222
Jul 12 15:01:03 [kernel] [10337.508309] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000286
Jul 12 15:01:03 [kernel] [10337.508325] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
Jul 12 15:01:03 [kernel] [10337.508341] FS:  0000000000000000(0000) GS:ffff8801bfc00000(0000) knlGS:0000000000000000
Jul 12 15:01:03 [kernel] [10337.508424] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 12 15:01:03 [kernel] [10337.508502] CR2: 0000000000000000 CR3: 00000001b7283000 CR4: 00000000000007f0
Jul 12 15:01:03 [kernel] [10337.508584] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 12 15:01:03 [kernel] [10337.508665] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 12 15:01:03 [kernel] [10337.508748] Process nfsd (pid: 3295, threadinfo ffff880162e34000, task ffff8801a045d890)
Jul 12 15:01:03 [kernel] [10337.508894] Stack:
Jul 12 15:01:03 [kernel] [10337.508965]  0000000000000286 0000000300000001 0000000000000003 ffff8801b5caae68
Jul 12 15:01:03 [kernel] [10337.509002]  0000000000000286 0000000000000003 0000000000000001 0000000000000000
Jul 12 15:01:03 [kernel] [10337.509002]  ffff880162e35780 ffffffff810aa543 2222222222222222 ffff8801b5caae38
Jul 12 15:01:03 [kernel] [10337.509002] Call Trace:
Jul 12 15:01:03 [kernel] [10337.509002]  [] __wake_up+0x43/0x70
Jul 12 15:01:03 [kernel] [10337.509002]  [] ? __init_waitqueue_head+0x40/0x40
Jul 12 15:01:03 [kernel] [10337.509002]  [] __cv_wait+0xe/0x10 [spl]
Jul 12 15:01:03 [kernel] [10337.509002]  [] zfs_range_lock+0x2b4/0x5f0 [zfs]
Jul 12 15:01:03 [kernel] [10337.509002]  [] ? wake_up_process+0x10/0x20
Jul 12 15:01:03 [kernel] [10337.509002]  [] zfs_write+0x59f/0xc40 [zfs]
Jul 12 15:01:03 [kernel] [10337.509002]  [] zpl_write_common+0x4d/0x80 [zfs]
Jul 12 15:01:03 [kernel] [10337.509002]  [] zpl_write+0x63/0x90 [zfs]
Jul 12 15:01:03 [kernel] [10337.509002]  [] ? zpl_write_common+0x80/0x80 [zfs]
Jul 12 15:01:03 [kernel] [10337.509002]  [] do_loop_readv_writev+0x51/0x80
Jul 12 15:01:03 [kernel] [10337.509002]  [] do_readv_writev+0x1ae/0x1d0
Jul 12 15:01:03 [kernel] [10337.509002]  [] ? zpl_release+0x60/0x60 [zfs]
Jul 12 15:01:03 [kernel] [10337.509002]  [] vfs_writev+0x37/0x50
Jul 12 15:01:03 [kernel] [10337.509002]  [] nfsd_vfs_write.isra.14+0xe9/0x2d0
Jul 12 15:01:03 [kernel] [10337.509002]  [] ? dentry_open+0x4a/0x80
Jul 12 15:01:03 [kernel] [10337.509002]  [] ? nfsd_open+0x159/0x1c0
Jul 12 15:01:03 [kernel] [10337.509002]  [] nfsd_write+0xf8/0x110
Jul 12 15:01:03 [kernel] [10337.509002]  [] nfsd3_proc_write+0x9a/0xd0
Jul 12 15:01:03 [kernel] [10337.509002]  [] nfsd_dispatch+0xbe/0x1c0
Jul 12 15:01:03 [kernel] [10337.509002]  [] svc_process_common+0x328/0x5e0
Jul 12 15:01:03 [kernel] [10337.509002]  [] ? try_to_wake_up+0x260/0x260
Jul 12 15:01:03 [kernel] [10337.509002]  [] svc_process+0x101/0x150
Jul 12 15:01:03 [kernel] [10337.509002]  [] nfsd+0xb5/0x150
Jul 12 15:01:03 [kernel] [10337.509002]  [] ? nfsd_get_default_max_blksize+0x60/0x60
Jul 12 15:01:03 [kernel] [10337.509002]  [] kthread+0x8e/0xa0
Jul 12 15:01:03 [kernel] [10337.509002]  [] kernel_thread_helper+0x4/0x10
Jul 12 15:01:03 [kernel] [10337.509002]  [] ? __init_kthread_worker+0x40/0x40
Jul 12 15:01:03 [kernel] [10337.509002]  [] ? gs_change+0xb/0xb
Jul 12 15:01:03 [kernel] [10337.509002] Code: 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 89 cf 41 56 4d 89 c6 41 55 41 54 53 48 89 fb 48 83 ec 18 89 55 c8 48 8b 57 18 89 75 cc <4c> 8b 22 48 8d 42 e8 49 83 ec 18 48 39 c7 75 12 eb 40 66 2e 0f
Jul 12 15:01:03 [kernel] [10337.509002] RSP
Jul 12 15:01:03 [kernel] [10337.509002] CR2: 0000000000000000
Jul 12 15:01:03 [kernel] [10337.509002] ---[ end trace 294b029d488b2405 ]---

gsaraber commented 12 years ago

ZFS Layout, prior to lockup:

      pool: faminepool
     state: DEGRADED
    status: One or more devices is currently being resilvered.  The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
      scan: resilver in progress since Wed Jul 11 16:14:26 2012
        104G scanned out of 18.7T at 54.1M/s, 99h57m to go
        4.96G resilvered, 0.54% done
    config:

    NAME             STATE     READ WRITE CKSUM
    faminepool       DEGRADED     0     0     0
      raidz1-0       ONLINE       0     0     0
        sdm          ONLINE       0     0     0
        sds          ONLINE       0     0     0
        sdr          ONLINE       0     0     0
        sdp          ONLINE       0     0     0
        sdq          ONLINE       0     0     0
      raidz1-1       ONLINE       0     0     0
        sdk          ONLINE       0     0     0
        sdb          ONLINE       0     0     0
        sdt          ONLINE       0     0     0
        sdo          ONLINE       0     0     0
        sdl          ONLINE       0     0     0
      raidz1-2       DEGRADED     0     0     0
        sdh          ONLINE       0     0     0
        sdg          ONLINE       0     0     0
        sdn          ONLINE       0     0     0
        replacing-3  UNAVAIL      0     0     0
          sdi        UNAVAIL      0     0     0
          md0        ONLINE       0     0     0  (resilvering)
      raidz1-3       ONLINE       0     0     0
        sdc          ONLINE       0     0     0
        sdd          ONLINE       0     0     0
        sde          ONLINE       0     0     0
        sdf          ONLINE       0     0     0
        sdj          ONLINE       0     0     0

errors: No known data errors

gsaraber commented 12 years ago

Also, the machine has 6 GB RAM and 19 GB swap, and I loaded ZFS with: options zfs zfs_arc_max=1073741824
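
For reference, a minimal sketch of how a zfs_arc_max setting like that is typically made persistent, assuming a modprobe.d-style config file (the /sys path below is only valid if the module exposes the parameter as writable at runtime, which is an assumption here):

    # /etc/modprobe.d/zfs.conf -- cap the ARC at 1 GiB (1073741824 bytes), as in this report
    options zfs zfs_arc_max=1073741824

    # To adjust on a running system without reloading the module (if supported):
    # echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_max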

gsaraber commented 12 years ago

More logs:

Jul 16 13:20:36 [kernel] [ 8238.091079] ------------[ cut here ]------------
Jul 16 13:20:36 [kernel] [ 8238.091079] WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x9a/0xc0()
Jul 16 13:20:36 [kernel] [ 8238.091079] Hardware name: X7DB8
Jul 16 13:20:36 [kernel] [ 8238.091079] Watchdog detected hard LOCKUP on cpu 2
Jul 16 13:20:36 [kernel] [ 8238.091079] Modules linked in: bonding zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) mptsas mptscsih megaraid_sas mptbase
Jul 16 13:20:36 [kernel] [ 8238.091079] Pid: 3039, comm: nfsd Tainted: P           O 3.4.4 #2
Jul 16 13:20:36 [kernel] [ 8238.091079] Call Trace:
Jul 16 13:20:36 [kernel] [ 8238.091079]  <NMI>  [<ffffffff8108193a>] warn_slowpath_common+0x7a/0xb0
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff81081a11>] warn_slowpath_fmt+0x41/0x50
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff810efc1a>] watchdog_overflow_callback+0x9a/0xc0
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff8111834c>] __perf_event_overflow+0x9c/0x220
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff81118c84>] perf_event_overflow+0x14/0x20
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff81055b14>] intel_pmu_handle_irq+0x184/0x2e0
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff8104fc61>] perf_event_nmi_handler+0x21/0x30
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff8104413a>] do_nmi+0x11a/0x3a0
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff81882062>] end_repeat_nmi+0x1a/0x1e
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff8146cb97>] ? delay_tsc+0x37/0x60

                - Last output repeated 2 times -
Jul 16 13:20:36 [kernel] [ 8238.091079]  <<EOE>>  [<ffffffff8146caca>] __delay+0xa/0x10
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff81473c5e>] do_raw_spin_lock+0x7e/0x140
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff81881991>] _raw_spin_lock_irqsave+0x11/0x20
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff810aa52d>] __wake_up+0x2d/0x70
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffffa006170d>] cv_wait_common+0x14d/0x1b0 [spl]
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff810a1ee0>] ? __init_waitqueue_head+0x40/0x40
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffffa006178e>] __cv_wait+0xe/0x10 [spl]
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffffa02263b4>] zfs_range_lock+0x2b4/0x5f0 [zfs]
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffffa0230b5f>] zfs_write+0x59f/0xc40 [zfs]
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffffa01aea4a>] ? dbuf_read+0x60a/0x800 [zfs]
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff8187fcc7>] ? __mutex_lock_slowpath+0x1f7/0x2c0
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffffa0243edd>] zpl_write_common+0x4d/0x80 [zfs]
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffffa0243f73>] zpl_write+0x63/0x90 [zfs]
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffffa0243f10>] ? zpl_write_common+0x80/0x80 [zfs]
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff81173191>] do_loop_readv_writev+0x51/0x80
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff811734ae>] do_readv_writev+0x1ae/0x1d0
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffffa0243aa0>] ? zpl_release+0x60/0x60 [zfs]
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff81173507>] vfs_writev+0x37/0x50
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff812bb589>] nfsd_vfs_write.isra.14+0xe9/0x2d0
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff81170eca>] ? dentry_open+0x4a/0x80
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff812bc5a9>] ? nfsd_open+0x159/0x1c0
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff812bdc68>] nfsd_write+0xf8/0x110
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff812c4cea>] nfsd3_proc_write+0x9a/0xd0
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff812b805e>] nfsd_dispatch+0xbe/0x1c0
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff81818c88>] svc_process_common+0x328/0x5e0
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff810b09d0>] ? try_to_wake_up+0x260/0x260
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff81819291>] svc_process+0x101/0x150
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff812b77f5>] nfsd+0xb5/0x150
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff812b7740>] ? nfsd_get_default_max_blksize+0x60/0x60
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff810a146e>] kthread+0x8e/0xa0
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff81883514>] kernel_thread_helper+0x4/0x10
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff810a13e0>] ? __init_kthread_worker+0x40/0x40
Jul 16 13:20:36 [kernel] [ 8238.091079]  [<ffffffff81883510>] ? gs_change+0xb/0xb
Jul 16 13:20:36 [kernel] [ 8238.091079] ---[ end trace 4dd9aece5d5ee6ed ]---
Jul 16 13:20:36 [kernel] [ 8238.091079] BUG: unable to handle kernel paging request at 0000081e00000194
Jul 16 13:20:37 [kernel] [ 8238.091079] IP: [<ffffffff8187b856>] spin_dump+0x61/0x8f
Jul 16 13:20:37 [kernel] [ 8238.091079] PGD 0 
Jul 16 13:20:37 [kernel] [ 8238.091079] Oops: 0000 [#1] SMP 
Jul 16 13:20:37 [kernel] [ 8238.091079] CPU 2 
Jul 16 13:20:37 [kernel] [ 8238.091079] Modules linked in: bonding zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) mptsas mptscsih megaraid_sas mptbase
Jul 16 13:20:37 [kernel] [ 8238.091079] 
Jul 16 13:20:37 [kernel] [ 8238.091079] RIP: 0010:[<ffffffff8187b856>]  [<ffffffff8187b856>] spin_dump+0x61/0x8f
Jul 16 13:20:37 [kernel] [ 8238.091079] RSP: 0018:ffff8801ae50b6e0  EFLAGS: 00010006
Jul 16 13:20:37 [kernel] [ 8238.091079] RAX: 000000000000003e RBX: ffff8801ae7ba268 RCX: ffffffff81c86ef6
Jul 16 13:20:37 [kernel] [ 8238.091079] RDX: 000000000000c6c6 RSI: 0000000000000082 RDI: ffffffff81e23bc0
Jul 16 13:20:37 [kernel] [ 8238.091079] RBP: ffff8801ae50b6f0 R08: 00000000ffffffff R09: 0000000000000000
Jul 16 13:20:37 [kernel] [ 8238.091079] R10: 0000000000000000 R11: 0000000000000000 R12: 0000081e00000000
Jul 16 13:20:37 [kernel] [ 8238.091079] R13: 000000008b0ebd58 R14: 0000000000000000 R15: 0000000000000000
Jul 16 13:20:37 [kernel] [ 8238.091079] FS:  0000000000000000(0000) GS:ffff8801bfd00000(0000) knlGS:0000000000000000
Jul 16 13:20:37 [kernel] [ 8238.091079] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 16 13:20:37 [kernel] [ 8238.091079] CR2: 0000081e00000194 CR3: 0000000189691000 CR4: 00000000000007e0
Jul 16 13:20:37 [kernel] [ 8238.091079] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 16 13:20:37 [kernel] [ 8238.091079] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 16 13:20:37 [kernel] [ 8238.091079] Process nfsd (pid: 3039, threadinfo ffff8801ae50a000, task ffff8801b81f3d50)
Jul 16 13:20:37 [kernel] [ 8238.091079] Stack:
Jul 16 13:20:37 [kernel] [ 8238.091079]  ffff8801ae7ba268 000000008b0ebd58 ffff8801ae50b720 ffffffff81473cd3
Jul 16 13:20:37 [kernel] [ 8238.091079]  0000000000000286 ffff88015d10d590 0000000000000003 0000000000000001
Jul 16 13:20:37 [kernel] [ 8238.091079]  ffff8801ae50b740 ffffffff81881991 2222222222222222 ffff8801ae7ba268
Jul 16 13:20:37 [kernel] [ 8238.091079] Call Trace:
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff81473cd3>] do_raw_spin_lock+0xf3/0x140
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff81881991>] _raw_spin_lock_irqsave+0x11/0x20
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff810aa52d>] __wake_up+0x2d/0x70
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffffa006170d>] cv_wait_common+0x14d/0x1b0 [spl]
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff810a1ee0>] ? __init_waitqueue_head+0x40/0x40
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffffa006178e>] __cv_wait+0xe/0x10 [spl]
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffffa02263b4>] zfs_range_lock+0x2b4/0x5f0 [zfs]
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffffa0230b5f>] zfs_write+0x59f/0xc40 [zfs]
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffffa01aea4a>] ? dbuf_read+0x60a/0x800 [zfs]
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff8187fcc7>] ? __mutex_lock_slowpath+0x1f7/0x2c0
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffffa0243edd>] zpl_write_common+0x4d/0x80 [zfs]
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffffa0243f73>] zpl_write+0x63/0x90 [zfs]
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffffa0243f10>] ? zpl_write_common+0x80/0x80 [zfs]
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff81173191>] do_loop_readv_writev+0x51/0x80
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff811734ae>] do_readv_writev+0x1ae/0x1d0
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffffa0243aa0>] ? zpl_release+0x60/0x60 [zfs]
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff81173507>] vfs_writev+0x37/0x50
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff812bb589>] nfsd_vfs_write.isra.14+0xe9/0x2d0
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff81170eca>] ? dentry_open+0x4a/0x80
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff812bc5a9>] ? nfsd_open+0x159/0x1c0
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff812bdc68>] nfsd_write+0xf8/0x110
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff812c4cea>] nfsd3_proc_write+0x9a/0xd0
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff812b805e>] nfsd_dispatch+0xbe/0x1c0
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff81818c88>] svc_process_common+0x328/0x5e0
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff810b09d0>] ? try_to_wake_up+0x260/0x260
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff81819291>] svc_process+0x101/0x150
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff812b77f5>] nfsd+0xb5/0x150
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff812b7740>] ? nfsd_get_default_max_blksize+0x60/0x60
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff810a146e>] kthread+0x8e/0xa0
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff81883514>] kernel_thread_helper+0x4/0x10
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff810a13e0>] ? __init_kthread_worker+0x40/0x40
Jul 16 13:20:37 [kernel] [ 8238.091079]  [<ffffffff81883510>] ? gs_change+0xb/0xb
Jul 16 13:20:37 [kernel] [ 8238.091079] Code: 00 48 8d 88 40 03 00 00 31 c0 65 8b 14 25 30 ce 00 00 e8 54 9b ff ff 41 83 c8 ff 4d 85 e4 44 8b 4b 08 48 c7 c1 f6 6e c8 81 74 10 <45> 8b 84 24 94 01 00 00 49 8d 8c 24 40 03 00 00 8b 53 04 48 89 
Jul 16 13:20:37 [kernel] [ 8238.091079] RIP  [<ffffffff8187b856>] spin_dump+0x61/0x8f
Jul 16 13:20:37 [kernel] [ 8238.091079]  RSP <ffff8801ae50b6e0>
Jul 16 13:20:37 [kernel] [ 8238.091079] CR2: 0000081e00000194
Jul 16 13:20:37 [kernel] [ 8238.091079] ---[ end trace 4dd9aece5d5ee6ee ]---
Jul 16 13:20:37 [kernel] [ 8412.011000] INFO: Stall ended before state dump start
Jul 16 13:20:37 [kernel] [ 8238.091079] }  (t=173935 jiffies)
Jul 16 13:20:37 [kernel] [ 8238.091079] sending NMI to all CPUs:
Jul 16 13:20:37 [kernel] [ 8412.011740] NMI backtrace for cpu 0
Jul 16 13:20:37 [kernel] [ 8412.011858] CPU 0 
Jul 16 13:20:37 [kernel] [ 8412.011907] Modules linked in: bonding zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) mptsas mptscsih megaraid_sas mptbase
Jul 16 13:20:37 [kernel] [ 8412.012009] 
Jul 16 13:20:37 [kernel] [ 8412.012009] Pid: 0, comm: swapper/0 Tainted: P      D W  O 3.4.4 #2 Supermicro X7DB8/X7DB8
Jul 16 13:20:37 [kernel] [ 8412.012009] RIP: 0010:[<ffffffff8108fb84>]  [<ffffffff8108fb84>] run_timer_softirq+0x64/0x370
Jul 16 13:20:37 [kernel] [ 8412.012009] RSP: 0018:ffff8801bfc03e50  EFLAGS: 00000002
Jul 16 13:20:37 [kernel] [ 8412.012009] RAX: 00000001007bc78a RBX: ffffffff81ff0ac0 RCX: 000000000000fbfa
Jul 16 13:20:37 [kernel] [ 8412.012009] RDX: 00000001007a5088 RSI: ffff8801bfc03e10 RDI: ffffffff81ff0ac0
Jul 16 13:20:37 [kernel] [ 8412.012009] RBP: ffff8801bfc03ed0 R08: 0000000000000000 R09: 0000000000000000
Jul 16 13:20:37 [kernel] [ 8412.012009] R10: 000000000002c581 R11: 0000000000000000 R12: ffff8801bfc03e90
Jul 16 13:20:37 [kernel] [ 8412.012009] R13: ffffffff81ff1330 R14: ffffffff81e01fd8 R15: ffff8801bfc03e90
Jul 16 13:20:37 [kernel] [ 8412.012009] FS:  0000000000000000(0000) GS:ffff8801bfc00000(0000) knlGS:0000000000000000
Jul 16 13:20:37 [kernel] [ 8412.012009] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 16 13:20:37 [kernel] [ 8412.012009] CR2: 00007fb8b228f000 CR3: 00000001a8e07000 CR4: 00000000000007f0
Jul 16 13:20:37 [kernel] [ 8412.012009] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 16 13:20:37 [kernel] [ 8412.012009] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 16 13:20:37 [kernel] [ 8412.012009] Process swapper/0 (pid: 0, threadinfo ffffffff81e00000, task ffffffff81e13020)
Jul 16 13:20:37 [kernel] [ 8412.012009] Stack:
Jul 16 13:20:37 [kernel] [ 8412.012009]  ffffffff81469534 ffff8801bfc03e88 ffff8801bfc0d800 ffff8801bfc0d650
Jul 16 13:20:37 [kernel] [ 8412.012009]  ffffffff81e01fd8 ffffffff81e01fd8 ffffffff8108ed70 ffff8801b7153680
Jul 16 13:20:37 [kernel] [ 8412.012009]  ffff8801bfc03e90 ffff8801bfc03e90 ffffffff810c2ab0 ffffffff81e04088
Jul 16 13:20:37 [kernel] [ 8412.012009] Call Trace:
Jul 16 13:20:37 [kernel] [ 8412.012009]  <IRQ> 
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81469534>] ? timerqueue_add+0x74/0xc0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8108ed70>] ? add_timer_on+0x100/0x100
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810c2ab0>] ? ktime_get+0x60/0xe0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81088599>] __do_softirq+0x99/0x210
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810adef5>] ? check_preempt_curr+0x75/0xa0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810c948f>] ? tick_program_event+0x1f/0x30
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8188360c>] call_softirq+0x1c/0x30
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8104263d>] do_softirq+0x4d/0x80
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81088955>] irq_exit+0x65/0x70
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8105e219>] smp_apic_timer_interrupt+0x69/0xa0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81882cc7>] apic_timer_interrupt+0x67/0x70
Jul 16 13:20:37 [kernel] [ 8412.012009]  <EOI> 
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81048de2>] ? mwait_idle+0xa2/0x230
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81049bd6>] cpu_idle+0xa6/0xc0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81856399>] rest_init+0x6d/0x74
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81ef3b3d>] start_kernel+0x32b/0x338
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81ef3612>] ? repair_env_string+0x5a/0x5a
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81ef3321>] x86_64_start_reservations+0x131/0x135
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81ef3415>] x86_64_start_kernel+0xf0/0xf7
Jul 16 13:20:37 [kernel] [ 8412.012009] Code: 1e 7f 00 65 4c 8b 34 25 b0 b9 00 00 4c 89 75 a8 4c 89 75 a0 0f 1f 00 48 8b 05 89 e4 e4 00 48 8b 53 20 48 39 d0 0f 88 1c 01 00 00 <41> 89 d5 41 81 e5 ff 00 00 00 0f 84 2c 01 00 00 4d 63 ed 48 83 
Jul 16 13:20:37 [kernel] [ 8412.012009] Call Trace:
Jul 16 13:20:37 [kernel] [ 8412.012009]  <IRQ>  [<ffffffff81469534>] ? timerqueue_add+0x74/0xc0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8108ed70>] ? add_timer_on+0x100/0x100
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810c2ab0>] ? ktime_get+0x60/0xe0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81088599>] __do_softirq+0x99/0x210
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810adef5>] ? check_preempt_curr+0x75/0xa0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810c948f>] ? tick_program_event+0x1f/0x30
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8188360c>] call_softirq+0x1c/0x30
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8104263d>] do_softirq+0x4d/0x80
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81088955>] irq_exit+0x65/0x70
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8105e219>] smp_apic_timer_interrupt+0x69/0xa0
Jul 16 13:20:37 [kernel] [ 8412.012009]  <EOI>  [<ffffffff81048de2>] ? mwait_idle+0xa2/0x230
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81049bd6>] cpu_idle+0xa6/0xc0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81856399>] rest_init+0x6d/0x74
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81ef3b3d>] start_kernel+0x32b/0x338
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81ef3612>] ? repair_env_string+0x5a/0x5a
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81ef3321>] x86_64_start_reservations+0x131/0x135
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81ef3415>] x86_64_start_kernel+0xf0/0xf7
Jul 16 13:20:37 [kernel] [ 8238.091079] NMI backtrace for cpu 2
Jul 16 13:20:37 [kernel] [ 8238.091079] CPU 2 
Jul 16 13:20:37 [kernel] [ 8238.091079] Modules linked in: bonding zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) mptsas mptscsih megaraid_sas mptbase
Jul 16 13:20:37 [kernel] [ 8412.012009] 
Jul 16 13:20:37 [kernel] [ 8412.012009] Pid: 3039, comm: nfsd Tainted: P      D W  O 3.4.4 #2 Supermicro X7DB8/X7DB8
Jul 16 13:20:37 [kernel] [ 8412.012009] RIP: 0010:[<ffffffff8146cb75>]  [<ffffffff8146cb75>] delay_tsc+0x15/0x60
Jul 16 13:20:37 [kernel] [ 8412.012009] RSP: 0018:ffff8801bfd03d98  EFLAGS: 00000807
Jul 16 13:20:37 [kernel] [ 8412.012009] RAX: 00000000043ced83 RBX: 0000000000002710 RCX: 000000000000d5d4
Jul 16 13:20:37 [kernel] [ 8412.012009] RDX: 000000000000121c RSI: 0000000000000096 RDI: 0000000000239ae5
Jul 16 13:20:37 [kernel] [ 8412.012009] RBP: ffff8801bfd03d98 R08: 0000000000000002 R09: 0000000000000001
Jul 16 13:20:37 [kernel] [ 8412.012009] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81e2e9c0
Jul 16 13:20:37 [kernel] [ 8412.012009] R13: ffffffff81e2ea80 R14: ffff8801bfd03f50 R15: 0000000000000000
Jul 16 13:20:37 [kernel] [ 8412.012009] FS:  0000000000000000(0000) GS:ffff8801bfd00000(0000) knlGS:0000000000000000
Jul 16 13:20:37 [kernel] [ 8412.012009] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 16 13:20:37 [kernel] [ 8412.012009] CR2: 0000081e00000194 CR3: 0000000189691000 CR4: 00000000000007e0
Jul 16 13:20:37 [kernel] [ 8412.012009] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 16 13:20:37 [kernel] [ 8412.012009] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 16 13:20:37 [kernel] [ 8412.012009] Process nfsd (pid: 3039, threadinfo ffff8801ae50a000, task ffff8801b81f3d50)
Jul 16 13:20:37 [kernel] [ 8412.012009] Stack:
Jul 16 13:20:37 [kernel] [ 8412.012009]  ffff8801bfd03da8 ffffffff8146caf8 ffff8801bfd03dc8 ffffffff8105effa
Jul 16 13:20:37 [kernel] [ 8412.012009]  0000000000000000 ffff8801bfd0db20 ffff8801bfd03e18 ffffffff810f6c6a
Jul 16 13:20:37 [kernel] [ 8412.012009]  0000000000000008 0000000000000092 ffff8801bfd03e18 0000000000000002
Jul 16 13:20:37 [kernel] [ 8412.012009] Call Trace:
Jul 16 13:20:37 [kernel] [ 8412.012009]  <IRQ> 
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8146caf8>] __const_udelay+0x28/0x30
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8105effa>] arch_trigger_all_cpu_backtrace+0x6a/0x90
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810f6c6a>] __rcu_pending+0x19a/0x4d0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810f7460>] rcu_check_callbacks+0xb0/0x1a0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81090863>] update_process_times+0x43/0x80
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810c9678>] tick_sched_timer+0x58/0x140
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810a57ce>] __run_hrtimer+0x6e/0x1b0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810c9620>] ? tick_init_highres+0x20/0x20
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810adf04>] ? check_preempt_curr+0x84/0xa0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810a5e73>] hrtimer_interrupt+0xf3/0x220
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8105e214>] smp_apic_timer_interrupt+0x64/0xa0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81882cc7>] apic_timer_interrupt+0x67/0x70
Jul 16 13:20:37 [kernel] [ 8412.012009]  <EOI> 
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81881991>] ? _raw_spin_lock_irqsave+0x11/0x20
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8188191b>] ? _raw_spin_unlock_irq+0xb/0x10
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810d5b91>] ? acct_collect+0x171/0x1b0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81085d83>] do_exit+0x653/0x900
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81083957>] ? kmsg_dump+0x57/0xf0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81043bee>] oops_end+0x9e/0xe0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81874ce9>] no_context+0x271/0x280
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810a6f05>] ? atomic_notifier_call_chain+0x15/0x20
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81874ebe>] __bad_area_nosemaphore+0x1c6/0x1e5
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81874eeb>] bad_area_nosemaphore+0xe/0x10
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8106894d>] do_page_fault+0x2ad/0x430
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81082595>] ? console_unlock+0x1e5/0x260
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81881d6f>] page_fault+0x1f/0x30
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8187b856>] ? spin_dump+0x61/0x8f
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffffa006170d>] cv_wait_common+0x14d/0x1b0 [spl]
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810a1ee0>] ? __init_waitqueue_head+0x40/0x40
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffffa006178e>] __cv_wait+0xe/0x10 [spl]
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffffa02263b4>] zfs_range_lock+0x2b4/0x5f0 [zfs]
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffffa0230b5f>] zfs_write+0x59f/0xc40 [zfs]
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffffa01aea4a>] ? dbuf_read+0x60a/0x800 [zfs]
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8187fcc7>] ? __mutex_lock_slowpath+0x1f7/0x2c0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffffa0243edd>] zpl_write_common+0x4d/0x80 [zfs]
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffffa0243f73>] zpl_write+0x63/0x90 [zfs]
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffffa0243f10>] ? zpl_write_common+0x80/0x80 [zfs]
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81173191>] do_loop_readv_writev+0x51/0x80
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff811734ae>] do_readv_writev+0x1ae/0x1d0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffffa0243aa0>] ? zpl_release+0x60/0x60 [zfs]
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81173507>] vfs_writev+0x37/0x50
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff812bb589>] nfsd_vfs_write.isra.14+0xe9/0x2d0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81170eca>] ? dentry_open+0x4a/0x80
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff812bc5a9>] ? nfsd_open+0x159/0x1c0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff812bdc68>] nfsd_write+0xf8/0x110
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff812c4cea>] nfsd3_proc_write+0x9a/0xd0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff812b805e>] nfsd_dispatch+0xbe/0x1c0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81818c88>] svc_process_common+0x328/0x5e0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810b09d0>] ? try_to_wake_up+0x260/0x260
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81819291>] svc_process+0x101/0x150
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff812b77f5>] nfsd+0xb5/0x150
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff812b7740>] ? nfsd_get_default_max_blksize+0x60/0x60
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810a146e>] kthread+0x8e/0xa0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81883514>] kernel_thread_helper+0x4/0x10
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810a13e0>] ? __init_kthread_worker+0x40/0x40
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81883510>] ? gs_change+0xb/0xb
Jul 16 13:20:37 [kernel] [ 8412.012009] Code: 00 00 f7 e2 48 8d 7a 01 ff 15 a0 68 9f 00 5d c3 66 0f 1f 44 00 00 55 65 44 8b 04 25 30 ce 00 00 48 89 e5 66 66 90 0f ae e8 0f 31 <89> c6 eb 16 0f 1f 80 00 00 00 00 f3 90 65 8b 0c 25 30 ce 00 00 
Jul 16 13:20:37 [kernel] [ 8412.012009] Call Trace:
Jul 16 13:20:37 [kernel] [ 8412.012009]  <IRQ>  [<ffffffff8146caf8>] __const_udelay+0x28/0x30
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8105effa>] arch_trigger_all_cpu_backtrace+0x6a/0x90
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810f6c6a>] __rcu_pending+0x19a/0x4d0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810f7460>] rcu_check_callbacks+0xb0/0x1a0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81090863>] update_process_times+0x43/0x80
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810c9678>] tick_sched_timer+0x58/0x140
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810a57ce>] __run_hrtimer+0x6e/0x1b0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810c9620>] ? tick_init_highres+0x20/0x20
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810adf04>] ? check_preempt_curr+0x84/0xa0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810a5e73>] hrtimer_interrupt+0xf3/0x220
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8105e214>] smp_apic_timer_interrupt+0x64/0xa0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81882cc7>] apic_timer_interrupt+0x67/0x70
Jul 16 13:20:37 [kernel] [ 8412.012009]  <EOI>  [<ffffffff81881991>] ? _raw_spin_lock_irqsave+0x11/0x20
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8188191b>] ? _raw_spin_unlock_irq+0xb/0x10
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810d5b91>] ? acct_collect+0x171/0x1b0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81085d83>] do_exit+0x653/0x900
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81083957>] ? kmsg_dump+0x57/0xf0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81043bee>] oops_end+0x9e/0xe0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81874ce9>] no_context+0x271/0x280
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810a6f05>] ? atomic_notifier_call_chain+0x15/0x20
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81874ebe>] __bad_area_nosemaphore+0x1c6/0x1e5
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81874eeb>] bad_area_nosemaphore+0xe/0x10
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8106894d>] do_page_fault+0x2ad/0x430
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81082595>] ? console_unlock+0x1e5/0x260
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81881d6f>] page_fault+0x1f/0x30
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8187b856>] ? spin_dump+0x61/0x8f
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810a1ee0>] ? __init_waitqueue_head+0x40/0x40
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffffa006178e>] __cv_wait+0xe/0x10 [spl]
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffffa02263b4>] zfs_range_lock+0x2b4/0x5f0 [zfs]
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffffa0230b5f>] zfs_write+0x59f/0xc40 [zfs]
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffffa01aea4a>] ? dbuf_read+0x60a/0x800 [zfs]
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff8187fcc7>] ? __mutex_lock_slowpath+0x1f7/0x2c0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffffa0243edd>] zpl_write_common+0x4d/0x80 [zfs]
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffffa0243f73>] zpl_write+0x63/0x90 [zfs]
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffffa0243f10>] ? zpl_write_common+0x80/0x80 [zfs]
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81173191>] do_loop_readv_writev+0x51/0x80
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff811734ae>] do_readv_writev+0x1ae/0x1d0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffffa0243aa0>] ? zpl_release+0x60/0x60 [zfs]
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81173507>] vfs_writev+0x37/0x50
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff812bb589>] nfsd_vfs_write.isra.14+0xe9/0x2d0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81170eca>] ? dentry_open+0x4a/0x80
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff812bc5a9>] ? nfsd_open+0x159/0x1c0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff812bdc68>] nfsd_write+0xf8/0x110
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff812c4cea>] nfsd3_proc_write+0x9a/0xd0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff812b805e>] nfsd_dispatch+0xbe/0x1c0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81818c88>] svc_process_common+0x328/0x5e0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810b09d0>] ? try_to_wake_up+0x260/0x260
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81819291>] svc_process+0x101/0x150
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff812b77f5>] nfsd+0xb5/0x150
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff812b7740>] ? nfsd_get_default_max_blksize+0x60/0x60
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810a146e>] kthread+0x8e/0xa0
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81883514>] kernel_thread_helper+0x4/0x10
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff810a13e0>] ? __init_kthread_worker+0x40/0x40
Jul 16 13:20:37 [kernel] [ 8412.012009]  [<ffffffff81883510>] ? gs_change+0xb/0xb
Jul 16 13:20:37 [kernel] [ 8412.012001] NMI backtrace for cpu 1
Jul 16 13:20:37 [kernel] [ 8412.012001] CPU 1 
Jul 16 13:20:37 [kernel] [ 8412.012001] Modules linked in: bonding zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) mptsas mptscsih megaraid_sas mptbase
Jul 16 13:20:37 [kernel] [ 8412.012001] 
Jul 16 13:20:37 [kernel] [ 8412.012001] Pid: 0, comm: swapper/1 Tainted: P      D W  O 3.4.4 #2 Supermicro X7DB8/X7DB8
Jul 16 13:20:37 [kernel] [ 8412.012001] RIP: 0010:[<ffffffff8108fbb6>]  [<ffffffff8108fbb6>] run_timer_softirq+0x96/0x370
Jul 16 13:20:37 [kernel] [ 8412.012001] RSP: 0018:ffff8801bfc83e50  EFLAGS: 00000086
Jul 16 13:20:37 [kernel] [ 8412.012001] RAX: ffff8801b80ac9c0 RBX: ffff8801b80ac000 RCX: 000000000000a2a1
Jul 16 13:20:37 [kernel] [ 8412.012001] RDX: ffff8801b80ac9c0 RSI: ffff8801bfc83e10 RDI: ffff8801b80ac000
Jul 16 13:20:37 [kernel] [ 8412.012001] RBP: ffff8801bfc83ed0 R08: ffff8801bfc8d2c0 R09: ffffffff81714372
Jul 16 13:20:37 [kernel] [ 8412.012001] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801bfc83e90
Jul 16 13:20:37 [kernel] [ 8412.012001] R13: ffff8801b80ac990 R14: ffff8801b80f3fd8 R15: ffff8801bfc83e90
Jul 16 13:20:37 [kernel] [ 8412.012001] FS:  0000000000000000(0000) GS:ffff8801bfc80000(0000) knlGS:0000000000000000
Jul 16 13:20:37 [kernel] [ 8412.012001] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 16 13:20:37 [kernel] [ 8412.012001] CR2: 00007fb8b228f000 CR3: 00000001b6061000 CR4: 00000000000007e0
Jul 16 13:20:37 [kernel] [ 8412.012001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 16 13:20:37 [kernel] [ 8412.012001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 16 13:20:37 [kernel] [ 8412.012001] Process swapper/1 (pid: 0, threadinfo ffff8801b80f2000, task ffff8801b807caf0)
Jul 16 13:20:37 [kernel] [ 8412.012001] Stack:
Jul 16 13:20:37 [kernel] [ 8412.012001]  ffffffff81469534 ffff8801bfc83e88 ffff8801bfc8d800 ffff8801bfc8d650
Jul 16 13:20:37 [kernel] [ 8412.012001]  ffff8801b80f3fd8 ffff8801b80f3fd8 ffffffff817d6740 0000000000000000
Jul 16 13:20:37 [kernel] [ 8412.012001]  ffff8801b80ac9c0 ffff8801bfc83e90 ffffffff810c2ab0 ffffffff81e04088
Jul 16 13:20:37 [kernel] [ 8412.012001] Call Trace:
Jul 16 13:20:37 [kernel] [ 8412.012001]  <IRQ> 
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff81469534>] ? timerqueue_add+0x74/0xc0
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff817d6740>] ? ipv6_del_addr+0x3f0/0x3f0
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff810c2ab0>] ? ktime_get+0x60/0xe0
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff81088599>] __do_softirq+0x99/0x210
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff810c948f>] ? tick_program_event+0x1f/0x30
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff81882cc7>] apic_timer_interrupt+0x67/0x70
Jul 16 13:20:37 [kernel] [ 8412.012001]  <EOI> 
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff81048de2>] ? mwait_idle+0xa2/0x230
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff81049bd6>] cpu_idle+0xa6/0xc0
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff8186ca9a>] start_secondary+0x1a9/0x1ad
Jul 16 13:20:37 [kernel] [ 8412.012001] Code: 00 00 00 0f 84 2c 01 00 00 4d 63 ed 48 83 c2 01 49 c1 e5 04 48 89 53 20 49 01 dd 49 8b 55 30 49 8d 45 30 48 89 55 c0 4c 89 62 08 <49> 8b 55 38 48 89 55 c8 4c 89 22 49 89 45 30 4c 8b 7d c0 48 89 
Jul 16 13:20:37 [kernel] [ 8412.012001] Call Trace:
Jul 16 13:20:37 [kernel] [ 8412.012001]  <IRQ>  [<ffffffff81469534>] ? timerqueue_add+0x74/0xc0
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff817d6740>] ? ipv6_del_addr+0x3f0/0x3f0
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff810c2ab0>] ? ktime_get+0x60/0xe0
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff81088599>] __do_softirq+0x99/0x210
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff810c948f>] ? tick_program_event+0x1f/0x30
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff8188360c>] call_softirq+0x1c/0x30
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff8104263d>] do_softirq+0x4d/0x80
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff81088955>] irq_exit+0x65/0x70
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff8105e219>] smp_apic_timer_interrupt+0x69/0xa0
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff81882cc7>] apic_timer_interrupt+0x67/0x70
Jul 16 13:20:37 [kernel] [ 8412.012001]  <EOI>  [<ffffffff81048de2>] ? mwait_idle+0xa2/0x230
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff81049bd6>] cpu_idle+0xa6/0xc0
Jul 16 13:20:37 [kernel] [ 8412.012001]  [<ffffffff8186ca9a>] start_secondary+0x1a9/0x1ad
Jul 16 13:20:37 [kernel] [ 8412.012002] NMI backtrace for cpu 3
Jul 16 13:20:37 [kernel] [ 8412.012002] CPU 3 
Jul 16 13:20:37 [kernel] [ 8412.012002] Modules linked in: bonding zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) mptsas mptscsih megaraid_sas mptbase
Jul 16 13:20:37 [kernel] [ 8412.012002] 
Jul 16 13:20:37 [kernel] [ 8412.012002] Pid: 0, comm: swapper/3 Tainted: P      D W  O 3.4.4 #2 Supermicro X7DB8/X7DB8
Jul 16 13:20:37 [kernel] [ 8412.012002] RIP: 0010:[<ffffffff8108fbb6>]  [<ffffffff8108fbb6>] run_timer_softirq+0x96/0x370
Jul 16 13:20:37 [kernel] [ 8412.012002] RSP: 0018:ffff8801bfd83e50  EFLAGS: 00000082
Jul 16 13:20:37 [kernel] [ 8412.012002] RAX: ffff8801b811c970 RBX: ffff8801b811c000 RCX: 0000000000007776
Jul 16 13:20:37 [kernel] [ 8412.012002] RDX: ffff8801b811c970 RSI: ffff8801bfd83e10 RDI: ffff8801b811c000
Jul 16 13:20:37 [kernel] [ 8412.012002] RBP: ffff8801bfd83ed0 R08: 0000000000000000 R09: ffffffff81714372
Jul 16 13:20:37 [kernel] [ 8412.012002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801bfd83e90
Jul 16 13:20:37 [kernel] [ 8412.062525] mptscsih: ioc0: attempting task abort! (sc=ffff8801a85d9a00)
Jul 16 13:20:37 [kernel] [ 8412.062530] sd 9:0:4:0: [sds] CDB: Read(10): 28 00 b9 12 b2 78 00 00 08 00
Jul 16 13:20:37 [kernel] [ 8412.062553] R13: ffff8801b811c940 R14: ffff8801b8123fd8 R15: ffff8801bfd83e90
Jul 16 13:20:37 [kernel] [ 8412.062553] FS:  0000000000000000(0000) GS:ffff8801bfd80000(0000) knlGS:0000000000000000
Jul 16 13:20:37 [kernel] [ 8412.062553] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 16 13:20:37 [kernel] [ 8412.062553] CR2: 00007fb8b228f000 CR3: 00000001b4b1c000 CR4: 00000000000007e0
Jul 16 13:20:37 [kernel] [ 8412.062553] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 16 13:20:37 [kernel] [ 8412.062553] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 16 13:20:37 [kernel] [ 8412.062553] Process swapper/3 (pid: 0, threadinfo ffff8801b8122000, task ffff8801b8111470)
Jul 16 13:20:37 [kernel] [ 8412.062553] Stack:
Jul 16 13:20:37 [kernel] [ 8412.062553]  ffffffff81469534 ffff8801bfd83e88 ffff8801bfd8d800 ffff8801bfd8d650
Jul 16 13:20:37 [kernel] [ 8412.062553]  ffff8801b8123fd8 ffff8801b8123fd8 ffffffff81449ca0 ffff8801b55607e0
Jul 16 13:20:37 [kernel] [ 8412.062553]  ffff8801b811c970 ffff8801bfd83e90 ffffffff810c2ab0 ffffffff81e04088
Jul 16 13:20:37 [kernel] [ 8412.062553] Call Trace:
Jul 16 13:20:37 [kernel] [ 8412.062553]  <IRQ> 
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff81469534>] ? timerqueue_add+0x74/0xc0
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff81449ca0>] ? blk_abort_queue+0x120/0x120
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff810c2ab0>] ? ktime_get+0x60/0xe0
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff81088599>] __do_softirq+0x99/0x210
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff810c948f>] ? tick_program_event+0x1f/0x30
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff8188360c>] call_softirq+0x1c/0x30
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff8104263d>] do_softirq+0x4d/0x80
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff81088955>] irq_exit+0x65/0x70
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff8105e219>] smp_apic_timer_interrupt+0x69/0xa0
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff81882cc7>] apic_timer_interrupt+0x67/0x70
Jul 16 13:20:37 [kernel] [ 8412.062553]  <EOI> 
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff81049bd6>] cpu_idle+0xa6/0xc0
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff8186ca9a>] start_secondary+0x1a9/0x1ad
Jul 16 13:20:37 [kernel] [ 8412.062553] Code: 00 00 00 0f 84 2c 01 00 00 4d 63 ed 48 83 c2 01 49 c1 e5 04 48 89 53 20 49 01 dd 49 8b 55 30 49 8d 45 30 48 89 55 c0 4c 89 62 08 <49> 8b 55 38 48 89 55 c8 4c 89 22 49 89 45 30 4c 8b 7d c0 48 89 
Jul 16 13:20:37 [kernel] [ 8412.062553] Call Trace:
Jul 16 13:20:37 [kernel] [ 8412.062553]  <IRQ>  [<ffffffff81469534>] ? timerqueue_add+0x74/0xc0
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff81449ca0>] ? blk_abort_queue+0x120/0x120
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff810c2ab0>] ? ktime_get+0x60/0xe0
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff81088599>] __do_softirq+0x99/0x210
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff810c948f>] ? tick_program_event+0x1f/0x30
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff8188360c>] call_softirq+0x1c/0x30
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff8104263d>] do_softirq+0x4d/0x80
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff81088955>] irq_exit+0x65/0x70
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff8105e219>] smp_apic_timer_interrupt+0x69/0xa0
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff81882cc7>] apic_timer_interrupt+0x67/0x70
Jul 16 13:20:37 [kernel] [ 8412.062553]  <EOI>  [<ffffffff81048de2>] ? mwait_idle+0xa2/0x230
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff81049bd6>] cpu_idle+0xa6/0xc0
Jul 16 13:20:37 [kernel] [ 8412.062553]  [<ffffffff8186ca9a>] start_secondary+0x1a9/0x1ad
Jul 16 13:20:37 [kernel] [ 8412.416299] mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff8801a85d9a00)

gsaraber commented 12 years ago

Jul 16 14:39:37 [kernel] [ 4236.279031] BUG: unable to handle kernel NULL pointer dereference at           (null)
Jul 16 14:39:37 [kernel] [ 4236.279057] IP: [<ffffffff810a93a4>] __wake_up_common+0x24/0x90
Jul 16 14:39:37 [kernel] [ 4236.279078] PGD 14c921067 PUD 14610c067 PMD 0 
Jul 16 14:39:37 [kernel] [ 4236.279095] Oops: 0000 [#1] SMP 
Jul 16 14:39:37 [kernel] [ 4236.279109] CPU 0 
Jul 16 14:39:37 [kernel] [ 4236.279115] Modules linked in: bonding zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) mptsas mptscsih megaraid_sas mptbase
Jul 16 14:39:37 [kernel] [ 4236.279164] 
Jul 16 14:39:37 [kernel] [ 4236.279170] Pid: 3125, comm: nfsd Tainted: P           O 3.4.4 #2 Supermicro X7DB8/X7DB8
Jul 16 14:39:37 [kernel] [ 4236.279193] RIP: 0010:[<ffffffff810a93a4>]  [<ffffffff810a93a4>] __wake_up_common+0x24/0x90
Jul 16 14:39:37 [kernel] [ 4236.279213] RSP: 0018:ffff88018a72d700  EFLAGS: 00010086
Jul 16 14:39:37 [kernel] [ 4236.279225] RAX: 0000000000000286 RBX: ffff88013a9dae68 RCX: 0000000000000000
Jul 16 14:39:37 [kernel] [ 4236.279240] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88013a9dae68
Jul 16 14:39:37 [kernel] [ 4236.279255] RBP: ffff88018a72d740 R08: 0000000000000000 R09: 2222222222222222
Jul 16 14:39:37 [kernel] [ 4236.279271] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000286
Jul 16 14:39:37 [kernel] [ 4236.279286] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
Jul 16 14:39:37 [kernel] [ 4236.279303] FS:  0000000000000000(0000) GS:ffff8801bfc00000(0000) knlGS:0000000000000000
Jul 16 14:39:37 [kernel] [ 4236.279385] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 16 14:39:37 [kernel] [ 4236.279462] CR2: 0000000000000000 CR3: 0000000158eff000 CR4: 00000000000007f0
Jul 16 14:39:37 [kernel] [ 4236.279542] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 16 14:39:37 [kernel] [ 4236.279621] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 16 14:39:37 [kernel] [ 4236.279702] Process nfsd (pid: 3125, threadinfo ffff88018a72c000, task ffff8801b68fcaf0)
Jul 16 14:39:37 [kernel] [ 4236.279848] Stack:
Jul 16 14:39:37 [kernel] [ 4236.279919]  0000000000000286 0000000300000001 0000000000000003 ffff88013a9dae68
Jul 16 14:39:37 [kernel] [ 4236.280002]  0000000000000286 0000000000000003 0000000000000001 0000000000000000
Jul 16 14:39:37 [kernel] [ 4236.280002]  ffff88018a72d780 ffffffff810aa543 2222222222222222 ffff88013a9dae38
Jul 16 14:39:37 [kernel] [ 4236.280002] Call Trace:
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff810aa543>] __wake_up+0x43/0x70
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffffa005d70d>] cv_wait_common+0x14d/0x1b0 [spl]
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff810a1ee0>] ? __init_waitqueue_head+0x40/0x40
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffffa005d78e>] __cv_wait+0xe/0x10 [spl]
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffffa020a3b4>] zfs_range_lock+0x2b4/0x5f0 [zfs]
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffffa0214b5f>] zfs_write+0x59f/0xc40 [zfs]
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff81881919>] ? _raw_spin_unlock_irq+0x9/0x10
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffffa0227edd>] zpl_write_common+0x4d/0x80 [zfs]
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffffa0227f73>] zpl_write+0x63/0x90 [zfs]
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffffa0227f10>] ? zpl_write_common+0x80/0x80 [zfs]
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff81173191>] do_loop_readv_writev+0x51/0x80
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff811734ae>] do_readv_writev+0x1ae/0x1d0
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffffa0227aa0>] ? zpl_release+0x60/0x60 [zfs]
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff81173507>] vfs_writev+0x37/0x50
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff812bb589>] nfsd_vfs_write.isra.14+0xe9/0x2d0
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff81170eca>] ? dentry_open+0x4a/0x80
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff812bc5a9>] ? nfsd_open+0x159/0x1c0
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff812bdc68>] nfsd_write+0xf8/0x110
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff812c4cea>] nfsd3_proc_write+0x9a/0xd0
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff812b805e>] nfsd_dispatch+0xbe/0x1c0
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff81818c88>] svc_process_common+0x328/0x5e0
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff810b09d0>] ? try_to_wake_up+0x260/0x260
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff81819291>] svc_process+0x101/0x150
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff812b77f5>] nfsd+0xb5/0x150
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff812b7740>] ? nfsd_get_default_max_blksize+0x60/0x60
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff810a146e>] kthread+0x8e/0xa0
Jul 16 14:39:37 [kernel] [ 4236.280002]  [<ffffffff81883514>] kernel_thread_helper+0x4/0x10
Jul 16 14:39:37 [kernel] [ 4236.284307]  [<ffffffff810a13e0>] ? __init_kthread_worker+0x40/0x40
Jul 16 14:39:37 [kernel] [ 4236.284307]  [<ffffffff81883510>] ? gs_change+0xb/0xb
Jul 16 14:39:37 [kernel] [ 4236.284307] RIP  [<ffffffff810a93a4>] __wake_up_common+0x24/0x90
Jul 16 14:39:37 [kernel] [ 4236.284307]  RSP <ffff88018a72d700>

gsaraber commented 12 years ago

Another one this morning: about 3 minutes after the BUG below, NFS quits working and a reboot is required. The reboot command doesn't actually work, so I have to reboot using Ctrl+Alt+SysRq+s/u/b.
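
For reference, a rough sketch of the same sync / remount-read-only / reboot sequence triggered from a root shell instead of the keyboard, assuming magic SysRq is enabled via the kernel.sysrq sysctl:

    # Enable magic SysRq if the distro hasn't already (assumption: not locked down)
    echo 1 > /proc/sys/kernel/sysrq
    # s = emergency sync, u = remount all filesystems read-only, b = immediate reboot
    echo s > /proc/sysrq-trigger
    echo u > /proc/sysrq-trigger
    echo b > /proc/sysrq-trigger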

Linux shrapnel 3.4.4 #2 SMP Mon Jul 16 10:14:24 CDT 2012 x86_64 Intel(R) Xeon(R) CPU 5140 @ 2.33GHz GenuineIntel GNU/Linux
6 GB RAM
options zfs zfs_arc_max=2147483648

Jul 19 08:35:21 [kernel] [61407.923034] BUG: unable to handle kernel NULL pointer dereference at           (null)
Jul 19 08:35:21 [kernel] [61407.923060] IP: [<ffffffff810a93a4>] __wake_up_common+0x24/0x90
Jul 19 08:35:21 [kernel] [61407.923082] PGD 1af160067 PUD 1b20f0067 PMD 0 
Jul 19 08:35:21 [kernel] [61407.923099] Oops: 0000 [#1] SMP 
Jul 19 08:35:21 [kernel] [61407.923112] CPU 0 
Jul 19 08:35:21 [kernel] [61407.923118] Modules linked in: bonding zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) megaraid_sas mptsas mptscsih mptbase
Jul 19 08:35:21 [kernel] [61407.923167] 
Jul 19 08:35:21 [kernel] [61407.923174] Pid: 3107, comm: nfsd Tainted: P           O 3.4.4 #2 Supermicro X7DB8/X7DB8
Jul 19 08:35:21 [kernel] [61407.923197] RIP: 0010:[<ffffffff810a93a4>]  [<ffffffff810a93a4>] __wake_up_common+0x24/0x90
Jul 19 08:35:21 [kernel] [61407.923217] RSP: 0018:ffff8801a0113700  EFLAGS: 00010086
Jul 19 08:35:21 [kernel] [61407.923229] RAX: 0000000000000286 RBX: ffff8801b6119e68 RCX: 0000000000000000
Jul 19 08:35:21 [kernel] [61407.923244] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff8801b6119e68
Jul 19 08:35:21 [kernel] [61407.923259] RBP: ffff8801a0113740 R08: 0000000000000000 R09: 2222222222222222
Jul 19 08:35:21 [kernel] [61407.923274] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000286
Jul 19 08:35:21 [kernel] [61407.923289] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
Jul 19 08:35:21 [kernel] [61407.923305] FS:  0000000000000000(0000) GS:ffff8801bfc00000(0000) knlGS:0000000000000000
Jul 19 08:35:21 [kernel] [61407.923387] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 19 08:35:21 [kernel] [61407.923465] CR2: 0000000000000000 CR3: 00000001b4bd1000 CR4: 00000000000007f0
Jul 19 08:35:21 [kernel] [61407.923545] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 19 08:35:21 [kernel] [61407.923626] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 19 08:35:21 [kernel] [61407.923706] Process nfsd (pid: 3107, threadinfo ffff8801a0112000, task ffff88019ef3a210)
Jul 19 08:35:21 [kernel] [61407.923852] Stack:
Jul 19 08:35:21 [kernel] [61407.923923]  0000000000000286 0000000300000001 0000000000000003 ffff8801b6119e68
Jul 19 08:35:21 [kernel] [61407.924003]  0000000000000286 0000000000000003 0000000000000001 0000000000000000
Jul 19 08:35:21 [kernel] [61407.924003]  ffff8801a0113780 ffffffff810aa543 2222222222222222 ffff8801b6119e38
Jul 19 08:35:21 [kernel] [61407.924003] Call Trace:
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff810aa543>] __wake_up+0x43/0x70
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff810a1ee0>] ? __init_waitqueue_head+0x40/0x40
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffffa006578e>] __cv_wait+0xe/0x10 [spl]
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffffa03da3b4>] zfs_range_lock+0x2b4/0x5f0 [zfs]
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffffa03e4b5f>] zfs_write+0x59f/0xc40 [zfs]
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff810b093c>] ? try_to_wake_up+0x1cc/0x260
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff8187fcc7>] ? __mutex_lock_slowpath+0x1f7/0x2c0
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff81881949>] ? _raw_spin_unlock+0x9/0x10
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff8187fcc7>] ? __mutex_lock_slowpath+0x1f7/0x2c0
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffffa0066c3c>] ? tsd_exit+0x19c/0x1d0 [spl]
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffffa03f7edd>] zpl_write_common+0x4d/0x80 [zfs]
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffffa03f7f73>] zpl_write+0x63/0x90 [zfs]
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffffa03f7f10>] ? zpl_write_common+0x80/0x80 [zfs]
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff81173191>] do_loop_readv_writev+0x51/0x80
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff811734ae>] do_readv_writev+0x1ae/0x1d0
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffffa03f7aa0>] ? zpl_release+0x60/0x60 [zfs]
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff81173507>] vfs_writev+0x37/0x50
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff812bb589>] nfsd_vfs_write.isra.14+0xe9/0x2d0
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff81170eca>] ? dentry_open+0x4a/0x80
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff812bc5a9>] ? nfsd_open+0x159/0x1c0
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff812bdc68>] nfsd_write+0xf8/0x110
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff812c4cea>] nfsd3_proc_write+0x9a/0xd0
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff812b805e>] nfsd_dispatch+0xbe/0x1c0
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff81818c88>] svc_process_common+0x328/0x5e0
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff810b09d0>] ? try_to_wake_up+0x260/0x260
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff81819291>] svc_process+0x101/0x150
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff812b77f5>] nfsd+0xb5/0x150
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff812b7740>] ? nfsd_get_default_max_blksize+0x60/0x60
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff810a146e>] kthread+0x8e/0xa0
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff81883514>] kernel_thread_helper+0x4/0x10
Jul 19 08:35:21 [kernel] [61407.924003]  [<ffffffff810a13e0>] ? __init_kthread_worker+0x40/0x40
Jul 19 08:35:21 [kernel] [61407.924003] Code: 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 89 cf 41 56 4d 89 c6 41 55 41 54 53 48 89 fb 48 83 ec 18 89 55 c8 48 8b 57 18 89 75 cc <4c> 8b 22 48 8d 42 e8 49 83 ec 18 48 39 c7 75 12 eb 40 66 2e 0f 
Jul 19 08:35:21 [kernel] [61407.927817] RIP  [<ffffffff810a93a4>] __wake_up_common+0x24/0x90
Jul 19 08:35:21 [kernel] [61407.927817]  RSP <ffff8801a0113700>
Jul 19 08:35:21 [kernel] [61407.927817] CR2: 0000000000000000
Jul 19 08:35:21 [kernel] [61407.927817] ---[ end trace a2adcb19cd142e60 ]---
gsaraber commented 12 years ago

Another crash dump from today is attached below. It occurred to us that the machine ran fine for about a week while copying 15TB of data from EXT4 to ZFS. Now the ZFS filesystem is receiving NFS writes, serving some minimal reads and writes via Samba, and being copied with rsync to another server. With all this extra network traffic we're experiencing mostly soft lockups and some hard lockups, and every once in a while it causes I/O errors on the server's root drive (sda). We swapped the root drive out and are still getting the I/O errors, which go away after a reboot. Hope some of the above helps.

Jul 25 09:09:58 [kernel] [ 3970.480963] ------------[ cut here ]------------
Jul 25 09:09:58 [kernel] [ 3970.480963] WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x9a/0xc0()
Jul 25 09:09:58 [kernel] [ 3970.480963] Hardware name: X7DB8
Jul 25 09:09:58 [kernel] [ 3970.480963] Watchdog detected hard LOCKUP on cpu 2
Jul 25 09:09:58 [kernel] [ 3970.480963] Modules linked in: bonding zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) mptsas megaraid_sas mptscsih mptbase
Jul 25 09:09:58 [kernel] [ 3970.480963] Pid: 3113, comm: nfsd Tainted: P           O 3.4.4 #2
Jul 25 09:09:58 [kernel] [ 3970.480963] Call Trace:
Jul 25 09:09:58 [kernel] [ 3970.480963]  <NMI>  [<ffffffff8108193a>] warn_slowpath_common+0x7a/0xb0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81081a11>] warn_slowpath_fmt+0x41/0x50
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810efc1a>] watchdog_overflow_callback+0x9a/0xc0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8111834c>] __perf_event_overflow+0x9c/0x220
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81118c84>] perf_event_overflow+0x14/0x20
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81055b14>] intel_pmu_handle_irq+0x184/0x2e0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8104fc61>] perf_event_nmi_handler+0x21/0x30
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8104413a>] do_nmi+0x11a/0x3a0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81882062>] end_repeat_nmi+0x1a/0x1e
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8146cb97>] ? delay_tsc+0x37/0x60
                - Last output repeated 2 times -
Jul 25 09:09:58 [kernel] [ 3970.480963]  <<EOE>>  [<ffffffff8146caca>] __delay+0xa/0x10
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81473c5e>] do_raw_spin_lock+0x7e/0x140
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81881991>] _raw_spin_lock_irqsave+0x11/0x20
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810aa52d>] __wake_up+0x2d/0x70
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa005d70d>] cv_wait_common+0x14d/0x1b0 [spl]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810a1ee0>] ? __init_waitqueue_head+0x40/0x40
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa005d78e>] __cv_wait+0xe/0x10 [spl]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa01f73b4>] zfs_range_lock+0x2b4/0x5f0 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0201b5f>] zfs_write+0x59f/0xc40 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810b093c>] ? try_to_wake_up+0x1cc/0x260
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8187fcc7>] ? __mutex_lock_slowpath+0x1f7/0x2c0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81881949>] ? _raw_spin_unlock+0x9/0x10
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8187fcc7>] ? __mutex_lock_slowpath+0x1f7/0x2c0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa005ec3c>] ? tsd_exit+0x19c/0x1d0 [spl]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0214edd>] zpl_write_common+0x4d/0x80 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0214f73>] zpl_write+0x63/0x90 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0214f10>] ? zpl_write_common+0x80/0x80 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81173191>] do_loop_readv_writev+0x51/0x80
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff811734ae>] do_readv_writev+0x1ae/0x1d0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0214aa0>] ? zpl_release+0x60/0x60 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81173507>] vfs_writev+0x37/0x50
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812bb589>] nfsd_vfs_write.isra.14+0xe9/0x2d0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81170eca>] ? dentry_open+0x4a/0x80
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812bc5a9>] ? nfsd_open+0x159/0x1c0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812bdc68>] nfsd_write+0xf8/0x110
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812c4cea>] nfsd3_proc_write+0x9a/0xd0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812b805e>] nfsd_dispatch+0xbe/0x1c0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81818c88>] svc_process_common+0x328/0x5e0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810b09d0>] ? try_to_wake_up+0x260/0x260
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81819291>] svc_process+0x101/0x150
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812b77f5>] nfsd+0xb5/0x150
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812b7740>] ? nfsd_get_default_max_blksize+0x60/0x60
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810a146e>] kthread+0x8e/0xa0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81883514>] kernel_thread_helper+0x4/0x10
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810a13e0>] ? __init_kthread_worker+0x40/0x40
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81883510>] ? gs_change+0xb/0xb
Jul 25 09:09:58 [kernel] [ 3970.480963] ---[ end trace 70fa81eda6fcb0f5 ]---
Jul 25 09:09:58 [kernel] [ 3970.480963] BUG: unable to handle kernel paging request at 0000081e00000194
Jul 25 09:09:58 [kernel] [ 3970.480963] IP: [<ffffffff8187b856>] spin_dump+0x61/0x8f
Jul 25 09:09:58 [kernel] [ 3970.480963] PGD 0 
Jul 25 09:09:58 [kernel] [ 3970.480963] Oops: 0000 [#1] SMP 
Jul 25 09:09:58 [kernel] [ 3970.480963] Modules linked in: bonding zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) mptsas megaraid_sas mptscsih mptbase
Jul 25 09:09:58 [kernel] [ 3970.480963] 
Jul 25 09:09:58 [kernel] [ 3970.480963] Pid: 3113, comm: nfsd Tainted: P        W  O 3.4.4 #2 Supermicro X7DB8/X7DB8
Jul 25 09:09:58 [kernel] [ 3970.480963] RIP: 0010:[<ffffffff8187b856>]  [<ffffffff8187b856>] spin_dump+0x61/0x8f
Jul 25 09:09:58 [kernel] [ 3970.480963] RSP: 0018:ffff8801b2ee76e0  EFLAGS: 00010006
Jul 25 09:09:58 [kernel] [ 3970.480963] RAX: 000000000000003e RBX: ffff880075518868 RCX: ffffffff81c86ef6
Jul 25 09:09:58 [kernel] [ 3970.480963] RDX: 000000000000a9a9 RSI: 0000000000000082 RDI: ffffffff81e23bc0
Jul 25 09:09:58 [kernel] [ 3970.480963] RBP: ffff8801b2ee76f0 R08: 00000000ffffffff R09: 0000000000000000
Jul 25 09:09:58 [kernel] [ 3970.480963] R10: 0000000000000000 R11: 0000000000000000 R12: 0000081e00000000
Jul 25 09:09:58 [kernel] [ 3970.480963] R13: 000000008b0ed4c8 R14: 0000000000000000 R15: 0000000000000000
Jul 25 09:09:58 [kernel] [ 3970.480963] FS:  0000000000000000(0000) GS:ffff8801bfd00000(0000) knlGS:0000000000000000
Jul 25 09:09:58 [kernel] [ 3970.480963] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 25 09:09:58 [kernel] [ 3970.480963] CR2: 0000081e00000194 CR3: 000000016c8bb000 CR4: 00000000000007e0
Jul 25 09:09:58 [kernel] [ 3970.480963] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 25 09:09:58 [kernel] [ 3970.480963] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 25 09:09:58 [kernel] [ 3970.480963] Process nfsd (pid: 3113, threadinfo ffff8801b2ee6000, task ffff8801b726d1c0)
Jul 25 09:09:58 [kernel] [ 3970.480963] Stack:
Jul 25 09:09:58 [kernel] [ 3970.480963]  ffff880075518868 000000008b0ed4c8 ffff8801b2ee7720 ffffffff81473cd3
Jul 25 09:09:58 [kernel] [ 3970.480963]  0000000000000286 ffff880184647a88 0000000000000003 0000000000000001
Jul 25 09:09:58 [kernel] [ 3970.480963]  ffff8801b2ee7740 ffffffff81881991 2222222222222222 ffff880075518868
Jul 25 09:09:58 [kernel] [ 3970.480963] Call Trace:
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81473cd3>] do_raw_spin_lock+0xf3/0x140
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81881991>] _raw_spin_lock_irqsave+0x11/0x20
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810aa52d>] __wake_up+0x2d/0x70
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa005d70d>] cv_wait_common+0x14d/0x1b0 [spl]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810a1ee0>] ? __init_waitqueue_head+0x40/0x40
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa005d78e>] __cv_wait+0xe/0x10 [spl]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa01f73b4>] zfs_range_lock+0x2b4/0x5f0 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0201b5f>] zfs_write+0x59f/0xc40 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810b093c>] ? try_to_wake_up+0x1cc/0x260
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8187fcc7>] ? __mutex_lock_slowpath+0x1f7/0x2c0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81881949>] ? _raw_spin_unlock+0x9/0x10
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8187fcc7>] ? __mutex_lock_slowpath+0x1f7/0x2c0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa005ec3c>] ? tsd_exit+0x19c/0x1d0 [spl]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0214edd>] zpl_write_common+0x4d/0x80 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0214f73>] zpl_write+0x63/0x90 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0214f10>] ? zpl_write_common+0x80/0x80 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81173191>] do_loop_readv_writev+0x51/0x80
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff811734ae>] do_readv_writev+0x1ae/0x1d0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0214aa0>] ? zpl_release+0x60/0x60 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81173507>] vfs_writev+0x37/0x50
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812bb589>] nfsd_vfs_write.isra.14+0xe9/0x2d0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81170eca>] ? dentry_open+0x4a/0x80
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812bc5a9>] ? nfsd_open+0x159/0x1c0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812bdc68>] nfsd_write+0xf8/0x110
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812c4cea>] nfsd3_proc_write+0x9a/0xd0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812b805e>] nfsd_dispatch+0xbe/0x1c0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81818c88>] svc_process_common+0x328/0x5e0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810b09d0>] ? try_to_wake_up+0x260/0x260
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81819291>] svc_process+0x101/0x150
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812b77f5>] nfsd+0xb5/0x150
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812b7740>] ? nfsd_get_default_max_blksize+0x60/0x60
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810a146e>] kthread+0x8e/0xa0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81883514>] kernel_thread_helper+0x4/0x10
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810a13e0>] ? __init_kthread_worker+0x40/0x40
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81883510>] ? gs_change+0xb/0xb
Jul 25 09:09:58 [kernel] [ 3970.480963] RIP  [<ffffffff8187b856>] spin_dump+0x61/0x8f
Jul 25 09:09:58 [kernel] [ 3970.480963]  RSP <ffff8801b2ee76e0>
Jul 25 09:09:58 [kernel] [ 3970.480963] CR2: 0000081e00000194
Jul 25 09:09:58 [kernel] [ 3970.480963] ---[ end trace 70fa81eda6fcb0f6 ]---
Jul 25 09:09:58 [kernel] [ 3970.480963] INFO: rcu_sched self-detected stall on CPU { 2}  (t=174118 jiffies)
Jul 25 09:09:58 [kernel] [ 3970.480963] sending NMI to all CPUs:
Jul 25 09:09:58 [kernel] [ 3970.480963] NMI backtrace for cpu 2
Jul 25 09:09:58 [kernel] [ 3970.480963] CPU 2 
Jul 25 09:09:58 [kernel] [ 3970.480963] Modules linked in: bonding zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) mptsas megaraid_sas mptscsih mptbase
Jul 25 09:09:58 [kernel] [ 3970.480963] 
Jul 25 09:09:58 [kernel] [ 3970.480963] Pid: 3113, comm: nfsd Tainted: P      D W  O 3.4.4 #2 Supermicro X7DB8/X7DB8
Jul 25 09:09:58 [kernel] [ 3970.480963] RIP: 0010:[<ffffffff8146cb75>]  [<ffffffff8146cb75>] delay_tsc+0x15/0x60
Jul 25 09:09:58 [kernel] [ 3970.480963] RSP: 0018:ffff8801bfd03c40  EFLAGS: 00000807
Jul 25 09:09:58 [kernel] [ 3970.480963] RAX: 000000005318b98c RBX: 0000000000002710 RCX: 000000000000c6c5
Jul 25 09:09:58 [kernel] [ 3970.480963] RDX: 0000000000000907 RSI: 0000000000000086 RDI: 0000000000239a97
Jul 25 09:09:58 [kernel] [ 3970.480963] RBP: ffff8801bfd03c40 R08: 0000000000000002 R09: 0000000000000001
Jul 25 09:09:58 [kernel] [ 3970.480963] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81e2e9c0
Jul 25 09:09:58 [kernel] [ 3970.480963] R13: ffffffff81e2ea80 R14: ffff8801bfd03df8 R15: 0000000000000000
Jul 25 09:09:58 [kernel] [ 3970.480963] FS:  0000000000000000(0000) GS:ffff8801bfd00000(0000) knlGS:0000000000000000
Jul 25 09:09:58 [kernel] [ 3970.480963] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 25 09:09:58 [kernel] [ 3970.480963] CR2: 0000081e00000194 CR3: 000000016c8bb000 CR4: 00000000000007e0
Jul 25 09:09:58 [kernel] [ 3970.480963] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 25 09:09:58 [kernel] [ 3970.480963] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 25 09:09:58 [kernel] [ 3970.480963] Process nfsd (pid: 3113, threadinfo ffff8801b2ee6000, task ffff8801b726d1c0)
Jul 25 09:09:58 [kernel] [ 3970.480963] Stack:
Jul 25 09:09:58 [kernel] [ 3970.480963]  ffff8801bfd03c50 ffffffff8146caf8 ffff8801bfd03c70 ffffffff8105effa
Jul 25 09:09:58 [kernel] [ 3970.480963]  0000000000000000 ffff8801bfd0db20 ffff8801bfd03cc0 ffffffff810f6c6a
Jul 25 09:09:58 [kernel] [ 3970.480963]  0000000000000008 0000000000000086 ffff8801bfd03cc0 0000000000000002
Jul 25 09:09:58 [kernel] [ 3970.480963] Call Trace:
Jul 25 09:09:58 [kernel] [ 3970.480963]  <IRQ> 
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8146caf8>] __const_udelay+0x28/0x30
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8105effa>] arch_trigger_all_cpu_backtrace+0x6a/0x90
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810f6c6a>] __rcu_pending+0x19a/0x4d0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810f7460>] rcu_check_callbacks+0xb0/0x1a0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81090863>] update_process_times+0x43/0x80
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810c9678>] tick_sched_timer+0x58/0x140
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810a57ce>] __run_hrtimer+0x6e/0x1b0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810c9620>] ? tick_init_highres+0x20/0x20
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8158e2d2>] ? put_device+0x12/0x20
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810a5e73>] hrtimer_interrupt+0xf3/0x220
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8105e214>] smp_apic_timer_interrupt+0x64/0xa0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81882cc7>] apic_timer_interrupt+0x67/0x70
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8108855c>] ? __do_softirq+0x5c/0x210
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8188360c>] call_softirq+0x1c/0x30
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8104263d>] do_softirq+0x4d/0x80
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81088955>] irq_exit+0x65/0x70
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8105d4ff>] smp_call_function_single_interrupt+0x2f/0x40
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff818830e7>] call_function_single_interrupt+0x67/0x70
Jul 25 09:09:58 [kernel] [ 3970.480963]  <EOI> 
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81881991>] ? _raw_spin_lock_irqsave+0x11/0x20
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8188191b>] ? _raw_spin_unlock_irq+0xb/0x10
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810d5b91>] ? acct_collect+0x171/0x1b0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81085d83>] do_exit+0x653/0x900
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81083957>] ? kmsg_dump+0x57/0xf0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81043bee>] oops_end+0x9e/0xe0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81874ce9>] no_context+0x271/0x280
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810a6f05>] ? atomic_notifier_call_chain+0x15/0x20
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81874eeb>] bad_area_nosemaphore+0xe/0x10
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8106894d>] do_page_fault+0x2ad/0x430
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81082595>] ? console_unlock+0x1e5/0x260
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81881d6f>] page_fault+0x1f/0x30
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8187b856>] ? spin_dump+0x61/0x8f
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81473cd3>] do_raw_spin_lock+0xf3/0x140
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81881991>] _raw_spin_lock_irqsave+0x11/0x20
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810aa52d>] __wake_up+0x2d/0x70
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa005d70d>] cv_wait_common+0x14d/0x1b0 [spl]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810a1ee0>] ? __init_waitqueue_head+0x40/0x40
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa005d78e>] __cv_wait+0xe/0x10 [spl]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa01f73b4>] zfs_range_lock+0x2b4/0x5f0 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0201b5f>] zfs_write+0x59f/0xc40 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810b093c>] ? try_to_wake_up+0x1cc/0x260
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8187fcc7>] ? __mutex_lock_slowpath+0x1f7/0x2c0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81881949>] ? _raw_spin_unlock+0x9/0x10
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8187fcc7>] ? __mutex_lock_slowpath+0x1f7/0x2c0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa005ec3c>] ? tsd_exit+0x19c/0x1d0 [spl]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0214edd>] zpl_write_common+0x4d/0x80 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0214f73>] zpl_write+0x63/0x90 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0214f10>] ? zpl_write_common+0x80/0x80 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81173191>] do_loop_readv_writev+0x51/0x80
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff811734ae>] do_readv_writev+0x1ae/0x1d0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0214aa0>] ? zpl_release+0x60/0x60 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81173507>] vfs_writev+0x37/0x50
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812bb589>] nfsd_vfs_write.isra.14+0xe9/0x2d0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81170eca>] ? dentry_open+0x4a/0x80
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812bc5a9>] ? nfsd_open+0x159/0x1c0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812bdc68>] nfsd_write+0xf8/0x110
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812c4cea>] nfsd3_proc_write+0x9a/0xd0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812b805e>] nfsd_dispatch+0xbe/0x1c0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81818c88>] svc_process_common+0x328/0x5e0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810b09d0>] ? try_to_wake_up+0x260/0x260
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81819291>] svc_process+0x101/0x150
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812b77f5>] nfsd+0xb5/0x150
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812b7740>] ? nfsd_get_default_max_blksize+0x60/0x60
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810a146e>] kthread+0x8e/0xa0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81883514>] kernel_thread_helper+0x4/0x10
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810a13e0>] ? __init_kthread_worker+0x40/0x40
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81883510>] ? gs_change+0xb/0xb
Jul 25 09:09:58 [kernel] [ 3970.480963] Code: 00 00 f7 e2 48 8d 7a 01 ff 15 a0 68 9f 00 5d c3 66 0f 1f 44 00 00 55 65 44 8b 04 25 30 ce 00 00 48 89 e5 66 66 90 0f ae e8 0f 31 <89> c6 eb 16 0f 1f 80 00 00 00 00 f3 90 65 8b 0c 25 30 ce 00 00 
Jul 25 09:09:58 [kernel] [ 3970.480963] Call Trace:
Jul 25 09:09:58 [kernel] [ 3970.480963]  <IRQ>  [<ffffffff8146caf8>] __const_udelay+0x28/0x30
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8105effa>] arch_trigger_all_cpu_backtrace+0x6a/0x90
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810f6c6a>] __rcu_pending+0x19a/0x4d0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810f7460>] rcu_check_callbacks+0xb0/0x1a0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81090863>] update_process_times+0x43/0x80
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810c9678>] tick_sched_timer+0x58/0x140
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810a57ce>] __run_hrtimer+0x6e/0x1b0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810c9620>] ? tick_init_highres+0x20/0x20
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8158e2d2>] ? put_device+0x12/0x20
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810a5e73>] hrtimer_interrupt+0xf3/0x220
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8105e214>] smp_apic_timer_interrupt+0x64/0xa0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81882cc7>] apic_timer_interrupt+0x67/0x70
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8108855c>] ? __do_softirq+0x5c/0x210
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8188360c>] call_softirq+0x1c/0x30
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8104263d>] do_softirq+0x4d/0x80
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81088955>] irq_exit+0x65/0x70
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff818830e7>] call_function_single_interrupt+0x67/0x70
Jul 25 09:09:58 [kernel] [ 3970.480963]  <EOI>  [<ffffffff81881991>] ? _raw_spin_lock_irqsave+0x11/0x20
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8188191b>] ? _raw_spin_unlock_irq+0xb/0x10
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810d5b91>] ? acct_collect+0x171/0x1b0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81085d83>] do_exit+0x653/0x900
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81083957>] ? kmsg_dump+0x57/0xf0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81043bee>] oops_end+0x9e/0xe0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81874ce9>] no_context+0x271/0x280
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810a6f05>] ? atomic_notifier_call_chain+0x15/0x20
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81874ebe>] __bad_area_nosemaphore+0x1c6/0x1e5
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81874eeb>] bad_area_nosemaphore+0xe/0x10
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8106894d>] do_page_fault+0x2ad/0x430
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81082595>] ? console_unlock+0x1e5/0x260
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81881d6f>] page_fault+0x1f/0x30
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8187b856>] ? spin_dump+0x61/0x8f
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81473cd3>] do_raw_spin_lock+0xf3/0x140
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81881991>] _raw_spin_lock_irqsave+0x11/0x20
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810aa52d>] __wake_up+0x2d/0x70
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa005d70d>] cv_wait_common+0x14d/0x1b0 [spl]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810a1ee0>] ? __init_waitqueue_head+0x40/0x40
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa005d78e>] __cv_wait+0xe/0x10 [spl]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa01f73b4>] zfs_range_lock+0x2b4/0x5f0 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0201b5f>] zfs_write+0x59f/0xc40 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810b093c>] ? try_to_wake_up+0x1cc/0x260
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8187fcc7>] ? __mutex_lock_slowpath+0x1f7/0x2c0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81881949>] ? _raw_spin_unlock+0x9/0x10
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff8187fcc7>] ? __mutex_lock_slowpath+0x1f7/0x2c0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa005ec3c>] ? tsd_exit+0x19c/0x1d0 [spl]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0214edd>] zpl_write_common+0x4d/0x80 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0214f73>] zpl_write+0x63/0x90 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0214f10>] ? zpl_write_common+0x80/0x80 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81173191>] do_loop_readv_writev+0x51/0x80
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff811734ae>] do_readv_writev+0x1ae/0x1d0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffffa0214aa0>] ? zpl_release+0x60/0x60 [zfs]
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81173507>] vfs_writev+0x37/0x50
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812bb589>] nfsd_vfs_write.isra.14+0xe9/0x2d0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81170eca>] ? dentry_open+0x4a/0x80
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812bc5a9>] ? nfsd_open+0x159/0x1c0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812bdc68>] nfsd_write+0xf8/0x110
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812c4cea>] nfsd3_proc_write+0x9a/0xd0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812b805e>] nfsd_dispatch+0xbe/0x1c0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81818c88>] svc_process_common+0x328/0x5e0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810b09d0>] ? try_to_wake_up+0x260/0x260
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81819291>] svc_process+0x101/0x150
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812b77f5>] nfsd+0xb5/0x150
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff812b7740>] ? nfsd_get_default_max_blksize+0x60/0x60
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810a146e>] kthread+0x8e/0xa0
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81883514>] kernel_thread_helper+0x4/0x10
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff810a13e0>] ? __init_kthread_worker+0x40/0x40
Jul 25 09:09:58 [kernel] [ 3970.480963]  [<ffffffff81883510>] ? gs_change+0xb/0xb
Jul 25 09:09:58 [kernel] [ 4144.598000] NMI backtrace for cpu 0
Jul 25 09:09:58 [kernel] [ 4144.598000] CPU 0 
Jul 25 09:09:58 [kernel] [ 4144.598000] Modules linked in: bonding zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) mptsas megaraid_sas mptscsih mptbase
Jul 25 09:09:58 [kernel] [ 4144.598000] 
Jul 25 09:09:58 [kernel] [ 4144.598000] Pid: 0, comm: swapper/0 Tainted: P      D W  O 3.4.4 #2 Supermicro X7DB8/X7DB8
Jul 25 09:09:58 [kernel] [ 4144.598000] RIP: 0010:[<ffffffff81048de2>]  [<ffffffff81048de2>] mwait_idle+0xa2/0x230
Jul 25 09:09:58 [kernel] [ 4144.598000] RSP: 0018:ffffffff81e01ef8  EFLAGS: 00000246
Jul 25 09:09:58 [kernel] [ 4144.598000] RAX: 0000000000000000 RBX: ffffffff81e01fd8 RCX: 0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.598000] FS:  0000000000000000(0000) GS:ffff8801bfc00000(0000) knlGS:0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.598000] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 25 09:09:58 [kernel] [ 4144.598000] CR2: 00007f7020633000 CR3: 0000000198576000 CR4: 00000000000007f0
Jul 25 09:09:58 [kernel] [ 4144.598000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.598000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 25 09:09:58 [kernel] [ 4144.598000] Process swapper/0 (pid: 0, threadinfo ffffffff81e00000, task ffffffff81e13020)
Jul 25 09:09:58 [kernel] [ 4144.598000] Stack:
Jul 25 09:09:58 [kernel] [ 4144.598000]  ffffffff81e01fd8 ffffffff81ed48b0 ffffffff81f365e0 ffff8801bff813c0
Jul 25 09:09:58 [kernel] [ 4144.598000]  ffffffff81e01f38 ffffffff81049bd6 ffffffffffffffff ffffffff81f33940
Jul 25 09:09:58 [kernel] [ 4144.598000]  ffffffff81e01f48 ffffffff81856399 ffffffff81e01f98 ffffffff81ef3b3d
Jul 25 09:09:58 [kernel] [ 4144.598000] Call Trace:
Jul 25 09:09:58 [kernel] [ 4144.598000]  [<ffffffff81049bd6>] cpu_idle+0xa6/0xc0
Jul 25 09:09:58 [kernel] [ 4144.598000]  [<ffffffff81856399>] rest_init+0x6d/0x74
Jul 25 09:09:58 [kernel] [ 4144.598000]  [<ffffffff81ef3b3d>] start_kernel+0x32b/0x338
Jul 25 09:09:58 [kernel] [ 4144.598000]  [<ffffffff81ef3612>] ? repair_env_string+0x5a/0x5a
Jul 25 09:09:58 [kernel] [ 4144.598000]  [<ffffffff81ef3321>] x86_64_start_reservations+0x131/0x135
Jul 25 09:09:58 [kernel] [ 4144.598000]  [<ffffffff81ef3415>] x86_64_start_kernel+0xf0/0xf7
Jul 25 09:09:58 [kernel] [ 4144.598000] Code: 34 25 b0 b9 00 00 48 89 d1 48 8d 86 38 e0 ff ff 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 0f 85 cb 00 00 00 31 c0 fb 0f 01 c9 <8b> 05 c0 0a e8 00 65 44 8b 24 25 30 ce 00 00 85 c0 75 6b 8b 05 
Jul 25 09:09:58 [kernel] [ 4144.598000] Call Trace:
Jul 25 09:09:58 [kernel] [ 4144.598000]  [<ffffffff81049bd6>] cpu_idle+0xa6/0xc0
Jul 25 09:09:58 [kernel] [ 4144.598000]  [<ffffffff81856399>] rest_init+0x6d/0x74
Jul 25 09:09:58 [kernel] [ 4144.598000]  [<ffffffff81ef3b3d>] start_kernel+0x32b/0x338
Jul 25 09:09:58 [kernel] [ 4144.598000]  [<ffffffff81ef3612>] ? repair_env_string+0x5a/0x5a
Jul 25 09:09:58 [kernel] [ 4144.598000]  [<ffffffff81ef3321>] x86_64_start_reservations+0x131/0x135
Jul 25 09:09:58 [kernel] [ 4144.598000]  [<ffffffff81ef3415>] x86_64_start_kernel+0xf0/0xf7
Jul 25 09:09:58 [kernel] [ 4144.598000] NMI backtrace for cpu 1
Jul 25 09:09:58 [kernel] [ 4144.598000] INFO: rcu_sched detected stalls on CPUs/tasks: { 2} (detected by 0, t=174118 jiffies)
Jul 25 09:09:58 [kernel] [ 4144.598000] INFO: Stall ended before state dump start
Jul 25 09:09:58 [kernel] [ 4144.598000] INFO: rcu_bh detected stalls on CPUs/tasks: { 2} (detected by 0, t=174118 jiffies)
Jul 25 09:09:58 [kernel] [ 4144.598000] INFO: Stall ended before state dump start
Jul 25 09:09:58 [kernel] [ 4144.634206] CPU 1 
Jul 25 09:09:58 [kernel] [ 4144.634206] Modules linked in: bonding zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) mptsas megaraid_sas mptscsih mptbase
Jul 25 09:09:58 [kernel] [ 4144.634206] 
Jul 25 09:09:58 [kernel] [ 4144.634206] Pid: 0, comm: swapper/1 Tainted: P      D W  O 3.4.4 #2 Supermicro X7DB8/X7DB8
Jul 25 09:09:58 [kernel] [ 4144.634206] RIP: 0010:[<ffffffff81048de2>]  [<ffffffff81048de2>] mwait_idle+0xa2/0x230
Jul 25 09:09:58 [kernel] [ 4144.634206] RSP: 0018:ffff8801b80f3ee8  EFLAGS: 00000246
Jul 25 09:09:58 [kernel] [ 4144.634206] RAX: 0000000000000000 RBX: ffff8801b80f3fd8 RCX: 0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.634206] RDX: 0000000000000000 RSI: ffff8801b80f3fd8 RDI: ffff8801bfc8d9e0
Jul 25 09:09:58 [kernel] [ 4144.634206] RBP: ffff8801b80f3f08 R08: 0000000000000000 R09: 0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.634206] R10: 0140000000000000 R11: 0000000000000000 R12: 0000000000000001
Jul 25 09:09:58 [kernel] [ 4144.634206] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.634206] FS:  0000000000000000(0000) GS:ffff8801bfc80000(0000) knlGS:0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.634206] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 25 09:09:58 [kernel] [ 4144.634206] CR2: 00007f32419d6000 CR3: 00000001b3d5c000 CR4: 00000000000007e0
Jul 25 09:09:58 [kernel] [ 4144.634206] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.634206] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 25 09:09:58 [kernel] [ 4144.634206] Process swapper/1 (pid: 0, threadinfo ffff8801b80f2000, task ffff8801b807caf0)
Jul 25 09:09:58 [kernel] [ 4144.634206] Stack:
Jul 25 09:09:58 [kernel] [ 4144.634206]  ffff8801b80f3fd8 ffffffff81ed48b0 0000000000000000 0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.634206]  ffff8801b80f3f28 ffffffff81049bd6 0000000000000001 0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.634206]  ffff8801b80f3f48 ffffffff8186ca9a 0000000000096000 0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.634206] Call Trace:
Jul 25 09:09:58 [kernel] [ 4144.634206]  [<ffffffff81049bd6>] cpu_idle+0xa6/0xc0
Jul 25 09:09:58 [kernel] [ 4144.634206] Code: 34 25 b0 b9 00 00 48 89 d1 48 8d 86 38 e0 ff ff 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 0f 85 cb 00 00 00 31 c0 fb 0f 01 c9 <8b> 05 c0 0a e8 00 65 44 8b 24 25 30 ce 00 00 85 c0 75 6b 8b 05 
Jul 25 09:09:58 [kernel] [ 4144.634206] Call Trace:
Jul 25 09:09:58 [kernel] [ 4144.634206]  [<ffffffff81049bd6>] cpu_idle+0xa6/0xc0
Jul 25 09:09:58 [kernel] [ 4144.634206]  [<ffffffff8186ca9a>] start_secondary+0x1a9/0x1ad
Jul 25 09:09:58 [kernel] [ 4144.637380] NMI backtrace for cpu 3
Jul 25 09:09:58 [kernel] [ 4144.637380] CPU 3 
Jul 25 09:09:58 [kernel] [ 4144.637380] Modules linked in: bonding zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) mptsas megaraid_sas mptscsih mptbase
Jul 25 09:09:58 [kernel] [ 4144.637380] 
Jul 25 09:09:58 [kernel] [ 4144.637380] Pid: 0, comm: swapper/3 Tainted: P      D W  O 3.4.4 #2 Supermicro X7DB8/X7DB8
Jul 25 09:09:58 [kernel] [ 4144.637380] RIP: 0010:[<ffffffff81048de2>]  [<ffffffff81048de2>] mwait_idle+0xa2/0x230
Jul 25 09:09:58 [kernel] [ 4144.637380] RSP: 0018:ffff8801b8123ee8  EFLAGS: 00000246
Jul 25 09:09:58 [kernel] [ 4144.637380] RAX: 0000000000000000 RBX: ffff8801b8123fd8 RCX: 0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.637380] RDX: 0000000000000000 RSI: ffff8801b8123fd8 RDI: ffff8801bfd8d9e0
Jul 25 09:09:58 [kernel] [ 4144.637380] RBP: ffff8801b8123f08 R08: 0000000000000000 R09: 0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.637380] R10: 0140000000000000 R11: 0000000000000000 R12: 0000000000000003
Jul 25 09:09:58 [kernel] [ 4144.637380] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.637380] FS:  0000000000000000(0000) GS:ffff8801bfd80000(0000) knlGS:0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.637380] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 25 09:09:58 [kernel] [ 4144.637380] CR2: 00007f32419d6000 CR3: 0000000198576000 CR4: 00000000000007e0
Jul 25 09:09:58 [kernel] [ 4144.637380] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.637380] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 25 09:09:58 [kernel] [ 4144.637380] Process swapper/3 (pid: 0, threadinfo ffff8801b8122000, task ffff8801b8111470)
Jul 25 09:09:58 [kernel] [ 4144.637380] Stack:
Jul 25 09:09:58 [kernel] [ 4144.645977]  ffff8801b8123fd8 ffffffff81ed48b0 0000000000000000 0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.645977]  ffff8801b8123f28 ffffffff81049bd6 0000000000000003 0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.645977]  ffff8801b8123f48 ffffffff8186ca9a 0000000000096000 0000000000000000
Jul 25 09:09:58 [kernel] [ 4144.645977] Call Trace:
Jul 25 09:09:58 [kernel] [ 4144.645977]  [<ffffffff81049bd6>] cpu_idle+0xa6/0xc0
Jul 25 09:09:58 [kernel] [ 4144.647370]  [<ffffffff8186ca9a>] start_secondary+0x1a9/0x1ad
Jul 25 09:09:58 [kernel] [ 4144.647370] Code: 34 25 b0 b9 00 00 48 89 d1 48 8d 86 38 e0 ff ff 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 0f 85 cb 00 00 00 31 c0 fb 0f 01 c9 <8b> 05 c0 0a e8 00 65 44 8b 24 25 30 ce 00 00 85 c0 75 6b 8b 05 
Jul 25 09:09:58 [kernel] [ 4144.648204] Call Trace:
Jul 25 09:09:58 [kernel] [ 4144.648204]  [<ffffffff81049bd6>] cpu_idle+0xa6/0xc0
Jul 25 09:09:58 [kernel] [ 4144.648204]  [<ffffffff8186ca9a>] start_secondary+0x1a9/0x1ad
Jul 25 09:09:58 [kernel] [ 4144.651301] ------------[ cut here ]------------
Jul 25 09:09:58 [kernel] [ 4144.651429] WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0x25f/0x270()
Jul 25 09:09:58 [kernel] [ 4144.651617] Hardware name: X7DB8
Jul 25 09:09:58 [kernel] [ 4144.651733] NETDEV WATCHDOG: eth1 (e1000e): transmit queue 0 timed out
Jul 25 09:09:58 [kernel] [ 4144.651854] Modules linked in: bonding zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) mptsas megaraid_sas mptscsih mptbase
Jul 25 09:09:58 [kernel] [ 4144.652666] Pid: 0, comm: swapper/3 Tainted: P      D W  O 3.4.4 #2
Jul 25 09:09:58 [kernel] [ 4144.652793] Call Trace:
Jul 25 09:09:58 [kernel] [ 4144.652912]  <IRQ>  [<ffffffff8108193a>] warn_slowpath_common+0x7a/0xb0
Jul 25 09:09:58 [kernel] [ 4144.653097]  [<ffffffff81081a11>] warn_slowpath_fmt+0x41/0x50
Jul 25 09:09:58 [kernel] [ 4144.653226]  [<ffffffff8173bfdf>] dev_watchdog+0x25f/0x270
Jul 25 09:09:58 [kernel] [ 4144.653348]  [<ffffffff8108fc3b>] run_timer_softirq+0x11b/0x370
Jul 25 09:09:58 [kernel] [ 4144.653470]  [<ffffffff8173bd80>] ? pfifo_fast_init+0x90/0x90
Jul 25 09:09:58 [kernel] [ 4144.653593]  [<ffffffff81088599>] __do_softirq+0x99/0x210
Jul 25 09:09:58 [kernel] [ 4144.653715]  [<ffffffff810adef5>] ? check_preempt_curr+0x75/0xa0
Jul 25 09:09:58 [kernel] [ 4144.653841]  [<ffffffff810c948f>] ? tick_program_event+0x1f/0x30
Jul 25 09:09:58 [kernel] [ 4144.653967]  [<ffffffff8188360c>] call_softirq+0x1c/0x30
Jul 25 09:09:58 [kernel] [ 4144.654103]  [<ffffffff8104263d>] do_softirq+0x4d/0x80
Jul 25 09:09:58 [kernel] [ 4144.654230]  [<ffffffff81088955>] irq_exit+0x65/0x70
Jul 25 09:09:58 [kernel] [ 4144.654356]  [<ffffffff8105e219>] smp_apic_timer_interrupt+0x69/0xa0
Jul 25 09:09:58 [kernel] [ 4144.654483]  [<ffffffff81882cc7>] apic_timer_interrupt+0x67/0x70
Jul 25 09:09:58 [kernel] [ 4144.654609]  <EOI>  [<ffffffff81048de2>] ? mwait_idle+0xa2/0x230
Jul 25 09:09:58 [kernel] [ 4144.654907]  [<ffffffff8186ca9a>] start_secondary+0x1a9/0x1ad
Jul 25 09:09:58 [kernel] [ 4144.655035] ---[ end trace 70fa81eda6fcb0f7 ]---
Jul 25 09:09:58 [kernel] [ 4144.655522] mptscsih: ioc0: attempting task abort! (sc=ffff88008381f100)
Jul 25 09:09:58 [kernel] [ 4144.655710] sd 9:0:1:0: [sdp] CDB: Read(10): 28 00 0a 53 a4 48 00 00 08 00
Jul 25 09:09:58 [kernel] [ 4144.656459] mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88008381f100)
Jul 25 09:09:58 [kernel] [ 4144.660223] e1000e 0000:06:00.1: eth1: Reset adapter
Jul 25 09:09:58 [kernel] [ 4144.697015] bonding: bond0: link status definitely down for interface eth1, disabling it
Jul 25 09:10:01 [kernel] [ 4147.483384] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jul 25 09:10:01 [kernel] [ 4147.497011] bonding: bond0: link status definitely up for interface eth1, 1000 Mbps full duplex.
behlendorf commented 12 years ago

This looks like a duplicate of issue #837. You might also try the latest SPL, which fixes an issue in the condition variable implementation.
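For reference, rebuilding the out-of-tree SPL and ZFS modules from the current source looks roughly like the following (a sketch only; the repository URLs, /usr/src paths and autotools steps are the usual ones for the 0.6.0-rc series and may need adjusting for your distribution):

# run as root; export the pool and unload the old zfs/spl modules first
cd /usr/src
git clone https://github.com/zfsonlinux/spl.git
cd spl && ./autogen.sh && ./configure && make && make install
cd /usr/src
git clone https://github.com/zfsonlinux/zfs.git
cd zfs && ./autogen.sh && ./configure --with-spl=/usr/src/spl && make && make install
depmod -a && modprobe zfs    # load the rebuilt modules, then re-import the pool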

htkmmo commented 12 years ago

I'm also getting hard lockups, perhaps triggered by ZFS's heavy usage of system resources. So far I haven't been able to get any relevant info out of the kernel logs. One similarity with your case is the SAS controller driver (mptscsih/mptsas). Given the rather large array size, I assume some SAS expanders are present in your case as well. Basically, my suspicion is that the lockups might be triggered by the LSI driver mishandling disks seen through expanders. (As a first step, "lsiutil" can be used for a lower-level check of the SAS cabling and port status.)

I've been using XFS and Linux MD RAID on the same hardware for many years now without issues. Although the ZFS code could be at fault, that seems less likely, since I assume it gets quite heavy testing (perhaps mostly on virtual machines). With LSI's driver, however, lockups do occur, perhaps triggered by the multitude of I/O threads ZFS appears to be using.

I wish I could test with a different controller, but I'm limited to 6 SATA ports on the board and have no other way to connect the 2x SATA-SSD root + 4x SAS array and the remaining 12x SATA drives. So far the 3.2.30, 3.5.3 and 3.5.5 kernels have all resulted in lockups. Things got quite a bit more stable after disconnecting the 12x SATA array (which used expanders), reflashing the controllers, clearing the persistent mappings from the SAS BIOS, and porting LSI's own driver into the 3.5 kernel.

gsaraber commented 12 years ago

I do think you're on to something: heavy load / more threads do seem to trigger the lockups, but I have no real evidence, just a gut feeling. I downloaded and installed lsiutil, though I'm not sure what I'm looking for, and it only recognizes one of my two LSI controllers [I have an older one and a newish one]:

08:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)
0a:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2008 [Falcon] (rev 03)

Either way, I'm not using expanders; each cable from the LSI goes to a connector on a hotplug backplane, one SATA cable per hard drive.

I wish I could help more with chasing this problem down, but I've switched back to EXT4, LVM and Linux software RAID, and have experienced no more problems.

htkmmo commented 12 years ago

With lsiutil you can select a controller and then use menu commands 20 and 12 to get the link error stats. However, I assume your setup is fine, since the problems went away.
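In case it's useful to anyone else hitting this, an interactive session looks roughly like the following (the menu numbers are the ones mentioned above; the exact menu layout varies between lsiutil versions, so treat this as a sketch):

lsiutil
#   1    <- pick the controller at the port/adapter prompt
#   20   <- the first menu number mentioned above
#   12   <- the second one, which prints the link/phy error counters
#   0    <- back out of the menus and quit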

It looks like the issue is more on ZFS's side. Since my previous kernels locked up without much info even though I had the hard and soft lockup detection in place, I enabled more debug features in a new kernel config. With this one I get the first warning below:

[    5.059216] sd 0:0:2:0: >[sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[    5.063889]  sdc: sdc1
[    5.067247] sd 0:0:2:0: >[sdc] Attached SCSI disk
[    5.120906] 
[    5.121211] =============================================
[    5.121654] [ INFO: possible recursive locking detected ]
[    5.121983] 3.5.5.dbg #2 Not tainted
[    5.122268] ---------------------------------------------
[    5.122588] vdev_open/0/1235 is trying to acquire lock:
[    5.122931]  (&zio->io_lock){+.+...}, at: [] zio_add_child+0xde/0x1b0
[    5.123346] 
[    5.123346] but task is already holding lock:
[    5.123346]  (&zio->io_lock){+.+...}, at: [] zio_add_child+0x80/0x1b0
[    5.123346] 
[    5.123346] other info that might help us debug this:
[    5.123346]  Possible unsafe locking scenario:
[    5.123346] 
[    5.123346]        CPU0
[    5.123346]        ----
[    5.123346]   lock(&zio->io_lock);
[    5.123346]   lock(&zio->io_lock);
[    5.123346] 
[    5.123346]  *** DEADLOCK ***
[    5.123346] 
[    5.123346]  May be due to missing lock nesting notation
[    5.123346] 
[    5.123346] 1 lock held by vdev_open/0/1235:
[    5.123346]  #0:  (&zio->io_lock){+.+...}, at: [] zio_add_child+0x80/0x1b0
[    5.123346] 
[    5.123346] stack backtrace:
[    5.123346] Pid: 1235, comm: vdev_open/0 Not tainted 3.5.5.dbg #2
[    5.123346] Call Trace:
[    5.123346]  [] __lock_acquire+0x13b8/0x1df0
[    5.123346]  [] ? spl_kmem_cache_alloc+0xfd/0xed0
[    5.123346]  [] lock_acquire+0x9e/0xd0
[    5.123346]  [] ? zio_add_child+0xde/0x1b0
[    5.123346]  [] mutex_lock_nested+0x64/0x330
[    5.123346]  [] ? zio_add_child+0xde/0x1b0
[    5.123346]  [] ? spl_kmem_cache_alloc+0xfd/0xed0
[    5.123346]  [] zio_add_child+0xde/0x1b0
[    5.123346]  [] zio_create+0x46b/0x6b0
[    5.123346]  [] zio_read_phys+0x69/0x80
[    5.123346]  [] ? vdev_accessible+0x70/0x70
[    5.123346]  [] vdev_probe+0x154/0x260
[    5.123346]  [] ? vdev_accessible+0x70/0x70
[    5.123346]  [] ? zfs_post_state_change+0x10/0x20
[    5.123346]  [] vdev_open+0x293/0x440
[    5.123346]  [] vdev_open_child+0x21/0x40
[    5.123346]  [] taskq_thread+0x220/0x580
[    5.123346]  [] ? _raw_spin_unlock_irqrestore+0x3a/0x70
[    5.123346]  [] ? try_to_wake_up+0x2f0/0x2f0
[    5.123346]  [] ? task_done+0x140/0x140
[    5.123346]  [] kthread+0x8e/0xa0
[    5.123346]  [] kernel_thread_helper+0x4/0x10
[    5.123346]  [] ? retint_restore_args+0xe/0xe
[    5.123346]  [] ? __init_kthread_worker+0x70/0x70
[    5.123346]  [] ? gs_change+0xb/0xb
[    5.185606] SPL: using hostid 0x30307830

I wish there were more direct source file and line numbers shown (if you know which kernel config option, if any, can do this, please let me know).
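For what it's worth, one way to turn a symbol+offset from such a trace into a file and line is to build with CONFIG_DEBUG_INFO=y and point gdb at the vmlinux from the build tree (or at the .ko when ZFS is built as a module); the symbol and offset below are simply copied from the trace above:

gdb ./vmlinux                    # vmlinux from the kernel build tree, with debug info
# at the (gdb) prompt:
#   list *(zio_add_child+0xde)   # symbol+offset taken from the lockdep report
# gdb prints the source file and line that the offset corresponds to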

This happens after booting the kernel with an initramdisk, once the SCSI drives are detected and SPL (+ZFS), built into the kernel, is initialized. I was able to reproduce it on a virtual machine. On real hardware I get the same notification, except that the Sys (system) pool can't be imported and the mounts can't be created. Since on the hardware I can't get past the initial import, I can't see the debug kernel in action :(

Is my setup at fault, or is ZFS really locking itself up? Testing shows the "using hostid ..." message only after I call "zpool import -f Sys" (I thought that as a kernel built-in it would initialize earlier, but perhaps this printout is not part of its initialization).

htkmmo commented 12 years ago

Forgot to add: I'm using spl/zfs-0.6.0-rc11 and, indeed, as a kernel built-in it does initialize quite early:

[    1.536050] microcode: CPU0 sig=0xf61, pf=0x1, revision=0x1
[    1.536500] microcode: CPU1 sig=0xf61, pf=0x1, revision=0x1
[    1.537349] microcode: Microcode Update Driver: v2.00 , Peter Oruba
[    1.539639] audit: initializing netlink socket (disabled)
[    1.540180] type=2000 audit(1349883847.539:1): initialized
[    1.591440] SPL: Loaded module v0.6.0-rc11
[    1.593103] SPLAT: Loaded module v0.6.0-rc11
[    1.602154] ZFS: Loaded module v0.6.0-rc11, ZFS pool version 28, ZFS filesystem version 5
[    1.603494] msgmni has been set to 1467
[    1.606312] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[    1.606875] io scheduler noop registered
[    1.607170] io scheduler deadline registered
[    1.608328] io scheduler cfq registered (default)
[    1.613673] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
behlendorf commented 12 years ago

[    5.121211] =============================================
[    5.121654] [ INFO: possible recursive locking detected ]
[    5.121983] 3.5.5.dbg #2 Not tainted

This is a false positive I've been meaning to fix for quite some time now. It has to do with the kernel's lock dependency analysis misinterpreting how this locking is used. The code could be reworked to provide hints to the kernel so it doesn't complain, but that has been a pretty low priority since most kernels ship with lockdep checking disabled.

As for the lockup issues, I suspect there's some blame to be shared by both the hardware and ZFS. If the hardware isn't 100% rock solid, ZFS will often take proactive measures to handle what it believes are failing drives. Other filesystems are more tolerant and simply ignore the bad behavior from the drives until they fail completely.

ZFS may also be pushing the drives/controller harder and making the problem that much worse.

That said, we will be digging deeper into these sorts of issues to ensure it's stable.

htkmmo commented 12 years ago

Thanks for the info. I'm surprised you don't check more often with PROVE_LOCKING enabled; at least to me, as a rookie, it looks useful. The hardware I have is solid, and I'm quite sure there shouldn't be any "complaints" reaching ZFS from the disks and controller. I'm not yet willing to return to my previous setup, so I'll see if I can find a way to actually get zpool import to succeed on real hardware. I guess the lock notification shouldn't affect functionality, although without the debug features the import does succeed for some reason ...
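To be concrete about the checks I mean, the relevant part of the kernel config looks roughly like this (option names from mainline 3.x; a sketch, not my exact .config):

CONFIG_DEBUG_INFO=y        # keep debug symbols so traces can be resolved to source lines
CONFIG_PROVE_LOCKING=y     # lockdep, the checker that produced the recursive-locking report above
CONFIG_DEBUG_SPINLOCK=y    # catch corrupted or uninitialized spinlocks
CONFIG_LOCKUP_DETECTOR=y   # the soft/hard lockup watchdog
CONFIG_DETECT_HUNG_TASK=y  # warn about tasks blocked in D state for too long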

Besides using the debug kernel, I'm rather out of ideas for how to capture the hard lockup (even if the lockdep analysis may not turn out to be that useful given the abundant false positives). There is a BMC controller, and I use its watchdog to reset the machine when it locks up. Perhaps it could be used to trigger an interrupt on the CPU to print out a trace or something ...
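One idea that might work without the BMC, assuming the box is still responsive over a serial console or SSH when the lockups hit (these are standard sysctl/procfs knobs; the values are just examples):

echo 1 > /proc/sys/kernel/sysrq          # enable all magic-SysRq functions
echo l > /proc/sysrq-trigger             # dump a backtrace of all active CPUs to dmesg
echo t > /proc/sysrq-trigger             # dump the state and stack of every task (very verbose)
sysctl -w kernel.softlockup_panic=1      # panic (and kdump/reboot if configured) on a soft lockup
sysctl -w kernel.nmi_watchdog=1          # keep the NMI hard-lockup watchdog armed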

behlendorf commented 11 years ago

@htkmmo Are you still suffering from this issue with the latest code? There have been a few fixes in this area which may have resolved the issue, and I've heard no other reports of this type of deadlock.

htkmmo commented 11 years ago

Hopefully it went away :) I did upgrade to rc13 but had no time to do stress testing. There are no issues with light usage. Under heavy usage the problem could be XFS in read/write mode, the LSI driver, ZFS, or some combination of these. When I get more time on my hands I'll upgrade to 0.6.1 and do larger data movements. For now, this issue can be closed. Thanks for following through!

behlendorf commented 11 years ago

OK, thanks. Then I'm going to close this. By all means open a new issue if you recreate the problem.