openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other
10.47k stars 1.73k forks source link

crash: "BUG: scheduling while atomic" while zfs sending #1979

Closed dward closed 10 years ago

dward commented 10 years ago

We're running a 2.6.32 kernel with ZFS 0.6.1. The following crash occurred about 36 hours after the start of a zfs send of around 900GB of data. Let me know what other information you would like.

Dec 15 13:35:52 master kernel: Call Trace:
Dec 15 13:35:52 master kernel: Pid: 16664, comm: zfs Tainted: P           2.6.32-100.24.4.tn #1
Dec 15 13:35:52 master kernel: BUG: unable to handle kernel paging request at 0000000a00001711
Dec 15 13:35:52 master kernel: IP: [<ffffffff810157b1>] print_context_stack+0x68/0xff
Dec 15 13:35:52 master kernel: BUG: scheduling while atomic: zfs/16664/0xffff8800
Dec 15 13:35:52 master kernel: Modules linked in: bonding(U) nfs(U) fscache(U) tun(U) nfsd(U) lockd(U) nfs_acl(U) auth_rpcgss(U) sunrpc(U) exportfs(U) ipt_MASQUERADE(U) ipt_addrtype(U) xt_tcpudp(U) xt_state(U) ipt_LOG(U) iptable_mangle(U) iptable_nat(U) nf_nat(U) nf_conntrack_ipv6(U) nf_conntrack_ipv4(U) nf_conntrack(U) nf_defrag_ipv4(U) iptable_filter(U) ip_tables(U) x_tables(U) ib_iser(U) rdma_cm(U) ib_cm(U) iw_cm(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) iscsi_tcp(U) bnx2i(U) cnic(U) uio(U) ipv6(U) cxgb3i(U) cxgb3(U) mdio(U) libiscsi_tcp(U) libiscsi(U) scsi_transport_iscsi(U) bridge(U) stp(U) llc(U) zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate(U) video(U) output(U) sbs(U) sbshc(U) ipmi_si(U) ipmi_devintf(U) ipmi_msghandler(U) parport_pc(U) lp(U) parport(U) kvm_intel(U) kvm(U) joydev(U) radeon(U) ttm(U) drm_kms_helper(U) drm(U) i2c_algo_bit(U) i2c_core(U) hpilo(U) hpwdt(U) bnx2(U) serio_raw(U) snd_seq_dummy(U) snd_seq_oss(U) snd_seq_midi_event(U) snd_seq(U) snd_seq_device(U) snd_pcm_oss(U) snd_mixer_oss(U) snd_pcm(U) snd_timer(U) iTCO_wdt(U) iTCO_vendor_support(U) snd(U) soundcore(U) snd_page_alloc(U) pcspkr(U) usb_storage(U) shpchp(U) cciss(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U) [last unloaded: microcode]
Dec 15 13:35:52 master kernel: Oops: 0000 [#1] SMP
Dec 15 13:35:52 master kernel: Thread overran stack, or stack corrupted
Dec 15 13:35:52 master kernel: last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
Dec 15 13:35:52 master kernel:  [<ffffffff8104d720>] __schedule_bug+0x61/0x65
Dec 15 13:35:52 master kernel: CPU 9
Dec 15 13:35:52 master kernel: Modules linked in: bonding(U) nfs(U) fscache(U) tun(U) nfsd(U) lockd(U) nfs_acl(U) auth_rpcgss(U) sunrpc(U) exportfs(U) ipt_MASQUERADE(U) ipt_addrtype(U) xt_tcpudp(U) xt_state(U) ipt_LOG(U) iptable_mangle(U) iptable_nat(U) nf_nat(U) nf_conntrack_ipv6(U) nf_conntrack_ipv4(U) nf_conntrack(U) nf_defrag_ipv4(U) iptable_filter(U) ip_tables(U) x_tables(U) ib_iser(U) rdma_cm(U) ib_cm(U) iw_cm(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) iscsi_tcp(U) bnx2i(U) cnic(U) uio(U) ipv6(U) cxgb3i(U) cxgb3(U) mdio(U) libiscsi_tcp(U) libiscsi(U) scsi_transport_iscsi(U) bridge(U) stp(U) llc(U) zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate(U) video(U) output(U) sbs(U) sbshc(U) ipmi_si(U) ipmi_devintf(U) ipmi_msghandler(U) parport_pc(U) lp(U) parport(U) kvm_intel(U) kvm(U) joydev(U) radeon(U) ttm(U) drm_kms_helper(U) drm(U) i2c_algo_bit(U) i2c_core(U) hpilo(U) hpwdt(U) bnx2(U) serio_raw(U) snd_seq_dummy(U) snd_seq_oss(U) snd_seq_midi_event(U) snd_seq(U) snd_seq_device(U) snd_pcm_oss(U) snd_mixer_oss(U) snd_pcm(U) snd_timer(U) iTCO_wdt(U) iTCO_vendor_support(U) snd(U) soundcore(U) snd_page_alloc(U) pcspkr(U) usb_storage(U) shpchp(U) cciss(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U) [last unloaded: microcode]
Dec 15 13:35:52 master kernel: RAX: 0000000000000032 RBX: ffff88009f60c5a0 RCX: 0000000000001033
Dec 15 13:35:52 master kernel: RIP: 0010:[<ffffffff810157b1>]  [<ffffffff810157b1>] print_context_stack+0x68/0xff
Dec 15 13:35:52 master kernel: PGD 18d9fb067 PUD 0
Dec 15 13:35:52 master kernel: RDX: 0000000a00000001 RSI: ffff88009f60c000 RDI: 0000000000000246
Dec 15 13:35:52 master kernel: Pid: 16664, comm: zfs Tainted: P           2.6.32-100.24.4.tn #1 ProLiant DL360 G7
Dec 15 13:35:52 master kernel: R13: ffff88009f60c5b8 R14: 0000000000000000 R15: ffffffff815d9255
Dec 15 13:35:52 master kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 15 13:35:52 master kernel: RSP: 0018:ffff88009f60c458  EFLAGS: 00010216
Dec 15 13:35:52 master kernel: CR2: 0000000a00001711 CR3: 0000000297f27000 CR4: 00000000000026e0
Dec 15 13:35:52 master kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 15 13:35:52 master kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 15 13:35:52 master kernel: FS:  00007f3cdc3f7b60(0000) GS:ffff880041520000(0000) knlGS:0000000000000000
Dec 15 13:35:52 master kernel: Process zfs (pid: 16664, threadinfo ffff88009f60c000, task ffff8800d742e700)
Dec 15 13:35:52 master kernel: RBP: ffff88009f60c498 R08: ffff88009f60c448 R09: 776f687320776f6e
Dec 15 13:35:52 master kernel:  ffff88009f60c598 ffffffff81443320 ffff88009f60c000 ffff88009f60c580
Dec 15 13:35:52 master kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8104d720
Dec 15 13:35:52 master kernel: <0> ffff88009f60c598 ffff88009f60c598 ffffffff815d9255 ffffffff81443320
Dec 15 13:35:52 master kernel: <0> ffff88009f60c508 ffffffff81014abd ffff88009f60c4dc ffff880041523fc0
Dec 15 13:35:52 master kernel: Call Trace:
Dec 15 13:35:52 master kernel:  [<ffffffff810156b1>] show_trace_log_lvl+0x51/0x5d
Dec 15 13:35:52 master kernel:  [<ffffffff81014abd>] dump_trace+0x262/0x271
Dec 15 13:35:52 master kernel:  [<ffffffff810156d2>] show_trace+0x15/0x17
Dec 15 13:35:52 master kernel:  [<ffffffff8108a181>] ? print_modules+0xb1/0xcc
Dec 15 13:35:52 master kernel:  [<ffffffff810158bf>] dump_stack+0x77/0x80
Dec 15 13:35:52 master kernel:  [<ffffffff81436f92>] schedule+0x8a/0x76b
Dec 15 13:35:52 master kernel:  [<ffffffff8104d720>] __schedule_bug+0x61/0x65
Dec 15 13:35:52 master kernel:  [<ffffffff81437765>] io_schedule+0x42/0x5c
Dec 15 13:35:52 master kernel:  [<ffffffff8107d836>] ? ktime_get_ts+0x89/0x92
Dec 15 13:35:52 master kernel:  [<ffffffffa0251ad7>] cv_wait_common+0xda/0x173 [spl]
Dec 15 13:35:52 master kernel:  [<ffffffff81017fad>] ? read_tsc+0xd/0x25
Dec 15 13:35:52 master kernel: Stack:
Dec 15 13:35:52 master kernel:  [<ffffffff810425b3>] ? need_resched+0x23/0x2d
Dec 15 13:35:52 master kernel:  [<ffffffff81075ac4>] ? autoremove_wake_function+0x0/0x3d
Dec 15 13:35:52 master kernel:  [<ffffffffa034285d>] dmu_tx_hold_free+0x1a2/0x689 [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffffa0337f2d>] dmu_free_long_range+0x48/0x68 [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffffa03af495>] zio_wait+0x124/0x152 [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffffa033125c>] ? dbuf_read+0x113/0x4a7 [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffffa03aa598>] zfs_zinactive+0xaf/0xe3 [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffff810425cb>] ? should_resched+0xe/0x2f
Dec 15 13:35:52 master kernel:  [<ffffffffa038ea71>] zfs_rmnode+0xad/0x2d8 [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffffa03b5c86>] ? zpl_inode_delete+0x0/0x5d [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffff8112d2fe>] clear_inode+0x86/0xdf
Dec 15 13:35:52 master kernel:  [<ffffffff8112d621>] generic_delete_inode+0xee/0x147
Dec 15 13:35:52 master kernel:  [<ffffffff8112cdac>] iput+0x66/0x6a
Dec 15 13:35:52 master kernel:  [<ffffffffa03b5cde>] zpl_inode_delete+0x58/0x5d [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffff8112a665>] dentry_iput+0xb8/0xca
Dec 15 13:35:52 master kernel:  [<ffffffff8112a763>] d_kill+0x49/0x6a
Dec 15 13:35:52 master kernel:  [<ffffffffa0337d79>] dmu_free_long_range_impl+0x15d/0x1e0 [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffffa03b5bfe>] zpl_clear_inode+0xe/0x10 [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffffa03a017b>] zfs_inactive+0x120/0x152 [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffffa0251b88>] __cv_wait_io+0x18/0x1a [spl]
Dec 15 13:35:52 master kernel:  [<ffffffff810e21f7>] do_try_to_free_pages+0x1cc/0x310
Dec 15 13:35:52 master kernel:  [<ffffffff8112a958>] __shrink_dcache_sb+0x1d4/0x272
Dec 15 13:35:52 master kernel:  [<ffffffff810e2471>] try_to_free_pages+0x89/0x98
Dec 15 13:35:52 master kernel:  [<ffffffff8112d696>] generic_drop_inode+0x1c/0x5b
Dec 15 13:35:52 master kernel:  [<ffffffffa03b5c86>] ? zpl_inode_delete+0x0/0x5d [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffff810dbbad>] __alloc_pages_nodemask+0x379/0x593
Dec 15 13:35:52 master kernel:  [<ffffffff810e04d7>] ? isolate_pages_global+0x0/0x1ba
Dec 15 13:35:52 master kernel:  [<ffffffff8110b6bf>] ____cache_alloc_node+0x121/0x134
Dec 15 13:35:52 master kernel:  [<ffffffff810e1d67>] shrink_slab+0xe1/0x153
Dec 15 13:35:52 master kernel:  [<ffffffff8110b4fe>] fallback_alloc+0x12e/0x1ce
Dec 15 13:35:52 master kernel:  [<ffffffff8110b32b>] kmem_getpages+0x4f/0xf4
Dec 15 13:35:52 master kernel:  [<ffffffff8112aafe>] shrink_dcache_memory+0xf4/0x17a
Dec 15 13:35:52 master kernel:  [<ffffffff8110bcdf>] kmem_cache_alloc_node_notrace+0x84/0xb9
Dec 15 13:35:52 master kernel:  [<ffffffff8139cf19>] ? __alloc_skb+0x72/0x13d
Dec 15 13:35:52 master kernel:  [<ffffffff8139cf19>] __alloc_skb+0x72/0x13d
Dec 15 13:35:52 master kernel:  [<ffffffff8139cf19>] ? __alloc_skb+0x72/0x13d
Dec 15 13:35:52 master kernel:  [<ffffffff8110bd5a>] __kmalloc_node+0x46/0x73
Dec 15 13:35:52 master kernel:  [<ffffffff813983aa>] sock_alloc_send_pskb+0x9e/0x2ba
Dec 15 13:35:52 master kernel:  [<ffffffff8139830a>] alloc_skb+0x13/0x15
Dec 15 13:35:52 master kernel:  [<ffffffff8139cf69>] ? __alloc_skb+0xc2/0x13d
Dec 15 13:35:52 master kernel:  [<ffffffff813985db>] sock_alloc_send_skb+0x15/0x17
Dec 15 13:35:52 master kernel:  [<ffffffff81072ba2>] ? pid_vnr+0x22/0x24
Dec 15 13:35:52 master kernel:  [<ffffffff81117cd3>] do_sync_write+0xe7/0x12b
Dec 15 13:35:52 master kernel:  [<ffffffff810425cb>] ? should_resched+0xe/0x2f
Dec 15 13:35:52 master kernel:  [<ffffffff81075ac4>] ? autoremove_wake_function+0x0/0x3d
Dec 15 13:35:52 master kernel:  [<ffffffff814133bd>] unix_stream_sendmsg+0x11b/0x2bd
Dec 15 13:35:52 master kernel:  [<ffffffffa0328b9d>] ? add_reference+0xc3/0xdd [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffff81118471>] vfs_write+0xc3/0x10a
Dec 15 13:35:52 master kernel:  [<ffffffff81394318>] sock_aio_write+0xc0/0xd4
Dec 15 13:35:52 master kernel:  [<ffffffffa024e636>] vn_rdwr+0xb1/0x23c [spl]
Dec 15 13:35:52 master kernel:  [<ffffffff8139424f>] __sock_sendmsg+0x5e/0x67
Dec 15 13:35:52 master kernel:  [<ffffffffa033fe85>] backup_cb+0x5c9/0x5fe [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffffa0340749>] traverse_visitbp+0x272/0x50b [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffffa0340e25>] traverse_dnode+0x74/0xe9 [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffffa032bfb1>] ? arc_read+0xeb/0xfd [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffffa0340814>] traverse_visitbp+0x33d/0x50b [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffffa034065d>] traverse_visitbp+0x186/0x50b [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffff811e7e78>] ? security_file_permission+0x16/0x18
Dec 15 13:35:52 master kernel:  [<ffffffffa033f6f8>] dump_bytes+0x87/0xb1 [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffffa0340749>] traverse_visitbp+0x272/0x50b [zfs]
Dec 15 13:35:52 master kernel:  [<ffffffffa0340749>] traverse_visitbp+0x272/0x50b [zfs]
behlendorf commented 10 years ago

This was fixed by commit a16878805388c4d96cb8a294de965071d138a47b which has been merged to master and will appear in 0.6.3. If you need to you can safely cherry pick the fix.

dward commented 10 years ago

Thanks! I'll give it a shot.