hhoffstaette / kernel-patches

Custom Linux kernel patches

kernel BUG at fs/btrfs/relocation.c:4548! #4

Closed: disaster123 closed this issue 8 years ago

disaster123 commented 8 years ago

Bug with the latest master.

The failing assertion is: BUG_ON(rc->stage == UPDATE_DATA_PTRS && root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID);

[CODE]
------------[ cut here ]------------
kernel BUG at fs/btrfs/relocation.c:4548!
invalid opcode: 0000 [#1] SMP
Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter ip_tables x_tables 8021q garp bonding coretemp loop usbhid i40e(O) ehci_pci ehci_hcd i2c_i801 sb_edac ipmi_si vxlan usbcore ip6_udp_tunnel edac_core i2c_core usb_common udp_tunnel shpchp ipmi_msghandler button btrfs dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 md_mod sg sd_mod ixgbe mdio ahci ptp aacraid libahci pps_core
CPU: 6 PID: 23002 Comm: bash Tainted: G O 4.4.9+24-ph #1
Hardware name: Supermicro X10DRH/X10DRH-IT, BIOS 1.0c 02/18/2015
task: ffff880ef6c3ca00 ti: ffff8809f9b54000 task.ti: ffff8809f9b54000
RIP: 0010:[] [] btrfs_reloc_cow_block+0x338/0x380 [btrfs]
RSP: 0018:ffff8809f9b57948 EFLAGS: 00010246
RAX: ffff88105af20000 RBX: ffff88105cea4000 RCX: ffff88027e6993b0
RDX: ffff8803e9fa7c70 RSI: ffff88105cea4000 RDI: ffff8808549b4000
RBP: ffff8809f9b579b8 R08: ffff8808549b4000 R09: 0000000000001000
R10: ffff880780f52000 R11: ffff88075731cd40 R12: ffff880462d49000
R13: ffff8803e9fa7c70 R14: ffff88105cea4000 R15: 0000000000000000
FS: 00007f1435d13700(0000) GS:ffff88085fcc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f44e1f121a0 CR3: 0000000ca44f4000 CR4: 00000000001406e0
Stack:
ffff8809f9b579b8 ffffffffc0444d8d ffffea00096b2740 0000000000000000
ffff8809f9b579b8 ffffffffc0495c68 0000000000000001 0000000000000000
ffff8803e9fa7c70 ffff88027e6993b0 ffff8803e9fa7c70 ffff8808549b4000
Call Trace:
[] ? update_ref_for_cow+0x21d/0x340 [btrfs]
[] ? write_extent_buffer+0xb8/0x130 [btrfs]
[] btrfs_cow_block+0x39d/0x5d0 [btrfs]
[] btrfs_cow_block+0x129/0x1d0 [btrfs]
[] btrfs_search_slot+0x1c2/0x950 [btrfs]
[] ? btrfs_drop_extent_cache+0x346/0x3f0 [btrfs]
[] ? btrfs_read_tree_root+0xb7/0x130 [btrfs]
[] ? btrfs_get_delayed_node+0x8c/0xd0 [btrfs]
[] btrfs_truncate_inode_items+0x171/0xea0 [btrfs]
[] ? add_reloc_root+0x83/0x110 [btrfs]
[] ? btrfs_init_reloc_root+0x89/0xc0 [btrfs]
[] ? btrfs_record_root_in_trans+0x60/0x80 [btrfs]
[] btrfs_evict_inode+0x3cc/0x590 [btrfs]
[] ? __inode_wait_for_writeback+0x6d/0xc0
[] evict+0xbb/0x1a0
[] iput+0x1a4/0x210
[] drop_pagecache_sb+0xc4/0xf0
[] ? do_coredump+0xeb0/0xeb0
[] iterate_supers+0xeb/0xf0
[] drop_caches_sysctl_handler+0x59/0xc0
[] proc_sys_call_handler+0xcc/0xf0
[] proc_sys_write+0x14/0x20
[] __vfs_write+0x18/0x40
[] vfs_write+0xaa/0x1a0
[] SyS_write+0x4f/0xa0
[] entry_SYSCALL_64_fastpath+0x12/0x71
Code: b3 f0 00 00 00 44 0f b6 78 64 41 0f 93 c6 48 83 fa f8 0f 84 ee fd ff ff e9 66 fd ff ff 48 83 be df 01 00 00 f7 0f 85 14 fd ff ff <0f> 0b 48 3b 7e 20 0f 84 03 fe ff ff 0f 0b be d2 11 00 00 48 c7
RIP [] btrfs_reloc_cow_block+0x338/0x380 [btrfs]
RSP
---[ end trace 49badf4b2e1ac7f2 ]---
[/CODE]

hhoffstaette commented 8 years ago

So, if I read this correctly, this happened when you ran "echo x > drop_caches"... interesting. Obviously this should not happen, but I have too little information to make better guesses.

So far, my best guess would be to try without these two patches:

btrfs-20160426-001-don't-wait-for-unrelated-IO-to-finish-before-relocation.patch
btrfs-20160426-002-don't-do-unnecessary-delalloc-flushes-when-relocating.patch

They are the only recent changes that might be related, so please test without them (just moving them aside should work) and let me know how it goes. So far I have not seen any negative effects from them, but drop_caches is pretty brutal (there has even been talk of removing it), and it can trigger weird corner cases in file systems.

disaster123 commented 8 years ago

> Has this happened before without drop_caches, in normal operation? (I guess not)

No.

> Does it happen every time, i.e. is it repeatable?

I have to check this. I updated to the latest master yesterday, and it occurred that night while dropping caches. Sorry, I forgot to mention that.

Yes, it was a value of 3, and yes, the FS was busy at the time with balancing (but nothing else).

hhoffstaette commented 8 years ago

> yes the FS was busy that time doing balancing (but nothing else)

That's what I suspected, thanks. Please try without the two patches I mentioned; in the meantime, I will get in touch with Filipe and see what he thinks.

disaster123 commented 8 years ago

Thanks. Will do so.

hhoffstaette commented 8 years ago

I got a reply from Filipe. Apparently this is not caused by the two patches above but by something else (some other existing bug), and he'll have a patch soon. I'll keep this open until then.

disaster123 commented 8 years ago

Great, thanks. So I'll skip testing without those patches.

disaster123 commented 8 years ago

OK, thanks.

hhoffstaette commented 8 years ago

If I understand correctly, commit c304a7aaad should fix this (or at least something closely related), so I'm closing this issue.

disaster123 commented 8 years ago

THX