multipath-tcp / mptcp

⚠️⚠️⚠️ Deprecated 🚫 Out-of-tree Linux Kernel implementation of MultiPath TCP. 👉 Use https://github.com/multipath-tcp/mptcp_net-next repo instead ⚠️⚠️⚠️

kernel bug: soft lockup at _raw_spin_lock (not duplicate) #305

Closed Ehekatl closed 5 years ago

Ehekatl commented 5 years ago

I saw a previous issue reporting a similar lockup, but we are running the latest version on Ubuntu 14.04.5, installed from apt (linux-mptcp/stretch,now 20180925184532).

This has happened a few times already:

Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.983469] watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [swapper/9:0]
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986206] task: ffff8807fab30500 task.stack: ffffc900031cc000
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986180] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat_ipv4 xt_addrtype nf_nat br_netfilter bridge stp llc ip6table_filter ip6_tables overlay nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack iptable_filter xt_CT nf_conntrack libcrc32c xt_multiport iptable_raw ip_tables x_tables binfmt_misc dm_crypt dm_mod dax fuse ppdev parport_pc parport i2c_piix4 serio_raw evdev ena(O) ext4 crc16 mbcache jbd2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper nvme cryptd nvme_core button
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986204] CPU: 9 PID: 0 Comm: swapper/9 Tainted: G O L 4.14.70.mptcp #11
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986205] Hardware name: Amazon EC2 c5.4xlarge/, BIOS 1.0 10/16/2017
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986209] RIP: 0010:native_queued_spin_lock_slowpath+0x21/0x190
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986210] RSP: 0018:ffff8807fe443a58 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986211] RAX: 0000000000000001 RBX: ffff8807d7ab7380 RCX: ffff8807fe45ccc0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986212] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8807d7ab7408
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986212] RBP: ffff8807d7ab7408 R08: 0000000000000001 R09: 0000000000000000
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986213] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986213] R13: ffffffff81ca3ce0 R14: ffff8807da606e40 R15: ffff8807de3bedc0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986215] FS: 0000000000000000(0000) GS:ffff8807fe440000(0000) knlGS:0000000000000000
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986215] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986216] CR2: 0000000001a938c8 CR3: 000000000200a002 CR4: 00000000007606e0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986218] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986219] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986219] PKRU: 55555554
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986219] Call Trace:
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986221]
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986224] _raw_spin_lock+0x1d/0x20
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986227] mptcp_disconnect+0xd5/0x140
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986232] tcp_disconnect+0x4f9/0x540
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986233] inet_child_forget+0x30/0xc0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986235] inet_csk_reqsk_queue_add+0x8e/0xa0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986236] inet_csk_complete_hashdance+0x43/0x90
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986238] mptcp_check_req_master+0x95/0xb0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986240] tcp_check_req+0x513/0x620
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986242] ? tcp_v4_inbound_md5_hash+0x62/0x1b0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986243] ? ip_route_input_rcu+0xa23/0xb20
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986244] tcp_v4_rcv+0x7fa/0xd40
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986246] ip_local_deliver_finish+0xa6/0x1d0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986248] ip_local_deliver+0x5b/0xc0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986249] ? ip_rcv_finish+0x3a0/0x3a0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986250] ip_rcv+0x26c/0x380
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986251] ? inet_del_offload+0x40/0x40
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986254] netif_receive_skb_core+0x821/0xaf0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986256] ? inet_gro_receive+0x204/0x2b0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986257] ? netif_receive_skb_internal+0x24/0xc0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986259] netif_receive_skb_internal+0x24/0xc0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986260] napi_gro_receive+0xb8/0xe0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986264] ena_io_poll+0x69c/0xfd0 [ena]
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986266] net_rx_action+0x26a/0x3c0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986268] __do_softirq+0x10d/0x2a5
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986271] irq_exit+0xb6/0xc0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986273] do_IRQ+0x52/0xd0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986274] common_interrupt+0x7d/0x7d
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986276]
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986277] RIP: 0010:native_safe_halt+0x2/0x10
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986277] RSP: 0018:ffffc900031cfec8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff3c
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986279] RAX: ffffffff81632680 RBX: ffff8807fab30500 RCX: 0000000000000000
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986279] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986280] RBP: 0000000000000009 R08: 0000000000000002 R09: 0000000000000001
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986280] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8807fab30500
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986281] R13: ffff8807fab30500 R14: 0000000000000000 R15: 0000000000000000
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986282] ? cpuidle_text_start+0x8/0x8
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986284] default_idle+0x1a/0xf0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986286] do_idle+0x166/0x1d0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986288] cpu_startup_entry+0x5f/0x70
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986291] start_secondary+0x19e/0x1e0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986292] secondary_startup_64+0xa5/0xb0
Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986293] Code: e8 c5 ff ff ff 5e 5a c3 66 90 0f 1f 44 00 00 0f 1f 44 00 00 ba 01 00 00 00 8b 07 85 c0 75 0a f0 0f b1 17 85 c0 75 f2 f3 c3 f3 90 ec 81 fe 00 01 00 00 0f 84 91 00 00 00 41 b8 01 01 00 00 b9
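
For anyone triaging a trace like the one above, the following is a minimal sketch of how the reliable call-trace frames could be pulled out of a syslog capture, even when the entries have been joined onto one line. The helper names (`split_entries`, `reliable_frames`) and the regular expressions are illustrative assumptions, not part of any MPTCP or kernel tooling, and the prefix pattern assumes the "Mon DD HH:MM:SS host kernel: [timestamp]" layout seen in this report.

```python
import re

# Assumed syslog prefix: "Mon DD HH:MM:SS host kernel: [seconds.micros]".
PREFIX = re.compile(
    r"[A-Z][a-z]{2} +\d+ \d{2}:\d{2}:\d{2} \S+\s+kernel: \[\s*\d+\.\d+\]"
)
# A stack frame looks like "symbol+0xOFF/0xSIZE", optionally led by "? ".
FRAME = re.compile(r"^(\? )?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def split_entries(log):
    """Split a (possibly flattened) kernel log into per-entry message bodies."""
    # Text before the first prefix is dropped; each remaining piece is one entry.
    return [part.strip() for part in PREFIX.split(log)[1:]]


def reliable_frames(log):
    """Return symbols of frames after 'Call Trace:' that are not marked '?'."""
    frames = []
    in_trace = False
    for msg in split_entries(log):
        if msg.startswith("Call Trace:"):
            in_trace = True
            continue
        match = FRAME.match(msg) if in_trace else None
        # Frames prefixed with "? " are unreliable stack leftovers; skip them.
        if match and not match.group(1):
            frames.append(match.group(2))
    return frames


# A few entries copied from the report, joined the way a flattened log would be.
sample = (
    "Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986219] Call Trace: "
    "Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986224] _raw_spin_lock+0x1d/0x20 "
    "Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986227] mptcp_disconnect+0xd5/0x140 "
    "Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986242] ? tcp_v4_inbound_md5_hash+0x62/0x1b0 "
    "Dec 24 16:23:28 ip-10-0-102-85 kernel: [255905.986232] tcp_disconnect+0x4f9/0x540"
)
print(reliable_frames(sample))
```

Reading the resulting list top-down gives the innermost frames first, which here puts `_raw_spin_lock` directly under `mptcp_disconnect`.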

cpaasch commented 5 years ago

This looks like a bug fixed with a9f6d31ff833 ("mptcp: Disable bottom-half before processing SYN/ACK"). You should update your kernel to the mptcp_v0.94 branch. If you still see the issue with the updated kernel, please reopen the issue.