multipath-tcp / mptcp

⚠️⚠️⚠️ Deprecated 🚫 Out-of-tree Linux Kernel implementation of MultiPath TCP. 👉 Use https://github.com/multipath-tcp/mptcp_net-next repo instead ⚠️⚠️⚠️

Scheduler BLEST: Panic in 4.19.55 #356

Closed yverbin closed 5 years ago

yverbin commented 5 years ago

I suspect the problem occurs when USB modems are connected to the PC.

uname -a
Linux host 4.19.55.mptcp #20190622190130 SMP Sat Jun 22 19:02:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 008: ID 19d2:1405 ZTE WCDMA Technologies MSM
Bus 001 Device 004: ID 046d:c31c Logitech, Inc. Keyboard K120
Bus 001 Device 003: ID 05e3:0608 Genesys Logic, Inc. Hub
Bus 001 Device 002: ID 13d3:3273 IMC Networks 802.11 n/g/b Wireless LAN USB Mini-Card
Bus 001 Device 007: ID 12d1:14db Huawei Technologies Co., Ltd. E353/E3131
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
 PGD 8000000073faa067 P4D 8000000073faa067 PUD 73fad067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 3305 Comm: agent Not tainted 4.19.55.mptcp #20190622190130
 Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS CLBTM210 06/01/2015
 RIP: 0010:blest_get_available_subflow+0x2a7/0x3c0 [mptcp_blest]
 Code: 00 48 0f bf 46 2a 8b b5 a4 05 00 00 48 0f af c2 89 ca 48 b9 cf f7 53 e3 a5 9b c4 20 48 0f af d0 48 c1 ea 03 48 89 d0 48 f7 e1 <41> 8b 84 24 a0 00 00 00 03 83 ac 06 00 00 2b 83 64 05 00 00 89 f1
 RSP: 0018:ffffa00880f77c00 EFLAGS: 00010a87
 RAX: cccccccccd5f62c0 RBX: ffff8bf671f7ec40 RCX: 20c49ba5e353f7cf
 RDX: 00000000007b70cc RSI: 00000000000ab000 RDI: 0000000000000001
 RBP: ffff8bf66b646b00 R08: 0000000000000003 R09: 000000008a958b91
 R10: 0000000000000000 R11: 000000000000000b R12: 0000000000000000
 R13: ffff8bf6713d0ac0 R14: ffff8bf671eab0c0 R15: 0000000000000000
 FS:  00007f91e14deee8(0000) GS:ffff8bf677180000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000000000000a0 CR3: 000000006b6cc000 CR4: 00000000001006e0
 Call Trace:
  mptcp_blest_next_segment+0x1ca/0x210 [mptcp_blest]
  mptcp_write_xmit+0xc3/0x4b0
  __tcp_push_pending_frames+0x38/0xd0
  tcp_sendmsg_locked+0x3b0/0xe60
  tcp_sendmsg+0x27/0x40
  sock_sendmsg+0x36/0x40
  sock_write_iter+0x87/0x100
  __vfs_write+0x114/0x1a0
  vfs_write+0xb0/0x190
  ksys_write+0x5a/0xd0
  do_syscall_64+0x55/0x100
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x47d920
 Code: 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 49 c7 c2 00 00 00 00 49 c7 c0 00 00 00 00 49 c7 c1 00 00 00 00 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
 RSP: 002b:000000c4205d5918 EFLAGS: 00000212 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000047d920
 RDX: 000000000000401d RSI: 000000c42065c000 RDI: 000000000000001c
 RBP: 000000c4205d5970 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000212 R12: 000000000000004e
 R13: 00000000004e73fb R14: 0000000000000028 R15: 0000000000000009
 Modules linked in: veth xt_nat xt_tcpudp ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xt_multiport iptable_filter ip_tables x_tables bpfilter ctr ccm overlay cdc_ether usbnet mii squashfs loop mptcp_blest fuse arc4 rt2800usb rt2x00usb rt2800lib rt2x00lib mac80211 cfg80211 joydev crc_ccitt rfkill intel_rapl bridge stp llc intel_soc_dts_thermal intel_soc_dts_iosf intel_powerclamp coretemp kvm_intel kvm irqbypass intel_cstate snd_hda_codec_hdmi snd_hda_intel lpc_ich snd_hda_codec mfd_core snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore rtc_cmos pcc_cpufreq evdev sr_mod cdrom tcp_westwood sch_fq_codel sg ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp
  libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 uas usb_storage hid_generic usbhid hid ext4 crc16 mbcache jbd2 fscrypto btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod ahci libahci crct10dif_pclmul crc32_pclmul i915 crc32c_intel xhci_pci libata xhci_hcd drm_kms_helper ghash_clmulni_intel igb cryptd dca i2c_algo_bit usbcore scsi_mod drm thermal fan video button
 CR2: 00000000000000a0
 ---[ end trace fba65a065ce12dba ]---
 RIP: 0010:blest_get_available_subflow+0x2a7/0x3c0 [mptcp_blest]
 Code: 00 48 0f bf 46 2a 8b b5 a4 05 00 00 48 0f af c2 89 ca 48 b9 cf f7 53 e3 a5 9b c4 20 48 0f af d0 48 c1 ea 03 48 89 d0 48 f7 e1 <41> 8b 84 24 a0 00 00 00 03 83 ac 06 00 00 2b 83 64 05 00 00 89 f1
 RSP: 0018:ffffa00880f77c00 EFLAGS: 00010a87
 RAX: cccccccccd5f62c0 RBX: ffff8bf671f7ec40 RCX: 20c49ba5e353f7cf
 RDX: 00000000007b70cc RSI: 00000000000ab000 RDI: 0000000000000001
 RBP: ffff8bf66b646b00 R08: 0000000000000003 R09: 000000008a958b91
 R10: 0000000000000000 R11: 000000000000000b R12: 0000000000000000
 R13: ffff8bf6713d0ac0 R14: ffff8bf671eab0c0 R15: 0000000000000000
 FS:  00007f91e14deee8(0000) GS:ffff8bf677180000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000000000000a0 CR3: 000000006b6cc000 CR4: 00000000001006e0
dweb32 commented 5 years ago

Hello @yverbin,

Did you already test the default scheduler, and does this also happen with it? Do you have any further hints on how to reproduce this?

yverbin commented 5 years ago

Hello @s6dlwebe. It looks like everything is fine with the default scheduler, but I don't have enough statistics yet. Several tests passed successfully until I reached the cases with modems and the BLEST scheduler.

cpaasch commented 5 years ago

@s6dlwebe :

                /* is the required space available in the mptcp meta send window?
                 * we assume that all bytes inflight on the slow path will be acked in besttp->srtt seconds
                 * (just like the SKB if it was sent now) -> that means that those inflight bytes will
                 * keep occupying space in the meta window until then
                 */
                slow_inflight_bytes = besttp->write_seq - besttp->snd_una;
                slow_bytes = skb->len + slow_inflight_bytes; // bytes of this SKB plus those in flight already

skb can be NULL (see the callers of blest_get_available_subflow()).
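
To illustrate the point, here is a minimal userspace sketch of the same arithmetic with a NULL guard. The names `skb_model`, `subflow_model`, and `slow_bytes_estimate` are hypothetical stand-ins, not kernel code, and this is not the patch that was later tested in this thread; treating a missing skb as zero bytes is just one assumed way to avoid the dereference.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures: only the fields
 * used by the quoted snippet are modelled. */
struct skb_model {
    uint32_t len;        /* payload length of the segment to schedule */
};

struct subflow_model {
    uint32_t write_seq;  /* next sequence number to be written */
    uint32_t snd_una;    /* oldest unacknowledged sequence number */
};

/* Estimate the bytes that would occupy the meta send window on the slow
 * subflow. Assumption: when no skb is passed (skb == NULL), only the
 * bytes already in flight are counted instead of dereferencing skb. */
static uint32_t slow_bytes_estimate(const struct subflow_model *besttp,
                                    const struct skb_model *skb)
{
    /* Unsigned subtraction stays correct across sequence wrap-around. */
    uint32_t slow_inflight_bytes = besttp->write_seq - besttp->snd_una;
    uint32_t skb_len = skb ? skb->len : 0;

    return skb_len + slow_inflight_bytes;
}
```

Calling `slow_bytes_estimate(&tp, NULL)` returns only the in-flight byte count, whereas the unguarded `skb->len` in the quoted code reads from a NULL pointer, which matches the fault reported in the oops above.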

dweb32 commented 5 years ago

Good catch! Thank you for pointing to the problem so quickly.

Today I will find some time and provide a patch for this bug.

I also have to do some other minor maintenance on BLEST anyway (adding an mptcp_retransmit tracepoint and penalizing all subflows). It seems @matttbe hasn't committed my patch set yet, and I had some problems getting my git send-email configuration to work on my new machine. I will just resend both to the mailing list later, or do you prefer a PR for this bugfix?

matttbe commented 5 years ago

It seems @matttbe hasn't committed my patch set yet, and I had some problems getting my git send-email configuration to work on my new machine. I will just resend both to the mailing list later, or do you prefer a PR for this bugfix?

Sorry, I might have missed something. Are you talking about #351?

dweb32 commented 5 years ago

Sorry, I might have missed something. Are you talking about #351?

No problem. I was the one who did not follow your workflow, so I am sorry. I sent the two patches directly to you instead of to the mailing list because I had some trouble getting git send-email to work on my machine. I will just send them again later today, together with this bugfix.

No, I was not talking about #351. That will take me some extra time to include your feedback.

yverbin commented 5 years ago

I have collected more statistics: the modems do not affect the problem. With the default scheduler everything works fine.

dweb32 commented 5 years ago

I have collected more statistics: the modems do not affect the problem. With the default scheduler everything works fine.

Thank you @yverbin for reporting and investigating the problem with us 👍. Although BLEST and the default scheduler have a lot in common, this bug is indeed specific to BLEST: @cpaasch has already spotted 🥇 the problem in the part that does the blocking estimation.

yverbin commented 5 years ago

I compiled a kernel from commit https://github.com/s6dlwebe/mptcp/commit/66efd0e8534a2f8f737cff4234c5d9d00096d86c. At first glance, the problem is solved 👍