multipath-tcp / mptcp_net-next

Development version of the Upstream MultiPath TCP Linux kernel 🐧
https://mptcp.dev

Divide error on device removal #508

Open jeid64 opened 1 month ago

jeid64 commented 1 month ago

Howdy, I'm using the latest commit of the export branch to run the bpf_red scheduler. I'm using NetworkManager to configure an MPTCP endpoint with flag 160 (subflow, fullmesh). When I unplug devices to simulate network dropouts, I get a divide error in dmesg; after that, MPTCP endpoint netlink requests stop working and NetworkManager hangs. This reproduces 100% of the time. Is anyone else using NetworkManager to manage their devices, or do you recommend using mptcpd?
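The "flag 160" above appears to be NetworkManager's `connection.mptcp-flags` bitmask. The sketch below decodes such a value into flag names. It assumes the bit values from NetworkManager's NMMptcpFlags documentation (0x1 disabled, 0x2 enabled, 0x4 also-without-sysctl, 0x8 also-without-default-route, 0x10 signal, 0x20 subflow, 0x40 backup, 0x80 fullmesh). Under that assumption, 160 = 0xa0 = subflow + fullmesh. The function name is hypothetical.

```shell
#!/bin/sh
# Decode a NetworkManager connection.mptcp-flags value into flag names.
# Bit values are assumed from NetworkManager's NMMptcpFlags documentation.
decode_nm_mptcp_flags() {
    flags=$1
    out=""
    for pair in 1:disabled 2:enabled 4:also-without-sysctl \
                8:also-without-default-route 16:signal 32:subflow \
                64:backup 128:fullmesh; do
        bit=${pair%%:*}      # numeric part before the colon
        name=${pair#*:}      # flag name after the colon
        if [ $((flags & bit)) -ne 0 ]; then
            out="$out${out:+ }$name"
        fi
    done
    printf '%s\n' "$out"
}

decode_nm_mptcp_flags 160   # prints: subflow fullmesh
```

If that reading is right, the endpoints NetworkManager creates should show up in `ip mptcp endpoint` with the `subflow fullmesh` flags. Setting up a comparable endpoint by hand with iproute2 would look something like `ip mptcp endpoint add 192.0.2.1 dev enp0s20f0u1u3 subflow fullmesh`, where the address is a placeholder.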

[  434.820308] usb 3-1.3: USB disconnect, device number 20
[  434.820482] rndis_host 3-1.3:1.0 enp0s20f0u1u3: unregister 'rndis_host' usb-0000:00:14.0-1.3, ZTE RNDIS device
[  434.846554] Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI
[  434.846567] CPU: 1 PID: 4968 Comm: NetworkManager Not tainted 6.10.0-200.fc40.x86_64 #1
[  434.846570] Hardware name: GPD G1618-03/G1618-03, BIOS 2.22 04/29/2021
[  434.846573] RIP: 0010:tcp_tso_segs+0x84/0xd0
[  434.846581] Code: 05 00 00 41 89 c6 41 d3 ee 80 f9 1f 0f 87 cb 2a 27 00 41 83 fe 1f 76 37 8b 83 2c 02 00 00 4c 39 e0 44 89 e9 49 0f 47 c4 31 d2 <48> f7 f1 39 c5 0f 42 e8 0f b7 83 2a 02 00 00 39 c5 0f 46 c5 48 83
[  434.846584] RSP: 0000:ffffaf0dc19a7520 EFLAGS: 00010246
[  434.846588] RAX: 0000000000000179 RBX: ffff99d587c8a500 RCX: 0000000000000000
[  434.846591] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff99d587c8a500
[  434.846593] RBP: 0000000000000002 R08: 0000000000000820 R09: 0000000000000000
[  434.846595] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000179
[  434.846596] R13: 0000000000000000 R14: 0000000000000042 R15: ffff99d587c8a500
[  434.846599] FS:  00007faf91208580(0000) GS:ffff99d5df880000(0000) knlGS:0000000000000000
[  434.846602] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  434.846604] CR2: 0000000043942000 CR3: 0000000147e58001 CR4: 0000000000f70ef0
[  434.846607] PKRU: 55555554
[  434.846609] Call Trace:
[  434.846613]  <TASK>
[  434.846619]  ? __die_body.cold+0x19/0x27
[  434.846629]  ? die+0x2e/0x50
[  434.846635]  ? do_trap+0xca/0x110
[  434.846648]  ? do_error_trap+0x6a/0x90
[  434.846651]  ? tcp_tso_segs+0x84/0xd0
[  434.846655]  ? exc_divide_error+0x38/0x50
[  434.846661]  ? tcp_tso_segs+0x84/0xd0
[  434.846664]  ? asm_exc_divide_error+0x1a/0x20
[  434.846672]  ? tcp_tso_segs+0x84/0xd0
[  434.846676]  tcp_write_xmit+0x78/0x16c0
[  434.846680]  __tcp_push_pending_frames+0x36/0xf0
[  434.846684]  __mptcp_push_pending+0xef/0x2a0
[  434.846693]  __mptcp_close_ssk+0x20f/0x560
[  434.846697]  mptcp_pm_nl_rm_addr_or_subflow+0x150/0x330
[  434.846706]  mptcp_pm_remove_subflow+0x2f/0x60
[  434.846710]  mptcp_pm_nl_del_addr_doit+0x1b7/0x370
[  434.846716]  genl_family_rcv_msg_doit+0xef/0x150
[  434.846725]  genl_rcv_msg+0x1b7/0x2c0
[  434.846730]  ? __pfx_mptcp_pm_nl_del_addr_doit+0x10/0x10
[  434.846733]  ? __pfx_genl_rcv_msg+0x10/0x10
[  434.846737]  netlink_rcv_skb+0x50/0x100
[  434.846743]  genl_rcv+0x28/0x40
[  434.846747]  netlink_unicast+0x242/0x370
[  434.846751]  netlink_sendmsg+0x21b/0x470
[  434.846755]  ____sys_sendmsg+0x396/0x3d0
[  434.846763]  ___sys_sendmsg+0x9a/0xe0
[  434.846769]  __sys_sendmsg+0xcc/0x100
[  434.846775]  do_syscall_64+0x82/0x160
[  434.846779]  ? syscall_exit_to_user_mode+0x72/0x220
[  434.846784]  ? do_syscall_64+0x8e/0x160
[  434.846789]  ? __sys_sendmsg+0xdc/0x100
[  434.846794]  ? syscall_exit_to_user_mode+0x72/0x220
[  434.846796]  ? do_syscall_64+0x8e/0x160
[  434.846798]  ? __rseq_handle_notify_resume+0xa6/0x4d0
[  434.846804]  ? clockevents_program_event+0x9f/0x110
[  434.846811]  ? switch_fpu_return+0x4e/0xd0
[  434.846818]  ? clear_bhb_loop+0x45/0xa0
[  434.846823]  ? clear_bhb_loop+0x45/0xa0
[  434.846826]  ? clear_bhb_loop+0x45/0xa0
[  434.846830]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  434.846834] RIP: 0033:0x7faf9214d84b
[  434.846889] Code: 48 89 e5 48 83 ec 20 89 55 ec 48 89 75 f0 89 7d f8 e8 69 5c f7 ff 8b 55 ec 48 8b 75 f0 41 89 c0 8b 7d f8 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2d 44 89 c7 48 89 45 f8 e8 c1 5c f7 ff 48 8b
[  434.846892] RSP: 002b:00007fff077ae3d0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[  434.846895] RAX: ffffffffffffffda RBX: 00005557d7820520 RCX: 00007faf9214d84b
[  434.846898] RDX: 0000000000000000 RSI: 00007fff077ae410 RDI: 000000000000000b
[  434.846900] RBP: 00007fff077ae3f0 R08: 0000000000000000 R09: 000000000000000d
[  434.846902] R10: 00005557d76fb010 R11: 0000000000000293 R12: 00005557d780fd74
[  434.846904] R13: 00005557d77c65a8 R14: 00005557d774d060 R15: 0000000000000002
[  434.846908]  </TASK>
[  434.846909] Modules linked in: cdc_acm xt_REDIRECT nft_compat overlay uinput hid_apple rfcomm snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun ip_set nf_tables qrtr uhid bnep sunrpc snd_soc_skl_hda_dsp snd_soc_hdac_hdmi snd_sof_probes snd_soc_intel_hda_dsp_common snd_soc_rt700 regmap_sdw snd_hda_codec_hdmi snd_soc_dmic snd_hda_codec_realtek binfmt_misc snd_hda_codec_generic snd_hda_scodec_component snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence vfat fat snd_sof_intel_hda_common snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_soc_acpi_intel_match soundwire_generic_allocation snd_soc_acpi soundwire_bus snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_soc_core snd_compress ac97_bus
[  434.847002]  snd_pcm_dmaengine snd_hda_intel intel_uncore_frequency snd_intel_dspcfg snd_intel_sdw_acpi intel_uncore_frequency_common snd_hda_codec x86_pkg_temp_thermal intel_powerclamp snd_hda_core coretemp snd_hwdep spi_nor mei_pxp mei_hdcp kvm_intel joydev mtd gpio_keys snd_seq intel_rapl_msr iwlmvm snd_seq_device snd_pcm btusb kvm snd_timer btrtl mac80211 btintel snd libarc4 rapl btbcm intel_cstate mei_me processor_thermal_device_pci_legacy rndis_host btmtk cdc_ether intel_uncore mei processor_thermal_device bluetooth xpad usbnet iwlwifi processor_thermal_wt_hint soundcore mii i2c_i801 spi_intel_pci spi_intel processor_thermal_rfim wmi_bmof pcspkr i2c_smbus processor_thermal_rapl intel_rapl_common thunderbolt idma64 processor_thermal_wt_req processor_thermal_power_floor processor_thermal_mbox intel_soc_dts_iosf igen6_edac goodix_ts intel_pmc_core soc_button_array int3403_thermal int340x_thermal_zone intel_vsec int3400_thermal intel_hid pmt_telemetry acpi_thermal_rel sparse_keymap pmt_class acpi_tad acpi_pad
[  434.847106]  brcmfmac brcmutil cfg80211 mmc_core rfkill amdgpu tcp_bbr sch_fq ledtrig_timer hid_playstation amdxcp led_class_multicolor ff_memless loop nfnetlink zstd zram xe drm_gpuvm drm_exec gpu_sched drm_suballoc_helper drm_ttm_helper uas usb_storage i915 cec drm_buddy crct10dif_pclmul i2c_algo_bit crc32_pclmul crc32c_intel polyval_clmulni polyval_generic nvme ghash_clmulni_intel drm_display_helper nvme_core sha512_ssse3 sha256_ssse3 spi_pxa2xx_platform ttm sha1_ssse3 dw_dmac nvme_auth video wmi pinctrl_tigerlake scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse i2c_dev dm_multipath
[  434.847195] ---[ end trace 0000000000000000 ]---
[  434.847198] RIP: 0010:tcp_tso_segs+0x84/0xd0
[  434.847204] Code: 05 00 00 41 89 c6 41 d3 ee 80 f9 1f 0f 87 cb 2a 27 00 41 83 fe 1f 76 37 8b 83 2c 02 00 00 4c 39 e0 44 89 e9 49 0f 47 c4 31 d2 <48> f7 f1 39 c5 0f 42 e8 0f b7 83 2a 02 00 00 39 c5 0f 46 c5 48 83
[  434.847206] RSP: 0000:ffffaf0dc19a7520 EFLAGS: 00010246
[  434.847209] RAX: 0000000000000179 RBX: ffff99d587c8a500 RCX: 0000000000000000
[  434.847212] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff99d587c8a500
[  434.847214] RBP: 0000000000000002 R08: 0000000000000820 R09: 0000000000000000
[  434.847215] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000179
[  434.847217] R13: 0000000000000000 R14: 0000000000000042 R15: ffff99d587c8a500
[  434.847219] FS:  00007faf91208580(0000) GS:ffff99d5df880000(0000) knlGS:0000000000000000
[  434.847222] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  434.847224] CR2: 0000000043942000 CR3: 0000000147e58001 CR4: 0000000000f70ef0
[  434.847227] PKRU: 55555554
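One possible reading of the oops above, offered as an interpretation rather than a confirmed diagnosis: the faulting bytes `<48> f7 f1` in the Code: line decode to `div rcx`. The register dump shows RCX = 0 (copied from R13 just before the division), and the dividend 0x179 is in RAX. In tcp_tso_segs(), the inlined tcp_tso_autosize() divides a byte budget by `mss_now`, so the subflow being pushed from __mptcp_close_ssk() appears to have had an MSS of 0. The function below is a simplified shell model of that arithmetic, not kernel code. The zero check and the example clamp value of 64 are illustrative assumptions, and the function name is hypothetical.

```shell
#!/bin/sh
# Simplified model (NOT kernel code) of the arithmetic in tcp_tso_autosize()
# as inlined into tcp_tso_segs():
#   segs = min(max(bytes / mss_now, min_tso_segs), gso_max_segs)
# The kernel has no zero check at this point; the one below only shows
# which input makes the hardware 'div' instruction trap.
tso_segs_model() {
    bytes=$1
    mss_now=$2
    min_tso=$3
    gso_max_segs=$4
    if [ "$mss_now" -eq 0 ]; then
        echo "mss_now is 0: 'div rcx' would raise a divide error" >&2
        return 1
    fi
    segs=$((bytes / mss_now))
    if [ "$segs" -lt "$min_tso" ]; then segs=$min_tso; fi
    if [ "$segs" -gt "$gso_max_segs" ]; then segs=$gso_max_segs; fi
    echo "$segs"
}

# Dividend and divisor taken from the oops registers (RAX=0x179, RCX=0).
# min_tso=2 is the default tcp_min_tso_segs; gso_max_segs=64 is illustrative.
tso_segs_model $((0x179)) 0 2 64 || echo "model: divide error path"
tso_segs_model $((0x179)) 100 2 64   # 377 / 100 = 3, prints 3
```

The model only illustrates that a zero `mss_now` reaching this division is enough to trap. It says nothing about why the MSS was 0 during subflow removal.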
pabeni commented 1 month ago

> mptcp endpoint with flag 160 (subflow, fullmesh).

Could you please report the output of:

ip mptcp endpoint

and

ss -MaeimnO

just before unplugging the cable?
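A minimal sketch for capturing both outputs into timestamped files right before unplugging, so they can be attached to the issue. The function and file names are hypothetical. If a command fails, the failure is recorded in the file instead of aborting the capture.

```shell
#!/bin/sh
# Capture MPTCP endpoint and socket state right before unplugging a device.
# File names are hypothetical; adjust to taste.
capture_mptcp_state() {
    outdir=${1:-.}
    ts=$(date +%Y%m%dT%H%M%S)
    ep="$outdir/mptcp-endpoint-$ts.txt"
    sk="$outdir/ss-mptcp-$ts.txt"
    # MPTCP endpoints as configured in the in-kernel path manager.
    ip mptcp endpoint > "$ep" 2>&1 || echo "ip mptcp endpoint failed" >> "$ep"
    # ss: -M MPTCP sockets, -a all states, -e extended, -i internal TCP info,
    # -m memory, -n numeric, -O one line per socket.
    ss -MaeimnO > "$sk" 2>&1 || echo "ss -MaeimnO failed" >> "$sk"
    echo "$ep $sk"
}

capture_mptcp_state /tmp   # run this, then unplug the device
```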

Also, could you please provide a decoded stack trace? You will have to install the kernel debuginfo packages.
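A sketch of one way to produce the decoded trace. It assumes a Fedora system with debuginfo installed for the running kernel and a kernel source tree checked out at the matching version, which provides `scripts/decode_stacktrace.sh`. The helper below only cuts the oops out of the kernel log. Its name and the paths in the usage comments are assumptions.

```shell
#!/bin/sh
# Print every oops found in a kernel log read from stdin, from the
# "Oops:" line through the "---[ end trace" marker, so it can be fed
# to the kernel's scripts/decode_stacktrace.sh.
extract_oops() {
    sed -n '/Oops:/,/---\[ end trace/p'
}

# Usage sketch (Fedora and the paths below are assumptions):
#   sudo dnf debuginfo-install kernel
#   sudo dmesg | extract_oops > oops.txt
#   ./scripts/decode_stacktrace.sh \
#       /usr/lib/debug/lib/modules/"$(uname -r)"/vmlinux < oops.txt
```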

matttbe commented 1 month ago

@jeid64 Thank you for the bug report!

@pabeni Thank you for having looked!

> Also, could you please provide a decoded stack trace? You will have to install the kernel debuginfo packages

Just in case this is needed, you can find more info about that in our wiki.