linux-surface / kernel

Linux kernel with modifications for Microsoft Surface devices.

Mwifiex: tx timeout randomly occurs in AP mode #155

Open nexplorer-3e opened 3 months ago

nexplorer-3e commented 3 months ago
mwifiex_pcie 0000:01:00.0 ap0: NETDEV WATCHDOG: CPU: 3: transmit queue 1 timed out 5120 ms
mwifiex_pcie 0000:01:00.0: 4303588352 : Tx timeout(#7), bss_type-num = 1-1
mwifiex_pcie 0000:01:00.0: tx_timeout_cnt exceeds threshold.        Triggering card reset!

This transmit failure in AP mode causes a timeout, which makes the driver force a card reset and eventually leaves the kernel stuck. Kitakar5525 made a commit so that the reset no longer messes up the kernel, but the kernel may still get stuck before the driver detects the timeout:

kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 5007s! [kworker/u12:1:13296]
kernel: Modules linked in: [last unloaded: br_netfilter]
kernel: CPU: 0 PID: 13296 Comm: kworker/u12:1 Tainted: G             L     6.8.8-surface-1 #1
kernel: Hardware name: Microsoft Corporation Surface 3/Surface 3, BIOS 1.51116.238 03/09/2015
kernel: Workqueue: MWIFIEX_WORK_QUEUE mwifiex_main_work_queue [mwifiex]
kernel: RIP: 0010:mwifiex_wmm_process_tx+0x100/0x8f0 [mwifiex]
kernel: Code: 9a 00 c1 85 c9 78 5c 4c 89 e7 e8 eb c2 fd d9 41 0f b6 06 48 89 c2 48 83 c0 60 48 c1 e2 04 48 c1 e0 04 4e 8b 84 3a 00 06 00 00 <4c> 01 f8 4c 39 c0 74 1d 41 80 78 45 00 75 0e 49 8d 50 10 49 3b 50
kernel: RSP: 0018:ffffa1aa4903fd70 EFLAGS: 00000206
kernel: RAX: 0000000000000600 RBX: 0000000000000002 RCX: 0000000000000007
kernel: RDX: 0000000000000000 RSI: 00000000fffffe00 RDI: ffff8c64c39e46c0
kernel: RBP: ffffa1aa4903fde0 R08: ffff8c63f1d56420 R09: 0000000000000101
kernel: R10: ffffffff9ba060c8 R11: 0000000000000424 R12: ffff8c64c39e46c0
kernel: R13: ffff8c64e2f54000 R14: ffffffffc1009aaa R15: ffff8c64c39e4000
kernel: FS:  0000000000000000(0000) GS:ffff8c64fba00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000000c0017e1010 CR3: 000000011c434000 CR4: 00000000001006f0
kernel: Call Trace:
kernel:  <IRQ>
kernel:  ? show_regs+0x68/0x70
kernel:  ? watchdog_timer_fn+0x202/0x280
kernel:  ? __pfx_watchdog_timer_fn+0x10/0x10
kernel:  ? __hrtimer_run_queues+0x107/0x270
kernel:  ? hrtimer_interrupt+0x109/0x240
kernel:  ? __sysvec_apic_timer_interrupt+0x50/0x120
kernel:  ? sysvec_apic_timer_interrupt+0x7b/0x90
kernel:  </IRQ>
kernel:  <TASK>
kernel:  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
kernel:  ? mwifiex_wmm_process_tx+0x100/0x8f0 [mwifiex]
kernel:  mwifiex_main_process+0x5d9/0x960 [mwifiex]
kernel:  mwifiex_main_work_queue+0x25/0x30 [mwifiex]
kernel:  process_scheduled_works+0x9d/0x390
kernel:  ? __pfx_worker_thread+0x10/0x10
kernel:  worker_thread+0x15b/0x2d0
kernel:  ? __pfx_worker_thread+0x10/0x10
kernel:  kthread+0xf9/0x130
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x3c/0x60
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1b/0x30
kernel:  </TASK>
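
To check how often each of these events appears in a saved kernel log, a small script along the following lines might help. It is only a sketch: the regular expressions are modelled on the log lines quoted above and may not match the wording used by other kernel versions, and the function name `summarize_mwifiex_log` is made up for illustration.

```python
import re

# Patterns modelled on the log lines quoted in this issue.
# Assumption: other kernel versions may word these messages differently.
TX_TIMEOUT = re.compile(r"Tx timeout\(#(\d+)\), bss_type-num = (\d+)-(\d+)")
CARD_RESET = re.compile(r"tx_timeout_cnt exceeds threshold")
SOFT_LOCKUP = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s!")


def summarize_mwifiex_log(lines):
    """Count Tx timeouts, card resets and soft lockups in an iterable of log lines."""
    summary = {
        "tx_timeout": 0,
        "card_reset": 0,
        "soft_lockup": 0,
        "highest_timeout_counter": 0,  # largest N seen in "Tx timeout(#N)"
    }
    for line in lines:
        match = TX_TIMEOUT.search(line)
        if match:
            summary["tx_timeout"] += 1
            summary["highest_timeout_counter"] = max(
                summary["highest_timeout_counter"], int(match.group(1))
            )
        if CARD_RESET.search(line):
            summary["card_reset"] += 1
        if SOFT_LOCKUP.search(line):
            summary["soft_lockup"] += 1
    return summary


if __name__ == "__main__":
    # Sample input copied from the logs in this issue.
    sample = [
        "mwifiex_pcie 0000:01:00.0: 4303588352 : Tx timeout(#7), bss_type-num = 1-1",
        "mwifiex_pcie 0000:01:00.0: tx_timeout_cnt exceeds threshold.        Triggering card reset!",
        "kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 5007s! [kworker/u12:1:13296]",
    ]
    print(summarize_mwifiex_log(sample))
```

To run it on a real log, pass an open file (or any iterable of lines) to the function instead of the sample list.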

After some digging, I found a few possible solutions:

  1. Try applying the PCI reset method to the Surface 3 (see also the patchwork link in solution 2) https://github.com/linux-surface/kernel/pull/72
  2. Downgrade the firmware https://patchwork.kernel.org/project/linux-wireless/patch/1397710914-10061-2-git-send-email-bzhao@marvell.com/ https://gitlab.com/kernel-firmware/linux-firmware/ https://github.com/Corben78/mwifiex-firmware/tree/master/mrvl
  3. Use the mwlwifi driver https://patchwork.kernel.org/project/linux-wireless/patch/260962939eeb4dbbb6e462cc010aac21@SC-EXCH02.marvell.com/ https://github.com/kaloz/mwlwifi/tree/master

Recommended labels: A:"Mwifiex" D:"Surface 3"

If my hostapd.conf is needed, please let me know and I can attach it later.

See also:

  1. https://linux-wireless.vger.kernel.narkive.com/6cYwkesp/mwifiex-and-sd8787-tx-queue-timeout-in-ap-mode
  2. https://github.com/linux-surface/kernel/pull/70#issuecomment-726087363