morrownr / USB-WiFi

USB WiFi Adapter Information for Linux

List of Bug Reports for the mt7921au chipset / mt7921u driver... #107

Open morrownr opened 1 year ago

morrownr commented 1 year ago

This issue is for maintaining a list of problematic issues that need work. This list will be maintained and updated in this first post by @morrownr . Please add posts to this issue as you have updated information for the existing BUGs in the list or if you have information about a new BUG. Thank you.

Dear Mediatek devs... help is appreciated.


Bug: (2024-04-18) See: https://github.com/morrownr/USB-WiFi/issues/392 . WDS/4addr is not supported in AP mode. First reported with an Alfa AXML adapter (mt7921au chipset, mt7921u driver). The OP is unable to use WDS/4addr in AP mode.

Status: Open

Info: It was reported that this capability does work with an adapter that uses the mt7612u chipset/driver.


Bug: (2024-03-26) See: https://github.com/morrownr/USB-WiFi/issues/378 . WiFi adapter not showing up. First reported with an Alfa AXML adapter (mt7921au chipset, mt7921u driver). The adapter is non-functional until the workaround below is applied.

Status: Open

Workaround: run modprobe -r btusb first, then plug in the USB WiFi adapter.

More input is needed. Is this a problem with btusb?
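
A minimal shell sketch for checking whether btusb is loaded before applying the workaround. The function name module_loaded and its optional file argument are made up for illustration; only /proc/modules and modprobe -r are standard. Note that unloading btusb disables every USB Bluetooth device on the system, not just the one on the adapter.

```shell
# Sketch only: module_loaded and its optional file argument are illustrative names.
# /proc/modules lists one loaded module per line, with the module name in column 1.
module_loaded() {
    # $1 = module name, $2 = modules file (defaults to /proc/modules)
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Unload btusb before plugging in the adapter (requires root):
#   module_loaded btusb && sudo modprobe -r btusb
```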


Bug: (2023-12-22) Many Linux distros are detecting Bluetooth capability in mt7921au based adapters but none of the adapters on the market have Bluetooth turned on so it won't work. Linux should not be detecting Bluetooth capability when it is actually not available.

Status: Open and ongoing

Here is a link to a location where you can get a copy of the Intel White Paper that explains the details of why USB3 capable WiFi adapters should not have Bluetooth capability turned on:

https://www.usb.org/document-library/usb-30-radio-frequency-interference-impact-24-ghz-wireless-devices

USB3 WiFi adapters should not have Bluetooth turned on as the USB3 will cause interference with Bluetooth. If makers decide they really want Bluetooth capability in an adapter then they need to limit wifi to USB2 capability. All adapters with the mt7921au chipset that I am aware of have Bluetooth turned off so WiFi can operate in USB3 mode. However, there is a bug in that Bluetooth capability is still being detected by Linux distros and the driver/firmware is loading. Systems act like Bluetooth is available but when you try to use the Bluetooth, it won't work. It is not clear to me how this can be fixed but it really does need to be fixed.

This is not a problem with PCIe cards. I have a mt7922 based PCIe card. WiFi and Bluetooth work well together because WiFi uses the PCIe bus and not USB. Please understand that the issue in this bug is not exclusive to this chipset. This is an issue with all USB WiFi adapters. The adapters that have had both USB WiFi and BT capability over the years have limited USB to USB2 to avoid the problem of interference.


Bug: (2023-12-07) Active monitor mode breaks driver.

Status: open

Reporter: @ZerBea

Link: https://github.com/openwrt/mt76/issues/839

Problem: Using Active Monitor mode breaks the driver

Driver reports that active monitor mode is possible:

$ iw list | grep active
Device supports active monitor (which will ACK incoming frames)

But if hcxdumptool sets active monitor mode, it stops working.

If active monitor mode is disabled, everything's fine

0 ERROR(s) during runtime
638 Packet(s) captured by kernel
0 Packet(s) dropped by kernel
1 SHB written to pcapng dumpfile
1 IDB written to pcapng dumpfile
1 ECB written to pcapng dumpfile
83 EPB written to pcapng dumpfile

exit on sigterm

I don't think the problem is related to hcxdumptool, because it can be reproduced with iw, ip link and tshark, too:

$ sudo ip link set wlp22s0f0u4i3 down
$ sudo iw dev wlp22s0f0u4i3 set type monitor
$ sudo ip link set wlp22s0f0u4i3 up
$ tshark -i wlp22s0f0u4i3
22 packets captured

$ sudo ip link set wlp22s0f0u4i3 down
$ sudo iw dev wlp22s0f0u4i3 set monitor active
$ sudo ip link set wlp22s0f0u4i3 up
$ tshark -i wlp22s0f0u4i3
Capturing on 'wlp22s0f0u4i3'
^C 0 packets captured

Background: When running in active monitor mode, the device ACKs incoming frames addressed to the virtual MAC of the device. This feature is really useful to perform PMKID attacks. At the moment, active monitor mode is working on:

mt76x0u mt76x2u

It is not working on:

mt7601u mt7921u

I see two options: active monitor mode should be fixed, or active monitor mode capability should not be reported by the driver

mt7601u
$ iw list | grep active
Device supports active monitor (which will ACK incoming frames)

mt7921u
$ iw list | grep active
Device supports active monitor (which will ACK incoming frames)
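
The two checks used in this report (what the driver advertises, and whether a capture actually sees packets) can be sketched as small shell helpers. The function names are hypothetical, and the parsing assumes the iw and tshark output looks like the text quoted above.

```shell
# Sketch only: function names are illustrative, not part of iw or tshark.

supports_active_monitor() {
    # Reads `iw list` output on stdin; succeeds if the capability is advertised.
    grep -q 'Device supports active monitor'
}

packets_captured() {
    # Reads tshark's summary text on stdin (tshark prints it on stderr) and
    # prints N from a line such as "22 packets captured".
    sed -n 's/^[^0-9]*\([0-9][0-9]*\) packets\{0,1\} captured.*/\1/p'
}

# Usage:
#   iw list | supports_active_monitor && echo "driver advertises active monitor"
#   tshark -i wlp22s0f0u4i3 -a duration:10 2>&1 >/dev/null | packets_captured
```

A result of 0 from the second helper while the first one succeeds matches the behavior described in this bug: the capability is advertised but the capture is dead.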


Bug: LED does not function in several of the usb wifi adapters that use the mt7921au chipset.

Status: open, it is unclear what the problem is.

Reported by @morrownr Confirmed by numerous users.


Bug: AP Mode DFS (5 GHz) support is non-functional

Status: open

Reported by @morrownr Confirmed by numerous users.

This is really a serious omission in that in many places in the world there are limited non-DFS channels available leading to high levels of congestion.

Dear Mediatek, does your usb chipset competitor support DFS channels in AP Mode? Yes they do. See: out-of-kernel drivers for rtl8812au, rtl8811au, rtl8812bu and rtl8811cu. You need to think about this. Sincerely.


Bug: the txpower reading shown by iw is unusually low (for example, 3 dBm).

Status: open

Reported by several individuals.

This reading must be wrong because actual usage suggests the reading should be much higher.
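
When reporting this, it helps to quote the exact value. A small sketch for pulling the number out of iw output follows; the function name txpower_dbm is hypothetical, and it assumes the usual "txpower <value> dBm" line that iw dev prints.

```shell
# Sketch only: txpower_dbm is an illustrative name.
txpower_dbm() {
    # Reads `iw dev` (or `iw dev <ifname> info`) output on stdin and prints
    # the numeric value from each "txpower <value> dBm" line.
    awk '$1 == "txpower" && $3 == "dBm" { print $2 }'
}

# Usage: iw dev | txpower_dbm
```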


Bug: (feature request) mt7921u driver does not support 2 interfaces of AP mode on one adapter

Status: open

Reported by @whitslack

mt7921u driver does not support 2 instances of AP mode whereas this was common on some drivers for older adapters.

Now:

valid interface combinations:

     * #{ managed, P2P-client } <= 2, #{ AP, P2P-GO } <= 1,
       total <= 2, #channels <= 2

What we want:

valid interface combinations:

     * #{ managed, P2P-client } <= 2, #{ AP, P2P-GO } <= 2,
       total <= 2, #channels <= 2
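
To check what a given kernel/driver currently advertises, the AP limit can be extracted from the iw list output. This is only a sketch: the function name ap_interface_limit is hypothetical, and the pattern assumes AP is the first entry in its group, as in the lines quoted above.

```shell
# Sketch only: ap_interface_limit is an illustrative name.
ap_interface_limit() {
    # Reads `iw list` output on stdin; prints N from "#{ AP... } <= N".
    sed -n 's/.*#{ AP[^}]*} <= \([0-9][0-9]*\).*/\1/p'
}

# Usage: iw list | ap_interface_limit
```

A value of 1 means only one AP interface can be created on the adapter; the request in this bug is for the driver to report 2.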

Bug: connection is dropped and the only way to correct the situation is to reboot (AP mode)

Status: open

Testing to see if SG helps performance:

scatter-gather test with mt7921au based adapter

Issue: connection drops and the only resolution is to reboot the system.

Raspberry Pi 4B RasPiOS 2023-05-03

I changed the module parameter and rebooted between each test so as to alternate on and off.

iperf3 -c 192.168.1.1 -t 300

scatter-gather off (disable_usb_sg=1)

1:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-300.00 sec  19.9 GBytes   569 Mbits/sec    4             sender
[  5]   0.00-300.01 sec  19.9 GBytes   569 Mbits/sec                  receiver

2: 
[  5]   0.00-300.00 sec  19.9 GBytes   570 Mbits/sec    5             sender
[  5]   0.00-300.01 sec  19.9 GBytes   570 Mbits/sec                  receiver

3:
[  5]   0.00-300.00 sec  20.0 GBytes   573 Mbits/sec    2             sender
[  5]   0.00-300.01 sec  20.0 GBytes   573 Mbits/sec                  receiver

scatter-gather on (disable_usb_sg=0)

1:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-300.00 sec  19.9 GBytes   570 Mbits/sec    1             sender
[  5]   0.00-300.01 sec  19.9 GBytes   570 Mbits/sec                  receiver

2:
[  5]   0.00-300.00 sec  20.0 GBytes   572 Mbits/sec   48             sender
[  5]   0.00-300.01 sec  20.0 GBytes   572 Mbits/sec                  receiver

3:
[  5]   0.00-300.00 sec  19.9 GBytes   571 Mbits/sec    0             sender
[  5]   0.00-300.02 sec  19.9 GBytes   571 Mbits/sec                  receiver

Observation: So much for needing to average the results. I was careful to check that sg was on or off. I have no explanation for how the results could be so close. I see no evidence that sg is providing any performance increase.

Prior to this testing session, I was able to see the issue where the connection is dropped and only a reboot will correct the situation. It happened twice a few days ago while testing with sg on. There is a history of this with mt7612u adapters. I have yet to duplicate the issue with sg off.

Conclusion: Further testing on different platforms is needed. I will test x86_64 next. Given the history of sg causing problems such as connections dropping that can only be corrected with a reboot, it may be better for the default to be disable_usb_sg=1 with a follow up to determine what the problem is.
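
For reference, the sender bitrates above average to about 570.67 Mbits/sec with scatter-gather off and 571.00 Mbits/sec with it on. The awk sketch below shows how that average can be computed from iperf3 summary lines; the function name avg_sender_mbits is hypothetical.

```shell
# Sketch only: avg_sender_mbits is an illustrative name.
avg_sender_mbits() {
    # Reads iperf3 output on stdin and averages the bitrate of every
    # summary line marked "sender" (the value just before "Mbits/sec").
    awk '/sender/ {
             for (i = 2; i <= NF; i++)
                 if ($i == "Mbits/sec") { sum += $(i - 1); n++ }
         }
         END { if (n) printf "%.2f\n", sum / n }'
}

# Usage: cat run1.txt run2.txt run3.txt | avg_sender_mbits
```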


morrownr commented 1 year ago

The last msg from you said things were looking up after you turned scatter-gather off. Are you good to go now?

Nop my last message said I had scatter-gather turned off from the beginning and this didn't fix the issue :)

I guess I need to put my glasses on. I'll go back up through the messages and refresh my memory.

FWIW: I may not be around here much over the next 10 days or so as I have other prior things to handle that don't involve computers.

morrownr commented 1 year ago

@gifter77

I have a service on the machine that pings my router at regular intervals and restarts the mt7921u module if connection is lost.

I am wondering if this could be a timing issue. Could you put some delays in the service to see if that helps.

Nevertheless, that service should not be necessary. The real issue is why are you losing connection in the first place?

Are you seeing anything in the log that might relate to that question?
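
For anyone following along, here is a rough sketch of what such a ping-and-reload service might look like, with a settle delay on both sides of the reload. This is not @gifter77's actual service; ROUTER_IP, SETTLE_DELAY, CHECK_INTERVAL and all function names are assumptions made for illustration.

```shell
# Hypothetical watchdog sketch; all names and default values are assumptions.
ROUTER_IP="${ROUTER_IP:-192.168.1.1}"
SETTLE_DELAY="${SETTLE_DELAY:-5}"

link_ok() {
    # Succeeds if the router answers at least one of three pings.
    ping -c 3 -W 2 "$ROUTER_IP" >/dev/null 2>&1
}

reload_driver() {
    # Requires root. Delays give the USB device and firmware time to settle.
    modprobe -r mt7921u
    sleep "$SETTLE_DELAY"
    modprobe mt7921u
    sleep "$SETTLE_DELAY"
}

watchdog_step() {
    if link_ok; then
        echo "link ok"
    else
        echo "link lost, reloading mt7921u"
        reload_driver
    fi
}

# Main loop, e.g. from a systemd service:
#   while true; do watchdog_step; sleep "${CHECK_INTERVAL:-60}"; done
```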

gifter77 commented 1 year ago

The last msg from you said things were looking up after you turned scatter-gather off. Are you good to go now?

There is a delay already. If I disable the service I still lose connection and have to disconnect and reconnect the adapter to re-establish connection.

Are you seeing anything in the log that might relate to that question?

No I cannot see why I am losing connection in the first place.

morrownr commented 1 year ago

Are you seeing anything in the log that might relate to that question?

No I cannot see why I am losing connection in the first place.

I see where you noted a signal level of -70. That is really a poor signal. Anything less than -65 can lead to far less than optimal connections.

Have you considered aluminum foil or a beer can to help the signal?

Or repositioning the adapter with an extension cable?

Or repositioning your router?

Or have you tried all available channels?

There could also be router settings that are contributing to the issue but since you are not seeing any help in the log, this could just be normal operation with a very poor signal. I think taking action to get the signal to a better level should be priority one for now.
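
A quick way to read the current signal level while repositioning things is sketched below. The function names are hypothetical, the -65 threshold is just the rule of thumb from this comment, and the parsing assumes the "signal: -70 dBm" line that iw dev <ifname> link prints in managed mode.

```shell
# Sketch only: signal_dbm and signal_quality are illustrative names.
signal_dbm() {
    # Reads `iw dev <ifname> link` output on stdin; prints the signal in dBm.
    awk '$1 == "signal:" { print $2; exit }'
}

signal_quality() {
    # $1 = signal in dBm as an integer, e.g. -70
    if [ "$1" -ge -65 ]; then
        echo "ok"
    else
        echo "weak"
    fi
}

# Usage: signal_quality "$(iw dev wlan0 link | signal_dbm)"
```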

gifter77 commented 1 year ago

I will move the adapter next to the router and see if I can reproduce the issue.

morrownr commented 1 year ago

I will move the adapter next to the router and see if I can reproduce the issue.

Can I suggest a location that gives -35 to -45?

gifter77 commented 1 year ago

Can I suggest a location that gives -35 to -45?

@morrownr I cannot reproduce when I move the Pi4 next to the router (-38dBm signal). Easy to reproduce with -70dBm signal. I still think this is a bug though cause it shouldn't silently drop like that and never reconnect until the kernel driver is restarted.

morrownr commented 1 year ago

@gifter77

I still think this is a bug though cause it shouldn't silently drop like that and never reconnect until the kernel driver is restarted.

I agree.

However, given this information, we need to look at some additional things:

Please keep good notes, as I will post a bug at the top for us to use as a reference when you are ready. I will keep an eye out for related information. The problem is that we really do not have the problem isolated.

The Mediatek devs are human and wifi has a lot of sources of problems. If we can get the reproducibility down to something simple that works every time, we can probably get this issue worked.

adrienlemasle commented 1 year ago

What are you using to connect? Network Manager, wpa_supplicant, other? The reason I ask is that those all have logs of their own that can be used to help see what is going on. Is the wifi interface still there when you run iw dev after the connection goes down?

I use wpa_supplicant.

I will try to run the experiments you mentioned when I have some time. For now I have abandoned using these adapters in my 3D printers and reverted to using the Pi4 on-board Wifi, which actually performs better for me.

fhteagle commented 11 months ago

FYI, stability significantly improved after completely disabling bluetooth service in systemctl, when using a CF-953AX on a Raspi4B (Raspbian Bullseye kernel 6.1.x?, with custom compiled hostapd and updated firmware .bin files). Previously, even with restart=unless-stopped in the hostapd service definition file, a manual hostapd reset and/or Raspi reboot was pretty much an every morning affair. Since that change, AP has been up ~8 days continuously with no manual hostapd reset or reboot required (systemctl service has restarted it periodically but that is completely automatic now).

morrownr commented 11 months ago

@fhteagle

Interesting post. Are you using the 32 bit or 64 bit RasPiOS?

The last testing of an AP with a mt7921au based adapter I did was around 3 months ago as I was working to tighten down the hostapd.conf for WiFi 6 in the AP mode section of the Main Menu. WiFi 6 is complicated and WiFi 6e is right up there with rocket science so maybe someday with the WiFi 6e.

I was using the 5-3-23 64 bit RasPiOS on a RasPi4B. The adapter was a CF-951AX. I used WiFi 6 and the 5 GHz channel 36 as it is not used heavily in my area. I compiled a new hostapd and updated the firmware. I was using the adapter in a USB3 port with a right angle USB adapter. No powered hubs were in use. In fact, nothing else was plugged into a USB port, not even a keyboard or mouse, as I run headless with VNC.

The only problems I encountered were with bad configurations in hostapd.conf. Once my trial and error testing got what I think are the best settings, I simply used the setup for 2 weeks. It was rock solid for 2 weeks and I finally took it down because I needed the Pi for other needs.

I was looking forward to a new RasPiOS based on Debian 12 which I think will help wireless in several ways. I'm now wondering if I need to try to duplicate your setup to see if we can find the problem. Thoughts?

@morrownr

fhteagle commented 11 months ago

$ uname -a
Linux ***hostname here*** 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr  3 17:24:16 BST 2023 aarch64 GNU/Linux

Tried directly in USB2 and USB3 ports, and through the very high power USB3 hub I have plugged into the pi. Nothing made a difference. As soon as I disabled bluetooth.service, I had instant stability improvements and systemctl could auto-restart hostapd in the rare case that it was needed.

You could play around with things if you want (it's obviously fun for you), but for me it really does not matter. I do not need BT on that unit, but I do need my second AP to be rock solid, so the trade is easy to make.

morrownr commented 11 months ago

Tried directly in USB2 and USB3 ports, and through the very high power USB3 hub I have plugged into the pi.

Yes, but have you tried with the powered USB3 hub not plugged in at all? The powered hubs can cause problems on Pi's even if the device is not plugged into the hub. My general rule on powered hubs on RasPi's while using a usb wifi adapter is "just don't do it". USB3 is not mankind's greatest invention to begin with and then there are issues with the USB subsystem on the Pi's, to include the crappy USB3 chipset.

As soon as I disabled bluetooth.service, I had instant stability improvements ...

I just delete the BT firmware file for the adapter and that ends that discussion. BT is not active on any of the mt7921au based adapters that I am aware of but has shown as available but not functional. I think there was a little mistake somewhere either at Comfast or Mediatek engineering in that somewhere the information that USB3 and BT don't work together got lost. For BT to work on these adapters, wifi would have to be limited to USB2 or you have interference issues.

I do need my second AP to be rock solid.

I'm just offering to work it into my schedule if it is a problem for you. We could wait until the new RasPiOS based on Debian 12 is available. Whatever you want to do.

@morrownr

bjlockie commented 11 months ago

On 2023-08-10 17:49, morrownr wrote:

USB3 is not mankind's greatest invention to begin with

I think USB4 is worse. :-)

baltic-tea commented 8 months ago

Didn't see this topic right away, but anyway... my bug report is too long for one comment 😎

https://github.com/morrownr/USB-WiFi/issues/327#issue-1974929466

lr1729 commented 8 months ago

I have also found that the same issue as reported by @gifter77 and @leezu happens only when a device is on the edge of reception of the router (CF-953AX in AP mode), and the crash happens often when someone enters or leaves the range. This is on kernel 6.6 with scatter gather disabled, on the latest firmware, using the hostapd-WiFi6.conf in this repo. This is the dmesg log:

Nov 06 11:49:58 debian kernel: xor: automatically using best checksumming function   avx       
Nov 06 11:49:58 debian kernel: Btrfs loaded, zoned=yes, fsverity=yes
Nov 06 14:02:36 debian kernel: mt7921u 2-1:1.0: Message 00020002 (seq 15) timeout
Nov 06 14:02:36 debian kernel: mt7921u 2-1:1.0: timed out waiting for pending tx
Nov 06 14:02:36 debian kernel: ------------[ cut here ]------------
Nov 06 14:02:36 debian kernel: WARNING: CPU: 2 PID: 322188 at kernel/kthread.c:660 kthread_park+0x85/0xa0
Nov 06 14:02:36 debian kernel: Modules linked in: btrfs blake2b_generic xor raid6_pq ufs hfsplus hfs cdrom minix msdos jfs xfs cmac ccm tun btusb btrtl uvcvideo btbcm videobuf2_vmalloc btintel uvc btmtk videobuf2_memops videobuf2_v4l2 bluetooth videodev nft_chain_nat videobuf2_common ecdh_generic mc xt_MASQUERADE snd_sof_pci_intel_cnl nf_nat snd_sof_intel_hda_common xt_mark soundwire_intel xt_conntrack snd_sof_intel_hda_mlink soundwire_cadence nf_conntrack snd_sof_intel_hda nf_defrag_ipv6 snd_sof_pci nf_defrag_ipv4 snd_hda_codec_hdmi snd_sof_xtensa_dsp nft_compat snd_sof nf_tables snd_sof_utils libcrc32c soundwire_generic_allocation soundwire_bus nfnetlink snd_ctl_led snd_soc_skl intel_tcc_cooling snd_hda_codec_realtek x86_pkg_temp_thermal snd_soc_hdac_hda intel_powerclamp snd_hda_ext_core snd_hda_codec_generic vfat coretemp snd_soc_sst_ipc ledtrig_audio fat snd_soc_sst_dsp kvm_intel snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core kvm snd_compress ac97_bus irqbypass snd_pcm_dmaengine mt7921u crct10dif_pclmul snd_hda_intel
Nov 06 14:02:36 debian kernel:  polyval_clmulni mt7921_common polyval_generic snd_intel_dspcfg gf128mul mt76_connac_lib snd_intel_sdw_acpi ghash_clmulni_intel mt76_usb sha512_ssse3 snd_hda_codec mt76 i915 aesni_intel snd_hda_core drm_buddy snd_hwdep crypto_simd i2c_algo_bit processor_thermal_device_pci_legacy mac80211 mei_hdcp mei_pxp cryptd snd_pcm processor_thermal_device ttm processor_thermal_rfim rapl drm_display_helper hp_wmi snd_timer spi_nor processor_thermal_mbox iTCO_wdt intel_rapl_msr joydev intel_cstate mei_me snd processor_thermal_rapl sparse_keymap libarc4 cec intel_pmc_bxt intel_rapl_common mousedev intel_uncore platform_profile intel_wmi_thunderbolt mtd ee1004 iTCO_vendor_support int3403_thermal wmi_bmof soundcore pcspkr intel_gtt mei intel_pch_thermal intel_soc_dts_iosf int340x_thermal_zone cfg80211 int3400_thermal acpi_thermal_rel wireless_hotkey acpi_pad rfkill hid_multitouch serio_raw mac_hid fuse loop dm_mod ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 atkbd libps2 vivaldi_fmap r8169 nvme realtek
Nov 06 14:02:36 debian kernel:  crc32_pclmul intel_lpss_pci mdio_devres nvme_core crc32c_intel i2c_i801 spi_intel_pci xhci_pci intel_lpss i2c_smbus spi_intel nvme_common libphy i2c_hid_acpi xhci_pci_renesas idma64 i2c_hid video i8042 serio wmi
Nov 06 14:02:36 debian kernel: CPU: 2 PID: 322188 Comm: kworker/u16:5 Not tainted 6.5.8-tkg-cfs #1 b1cf83867824fe2947ff03d2af55139fb47fabe5
Nov 06 14:02:36 debian kernel: Hardware name: HP HP Laptop 14-cf1xxx/852E, BIOS F.73 11/18/2022
Nov 06 14:02:36 debian kernel: Workqueue: mt76 mt7921_mac_reset_work [mt7921_common]
Nov 06 14:02:36 debian kernel: RIP: 0010:kthread_park+0x85/0xa0
Nov 06 14:02:36 debian kernel: Code: 00 48 85 c0 74 2d 31 c0 5b 5d c3 cc cc cc cc 0f 0b 48 8b ab a8 06 00 00 a8 04 74 ac 0f 0b b8 da ff ff ff 5b 5d c3 cc cc cc cc <0f> 0b b8 f0 ff ff ff eb d5 0f 0b eb cf 66 66 2e 0f 1f 84 00 00 00
Nov 06 14:02:36 debian kernel: RSP: 0018:ffff9fec89417d68 EFLAGS: 00010202
Nov 06 14:02:36 debian kernel: RAX: 0000000000000004 RBX: ffff8d868403a100 RCX: 0000000000000100
Nov 06 14:02:36 debian kernel: RDX: ffffffff83180128 RSI: 0000000000000287 RDI: ffff8d868403a100
Nov 06 14:02:36 debian kernel: RBP: ffff8d8686983000 R08: ffffffff83180128 R09: 0000000000000000
Nov 06 14:02:36 debian kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff8d86892120e8
Nov 06 14:02:36 debian kernel: R13: ffff8d86845798a8 R14: ffff8d86892120e8 R15: 0000000000000100
Nov 06 14:02:36 debian kernel: FS:  0000000000000000(0000) GS:ffff8d89de480000(0000) knlGS:0000000000000000
Nov 06 14:02:36 debian kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 06 14:02:36 debian kernel: CR2: 00007f0dec6d0710 CR3: 0000000077020002 CR4: 00000000003706e0
Nov 06 14:02:36 debian kernel: Call Trace:
Nov 06 14:02:36 debian kernel:  <TASK>
Nov 06 14:02:36 debian kernel:  ? kthread_park+0x85/0xa0
Nov 06 14:02:36 debian kernel:  ? __warn+0x81/0x130
Nov 06 14:02:36 debian kernel:  ? kthread_park+0x85/0xa0
Nov 06 14:02:36 debian kernel:  ? report_bug+0x171/0x1a0
Nov 06 14:02:36 debian kernel:  ? handle_bug+0x41/0x70
Nov 06 14:02:36 debian kernel:  ? exc_invalid_op+0x17/0x70
Nov 06 14:02:36 debian kernel:  ? asm_exc_invalid_op+0x1a/0x20
Nov 06 14:02:36 debian kernel:  ? kthread_park+0x85/0xa0
Nov 06 14:02:36 debian kernel:  mt76u_stop_tx+0x216/0x2f0 [mt76_usb 9c4c9104b8dcb657438d9fff3056938711c2dbab]
Nov 06 14:02:36 debian kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
Nov 06 14:02:36 debian kernel:  mt7921u_mac_reset+0x6d/0x1a0 [mt7921u 0399204592415ed6c16b35e038e76cd6404e4d5c]
Nov 06 14:02:36 debian kernel:  mt7921_mac_reset_work+0x97/0x180 [mt7921_common 81100bffabffa59a770facb02357897e6a29a15f]
Nov 06 14:02:36 debian kernel:  process_one_work+0x1df/0x3e0
Nov 06 14:02:36 debian kernel:  worker_thread+0x51/0x390
Nov 06 14:02:36 debian kernel:  ? __pfx_worker_thread+0x10/0x10
Nov 06 14:02:36 debian kernel:  kthread+0xe5/0x120
Nov 06 14:02:36 debian kernel:  ? __pfx_kthread+0x10/0x10
Nov 06 14:02:36 debian kernel:  ret_from_fork+0x31/0x50
Nov 06 14:02:36 debian kernel:  ? __pfx_kthread+0x10/0x10
Nov 06 14:02:36 debian kernel:  ret_from_fork_asm+0x1b/0x30
Nov 06 14:02:36 debian kernel:  </TASK>
Nov 06 14:02:36 debian kernel: ---[ end trace 0000000000000000 ]---
Nov 06 14:02:36 debian kernel: mt7921u 2-1:1.0: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a
Nov 06 14:02:36 debian kernel: mt7921u 2-1:1.0: WM Firmware Version: ____010000, Build Time: 20230526130958
Nov 06 14:03:03 debian kernel: r8169 0000:02:00.0 eno1: Link is Down

morrownr commented 8 months ago

@deren

Thanks for all of the good work you have been doing. The purpose of this message is to let you know that multiple people are reporting a problem with the mt7921u driver:

@lr1729 is posting about it right above this message and he adds that it appears to be the same as what @gifter77 and @leezu have reported on up the thread. (keep in mind that this thread is not about one issue but is a thread to collect bugs so you have to stay on the messages from the 3 gents mentioned in this paragraph.)

Problem: Driver is crashing when operating near the edge of acceptable signal.

Desired outcome: No crash and recovery.

If you need us to report this to linux-wireless, let us know.

@morrownr

deren commented 8 months ago

@morrownr

Thanks for this report. I think we can report the issue here and it's easier to share information.

Let me clarify the test conditions here. If there is anything wrong, please let me know.

  1. AP mode
  2. around RSSI -70dBm
  3. some STAs join/leave this AP frequently
  4. bluetooth service disabled

Looks like the issue can be reproduced in Rpi4+Raspbian only, right? If so, I was working with Ubuntu laptop only and need to setup a proper debug environment. I will play this issue with latest drv+fw and update my finding here.

@LorenzoBianconi / @objelf If you have any idea, please join the discussion here.

@deren

lr1729 commented 8 months ago

@deren I was able to reproduce this issue with those test conditions both on an rpi4 and an x86 debian laptop. If my setup is relevant I set up a NAT access point using the adapter with these commands

sudo iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
sudo iptables -A FORWARD -i eno1 -o wlxe0e1a93544f3 -m state --state RELATED,ESTABLISHED -j ACCEPT
sudo iptables -A FORWARD -i wlxe0e1a93544f3 -o eno1 -j ACCEPT

Used pi-hole as the dhcp/dnsmasq server, and set the interface ip with /sbin/ip addr add 192.168.1.1/24 dev wlxe0e1a93544f3. I then start the systemd hostapd service with this config. hostapd.conf.txt

gifter77 commented 8 months ago

@deren for me the issue happens when using the adapter in a standard way, i.e., I'm not using AP mode.

@lr1729 do you happen to have an Asus router somewhere in your setup? I wonder if this is somewhat related to the router as well.

henkv1 commented 8 months ago

I encounter these issues at night when there is no traffic at all, but when all devices get disconnected/timed out:

[Thu Nov  9 00:35:23 2023] mt7921u 2-1:1.0: Message 00020002 (seq 7) timeout
[Thu Nov  9 00:35:24 2023] mt7921u 2-1:1.0: timed out waiting for pending tx

Nov 09 00:34:48 harrie hostapd[6067]: wlan1: AP-STA-POLL-OK 84:b8:b8:a3:7c:cd
Nov 09 00:35:19 harrie hostapd[6067]: wlan1: AP-STA-DISCONNECTED 04:b4:29:21:b8:03
Nov 09 00:35:19 harrie hostapd[6067]: wlan1: STA 04:b4:29:21:b8:03 IEEE 802.11: disassociated due to inactivity
Nov 09 00:35:19 harrie hostapd[6067]: wlan1: STA 04:b4:29:21:b8:03 IEEE 802.11: disassociated due to inactivity
Nov 09 00:35:20 harrie hostapd[6067]: wlan1: STA 04:b4:29:21:b8:03 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Nov 09 00:35:20 harrie hostapd[6067]: wlan1: STA 04:b4:29:21:b8:03 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Nov 09 00:35:55 harrie hostapd[6067]: wlan1: AP-STA-DISCONNECTED 84:b8:b8:a3:7c:cd
Nov 09 00:35:55 harrie hostapd[6067]: wlan1: STA 84:b8:b8:a3:7c:cd IEEE 802.11: disassociated due to inactivity
Nov 09 00:35:55 harrie hostapd[6067]: wlan1: STA 84:b8:b8:a3:7c:cd IEEE 802.11: disassociated due to inactivity
Nov 09 00:35:56 harrie hostapd[6067]: wlan1: STA 84:b8:b8:a3:7c:cd IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Nov 09 00:35:56 harrie hostapd[6067]: wlan1: STA 84:b8:b8:a3:7c:cd IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)

[Thu Nov  9 23:17:47 2023] mt7921u 2-1:1.0: Message 00020002 (seq 6) timeout
[Thu Nov  9 23:17:48 2023] mt7921u 2-1:1.0: timed out waiting for pending tx
[Thu Nov  9 23:17:48 2023] mt7921u 2-1:1.0: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a

[Thu Nov  9 23:17:48 2023] mt7921u 2-1:1.0: WM Firmware Version: ____010000, Build Time: 20230526130958

Nov 09 23:16:47 harrie hostapd[153663]: wlan1: AP-STA-POLL-OK 84:b8:b8:a3:7c:cd
Nov 09 23:17:43 harrie hostapd[153663]: wlan1: AP-STA-DISCONNECTED 4e:0b:61:46:bd:34
Nov 09 23:17:43 harrie hostapd[153663]: wlan1: STA 4e:0b:61:46:bd:34 IEEE 802.11: disassociated due to inactivity
Nov 09 23:17:43 harrie hostapd[153663]: wlan1: STA 4e:0b:61:46:bd:34 IEEE 802.11: disassociated due to inactivity
Nov 09 23:17:44 harrie hostapd[153663]: wlan1: STA 4e:0b:61:46:bd:34 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Nov 09 23:17:44 harrie hostapd[153663]: wlan1: STA 4e:0b:61:46:bd:34 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Nov 09 23:17:57 harrie hostapd[153663]: wlan1: AP-STA-DISCONNECTED 84:b8:b8:a3:7c:cd
Nov 09 23:17:57 harrie hostapd[153663]: wlan1: STA 84:b8:b8:a3:7c:cd IEEE 802.11: disassociated due to inactivity
Nov 09 23:17:57 harrie hostapd[153663]: wlan1: STA 84:b8:b8:a3:7c:cd IEEE 802.11: disassociated due to inactivity
Nov 09 23:17:58 harrie hostapd[153663]: wlan1: STA 84:b8:b8:a3:7c:cd IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Nov 09 23:17:58 harrie hostapd[153663]: wlan1: STA 84:b8:b8:a3:7c:cd IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)

morrownr commented 8 months ago

@lr1729 @gifter77 @deren @leezu @henkv1

I'm going to setup to do some testing. I probably have an adequate amount of hardware to test various setups. Hopefully I can start posting results next week.

Here is what @deren posted in an effort to clarify things:

  1. AP mode
  2. around RSSI -70dBm
  3. some STAs join/leave this AP frequently
  4. bluetooth service disabled

My thoughts:

AP mode - at least one of the reports seems to indicate managed mode - needs clarification

around RSSI -70dBm - this is my understanding as well

some STAs join/leave this AP frequently - needs clarification

bluetooth service disabled - this issue is about usb wifi adapters - no modern usb wifi adapter should have bluetooth on as the wifi will be limited to USB2 - Ref: Intel white paper from many years ago - USB3 cables and connections will emit a signal that will interfere with bluetooth. FYI: I actually have a pre-production adapter based on the mt7921au chipset that has bluetooth active. It works really well but USB3 is turned off. When I reported it to the maker, the reply was basically "oh shit" and the production model is wifi only... as it should be. This is not an issue with PCIe because the wifi is not using a usb bus.

My plan:

Test 1

router: based on mt7981soc (WiFi 6 capable)
client: CF-951AX based on mt7921au chipset
client os: Ubuntu 23.10 and cpu: intel i7

I will get the signal down into the range of -65 to -70 and use iperf3 to pound it hard

Test 2

router: RasPi4B with Alfa AXML (mt7921au) in AP mode
client: various adapters
router os: RasPiOS 2023-10-10 using my AP guide here on site
client os: Ubuntu 23.10

I will get the signal down into the range of -65 to -70 and use iperf3 to pound it hard

If anyone sees that I need to do something different, please speak up. I'd like to get this issue down to the point it is readily reproducible.
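
To tell whether a test run actually hit the failure, the kernel log can be scanned for the two messages that show up in the reports above ("Message ... timeout" and "timed out waiting for pending tx"). A sketch, with a hypothetical function name:

```shell
# Sketch only: count_mt7921u_timeouts is an illustrative name.
count_mt7921u_timeouts() {
    # Reads kernel log text on stdin (e.g. from `dmesg` or `journalctl -k`)
    # and prints how many lines carry one of the two mt7921u timeout messages.
    grep -c -E 'mt7921u .*(Message [0-9a-fA-F]+ \(seq [0-9]+\) timeout|timed out waiting for pending tx)' || true
}

# Usage: journalctl -k -b | count_mt7921u_timeouts
```

A non-zero count after an iperf3 run at -65 to -70 would suggest the run reproduced the issue.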

Regards

lr1729 commented 8 months ago

I found the issue in AP mode in my case happens intermittently throughout the day, but does not appear to be related to how much traffic is sent, it occurs even when there is very minimal traffic. Here is the same trace log I reproduced on a raspberry pi

Nov 11 10:30:01.158268 raspberrypi kernel: bcmgenet fd580000.ethernet end0: Link is Up - 1Gbps/Full - flow control off
Nov 11 10:30:03.325462 raspberrypi kernel: IPv6: ADDRCONF(NETDEV_CHANGE): wlx90de80d6877b: link becomes ready
Nov 11 10:36:14.792838 raspberrypi kernel: IPv6: ADDRCONF(NETDEV_CHANGE): wlx90de80d6877b: link becomes ready
Nov 11 12:41:12.404852 raspberrypi kernel: mt7921u 2-1:1.0: Message 00020003 (seq 4) timeout
Nov 11 12:41:15.476857 raspberrypi kernel: mt7921u 2-1:1.0: Message 00020003 (seq 5) timeout
Nov 11 12:41:15.748851 raspberrypi kernel: mt7921u 2-1:1.0: timed out waiting for pending tx
Nov 11 12:41:15.816961 raspberrypi kernel: ------------[ cut here ]------------
Nov 11 12:41:15.817332 raspberrypi kernel: WARNING: CPU: 1 PID: 2328 at kernel/kthread.c:659 kthread_park+0xc4/0xe0
Nov 11 12:41:15.817732 raspberrypi kernel: Modules linked in: cmac ctr aes_arm64 aes_generic ccm libaes tun nft_chain_nat xt_MASQUERADE mt7921u nf_nat mt7921_common mt7>
Nov 11 12:41:15.818181 raspberrypi kernel: CPU: 1 PID: 2328 Comm: kworker/u8:1 Tainted: G         C         6.1.0-rpi4-rpi-v8 #1  Debian 1:6.1.54-1+rpt2
Nov 11 12:41:15.818373 raspberrypi kernel: Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
Nov 11 12:41:15.818576 raspberrypi kernel: Workqueue: mt76 mt7921_mac_reset_work [mt7921_common]
Nov 11 12:41:15.818789 raspberrypi kernel: pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Nov 11 12:41:15.818932 raspberrypi kernel: pc : kthread_park+0xc4/0xe0
Nov 11 12:41:15.819234 raspberrypi kernel: lr : mt76u_stop_tx+0x288/0x350 [mt76_usb]
Nov 11 12:41:15.819417 raspberrypi kernel: sp : ffffffc00a5d3c40
Nov 11 12:41:15.819593 raspberrypi kernel: x29: ffffffc00a5d3c40 x28: 0000000000000000 x27: ffffff80434f8a18
Nov 11 12:41:15.819769 raspberrypi kernel: x26: ffffff80434f08a0 x25: ffffff8042cc8880 x24: 0000000000000100
Nov 11 12:41:15.819940 raspberrypi kernel: x23: ffffff80434f2068 x22: ffffff80434f2048 x21: ffffff80434f4820
Nov 11 12:41:15.820134 raspberrypi kernel: x20: ffffff80425bed00 x19: ffffff8041b81f00 x18: 0000000000000000
Nov 11 12:41:15.820310 raspberrypi kernel: x17: 0000000000000000 x16: ffffffd1c56b4920 x15: 00003d08aec8dcc0
Nov 11 12:41:15.820479 raspberrypi kernel: x14: 001293a3758c129c x13: 0012ed20cf48e88a x12: ffffffd1c623ccd0
Nov 11 12:41:15.820631 raspberrypi kernel: x11: 0000000000000204 x10: 0000000000001a90 x9 : ffffffd17dc309e8
Nov 11 12:41:15.820835 raspberrypi kernel: x8 : ffffffc00a5d3ae8 x7 : 0000000000000000 x6 : ffffffd1c6bdf7a8
Nov 11 12:41:15.821030 raspberrypi kernel: x5 : ffffffd1c6a6e000 x4 : ffffffd1c6a6e118 x3 : 0000000000002800
Nov 11 12:41:15.821219 raspberrypi kernel: x2 : 0000000000000000 x1 : 0000000000001fe0 x0 : 0000000000000004
Nov 11 12:41:15.821392 raspberrypi kernel: Call trace:
Nov 11 12:41:15.821565 raspberrypi kernel:  kthread_park+0xc4/0xe0
Nov 11 12:41:15.821749 raspberrypi kernel:  mt76u_stop_tx+0x288/0x350 [mt76_usb]
Nov 11 12:41:15.821919 raspberrypi kernel:  mt7921u_mac_reset+0x88/0x28c [mt7921u]
Nov 11 12:41:15.822095 raspberrypi kernel:  mt7921_mac_reset_work+0xa8/0x1c0 [mt7921_common]
Nov 11 12:41:15.822266 raspberrypi kernel:  process_one_work+0x200/0x474
Nov 11 12:41:15.822434 raspberrypi kernel:  worker_thread+0x74/0x43c
Nov 11 12:41:15.822597 raspberrypi kernel:  kthread+0xfc/0x110
Nov 11 12:41:15.822765 raspberrypi kernel:  ret_from_fork+0x10/0x20
Nov 11 12:41:15.822915 raspberrypi kernel: ---[ end trace 0000000000000000 ]---
Nov 11 12:41:15.956849 raspberrypi kernel: mt7921u 2-1:1.0: firmware: direct-loading firmware mediatek/WIFI_MT7961_patch_mcu_1_2_hdr.bin
Nov 11 12:41:15.958674 raspberrypi kernel: mt7921u 2-1:1.0: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a
Nov 11 12:41:15.968870 raspberrypi kernel: mt7921u 2-1:1.0: firmware: direct-loading firmware mediatek/WIFI_RAM_CODE_MT7961_1.bin
Nov 11 12:41:15.970738 raspberrypi kernel: mt7921u 2-1:1.0: WM Firmware Version: ____010000, Build Time: 20230526130958
Nov 11 12:42:05.908832 raspberrypi kernel: mt7921u 2-1:1.0: Message 00020002 (seq 6) timeout
Nov 11 12:42:08.980840 raspberrypi kernel: mt7921u 2-1:1.0: Message 000008ed (seq 7) timeout
Nov 11 12:42:09.188836 raspberrypi kernel: mt7921u 2-1:1.0: firmware: direct-loading firmware mediatek/WIFI_MT7961_patch_mcu_1_2_hdr.bin
Nov 11 12:42:09.190604 raspberrypi kernel: mt7921u 2-1:1.0: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a
Nov 11 12:42:09.200932 raspberrypi kernel: mt7921u 2-1:1.0: firmware: direct-loading firmware mediatek/WIFI_RAM_CODE_MT7961_1.bin
Nov 11 12:42:09.202835 raspberrypi kernel: mt7921u 2-1:1.0: WM Firmware Version: ____010000, Build Time: 20230526130958
Nov 11 12:42:11.672224 raspberrypi systemd-shutdown[1]: Using hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0, device /dev/watchdog0
Nov 11 12:42:11.672433 raspberrypi systemd-shutdown[1]: Failed to set timeout to 10min: Invalid argument
Nov 11 12:42:11.700824 raspberrypi systemd-shutdown[1]: Syncing filesystems and block devices.
Nov 11 12:42:12.106058 raspberrypi systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Nov 11 12:42:12.106336 raspberrypi systemd-journald[270]: Received SIGTERM from PID 1 (systemd-shutdow).
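
To compare logs between reporters, it may help to count how often each MCU message times out. The following is a rough sketch, assuming log lines shaped like the ones in the trace above; the function name `summarize` and the idea of grouping by message ID are illustrative only, not part of any existing tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   mt7921u 2-1:1.0: Message 00020003 (seq 4) timeout
TIMEOUT_RE = re.compile(r"mt7921u \S+ Message ([0-9a-fA-F]{8}) \(seq (\d+)\) timeout")
PENDING_TX = "timed out waiting for pending tx"


def summarize(lines):
    """Return (Counter of timeouts per message ID, number of pending-tx timeouts)."""
    per_message = Counter()
    pending_tx = 0
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            per_message[match.group(1)] += 1
        elif PENDING_TX in line:
            pending_tx += 1
    return per_message, pending_tx


if __name__ == "__main__":
    # Lines copied from the trace above.
    sample = [
        "Nov 11 12:41:12.404852 raspberrypi kernel: mt7921u 2-1:1.0: Message 00020003 (seq 4) timeout",
        "Nov 11 12:41:15.476857 raspberrypi kernel: mt7921u 2-1:1.0: Message 00020003 (seq 5) timeout",
        "Nov 11 12:41:15.748851 raspberrypi kernel: mt7921u 2-1:1.0: timed out waiting for pending tx",
    ]
    per_message, pending_tx = summarize(sample)
    for message_id, count in per_message.most_common():
        print(message_id, count)
    print("pending tx timeouts:", pending_tx)
```

The same function can be fed the lines of a saved kernel log file instead of the inline sample.
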
morrownr commented 8 months ago

@lr1729

Here is the same trace log I reproduced on a raspberry pi...

Need you to be precise. It matters. Is this a RasPi4B?

Also, I'm not finding what usb adapter you are using. Can you provide that info and whether you are using an extension cable or powered hub?

The above is needed from the others as well.

FYI: I am set up to do Test 2 as shown above. I am showing signal -69 to -71 and am using an Alfa AXML (mt7921au chipset) as the AP on a RasPi4B using RasPiOS 2023-10-10. It took a lot of distance and 4 walls to get the signal down to that level. Hell, in my experience, most adapters can't even work with signals that low but the Alfa is hanging in there. I'll start my testing tomorrow and report what I find with Test 2 before moving to Test 1.

lr1729 commented 8 months ago

Yes, it is a Raspberry Pi 4B on the last 64 bit raspios, I am not using an extension cable or powered hub. I have tested a COMFAST CF-953AX and Fenvi FU-AX1800 and have had the same results with both. I have been able to reproduce the issue most often by connecting to the AP and moving out of range until the device disconnects.

ritech commented 8 months ago

Interesting questions. I think the chip design of MediaTek did not consider power consumption

morrownr commented 8 months ago

I think the chip design of MediaTek did not consider power consumption...

Not sure that I agree, but for now, can we have this discussion in an alternate location so we can concentrate on the issue at hand?

Thanks.

morrownr commented 8 months ago

@lr1729

Yes, it is a Raspberry Pi 4B on the last 64 bit raspios, I am not using an extension cable or powered hub. I have tested a COMFAST CF-953AX and Fenvi FU-AX1800 and have had the same results with both. I have been able to reproduce the issue most often by connecting to the AP and moving out of range until the device disconnects.

I have had Test 2 (adapter in AP mode on a RasPi4B) as shown above going for about 24 hours at this point. I have put heavy loads on it with iperf3, I have let it sit idle for periods, and I've connected multiple systems with various usage levels and signal strengths. I have not experienced one drop. The AP (the Alfa AXML) is set to support up to WiFi 6 (5 GHz band). The 2 clients that are at a signal strength of -70 are seeing stable results (which is amazing). Let me alter the test. I am going to add my notebook computer to the mix so that I can walk away from the AP while monitoring with wavemon. I am also going to replace the Alfa AXML with a Comfast CF-951AX (also with the mt7921au chipset).

The search to narrow things down and duplicate the problem continues.

morrownr commented 8 months ago

@lr1729

If I understand your AP setup correctly, you are using the RasPi4B with RasPiOS 2023-10-10 and a CF-953AX adapter. What are you using for a software AP setup? An AP guide? What channel are you using for the AP? What security are you using for the AP? WPA3? This is proving to be really hard to duplicate so far, so I need details to find what is different between our setups.

lr1729 commented 8 months ago

Got it. I used a NAT setup because I had issues with the bridge setup before and wanted the dhcp and dns server on the pi. From a clean installation, my steps are:

  1. Turn on predictable interface names
  2. Run the iptables commands to set up NAT (my interface name is wlxe0e1a93544f3)
    sudo iptables -t nat -A POSTROUTING -o end0 -j MASQUERADE
    sudo iptables -A FORWARD -i end0 -o wlxe0e1a93544f3 -m state --state RELATED,ESTABLISHED -j ACCEPT
    sudo iptables -A FORWARD -i wlxe0e1a93544f3 -o end0 -j ACCEPT
  3. Enable ip forwarding: sudo sysctl -w net.ipv4.ip_forward=1
  4. Install Pi-hole (curl -sSL https://install.pi-hole.net | bash) on the ethernet interface, in my case end0, and enable the DHCP server through the web UI for a different subnet; I used 192.168.1.2-192.168.1.255. This could also be done using dnsmasq directly.
  5. Set the interface ip: sudo ip addr add 192.168.1.1/24 dev wlxe0e1a93544f3
  6. Install hostapd and use this config file: hostapd.conf.txt

It's mostly the same as the example config in this repo. I use channel 149 with 80 MHz width and WPA3/WPA2 mixed, but I've had the same issue no matter what other config options I tried. I also disabled onboard wifi and bluetooth, disabled usb_sg, updated the firmware and whatnot, and made the iptables, interface ip, and ip forwarding settings apply on boot.
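
As a side check on the addressing in steps 4 and 5, here is a small sketch using Python's standard `ipaddress` module. It shows that in 192.168.1.0/24 the last address of the quoted DHCP range, 192.168.1.255, is the subnet's broadcast address. Whether the DHCP server actually hands that address out or skips it is not established anywhere in this thread, so this is only a sanity check, not a diagnosis. The function names are hypothetical.

```python
import ipaddress


def describe_subnet(interface_cidr):
    """Return key addresses of the subnet an interface address belongs to."""
    network = ipaddress.ip_interface(interface_cidr).network
    hosts = list(network.hosts())  # excludes network and broadcast addresses
    return {
        "network": network,
        "first_host": hosts[0],
        "last_host": hosts[-1],
        "broadcast": network.broadcast_address,
    }


def pool_end_is_broadcast(interface_cidr, pool_end):
    """True when the last address of a DHCP pool equals the subnet broadcast."""
    network = ipaddress.ip_interface(interface_cidr).network
    return ipaddress.ip_address(pool_end) == network.broadcast_address


if __name__ == "__main__":
    info = describe_subnet("192.168.1.1/24")  # address from step 5
    print(info["first_host"], info["last_host"], info["broadcast"])
    print(pool_end_is_broadcast("192.168.1.1/24", "192.168.1.255"))
```

Ending the pool at 192.168.1.254 would keep it inside the usable host range.
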

I typically only encounter the issue a few times per day, but this is with around 10 devices using it as the router daily while moving in and out of range frequently.

I've been unable to reproduce it purposefully so far. It happens when I'm not expecting it somehow, and I just notice it's somewhat correlated with when someone leaves the area.

deren commented 8 months ago

@lr1729

I still cannot see the problem here. Hope you can try to figure out something more.

  1. Can you provide a longer dmesg log? I would like to see how often the problem shows up.
  2. Can you check whether setting a shorter rekey timeout makes a difference?

    # Time interval for rekeying GTK (broadcast/multicast encryption keys) in
    # seconds. (dot11RSNAConfigGroupRekeyTime)
    wpa_group_rekey=600

  3. (If possible) Can you verify with the AP running in OPEN mode?

gifter77 commented 8 months ago

FYI @morrownr for me to reproduce consistently is a bit more complicated. I need:

I received a Pi 5 last week, I'll see if I can reproduce on this as well. I think the Pi 5 doesn't use this annoying VL805 USB chipset anymore.

lr1729 commented 8 months ago

@deren Here are the dmesg logs from the past 20 boots, boot_logs.txt

I will try changing the rekey timeout and having the AP be open and update if the issue continues to occur.

Update: I encountered the same crash both with WPA disabled and with a shortened rekey time.

deren commented 8 months ago

I saw the same issue once, after adding internet access to the test.

Here is a quick patch for the crash problem. Can someone help check it? With the patch applied, the timeout issue may still show up, but the device should recover after the reset process.

diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
index 5e5c7bf51174..becaca529e93 100644
--- a/drivers/net/wireless/mediatek/mt76/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/usb.c
@@ -1009,7 +1009,8 @@ void mt76u_stop_tx(struct mt76_dev *dev)
                                usb_kill_urb(q->entry[j].urb);
                }

-               mt76_worker_disable(&dev->tx_worker);
+               if (!test_bit(MT76_MCU_RESET, &dev->phy.state))
+                       mt76_worker_disable(&dev->tx_worker);

                /* On device removal we maight queue skb's, but mt76u_tx_kick()
                 * will fail to submit urb, cleanup those skb's manually.
@@ -1026,7 +1027,8 @@ void mt76u_stop_tx(struct mt76_dev *dev)
                        }
                }

-               mt76_worker_enable(&dev->tx_worker);
+               if (!test_bit(MT76_MCU_RESET, &dev->phy.state))
+                       mt76_worker_enable(&dev->tx_worker);
        }
LorenzoBianconi commented 8 months ago

I saw the same issue once, after adding internet access to the test.

Here is a quick patch for the crash problem. Can someone help check it? With the patch applied, the timeout issue may still show up, but the device should recover after the reset process.

@Deren: what about doing something like:

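diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
index 5e5c7bf51174..dade3f3db4a5 100644
--- a/drivers/net/wireless/mediatek/mt76/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/usb.c
@@ -1032,6 +1032,12 @@ void mt76u_stop_tx(struct mt76_dev *dev)
        cancel_work_sync(&dev->usb.stat_work);
        clear_bit(MT76_READING_STATS, &dev->phy.state);

+       ret = wait_event_timeout(dev->tx_wait,
+                                !test_bit(MT76_MCU_RESET, &dev->phy.state),
+                                HZ);
+       if (!ret)
+               dev_err(dev->dev, "timed out waiting for mcu reset\n");
+
        mt76_worker_enable(&dev->usb.status_worker);

        mt76_tx_status_check(dev, true);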

Regards, Lorenzo

deren commented 8 months ago

@deren: what about doing something like:

diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
index 5e5c7bf51174..dade3f3db4a5 100644
--- a/drivers/net/wireless/mediatek/mt76/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/usb.c
@@ -1032,6 +1032,12 @@ void mt76u_stop_tx(struct mt76_dev *dev)
        cancel_work_sync(&dev->usb.stat_work);
        clear_bit(MT76_READING_STATS, &dev->phy.state);

+       ret = wait_event_timeout(dev->tx_wait,
+                                !test_bit(MT76_MCU_RESET, &dev->phy.state),
+                                HZ);
+       if (!ret)
+               dev_err(dev->dev, "timed out waiting for mcu reset\n");
+
        mt76_worker_enable(&dev->usb.status_worker);

        mt76_tx_status_check(dev, true);

Regards, Lorenzo

Not sure. But the problem is caused by mt76_worker_disable(&dev->tx_worker) being called twice, so we may need to avoid this call flow.

mt7921u_mac_reset()
  => mt76_worker_disable()
  => mt76u_stop_tx()
    => mt76_worker_disable()
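
To make the call flow above easier to reason about, here is a toy model in Python. It is not kernel code. It assumes, following the description above, that disabling (parking) a worker that has already been parked is what produces the warning, and that a guard like the `MT76_MCU_RESET` check in the quick patch skips the inner disable/enable pair. `ToyWorker`, `stop_tx`, `mac_reset` and `count_warnings` are hypothetical names used only for illustration.

```python
import warnings


class ToyWorker:
    """Toy stand-in for a parkable worker thread; not the mt76 implementation."""

    def __init__(self):
        self.parked = False

    def disable(self):
        # Assumed behaviour: parking an already parked worker triggers a warning.
        if self.parked:
            warnings.warn("disable() called on an already parked worker", RuntimeWarning)
            return
        self.parked = True

    def enable(self):
        self.parked = False


def stop_tx(worker, in_reset=False):
    """Model of the stop-tx path; skips park/unpark when a reset is in progress."""
    if not in_reset:
        worker.disable()
    # (queue cleanup would happen here)
    if not in_reset:
        worker.enable()


def mac_reset(worker, guarded):
    """Model of the reset path: it parks the worker, then calls stop_tx()."""
    worker.disable()
    stop_tx(worker, in_reset=guarded)
    worker.enable()


def count_warnings(guarded):
    """Run one reset and return how many warnings were raised."""
    worker = ToyWorker()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        mac_reset(worker, guarded)
    return len(caught)


if __name__ == "__main__":
    print("unguarded:", count_warnings(guarded=False))
    print("guarded:", count_warnings(guarded=True))
```

In this model the unguarded flow warns once per reset and the guarded flow does not. It says nothing about why the MCU command timed out in the first place.
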

Regards, Deren

LorenzoBianconi commented 8 months ago

Not sure. But the problem is caused by mt76_worker_disable(&dev->tx_worker) being called twice, so we may need to avoid this call flow.

I can't see why mt76_worker_disable() triggers the issue since it just parks the thread. Can you please share more details?

Regards, Lorenzo

deren commented 8 months ago

There may be some misunderstanding.....

The original problem is caused by command timeout (for some unknown reason)

Nov 12 08:01:25 debian kernel: mt7921u 2-1:1.0: Message 00020003 (seq 1) timeout
Nov 12 08:01:29 debian kernel: mt7921u 2-1:1.0: timed out waiting for pending tx
Nov 12 08:01:29 debian kernel: ------------[ cut here ]------------
Nov 12 08:01:29 debian kernel: WARNING: CPU: 5 PID: 13035 at kernel/kthread.c:659 kthread_park+0x85/0xa0

I'm not sure what the real problem is; at this moment I am only trying to fix the separate kthread_park() warning. So, the command timeout problem is still there. We need to figure out why the sta_rec update times out.

Regards, Deren

bjlockie commented 8 months ago

What version of the kernel is it?

bjlockie commented 8 months ago

Old kernels don't work. What does lsusb say about the device?

bjlockie commented 8 months ago

I'm sure it can be made to work. https://github.com/morrownr/USB-WiFi/issues/218 OR it could be missing firmware. What does this command output? $ sudo dmesg | grep mt79

bjlockie commented 8 months ago

Is there a place to report Kali Linux bugs? Maybe the new firmware has a bug?

bjlockie commented 8 months ago

Does dmesg show the same probe failed error? If yes, try a different USB port.

7ERr0r commented 8 months ago

Bug: AP suicides when OnePlus 6 tries to connect. (edit: solved)

```
root@pi4b:~# ./hostapd -v
hostapd v2.11-devel-hostap_2_10-1493-g30748d2b3
root@pi4b:~# uname -r
6.1.21-v8+
```

```
pi@pi4b:~$ sudo dmesg | grep mt79
[ 9.520814] usbcore: registered new interface driver mt7921u
[ 9.527921] mt7921u 2-2:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a
[ 9.803626] mt7921u 2-2:1.3: WM Firmware Version: ____010000, Build Time: 20230526130958
```

```
ubuntu@hp-wifi6-laptop$ iw dev wlo1 link
Connected to ee:ee:ee:ee:ee:ee (on wlo1)
    SSID: MyOpenWifi
    freq: 5180
    RX: 830886790 bytes (299752 packets)
    TX: 57511630 bytes (219993 packets)
    signal: -34 dBm
    rx bitrate: 1080.6 MBit/s 80MHz HE-MCS 10 HE-NSS 2 HE-GI 0 HE-DCM 0
    tx bitrate: 1200.9 MBit/s 80MHz HE-MCS 11 HE-NSS 2 HE-GI 0 HE-DCM 0
    bss flags: short-slot-time
    dtim period: 2
    beacon int: 100
ubuntu@hp-wifi6-laptop$ lspci | grep Wireless
03:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8852AE 802.11ax PCIe Wireless Network Adapter
```

Scenario:
- AP enabled in WiFi 6 ax mode
- Fairphone 4 connects in WiFi 5 mode, Speedtest.net = 296 Mbit/s
- HP Laptop connects in WiFi 6 mode, Speedtest.net = 386 Mbit/s
- OnePlus 6 connects in WiFi 5 mode, no internet
- AP instantly dies*

*no client can connect afterwards and the problem gets fixed by restarting `hostapd`

Before AP dies:
```
pi@pi4b:~ $ ip a s wlan1
4: wlan1: mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet 192.168.19.1/24 brd 192.168.19.255 scope global noprefixroute wlan1
       valid_lft forever preferred_lft forever
    inet6 xxxx/64 scope link
       valid_lft forever preferred_lft forever
```

After OnePlus 6 connects:
```
pi@pi4b:~ $ ip a s wlan1
4: wlan1: mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff
    inet6 xxxx/64 scope link
       valid_lft forever preferred_lft forever
```

`dmesg` shows nothing:
```
[ 20.762039] tun: Universal TUN/TAP device driver, 1.6
[ 157.882419] IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready
[ 827.126982] IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready
```

`hostapd` logs look the same for all 3 devices, but only OnePlus 6 fails.

How to reproduce
===
Run Comfast CF-951AX in AP mode on Pi 4B. Then connect OnePlus 6 phone

Thanks for all the tutorials! WiFi 6 runs around 400 Mbit/s with all iptables firewalls enabled.

soyersoyer commented 8 months ago

I think this is the same problem: https://github.com/NetworkConfiguration/dhcpcd/issues/36 Add noarp to your dhcpcd.conf or disable MAC randomization on your phone.
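
If it is unclear whether a client is using a randomized MAC, one common heuristic is to look at the locally administered bit (the second-least-significant bit of the first octet). This is a minimal sketch of that check. A set bit is only a hint that the address may be randomized, not proof, and the function name is hypothetical.

```python
def is_locally_administered(mac):
    """True when the locally administered bit of the first octet is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)


if __name__ == "__main__":
    # Sample addresses for illustration only.
    for mac in ("9a:01:96:33:9e:15", "00:c0:ca:b3:c3:b3"):
        print(mac, is_locally_administered(mac))
```

Addresses assigned by a hardware vendor normally have this bit cleared.
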

7ERr0r commented 8 months ago

I think this is the same problem: NetworkConfiguration/dhcpcd#36 Add noarp to your dhcpcd.conf or disable MAC randomization on your phone.

Thank you! Works. Speedtest.net on OnePlus 6 is now 286 Mbit/s

fayaaz commented 8 months ago

I saw the same issue once, after adding internet access to the test.

Here is a quick patch for the crash problem. Can someone help check it? With the patch applied, the timeout issue may still show up, but the device should recover after the reset process.

@deren I tried your patch on the 6.5 kernel and still get the error. Running on a Rpi4B, no other peripherals connected except ethernet cable. The adapter is the ALFA AWUS036AXML.

[  280.542791] mt7921u 2-2:1.3: Message 00020002 (seq 10) timeout
[  280.822787] mt7921u 2-2:1.3: timed out waiting for pending tx

It only happens when using speedtest.net for some reason. It only happens when my Motorola G31 is using the AP, but not when a device with a WiFi 6 card (M2 MacBook Pro) connects and runs the speed test.

EDIT: with this patch the network comes back after 6-7 minutes

Nov 20 16:41:59 pirouter hostapd[697]: wlx00c0cab3c3b3: STA 9a:01:96:33:9e:15 IEEE 802.11: disassociated due to inactivity
Nov 20 16:42:00 pirouter hostapd[697]: wlx00c0cab3c3b3: STA 9a:01:96:33:9e:15 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Nov 20 16:43:59 pirouter hostapd[697]: wlx00c0cab3c3b3: STA 9a:01:96:33:9e:15 IEEE 802.11: authenticated
Nov 20 16:43:59 pirouter hostapd[697]: wlx00c0cab3c3b3: STA 9a:01:96:33:9e:15 IEEE 802.11: associated (aid 1)
7ERr0r commented 8 months ago

Bug: Task hangs, probably when USB contact is lost.

Call trace:
 __switch_to+0xf8/0x1e0
 __schedule+0x2a8/0x830
 schedule+0x60/0x100
 schedule_preempt_disabled+0x20/0x38
 __mutex_lock.isra.17+0x3e4/0xa78
 __mutex_lock_slowpath+0x1c/0x28
 mutex_lock+0x3c/0x68
 mt7921_mac_work+0x3c/0xd0 [mt7921_common]
 process_one_work+0x208/0x480
 worker_thread+0x50/0x428
 kthread+0xfc/0x110
 ret_from_fork+0x10/0x20

Logs
```
Nov 21 19:50:15 pi4irdm7 rngd[490]: stats: Time spent starving for entropy: (min=0; avg=0.000; max=0)us
Nov 21 20:17:01 pi4irdm7 CRON[681702]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Nov 21 20:28:32 pi4irdm7 kernel: [185907.577058] mt7921u 2-2:1.3: vendor request req:63 off:d7f0 failed:-71
Nov 21 20:28:32 pi4irdm7 kernel: [185907.671235] mt7921u 2-2:1.3: vendor request req:63 off:d7e4 failed:-71
Nov 21 20:28:32 pi4irdm7 kernel: [185907.774038] mt7921u 2-2:1.3: vendor request req:63 off:d7f4 failed:-71
Nov 21 20:28:33 pi4irdm7 kernel: [185907.878005] mt7921u 2-2:1.3: vendor request req:63 off:d7e8 failed:-71
...
Nov 21 20:28:33 pi4irdm7 kernel: [185908.724166] mt7921u 2-2:1.3: vendor request req:63 off:53c4 failed:-71
Nov 21 20:28:36 pi4irdm7 kernel: [185911.615296] mt7921u 2-2:1.3: vendor request req:66 off:53c4 failed:-110
Nov 21 20:28:39 pi4irdm7 kernel: [185914.815384] mt7921u 2-2:1.3: vendor request req:63 off:d02c failed:-110
Nov 21 20:28:43 pi4irdm7 kernel: [185918.015420] mt7921u 2-2:1.3: vendor request req:63 off:d054 failed:-110
...
Nov 21 20:33:29 pi4irdm7 kernel: [186204.099457] mt7921u 2-2:1.3: vendor request req:63 off:d7e8 failed:-110
Nov 21 20:33:32 pi4irdm7 kernel: [186207.299507] mt7921u 2-2:1.3: vendor request req:63 off:d7f8 failed:-110
Nov 21 20:33:36 pi4irdm7 hostapd: wlan1: STA xx:xx:xx:xx:xx:xx IEEE 802.11: disassociated due to inactivity
Nov 21 20:33:36 pi4irdm7 hostapd: wlan1: STA yy:yy:yy:yy:yy:yy IEEE 802.11: disassociated due to inactivity
Nov 21 20:33:36 pi4irdm7 kernel: [186211.523587] mt7921u 2-2:1.3: vendor request req:63 off:d02c failed:-110
Nov 21 20:33:37 pi4irdm7 hostapd: wlan1: STA xx:xx:xx:xx:xx:xx IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Nov 21 20:33:39 pi4irdm7 kernel: [186214.723630] mt7921u 2-2:1.3: vendor request req:63 off:d054 failed:-110
...
Nov 21 20:39:32 pi4irdm7 kernel: [186567.880541] mt7921u 2-2:1.3: vendor request req:63 off:4230 failed:-110
Nov 21 20:39:33 pi4irdm7 kernel: [186568.309660] INFO: task kworker/u8:0:676002 blocked for more than 120 seconds.
Nov 21 20:39:33 pi4irdm7 kernel: [186568.309703] Tainted: G C 6.1.21-v8+ #1642
Nov 21 20:39:33 pi4irdm7 kernel: [186568.309717] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 21 20:39:33 pi4irdm7 kernel: [186568.309729] task:kworker/u8:0 state:D stack:0 pid:676002 ppid:2 flags:0x00000008
Nov 21 20:39:33 pi4irdm7 kernel: [186568.309766] Workqueue: phy1 mt7921_mac_work [mt7921_common]
Nov 21 20:39:33 pi4irdm7 kernel: [186568.309839] Call trace:
Nov 21 20:39:33 pi4irdm7 kernel: [186568.309849] __switch_to+0xf8/0x1e0
Nov 21 20:39:33 pi4irdm7 kernel: [186568.309885] __schedule+0x2a8/0x830
Nov 21 20:39:33 pi4irdm7 kernel: [186568.309907] schedule+0x60/0x100
Nov 21 20:39:33 pi4irdm7 kernel: [186568.309928] schedule_preempt_disabled+0x20/0x38
Nov 21 20:39:33 pi4irdm7 kernel: [186568.309950] __mutex_lock.isra.17+0x3e4/0xa78
Nov 21 20:39:33 pi4irdm7 kernel: [186568.309966] __mutex_lock_slowpath+0x1c/0x28
Nov 21 20:39:33 pi4irdm7 kernel: [186568.309982] mutex_lock+0x3c/0x68
Nov 21 20:39:33 pi4irdm7 kernel: [186568.309997] mt7921_mac_work+0x3c/0xd0 [mt7921_common]
Nov 21 20:39:33 pi4irdm7 kernel: [186568.310040] process_one_work+0x208/0x480
Nov 21 20:39:33 pi4irdm7 kernel: [186568.310064] worker_thread+0x50/0x428
Nov 21 20:39:33 pi4irdm7 kernel: [186568.310084] kthread+0xfc/0x110
Nov 21 20:39:33 pi4irdm7 kernel: [186568.310102] ret_from_fork+0x10/0x20
Nov 21 20:39:36 pi4irdm7 kernel: [186571.080647] mt7921u 2-2:1.3: vendor request req:63 off:4230 failed:-110
...
Nov 21 20:41:31 pi4irdm7 kernel: [186686.410211] mt7921u 2-2:1.3: vendor request req:63 off:4230 failed:-110
Nov 21 20:41:34 pi4irdm7 kernel: [186689.143249] INFO: task kworker/u8:0:676002 blocked for more than 241 seconds.
Nov 21 20:41:34 pi4irdm7 kernel: [186689.143280] Tainted: G C 6.1.21-v8+ #1642
Nov 21 20:41:34 pi4irdm7 kernel: [186689.143291] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 21 20:41:34 pi4irdm7 kernel: [186689.143300] task:kworker/u8:0 state:D stack:0 pid:676002 ppid:2 flags:0x00000008
Nov 21 20:41:34 pi4irdm7 kernel: [186689.143326] Workqueue: phy1 mt7921_mac_work [mt7921_common]
Nov 21 20:41:34 pi4irdm7 kernel: [186689.143380] Call trace:
Nov 21 20:41:34 pi4irdm7 kernel: [186689.143389] __switch_to+0xf8/0x1e0
Nov 21 20:41:34 pi4irdm7 kernel: [186689.143415] __schedule+0x2a8/0x830
Nov 21 20:41:34 pi4irdm7 kernel: [186689.143432] schedule+0x60/0x100
Nov 21 20:41:34 pi4irdm7 kernel: [186689.143448] schedule_preempt_disabled+0x20/0x38
Nov 21 20:41:34 pi4irdm7 kernel: [186689.143465] __mutex_lock.isra.17+0x3e4/0xa78
Nov 21 20:41:34 pi4irdm7 kernel: [186689.143477] __mutex_lock_slowpath+0x1c/0x28
Nov 21 20:41:34 pi4irdm7 kernel: [186689.143488] mutex_lock+0x3c/0x68
Nov 21 20:41:34 pi4irdm7 kernel: [186689.143499] mt7921_mac_work+0x3c/0xd0 [mt7921_common]
Nov 21 20:41:34 pi4irdm7 kernel: [186689.143531] process_one_work+0x208/0x480
Nov 21 20:41:34 pi4irdm7 kernel: [186689.143551] worker_thread+0x50/0x428
Nov 21 20:41:34 pi4irdm7 kernel: [186689.143565] kthread+0xfc/0x110
Nov 21 20:41:34 pi4irdm7 kernel: [186689.143579] ret_from_fork+0x10/0x20
Nov 21 20:41:34 pi4irdm7 kernel: [186689.610250] mt7921u 2-2:1.3: vendor request req:63 off:4230 failed:-110
...
Nov 21 21:37:13 pi4irdm7 kernel: [190028.025345] mt7921u 2-2:1.3: vendor request req:66 off:53c4 failed:-110
Nov 21 21:37:13 pi4irdm7 hostapd: wlan1: STA zz:zz:zz:zz:zz:zz IEEE 802.11: disassociated due to inactivity
Nov 21 21:37:14 pi4irdm7 hostapd: wlan1: STA zz:zz:zz:zz:zz:zz IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Nov 21 21:37:16 pi4irdm7 kernel: [190031.481415] mt7921u 2-2:1.3: vendor request req:63 off:d02c failed:-110
```

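
The `failed:-71` and `failed:-110` values in the log above are negative Linux errno codes: 71 is EPROTO and 110 is ETIMEDOUT. The sketch below tallies them from log lines. It is only an aid for reading such logs; the switch from EPROTO to ETIMEDOUT is consistent with the guess that USB contact was lost, but it does not prove it. The errno table is hardcoded on purpose (Linux values), and the function name is hypothetical.

```python
import re
from collections import Counter

# Linux errno values; hardcoded so the result does not depend on the host OS.
LINUX_ERRNO_NAMES = {71: "EPROTO", 110: "ETIMEDOUT"}

# Matches e.g.: vendor request req:63 off:d7f0 failed:-71
VENDOR_RE = re.compile(r"vendor request req:([0-9a-f]+) off:([0-9a-f]+) failed:(-?\d+)")


def summarize_vendor_failures(lines):
    """Count failed vendor requests per errno name."""
    counts = Counter()
    for line in lines:
        match = VENDOR_RE.search(line)
        if match:
            code = abs(int(match.group(3)))
            counts[LINUX_ERRNO_NAMES.get(code, "errno %d" % code)] += 1
    return counts


if __name__ == "__main__":
    # Lines copied from the log above.
    sample = [
        "Nov 21 20:28:32 pi4irdm7 kernel: [185907.577058] mt7921u 2-2:1.3: vendor request req:63 off:d7f0 failed:-71",
        "Nov 21 20:28:36 pi4irdm7 kernel: [185911.615296] mt7921u 2-2:1.3: vendor request req:66 off:53c4 failed:-110",
        "Nov 21 20:28:39 pi4irdm7 kernel: [185914.815384] mt7921u 2-2:1.3: vendor request req:63 off:d02c failed:-110",
    ]
    for name, count in summarize_vendor_failures(sample).most_common():
        print(name, count)
```
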
fayaaz commented 8 months ago

@7ERr0r are you using a version 6.1 kernel? I had the same problem until I moved to a 6.5 kernel.

Your kernel version says 6.1.21-v8+ so I assume you're on a RPI - you can build a more recent kernel - I followed https://www.raspberrypi.com/documentation/computers/linux_kernel.html and checked out the 6.5 branch.

I still get the hang and the AP crashing in 5 GHz mode once the speed ramps up during a speed test.

morrownr commented 8 months ago

@7ERr0r

I have a guide to installing a more recent kernel in the RasPiOS here on this site:

Go to the Main Menu and look under menu item 9

How to Compile and Install New RasPiOS Kernels

It is pretty easy.