morrownr / USB-WiFi

USB WiFi Adapter Information for Linux

List of Bug Reports for the mt7921au chipset / mt7921u driver... #107

Open morrownr opened 2 years ago

morrownr commented 2 years ago

This issue is for maintaining a list of problematic issues that need work. This list will be maintained and updated in this first post by @morrownr . Please add posts to this issue as you have updated information for the existing BUGs in the list or if you have information about a new BUG. Thank you.

Dear Mediatek devs... help is appreciated.


Bug: (2024-04-18) See: https://github.com/morrownr/USB-WiFi/issues/392 . WDS/4addr is not supported in AP mode. First reported with an Alfa AXML adapter (mt7921au chipset, mt7921u driver). The OP is unable to use WDS/4addr in AP mode.

Status: Open

Info: It was reported that this capability does work with an adapter that uses the mt7612u chipset/driver.


Bug: (2024-03-26) See: https://github.com/morrownr/USB-WiFi/issues/378 . WiFi adapter not showing up. First reported with an Alfa AXML adapter (mt7921au chipset, mt7921u driver). The adapter is non-functional until the workaround below is applied.

Status: Open

Workaround: run modprobe -r btusb first, then plug in the USB WiFi adapter.

More input is needed. Is this a problem with btusb?
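To help gather input on the btusb question, here is a minimal sketch that checks whether btusb is currently loaded by reading /proc/modules. The function names are hypothetical, and it only reports module state; it does not show that btusb is the cause.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The module name is the first whitespace-separated field.
            names.add(fields[0])
    return names


def btusb_is_loaded(proc_modules_text):
    """True if btusb appears as a loaded module (not merely as a dependency)."""
    return "btusb" in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    try:
        with open("/proc/modules") as fh:
            text = fh.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    print("btusb loaded:", btusb_is_loaded(text))
```

Running it before and after `modprobe -r btusb`, and noting whether the adapter then shows up, would make reports on this bug easier to compare.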


Bug: (2023-12-22) Many Linux distros are detecting Bluetooth capability in mt7921au based adapters but none of the adapters on the market have Bluetooth turned on so it won't work. Linux should not be detecting Bluetooth capability when it is actually not available.

Status: Open and ongoing

Here is a link to a location where you can get a copy of the Intel White Paper that explains the details of why USB3 capable WiFi adapters should not have Bluetooth capability turned on:

https://www.usb.org/document-library/usb-30-radio-frequency-interference-impact-24-ghz-wireless-devices

USB3 WiFi adapters should not have Bluetooth turned on as the USB3 will cause interference with Bluetooth. If makers decide they really want Bluetooth capability in an adapter then they need to limit wifi to USB2 capability. All adapters with the mt7921au chipset that I am aware of have Bluetooth turned off so WiFi can operate in USB3 mode. However, there is a bug in that Bluetooth capability is still being detected by Linux distros and the driver/firmware is loading. Systems act like Bluetooth is available but when you try to use the Bluetooth, it won't work. It is not clear to me how this can be fixed but it really does need to be fixed.

This is not a problem with PCIe cards. I have a mt7922 based PCIe card. WiFi and Bluetooth work well together because WiFi uses the PCIe bus and not USB. Please understand that the issue in this bug is not exclusive to this chipset. This is an issue with all USB WiFi adapters. The adapters that have had both USB WiFi and BT capabilities over the years have limited USB to USB2 to avoid the problem of interference.


Bug: (2023-12-07) Active monitor mode breaks driver.

Status: open

Reporter: @ZerBea Link: https://github.com/openwrt/mt76/issues/839 Problem: Using Active Monitor mode breaks the driver

Driver reports that active monitor mode is possible:

$ iw list | grep active
Device supports active monitor (which will ACK incoming frames)

But if hcxdumptool sets active monitor mode, it stops working.

If active monitor mode is disabled, everything's fine

0 ERROR(s) during runtime
638 Packet(s) captured by kernel
0 Packet(s) dropped by kernel
1 SHB written to pcapng dumpfile
1 IDB written to pcapng dumpfile
1 ECB written to pcapng dumpfile
83 EPB written to pcapng dumpfile

exit on sigterm

I don't think the problem is related to hcxdumptool, because it can be reproduced with iw, ip link and tshark, too:

$ sudo ip link set wlp22s0f0u4i3 down
$ sudo iw dev wlp22s0f0u4i3 set type monitor
$ sudo ip link set wlp22s0f0u4i3 up
$ tshark -i wlp22s0f0u4i3
22 packets captured

$ sudo ip link set wlp22s0f0u4i3 down
$ sudo iw dev wlp22s0f0u4i3 set monitor active
$ sudo ip link set wlp22s0f0u4i3 up
$ tshark -i wlp22s0f0u4i3
Capturing on 'wlp22s0f0u4i3'
^C
0 packets captured

Background: When running in active monitor mode, the device ACKs incoming frames addressed to the virtual MAC of the device. This feature is really useful for performing PMKID attacks. At the moment, active monitor mode is working on:

mt76x0u
mt76x2u

It is not working on:

mt7601u
mt7921u

I see two options: active monitor mode should be fixed, or active monitor mode capability should not be reported by the driver

mt7601u
$ iw list | grep active
Device supports active monitor (which will ACK incoming frames)

mt7921u
$ iw list | grep active
Device supports active monitor (which will ACK incoming frames)
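The capability check above can be scripted. The sketch below looks for the "Device supports active monitor" line in `iw list` output; the helper name is hypothetical. Note that it only reports what the driver advertises. As this bug shows, mt7601u and mt7921u advertise the capability even though captures return nothing.

```python
import shutil
import subprocess

# The exact text that `iw list` prints, as quoted in this report.
ACTIVE_MONITOR_LINE = "Device supports active monitor"


def reports_active_monitor(iw_list_output):
    """True if the given `iw list` output advertises active monitor mode."""
    return any(
        ACTIVE_MONITOR_LINE in line for line in iw_list_output.splitlines()
    )


if __name__ == "__main__":
    if shutil.which("iw"):
        result = subprocess.run(
            ["iw", "list"], capture_output=True, text=True, check=False
        )
        print("advertised:", reports_active_monitor(result.stdout))
    else:
        print("iw not found; pass saved `iw list` output to the function instead")
```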


Bug: LED does not function in several of the usb wifi adapters that use the mt7921au chipset.

Status: open, it is unclear what the problem is.

Reported by @morrownr Confirmed by numerous users.


Bug: AP Mode DFS (5 GHz) support is non-functional

Status: open

Reported by @morrownr Confirmed by numerous users.

This is really a serious omission in that in many places in the world there are limited non-DFS channels available leading to high levels of congestion.

Dear Mediatek, does your usb chipset competitor support DFS channels in AP Mode? Yes they do. See: out-of-kernel drivers for rtl8812au, rtl8811au, rtl8812bu and rtl8811cu. You need to think about this. Sincerely.


Bug: txpower reading is unusually low (for example, 3 dBm) when read with iw.

Status: open

Reported by several individuals.

This reading must be wrong because actual usage suggests the reading should be much higher.
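To put the reported number in perspective, dBm converts to milliwatts as mW = 10^(dBm/10). The sketch below does the conversion. The 20 dBm comparison value is only an illustrative assumption, not a figure measured or reported in this thread.

```python
def dbm_to_mw(dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (dbm / 10)


if __name__ == "__main__":
    reported = 3.0   # dBm, the reading reported with iw
    reference = 20.0  # dBm, hypothetical comparison value (assumption)
    print(f"{reported} dBm = {dbm_to_mw(reported):.2f} mW")
    print(f"{reference} dBm = {dbm_to_mw(reference):.2f} mW")
```

A reading of 3 dBm works out to roughly 2 mW, which is hard to reconcile with the range and throughput users report in practice.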


Bug: (feature request) mt7921u driver does not support 2 interfaces of AP mode on one adapter

Status: open

Reported by @whitslack

mt7921u driver does not support 2 instances of AP mode whereas this was common on some drivers for older adapters.

Now:

valid interface combinations:

     * #{ managed, P2P-client } <= 2, #{ AP, P2P-GO } <= 1,
       total <= 2, #channels <= 2

What we want:

valid interface combinations:

     * #{ managed, P2P-client } <= 2, #{ AP, P2P-GO } <= 2,
       total <= 2, #channels <= 2
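For anyone comparing adapters, here is a rough sketch that parses a "valid interface combinations" entry as shown above and reports the AP limit. The function names and the returned dictionary layout are hypothetical, and the parser only handles the simple form quoted in this post.

```python
import re


def parse_combination(text):
    """Parse one interface-combination entry into its limits."""
    flat = " ".join(text.split())  # join wrapped lines, collapse whitespace
    limits = {}
    for types, limit in re.findall(r"#\{\s*([^}]*?)\s*\}\s*<=\s*(\d+)", flat):
        key = tuple(t.strip() for t in types.split(","))
        limits[key] = int(limit)
    total = re.search(r"total\s*<=\s*(\d+)", flat)
    channels = re.search(r"#channels\s*<=\s*(\d+)", flat)
    return {
        "limits": limits,
        "total": int(total.group(1)) if total else None,
        "channels": int(channels.group(1)) if channels else None,
    }


def max_ap_interfaces(text):
    """Return the limit of the group containing AP, or 0 if AP is absent."""
    for types, limit in parse_combination(text)["limits"].items():
        if "AP" in types:
            return limit
    return 0


if __name__ == "__main__":
    now = """* #{ managed, P2P-client } <= 2, #{ AP, P2P-GO } <= 1,
       total <= 2, #channels <= 2"""
    print("AP interfaces allowed now:", max_ap_interfaces(now))
```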

Bug: connection is dropped and the only way to correct the situation is to reboot (AP mode)

Status: open

Testing to see if SG helps performance:

scatter-gather test with mt7921au based adapter

Issue: connection drops and the only resolution is to reboot the system.

Raspberry Pi 4B RasPiOS 2023-05-03

I changed the module parameter and rebooted between each test so as to alternate on and off.

iperf3 -c 192.168.1.1 -t 300

scatter-gather off (disable_usb_sg=1)

1:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-300.00 sec  19.9 GBytes   569 Mbits/sec    4             sender
[  5]   0.00-300.01 sec  19.9 GBytes   569 Mbits/sec                  receiver

2:
[  5]   0.00-300.00 sec  19.9 GBytes   570 Mbits/sec    5             sender
[  5]   0.00-300.01 sec  19.9 GBytes   570 Mbits/sec                  receiver

3:
[  5]   0.00-300.00 sec  20.0 GBytes   573 Mbits/sec    2             sender
[  5]   0.00-300.01 sec  20.0 GBytes   573 Mbits/sec                  receiver

scatter-gather on (disable_usb_sg=0)

1:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-300.00 sec  19.9 GBytes   570 Mbits/sec    1             sender
[  5]   0.00-300.01 sec  19.9 GBytes   570 Mbits/sec                  receiver

2:
[  5]   0.00-300.00 sec  20.0 GBytes   572 Mbits/sec   48             sender
[  5]   0.00-300.01 sec  20.0 GBytes   572 Mbits/sec                  receiver

3:
[  5]   0.00-300.00 sec  19.9 GBytes   571 Mbits/sec    0             sender
[  5]   0.00-300.02 sec  19.9 GBytes   571 Mbits/sec                  receiver

Observation: So much for needing to average the results. I was careful to check that sg was on or off. I have no explanation for how the results could be so close. I see no evidence that sg is providing any performance increase.
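The comparison can be checked with a few lines of arithmetic. This sketch averages the sender bitrates and sums the Retr counts from the six runs above; the variable names are just for illustration.

```python
from statistics import mean

# Sender bitrates (Mbits/sec) and Retr counts copied from the runs above.
sg_off_mbps = [569, 570, 573]
sg_on_mbps = [570, 572, 571]
sg_off_retr = [4, 5, 2]
sg_on_retr = [1, 48, 0]

mean_off = mean(sg_off_mbps)
mean_on = mean(sg_on_mbps)
diff_percent = (mean_on - mean_off) / mean_off * 100

if __name__ == "__main__":
    print(f"sg off: {mean_off:.1f} Mbits/sec, {sum(sg_off_retr)} retransmits")
    print(f"sg on:  {mean_on:.1f} Mbits/sec, {sum(sg_on_retr)} retransmits")
    print(f"difference: {diff_percent:+.2f}%")
```

The averages differ by well under one percent, which is within run-to-run variation. The retransmit total with sg on is dominated by a single run (48), so three runs per setting are not enough to say anything about retransmits either way.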

Prior to this testing session, I saw the issue of the connection being dropped such that only a reboot will correct the situation. It happened twice a few days ago while testing with sg on. There is a history of this with mt7612u adapters. I have yet to duplicate the issue with sg off.

Conclusion: Further testing on different platforms is needed. I will test x86_64 next. Given the history of sg causing problems such as connections dropping that can only be corrected with a reboot, it may be better for the default to be disable_usb_sg=1 with a follow up to determine what the problem is.
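When reporting results, it helps to confirm which setting is actually in effect. Here is a minimal sketch that reads the parameter from sysfs. The path is the one used with `echo Y > ...` later in this thread; the function name is hypothetical, and treating "Y", "y" and "1" as true is an assumption about how the value is printed.

```python
from pathlib import Path

# sysfs path of the mt76_usb module parameter, as used later in this thread.
SG_PARAM = Path("/sys/module/mt76_usb/parameters/disable_usb_sg")


def sg_disabled(param_path=SG_PARAM):
    """Return True/False for disable_usb_sg, or None if it cannot be read."""
    try:
        value = Path(param_path).read_text().strip()
    except OSError:
        return None  # module not loaded, or not running on Linux
    return value in ("Y", "y", "1")


if __name__ == "__main__":
    state = sg_disabled()
    if state is None:
        print("mt76_usb parameter not readable (module not loaded?)")
    else:
        print("scatter-gather is", "OFF" if state else "ON")
```

To make the setting persist across reboots, the usual approach would be a line such as `options mt76_usb disable_usb_sg=1` in a file under /etc/modprobe.d/ (the file name is up to you).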


lr1729 commented 9 months ago

I saw the same issue once, after adding internet access to the test.

Here is a quick patch for the crash problem. Can someone help to check with this patch? With the patch applied, the timeout issue may still show up, but the driver can recover after the reset process.

diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
index 5e5c7bf51174..becaca529e93 100644
--- a/drivers/net/wireless/mediatek/mt76/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/usb.c
@@ -1009,7 +1009,8 @@ void mt76u_stop_tx(struct mt76_dev *dev)
                                usb_kill_urb(q->entry[j].urb);
                }

-               mt76_worker_disable(&dev->tx_worker);
+               if (!test_bit(MT76_MCU_RESET, &dev->phy.state))
+                       mt76_worker_disable(&dev->tx_worker);

                /* On device removal we maight queue skb's, but mt76u_tx_kick()
                 * will fail to submit urb, cleanup those skb's manually.
@@ -1026,7 +1027,8 @@ void mt76u_stop_tx(struct mt76_dev *dev)
                        }
                }

-               mt76_worker_enable(&dev->tx_worker);
+               if (!test_bit(MT76_MCU_RESET, &dev->phy.state))
+                       mt76_worker_enable(&dev->tx_worker);
        }

The patch does prevent the driver crash from happening, though the sequence timeout error still occurs. The access point does appear to recover after a few minutes, though sometimes hostapd has trouble restarting after the error.

7ERr0r commented 9 months ago

Wifi works on rpi-6.7.y kernel, but SSH is unresponsive via a WiFi 6 laptop. https://asciinema.org/a/T2Z5ptWGQugL5RsV0iO1JlDTD

[Spoiler] My previous comment

rpi-6.5.y
===
> you can build a more recent kernel [...] and checked out the 6.5 branch.

Well now it's worse. SSH is very unresponsive on `rpi-6.5.y` kernel via WiFi (Ethernet works well).

Speedtest.net for WiFi 6 laptop is 168 Mbit/s
iperf3 around 170 Mbit/s

rpi-6.7.y
===
Kernel `rpi-6.7.y` works better. But still, ssh is unresponsive on WiFi - like 200+ ms ping. Using ssh and pinging in the background makes it a lot better.
`ping 192.168.19.1 -i 0.02`
Looks like some QoS or packet grouping.

Stats:
Speedtest.net for WiFi 6 laptop is 418 Mbit/s on kernel `rpi-6.7.y`
iperf3 is 680 Mbit/s

[Spoiler] Logs

Logs rpi-6.5.y
===
```
pi@pi4irdm7:~ $ ethtool -i wlan1
driver: mt7921u
version: 6.5.12-v8-MW_CUSTOM_KERNEL+
firmware-version: ____010000-20230117170942
```
From a Wifi 6 laptop:
```
# Ping fluctuates - should be around 1 ms...
64 bytes from pi4irdm7: icmp_seq=813 ttl=64 time=58.5 ms
64 bytes from pi4irdm7: icmp_seq=814 ttl=64 time=80.6 ms
64 bytes from pi4irdm7: icmp_seq=815 ttl=64 time=103 ms
64 bytes from pi4irdm7: icmp_seq=816 ttl=64 time=22.7 ms

ubuntu@hp_laptop:~$ iw dev wlo1 link
Connected to xx:xx:xx:xx:xx:xx (on wlo1)
SSID: MyWifi
freq: 5180
RX: 31283791 bytes (192594 packets)
TX: 1135154383 bytes (632964 packets)
signal: -27 dBm
rx bitrate: 600.4 MBit/s 80MHz HE-MCS 11 HE-NSS 1 HE-GI 0 HE-DCM 0
tx bitrate: 1200.9 MBit/s 80MHz HE-MCS 11 HE-NSS 2 HE-GI 0 HE-DCM 0
bss flags: short-slot-time
dtim period: 2
beacon int: 100

ubuntu@hp_laptop:~$ iperf3 -c 192.168.19.1
Connecting to host 192.168.19.1, port 5201
[ 5] local 192.168.19.55 port 56452 connected to 192.168.19.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 24.2 MBytes 203 Mbits/sec 0 632 KBytes
[ 5] 1.00-2.00 sec 21.2 MBytes 178 Mbits/sec 0 814 KBytes
[ 5] 2.00-3.00 sec 22.5 MBytes 189 Mbits/sec 0 895 KBytes
[ 5] 3.00-4.00 sec 22.5 MBytes 189 Mbits/sec 0 895 KBytes
[ 5] 4.00-5.00 sec 21.2 MBytes 178 Mbits/sec 0 895 KBytes
[ 5] 5.00-6.00 sec 22.5 MBytes 189 Mbits/sec 0 895 KBytes
[ 5] 6.00-7.00 sec 20.0 MBytes 168 Mbits/sec 0 987 KBytes
[ 5] 7.00-8.00 sec 20.0 MBytes 168 Mbits/sec 0 1.01 MBytes
[ 5] 8.00-9.00 sec 22.5 MBytes 189 Mbits/sec 0 1.01 MBytes
[ 5] 9.00-10.00 sec 21.2 MBytes 178 Mbits/sec 0 1.01 MBytes
```

Logs rpi-6.7.y
===
```
pi@pi4irdm7:~ $ ethtool -i wlan1
driver: mt7921u
version: 6.7.0-rc2-v8-rpi-6.7.y-48e386b+
firmware-version: ____010000-20230117170942

# Tested with 2023.05.26 too
pi@pi4irdm7:~ $ ethtool -i wlan1
driver: mt7921u
version: 6.7.0-rc2-v8-rpi-6.7.y-48e386b+
firmware-version: ____010000-20230526130958

$ iperf3 -c 192.168.19.1
Connecting to host 192.168.19.1, port 5201
[ 5] local 192.168.19.55 port 60188 connected to 192.168.19.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 85.2 MBytes 715 Mbits/sec 0 1.92 MBytes
[ 5] 1.00-2.00 sec 62.5 MBytes 524 Mbits/sec 0 2.05 MBytes
[ 5] 2.00-3.00 sec 83.8 MBytes 703 Mbits/sec 0 2.05 MBytes
[ 5] 3.00-4.00 sec 80.0 MBytes 671 Mbits/sec 30 1.51 MBytes
[ 5] 4.00-5.00 sec 85.0 MBytes 713 Mbits/sec 0 1.64 MBytes
[ 5] 5.00-6.00 sec 77.5 MBytes 650 Mbits/sec 30 1.22 MBytes
[ 5] 6.00-7.00 sec 86.2 MBytes 724 Mbits/sec 0 1.29 MBytes
[ 5] 7.00-8.00 sec 80.0 MBytes 671 Mbits/sec 22 987 KBytes
[ 5] 8.00-9.00 sec 85.0 MBytes 713 Mbits/sec 0 1.03 MBytes
[ 5] 9.00-10.00 sec 85.0 MBytes 713 Mbits/sec 0 1.07 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 810 MBytes 680 Mbits/sec 82 sender
[ 5] 0.00-10.05 sec 807 MBytes 674 Mbits/sec receiver

iperf Done.

# Sometimes it's perfect:
$ ping 192.168.19.1 -i 0.02
64 bytes from 192.168.19.1: icmp_seq=22080 ttl=64 time=0.985 ms
64 bytes from 192.168.19.1: icmp_seq=22081 ttl=64 time=0.884 ms
64 bytes from 192.168.19.1: icmp_seq=22082 ttl=64 time=0.892 ms
64 bytes from 192.168.19.1: icmp_seq=22083 ttl=64 time=0.996 ms
64 bytes from 192.168.19.1: icmp_seq=22084 ttl=64 time=0.883 ms
64 bytes from 192.168.19.1: icmp_seq=22085 ttl=64 time=0.880 ms
64 bytes from 192.168.19.1: icmp_seq=22086 ttl=64 time=0.885 ms
64 bytes from 192.168.19.1: icmp_seq=22087 ttl=64 time=0.882 ms
64 bytes from 192.168.19.1: icmp_seq=22088 ttl=64 time=0.882 ms

# Sometimes a second of waiting...
64 bytes from 192.168.19.1: icmp_seq=699 ttl=64 time=3.99 ms
64 bytes from 192.168.19.1: icmp_seq=700 ttl=64 time=0.992 ms
64 bytes from 192.168.19.1: icmp_seq=701 ttl=64 time=1.01 ms
64 bytes from 192.168.19.1: icmp_seq=702 ttl=64 time=1.02 ms
64 bytes from 192.168.19.1: icmp_seq=703 ttl=64 time=366 ms
64 bytes from 192.168.19.1: icmp_seq=704 ttl=64 time=342 ms
64 bytes from 192.168.19.1: icmp_seq=705 ttl=64 time=318 ms
64 bytes from 192.168.19.1: icmp_seq=706 ttl=64 time=294 ms
64 bytes from 192.168.19.1: icmp_seq=707 ttl=64 time=270 ms
64 bytes from 192.168.19.1: icmp_seq=708 ttl=64 time=246 ms
64 bytes from 192.168.19.1: icmp_seq=709 ttl=64 time=222 ms
64 bytes from 192.168.19.1: icmp_seq=710 ttl=64 time=198 ms
64 bytes from 192.168.19.1: icmp_seq=711 ttl=64 time=174 ms
64 bytes from 192.168.19.1: icmp_seq=712 ttl=64 time=150 ms
64 bytes from 192.168.19.1: icmp_seq=713 ttl=64 time=126 ms
64 bytes from 192.168.19.1: icmp_seq=714 ttl=64 time=102 ms
64 bytes from 192.168.19.1: icmp_seq=715 ttl=64 time=78.2 ms
64 bytes from 192.168.19.1: icmp_seq=716 ttl=64 time=54.1 ms
64 bytes from 192.168.19.1: icmp_seq=717 ttl=64 time=100 ms
64 bytes from 192.168.19.1: icmp_seq=718 ttl=64 time=76.0 ms
64 bytes from 192.168.19.1: icmp_seq=719 ttl=64 time=55.4 ms
64 bytes from 192.168.19.1: icmp_seq=720 ttl=64 time=32.0 ms
64 bytes from 192.168.19.1: icmp_seq=721 ttl=64 time=7.77 ms
64 bytes from 192.168.19.1: icmp_seq=722 ttl=64 time=0.957 ms
64 bytes from 192.168.19.1: icmp_seq=723 ttl=64 time=1.01 ms
64 bytes from 192.168.19.1: icmp_seq=724 ttl=64 time=42.2 ms
64 bytes from 192.168.19.1: icmp_seq=725 ttl=64 time=18.3 ms
64 bytes from 192.168.19.1: icmp_seq=726 ttl=64 time=0.921 ms
64 bytes from 192.168.19.1: icmp_seq=727 ttl=64 time=1.03 ms
64 bytes from 192.168.19.1: icmp_seq=728 ttl=64 time=876 ms
64 bytes from 192.168.19.1: icmp_seq=729 ttl=64 time=852 ms
64 bytes from 192.168.19.1: icmp_seq=730 ttl=64 time=824 ms
64 bytes from 192.168.19.1: icmp_seq=731 ttl=64 time=800 ms
64 bytes from 192.168.19.1: icmp_seq=732 ttl=64 time=772 ms
```

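Ping logs like the ones above are easier to compare between kernels when reduced to a few numbers. Below is a small sketch that extracts the `time=` values and counts slow replies. The function names and the 10 ms threshold are arbitrary choices for illustration, not values from the original comment.

```python
import re

# Matches the round-trip time in a ping reply line, e.g. "time=0.985 ms".
PING_TIME = re.compile(r"time=([\d.]+) ms")


def ping_times_ms(ping_output):
    """Return all round-trip times (in ms) found in ping output."""
    return [float(t) for t in PING_TIME.findall(ping_output)]


def summarize(ping_output, slow_ms=10.0):
    """Summarize ping output; slow_ms is a hypothetical latency threshold."""
    times = ping_times_ms(ping_output)
    if not times:
        return None
    return {
        "count": len(times),
        "min": min(times),
        "max": max(times),
        "slow": sum(1 for t in times if t > slow_ms),
    }


if __name__ == "__main__":
    sample = (
        "64 bytes from 192.168.19.1: icmp_seq=702 ttl=64 time=1.02 ms\n"
        "64 bytes from 192.168.19.1: icmp_seq=703 ttl=64 time=366 ms\n"
        "64 bytes from 192.168.19.1: icmp_seq=704 ttl=64 time=342 ms\n"
    )
    print(summarize(sample))
```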
fhteagle commented 9 months ago

SSH unresponsive might be the hardware random number generator bug. I had to downgrade kernels on one of my Pis because it was closing due to broken pipe after 1-5 keystrokes or so. That was a PITA for sure.

7ERr0r commented 9 months ago

Well SSH is unresponsive only on WiFi. Pinging AP with 10ms interval fixes the lagging issue.

Bug report

AP freezes randomly - bug no. 6 ? Once when watching YouTube and doing speedtest.net I had ping -i 0.1 in the background

[Spoiler] Case 1: regular usage dmesg ```dmesg [ 9.373765] mt7921u 2-2:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a [ 9.632681] mt7921u 2-2:1.3: WM Firmware Version: ____010000, Build Time: 20230526130958 [ 16.535364] eth0: renamed from vethcf0cfbe [ 4393.743478] mt7921u 2-2:1.3: rx urb failed: -75 [ 4393.846463] mt7921u 2-2:1.3: vendor request req:63 off:8178 failed:-71 [ 4393.952872] mt7921u 2-2:1.3: vendor request req:63 off:d02c failed:-71 [ 4394.058569] mt7921u 2-2:1.3: vendor request req:63 off:d054 failed:-71 [ 4394.164201] mt7921u 2-2:1.3: vendor request req:63 off:d058 failed:-71 ... [ 4398.980800] mt7921u 2-2:1.3: vendor request req:63 off:53b8 failed:-71 [ 4399.068815] mt7921u 2-2:1.3: vendor request req:63 off:53c4 failed:-71 [ 4402.204836] mt7921u 2-2:1.3: Message 00020002 (seq 15) timeout [ 4402.311823] mt7921u 2-2:1.3: vendor request req:63 off:d02c failed:-71 [ 4402.419033] mt7921u 2-2:1.3: vendor request req:63 off:d054 failed:-71 ... [ 4681.869279] mt7921u 2-2:1.3: vendor request req:01 off:0a20 failed:-110 [ 4685.069094] mt7921u 2-2:1.3: vendor request req:04 off:0001 failed:-110 [ 4685.069154] mt7921u 2-2:1.3: chip reset failed [ 4688.163096] mt7921u 2-2:1.3: Message 00020001 (seq 1) timeout [ 4690.212026] ------------[ cut here ]------------ [ 4690.212050] WARNING: CPU: 0 PID: 12235 at net/wireless/util.c:1457 cfg80211_calculate_bitrate_he+0x25c/0x2c0 [cfg80211] [ 4690.212333] Modules linked in: tun xt_nat xt_tcpudp veth nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge nft_chain_nat xt_MASQUERADE nf_nat xt_state xt_conntrack nf_conntrack nf_defra g_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink cmac algif_hash aes_arm64 aes_generic algif_skcipher af_alg bnep overlay 8021q garp stp llc mt7921u mt7921_common mt792x_usb mt792x_lib mt76_connac_lib mt76_u sb mt76 mac80211 btusb btrtl libarc4 btintel brcmfmac_wcc vc4 snd_soc_hdmi_codec drm_display_helper brcmfmac cec hci_uart drm_dma_helper brcmutil 
drm_kms_helper bcm2835_isp(C) bcm2835_codec(C) bcm2835_v4l2(C) rp ivid_hevc(C) btbcm bcm2835_mmal_vchiq(C) bluetooth v4l2_mem2mem snd_soc_core videobuf2_vmalloc videobuf2_dma_contig cfg80211 videobuf2_memops videobuf2_v4l2 v3d snd_compress videodev gpu_sched snd_pcm_dmaengine snd_bcm2835(C) drm_shmem_helper raspberrypi_hwmon ecdh_generic ecc i2c_brcmstb snd_pcm rfkill libaes snd_timer vc_sm_cma(C) videobuf2_common snd mc raspberrypi_gpiomem uio_pdrv_genirq uio [ 4690.212746] nvmem_rmem drm fuse drm_panel_orientation_quirks backlight ip_tables x_tables ipv6 [ 4690.212792] CPU: 0 PID: 12235 Comm: hostapd Tainted: G C 6.7.0-rc2-v8-rpi-6.7.y-48e386b+ #1 [ 4690.212807] Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT) [ 4690.212814] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 4690.212827] pc : cfg80211_calculate_bitrate_he+0x25c/0x2c0 [cfg80211] [ 4690.213011] lr : cfg80211_calculate_bitrate+0x230/0x648 [cfg80211] [ 4690.213188] sp : ffffffc081f33540 [ 4690.213195] x29: ffffffc081f33540 x28: ffffff8100813e00 x27: ffffff810001f080 [ 4690.213218] x26: ffffff810a370000 x25: ffffff810a3703c0 x24: ffffff81082fd000 [ 4690.213239] x23: 00000000000000d0 x22: ffffff81082fd0d0 x21: 0000000000000000 [ 4690.213259] x20: ffffffc081f33780 x19: ffffff8109f8a100 x18: 0cea122a0dac8927 [ 4690.213280] x17: 000000000b9f76c0 x16: 0675091506d65a47 x15: 0000000005cfbb60 [ 4690.213300] x14: 0240717102625a00 x13: 000000000206cc80 x12: 010f4471011f2ba0 [ 4690.213320] x11: 0000000000f42400 x10: 0087a238008f9a27 x9 : 000063ff00008556 [ 4690.213341] x8 : 0000c80100018fff x7 : 000027ff00002c71 x6 : 00003201000042ab [ 4690.213361] x5 : 0000000000000003 x4 : 000000000000000b x3 : 0000180000001aac [ 4690.213380] x2 : 00001e0000002154 x1 : 000014000000163a x0 : ffffffc081f33780 [ 4690.213401] Call trace: [ 4690.213407] cfg80211_calculate_bitrate_he+0x25c/0x2c0 [cfg80211] [ 4690.213586] cfg80211_calculate_bitrate+0x230/0x648 [cfg80211] [ 4690.213763] nl80211_put_sta_rate+0x60/0x468 
[cfg80211] [ 4690.213941] nl80211_send_station+0x73c/0xc98 [cfg80211] [ 4690.214117] nl80211_get_station+0xec/0x168 [cfg80211] [ 4690.214294] genl_family_rcv_msg_doit+0xc8/0x138 [ 4690.214321] genl_rcv_msg+0x1ec/0x280 [ 4690.214330] netlink_rcv_skb+0x64/0x150 [ 4690.214345] genl_rcv+0x40/0x60 [ 4690.214361] netlink_unicast+0x27c/0x350 [ 4690.214376] netlink_sendmsg+0x1cc/0x448 [ 4690.214391] __sock_sendmsg+0x64/0xc0 [ 4690.214409] ____sys_sendmsg+0x268/0x2a8 [ 4690.214418] ___sys_sendmsg+0x88/0xf0 [ 4690.214429] __sys_sendmsg+0x70/0xd8 [ 4690.214439] __arm64_sys_sendmsg+0x2c/0x40 [ 4690.214450] invoke_syscall+0x50/0x128 [ 4690.214469] el0_svc_common.constprop.0+0x48/0xf0 [ 4690.214485] do_el0_svc+0x24/0x38 [ 4690.214499] el0_svc+0x3c/0xd0 [ 4690.214514] el0t_64_sync_handler+0xc0/0xc8 [ 4690.214527] el0t_64_sync+0x190/0x198 [ 4690.214538] ---[ end trace 0000000000000000 ]--- [ 4693.487168] mt7921u 2-2:1.3: timed out waiting for pending tx [ 4693.555458] ------------[ cut here ]------------ [ 4693.555474] WARNING: CPU: 0 PID: 11 at kernel/kthread.c:659 kthread_park+0xc4/0xe0 [ 4693.555511] Modules linked in: tun xt_nat xt_tcpudp veth nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge nft_chain_nat xt_MASQUERADE nf_nat xt_state xt_conntrack nf_conntrack nf_defra g_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink cmac algif_hash aes_arm64 aes_generic algif_skcipher af_alg bnep overlay 8021q garp stp llc mt7921u mt7921_common mt792x_usb mt792x_lib mt76_connac_lib mt76_u sb mt76 mac80211 btusb btrtl libarc4 btintel brcmfmac_wcc vc4 snd_soc_hdmi_codec drm_display_helper brcmfmac cec hci_uart drm_dma_helper brcmutil drm_kms_helper bcm2835_isp(C) bcm2835_codec(C) bcm2835_v4l2(C) rp ivid_hevc(C) btbcm bcm2835_mmal_vchiq(C) bluetooth v4l2_mem2mem snd_soc_core videobuf2_vmalloc videobuf2_dma_contig cfg80211 videobuf2_memops videobuf2_v4l2 v3d snd_compress videodev gpu_sched snd_pcm_dmaengine snd_bcm2835(C) drm_shmem_helper raspberrypi_hwmon ecdh_generic 
ecc i2c_brcmstb snd_pcm rfkill libaes snd_timer vc_sm_cma(C) videobuf2_common snd mc raspberrypi_gpiomem uio_pdrv_genirq uio [ 4693.555940] nvmem_rmem drm fuse drm_panel_orientation_quirks backlight ip_tables x_tables ipv6 [ 4693.555987] CPU: 0 PID: 11 Comm: kworker/u8:0 Tainted: G WC 6.7.0-rc2-v8-rpi-6.7.y-48e386b+ #1 [ 4693.556001] Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT) [ 4693.556010] Workqueue: mt76 mt7921_mac_reset_work [mt7921_common] [ 4693.556059] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 4693.556072] pc : kthread_park+0xc4/0xe0 [ 4693.556088] lr : mt76u_stop_tx+0x26c/0x378 [mt76_usb] [ 4693.556125] sp : ffffffc08007bc40 [ 4693.556132] x29: ffffffc08007bc40 x28: ffffffd5a7a369c0 x27: ffffff810a379c78 [ 4693.556155] x26: ffffff810a371fb8 x25: 0000000000000000 x24: ffffff810a374760 [ 4693.556177] x23: ffffff810a377360 x22: ffffff810a371fb8 x21: ffffff810a371f98 [ 4693.556197] x20: ffffff8109c13e00 x19: ffffff810aa23e00 x18: 0000000000000000 [ 4693.556218] x17: 0000000000000001 x16: ffffffd5a64bd280 x15: 001b4e8187841c08 [ 4693.556239] x14: 001b4e73d450541c x13: 000000000000038b x12: 00000000fa83b2da [ 4693.556260] x11: 0000000000000002 x10: 0000000000001a90 x9 : ffffffd57111d9f4 [ 4693.556282] x8 : ffffff81002458f0 x7 : 0000000000000001 x6 : 0000000000000000 [ 4693.556302] x5 : ffffffd5a7a3e000 x4 : ffffffd5a7a3e0a8 x3 : 0000000000002800 [ 4693.556322] x2 : 0000000000000000 x1 : 0000000000001fe0 x0 : 0000000000000004 [ 4693.556343] Call trace: [ 4693.556349] kthread_park+0xc4/0xe0 [ 4693.556365] mt76u_stop_tx+0x26c/0x378 [mt76_usb] [ 4693.556391] mt7921u_mac_reset+0x8c/0x298 [mt7921u] [ 4693.556413] mt7921_mac_reset_work+0xac/0x1b8 [mt7921_common] [ 4693.556443] process_one_work+0x148/0x390 [ 4693.556455] worker_thread+0x338/0x450 [ 4693.556466] kthread+0x120/0x130 [ 4693.556481] ret_from_fork+0x10/0x20 [ 4693.556496] ---[ end trace 0000000000000000 ]--- [ 4696.749630] mt7921u 2-2:1.3: vendor request req:01 off:1890 
failed:-110 ``` Last moments before death: ``` ubuntu@hp144:~$ ping 192.168.19.1 -i 0.01 64 bytes from 192.168.19.1: icmp_seq=21310 ttl=64 time=1.74 ms 64 bytes from 192.168.19.1: icmp_seq=21311 ttl=64 time=1.68 ms 64 bytes from 192.168.19.1: icmp_seq=21312 ttl=64 time=1.99 ms 64 bytes from 192.168.19.1: icmp_seq=21313 ttl=64 time=2.25 ms # now silence for few seconds From 192.168.19.55 icmp_seq=21608 Destination Host Unreachable From 192.168.19.55 icmp_seq=21609 Destination Host Unreachable From 192.168.19.55 icmp_seq=21610 Destination Host Unreachable ```
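The negative numbers in the dmesg lines above are kernel errno values. The sketch below tallies them by name. The mapping lists only the three codes that appear in this log, using their standard Linux values; the function name is hypothetical.

```python
import re
from collections import Counter

# Linux errno values for the codes seen in the log above.
LINUX_ERRNO = {71: "EPROTO", 75: "EOVERFLOW", 110: "ETIMEDOUT"}

# Matches both "failed:-71" and "failed: -75".
FAILED = re.compile(r"failed:\s*-(\d+)")


def count_failures(dmesg_text):
    """Count failure codes in dmesg text, keyed by errno name."""
    counts = Counter()
    for code in FAILED.findall(dmesg_text):
        counts[LINUX_ERRNO.get(int(code), f"errno {code}")] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "[ 4393.743478] mt7921u 2-2:1.3: rx urb failed: -75\n"
        "[ 4393.846463] mt7921u 2-2:1.3: vendor request req:63 off:8178 failed:-71\n"
        "[ 4681.869279] mt7921u 2-2:1.3: vendor request req:01 off:0a20 failed:-110\n"
    )
    for name, n in count_failures(sample).items():
        print(name, n)
```

In this log the sequence runs from an rx overflow (-75), through a burst of protocol errors (-71), to timeouts (-110) once the chip reset fails.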

Second time when speedtesting on a phone + pinging 5ms from laptop

[Spoiler] Case 2: doing speedtest.net on Fairphone 4 (WiFi 5) dmesg ``` [ 22.558865] tun: Universal TUN/TAP device driver, 1.6 [ 579.542233] mt7921u 2-2:1.3: Message 00020002 (seq 7) timeout [ 579.814259] mt7921u 2-2:1.3: timed out waiting for pending tx [ 579.879573] ------------[ cut here ]------------ [ 579.879583] WARNING: CPU: 2 PID: 167 at kernel/kthread.c:659 kthread_park+0xc4/0xe0 [ 579.879601] Modules linked in: tun xt_nat xt_tcpudp veth nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge nft_chain_nat xt_MASQUERADE nf_nat xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink cmac algif_hash aes_arm64 aes_generic algif_skcipher af_alg bnep overlay 8021q garp stp llc mt7921u mt7921_common mt792x_usb mt792x_lib mt76_connac_lib mt76_usb mt76 mac80211 vc4 snd_soc_hdmi_codec brcmfmac_wcc drm_display_helper btusb btrtl libarc4 btintel cec brcmfmac hci_uart btbcm brcmutil bluetooth drm_dma_helper cfg80211 drm_kms_helper ecdh_generic bcm2835_codec(C) bcm2835_v4l2(C) snd_soc_core v3d rpivid_hevc(C) bcm2835_isp(C) bcm2835_mmal_vchiq(C) v4l2_mem2mem ecc gpu_sched rfkill videobuf2_vmalloc videobuf2_dma_contig libaes videobuf2_memops drm_shmem_helper videobuf2_v4l2 snd_compress raspberrypi_hwmon videodev i2c_brcmstb snd_bcm2835(C) snd_pcm_dmaengine snd_pcm videobuf2_common vc_sm_cma(C) mc snd_timer snd raspberrypi_gpiomem uio_pdrv_genirq uio [ 579.879779] nvmem_rmem drm fuse drm_panel_orientation_quirks backlight ip_tables x_tables ipv6 [ 579.879798] CPU: 2 PID: 167 Comm: kworker/u8:1 Tainted: G C 6.7.0-rc2-v8-rpi-6.7.y-48e386b+ #1 [ 579.879804] Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT) [ 579.879808] Workqueue: mt76 mt7921_mac_reset_work [mt7921_common] [ 579.879830] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 579.879835] pc : kthread_park+0xc4/0xe0 [ 579.879841] lr : mt76u_stop_tx+0x26c/0x378 [mt76_usb] [ 579.879858] sp : ffffffc080a63c40 [ 579.879861] x29: ffffffc080a63c40 
x28: ffffffe63b4369c0 x27: ffffff810a389c78 [ 579.879869] x26: ffffff810a381fb8 x25: 0000000000000000 x24: ffffff810a384760 [ 579.879878] x23: ffffff810a387360 x22: ffffff810a381fb8 x21: ffffff810a381f98 [ 579.879885] x20: ffffff8109938480 x19: ffffff810a3d3e00 x18: 0000000000000000 [ 579.879893] x17: 0000000000000001 x16: ffffffe639ebd280 x15: 0012d49a4a9670ea [ 579.879901] x14: 0013c5a12c946b56 x13: 00000000000003d4 x12: 00000000fa83b2da [ 579.879909] x11: 0000000000000002 x10: 0000000000001a90 x9 : ffffffe5c49949f4 [ 579.879917] x8 : ffffff81015e77f0 x7 : 0000000000000001 x6 : 0000000000000000 [ 579.879925] x5 : ffffffe63b43e000 x4 : ffffffe63b43e0a8 x3 : 0000000000002800 [ 579.879932] x2 : 0000000000000000 x1 : 0000000000001fe0 x0 : 0000000000000004 [ 579.879940] Call trace: [ 579.879943] kthread_park+0xc4/0xe0 [ 579.879949] mt76u_stop_tx+0x26c/0x378 [mt76_usb] [ 579.879960] mt7921u_mac_reset+0x8c/0x298 [mt7921u] [ 579.879969] mt7921_mac_reset_work+0xac/0x1b8 [mt7921_common] [ 579.879981] process_one_work+0x148/0x390 [ 579.879986] worker_thread+0x338/0x450 [ 579.879990] kthread+0x120/0x130 [ 579.879996] ret_from_fork+0x10/0x20 [ 579.880003] ---[ end trace 0000000000000000 ]--- [ 580.014391] mt7921u 2-2:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a [ 580.024801] mt7921u 2-2:1.3: WM Firmware Version: ____010000, Build Time: 20230526130958 ``` Exact moment of death (pinging from WiFi 6 laptop): ``` ubuntu@hp144:~$ ping 192.168.19.1 -i 0.005 ... 
64 bytes from 192.168.19.1: icmp_seq=53191 ttl=64 time=21.7 ms 64 bytes from 192.168.19.1: icmp_seq=53192 ttl=64 time=11.5 ms 64 bytes from 192.168.19.1: icmp_seq=53193 ttl=64 time=20.9 ms 64 bytes from 192.168.19.1: icmp_seq=53194 ttl=64 time=159 ms 64 bytes from 192.168.19.1: icmp_seq=53195 ttl=64 time=149 ms 64 bytes from 192.168.19.1: icmp_seq=53196 ttl=64 time=201 ms 64 bytes from 192.168.19.1: icmp_seq=53197 ttl=64 time=191 ms 64 bytes from 192.168.19.1: icmp_seq=53198 ttl=64 time=180 ms 64 bytes from 192.168.19.1: icmp_seq=53199 ttl=64 time=170 ms 64 bytes from 192.168.19.1: icmp_seq=53200 ttl=64 time=63.9 ms 64 bytes from 192.168.19.1: icmp_seq=53201 ttl=64 time=51.9 ms 64 bytes from 192.168.19.1: icmp_seq=53202 ttl=64 time=40.8 ms 64 bytes from 192.168.19.1: icmp_seq=53203 ttl=64 time=21.6 ms 64 bytes from 192.168.19.1: icmp_seq=53204 ttl=64 time=11.5 ms 64 bytes from 192.168.19.1: icmp_seq=53205 ttl=64 time=1.25 ms 64 bytes from 192.168.19.1: icmp_seq=53206 ttl=64 time=37.0 ms 64 bytes from 192.168.19.1: icmp_seq=53207 ttl=64 time=27.1 ms From 192.168.19.55 icmp_seq=54032 Destination Host Unreachable From 192.168.19.55 icmp_seq=54033 Destination Host Unreachable From 192.168.19.55 icmp_seq=54034 Destination Host Unreachable From 192.168.19.55 icmp_seq=54035 Destination Host Unreachable From 192.168.19.55 icmp_seq=54036 Destination Host Unreachable From 192.168.19.55 icmp_seq=54037 Destination Host Unreachable From 192.168.19.55 icmp_seq=54038 Destination Host Unreachable ```

> Can someone help to check with this patch?

I've applied patch from deren on 6.7 kernel. Outcome TBD

whitslack commented 9 months ago

Possibly related to the unresponsiveness issue, I have an MT7612U and an MT7921AU both connected to the same x86 Linux system serving as an access point and router. The two radios are on non-overlapping channels in the 2.4-GHz band. When I return home from being out, my phone always sees the beacons from the 7612u but very often does not see the beacons from the 7921u unless I wait through many scanning cycles or repeatedly toggle off and on my phone's Wi-Fi radio to force rescanning. Both interfaces on the AP have the same beacon interval and DTIM settings. I have to wonder if maybe there's a problem with the packet scheduling or transmit multi-queueing in the mt7921u driver. That could also explain the inconsistent responsiveness and the tx timeout errors.

7ERr0r commented 9 months ago

After patching rpi-6.7.y kernel using deren's test_bit the AP suicides sometimes for about 60 seconds and recovers. https://github.com/morrownr/USB-WiFi/issues/107#issuecomment-1809932650

[Spoiler] Timeout logs

Timeout 1
===
I can fairly consistently reproduce the 60-second timeout by doing the following in parallel:
- ping with 10 ms interval on WiFi 6 laptop
- Speedtest.net on WiFi 5 phone

dmesg is empty now.
```
[ 8.791785] mt7921u 2-2:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a
[ 9.054080] mt7921u 2-2:1.3: WM Firmware Version: ____010000, Build Time: 20230526130958
...
[ 15.921062] eth0: renamed from veth9c00c50
```

Timeout 2
===
Edit: Yet another test and this time `dmesg` shows:
```
[42815.874817] mt7921u 2-2:1.3: Message 00020002 (seq 4) timeout
[42816.158790] mt7921u 2-2:1.3: timed out waiting for pending tx
[42816.360668] mt7921u 2-2:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a
[42816.373431] mt7921u 2-2:1.3: WM Firmware Version: ____010000, Build Time: 20230526130958
```
For some reason the AP responded after 44 seconds and then died.
```
$ ping 192.168.19.1 -i 0.01
64 bytes from 192.168.19.1: icmp_seq=34581 ttl=64 time=53.0 ms
64 bytes from 192.168.19.1: icmp_seq=34582 ttl=64 time=37.0 ms
64 bytes from 192.168.19.1: icmp_seq=34583 ttl=64 time=44354 ms
64 bytes from 192.168.19.1: icmp_seq=34584 ttl=64 time=44339 ms
(redacted)
64 bytes from 192.168.19.1: icmp_seq=34831 ttl=64 time=39853 ms
From 192.168.19.55 icmp_seq=37021 Destination Host Unreachable
From 192.168.19.55 icmp_seq=37022 Destination Host Unreachable
```
The hostapd process is still up after the AP dies, but speedtest.net shows 135 Mbit/s (half the previous speed). Channel width is still 80 MHz in Wifi Analyzer. After restarting hostapd, throughput in speedtest.net jumps to 285 Mbit/s.

Timeout 3
===
I had set `echo Y > /sys/module/mt76_usb/parameters/disable_usb_sg`

No dmesg logs this time...
```
$ ping 192.168.19.1 -i 0.01
64 bytes from 192.168.19.1: icmp_seq=45197 ttl=64 time=28.4 ms
64 bytes from 192.168.19.1: icmp_seq=45198 ttl=64 time=12.4 ms
64 bytes from 192.168.19.1: icmp_seq=45199 ttl=64 time=61060 ms
64 bytes from 192.168.19.1: icmp_seq=45200 ttl=64 time=61040 ms
(redacted)
64 bytes from 192.168.19.1: icmp_seq=49492 ttl=64 time=0.867 ms
64 bytes from 192.168.19.1: icmp_seq=49493 ttl=64 time=0.814 ms
64 bytes from 192.168.19.1: icmp_seq=49494 ttl=64 time=0.817 ms
64 bytes from 192.168.19.1: icmp_seq=49495 ttl=64 time=0.834 ms
```
And the AP works at full speed after the timeout.
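The ping transcripts above show each stall as a run of replies with very large round-trip times. For anyone collecting similar captures, here is a minimal Python sketch that pulls those replies out of saved `ping` output. It assumes the reply format shown above; the name `find_stalls` and the 1000 ms threshold are illustrative choices, not anything from this thread.

```python
import re

# Matches reply lines in the format shown above, e.g.
#   64 bytes from 192.168.19.1: icmp_seq=34583 ttl=64 time=44354 ms
REPLY_RE = re.compile(r"icmp_seq=(\d+) .*time=([\d.]+) ms")


def find_stalls(lines, threshold_ms=1000.0):
    """Return (icmp_seq, rtt_ms) for every reply slower than threshold_ms.

    threshold_ms is an arbitrary illustrative cutoff (assumption).
    """
    stalls = []
    for line in lines:
        match = REPLY_RE.search(line)
        if match and float(match.group(2)) > threshold_ms:
            stalls.append((int(match.group(1)), float(match.group(2))))
    return stalls


sample = [
    "64 bytes from 192.168.19.1: icmp_seq=34582 ttl=64 time=37.0 ms",
    "64 bytes from 192.168.19.1: icmp_seq=34583 ttl=64 time=44354 ms",
    "64 bytes from 192.168.19.1: icmp_seq=34584 ttl=64 time=44339 ms",
    "From 192.168.19.55 icmp_seq=37021 Destination Host Unreachable",
]
stalls = find_stalls(sample)
print(stalls)
```

Lines without a `time=` field, such as the "Destination Host Unreachable" errors, are ignored, so the result only lists replies that eventually arrived late.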

tcpdump -B 4096 -i usbmon2 -w usb-cf-951ax.pcap

Looks like it's a driver issue, since the device still responds with USB packets.

In the attached image, the first bump is the download from speedtest.net. The second one is the upload, which didn't finish (WiFi timeout).

fayaaz commented 9 months ago

I accidentally used a cat5 cable which capped the speedtest at 100Mbps - now I don't get the ap crashing. This isn't a good solution though.

7ERr0r commented 9 months ago

> capped the speedtest at 100Mbps - now I don't get the ap crashing.

mt7921u still crashes after rate-limiting the connection.

tc qdisc add dev wlan1 root tbf rate 140Mbit latency 50ms burst 1540
tc qdisc add dev eth0 root tbf rate 30Mbit latency 50ms burst 1540
[Spoiler] dmesg logs
```
[Nov26 15:32] mt7921u 2-2:1.3: Message 00020002 (seq 4) timeout
[ +0.283973] mt7921u 2-2:1.3: timed out waiting for pending tx
[ +0.201878] mt7921u 2-2:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a
[ +0.012763] mt7921u 2-2:1.3: WM Firmware Version: ____010000, Build Time: 20230526130958
[Nov26 15:49] usb 2-2: USB disconnect, device number 2
[ +5.108904] usb 2-2: new SuperSpeed USB device number 3 using xhci_hcd
[ +0.021600] usb 2-2: New USB device found, idVendor=0e8d, idProduct=7961, bcdDevice= 1.00
[ +0.000016] usb 2-2: New USB device strings: Mfr=6, Product=7, SerialNumber=8
[ +0.000006] usb 2-2: Product: Wireless_Device
[ +0.000005] usb 2-2: Manufacturer: MediaTek Inc.
[ +0.000004] usb 2-2: SerialNumber: 000000000
[ +0.018979] Bluetooth: hci1: urb 000000005f849d58 failed to resubmit (2)
[ +2.010546] Bluetooth: hci1: Opcode 0x0c03 failed: -110
[ +0.129675] usb 2-2: reset SuperSpeed USB device number 3 using xhci_hcd
[ +0.056113] mt7921u 2-2:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a
[ +0.261094] mt7921u 2-2:1.3: WM Firmware Version: ____010000, Build Time: 20230526130958
[ +1.729112] Bluetooth: hci1: Opcode 0x0c03 failed: -110
[ +2.016042] Bluetooth: hci1: Opcode 0x0c03 failed: -110
[Nov29 17:16] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 4
[ +0.000039] xhci_hcd 0000:01:00.0: Looking for event-dma 000000040f016f80 trb-start 000000040f016fb0 trb-end 000000040f016fd0 seg-start 000000040f016000 seg-end 000000040f016ff0
[ +0.140069] usb 2-2: USB disconnect, device number 3
[ +0.319929] usb 1-1.2: new high-speed USB device number 3 using xhci_hcd
[ +0.101766] usb 1-1.2: New USB device found, idVendor=0e8d, idProduct=7961, bcdDevice= 1.00
[ +0.000015] usb 1-1.2: New USB device strings: Mfr=6, Product=7, SerialNumber=8
[ +0.000006] usb 1-1.2: Product: Wireless_Device
[ +0.000005] usb 1-1.2: Manufacturer: MediaTek Inc.
[ +0.000004] usb 1-1.2: SerialNumber: 000000000
[ +2.022242] Bluetooth: hci1: Opcode 0x0c03 failed: -110
[ +0.080620] usb 1-1.2: reset high-speed USB device number 3 using xhci_hcd
[ +0.565780] mt7921u 1-1.2:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a
[ +0.012839] mt7921u 1-1.2:1.3: WM Firmware Version: ____010000, Build Time: 20230526130958
[ +1.548824] Bluetooth: hci1: Opcode 0x0c03 failed: -110
[ +2.015994] Bluetooth: hci1: Opcode 0x0c03 failed: -110
Also got timeout today (3 Dec) but there is nothing in dmesg...
```
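Several reports in this thread share the same signature: a `Message ... (seq N) timeout` line, then `timed out waiting for pending tx`, then the firmware version lines being printed again. A rough Python sketch for counting those events in a saved dmesg dump follows. The function name `find_message_timeouts` is made up for illustration, and the pattern is based only on the log lines quoted in this thread.

```python
import re

# Signature line as it appears in the logs above, e.g.
#   mt7921u 2-2:1.3: Message 00020002 (seq 4) timeout
TIMEOUT_RE = re.compile(
    r"mt7921u (\S+): Message ([0-9a-fA-F]{8}) \(seq (\d+)\) timeout"
)


def find_message_timeouts(dmesg_text):
    """Return (usb_path, message_id, seq) for each 'Message ... timeout' line."""
    return [
        (m.group(1), m.group(2), int(m.group(3)))
        for m in TIMEOUT_RE.finditer(dmesg_text)
    ]


sample = """\
[Nov26 15:32] mt7921u 2-2:1.3: Message 00020002 (seq 4) timeout
[ +0.283973] mt7921u 2-2:1.3: timed out waiting for pending tx
[ +0.201878] mt7921u 2-2:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a
"""
events = find_message_timeouts(sample)
print(events)
```

Running it over a full `dmesg` capture gives a quick count of how often the adapter hit this path, which may be easier to compare across kernels and firmware versions than eyeballing the raw log.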
MEL1H commented 9 months ago

Regarding bug 6, the issue still occurs with kernel 6.6. Whenever I run a speedtest, the result is as below.

Linux md-ap 6.6.3-v8+ #1 SMP PREEMPT Fri Dec 1 16:56:43 UTC 2023 aarch64 GNU/Linux

[ 9.446279] usbcore: registered new interface driver mt7921u
[ 9.454161] mt7921u 2-2:1.3: HW/SW Version: 0x8a108a10, Build Time: 20231109190918a
[ 9.711262] mt7921u 2-2:1.3: WM Firmware Version: __010000, Build Time: 20231109190959
[ 573.413483] mt7921u 2-2:1.3: Message 00020003 (seq 12) timeout
[ 573.701501] mt7921u 2-2:1.3: timed out waiting for pending tx
[ 573.791465] Modules linked in: cmac ctr aes_arm64 aes_generic ccm bnep mt7921u mt7921_common mt792x_lib mt76_connac_lib mt792x_usb exfat 8021q mt76_usb garp stp llc mt76 mac80211 nft_chain_nat xt_MASQUERADE nf_nat xt_state btusb xt_mark btrtl xt_comment btintel xt_conntrack nf_conntrack libarc4 nf_defrag_ipv6 btbcm nf_defrag_ipv4 cfg80211 nft_compat bluetooth vc4 nf_tables nfnetlink ecdh_generic ecc rfkill libaes snd_soc_hdmi_codec drm_display_helper cec drm_dma_helper v3d drm_kms_helper gpu_sched bcm2835_codec(C) bcm2835_v4l2(C) rpivid_hevc(C) drm_shmem_helper v4l2_mem2mem bcm2835_isp(C) bcm2835_mmal_vchiq(C) videobuf2_vmalloc videobuf2_dma_contig videobuf2_memops snd_soc_core videobuf2_v4l2 i2c_brcmstb raspberrypi_hwmon videodev snd_bcm2835(C) snd_compress snd_pcm_dmaengine snd_pcm videobuf2_common raspberrypi_gpiomem mc vc_sm_cma(C) snd_timer snd uio_pdrv_genirq uio nvmem_rmem sg i2c_dev drm fuse drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[ 573.791915] Workqueue: mt76 mt7921_mac_reset_work [mt7921_common]
[ 573.792113] x17: 0000000000000001 x16: ffffffdde2cbcdb0 x15: 0011c40a3066b11a
[ 573.792134] x14: 0014355117b7404a x13: ffffffdde392ca08 x12: 00000000fa83b2da
[ 573.792155] x11: 0000000000000002 x10: 0000000000001a90 x9 : ffffffdd80422994
[ 573.792175] x8 : ffffffc08007bae8 x7 : 0000000000000000 x6 : ffffffdde438edb8
[ 573.792194] x5 : ffffffdde421e000 x4 : ffffffdde421e118 x3 : 0000000000002800
[ 573.792282] mt7921u_mac_reset+0x8c/0x298 [mt7921u]
[ 573.792303] mt7921_mac_reset_work+0xac/0x1b8 [mt7921_common]
[ 573.930816] mt7921u 2-2:1.3: HW/SW Version: 0x8a108a10, Build Time: 20231109190918a
[ 573.942768] mt7921u 2-2:1.3: WM Firmware Version: __010000, Build Time: 20231109190959

7ERr0r commented 9 months ago

The timeout didn't fix itself after 10 days :( hostapd only works again after physically reconnecting the USB adapter.

Hostapd SIGSEGV
```console
root@pi4irdm7:~# dmesg -H | tail
...
[ +0.080620] usb 1-1.2: reset high-speed USB device number 3 using xhci_hcd
[ +0.565780] mt7921u 1-1.2:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a
[ +0.012839] mt7921u 1-1.2:1.3: WM Firmware Version: ____010000, Build Time: 20230526130958
[ +1.548824] Bluetooth: hci1: Opcode 0x0c03 failed: -110
[ +2.015994] Bluetooth: hci1: Opcode 0x0c03 failed: -110
[Dec11 19:30] mt7921u 1-1.2:1.3: timed out waiting for pending tx
root@pi4irdm7:~# /root/hostapd -P /run/hostapd.pid /etc/hostapd/hostapd.conf
Could not set interface wlan1 flags (UP): Connection timed out
nl80211: Could not set interface 'wlan1' UP
nl80211: deinit ifname=wlan1 disabled_11b_rates=0
Segmentation fault
```
```console
(gdb) run
Starting program: /root/hostapd -P /run/hostapd.pid /etc/hostapd/hostapd.conf
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Could not set interface wlan1 flags (UP): Connection timed out
nl80211: Could not set interface 'wlan1' UP
nl80211: deinit ifname=wlan1 disabled_11b_rates=0

Program received signal SIGSEGV, Segmentation fault.
0x00000055555cb700 in nl80211_teardown_ap (bss=bss@entry=0x55556b89a0) at ../src/drivers/driver_nl80211.c:6118
6118 bss->flink->beacon_set = 0;
```
```c
static void nl80211_teardown_ap(struct i802_bss *bss)
{
	struct wpa_driver_nl80211_data *drv = bss->drv;

	wpa_printf(MSG_DEBUG, "nl80211: Teardown AP(%s) - device_ap_sme=%d use_monitor=%d",
		   bss->ifname, drv->device_ap_sme, drv->use_monitor);
	if (drv->device_ap_sme) {
		wpa_driver_nl80211_probe_req_report(bss, 0);
		if (!drv->use_monitor)
			nl80211_mgmt_unsubscribe(bss, "AP teardown (dev SME)");
	} else if (drv->use_monitor)
		nl80211_remove_monitor_interface(drv);
	else
		nl80211_mgmt_unsubscribe(bss, "AP teardown");

	nl80211_put_wiphy_data_ap(bss);
	bss->flink->beacon_set = 0; // here SIGSEGV
}
```
EasyNetDev commented 8 months ago

Hi,

I tried my Comfast USB WiFi dongle with the mt7921u driver on Linux kernel 6.6.7 on an Odroid XU4, and I got a crash in hostapd mode:

[  710.956327] mt7921u 4-1.2:1.3: Message 00020003 (seq 7) timeout
[  711.023061] ------------[ cut here ]------------
[  711.023193] WARNING: CPU: 2 PID: 0 at drivers/net/wireless/mediatek/mt76/usb.c:578 mt76u_complete_rx+0x1c8/0x1cc [mt76_usb]
[  711.023426] rx urb mismatch
[  711.023459] Modules linked in: aes_arm_bs crypto_simd cryptd bridge stp llc vrf ip_gre ip_tunnel gre algif_hash algif_skcipher af_alg bnep wireguard curve25519_neon libchacha20poly1305 chacha_neon poly1305_arm ip6_udp_tunnel udp_tunnel libcurve25519_generic sunrpc nft_masq nft_nat mt7921u mt792x_usb mt7921_common mt792x_lib mt76_connac_lib nft_chain_nat mt76_usb nf_nat mt76 mac80211 cfg80211 btusb btrtl btbcm btmtk btintel bluetooth ecdh_generic rfkill ecc onboard_usb_hub s5p_cec nft_ct evdev nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 zstd lz4hc lz4hc_compress lz4 lz4_compress lzo_rle zram zsmalloc binfmt_misc nf_tables nfnetlink loop fuse ip_tables ipv6 btrfs blake2b_neon blake2b_generic xor xor_neon lzo_compress zlib_deflate raid6_pq clk_exynos_clkout gpio_keys
[  711.025834] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.6.7 #4
[  711.025907] Hardware name: Samsung Exynos (Flattened Device Tree)
[  711.025995]  unwind_backtrace from show_stack+0x10/0x14
[  711.026135]  show_stack from dump_stack_lvl+0x40/0x4c
[  711.026289]  dump_stack_lvl from __warn+0x78/0x154
[  711.026456]  __warn from warn_slowpath_fmt+0x120/0x1b4
[  711.026592]  warn_slowpath_fmt from mt76u_complete_rx+0x1c8/0x1cc [mt76_usb]
[  711.026816]  mt76u_complete_rx [mt76_usb] from __usb_hcd_giveback_urb+0x64/0xf4
[  711.027003]  __usb_hcd_giveback_urb from usb_giveback_urb_bh+0x98/0x134
[  711.027131]  usb_giveback_urb_bh from tasklet_action_common+0xe0/0x38c
[  711.027277]  tasklet_action_common from __do_softirq+0x11c/0x3b4
[  711.027392]  __do_softirq from irq_exit+0x94/0xc0
[  711.027491]  irq_exit from call_with_stack+0x18/0x20
[  711.027616]  call_with_stack from __irq_svc+0x98/0xc8
[  711.027707] Exception stack(0xf08bdf50 to 0xf08bdf98)
[  711.027777] df40:                                     00000003 00000001 00000002 00000000
[  711.027851] df60: c19a2f80 c1213934 c1104014 c110405c 00000000 00000000 00000000 00000000
[  711.027918] df80: 00000017 f08bdfa0 c0b8c1ac c0b8cd9c 600c0013 ffffffff
[  711.027968]  __irq_svc from default_idle_call+0x50/0x110
[  711.028055]  default_idle_call from do_idle+0x208/0x290
[  711.028157]  do_idle from cpu_startup_entry+0x28/0x2c
[  711.028251]  cpu_startup_entry from secondary_start_kernel+0x17c/0x1f4
[  711.028355]  secondary_start_kernel from 0x40101700
[  711.028486] ---[ end trace 0000000000000000 ]---
[  711.266170] mt7921u 4-1.2:1.3: timed out waiting for pending tx
[  711.292215] ------------[ cut here ]------------
[  711.292258] WARNING: CPU: 1 PID: 127 at kernel/kthread.c:659 kthread_park+0x120/0x124
[  711.292349] Modules linked in: aes_arm_bs crypto_simd cryptd bridge stp llc vrf ip_gre ip_tunnel gre algif_hash algif_skcipher af_alg bnep wireguard curve25519_neon libchacha20poly1305 chacha_neon poly1305_arm ip6_udp_tunnel udp_tunnel libcurve25519_generic sunrpc nft_masq nft_nat mt7921u mt792x_usb mt7921_common mt792x_lib mt76_connac_lib nft_chain_nat mt76_usb nf_nat mt76 mac80211 cfg80211 btusb btrtl btbcm btmtk btintel bluetooth ecdh_generic rfkill ecc onboard_usb_hub s5p_cec nft_ct evdev nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 zstd lz4hc lz4hc_compress lz4 lz4_compress lzo_rle zram zsmalloc binfmt_misc nf_tables nfnetlink loop fuse ip_tables ipv6 btrfs blake2b_neon blake2b_generic xor xor_neon lzo_compress zlib_deflate raid6_pq clk_exynos_clkout gpio_keys
[  711.293174] CPU: 1 PID: 127 Comm: kworker/u16:1 Tainted: G        W          6.6.7 #4
[  711.293197] Hardware name: Samsung Exynos (Flattened Device Tree)
[  711.293217] Workqueue: mt76 mt7921_mac_reset_work [mt7921_common]
[  711.293316]  unwind_backtrace from show_stack+0x10/0x14
[  711.293360]  show_stack from dump_stack_lvl+0x40/0x4c
[  711.293412]  dump_stack_lvl from __warn+0x78/0x154
[  711.293461]  __warn from warn_slowpath_fmt+0x1ac/0x1b4
[  711.293487]  warn_slowpath_fmt from kthread_park+0x120/0x124
[  711.293517]  kthread_park from mt76u_stop_tx+0x234/0x298 [mt76_usb]
[  711.293570]  mt76u_stop_tx [mt76_usb] from mt7921u_mac_reset+0x74/0x1bc [mt7921u]
[  711.293651]  mt7921u_mac_reset [mt7921u] from mt7921_mac_reset_work+0x80/0x158 [mt7921_common]
[  711.293690]  mt7921_mac_reset_work [mt7921_common] from process_one_work+0x134/0x3e8
[  711.293725]  process_one_work from worker_thread+0x27c/0x4ac
[  711.293745]  worker_thread from kthread+0x110/0x12c
[  711.293766]  kthread from ret_from_fork+0x14/0x28
[  711.293786] Exception stack(0xf0de9fb0 to 0xf0de9ff8)
[  711.293804] 9fa0:                                     00000000 00000000 00000000 00000000
[  711.293822] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  711.293837] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  711.293853] ---[ end trace 0000000000000000 ]---
[  711.453751] mt7921u 4-1.2:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a

[  711.471210] mt7921u 4-1.2:1.3: WM Firmware Version: ____010000, Build Time: 20230526130958

Looks like this driver is unusable in hostapd mode. I got the same crashes on Raspberry Pi 3 and 2. I thought the power consumption might be too high for this dongle, but the Odroid XU4 has a 5V 4A supply and USB 3.0 ports.

For the moment I'm giving up trying to build a WiFi router with this dongle.

whitslack commented 8 months ago

> Looks like this driver is unusable in hostapd mode.

"Unusable" would be a bit of an exaggeration. My main home Wi-Fi network is serviced by an mt7921u running on Linux 6.5.13, and it goes days to weeks at a stretch between crashes, which do include the rx urb mismatch that you experienced, among other failure modes. Of course, an access point should never crash, so at best I would consider the driver to be "unstable," but that's not the same as "unusable."

@EasyNetDev: Are you using the latest firmware for your mt7921u? I don't know whether that impacts USB protocol correctness, but it's one thing to check.

EasyNetDev commented 8 months ago

I've noticed something strange now. I tried another approach: using an externally powered USB hub: https://spacer.ro/produs/hub-spacer-4-porturi1-quick-chage-sph-4usb30-1qc/

I plugged in the same USB dongle and was able to keep downloading / uploading for a longer time without crashing. I will use a USB analyzer to check how much power this device requires to operate in HostAP mode.

deren commented 8 months ago

Hi @fayaaz

In my test, the abnormal behavior disappeared after the patch was applied. Could you please help to verify the problem in your environment? Note: this is still under development at the moment and is not final yet.

diff --git a/drivers/net/wireless/mediatek/mt76/mt792x_usb.c b/drivers/net/wireless/mediatek/mt76/mt792x_usb.c
index 2dd283caed36..24b8a42a871e 100644
--- a/drivers/net/wireless/mediatek/mt76/mt792x_usb.c
+++ b/drivers/net/wireless/mediatek/mt76/mt792x_usb.c
@@ -121,44 +121,25 @@ static void mt792xu_uhw_wr(struct mt76_dev *dev, u32 addr, u32 val)

 static void mt792xu_dma_prefetch(struct mt792x_dev *dev)
 {
-   mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(0),
-        MT_WPDMA0_MAX_CNT_MASK, 4);
-   mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(0),
-        MT_WPDMA0_BASE_PTR_MASK, 0x80);
-
-   mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(1),
-        MT_WPDMA0_MAX_CNT_MASK, 4);
-   mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(1),
-        MT_WPDMA0_BASE_PTR_MASK, 0xc0);
-
-   mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(2),
-        MT_WPDMA0_MAX_CNT_MASK, 4);
-   mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(2),
-        MT_WPDMA0_BASE_PTR_MASK, 0x100);
-
-   mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(3),
-        MT_WPDMA0_MAX_CNT_MASK, 4);
-   mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(3),
-        MT_WPDMA0_BASE_PTR_MASK, 0x140);
-
-   mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(4),
-        MT_WPDMA0_MAX_CNT_MASK, 4);
-   mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(4),
-        MT_WPDMA0_BASE_PTR_MASK, 0x180);
-
-   mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(16),
-        MT_WPDMA0_MAX_CNT_MASK, 4);
-   mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(16),
-        MT_WPDMA0_BASE_PTR_MASK, 0x280);
-
-   mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(17),
-        MT_WPDMA0_MAX_CNT_MASK, 4);
-   mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(17),
-        MT_WPDMA0_BASE_PTR_MASK,  0x2c0);
+#define DMA_PREFETCH_CONF(_idx_, _cnt_, _base_) \
+   mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL((_idx_)), \
+        MT_WPDMA0_MAX_CNT_MASK | MT_WPDMA0_BASE_PTR_MASK, \
+        FIELD_PREP(MT_WPDMA0_MAX_CNT_MASK, (_cnt_)) | \
+        FIELD_PREP(MT_WPDMA0_BASE_PTR_MASK, (_base_)))
+
+   DMA_PREFETCH_CONF(0, 4, 0x080);
+   DMA_PREFETCH_CONF(1, 4, 0x0c0);
+   DMA_PREFETCH_CONF(2, 4, 0x100);
+   DMA_PREFETCH_CONF(3, 4, 0x140);
+   DMA_PREFETCH_CONF(4, 4, 0x180);
+   DMA_PREFETCH_CONF(16, 4, 0x280);
+   DMA_PREFETCH_CONF(17, 4, 0x2c0);
 }

 static void mt792xu_wfdma_init(struct mt792x_dev *dev)
 {
+   int i;
+
    mt792xu_dma_prefetch(dev);

    mt76_clear(dev, MT_UWFDMA0_GLO_CFG, MT_WFDMA0_GLO_CFG_OMIT_RX_INFO);
@@ -169,10 +150,27 @@ static void mt792xu_wfdma_init(struct mt792x_dev *dev)
         MT_WFDMA0_GLO_CFG_TX_DMA_EN |
         MT_WFDMA0_GLO_CFG_RX_DMA_EN);

-   /* disable dmashdl */
-   mt76_clear(dev, MT_UWFDMA0_GLO_CFG_EXT0,
-          MT_WFDMA0_CSR_TX_DMASHDL_ENABLE);
-   mt76_set(dev, MT_DMASHDL_SW_CONTROL, MT_DMASHDL_DMASHDL_BYPASS);
+   mt76_rmw(dev, MT_DMASHDL_REFILL, MT_DMASHDL_REFILL_MASK, 0xffe00000);
+   mt76_clear(dev, MT_DMASHDL_PAGE, MT_DMASHDL_GROUP_SEQ_ORDER);
+   mt76_rmw(dev, MT_DMASHDL_PKT_MAX_SIZE,
+        MT_DMASHDL_PKT_MAX_SIZE_PLE | MT_DMASHDL_PKT_MAX_SIZE_PSE,
+        FIELD_PREP(MT_DMASHDL_PKT_MAX_SIZE_PLE, 1) |
+        FIELD_PREP(MT_DMASHDL_PKT_MAX_SIZE_PSE, 0));
+   for (i = 0; i < 5; i++)
+       mt76_wr(dev, MT_DMASHDL_GROUP_QUOTA(i),
+           FIELD_PREP(MT_DMASHDL_GROUP_QUOTA_MIN, 0x3) |
+           FIELD_PREP(MT_DMASHDL_GROUP_QUOTA_MAX, 0xfff));
+   for (i = 5; i < 16; i++)
+       mt76_wr(dev, MT_DMASHDL_GROUP_QUOTA(i),
+           FIELD_PREP(MT_DMASHDL_GROUP_QUOTA_MIN, 0x0) |
+           FIELD_PREP(MT_DMASHDL_GROUP_QUOTA_MAX, 0x0));
+   mt76_wr(dev, MT_DMASHDL_Q_MAP(0), 0x32013201);
+   mt76_wr(dev, MT_DMASHDL_Q_MAP(1), 0x32013201);
+   mt76_wr(dev, MT_DMASHDL_Q_MAP(2), 0x55555444);
+   mt76_wr(dev, MT_DMASHDL_Q_MAP(3), 0x55555444);
+
+   mt76_wr(dev, MT_DMASHDL_SCHED_SET(0), 0x76540123);
+   mt76_wr(dev, MT_DMASHDL_SCHED_SET(1), 0xFEDCBA98);

    mt76_set(dev, MT_WFDMA_DUMMY_CR, MT_WFDMA_NEED_REINIT);
 }
diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
index 5e5c7bf51174..a503bc94d5ef 100644
--- a/drivers/net/wireless/mediatek/mt76/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/usb.c
@@ -873,7 +873,7 @@ mt76u_tx_queue_skb(struct mt76_dev *dev, struct mt76_queue *q,
    if (err < 0)
        return err;

-   mt76u_fill_bulk_urb(dev, USB_DIR_OUT, q2ep(q->hw_idx),
+   mt76u_fill_bulk_urb(dev, USB_DIR_OUT, qid >= MT_TXQ_PSD ? 5 : q2ep(q->hw_idx),
                q->entry[idx].urb, mt76u_complete_tx,
                &q->entry[idx]);
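A note on the `DMA_PREFETCH_CONF` part of the patch, for readers wondering why two `mt76_rmw()` calls were merged into one. As far as I understand it, `mt76_rmw(dev, reg, mask, val)` clears the `mask` bits and ORs `val` in without shifting it, so a field value has to be positioned first (which is what `FIELD_PREP` does). The Python sketch below models that on plain integers. The bit positions chosen for the two masks are assumptions for illustration only, and `rmw`, `field_prep`, `prefetch_old` and `prefetch_new` are stand-in names, not driver functions.

```python
def genmask(high, low):
    """Contiguous bit mask from bit `low` to bit `high`, like the kernel's GENMASK."""
    return ((1 << (high - low + 1)) - 1) << low


def field_prep(mask, value):
    """Shift `value` into the position described by `mask`, like FIELD_PREP."""
    shift = (mask & -mask).bit_length() - 1
    return (value << shift) & mask


def rmw(old, mask, val):
    """Model of a read-modify-write: clear the mask bits, then OR val in unshifted."""
    return (old & ~mask & 0xFFFFFFFF) | val


# ASSUMED field layout, for illustration only.
MAX_CNT_MASK = genmask(7, 0)
BASE_PTR_MASK = genmask(31, 16)


def prefetch_old(reg, cnt, base):
    """Pre-patch style: two rmw calls with raw, unshifted values."""
    reg = rmw(reg, MAX_CNT_MASK, cnt)
    reg = rmw(reg, BASE_PTR_MASK, base)
    return reg


def prefetch_new(reg, cnt, base):
    """Patched style: one rmw call with both fields positioned by field_prep."""
    return rmw(
        reg,
        MAX_CNT_MASK | BASE_PTR_MASK,
        field_prep(MAX_CNT_MASK, cnt) | field_prep(BASE_PTR_MASK, base),
    )


print(f"{prefetch_old(0, 4, 0x80):#010x}")  # 0x00000084
print(f"{prefetch_new(0, 4, 0x80):#010x}")  # 0x00800004
```

Under those assumptions the old sequence leaves the base pointer field at zero and spills `0x80` into the low bits, while the patched form puts each value in its own field. Whether that matches the real register layout needs to be checked against the driver headers.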
EasyNetDev commented 8 months ago

@whitslack I updated to the latest firmware versions today. The previous firmware was from the middle of 2023. I'm testing with the latest ones.

@deren I'm also trying your patch on my Odroid XU4 with kernel 6.6.7. I've applied it and I'm compiling the kernel. I will keep you updated if you want.

EasyNetDev commented 8 months ago

After updating to the latest firmware:

[   16.777305] usbcore: registered new interface driver mt7921u
[   16.783866] mt7921u 4-1.1.4:1.3: HW/SW Version: 0x8a108a10, Build Time: 20231109190918a
[   16.797299] mt7921u 4-1.1.4:1.3: WM Firmware Version: ____010000, Build Time: 20231109190959
[   18.456640] mt7921u 4-1.1.4:1.3 wlxe0e1a93655e3: renamed from wlan0
[   26.432668] mt7921u 4-1.1.4:1.3 wlxe0e1a93655e3: entered allmulticast mode
[   26.433092] mt7921u 4-1.1.4:1.3 wlxe0e1a93655e3: entered promiscuous mode

The error is:

[98052.715939] mt7921u 4-1.1.4:1.3: Message 00020003 (seq 5) timeout
[98053.005783] mt7921u 4-1.1.4:1.3: timed out waiting for pending tx
[98053.210794] mt7921u 4-1.1.4:1.3: HW/SW Version: 0x8a108a10, Build Time: 20231109190918a

[98053.233538] mt7921u 4-1.1.4:1.3: WM Firmware Version: ____010000, Build Time: 20231109190959
[98084.475971] page_pool_release_retry() stalled pool shutdown 9 inflight 80408 sec

Still getting the timeout, and I'm losing the WiFi connection.

EasyNetDev commented 8 months ago

Yesterday evening I did some tests with kernel 6.6.7 + the patch from @deren and it worked pretty well until I tried the dongle directly in the USB port of my Odroid XU4. Then this happened:

[  248.753349] usb 4-1.1.1: new SuperSpeed USB device number 11 using xhci-hcd
[  248.787536] usb 4-1.1.1: New USB device found, idVendor=0e8d, idProduct=7961, bcdDevice= 1.00
[  248.787663] usb 4-1.1.1: New USB device strings: Mfr=6, Product=7, SerialNumber=8
[  248.787731] usb 4-1.1.1: Product: Wireless_Device
[  248.787785] usb 4-1.1.1: Manufacturer: MediaTek Inc.
[  248.787840] usb 4-1.1.1: SerialNumber: 000000000
[  248.874796] Bluetooth: hci0: urb 2ae41daf failed to resubmit (2)
[  248.875830] Bluetooth: hci0: urb 1eacc356 failed to resubmit (2)
[  248.876079] Bluetooth: hci0: urb afba8b3c failed to resubmit (2)
[  251.845086] Bluetooth: hci0: Device setup in 2912180 usecs
[  251.845200] Bluetooth: hci0: HCI Enhanced Setup Synchronous Connection command is advertised, but not supported.
[  253.913038] Bluetooth: hci0: Opcode 0x0c03 failed: -110
[  255.993207] Bluetooth: hci0: Failed to read MSFT supported features (-110)
[  258.073139] Bluetooth: hci0: AOSP get vendor capabilities (-110)
[  263.193687] xhci-hcd xhci-hcd.8.auto: Timeout while waiting for setup device command
[  268.473239] xhci-hcd xhci-hcd.8.auto: Timeout while waiting for setup device command
[  268.693104] usb 4-1.1.1: device not accepting address 11, error -62
[  274.153507] xhci-hcd xhci-hcd.8.auto: Timeout while waiting for setup device command
[  279.433022] xhci-hcd xhci-hcd.8.auto: Timeout while waiting for setup device command
[  279.652976] usb 4-1.1.1: device not accepting address 11, error -62
[  285.113304] xhci-hcd xhci-hcd.8.auto: Timeout while waiting for setup device command
[  290.393208] xhci-hcd xhci-hcd.8.auto: Timeout while waiting for setup device command
[  290.613073] usb 4-1.1.1: device not accepting address 11, error -62
[  296.073365] xhci-hcd xhci-hcd.8.auto: Timeout while waiting for setup device command
[  301.353299] xhci-hcd xhci-hcd.8.auto: Timeout while waiting for setup device command
[  301.573184] usb 4-1.1.1: device not accepting address 11, error -62
[  301.576951] mt7921u: probe of 4-1.1.1:1.3 failed with error -5
[  301.581606] usb 4-1.1.1: USB disconnect, device number 11
[  307.033386] xhci-hcd xhci-hcd.8.auto: Timeout while waiting for setup device command
[  312.313383] xhci-hcd xhci-hcd.8.auto: Timeout while waiting for setup device command
[  312.533049] usb 4-1.1.1: device not accepting address 12, error -62
[  317.993310] xhci-hcd xhci-hcd.8.auto: Timeout while waiting for setup device command
[  323.273369] xhci-hcd xhci-hcd.8.auto: Timeout while waiting for setup device command
[  323.493152] usb 4-1.1.1: device not accepting address 13, error -62
[  323.494408] usb 4-1.1-port1: attempt power cycle
[  328.953363] xhci-hcd xhci-hcd.8.auto: Timeout while waiting for setup device command
[  334.233277] xhci-hcd xhci-hcd.8.auto: Timeout while waiting for setup device command
[  334.453052] usb 4-1.1.1: device not accepting address 14, error -62
[  339.913234] xhci-hcd xhci-hcd.8.auto: Timeout while waiting for setup device command
[  345.193242] xhci-hcd xhci-hcd.8.auto: Timeout while waiting for setup device command
[  345.413167] usb 4-1.1.1: device not accepting address 15, error -62
[  345.414636] usb 4-1.1-port1: unable to enumerate USB device

I tried the official 6.1.68-edge-odroidxu4 kernel, 6.6.7, and the patched 6.6.7 kernel. None of them work anymore. The dongle works fine when plugged into my Asus laptop running Windows. I also tested the Odroid's USB ports with some USB Ethernet NICs, and they work fine.
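The negative numbers in the log above are Linux errno values: 5 is EIO, 62 is ETIME and 110 is ETIMEDOUT. A small Python sketch that annotates log lines with the symbolic name can make long captures easier to skim. The table is deliberately limited to the three codes that appear in this thread, and `annotate` is just an illustrative helper name.

```python
import re

# Linux errno values that appear in the logs in this thread.
ERRNO_NAMES = {
    5: "EIO (I/O error)",
    62: "ETIME (timer expired)",
    110: "ETIMEDOUT (connection timed out)",
}

# Matches the "error -62", "failed: -110" and "(-110)" styles seen above.
CODE_RE = re.compile(r"(?:error |failed: |\()-(\d+)\)?")


def annotate(line):
    """Append the symbolic errno name to a kernel log line, if one is known."""
    match = CODE_RE.search(line)
    if not match:
        return line
    name = ERRNO_NAMES.get(int(match.group(1)))
    return f"{line}  <- {name}" if name else line


for entry in [
    "usb 4-1.1.1: device not accepting address 11, error -62",
    "mt7921u: probe of 4-1.1.1:1.3 failed with error -5",
    "Bluetooth: hci0: Opcode 0x0c03 failed: -110",
]:
    print(annotate(entry))
```

Read that way, the capture is mostly timeouts during enumeration, followed by the mt7921u probe failing with an I/O error.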

EasyNetDev commented 8 months ago

After moving back to the previous firmware, the dongle appears to boot correctly:

[   10.697715] Bluetooth: Core ver 2.22
[   10.697829] NET: Registered PF_BLUETOOTH protocol family
[   10.697841] Bluetooth: HCI device and connection manager initialized
[   10.697862] Bluetooth: HCI socket layer initialized
[   10.697877] Bluetooth: L2CAP socket layer initialized
[   10.697919] Bluetooth: SCO socket layer initialized
[   10.715643] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[   10.729819] usb 3-1.1: USB disconnect, device number 3
[   10.731962] usbcore: registered new interface driver btusb
[   10.750567] Bluetooth: hci0: HW/SW Version: 0x008a008a, Build Time: 20230526131214
[   10.886740] Bluetooth: hci0: Device setup in 145756 usecs
[   10.886760] Bluetooth: hci0: HCI Enhanced Setup Synchronous Connection command is advertised, but not supported.
[   10.886904] Bluetooth: hci0: urb 862a709f failed to resubmit (2)
[   11.546118] usb 3-1.1: new high-speed USB device number 4 using xhci-hcd
[   11.690400] usb 3-1.1: New USB device found, idVendor=05e3, idProduct=0610, bcdDevice= 6.63
[   11.690513] usb 3-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[   11.690551] usb 3-1.1: Product: USB2.1 Hub
[   11.690581] usb 3-1.1: Manufacturer: GenesysLogic
[   12.956151] Bluetooth: hci0: Opcode 0x0c03 failed: -110
[   15.036109] Bluetooth: hci0: Failed to read MSFT supported features (-110)
[   17.116171] Bluetooth: hci0: AOSP get vendor capabilities (-110)
[   17.221103] usb 4-1.1.2: reset SuperSpeed USB device number 4 using xhci-hcd
[   17.253942] Bluetooth: hci0: HW/SW Version: 0x008a008a, Build Time: 20230526131214
[   17.378286] Bluetooth: hci0: Device setup in 123575 usecs
[   17.378394] Bluetooth: hci0: HCI Enhanced Setup Synchronous Connection command is advertised, but not supported.
[   17.414444] usbcore: registered new interface driver mt7921u
[   17.419951] mt7921u 4-1.1.2:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a
[   17.432720] mt7921u 4-1.1.2:1.3: WM Firmware Version: ____010000, Build Time: 20230526130958
[   17.449817] Bluetooth: hci0: AOSP extensions version v1.00
[   17.449856] Bluetooth: hci0: AOSP quality report is supported
[   19.097624] mt7921u 4-1.1.2:1.3 wlxe0e1a93655e3: renamed from wlan0
[   20.491597] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[   20.491612] Bluetooth: BNEP filters: protocol multicast
[   20.491627] Bluetooth: BNEP socket layer initialized
[   20.501002] Bluetooth: MGMT ver 1.22
[   20.524892] NET: Registered PF_ALG protocol family
[   20.905228] Bluetooth: hci0: HW/SW Version: 0x008a008a, Build Time: 20230526131214
[   21.045195] Bluetooth: hci0: Device setup in 138281 usecs
[   21.045218] Bluetooth: hci0: HCI Enhanced Setup Synchronous Connection command is advertised, but not supported.
[   21.135349] Bluetooth: hci0: AOSP extensions version v1.00
[   21.135386] Bluetooth: hci0: AOSP quality report is supported
[   24.808376] mt7921u 4-1.1.2:1.3 wlxe0e1a93655e3: entered allmulticast mode
[   24.808878] mt7921u 4-1.1.2:1.3 wlxe0e1a93655e3: entered promiscuous mode

Looks like the new firmware has some issues.

EasyNetDev commented 8 months ago

I did some tests with the firmware and discovered that the Bluetooth firmware is what prevents the MediaTek mt7921u WiFi from booting.

# modinfo btmtk
filename:       /lib/modules/6.6.7-dirty/kernel/drivers/bluetooth/btmtk.ko
firmware:       mediatek/mt7925/BT_RAM_CODE_MT7925_1_1_hdr.bin
firmware:       mediatek/BT_RAM_CODE_MT7961_1_2_hdr.bin
firmware:       mediatek/mt7668pr2h.bin
firmware:       mediatek/mt7663pr2h.bin
firmware:       mediatek/mt7622pr2h.bin
license:        GPL
version:        0.1
description:    Bluetooth support for MediaTek devices ver 0.1
author:         Mark Chen <mark-yw.chen@mediatek.com>
author:         Sean Wang <sean.wang@mediatek.com>
srcversion:     6DBBCA7567F5C10082C210E
depends:        bluetooth
intree:         Y
name:           btmtk
vermagic:       6.6.7-dirty SMP preempt mod_unload ARMv7 p2v8

If you use the version of mediatek/BT_RAM_CODE_MT7961_1_2_hdr.bin from https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/mediatek?h=20231211 (20231211 is the latest version at this moment), the device does not boot at all. You get the errors I mentioned before.

As soon as I move back to version 20231111 https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/mediatek?h=20231111 my WiFi dongle boots up without issues, and I can still use the latest version (20231211) of the WiFi firmware:

version: 6.6.7-dirty
firmware-version: ____010000-20231109190959
expansion-rom-version:
bus-info: 4-1.1.2:1.3
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

@morrownr can you mention on the page https://github.com/morrownr/USB-WiFi/blob/main/home/How_to_Install_Firmware_for_Mediatek_based_USB_WiFi_adapters.md that there is an issue with Bluetooth firmware version 20231211 for mt7921u?
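When comparing firmware across reports, note that the linux-firmware tag (for example 20231211) is not the same thing as the build time the driver prints. The `Build Time:` strings in the logs look like `YYYYMMDDHHMMSS` timestamps, sometimes with a trailing letter. Assuming that reading is right, a short Python sketch can extract them from a dmesg capture; `build_times` is an illustrative name.

```python
import re
from datetime import datetime

# Assumes the 14 digits after "Build Time:" are YYYYMMDDHHMMSS (an inference
# from the shape of the strings in this thread, not a documented format).
BUILD_RE = re.compile(r"Build Time: (\d{14})")


def build_times(dmesg_text):
    """Return every firmware build time found in the text as a datetime."""
    return [
        datetime.strptime(stamp, "%Y%m%d%H%M%S")
        for stamp in BUILD_RE.findall(dmesg_text)
    ]


sample = """\
mt7921u 4-1.1.4:1.3: HW/SW Version: 0x8a108a10, Build Time: 20231109190918a
mt7921u 4-1.1.4:1.3: WM Firmware Version: ____010000, Build Time: 20231109190959
"""
times = build_times(sample)
print([t.isoformat() for t in times])
```

This makes it easier to tell whether two reports were really running the same WiFi firmware, regardless of which linux-firmware snapshot the files came from.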

whitslack commented 8 months ago

@EasyNetDev: You don't need the Bluetooth firmware and can simply delete it. Bluetooth and USB 3.0 don't play well together, and most MT7921U-based adapters disable the chipset's Bluetooth functionality for that reason anyway.

morrownr commented 8 months ago

@whitslack is correct with everything he said about Bluetooth and USB3. Here is a link to a location where you can get a copy of the Intel White Paper that explains the details:

https://www.usb.org/document-library/usb-30-radio-frequency-interference-impact-24-ghz-wireless-devices

USB3 WiFi adapters should not have Bluetooth turned on, as USB3 will cause interference with Bluetooth. If makers decide they really want Bluetooth capability in an adapter, then they need to limit WiFi to USB2 capability. All adapters with the mt7921au chipset that I am aware of have Bluetooth turned off so WiFi can operate in USB3 mode. However, there is a bug in that Bluetooth capability is still being detected and the driver/firmware is loading. Systems act like Bluetooth is available, but when you try to use Bluetooth, it won't work. It is not clear to me how this can be fixed, but it really does need to be fixed.

This is not a problem with PCIe cards. I have a mt7922 based PCIe card. Wifi and Bluetooth work well together.

@morrownr

EasyNetDev commented 8 months ago

> @EasyNetDev: You don't need the Bluetooth firmware and can simply delete it. Bluetooth and USB 3.0 don't play well together, and most MT7921U-based adapters disable the chipset's Bluetooth functionality for that reason anyway.

I've blacklisted the btmtk module for the moment so that no Bluetooth drivers are loaded for this device.

morrownr commented 8 months ago

I do the same as whitslack. I simply delete the following file.

sudo rm /lib/firmware/mediatek/BT_RAM_CODE_MT7961_1_2_hdr.bin

I wish a software fix for this situation could be found, because both of my mt7921au based adapters have BT shut down (by the adapter maker), yet Linux lights up like BT is available. This is a bug. An annoying bug.

whitslack commented 8 months ago

@EasyNetDev: I experienced a dramatic decline in the stability of my MT7921AU-based ALFA AWUS036AXML when I switched from Linux 6.5.13 (EOL) to Linux 6.6.8 (“stable”). It went from lasting days to weeks to lasting only minutes to hours. Looking at the git-diff, there were very many changes made in the driver between those versions, so I guess they made it worse! Maybe give 6.5.13 a try? EDIT: Going back to 6.5.13 has not improved stability for me. My MT7921AU AP is still crapping out within minutes of booting up. Maybe it's a firmware regression. ::big sigh:: I'm getting close to giving up on this chipset.

fayaaz commented 8 months ago

Hi @fayaaz

In my test, the abnormal behavior disappeared after the patch was applied. Could you please help verify the problem in your environment? Note: this is still under development at the moment and is not final yet.

diff --git a/drivers/net/wireless/mediatek/mt76/mt792x_usb.c b/drivers/net/wireless/mediatek/mt76/mt792x_usb.c
index 2dd283caed36..24b8a42a871e 100644
--- a/drivers/net/wireless/mediatek/mt76/mt792x_usb.c
+++ b/drivers/net/wireless/mediatek/mt76/mt792x_usb.c
@@ -121,44 +121,25 @@ static void mt792xu_uhw_wr(struct mt76_dev *dev, u32 addr, u32 val)

 static void mt792xu_dma_prefetch(struct mt792x_dev *dev)
 {
- mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(0),
-      MT_WPDMA0_MAX_CNT_MASK, 4);
- mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(0),
-      MT_WPDMA0_BASE_PTR_MASK, 0x80);
-
- mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(1),
-      MT_WPDMA0_MAX_CNT_MASK, 4);
- mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(1),
-      MT_WPDMA0_BASE_PTR_MASK, 0xc0);
-
- mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(2),
-      MT_WPDMA0_MAX_CNT_MASK, 4);
- mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(2),
-      MT_WPDMA0_BASE_PTR_MASK, 0x100);
-
- mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(3),
-      MT_WPDMA0_MAX_CNT_MASK, 4);
- mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(3),
-      MT_WPDMA0_BASE_PTR_MASK, 0x140);
-
- mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(4),
-      MT_WPDMA0_MAX_CNT_MASK, 4);
- mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(4),
-      MT_WPDMA0_BASE_PTR_MASK, 0x180);
-
- mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(16),
-      MT_WPDMA0_MAX_CNT_MASK, 4);
- mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(16),
-      MT_WPDMA0_BASE_PTR_MASK, 0x280);
-
- mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(17),
-      MT_WPDMA0_MAX_CNT_MASK, 4);
- mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL(17),
-      MT_WPDMA0_BASE_PTR_MASK,  0x2c0);
+#define DMA_PREFETCH_CONF(_idx_, _cnt_, _base_) \
+ mt76_rmw(dev, MT_UWFDMA0_TX_RING_EXT_CTRL((_idx_)), \
+      MT_WPDMA0_MAX_CNT_MASK | MT_WPDMA0_BASE_PTR_MASK, \
+      FIELD_PREP(MT_WPDMA0_MAX_CNT_MASK, (_cnt_)) | \
+      FIELD_PREP(MT_WPDMA0_BASE_PTR_MASK, (_base_)))
+
+ DMA_PREFETCH_CONF(0, 4, 0x080);
+ DMA_PREFETCH_CONF(1, 4, 0x0c0);
+ DMA_PREFETCH_CONF(2, 4, 0x100);
+ DMA_PREFETCH_CONF(3, 4, 0x140);
+ DMA_PREFETCH_CONF(4, 4, 0x180);
+ DMA_PREFETCH_CONF(16, 4, 0x280);
+ DMA_PREFETCH_CONF(17, 4, 0x2c0);
 }

 static void mt792xu_wfdma_init(struct mt792x_dev *dev)
 {
+ int i;
+
  mt792xu_dma_prefetch(dev);

  mt76_clear(dev, MT_UWFDMA0_GLO_CFG, MT_WFDMA0_GLO_CFG_OMIT_RX_INFO);
@@ -169,10 +150,27 @@ static void mt792xu_wfdma_init(struct mt792x_dev *dev)
       MT_WFDMA0_GLO_CFG_TX_DMA_EN |
       MT_WFDMA0_GLO_CFG_RX_DMA_EN);

- /* disable dmashdl */
- mt76_clear(dev, MT_UWFDMA0_GLO_CFG_EXT0,
-        MT_WFDMA0_CSR_TX_DMASHDL_ENABLE);
- mt76_set(dev, MT_DMASHDL_SW_CONTROL, MT_DMASHDL_DMASHDL_BYPASS);
+ mt76_rmw(dev, MT_DMASHDL_REFILL, MT_DMASHDL_REFILL_MASK, 0xffe00000);
+ mt76_clear(dev, MT_DMASHDL_PAGE, MT_DMASHDL_GROUP_SEQ_ORDER);
+ mt76_rmw(dev, MT_DMASHDL_PKT_MAX_SIZE,
+      MT_DMASHDL_PKT_MAX_SIZE_PLE | MT_DMASHDL_PKT_MAX_SIZE_PSE,
+      FIELD_PREP(MT_DMASHDL_PKT_MAX_SIZE_PLE, 1) |
+      FIELD_PREP(MT_DMASHDL_PKT_MAX_SIZE_PSE, 0));
+ for (i = 0; i < 5; i++)
+     mt76_wr(dev, MT_DMASHDL_GROUP_QUOTA(i),
+         FIELD_PREP(MT_DMASHDL_GROUP_QUOTA_MIN, 0x3) |
+         FIELD_PREP(MT_DMASHDL_GROUP_QUOTA_MAX, 0xfff));
+ for (i = 5; i < 16; i++)
+     mt76_wr(dev, MT_DMASHDL_GROUP_QUOTA(i),
+         FIELD_PREP(MT_DMASHDL_GROUP_QUOTA_MIN, 0x0) |
+         FIELD_PREP(MT_DMASHDL_GROUP_QUOTA_MAX, 0x0));
+ mt76_wr(dev, MT_DMASHDL_Q_MAP(0), 0x32013201);
+ mt76_wr(dev, MT_DMASHDL_Q_MAP(1), 0x32013201);
+ mt76_wr(dev, MT_DMASHDL_Q_MAP(2), 0x55555444);
+ mt76_wr(dev, MT_DMASHDL_Q_MAP(3), 0x55555444);
+
+ mt76_wr(dev, MT_DMASHDL_SCHED_SET(0), 0x76540123);
+ mt76_wr(dev, MT_DMASHDL_SCHED_SET(1), 0xFEDCBA98);

  mt76_set(dev, MT_WFDMA_DUMMY_CR, MT_WFDMA_NEED_REINIT);
 }
diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
index 5e5c7bf51174..a503bc94d5ef 100644
--- a/drivers/net/wireless/mediatek/mt76/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/usb.c
@@ -873,7 +873,7 @@ mt76u_tx_queue_skb(struct mt76_dev *dev, struct mt76_queue *q,
  if (err < 0)
      return err;

- mt76u_fill_bulk_urb(dev, USB_DIR_OUT, q2ep(q->hw_idx),
+ mt76u_fill_bulk_urb(dev, USB_DIR_OUT, qid >= MT_TXQ_PSD ? 5 : q2ep(q->hw_idx),
              q->entry[idx].urb, mt76u_complete_tx,
              &q->entry[idx]);

Looking good so far! No crashes with this patch. Let's see how it runs for a few days...

MEL1H commented 8 months ago

I wish to know how to apply this patch. Can you guys suggest me a guide or anything else?

fayaaz commented 8 months ago

I wish to know how to apply this patch. Can you guys suggest me a guide or anything else?

There is a guide recommended for building the kernel yourself buried in this thread. However, I used https://www.raspberrypi.com/documentation/computers/linux_kernel.html and cross compiled it on my PC. You will need to apply the patch above after you check out a recent version of the code. You can apply a patch with git apply <patch file path>
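Before applying a patch for real, it helps to dry-run it first. The sketch below wraps git apply with its --check option; the helper name and the patch file name in the example are made up, and it is meant to be run from the top of the kernel source tree.

```shell
#!/bin/sh
# Sketch only: dry-run a patch before applying it.
# apply_checked is an illustrative helper name.
apply_checked() {
    patch_file=$1
    # --check reports whether the patch applies cleanly without touching any files
    if git apply --check "$patch_file"; then
        git apply "$patch_file"
    else
        echo "patch does not apply cleanly: $patch_file" >&2
        return 1
    fi
}

# Example:
#   cd linux && apply_checked ../mt792x-usb.patch
```

A patch copied out of a web page often has its tabs converted to spaces, in which case the check fails even though the patch itself is fine. Saving the patch from a raw source avoids that.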

morrownr commented 8 months ago

I wish to know how to apply this patch. Can you guys suggest me a guide or anything else?

Remind us what distro and kernel you are using?

Beware: You have entered the Dev Zone.

MEL1H commented 8 months ago

pi@md-ap:~ $ uname -a
Linux md-ap 6.6.8-v8+ #1 SMP PREEMPT Thu Dec 21 17:26:57 UTC 2023 aarch64 GNU/Linux

pi@md-ap:~ $ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

morrownr commented 8 months ago

Okay, so Debian is the distro and the version is 12.

Here is a guide for Debian:

https://cs4118.github.io/dev-guides/debian-kernel-compilation.html

I'd use the traditional method.

When you get to the below line, stop and edit the source or apply the patch.

$ make -j$(nproc)

One thing to note about the guide is that it uses kernel 5.10. You don't want that. Maybe 6.6 would be more appropriate, or the latest 6.7 rc if you want.

EasyNetDev commented 8 months ago

@EasyNetDev: I experienced a dramatic decline in the stability of my MT7921AU-based ALFA AWUS036AXML when I switched from Linux 6.5.13 (EOL) to Linux 6.6.8 (“stable”). It went from lasting days to weeks to lasting only minutes to hours. Looking at the git-diff, there were very many changes made in the driver between those versions, so I guess they made it worse! ~Maybe give 6.5.13 a try?~ EDIT: Going back to 6.5.13 has not improved stability for me. My MT7921AU AP is still crapping out within minutes of booting up. Maybe it's a firmware regression. ::big sigh:: I'm getting close to giving up on this chipset.

@whitslack I'm now facing an even stranger behavior. After a random time (between 10-15 or more minutes) I can't see the SSID and I don't see anything in the dmesg. There is no crash of the driver or anything. After restarting the hostapd I can see the SSID again for a few minutes. This driver is driving me crazy :)

I'm moving to 6.5.13 now to check whether it is more stable. I'm using the latest firmware version for my USB dongle.

whitslack commented 8 months ago

After a random time (between 10-15 or more minutes) I can't see the SSID and I don't see anything in the dmesg. There is no crash of the driver or anything.

@EasyNetDev: That's the behavior I always see. The beacons stop being broadcast when the driver hangs, so the network is no longer visible when clients scan. Because the driver is hung, no message is written to the kernel log when this happens. However, if I bring down the interface while it is hung, then I'll get a message in the kernel log about a timeout while waiting for transmission.

After restarting the hostapd I can see the SSID again for a few minutes.

You're lucky that you can simply restart hostapd. That never works for me, as the new instance of hostapd is unable to configure the interface. I always have to reboot to get my network working again.

morrownr commented 8 months ago

@EasyNetDev @whitslack

Both of you are seeing AP mode problems. Can I get you both to confirm that this is with the 2.4 GHz band?

Also, if you both could post your hostapd.conf files, that would be great.

I have been following this conversation about AP mode over the last few weeks while testing on my end. I had found nothing but I have been testing with the 5 GHz band only. Last night, after a conversation with @whitslack , I found out his problems are with the 2.4 GHz band. I then started changing things to look at 2.4 GHz for more testing.

EasyNetDev commented 8 months ago

@EasyNetDev @whitslack

Both of you are seeing AP mode problems. Can I get you both to confirm that this is with the 2.4 GHz band?

Also, if you both could post your hostapd.conf files, that would be great.

I have been following this conversation about AP mode over the last few weeks while testing on my end. I had found nothing but I have been testing with the 5 GHz band only. Last night, after a conversation with @whitslack , I found out his problems are with the 2.4 GHz band. I then started changing things to look at 2.4 GHz for more testing.

@morrownr nope. I'm using 5GHz 80MHz bandwidth.

EasyNetDev commented 8 months ago

This is how the iperf3 run looks on kernel 6.5.13:

- - - - - - - - - - - - - - - - - - - - - - - - -
[  4] 305.00-306.00 sec  6.40 MBytes  53.7 Mbits/sec
[  6] 305.00-306.00 sec  6.33 MBytes  53.1 Mbits/sec
[  8] 305.00-306.00 sec  6.38 MBytes  53.5 Mbits/sec
[ 10] 305.00-306.00 sec  6.35 MBytes  53.3 Mbits/sec
[ 12] 305.00-306.00 sec  6.39 MBytes  53.6 Mbits/sec
[ 14] 305.00-306.00 sec  6.41 MBytes  53.8 Mbits/sec
[ 16] 305.00-306.00 sec  6.39 MBytes  53.7 Mbits/sec
[ 18] 305.00-306.00 sec  6.41 MBytes  53.8 Mbits/sec
[SUM] 305.00-306.00 sec  51.1 MBytes   428 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4] 306.00-307.00 sec  5.53 MBytes  46.4 Mbits/sec
[  6] 306.00-307.00 sec  5.47 MBytes  45.9 Mbits/sec
[  8] 306.00-307.00 sec  5.57 MBytes  46.7 Mbits/sec
[ 10] 306.00-307.00 sec  5.49 MBytes  46.0 Mbits/sec
[ 12] 306.00-307.00 sec  5.53 MBytes  46.4 Mbits/sec
[ 14] 306.00-307.00 sec  5.53 MBytes  46.4 Mbits/sec
[ 16] 306.00-307.00 sec  5.57 MBytes  46.7 Mbits/sec
[ 18] 306.00-307.00 sec  5.48 MBytes  45.9 Mbits/sec
[SUM] 306.00-307.00 sec  44.2 MBytes   370 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4] 307.00-308.00 sec  2.16 MBytes  18.1 Mbits/sec
[  6] 307.00-308.00 sec  2.18 MBytes  18.2 Mbits/sec
[  8] 307.00-308.00 sec  2.22 MBytes  18.6 Mbits/sec
[ 10] 307.00-308.00 sec  2.19 MBytes  18.3 Mbits/sec
[ 12] 307.00-308.00 sec  2.21 MBytes  18.5 Mbits/sec
[ 14] 307.00-308.00 sec  2.14 MBytes  17.9 Mbits/sec
[ 16] 307.00-308.00 sec  2.21 MBytes  18.5 Mbits/sec
[ 18] 307.00-308.00 sec  2.30 MBytes  19.2 Mbits/sec
[SUM] 307.00-308.00 sec  17.6 MBytes   147 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4] 308.00-309.01 sec  0.00 Bytes  0.00 bits/sec
[  6] 308.00-309.01 sec  0.00 Bytes  0.00 bits/sec
[  8] 308.00-309.01 sec  0.00 Bytes  0.00 bits/sec
[ 10] 308.00-309.01 sec  0.00 Bytes  0.00 bits/sec
[ 12] 308.00-309.01 sec  0.00 Bytes  0.00 bits/sec
[ 14] 308.00-309.01 sec  0.00 Bytes  0.00 bits/sec
[ 16] 308.00-309.01 sec  0.00 Bytes  0.00 bits/sec
[ 18] 308.00-309.01 sec  0.00 Bytes  0.00 bits/sec
[SUM] 308.00-309.01 sec  0.00 Bytes  0.00 bits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4] 309.01-310.01 sec  0.00 Bytes  0.00 bits/sec
[  6] 309.01-310.01 sec  0.00 Bytes  0.00 bits/sec
[  8] 309.01-310.01 sec  0.00 Bytes  0.00 bits/sec
[ 10] 309.01-310.01 sec  0.00 Bytes  0.00 bits/sec
[ 12] 309.01-310.01 sec  0.00 Bytes  0.00 bits/sec
[ 14] 309.01-310.01 sec  0.00 Bytes  0.00 bits/sec
[ 16] 309.01-310.01 sec  0.00 Bytes  0.00 bits/sec
[ 18] 309.01-310.01 sec  0.00 Bytes  0.00 bits/sec
[SUM] 309.01-310.01 sec  0.00 Bytes  0.00 bits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -

After about 5 minutes of continuous test.
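To pin down when a run like this stalls without scrolling through the output, the [SUM] lines can be filtered. A small sketch, assuming the per-second iperf3 format shown above (the function name is made up):

```shell
#!/bin/sh
# Sketch only: print the interval of the first [SUM] line whose bitrate is zero.
# Reads iperf3 text output on stdin. In a [SUM] line the awk fields are:
#   $1 = "[SUM]", $2 = interval, $4 $5 = transferred amount, $6 $7 = bitrate
first_stall() {
    awk '$1 == "[SUM]" && $6 + 0 == 0 { print $2; exit }'
}

# Example:
#   first_stall < iperf3.log
```

For the run above this would point at the 308.00-309.01 interval, which is the first one where the summed rate hits zero.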

But this time I'm getting this error in dmesg:

[ 3492.634641] mt7921u 4-1.2:1.3: Message 00020003 (seq 7) timeout
[ 3492.914507] mt7921u 4-1.2:1.3: timed out waiting for pending tx
[ 3492.966664] ------------[ cut here ]------------
[ 3492.966742] WARNING: CPU: 2 PID: 4467 at kernel/kthread.c:660 kthread_park+0x120/0x124
[ 3492.966857] Modules linked in: aes_arm_bs crypto_simd cryptd bridge stp llc vrf ip_gre gre wireguard curve25519_neon libchacha20poly1305 chacha_neon poly1305_arm ip6_udp_tunnel udp_tunnel libcurve25519_generic algif_hash algif_skcipher af_alg bnep sunrpc nft_masq nft_nat nft_chain_nat nf_nat mt7921u mt7921_common mt76_connac_lib mt76_usb mt76 mac80211 cfg80211 btusb btrtl btbcm btmtk btintel bluetooth ecdh_generic rfkill ecc onboard_usb_hub s5p_cec nft_ct evdev nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 zstd lz4hc lz4hc_compress lz4 lz4_compress lzo_rle zram zsmalloc binfmt_misc nf_tables nfnetlink mpls_gso mpls_iptunnel mpls_router ip_tunnel loop fuse ip_tables ipv6 btrfs blake2b_neon blake2b_generic xor xor_neon lzo_compress zlib_deflate raid6_pq clk_exynos_clkout gpio_keys
[ 3492.968686] CPU: 2 PID: 4467 Comm: kworker/u16:2 Not tainted 6.5.13 #6
[ 3492.968740] Hardware name: Samsung Exynos (Flattened Device Tree)
[ 3492.968782] Workqueue: mt76 mt7921_mac_reset_work [mt7921_common]
[ 3492.969006]  unwind_backtrace from show_stack+0x10/0x14
[ 3492.969107]  show_stack from dump_stack_lvl+0x40/0x4c
[ 3492.969195]  dump_stack_lvl from __warn+0x78/0x154
[ 3492.969311]  __warn from warn_slowpath_fmt+0x1ac/0x1b4
[ 3492.969407]  warn_slowpath_fmt from kthread_park+0x120/0x124
[ 3492.969487]  kthread_park from mt76u_stop_tx+0x234/0x298 [mt76_usb]
[ 3492.969657]  mt76u_stop_tx [mt76_usb] from mt7921u_mac_reset+0x78/0x1c0 [mt7921u]
[ 3492.969789]  mt7921u_mac_reset [mt7921u] from mt7921_mac_reset_work+0x80/0x158 [mt7921_common]
[ 3492.969937]  mt7921_mac_reset_work [mt7921_common] from process_one_work+0x1f8/0x524
[ 3492.970074]  process_one_work from worker_thread+0x54/0x508
[ 3492.970148]  worker_thread from kthread+0x110/0x12c
[ 3492.970214]  kthread from ret_from_fork+0x14/0x28
[ 3492.970273] Exception stack(0xf1e15fb0 to 0xf1e15ff8)
[ 3492.970321] 5fa0:                                     00000000 00000000 00000000 00000000
[ 3492.970370] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 3492.970416] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 3492.970455] ---[ end trace 0000000000000000 ]---
[ 3493.130346] mt7921u 4-1.2:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a

[ 3493.147118] mt7921u 4-1.2:1.3: WM Firmware Version: ____010000, Build Time: 20230526130958
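The two lines that precede the warning ("Message ... timeout" and "timed out waiting for pending tx") are the useful markers when checking whether a hang like this has happened. A sketch of a filter for them, assuming the message wording stays as shown in the log above (the function name is made up):

```shell
#!/bin/sh
# Sketch only: keep the mt7921u timeout lines from kernel log text on stdin.
# The patterns are taken from the log lines above and may differ on other kernels.
mt7921u_timeouts() {
    grep -E 'mt7921u .*(Message [0-9a-fA-F]+ \(seq [0-9]+\) timeout|timed out waiting for pending tx)'
}

# Example:
#   dmesg | mt7921u_timeouts
```

Like grep itself, the function returns a non-zero status when nothing matches, so it can be used directly in an if statement.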

EasyNetDev commented 8 months ago

Here is my hostapd config:

interface=wlxe0e1a93655e3
bridge=lan1
driver=nl80211
logger_syslog=-1
logger_syslog_level=2
logger_stdout=-1
logger_stdout_level=2
ctrl_interface=/var/run/hostapd
ctrl_interface_group=0
ssid=MyAP-Gaming
country_code=RO
hw_mode=a
channel=36
beacon_int=100
dtim_period=2
max_num_sta=255
rts_threshold=2347
fragm_threshold=2346
macaddr_acl=0
auth_algs=1
ignore_broadcast_ssid=0
wmm_enabled=1
wmm_ac_bk_cwmin=4
wmm_ac_bk_cwmax=10
wmm_ac_bk_aifs=7
wmm_ac_bk_txop_limit=0
wmm_ac_bk_acm=0
wmm_ac_be_aifs=3
wmm_ac_be_cwmin=4
wmm_ac_be_cwmax=10
wmm_ac_be_txop_limit=0
wmm_ac_be_acm=0
wmm_ac_vi_aifs=2
wmm_ac_vi_cwmin=3
wmm_ac_vi_cwmax=4
wmm_ac_vi_txop_limit=94
wmm_ac_vi_acm=0
wmm_ac_vo_aifs=2
wmm_ac_vo_cwmin=2
wmm_ac_vo_cwmax=3
wmm_ac_vo_txop_limit=47
wmm_ac_vo_acm=0
disassoc_low_ack=1
wds_sta=1
ieee80211n=1
wmm_enabled=1
ht_capab=[LDPC][HT40+][HT40-][GF][SHORT-GI-20][SHORT-GI-40][TX-STBC][RX-STBC1][MAX-AMSDU-7935]
ieee80211ac=1
vht_oper_chwidth=1
vht_oper_centr_freq_seg0_idx=42
vht_capab=[RXLDPC][SHORT-GI-80][TX-STBC-2BY1][SU-BEAMFORMEE][MU-BEAMFORMEE][RX-ANTENNA-PATTERN][TX-ANTENNA-PATTERN][RX-STBC-1][BF-ANTENNA-4][MAX-MPDU-11454][MAX-A-MPDU-LEN-EXP7]
ieee80211ax=1
he_oper_chwidth=1
he_oper_centr_freq_seg0_idx=42
eapol_key_index_workaround=0
eap_server=0
own_ip_addr=192.168.143.1
nas_identifier=e0e1a93655e3
wpa=2
wpa_passphrase=XXXXXXXXXXXXX
wpa_key_mgmt=FT-PSK WPA-PSK
wpa_pairwise=CCMP
rsn_pairwise=CCMP
wpa_group_rekey=0
rsn_preauth=1
rsn_preauth_interfaces=lan1
ieee80211w=1
sae_require_mfp=1
okc=1
mobility_domain=02f8
r1_key_holder=e0e1a93655e3
reassociation_deadline=3000
pmk_r1_push=0
ft_psk_generate_local=1
ap_table_max_size=255
ap_table_expiration_time=3600
wps_pin_requests=/var/run/hostapd_wps_pin_requests
device_name=EasyNet-AP
manufacturer=EasyNet
model_number=EASY-100
serial_number=44664333
time_advertisement=2
time_zone=2EEST3,M3.5.0/2,M10.5.0/3
bss_transition=1
interworking=1
access_network_type=0
internet=1
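In the config above, channel=36 with vht_oper_chwidth=1 (80 MHz) is paired with vht_oper_centr_freq_seg0_idx=42, the centre of the 36-48 block. If the primary channel is changed while testing, the centre index has to change with it. Below is a sketch of that arithmetic for the usual 5 GHz 80 MHz blocks. The function name is made up, and whether a given block may be used depends on the regulatory domain.

```shell
#!/bin/sh
# Sketch only: centre channel index of the 80 MHz block containing a 5 GHz primary channel.
# Blocks start at 36, 52, 100, 116, 132 and 149; the centre is the block start plus 6.
# The input is not checked for being a valid 20 MHz channel number, and
# channels 165 and above are not handled here.
vht80_center() {
    ch=$1
    if [ "$ch" -ge 36 ] && [ "$ch" -le 64 ]; then
        echo $(( 36 + (ch - 36) / 16 * 16 + 6 ))
    elif [ "$ch" -ge 100 ] && [ "$ch" -le 144 ]; then
        echo $(( 100 + (ch - 100) / 16 * 16 + 6 ))
    elif [ "$ch" -ge 149 ] && [ "$ch" -le 161 ]; then
        echo 155
    else
        echo "no 80 MHz block known for channel $ch" >&2
        return 1
    fi
}

# Example:
#   vht80_center 36
```

The same value goes into he_oper_centr_freq_seg0_idx when ieee80211ax=1 is set, as it is in the config above.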

EasyNetDev commented 8 months ago

Because in my case the driver still recovers from the crash, the second test looks like this:

[ 4358.045914] xhci-hcd xhci-hcd.8.auto: ERROR unknown event type 37
[ 4366.873683] mt7921u 4-1.2:1.3: Message 00020003 (seq 7) timeout
[ 4367.163531] mt7921u 4-1.2:1.3: timed out waiting for pending tx
[ 4367.393169] mt7921u 4-1.2:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a

[ 4367.407101] mt7921u 4-1.2:1.3: WM Firmware Version: ____010000, Build Time: 20230526130958

And iperf3:

- - - - - - - - - - - - - - - - - - - - - - - - -
[  4] 121.00-122.00 sec  2.64 MBytes  22.1 Mbits/sec
[  6] 121.00-122.00 sec  2.97 MBytes  24.8 Mbits/sec
[  8] 121.00-122.00 sec  2.84 MBytes  23.7 Mbits/sec
[ 10] 121.00-122.00 sec  2.83 MBytes  23.6 Mbits/sec
[ 12] 121.00-122.00 sec  2.83 MBytes  23.6 Mbits/sec
[ 14] 121.00-122.00 sec  2.66 MBytes  22.2 Mbits/sec
[ 16] 121.00-122.00 sec  2.86 MBytes  23.9 Mbits/sec
[ 18] 121.00-122.00 sec  2.79 MBytes  23.3 Mbits/sec
[SUM] 121.00-122.00 sec  22.4 MBytes   187 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4] 122.00-123.01 sec   900 KBytes  7.32 Mbits/sec
[  6] 122.00-123.01 sec   965 KBytes  7.85 Mbits/sec
[  8] 122.00-123.01 sec   938 KBytes  7.63 Mbits/sec
[ 10] 122.00-123.01 sec   947 KBytes  7.70 Mbits/sec
[ 12] 122.00-123.01 sec   922 KBytes  7.50 Mbits/sec
[ 14] 122.00-123.01 sec   928 KBytes  7.55 Mbits/sec
[ 16] 122.00-123.01 sec   928 KBytes  7.55 Mbits/sec
[ 18] 122.00-123.01 sec   942 KBytes  7.67 Mbits/sec
[SUM] 122.00-123.01 sec  7.30 MBytes  60.8 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4] 123.01-124.01 sec  0.00 Bytes  0.00 bits/sec
[  6] 123.01-124.01 sec  0.00 Bytes  0.00 bits/sec
[  8] 123.01-124.01 sec  0.00 Bytes  0.00 bits/sec
[ 10] 123.01-124.01 sec  0.00 Bytes  0.00 bits/sec
[ 12] 123.01-124.01 sec  0.00 Bytes  0.00 bits/sec
[ 14] 123.01-124.01 sec  0.00 Bytes  0.00 bits/sec
[ 16] 123.01-124.01 sec  0.00 Bytes  0.00 bits/sec
[ 18] 123.01-124.01 sec  0.00 Bytes  0.00 bits/sec
[SUM] 123.01-124.01 sec  0.00 Bytes  0.00 bits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -

And it looks like the performance of the AP is lower after recovery.

After the second recovery attempt I can no longer see the SSID, even though hostapd looks like it is still running:

# systemctl status hostapd@wlxe0e1a93655e3.service
● hostapd@wlxe0e1a93655e3.service - Access point and authentication server for Wi-Fi and Ethernet (wlxe0e1a93655e3)
     Loaded: loaded (/lib/systemd/system/hostapd@.service; enabled; preset: enabled)
     Active: active (running) since Wed 2023-12-20 03:11:53 EET; 1 week 0 days ago
       Docs: man:hostapd(8)
   Main PID: 1850 (hostapd)
      Tasks: 1 (limit: 4425)
     Memory: 1.9M
        CPU: 350ms
     CGroup: /system.slice/system-hostapd.slice/hostapd@wlxe0e1a93655e3.service
             └─1850 /usr/sbin/hostapd -B -P /run/hostapd.wlxe0e1a93655e3.pid -dd /etc/hostapd/wlxe0e1a93655e3.conf

Dec 27 18:59:32 R13 hostapd[1850]: wlxe0e1a93655e3: STA 1c:1b:b5:f7:8a:c8 IEEE 802.11: authenticated
Dec 27 18:59:32 R13 hostapd[1850]: wlxe0e1a93655e3: STA 1c:1b:b5:f7:8a:c8 IEEE 802.11: associated (aid 1)
Dec 27 18:59:33 R13 hostapd[1850]: wlxe0e1a93655e3: STA 1c:1b:b5:f7:8a:c8 RADIUS: starting accounting session AE7082F8C8867215
Dec 27 18:59:33 R13 hostapd[1850]: wlxe0e1a93655e3: STA 1c:1b:b5:f7:8a:c8 WPA: pairwise key handshake completed (RSN)
Dec 27 19:15:52 R13 hostapd[1850]: wlxe0e1a93655e3: STA 1c:1b:b5:f7:8a:c8 IEEE 802.11: disassociated due to inactivity
Dec 27 19:15:53 R13 hostapd[1850]: wlxe0e1a93655e3: STA 1c:1b:b5:f7:8a:c8 IEEE 802.11: deauthenticated due to inactivity (tim>
Dec 27 19:22:50 R13 hostapd[1850]: wlxe0e1a93655e3: STA 1c:1b:b5:f7:8a:c8 IEEE 802.11: authenticated
Dec 27 19:22:50 R13 hostapd[1850]: wlxe0e1a93655e3: STA 1c:1b:b5:f7:8a:c8 IEEE 802.11: associated (aid 1)
Dec 27 19:22:50 R13 hostapd[1850]: wlxe0e1a93655e3: STA 1c:1b:b5:f7:8a:c8 RADIUS: starting accounting session CF9774E62820016C
Dec 27 19:22:50 R13 hostapd[1850]: wlxe0e1a93655e3: STA 1c:1b:b5:f7:8a:c8 WPA: pairwise key handshake completed (RSN)

# ps aux | grep hostapd
root      1850  0.0  0.2   7008  4400 ?        Ss   18:12   0:00 /usr/sbin/hostapd -B -P /run/hostapd.wlxe0e1a93655e3.pid -dd /etc/hostapd/wlxe0e1a93655e3.conf

morrownr commented 8 months ago

@EasyNetDev

Would you mind if I posted a suggested hostapd.conf for you to try?

EasyNetDev commented 8 months ago

@EasyNetDev

Would you mind if I posted a suggested hostapd.conf for you to try?

Not at all :). I'm glad to test different setups.

whitslack commented 8 months ago

@morrownr: Here's my complete network setup.

The system is a JetWay JNC9KDL-2550 mini-ITX motherboard running an Intel Atom D2550 quad-core 32-bit x86 CPU with 2 GB DDR3 DRAM and a Kingspec 8GB SATA Flash disk-on-module. There are two onboard RTL8169 gigabit Ethernet NICs (named eth0 and eth1) and exactly two attached USB devices: an ALFA AWUS036ACM 802.11ac adapter with the MT7612U chipset (named wifi5), and an ALFA AWUS036AXML 802.11ax adapter with the MT7921AUN chipset (named wifi6). There is no network manager on the system; rather, the network interfaces are configured by the ip applet of Busybox v1.36.1, running from inittab.

The startup of the system executes this procedure:

  1. /sbin/ipset restore </etc/ipset where /etc/ipset contains (with MAC addresses shown partially redacted here):
    create no-gateway hash:mac
    add no-gateway 30:DE:4B:00:00:00
    add no-gateway 84:D8:1B:00:00:00
    add no-gateway 84:D8:1B:00:00:01
    add no-gateway E0:46:EE:00:00:00
    add no-gateway E4:C3:2A:00:00:00
    add no-gateway E4:C3:2A:00:00:01
    add no-gateway E4:C3:2A:00:00:02
    add no-gateway E4:C3:2A:00:00:03
    add no-gateway E4:C3:2A:00:00:04
    add no-gateway E8:48:B8:00:00:00
    add no-gateway E8:48:B8:00:00:01
  2. /sbin/iptables-restore </etc/rules where /etc/rules contains (with comments added here for clarity):
    *filter
    # accept all ICMP packets addressed to the router
    -A INPUT -p icmp -j ACCEPT
    # accept all DHCP client requests except from the WAN interfaces
    -A INPUT ! -i eth0+ -p udp --dport 67 -j ACCEPT
    # reject all other packets addressed to the router from MACs in the no-gateway set
    -A INPUT ! -i eth0+ -m set --match-set no-gateway src -j REJECT --reject-with icmp-admin-prohibited
    # accept all UDP DNS queries except from the WAN interfaces
    -A INPUT ! -i eth0+ -p udp --dport 53 -j ACCEPT
    # accept all TCP DNS queries except from the WAN interfaces
    -A INPUT ! -i eth0+ -p tcp --dport 53 -j ACCEPT
    # reject all packets from the IoT LAN except those in established flows
    -A INPUT -i br-IoT -m state ! --state ESTABLISHED,RELATED -j REJECT --reject-with icmp-admin-prohibited
    # refuse to forward from the LANs to the WANs all packets from MACs in the no-gateway set
    -A FORWARD ! -i eth0+ -o eth0+ -m set --match-set no-gateway src -j REJECT --reject-with icmp-admin-prohibited
    # refuse to forward from the IoT LAN to any other LAN all packets not in established flows
    -A FORWARD -i br-IoT ! -o eth0+ -m state ! --state ESTABLISHED,RELATED -j REJECT --reject-with icmp-admin-prohibited
    COMMIT
    *nat
    # route all TCP packets from the fiber WAN to the DMZ LAN host
    -A PREROUTING -p tcp -i eth0.42 -j DNAT --to 192.168.42.2
    # route all TCP packets from the coax WAN to the DMZ LAN host
    -A PREROUTING -p tcp -i eth0.69 -j DNAT --to 192.168.69.2
    # route all UDP packets (except DHCP replies) from the fiber WAN to the DMZ LAN host
    -A PREROUTING -p udp -m udp ! --dport 68 -i eth0.42 -j DNAT --to 192.168.42.2
    # route all UDP packets (except DHCP replies) from the coax WAN to the DMZ LAN host
    -A PREROUTING -p udp -m udp ! --dport 68 -i eth0.69 -j DNAT --to 192.168.69.2
    # masquerade the source address of all packets going out on the WAN interfaces
    -A POSTROUTING -o eth0+ -j MASQUERADE
    COMMIT
  3. /sbin/ip6tables-restore </etc/rules6 where /etc/rules6 contains (with comments added here for clarity):
    *filter
    # accept all ICMPv6 packets addressed to the router
    -A INPUT -p icmpv6 -j ACCEPT
    # reject all other packets addressed to the router from MACs in the no-gateway set
    -A INPUT ! -i eth0+ -m set --match-set no-gateway src -j REJECT --reject-with icmp6-adm-prohibited
    # accept all UDP DNS queries except from the WAN interfaces
    -A INPUT ! -i eth0+ -p udp --dport 53 -j ACCEPT
    # accept all TCP DNS queries except from the WAN interfaces
    -A INPUT ! -i eth0+ -p tcp --dport 53 -j ACCEPT
    # reject all packets from the IoT LAN except those in established flows
    -A INPUT -i br-IoT -m state ! --state ESTABLISHED,RELATED -j REJECT --reject-with icmp6-adm-prohibited
    # refuse to forward from the LANs to the WANs all packets from MACs in the no-gateway set
    -A FORWARD ! -i eth0+ -o eth0+ -m set --match-set no-gateway src -j REJECT --reject-with icmp6-adm-prohibited
    # refuse to forward from the IoT LAN to any other LAN all packets not in established flows
    -A FORWARD -i br-IoT ! -o eth0+ -m state ! --state ESTABLISHED,RELATED -j REJECT --reject-with icmp6-adm-prohibited
    COMMIT
  4. Create the two WAN virtual interfaces:
    ip link add link eth0 name eth0.42 type vlan id 42
    ip link add link eth0 name eth0.69 type vlan id 69
  5. Create the IoT LAN virtual interface:
    ip link add link eth1 name eth1-IoT type vlan id 2
  6. Create the OpenVPN TAP interface:
    openvpn --syslog --mktun --dev tap0
  7. Create and populate the trusted LAN bridge:
    brctl addbr br0
    brctl addif br0 eth1
    brctl addif br0 tap0
  8. Create the untrusted IoT LAN bridge:
    ip link add br-IoT type bridge
  9. Install the CAKE queuing discipline on the coax WAN interface:
    /sbin/tc -batch /etc/tc where /etc/tc contains:
    qdisc add dev eth0.69 root handle 1: cake bandwidth 12100kbit docsis internet dual-srchost nat diffserv4 ack-filter
  10. Bring all the network interfaces up:
    ip link set lo up
    ip link set eth0 up
    ip link set eth0.42 up
    ip link set eth0.69 up
    ip link set eth1 up
    ip link set eth1-IoT up master br-IoT
    ip link set tap0 up
    ip link set br0 up
    ip link set br-IoT up
  11. Add static IP addresses to the LAN interfaces:
    ip addr add 192.168.0.1/23 dev br0
    ip addr add 192.168.2.1/24 dev br-IoT
    ip addr add 192.168.42.1/24 dev br0
    ip addr add 192.168.69.1/24 dev br0
  12. Add policy routing rules for the fiber and coax DMZs:
    ip rule add from 192.168.42.0/24 lookup 42 priority 1042
    ip rule add from 192.168.69.0/24 lookup 69 priority 1069
  13. /sbin/dnsmasq where /etc/dnsmasq.conf contains (with MAC addresses shown partially redacted here):

    no-hosts
    log-queries=extra
    except-interface=eth0*
    bogus-priv
    resolv-file=/tmp/resolv.conf
    domain-needed
    cache-size=10000
    dhcp-range=192.168.1.1,192.168.1.254,720h
    dhcp-range=set:IoT,192.168.2.100,192.168.2.199,24h
    dhcp-range=::,constructor:br0,ra-only,ra-names
    dhcp-option=vendor:MSFT,1,2i
    dhcp-authoritative
    dhcp-rapid-commit
    log-dhcp
    dhcp-script=/bin/logger
    script-on-renewal
    enable-ra
    ra-param=br0,mtu:eth0,high,1800,9000
    
    dhcp-host=30:DE:4B:00:00:00,set:no-gateway,KS200M-0000
    dhcp-host=70:86:CE:00:00:00,MAW12V1QWT-0000
    dhcp-host=84:D8:1B:00:00:00,set:no-gateway,KP115-0005
    dhcp-host=84:D8:1B:00:00:01,set:no-gateway,KP303-0001
    dhcp-host=B8:27:EB:00:00:00,id:*
    dhcp-host=B8:8C:29:00:00:01,MAW12V1QWT-0001
    dhcp-host=E0:46:EE:00:00:00,set:no-gateway,GS108Ev3
    dhcp-host=E4:C3:2A:00:00:00,set:no-gateway,KP115-0000
    dhcp-host=E4:C3:2A:00:00:01,set:no-gateway,KP115-0001
    dhcp-host=E4:C3:2A:00:00:02,set:no-gateway,KP115-0002
    dhcp-host=E4:C3:2A:00:00:03,set:no-gateway,KP115-0003
    dhcp-host=E4:C3:2A:00:00:04,set:no-gateway,KP115-0004
    dhcp-host=E8:48:B8:00:00:00,set:no-gateway,KL125-0000
    dhcp-host=E8:48:B8:00:00:01,set:no-gateway,KL125-0001
    dhcp-option=tag:no-gateway,option:router
    dhcp-option=tag:no-gateway,option:dns-server
    
    domain=home.mattwhitlock.com
    local=/home.mattwhitlock.com/
    cname=home.mattwhitlock.com,Crushinator.home.mattwhitlock.com,86400
    address=/public.home.mattwhitlock.com/
    interface-name=public.home.mattwhitlock.com,eth0.42/4
    interface-name=public.home.mattwhitlock.com,eth0.69/4
    
    # https://pgl.yoyo.org/as/serverlist.php?hostformat=dnsmasq-server&showintro=1&mimetype=plaintext
    servers-file=/etc/adservers.conf
  14. /sbin/dhcpcd where /etc/dhcpcd.conf contains:

    debug
    allowinterfaces eth0.42 eth0.69
    clientid
    noipv4ll
    noipv6rs
    script /etc/dhcpcd.sh
    
    require dhcp_server_identifier
    option domain_name_servers domain_name domain_search
    option ntp_servers
    option rapid_commit
    
    interface eth0.42
            request 216.212.37.161
            ipv6rs
            ia_pd 0 br0/0/64
    
    interface eth0.69
            request 174.169.193.33
            ipv6rs
            ia_pd 0/2601:18c:9082:afd::/64 br0/0/64

    /etc/dhcpcd.sh contains:

    #!/bin/sh
    
    regen_resolv() {
            search=
            {
                    for each in /tmp/resolv-*.conf ; do
                            while read -r keyword args ; do
                                    case "${keyword}" in
                                            domain|search)
                                                    search="${search} ${args}"
                                                    ;;
                                            *)
                                                    echo "${keyword} ${args}"
                                                    ;;
                                    esac
                            done <"${each}"
                    done
                    [ -n "${search}" ] && echo "search${search}"
            } >/tmp/resolv.conf.new
            mv /tmp/resolv.conf.new /tmp/resolv.conf
    }
    
    case "${reason}" in
    
    BOUND|REBIND)
            case "${interface}" in
                    eth0.[0-9]*)
                            ip rule add from "${new_ip_address}" lookup "${interface#eth0.}" priority "${interface#eth0.}"
                            for each in ${new_routers} ; do
                                    ip route add default via "${each}" dev "${interface}" src "${new_ip_address}" metric "${ifmetric}" table "${interface#eth0.}"
                            done
                            ;;
            esac
    
            logger -t dhcp "Upstream routers: ${new_routers:-(none)}"
    
            logger -t dhcp "Domain name: ${new_domain_name:-(none)}"
            logger -t dhcp "DNS servers: ${new_domain_name_servers:-(none)}"
            {
                    [ -n "${new_domain_name}" ] && echo "domain ${new_domain_name}"
                    for each in ${new_domain_name_servers} ; do
                            echo "nameserver ${each}"
                            for router in ${new_routers} ; do
                                    ip route add "${each}" via "${router}" dev "${interface}" src "${new_ip_address}" metric "${ifmetric}"
                            done
                    done
            } >"/tmp/resolv-${interface}.conf"
            regen_resolv
    
            reason=RENEW "${0}" "${@}"
            ;;
    
    RENEW)
            logger -t dhcp "NTP servers: ${new_ntp_servers:-(none)}"
            for each in ${new_ntp_servers:-pool.ntp.org} ; do
                    /sbin/ntpclient -c 1 -L -n -s "${each}" && break
            done
            hwclock -w -u
            ;;
    
    EXPIRE|NAK)
            case "${interface}" in
                    eth0.[0-9]*)
                            ip rule del priority "${interface#eth0.}"
                            ip route flush table "${interface#eth0.}"
                            ;;
            esac
    
            rm -f "/tmp/resolv-${interface}.conf"
            regen_resolv
            ;;
    
    esac
  15. /sbin/openvpn --syslog --config /etc/openvpn/openvpn.conf where /etc/openvpn/openvpn.conf contains:

    server-bridge nogw
    dev tap0
    
    keepalive 10 60
    persist-tun
    persist-key
    
    ca /etc/openvpn/ca.crt
    dh /etc/openvpn/dh1024.pem
    cert /etc/openvpn/home.mattwhitlock.com.crt
    key /etc/openvpn/home.mattwhitlock.com.key
    tls-auth /etc/openvpn/ta.key 0
    cipher AES-256-GCM
    data-ciphers AES-256-GCM:AES-128-GCM:AES-256-CBC
  16. /etc/hostapd.sh, which contains (with MAC addresses shown partially redacted here):

    #!/bin/sh
    
    while : ; do
            nameif -s wifi5 00:c0:ca:00:00:00 wifi6 00:c0:ca:00:00:01
            [ -d /sys/class/net/wifi5 -a -d /sys/class/net/wifi6 ] && break
            sleep 1
    done
    
    exec /sbin/hostapd -s /etc/hostapd/wifi5.conf /etc/hostapd/wifi6.conf

    This takes care of waiting for the USB Wi-Fi adapters to appear on the bus, renaming their network interfaces to wifi5 and wifi6, and starting hostapd.

/etc/hostapd/wifi5.conf contains (with comments stripped and passphrase redacted):

interface=wifi5
bridge=br-IoT
driver=nl80211
logger_syslog=-1
logger_syslog_level=2
logger_stdout=-1
logger_stdout_level=2
ctrl_interface=/var/run/hostapd
ctrl_interface_group=0
ssid=Whitslack-IoT
country_code=US
ieee80211d=1
ieee80211h=1
hw_mode=g
channel=11
min_tx_power=20
beacon_int=100
dtim_period=2
max_num_sta=255
rts_threshold=-1
fragm_threshold=-1
preamble=1
macaddr_acl=0
auth_algs=1
ignore_broadcast_ssid=0
wmm_enabled=1
uapsd_advertisement_enabled=1
wmm_ac_bk_cwmin=4
wmm_ac_bk_cwmax=10
wmm_ac_bk_aifs=7
wmm_ac_bk_txop_limit=0
wmm_ac_bk_acm=0
wmm_ac_be_aifs=3
wmm_ac_be_cwmin=4
wmm_ac_be_cwmax=10
wmm_ac_be_txop_limit=0
wmm_ac_be_acm=0
wmm_ac_vi_aifs=2
wmm_ac_vi_cwmin=3
wmm_ac_vi_cwmax=4
wmm_ac_vi_txop_limit=94
wmm_ac_vi_acm=0
wmm_ac_vo_aifs=2
wmm_ac_vo_cwmin=2
wmm_ac_vo_cwmax=3
wmm_ac_vo_txop_limit=47
wmm_ac_vo_acm=0
ap_isolate=1
ieee80211n=1
ht_capab=[LDPC][HT40+][HT40-][GF][SHORT-GI-20][SHORT-GI-40][TX-STBC][RX-STBC1]
require_ht=1
nas_identifier=iot.home.mattwhitlock.com
wpa=2
extended_key_id=2
wpa_passphrase=##REDACTED##
wpa_key_mgmt=WPA-PSK
rsn_pairwise=CCMP
ieee80211w=1
beacon_prot=1
ocv=2
time_advertisement=2
time_zone=EST5EDT
wnm_sleep_mode=1
rrm_neighbor_report=1
rrm_beacon_report=1
stationary_ap=1

/etc/hostapd/wifi6.conf contains (with comments stripped and passphrase redacted):

interface=wifi6
bridge=br0
driver=nl80211
logger_syslog=-1
logger_syslog_level=2
logger_stdout=-1
logger_stdout_level=2
ctrl_interface=/var/run/hostapd
ctrl_interface_group=0
ssid=Whitslack
country_code=US
ieee80211d=1
ieee80211h=1
hw_mode=g
channel=6
min_tx_power=20
beacon_int=100
dtim_period=2
max_num_sta=255
rts_threshold=-1
fragm_threshold=-1
preamble=1
macaddr_acl=0
auth_algs=1
ignore_broadcast_ssid=0
wmm_enabled=1
uapsd_advertisement_enabled=1
wmm_ac_bk_cwmin=4
wmm_ac_bk_cwmax=10
wmm_ac_bk_aifs=7
wmm_ac_bk_txop_limit=0
wmm_ac_bk_acm=0
wmm_ac_be_aifs=3
wmm_ac_be_cwmin=4
wmm_ac_be_cwmax=10
wmm_ac_be_txop_limit=0
wmm_ac_be_acm=0
wmm_ac_vi_aifs=2
wmm_ac_vi_cwmin=3
wmm_ac_vi_cwmax=4
wmm_ac_vi_txop_limit=94
wmm_ac_vi_acm=0
wmm_ac_vo_aifs=2
wmm_ac_vo_cwmin=2
wmm_ac_vo_cwmax=3
wmm_ac_vo_txop_limit=47
wmm_ac_vo_acm=0
ieee80211n=1
ht_capab=[LDPC][HT40+][HT40-][GF][SHORT-GI-20][SHORT-GI-40][TX-STBC][RX-STBC1][MAX-AMSDU-7935]
require_ht=1
vht_capab=[MAX-MPDU-11454][RXLDPC][SHORT-GI-80][TX-STBC-2BY1][RX-STBC-1][SU-BEAMFORMEE][BF-ANTENNA-4][MAX-A-MPDU-LEN-EXP7][RX-ANTENNA-PATTERN][TX-ANTENNA-PATTERN]
ieee80211ax=1
nas_identifier=wifi6.home.mattwhitlock.com
wpa=2
extended_key_id=2
wpa_passphrase=##REDACTED##
wpa_key_mgmt=WPA-PSK WPA-PSK-SHA256
rsn_pairwise=CCMP CCMP-256
ieee80211w=1
beacon_prot=1
ocv=1
time_advertisement=2
time_zone=EST5EDT
wnm_sleep_mode=1
rrm_neighbor_report=1
rrm_beacon_report=1
stationary_ap=1

morrownr commented 8 months ago

# SSID
ssid=myPI-WiFi4
# PASSPHRASE
wpa_passphrase=myPW1234
# Band: a = 5Ghz (a/n/ac), g = 2Ghz (b/g/n)
hw_mode=g
# Channel
channel=6
# Country code
country_code=US

# Bridge interface
bridge=br0
# WiFi interface
interface=wlx0013ef6f0a98

# nl80211 is used with all Linux mac80211 (in-kernel) and modern Realtek drivers
driver=nl80211

# security
# auth_algs=3 is required for WPA3-SAE and WPA3-SAE Transition mode
auth_algs=1
macaddr_acl=0
ignore_broadcast_ssid=0
wpa=2
wpa_pairwise=CCMP
# WPA2-AES
wpa_key_mgmt=WPA-PSK

# IEEE 802.11n
ieee80211n=1
wmm_enabled=1
#
# Note: Only one ht_capab= line should be active. The content of these lines is
# determined by the capabilities of your adapter.
#
# generic 20 MHz setting
ht_capab=[SHORT-GI-20]
#
# mt7921au  (HT capabilities 0x9ff)
# ht_capab=[LDPC][HT40+][HT40-][GF][SHORT-GI-20][SHORT-GI-40][TX-STBC][RX-STBC1][MAX-AMSDU-7935]
#

My suggestion is to start simple. Only use the active lines as shown to start. Modify things such as SSID as desired. If this gives solid, stable results, then add additional capabilities one at a time and test until you run into a problem. There is a bug in hostap with HE, 40 MHz width, and the 2.4 GHz band. The fix is in master but that means you would have to compile it. I'm not saying that is contributing to your problem but rather is just something to keep in mind.

I have begun testing using 2.4.

whitslack commented 8 months ago

@morrownr: You'll be effectively running at 802.11g rates unless you configure WMM. As far as I am aware, it's not enabled in hostapd by default.

morrownr commented 8 months ago

@whitslack

It's there.

# IEEE 802.11n
ieee80211n=1
wmm_enabled=1

I haven't had time to look at yours so that reply was aimed at @EasyNetDev . Hopefully I am not mixing things up while looking at so many configuration files.

whitslack commented 8 months ago

@morrownr: Do you not need to set all the WMM queue parameters as well? (It's always been unclear to me whether the recommended values are in effect by default when wmm_enabled=1, so I've always set them all explicitly just to be safe.)

morrownr commented 8 months ago

@whitslack

Do you not need to set all the WMM queue parameters as well?

Not unless you have a specific need such as VoIP, as far as I know. I always turn it on because the WiFi 4 (n) spec calls for it but I don't tweak it by adding the various settings. I am willing to change my thoughts on this topic given information that shows how it can help.

@morrownr

whitslack commented 8 months ago

@morrownr: I haven't ever wanted to tweak the parameters, as frankly I've never had so much simultaneous Wi-Fi traffic that I actually needed to prioritize some flows over others, but even if I had, the recommended values look sensible to me, as far as I understand them. The only reason I set them in my config at all is that the comments in the example hostapd.conf don't explicitly say what the defaults are, and I know no way of checking that the correct values are really being used. If you say that the recommended values are what get used if nothing is explicitly specified, I would believe you.

morrownr commented 8 months ago

If you say that the recommended values are what get used if nothing is explicitly specified, I would believe you.

I can't say what the default values are. Like you said, the hostap doc (the example hostapd.conf) is not specific. We might dive into the source or find a place to leave a message. I've been using hostapd for a long time and this is just not a subject that has come up. I add wmm_enabled=1 because you are stuck at 54 Mbps if you don't.
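The rule of thumb above (turn on WMM whenever 802.11n is enabled) can be checked mechanically. What follows is only a minimal sketch: `check_wmm` is a hypothetical helper written for this thread, not part of hostapd, and it only looks at uncommented `key=value` lines in the file it is given. It says nothing about which `wmm_ac_*` defaults hostapd applies.

```shell
#!/bin/sh
# check_wmm FILE
# Exit status 0 if FILE either does not enable 802.11n/ac/ax at all,
# or enables it together with wmm_enabled=1. Non-zero otherwise.
# (Hypothetical helper; only uncommented lines are considered.)
check_wmm() {
    if grep -Eq '^(ieee80211n|ieee80211ac|ieee80211ax)=1' "$1"; then
        grep -Eq '^wmm_enabled=1[[:space:]]*$' "$1"
    fi
}

# Check every config file named on the command line.
for conf in "$@"; do
    check_wmm "$conf" || echo "$conf: 802.11n/ac/ax enabled without wmm_enabled=1"
done
```

Saved under a name of your choosing, it could be run against the two files discussed here, for example `sh check_wmm.sh /etc/hostapd/wifi5.conf /etc/hostapd/wifi6.conf`.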

I am studying your hostapd.conf files. I see that both are on the 2GHz band, one on channel 6 and the other on channel 11. I see something that causes concern: you are advertising a channel width of 40 MHz on both, which means the signals would overlap:

ht_capab=[LDPC][HT40+][HT40-][GF][SHORT-GI-20][SHORT-GI-40][TX-STBC][RX-STBC1]

You might try the following in both conf files:

ht_capab=[LDPC][SHORT-GI-20][TX-STBC][RX-STBC1]

I see some other things in both files that are just not what I would do. I am not saying you are wrong because wifi is complicated and I don't know exactly what you have in mind so it is what it is. I could take your conf files and modify them to what I would use if you want.
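The overlap concern comes down to simple arithmetic. The sketch below assumes the usual approximation for the 2.4 GHz band: channel centers at 2407 + 5 × channel MHz, 20 MHz of spectrum per channel, and a secondary channel 20 MHz above (HT40+) or below (HT40-) the primary. The function names are made up for illustration, and the HT40 directions in the examples are assumptions, since both configs list `[HT40+]` and `[HT40-]`.

```shell
#!/bin/sh
# ht_span CHANNEL [+|-]  ->  prints "LOW HIGH" in MHz
# "+" treats the BSS as HT40+, "-" as HT40-; with no second argument
# the BSS is treated as 20 MHz wide. (Hypothetical helper.)
ht_span() {
    center=$((2407 + 5 * $1))
    low=$((center - 10))
    high=$((center + 10))
    case "${2:-}" in
        +) high=$((high + 20)) ;;
        -) low=$((low - 20)) ;;
    esac
    echo "${low} ${high}"
}

# spans_overlap LOW1 HIGH1 LOW2 HIGH2  ->  exit status 0 if the ranges overlap
spans_overlap() {
    [ "$1" -lt "$4" ] && [ "$3" -lt "$2" ]
}

ht_span 6 +     # prints: 2427 2467
ht_span 11 -    # prints: 2432 2472
ht_span 6       # prints: 2427 2447
ht_span 11      # prints: 2452 2472

# Word splitting of the command substitutions is intentional here.
spans_overlap $(ht_span 6 +) $(ht_span 11 -) && echo "40 MHz on both: overlap"
spans_overlap $(ht_span 6) $(ht_span 11) || echo "20 MHz on both: no overlap"
```

Under these assumptions, two 40 MHz BSSes on channels 6 and 11 would share most of their spectrum, while two 20 MHz BSSes on the same channels would not touch.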

@morrownr

morrownr commented 8 months ago

@whitslack

Regarding:

You might try the following in both conf files:

ht_capab=[LDPC][SHORT-GI-20][TX-STBC][RX-STBC1]

Depending on the situation, you could leave one on a width of 40 while taking the other down to 20. That would take the entire 2GHz spectrum. To do that:

Channel 11 ht_capab=[LDPC][SHORT-GI-20][TX-STBC][RX-STBC1]

Channel 3 ht_capab=[LDPC][HT40-][SHORT-GI-20][SHORT-GI-40][TX-STBC][RX-STBC1]

Of note: The ht and vht example capab statements I have in the example hostapd.conf files give the full capabilities of the chipsets but you do not need to use all of the capabilities. It depends on what you are trying to do.
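When picking a 20/40 split like this, it helps to check where the secondary channel lands for a given primary channel and HT40 direction. The sketch below assumes the secondary channel sits four channel numbers above (HT40+) or below (HT40-) the primary, and that channels 1 to 11 are usable (both configs set `country_code=US`). `ht40_secondary` and `MAX_CHANNEL` are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# ht40_secondary PRIMARY +|-
# Prints the secondary channel number, or returns non-zero if it would
# fall outside channels 1..MAX_CHANNEL. (Hypothetical helper; MAX_CHANNEL
# defaults to 11 on the assumption of a US regulatory domain.)
ht40_secondary() {
    max=${MAX_CHANNEL:-11}
    case "${2:-}" in
        +) sec=$(($1 + 4)) ;;
        -) sec=$(($1 - 4)) ;;
        *) return 2 ;;
    esac
    [ "$sec" -ge 1 ] && [ "$sec" -le "$max" ] || return 1
    echo "$sec"
}

ht40_secondary 1 +      # prints: 5
ht40_secondary 5 -      # prints: 1
ht40_secondary 3 +      # prints: 7
ht40_secondary 3 - || echo "channel 3 with HT40-: no valid secondary channel"
```

By this arithmetic, if "Channel 3" is the value placed in `channel=`, `[HT40-]` would put the secondary channel below channel 1. Primary channel 1 with `[HT40+]` or primary channel 5 with `[HT40-]` both give a 40 MHz block centered on channel 3, and primary channel 3 with `[HT40+]` occupies channels 3 and 7. All three stay clear of a 20 MHz BSS on channel 11.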

@morrownr

whitslack commented 8 months ago

I see something that causes concern in that you are using 40 for a channel width on both which means the signals are overlapping

@morrownr: I'm only advertising that the AP is capable of 40-MHz channel width, which it is. As you surely know, it's practically impossible to get a 2.4GHz-band BSS to actually run at 40-MHz channel width (without hacking the source code to violate the regulations) unless you live on a farm that is completely out of range of any other 2.4GHz networks. In actuality, both of my 2.4GHz-band networks are running with 20-MHz channel width because there are many other visible networks that are running with 20-MHz channel width, and the spec says that 40 MHz may not be used at all if any 20-MHz networks are in range. The access point does a radio scan at startup to check whether it can actually use 40-MHz, and that will fail every time if you live in any settled area.

I see some other things in both files that are just not what I would do. I am not saying you are wrong because wifi is complicated and I don't know exactly what you have in mind so it is what it is.

Some of the options in my conf files are left over from when I was trying to get bandsteering and fast roaming to work. I eventually gave up on those ideas, as there does not seem to be any real way to get clients to switch to a stronger AP until they literally cannot hold onto their current association any longer. It seems like a fundamental design oversight in 802.11.

morrownr commented 8 months ago

@whitslack

The access point does a radio scan at startup to check whether it can actually use 40-MHz, and that will fail every time if you live in any settled area.

I am aware of that but I do live in a settled area with 2GHz APs visible on 1, 6, 11. The only somewhat strong signal is on channel 11. The others are weak. I am able to run 40 MHz width in the lower area of the spectrum. While that would not seem to comply with guidance, I do not know exactly what is considered within range and it may depend on the interpretation of the coder. Is it visible range or usable range? Is there a specific signal level that shuts down 40 MHz width? I have seen a lot of examples over the years of wifi components that do not meet guidance and laws. I have also seen situations where 20 MHz width outperforms 40. I just test and see what performs best while being very stable.

options in my conf files are left over from when I was trying to get bandsteering and fast roaming to work.

Yeah... that is a messy topic.

My dual band setup is very different than yours. I run a 64 bit RasPi4B with a 64 bit OS, RasPiOS 2023-10-10. My networking is very simple and only uses systemd as I pointed out. Network Manager is disabled. My hostapd.conf files are well tested and only contain the lines that are needed for what I want to do. I use an Alfa ACM for 2GHz (WiFi 4) and an Alfa AXML for 5GHz (WiFi 6 and WiFi 5). Very stable. If my AXML was doing 2GHz, I would not use WiFi 6 as there are many 2GHz devices that may not be stable with WiFi 6 on 2GHz.

So far I have not been able to find the problems you and the others are seeing, but I have only been looking on 5GHz. I somehow missed that the problems are with the 2GHz band so I am changing my testing. My plan is to change out the Alfa ACM for another adapter using the mt7921u driver and I will test to see what I get over time. I'm learning what you guys are doing and trying to follow along. I want to find the problems and offer a simple, repeatable way to duplicate them so as to give the Mediatek devs good vision on what the problem is.

FYI: I am not doing anything with 6GHz (WiFi 6e) yet as I am just not there yet.

@morrownr