morrownr / USB-WiFi

USB WiFi Adapter Information for Linux

List of Bug Reports for the mt7921au chipset / mt7921u driver... #107

Open morrownr opened 1 year ago

morrownr commented 1 year ago

This issue is for maintaining a list of problematic issues that need work. This list will be maintained and updated in this first post by @morrownr . Please add posts to this issue as you have updated information for the existing BUGs in the list or if you have information about a new BUG. Thank you.

Dear Mediatek devs... help is appreciated.


Bug: (2024-04-18) See: https://github.com/morrownr/USB-WiFi/issues/392 . WDS/4addr is not supported in AP mode. First reported with an Alfa AXML adapter (mt7921au chipset, mt7921u driver). The OP is unable to use WDS/4addr in AP mode.

Status: Open

Info: It was reported that this capability does work with an adapter that uses the mt7612u chipset/driver.


Bug: (2024-03-26) See: https://github.com/morrownr/USB-WiFi/issues/378 . WiFi adapter not showing up. First reported with an Alfa AXML adapter (mt7921au chipset, mt7921u driver). The adapter is non-functional until the workaround below is applied.

Status: Open

Workaround: run modprobe -r btusb first, then plug in the USB WiFi adapter.
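For anyone scripting this workaround, here is a minimal sketch of checking whether btusb is currently loaded before plugging in the adapter. It only reads /proc/modules, where each line starts with the module name. The helper names loaded_modules and btusb_is_loaded are illustrative and not part of any tool mentioned in this issue.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # the first field of each line is the module name
    return names


def btusb_is_loaded(proc_modules_text):
    """True if btusb appears as a loaded module in the given text."""
    return "btusb" in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        if btusb_is_loaded(path.read_text()):
            print("btusb is loaded; run 'sudo modprobe -r btusb' before plugging in the adapter")
        else:
            print("btusb is not loaded")
```

This only reports the module state. It does not explain why unloading btusb helps, which is still the open question in this bug.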

More input is needed. Is this a problem with btusb?


Bug: (2023-12-22) Many Linux distros are detecting Bluetooth capability in mt7921au based adapters but none of the adapters on the market have Bluetooth turned on so it won't work. Linux should not be detecting Bluetooth capability when it is actually not available.

Status: Open and ongoing

Here is a link to a location where you can get a copy of the Intel White Paper that explains the details of why USB3 capable WiFi adapters should not have Bluetooth capability turned on:

https://www.usb.org/document-library/usb-30-radio-frequency-interference-impact-24-ghz-wireless-devices

USB3 WiFi adapters should not have Bluetooth turned on as the USB3 will cause interference with Bluetooth. If makers decide they really want Bluetooth capability in an adapter then they need to limit wifi to USB2 capability. All adapters with the mt7921au chipset that I am aware of have Bluetooth turned off so WiFi can operate in USB3 mode. However, there is a bug in that Bluetooth capability is still being detected by Linux distros and the driver/firmware is loading. Systems act like Bluetooth is available but when you try to use the Bluetooth, it won't work. It is not clear to me how this can be fixed but it really does need to be fixed.

This is not a problem with PCIe cards. I have a mt7922 based PCIe card. WiFi and Bluetooth work well together because WiFi uses the PCIe bus and not USB. Please understand that the issue in this bug is not exclusive to this chipset. This is an issue with all USB WiFi adapters. The adapters that have offered both USB WiFi and BT capabilities over the years have limited USB to USB2 to avoid the problem of interference.


Bug: (2023-12-07) Active monitor mode breaks driver.

Status: open

Reporter: @ZerBea
Link: https://github.com/openwrt/mt76/issues/839
Problem: Using active monitor mode breaks the driver.

Driver reports that active monitor mode is possible:

$ iw list | grep active
Device supports active monitor (which will ACK incoming frames)

But if hcxdumptool sets active monitor mode, it stops working.

If active monitor mode is disabled, everything is fine:

0 ERROR(s) during runtime
638 Packet(s) captured by kernel
0 Packet(s) dropped by kernel
1 SHB written to pcapng dumpfile
1 IDB written to pcapng dumpfile
1 ECB written to pcapng dumpfile
83 EPB written to pcapng dumpfile

exit on sigterm

I don't think the problem is related to hcxdumptool, because it can be reproduced with iw, ip link and tshark, too:

$ sudo ip link set wlp22s0f0u4i3 down
$ sudo iw dev wlp22s0f0u4i3 set type monitor
$ sudo ip link set wlp22s0f0u4i3 up
$ tshark -i wlp22s0f0u4i3
22 packets captured

$ sudo ip link set wlp22s0f0u4i3 down
$ sudo iw dev wlp22s0f0u4i3 set monitor active
$ sudo ip link set wlp22s0f0u4i3 up
$ tshark -i wlp22s0f0u4i3
Capturing on 'wlp22s0f0u4i3'
^C
0 packets captured

Background: In active monitor mode, the device ACKs incoming frames addressed to the virtual MAC of the device. This feature is really useful for performing PMKID attacks. At the moment, active monitor mode is working on:

mt76x0u
mt76x2u

It is not working on:

mt7601u
mt7921u

I see two options: active monitor mode should be fixed, or active monitor mode capability should not be reported by the driver.

mt7601u
$ iw list | grep active
Device supports active monitor (which will ACK incoming frames)

mt7921u
$ iw list | grep active
Device supports active monitor (which will ACK incoming frames)
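The grep check above can be scripted when comparing several adapters. A minimal sketch, assuming iw is installed; reports_active_monitor is an illustrative helper name, and the marker string is the one shown in the iw list output quoted in this bug. With several wireless devices attached, iw list prints all of them, so this returns True if any one of them advertises the capability.

```python
import subprocess

# Text printed by `iw list` when a device advertises active monitor support.
ACTIVE_MONITOR_MARKER = "Device supports active monitor"


def reports_active_monitor(iw_list_output):
    """True if any line of `iw list` output advertises active monitor support."""
    return any(ACTIVE_MONITOR_MARKER in line for line in iw_list_output.splitlines())


if __name__ == "__main__":
    try:
        result = subprocess.run(["iw", "list"], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not run 'iw list': {exc}")
    else:
        print("active monitor advertised:", reports_active_monitor(result.stdout))
```

Note that this only shows what the driver advertises. As this bug describes, mt7601u and mt7921u advertise the capability even though capture stops once it is enabled.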


Bug: LED does not function in several of the usb wifi adapters that use the mt7921au chipset.

Status: open, it is unclear what the problem is.

Reported by @morrownr Confirmed by numerous users.


Bug: AP Mode DFS (5 GHz) support is non-functional

Status: open

Reported by @morrownr Confirmed by numerous users.

This is a serious omission: in many parts of the world there are only a few non-DFS channels available, which leads to high levels of congestion.

Dear Mediatek, does your usb chipset competitor support DFS channels in AP Mode? Yes they do. See: out-of-kernel drivers for rtl8812au, rtl8811au, rtl8812bu and rtl8811cu. You need to think about this. Sincerely.


Bug: The txpower reading reported by iw is unusually low (for example, 3 dBm).

Status: open

Reported by several individuals.

This reading must be wrong because actual usage suggests the reading should be much higher.


Bug: (feature request) mt7921u driver does not support 2 AP mode interfaces on one adapter

Status: open

Reported by @whitslack

mt7921u driver does not support 2 instances of AP mode whereas this was common on some drivers for older adapters.

Now:

valid interface combinations:

     * #{ managed, P2P-client } <= 2, #{ AP, P2P-GO } <= 1,
       total <= 2, #channels <= 2

What we want:

valid interface combinations:

     * #{ managed, P2P-client } <= 2, #{ AP, P2P-GO } <= 2,
       total <= 2, #channels <= 2
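The difference between the two listings comes down to the limit on the { AP, P2P-GO } group. A minimal sketch that extracts those limits, assuming the text is copied from the "valid interface combinations" section of iw list and contains a single combination; interface_limits is an illustrative helper, not part of iw:

```python
import re

# Matches one group such as "#{ AP, P2P-GO } <= 1" and captures the types and the limit.
GROUP_RE = re.compile(r"#\{\s*([^}]*?)\s*\}\s*<=\s*(\d+)")

NOW = """
     * #{ managed, P2P-client } <= 2, #{ AP, P2P-GO } <= 1,
       total <= 2, #channels <= 2
"""

WANTED = """
     * #{ managed, P2P-client } <= 2, #{ AP, P2P-GO } <= 2,
       total <= 2, #channels <= 2
"""


def interface_limits(combination_text):
    """Map each interface type to the limit of the group it belongs to.

    The limit is shared by the whole group: "#{ AP, P2P-GO } <= 1" means
    at most one interface in total across AP and P2P-GO.
    """
    limits = {}
    for types, limit in GROUP_RE.findall(combination_text):
        for iftype in types.split(","):
            limits[iftype.strip()] = int(limit)
    return limits


print("AP limit now:   ", interface_limits(NOW)["AP"])
print("AP limit wanted:", interface_limits(WANTED)["AP"])
```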

Bug: connection is dropped and the only way to correct the situation is to reboot (AP mode)

Status: open

Testing to see if SG helps performance:

scatter-gather test with mt7921au based adapter

Issue: connection drops and the only resolution is to reboot the system.

Raspberry Pi 4B RasPiOS 2023-05-03

I changed the module parameter and rebooted between each test so as to alternate between on and off.

iperf3 -c 192.168.1.1 -t 300

scatter-gather off (disable_usb_sg=1)

1:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-300.00 sec  19.9 GBytes   569 Mbits/sec    4             sender
[  5]   0.00-300.01 sec  19.9 GBytes   569 Mbits/sec                  receiver

2: 
[  5]   0.00-300.00 sec  19.9 GBytes   570 Mbits/sec    5             sender
[  5]   0.00-300.01 sec  19.9 GBytes   570 Mbits/sec                  receiver

3:
[  5]   0.00-300.00 sec  20.0 GBytes   573 Mbits/sec    2             sender
[  5]   0.00-300.01 sec  20.0 GBytes   573 Mbits/sec                  receiver

scatter-gather on (disable_usb_sg=0)

1:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-300.00 sec  19.9 GBytes   570 Mbits/sec    1             sender
[  5]   0.00-300.01 sec  19.9 GBytes   570 Mbits/sec                  receiver

2:
[  5]   0.00-300.00 sec  20.0 GBytes   572 Mbits/sec   48             sender
[  5]   0.00-300.01 sec  20.0 GBytes   572 Mbits/sec                  receiver

3:
[  5]   0.00-300.00 sec  19.9 GBytes   571 Mbits/sec    0             sender
[  5]   0.00-300.02 sec  19.9 GBytes   571 Mbits/sec                  receiver

Observation: So much for needing to average the results. I was careful to check that sg was on or off. I have no explanation for how the results could be so close. I see no evidence that sg is providing any performance increase.

Prior to this testing session, I have seen the issue where the connection is dropped and only a reboot will correct the situation. It happened twice a few days ago while testing with sg on. There is a history of this with mt7612u adapters. I have yet to duplicate the issue with sg off.

Conclusion: Further testing on different platforms is needed. I will test x86_64 next. Given the history of sg causing problems such as connections dropping that can only be corrected with a reboot, it may be better for the default to be disable_usb_sg=1 with a follow up to determine what the problem is.
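For reference, the averages of the six runs above can be worked out in a few lines. This is only arithmetic on the sender bitrates copied from the iperf3 summaries in this post; the variable and function names are illustrative.

```python
from statistics import mean

# Sender bitrates in Mbits/sec, copied from the three runs of each setting above.
SG_OFF = [569, 570, 573]  # disable_usb_sg=1
SG_ON = [570, 572, 571]   # disable_usb_sg=0


def summarize(label, samples):
    """Print and return the mean bitrate for one setting."""
    avg = mean(samples)
    spread = max(samples) - min(samples)
    print(f"{label}: mean {avg:.1f} Mbits/sec, spread {spread} Mbits/sec")
    return avg


off_avg = summarize("scatter-gather off", SG_OFF)
on_avg = summarize("scatter-gather on", SG_ON)
print(f"difference (on - off): {on_avg - off_avg:+.2f} Mbits/sec")
```

The means differ by roughly a third of a Mbit/sec, which is smaller than the run-to-run spread within either setting, so these numbers are consistent with the observation that sg gives no measurable throughput gain in this test.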


deren commented 1 year ago

Hi @morrownr

I cannot reproduce the messy system log in 6.0-rc3. Can you please show me the full log and the steps to reproduce (if anything special is needed)?

Thanks, Deren

morrownr commented 1 year ago

Hi @deren

I will retest on the original system plus two additional systems and report as soon as I can.

Regards,

Nick

morrownr commented 1 year ago

Hi @deren

I have numbered the bugs so as to make it easier to reference them. The bug in question is now called Bug 2.

I have withdrawn Bug 2. After additional testing on different hardware, it appears this bug may be unique to that system so I need to back up and see if I can isolate the cause.

Have you been able to duplicate Bug 1?

Regards,

Nick

deren commented 1 year ago

Hi @morrownr

I cannot reproduce Bug 2, either. The BT function always works properly, even when I hot-plug the adapter or reload the driver several times.

But the log is weird to me; it looks like the firmware file is missing from the filesystem. Can you please check whether this problem is related to a specific device?

[ 149.874493] bluetooth hci0: Direct firmware load for mediatek/BT_RAM_CODE_MT7961_1_2_hdr.bin failed with error -2

Regards, Deren

morrownr commented 1 year ago

Hi @deren

The BT function always works properly, even when I hot-plug the adapter or reload the driver several times.

I have a laptop computer that uses a wifi card based on the mt7921 chipset. Bluetooth works well. However, the subject here is about a usb wifi adapter that uses the mt7921au chipset. I have found no evidence that this adapter supports bluetooth. Is the capability turned off in hardware? I don't know but am trying to find out.

But the log is weird to me

What I posted is not information from a single log. The first section shows the log entries when the most up to date BT firmware is installed. The second section shows the log entries when I delete the BT firmware and reboot.

The first section of the log with the firmware installed makes me think the driver is trying to bring BT up but it is unable due to the lack of hardware. Bluetooth support is rare on usb wifi adapters. Could it be that the driver, mt7921u, is making an incorrect assumption that BT hardware is there to use?

My mt7921au based usb wifi adapter is showing no sign of support for BT whether the firmware is installed or not. This adapter is a Comfast CF-951AX.

Regards,

Nick

deren commented 1 year ago

Hi @morrownr

What I posted is not information from a single log. The first section shows the log entries when the most up to date BT firmware is installed. The second section shows the log entries when I delete the BT firmware and reboot.

Got it. I did not get the point.

I have found no evidence that this adapter supports bluetooth. Is the capability turned off in hardware?

What I have is a CF-953AX, and I can verify that the BT function works well on that card. Regarding the CF-951AX, there is some information about a BT function (hope that is real :) ): https://www.sunsky-online.com/p/EDA003280201A/COMFAST-CF-951AX-1800Mbps-USB-3.0-WiFi6-Wireless-Network-Card-Black-.htm

[ 72.869871] Bluetooth: hci1: Opcode 0x c03 failed: -110
[ 74.882933] Bluetooth: hci1: Failed to read MSFT supported features (-110)
[ 76.896658] Bluetooth: hci1: AOSP get vendor capabilities (-110)

I guess you consider the three lines means the BT function not working properly, right? These lines do not show up in my test environment (Ubuntu 20.04 + kernel 6.0-rc5 + CF-953AX), and I think this is related to BT protocol usage in your host system. Can you see BT still alive in your system, such as desktop UI? For example, I can see the device is running after the CF-953AX is plugged in.

# hciconfig
hci1:   Type: Primary  Bus: USB
        BD Address: XX:XX:XX:XX:XX:XX  ACL MTU: 1021:4  SCO MTU: 96:6
        UP RUNNING
        RX bytes:17035 acl:0 sco:0 events:2758 errors:0
        TX bytes:678011 acl:0 sco:0 commands:2756 errors:0

Regards, Deren

morrownr commented 1 year ago

Hi @deren

I guess you consider the three lines means the BT function not working properly, right?

I consider those 3 lines are possibly going to give a hint as to why BT is not working. And it is not working. I wish I was more familiar with BT but I am not so I am having to go slow with my troubleshooting. I also have a little nano BT adapter with a Broadcom chipset. I plugged it in and BT works with it. My test distro is Mint 21 (updated to kernel 6.0 rc3) and a short look at the forums tells me Mint is having problems with BT so I am going to suspend my report pending further investigation. I appreciate your help.

Can you see BT still alive in your system,

Yes. It shows in the gui applet that supports BT. In fact, with the nano adapter plugged in, both show. The difference is that I can pair with other BT devices with the nano adapter but I can't get anything with the Mediatek based adapter. I'll continue trying to determine the problem.

Nick

morrownr commented 1 year ago

@deren

Question: Since you have a Comfast CF-953AX adapter, have you tested AP mode 5 GHz band DFS channel support?

I bring this up due to the MANY users that make use of AP mode. Many products also include usb adapters and AP mode is sometimes an important part of the product. I am contacted at times to serve in a consulting role so I see many use cases that would benefit greatly from this support... and the 5 Realtek drivers I have up here all support this and it works well. There is a gap in capability that needs to be closed and it doesn't seem to be working for me.

Thanks,

Nick

leezu commented 1 year ago

On RasPi4B with Raspbian and kernel 6.0.0-rc7-v8+ (as well as earlier kernels), CF-953AX in AP mode and the latest September 2022 Firmware, I can relatively consistently reproduce firmware hangs by running a speedtest and apt upgrade in parallel on the SC7180 Snapdragon 7c. (Having other devices connected and other traffic patterns sometimes also causes the hang).

After the hang, the AP will recover and the following (starting from 4440.067187) is printed in the kernel log.

[Wed Sep 28 12:48:10 2022] mt7921u 2-1:1.3: WM Firmware Version: ____010000, Build Time: 20220908211021
[Wed Sep 28 12:48:11 2022] mt7921u 2-1:1.3 wlxe0e1a934a6a9: renamed from wlan1
[Wed Sep 28 12:48:12 2022] Bluetooth: hci0: Opcode 0x c03 failed: -110
[Wed Sep 28 12:48:13 2022] IPv6: ADDRCONF(NETDEV_CHANGE): ap0: link becomes ready
[Wed Sep 28 12:48:13 2022] IPv6: ADDRCONF(NETDEV_CHANGE): wlxe0e1a934a6a9: link becomes ready
[Wed Sep 28 14:02:00 2022] mt7921u 2-1:1.3: Message 00020003 (seq 12) timeout
[Wed Sep 28 14:02:00 2022] wlxe0e1a934a6a9: failed to remove key (0, 8c:fd:f0:42:20:cf) from hardware (-110)
[Wed Sep 28 14:02:00 2022] mt7921u 2-1:1.3: timed out waiting for pending tx
[Wed Sep 28 14:02:00 2022] ------------[ cut here ]------------
[Wed Sep 28 14:02:00 2022] WARNING: CPU: 2 PID: 1158 at kernel/kthread.c:659 kthread_park+0xb4/0xd0
[Wed Sep 28 14:02:00 2022] Modules linked in: xt_mark nft_chain_nat ctr aes_arm64 aes_generic ccm xt_MASQUERADE iptable_nat ip6table_nat nf_nat tun mt7921u mt7921_common mt76_connac_lib mt76_usb mt76 mac80211 btusb btrtl btintel btbcm bluetooth ecdh_generic ecc libaes libarc4 sg vc4 snd_soc_hdmi_codec bcm2835_codec(C) drm_display_helper brcmfmac rpivid_hevc(C) cec drm_cma_helper bcm2835_v4l2(C) bcm2835_isp(C) brcmutil v3d drm_kms_helper bcm2835_mmal_vchiq(C) v4l2_mem2mem videobuf2_vmalloc videobuf2_dma_contig gpu_sched videobuf2_memops cfg80211 snd_soc_core drm_shmem_helper videobuf2_v4l2 snd_compress videobuf2_common snd_bcm2835(C) snd_pcm_dmaengine videodev rfkill raspberrypi_hwmon snd_pcm vc_sm_cma(C) mc snd_timer snd syscopyarea sysfillrect sysimgblt fb_sys_fops uio_pdrv_genirq nvmem_rmem uio ip6t_REJECT nf_reject_ipv6 xt_hl ip6_tables ip6t_rt ipt_REJECT nf_reject_ipv4 xt_comment xt_multiport nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
[Wed Sep 28 14:02:00 2022]  nft_compat nf_tables nfnetlink drm fuse drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[Wed Sep 28 14:02:00 2022] CPU: 2 PID: 1158 Comm: kworker/u8:1 Tainted: G         C         6.0.0-rc7-v8+ #4
[Wed Sep 28 14:02:00 2022] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[Wed Sep 28 14:02:00 2022] Workqueue: mt76 mt7921_mac_reset_work [mt7921_common]
[Wed Sep 28 14:02:00 2022] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[Wed Sep 28 14:02:00 2022] pc : kthread_park+0xb4/0xd0
[Wed Sep 28 14:02:00 2022] lr : mt76u_stop_tx+0x278/0x330 [mt76_usb]
[Wed Sep 28 14:02:00 2022] sp : ffffffc00938bc50
[Wed Sep 28 14:02:00 2022] x29: ffffffc00938bc50 x28: 0000000000000000 x27: ffffff8049ee8848
[Wed Sep 28 14:02:00 2022] x26: 0000000000000000 x25: ffffff8049e42280 x24: ffffff8049ee2068
[Wed Sep 28 14:02:00 2022] x23: ffffff8049ee4820 x22: ffffff8049ee6020 x21: ffffff8049ee2048
[Wed Sep 28 14:02:00 2022] x20: ffffff8049368c00 x19: ffffff804c8f0000 x18: 0000000000000000
[Wed Sep 28 14:02:00 2022] x17: 0000000000000001 x16: ffffffdcef6b4e40 x15: 0018a9ea46410578
[Wed Sep 28 14:02:00 2022] x14: 000dd98ec5e42494 x13: 0000000000000213 x12: 00000000fa83b2da
[Wed Sep 28 14:02:00 2022] x11: 0000000000000213 x10: 0000000000001a90 x9 : ffffffdccfd9e8a8
[Wed Sep 28 14:02:00 2022] x8 : ffffff80401ed8f0 x7 : 0000000000000001 x6 : ffffffdcf0b0b0c0
[Wed Sep 28 14:02:00 2022] x5 : ffffffdcf09a9000 x4 : ffffffdcf09a90b0 x3 : 0000000000002800
[Wed Sep 28 14:02:00 2022] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000004
[Wed Sep 28 14:02:00 2022] Call trace:
[Wed Sep 28 14:02:00 2022]  kthread_park+0xb4/0xd0
[Wed Sep 28 14:02:00 2022]  mt76u_stop_tx+0x278/0x330 [mt76_usb]
[Wed Sep 28 14:02:00 2022]  mt7921u_mac_reset+0x88/0x2d8 [mt7921u]
[Wed Sep 28 14:02:00 2022]  mt7921_mac_reset_work+0xac/0x1a0 [mt7921_common]
[Wed Sep 28 14:02:00 2022]  process_one_work+0x1dc/0x450
[Wed Sep 28 14:02:00 2022]  worker_thread+0x154/0x450
[Wed Sep 28 14:02:00 2022]  kthread+0x104/0x110
[Wed Sep 28 14:02:00 2022]  ret_from_fork+0x10/0x20
[Wed Sep 28 14:02:00 2022] ---[ end trace 0000000000000000 ]---
[Wed Sep 28 14:02:00 2022] mt7921u 2-1:1.3: HW/SW Version: 0x8a108a10, Build Time: 20220908210919a

[Wed Sep 28 14:02:00 2022] mt7921u 2-1:1.3: WM Firmware Version: ____010000, Build Time: 20220908211021
[Wed Sep 28 14:02:05 2022] mt7921u 2-1:1.3: Message 00020003 (seq 6) timeout
[Wed Sep 28 14:02:06 2022] mt7921u 2-1:1.3: timed out waiting for pending tx
[Wed Sep 28 14:02:06 2022] mt7921u 2-1:1.3: HW/SW Version: 0x8a108a10, Build Time: 20220908210919a

[Wed Sep 28 14:02:06 2022] mt7921u 2-1:1.3: WM Firmware Version: ____010000, Build Time: 20220908211021

ghost commented 1 year ago

maybe some issues again with usb scatter gather

leezu commented 1 year ago

@ayyyuki1 possible. Based on above call trace, mt7921u driver does call a mt76_usb function. I'll try reproducing the issue with /sys/module/mt76_usb/parameters/disable_usb_sg set to Y

morrownr commented 1 year ago

Based on above call trace, mt7921u driver does call a mt76_usb function. I'll try reproducing the issue with /sys/module/mt76_usb/parameters/disable_usb_sg set to Y

My opinion is that I would like to see either the default setting for scatter-gather changed or the code simply cleaned out of mt76. I have seen the problems that it causes with mt7612u based adapters and I have taken the time to do extensive tests to see if it increases speed to any worthwhile degree. My conclusion is that I cannot see any increase in speed. I say dump it from the code.

bjlockie commented 1 year ago

I think the problem with scatter gather was suspected to be a bug in the USB hardware of a Raspberry Pi.

leezu commented 1 year ago

With /sys/module/mt76_usb/parameters/disable_usb_sg=Y I'm also able to reproduce the hang, though it does appear a little harder to reproduce.

The Raspberry Pi team has recently made progress on a related USB issue and established that it was due to a VL805 firmware bug: https://github.com/raspberrypi/linux/issues/4844

"VIA in Taiwan have reproduced the issue and are investigating. There's not likely to be another software workaround proposed here."

"The fix as recommended by VIA is to disable bursts if this sequence of TRBs can occur."

There was speculation about whether this helps with the Mediatek hangs (https://github.com/raspberrypi/linux/pull/5173#discussion_r974429018), but I verified that it does not help.

Given the history of RPi4 USB host controller firmware bugs, I've opened https://github.com/raspberrypi/linux/issues/5193. I've also opened https://github.com/raspberrypi/linux/issues/5192 to track that on RPi4, disconnecting one USB device causes failures on other USB devices (i.e. disconnecting a mass storage device triggers mt7921au errors).

bjlockie commented 1 year ago

Can the firmware of the VIA chip be upgraded?

leezu commented 1 year ago

Yes.

% sudo rpi-eeprom-update
[...]
  VL805_FW: Using bootloader EEPROM
     VL805: up to date
   CURRENT: 000138a1
    LATEST: 000138a1

morrownr commented 1 year ago

On RasPi4B with Raspbian and kernel 6.0.0-rc7-v8+ (as well as earlier kernels), CF-953AX in AP mode and the latest September 2022 Firmware, I can relatively consistently reproduce firmware hangs by running a speedtest and apt upgrade in parallel on the SC7180 Snapdragon 7c. (Having other devices connected and other traffic patterns sometimes also causes the hang).

@leezu

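I'm looking to move this bug report up into message 1 with the other bugs. You mentioned the distro and kernel but did not mention:

  • band/channel ?
  • hostapd ? and hostapd.conf
  • WPA3 ?
  • wpa_supplicant version ?
  • 32 bit ?
  • hostapd.log showing anything ?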

bjlockie commented 1 year ago

I also want to know if it is using an extension cable, and whether it is using a powered USB hub. I had to use a powered USB hub for the keyboard/mouse of my hdmi switch.

Oct. 8, 2022 00:55:53 Nick wrote:

On RasPi4B with Raspbian and kernel 6.0.0-rc7-v8+[https://github.com/raspberrypi/linux/commit/3fb5ca89c844eb9fda924a5e76ffab9e8c068675] (as well as earlier kernels), CF-953AX in AP mode and the latest September 2022 Firmware, I can relatively consistently reproduce firmware hangs by running a speedtest and apt upgrade in parallel on the SC7180 Snapdragon 7c. (Having other devices connected and other traffic patterns sometimes also causes the hang).

@leezu

I'm looking to move this bug report up into message 1 with the other bugs. You mentioned the distro and kernel but did not mention:

  • band/channel ?
  • hostapd ? and hostapd.conf
  • WPA3 ?
  • wpa_supplicant version ?
  • 32 bit ?
  • hostapd.log showing anything ?


leezu commented 1 year ago

@morrownr @bjlockie

I'm looking to move this bug report up into message 1 with the other bugs. You mentioned the distro and kernel but did not mention:

band/channel: 36
hostapd: v2.10 and hostapd.conf
WPA3: wpa_key_mgmt=WPA-PSK
wpa_supplicant version: 2.9
32 bit: running with arm_64bit=1 RPi config but getconf LONG_BIT returns 32.
hostapd.log showing anything: Please see the two journals regarding two separate occurrences attached: journal-no-backtrace.txt journal-with-backtrace.txt

I also want to know if it is using an extension cable, and whether it is using a powered USB hub. I had to use a powered USB hub for the keyboard/mouse of my hdmi switch.

It uses this extension lead https://smile.amazon.com/gp/product/B082HQXRZ1/ as without extension lead, the CF-953AX covers up the LAN port.

morrownr commented 1 year ago

@leezu

I added your firmware hanging issue as Bug 6. Please check it to see if I did a good job.

Also, I have been running AP mode 5 GHz over the last few days with my RasPi4B. It appears I am seeing the same problem. Every few hours I will lose internet access. It can and usually does recover but it can take a long time. I shut down scatter/gather this morning to see if it helps. We will see.

Can I get you to test 2.4 GHz AP mode? Look at Bug 2. I am reporting that 2.4 GHz AP mode is not working. Would like confirmation.

wpa_supplicant version: 2.9

FYI: I think you need 2.10 for WPA3 support. I have a guide to do the upgrade if you want a copy.

Nick

gifter77 commented 1 year ago

Regarding Bug 6: if I build a vanilla 6.4-rc Linux kernel for my Raspberry Pi 4, I cannot reproduce the issue anymore.

morrownr commented 1 year ago

Hi @gifter77

Copy all. Will update.

gifter77 commented 1 year ago

Actually false alert @morrownr I reproduced even with vanilla 6.4-rc1 kernel :/

morrownr commented 1 year ago

@gifter77

#254

Read that issue. In my first post, there is a link to a patch. You very well may have run into the same problem as @whitslack . He submitted a patch that is not in 6.4 as I have not seen an approval yet. You might think about applying the patch to see what you get.

gifter77 commented 1 year ago

@morrownr thanks. I've compiled the latest raspberry pi kernel (6.1) with this patch and unfortunately I can still reproduce the issue pretty easily with the same trace as @leezu below:

[  107.537594] usbcore: deregistering interface driver mt7921u
[  107.559823] wlan0: deauthenticating from <redacted> by local choice (Reason: 3=DEAUTH_LEAVING)
[  111.583809] mt7921u 2-1:1.0: Message 00020003 (seq 5) timeout
[  114.931814] mt7921u 2-1:1.0: timed out waiting for pending tx
[  115.060533] ------------[ cut here ]------------
[  115.060556] WARNING: CPU: 0 PID: 47 at kernel/kthread.c:659 kthread_park+0xb0/0xc8
[  115.060588] Modules linked in: xt_DSCP xt_tcpudp nft_compat nf_tables nfnetlink ctr aes_arm64 aes_generic libaes ccm mt7921u(-) mt7921_common mt76_connac_lib mt76_usb mt76 mac80211 libarc4 8021q garp stp llc imx708 cfg80211 dw9807_vcm vc4 rfkill cdc_acm spidev snd_soc_hdmi_codec drm_display_helper cec drm_dma_helper drm_kms_helper v3d gpu_sched snd_soc_core bcm2835_unicam drm_shmem_helper i2c_mux_pinctrl raspberrypi_hwmon i2c_mux v4l2_dv_timings snd_bcm2835(C) snd_compress v4l2_fwnode snd_pcm_dmaengine snd_pcm bcm2835_codec(C) rpivid_hevc(C) v4l2_async bcm2835_isp(C) i2c_brcmstb v4l2_mem2mem snd_timer videobuf2_dma_contig snd i2c_bcm2835 spi_bcm2835 syscopyarea sysfillrect sysimgblt fb_sys_fops uio_pdrv_genirq uio nvmem_rmem bcm2835_v4l2(C) bcm2835_mmal_vchiq(C) vc_sm_cma(C) videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev drm mc fuse drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[  115.060892] CPU: 0 PID: 47 Comm: kworker/u8:2 Tainted: G         C         6.1.27-v8+ #1
[  115.060904] Hardware name: Raspberry Pi 4 Model B Rev 1.2 (DT)
[  115.060913] Workqueue: mt76 mt7921_mac_reset_work [mt7921_common]
[  115.060957] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  115.060967] pc : kthread_park+0xb0/0xc8
[  115.060977] lr : mt76u_stop_tx+0x26c/0x330 [mt76_usb]
[  115.061003] sp : ffffffc0085bbc80
[  115.061008] x29: ffffffc0085bbc80 x28: 0000000000000000 x27: 0000000000000000
[  115.061026] x26: 0000000000000000 x25: ffffff8046d3d180 x24: 0000000000000100
[  115.061042] x23: ffffff804a762068 x22: ffffff804a762048 x21: ffffffde760ad838
[  115.061057] x20: ffffff804a550d80 x19: ffffff804a762020 x18: 0000000000000000
[  115.061072] x17: 0000000000000000 x16: ffffffde752ad6b0 x15: 0000007f7b7fcdd0
[  115.061087] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[  115.061102] x11: 0000000000000000 x10: 0000000000001a60 x9 : ffffffde3da9b1ac
[  115.061117] x8 : ffffff8040b33980 x7 : ffffffc0085bc000 x6 : ffffffc0085b8000
[  115.061131] x5 : ffffffc0085bc000 x4 : 0000000000000000 x3 : ffffff8046d3d414
[  115.061146] x2 : 0000000000000000 x1 : 0000000000000004 x0 : ffffff804349dc40
[  115.061161] Call trace:
[  115.061165]  kthread_park+0xb0/0xc8
[  115.061175]  mt76u_stop_tx+0x26c/0x330 [mt76_usb]
[  115.061197]  mt7921u_mac_reset+0x7c/0x278 [mt7921u]
[  115.061217]  mt7921_mac_reset_work+0xa0/0x198 [mt7921_common]
[  115.061245]  process_one_work+0x208/0x480
[  115.061263]  worker_thread+0x50/0x428
[  115.061273]  kthread+0xfc/0x110
[  115.061284]  ret_from_fork+0x10/0x20
[  115.061296] ---[ end trace 0000000000000000 ]---

whitslack commented 1 year ago

You very well may have run onto the same problem as @whitslack .

No. Very different stack trace. Nothing to do with SKB headroom underflow, which is what my patch fixes.

morrownr commented 1 year ago

@whitslack

No. Very different stack trace. Nothing to do with SKB headroom underflow, which is what my patch fixes.

My bad. I was in a hurry earlier today and did not take the time to investigate.

@gifter77

I see you are using a RasPi4B v1.2. I have one of those.

Are you using the 64 bit 2023-05-03 release?

Is there anything in any USB port besides the adapter?

Are you using a USB3 port or USB2?

My Pi4B is working as an AP right now. Adapters: CF-951AX in USB3 port running WiFi 6 on channel 149, 5370 based WiFi 4 adapter in USB2 port on channel 6. The only 2 things in USB ports are the 2 adapters. The only upgrade to the software was a new compile of hostapd to v2.11-devel (2.9 won't do WiFi 6). I use the AP guide on the Main Menu.

I can check the log to see if I am getting anything but it seems stable.

gifter77 commented 1 year ago

@morrownr yes I'm using the latest 64-bit release (kernel 6.1.27-v8+). The Pi is in my 3D printer so other ports have some serial devices connected to them and I'm not running an AP. My adapter is a CF-953AX. It doesn't matter if connected to USB2 or USB3 or even via a powered USB hub, the issue can be reproduced by hammering the connection with iperf for example. I've also tried to manually overwrite the firmware to the latest, doesn't change anything. I'm running on channel 52 at the moment but I guess this would change sometimes. One slightly dodgy thing I can see is that the country seems to be set to Germany whereas I am in the UK and localisation settings are all set to GB:

global
country DE: DFS-ETSI
    (2400 - 2483 @ 40), (N/A, 20), (N/A)
    (5150 - 5250 @ 80), (N/A, 23), (N/A), NO-OUTDOOR, AUTO-BW
    (5250 - 5350 @ 80), (N/A, 20), (0 ms), NO-OUTDOOR, DFS, AUTO-BW
    (5470 - 5725 @ 160), (N/A, 26), (0 ms), DFS
    (5725 - 5875 @ 80), (N/A, 13), (N/A)
    (5945 - 6425 @ 160), (N/A, 23), (N/A), NO-OUTDOOR, AUTO-BW
    (57000 - 66000 @ 2160), (N/A, 40), (N/A)

I think this is a known issue of my Asus AX88U router broadcasting DE instead of GB for some reason but it shouldn't really matter.
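Since channel 52 and DFS both come up in this thread, here is a minimal sketch of checking a 5 GHz channel against the regulatory rules printed above. It assumes the rule lines have the "(start - end @ width), ..., FLAGS" layout shown in this comment and uses the standard 5 GHz mapping of center frequency = 5000 + 5 * channel MHz. The names parse_rules and channel_needs_dfs are illustrative.

```python
import re

# Matches the frequency range at the start of a rule line; the rest holds the flags.
RULE_RE = re.compile(r"\((\d+)\s*-\s*(\d+)\s*@\s*\d+\)(.*)")

REG_TEXT = """
country DE: DFS-ETSI
    (2400 - 2483 @ 40), (N/A, 20), (N/A)
    (5150 - 5250 @ 80), (N/A, 23), (N/A), NO-OUTDOOR, AUTO-BW
    (5250 - 5350 @ 80), (N/A, 20), (0 ms), NO-OUTDOOR, DFS, AUTO-BW
    (5470 - 5725 @ 160), (N/A, 26), (0 ms), DFS
    (5725 - 5875 @ 80), (N/A, 13), (N/A)
"""


def parse_rules(reg_text):
    """Return a list of (start_mhz, end_mhz, is_dfs) tuples."""
    rules = []
    for line in reg_text.splitlines():
        match = RULE_RE.search(line)
        if match:
            flags = [flag.strip() for flag in match.group(3).split(",")]
            rules.append((int(match.group(1)), int(match.group(2)), "DFS" in flags))
    return rules


def channel_needs_dfs(reg_text, channel):
    """True/False for a 5 GHz channel covered by a rule, None if no rule covers it."""
    freq = 5000 + 5 * channel  # 5 GHz band channel-to-frequency mapping
    for start, end, is_dfs in parse_rules(reg_text):
        if start < freq < end:
            return is_dfs
    return None


for ch in (36, 52, 149):
    print(f"channel {ch}: DFS required = {channel_needs_dfs(REG_TEXT, ch)}")
```

Under these rules channel 52 (5260 MHz) falls in a DFS range, while channel 149 (5745 MHz) does not.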

morrownr commented 1 year ago

I'm using the latest 64-bit release (kernel 6.1.27-v8+).

The reason for the question has to do with firmware, not so much the driver. The new 20230503 release has very current firmware.

How do I check the firmware version in my system?

$ ethtool -i <interface name> 

Note: You may need to install the ethtool and iw packages depending on your distro.

The Pi is in my 3D printer so other ports have some serial devices connected to them and I'm not running an AP.

I have seen many problems with the USB subsystem on Pi's over the last several years and from looking at your log, I immediately thought about the USB subsystem. If it were me, I would remove everything from the usb ports and start adding one thing at a time until the problem shows. I'll assume you need to add the CF-953AX first as you are probably using it to run headless. If that is case, pound the heck out of it with iperf3 and see what you get in both a USB2 and USB3 port. If you don't see the problem, add another usb device and test, and continue until you have a full report.

You mentioned a powered usb hub... make sure you add that back into the mix last. Powered hubs are generally a problem on Pi's and have been for many years. This goes way back before the Pi4B. There is also the issue of the Pi usb subsystem not having much power to offer devices. The usb subsystem is limited to 1200 mA. By spec, it should be 2800 mA. Is it possible your devices are trying to pull more than 1200 mA when you throw a load on?

I've maintained 5 Realtek drivers here for several years and also help users with Mediatek drivers. I would classify the RasPi usb subsystem as bad. The USB hub chipset is problematic. Well, I could go on but in this case, we really need to eliminate everything and add one device at a time until we see the problem. At that point, we can test differently or whatever is required to narrow down the issue. Your log certainly points to a usb issue but chasing around in code may not help as we need to do our best to determine the cause.

I am not seeing this. I just tested my Pi4B with iperf3. The only usb device plugged in is the CF-951AX. I learned a long time ago that if you are going to use a usb wifi adapter, you have to be very careful adding other things to the usb subsystem. Mediatek adapters generally are more reliable in this situation as they use less power than Realtek adapters.

Let me know.

gifter77 commented 1 year ago

@morrownr

ethtool -i wlan0
driver: mt7921u
version: 6.1.21-v8+
firmware-version: ____010000-20230331110939
expansion-rom-version: 
bus-info: 2-1:1.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

And btw I had to install this firmware version manually, my other printer with 64-bit Raspbian completely up to date has firmware ____010000-20220908211021.

I will try to reproduce with only the Wifi adapter attached.

gifter77 commented 1 year ago

Reproduced with nothing else attached to the Pi apart from the CF-953AX on USB3 port.

morrownr commented 1 year ago

> Reproduced with nothing else attached to the Pi apart from the CF-953AX on USB3 port.

USB2 port?

gifter77 commented 1 year ago

I'll do USB2 port by itself now. Btw it seems to be easier to reproduce when having the Pi as the iperf server, so traffic flowing to the Pi rather than from the Pi.

gifter77 commented 1 year ago

And reproduced as well on USB2 port.

morrownr commented 1 year ago

> And btw I had to install this firmware version manually, my other printer with 64-bit Raspbian completely up to date has firmware ____010000-20220908211021.

That is why I asked if it was a fresh clean 2023-05-03 installation. Remember that firmware is not part of the kernel, it is part of the distros and there are hundreds, if not thousands, of firmware packages that support various drivers. Most distros, for the lack of bodies, don't do a particularly good job of keeping firmware current. Many update when a user posts an issue requesting the firmware be updated.

gifter77 commented 1 year ago

> That is why I asked if it was a fresh clean 2023-05-03 installation. Remember that firmware is not part of the kernel, it is part of the distros and there are hundreds, if not thousands, of firmware packages that support various drivers. Most distros, for the lack of bodies, don't do a particularly good job of keeping firmware current. Many update when a user posts an issue requesting the firmware be updated.

Fair enough. My latest tests are with the latest firmware anyway.

morrownr commented 1 year ago

> And reproduced as well on USB2 port.

Was the test with the 2023-05-03 release or an old one with upgraded firmware? I'm testing on the 2023-05-03 64-bit release.

Could it be something else in the distro that has since been upgraded that is causing the problem?

We need to find out what the difference in our setups is.

What firmware are you running in the Pi?

Do you have a solid power supply?

This is beginning to look like the situation with whitslack. I never could reproduce his problem but we were setting up our APs differently.

gifter77 commented 1 year ago

> Was the test with the 2023-05-03 release or an old one with upgraded firmware? I'm testing on the 2023-05-03 64-bit release.

Old one with upgraded firmware.

> Could it be something else in the distro that has since been upgraded that is causing the problem?

The distro is actually not clean Raspbian but MainsailOS (a 3D printer distro), which is based on Raspbian. So yes, there is definitely a difference in our distros.

> What firmware are you running in the Pi?


sudo rpi-eeprom-update
*** UPDATE AVAILABLE ***
BOOTLOADER: update available
CURRENT: Thu  3 Sep 12:11:43 UTC 2020 (1599135103)
LATEST: Wed 11 Jan 17:40:52 UTC 2023 (1673458852)
RELEASE: default (/lib/firmware/raspberrypi/bootloader/default)
Use raspi-config to change the release.

VL805_FW: Dedicated VL805 EEPROM
VL805: up to date
CURRENT: 000138c0
LATEST: 000138c0



> Do you have a solid power supply?

Yes.

morrownr commented 1 year ago

> BOOTLOADER: update available CURRENT: Thu 3 Sep 12:11:43 UTC 2020 (1599135103) LATEST: Wed 11 Jan 17:40:52 UTC 2023 (1673458852)

Okay, do you think it might be a good idea to get your Pi firmware/eeprom somewhere close to up to date? A fix could be in there.

Mine:

BOOTLOADER: update available CURRENT: Tue 2 Aug 15:55:05 UTC 2022 (1659455705) LATEST: Wed 11 Jan 17:40:52 UTC 2023 (1673458852)

Okay, I need to update mine as well but at least it was better than yours.

gifter77 commented 1 year ago

> Okay, do you think it might be a good idea to get your Pi firmware/eeprom somewhere close to up to date? A fix could be in there.

Yes, I'm doing this and trying to reproduce atm.

morrownr commented 1 year ago

I am updating mine as well. We can check that both Pi's are on the same bootloader once we are finished.

We gotta eliminate as many little differences as we can.

gifter77 commented 1 year ago

BOOTLOADER: up to date
   CURRENT: Wed 11 Jan 17:40:52 UTC 2023 (1673458852)
    LATEST: Wed 11 Jan 17:40:52 UTC 2023 (1673458852)
   RELEASE: default (/lib/firmware/raspberrypi/bootloader/default)
            Use raspi-config to change the release.

  VL805_FW: Dedicated VL805 EEPROM
     VL805: up to date
   CURRENT: 000138c0
    LATEST: 000138c0

gifter77 commented 1 year ago

And.... reproduced again.

EasyNetDev commented 1 year ago

Hi all,

I don't know if this is related to a bug, but I've got a Comfast CF-953AX USB dongle. I was trying to create multiple SSIDs on this dongle but wasn't able to, even though I can create additional AP interfaces using iw phy phy3 interface add ra0 type __ap. When I add the bssid part to the config file, I get an error in hostapd:

WPA: group state machine entering state SETKEYSDONE (VLAN-ID 0)
wpa_driver_nl80211_set_key: ifindex=21 (wlan2) alg=2 addr=0x557d114b78 key_idx=1 set_tx=1 seq_len=0 key_len=32 key_flag=0x1a
nl80211: NEW_KEY
nl80211: KEY_DATA - hexdump(len=32): [REMOVED]
   broadcast key
nl80211: NL80211_CMD_SET_KEY - default key
wpa_driver_nl80211_set_key: ifindex=21 (wlan2) alg=4 addr=0x557d114b78 key_idx=4 set_tx=1 seq_len=0 key_len=16 key_flag=0x1a
nl80211: NEW_KEY
nl80211: KEY_DATA - hexdump(len=16): [REMOVED]
   broadcast key
nl80211: NL80211_CMD_SET_KEY - default key
nl80211: Set wlan2 operstate 0->1 (UP)
netlink: Operstate: ifindex=21 linkmode=-1 (no change), operstate=6 (IF_OPER_UP)
hostapd_setup_bss(hapd=0x55917667b0 (wlan2_1), first=0)
nl80211: Create interface iftype 3 (AP)
nl80211: Ignored event 7 (NL80211_CMD_NEW_INTERFACE) for foreign interface (ifindex 40 wdev 0x0)
nl80211: New interface wlan2_1 created: ifindex=40
nl80211: Add own interface ifindex 40 (ifidx_reason -1)
nl80211: if_indices[16]: 7(21) 21(-1) 40(-1)
Could not set interface wlan2_1 flags (UP): Device or resource busy
nl80211: Remove interface ifindex=40
nl80211: if_indices[16]: 7(21) 21(-1)
nl80211: if_indices[16]: 7(21) 21(-1)
Failed to add BSS (BSSID=e0:e1:a9:36:55:e3)
wlan2_1: Flushing old station entries
nl80211: flush -> DEL_STATION wlan2 (all)
wlan2_1: Deauthenticate all stations

What I notice in iw list:

        software interface modes (can always be added):
                 * AP/VLAN
                 * monitor
        valid interface combinations:
                 * #{ managed } <= 4, #{ AP } <= 1,
                   total <= 4, #channels <= 1, STA/AP BI must match

#{ AP } <= 1. I don't know whether this is related to the driver or to the firmware.
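
A quick way to check this limit on any adapter is to pull the AP count out of the "valid interface combinations" lines. This is just a rough sketch; `max_ap_ifaces` is a made-up helper name, and it assumes the `#{ ... } <= N` layout shown above. Running more than one SSID on a single radio would need a reported value of 2 or more.

```shell
# Hypothetical helper: read `iw list` output on stdin and print the largest
# "#{ ... AP ... } <= N" limit found (0 if no AP entry is present).
max_ap_ifaces() {
    grep -o '#{[^}]*} <= [0-9]*' |
        awk '/[{ ,]AP[ ,}]/ { n = $NF + 0; if (n > max) max = n } END { print max + 0 }'
}

# Usage (not run here):
#   iw list | max_ap_ifaces
```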

Can I use this device with multiple SSIDs?

Thanks.

morrownr commented 1 year ago

> Can I use this device with multiple SSIDs?

At this time, I can't think of a way.

It is not clear why this driver has a limit of 1 AP when previous drivers certainly support more. Work is certainly ongoing with this driver. We need to make some noise.

morrownr commented 1 year ago

@EasyNetDev

I added the AP x 2 issue as Bug/Feature Request number 8 in message number 1. We probably need to look at additional ways to get this in front of the Mediatek devs as you are not the first person looking for this capability.

morrownr commented 1 year ago

> And.... reproduced again.

Is it time to burn a new sd card and try 2023-05-03?

gifter77 commented 1 year ago

> Is it time to burn a new sd card and try 2023-05-03?

Ok, in the name of science :sweat_smile:

morrownr commented 1 year ago

I'll check my setup over and really beat it up good while you are busy. :smile:

gifter77 commented 1 year ago

So first observation, the firmware on a clean 2023-05-03 installation is definitely not the latest:

[  654.717694] mt7921u 1-1.3:1.0: HW/SW Version: 0x8a108a10, Build Time: 20220908210919a

[  655.001882] mt7921u 1-1.3:1.0: WM Firmware Version: ____010000, Build Time: 20220908211021
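
For anyone who wants to check the same thing, the build time can be pulled straight from the kernel log. A minimal sketch, assuming the log lines look like the ones above; `wm_fw_build` is a made-up helper name:

```shell
# Hypothetical helper: read kernel log text on stdin and print the build
# time of the most recently reported mt7921u WM firmware.
wm_fw_build() {
    sed -n 's/.*mt7921u.*WM Firmware Version: .*Build Time: \([0-9]*\).*/\1/p' | tail -n 1
}

# Usage (not run here):
#   sudo dmesg | wm_fw_build
```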