openwrt / mt76

mac80211 driver for MediaTek MT76x0e, MT76x2e, MT7603, MT7615, MT7628 and MT7688

mt7921u active monitor mode breaks driver #839

Closed ZerBea closed 1 month ago

ZerBea commented 9 months ago

I got an ALFA AWUS036AXML. Setting active monitor mode causes the driver to stop working. It took me several days to figure out what went wrong, and the many tests made this thread grow. This is the conclusion (the entire history is below).

Steps to reproduce with common tools such as iw, ip link and tshark:

monitor mode:

$ sudo ip link set wlp22s0f0u4i3 down
$ sudo iw dev wlp22s0f0u4i3 set type monitor
$ sudo ip link set wlp22s0f0u4i3 up
$ tshark -i wlp22s0f0u4i3
22 packets captured

active monitor mode:

$ sudo ip link set wlp22s0f0u4i3 down
$ sudo iw dev wlp22s0f0u4i3 set monitor active
$ sudo ip link set wlp22s0f0u4i3 up 
$ tshark -i wlp22s0f0u4i3
Capturing on 'wlp22s0f0u4i3'
^C
0 packets captured

Background: In active monitor mode, the device ACKs incoming frames addressed to the virtual MAC of the device. This feature is really useful for performing PMKID attacks. At the moment, active monitor mode is working on:

mt76x0u
mt76x2u

It is not working on:

mt7601u
mt7921u

I see three options:

1. hcxdumptool does not set active monitor mode by default, even if the driver reports that it is supported. That has been done by this commit: https://github.com/ZerBea/hcxdumptool/commit/8d3f24e5a10ebdcc75211ae9214ee30ff9e4b517

2. The active monitor mode capability should not be reported by the driver:

mt7601u:
$ iw list | grep active
    Device supports active monitor (which will ACK incoming frames)

mt7921u:
$ iw list | grep active
    Device supports active monitor (which will ACK incoming frames)

3. Active monitor mode should be fixed by the driver.

ZerBea commented 9 months ago

Looks like this driver (https://github.com/openwrt/mt76) doesn't compile (out of the box) running Linux 6.6.1:

$ make -C /lib/modules/`uname -r`/build M=$PWD
make: Entering directory '/usr/lib/modules/6.6.1-arch1-1/build'
  CC [M]  /tmp/mt76/mmio.o
In file included from /tmp/mt76/mt76.h:19,
                 from /tmp/mt76/mmio.c:6:
/tmp/mt76/testmode.h:196:32: error: array type has incomplete element type 'struct nla_policy'
  196 | extern const struct nla_policy mt76_tm_policy[NUM_MT76_TM_ATTRS];
      |                                ^~~~~~~~~~~~~~
/tmp/mt76/mt76.h: In function 'mt76_put_page_pool_buf':
/tmp/mt76/mt76.h:1647:9: error: implicit declaration of function 'page_pool_put_full_page' [-Werror=implicit-function-declaration]
 1647 |         page_pool_put_full_page(page->pp, page, allow_direct);
      |         ^~~~~~~~~~~~~~~~~~~~~~~
/tmp/mt76/mt76.h: In function 'mt76_get_page_pool_buf':
/tmp/mt76/mt76.h:1655:16: error: implicit declaration of function 'page_pool_dev_alloc_frag' [-Werror=implicit-function-declaration]
 1655 |         page = page_pool_dev_alloc_frag(q->page_pool, offset, size);
      |                ^~~~~~~~~~~~~~~~~~~~~~~~
/tmp/mt76/mt76.h:1655:14: error: assignment to 'struct page *' from 'int' makes pointer from integer without a cast [-Werror=int-conversion]
 1655 |         page = page_pool_dev_alloc_frag(q->page_pool, offset, size);
      |              ^
cc1: all warnings being treated as errors
make[2]: *** [scripts/Makefile.build:243: /tmp/mt76/mmio.o] Error 1
make[1]: *** [/usr/lib/modules/6.6.1-arch1-1/build/Makefile:1913: /tmp/mt76] Error 2
make: *** [Makefile:234: __sub-make] Error 2
make: Leaving directory '/usr/lib/modules/6.6.1-arch1-1/build'
ZerBea commented 9 months ago

BTW: I went back to older kernels:

kernel 6.5.1 (Debian kernel config) -> neither monitor mode nor packet injection is working
kernel 6.1.21 (Raspbian kernel config) -> neither monitor mode nor packet injection is working

ZerBea commented 9 months ago

Update: After I got this issue report (https://github.com/ZerBea/hcxdumptool/issues/376), I did some more tests. If the interface is in monitor mode:

$ sudo hcxdumptool -m wlp22s0f0u9u3i3
$ iw dev
phy#12
    Interface wlp22s0f0u9u3i3
        ifindex 15
        wdev 0xc00000001
        addr 00:c0:ca:b5:74:e6
        type monitor
        channel 1 (2412 MHz), width: 20 MHz (no HT), center1: 2412 MHz
        txpower 3.00 dBm
        multicast TXQ:
            qsz-byt qsz-pkt flows   drops   marks   overlmt hashcol tx-bytes    tx-packets
            0   0   0   0   0   0   0   0   

it will receive packets:

$ tshark -i wlp22s0f0u9u3i3
Capturing on 'wlp22s0f0u9u3i3'
263 packets captured

But once the first frame has been injected, everything stops:

$ tshark -i wlp22s0f0u9u3i3
Capturing on 'wlp22s0f0u9u3i3'
^C
0 packets captured

Looks like frame injection killed the driver.

morrownr commented 9 months ago

@ZerBea

You might want to retest with the recently released firmware:

https://github.com/morrownr/USB-WiFi/blob/main/home/How_to_Install_Firmware_for_Mediatek_based_USB_WiFi_adapters.md

Section 2.

ZerBea commented 9 months ago

@morrownr

Thanks for that information. I'll give it a try, but I still think it is related to the driver.

ZerBea commented 9 months ago

This is the latest working firmware: Build Time: 20230526130958

This one does not load: Build Time: 20231109190918

morrownr commented 9 months ago

This one does not load: Build Time: 20231109190918

It loads here:

$ ethtool -i wlx00c0cab37abb
driver: mt7921u
version: 6.5.0-0.deb12.1-amd64
firmware-version: ____010000-20231109190959

Adapter: Alfa AXML
Distro: Debian 12

Remember that wifi firmware for the 7921 requires two firmware files:

WIFI_MT7961_patch_mcu_1_2_hdr.bin
WIFI_RAM_CODE_MT7961_1.bin

There is also a Bluetooth file, but you won't be using it, so you can delete it from the system:

BT_RAM_CODE_MT7961_1_2_hdr.bin

ZerBea commented 9 months ago

I double checked this:

old firmware:

[16148.856186] Bluetooth: hci0: HW/SW Version: 0x008a008a, Build Time: 20230526131214
[16148.879434] mt7921u 1-9.3:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a

new firmware:

[41144.190166] usb 1-9.3: new high-speed USB device number 11 using xhci_hcd
[41144.321418] usb 1-9.3: New USB device found, idVendor=0e8d, idProduct=7961, bcdDevice= 1.00
[41144.321422] usb 1-9.3: New USB device strings: Mfr=6, Product=7, SerialNumber=8
[41144.321424] usb 1-9.3: Product: Wireless_Device
[41144.321426] usb 1-9.3: Manufacturer: MediaTek Inc.
[41144.321427] usb 1-9.3: SerialNumber: 000000000
[41144.428601] Bluetooth: hci0: HW/SW Version: 0x008a008a, Build Time: 20231109191416

only the BT firmware has been loaded.

$ iw dev
$

All three bin's have been replaced:

$ ls *MT7961*.*
BT_RAM_CODE_MT7961_1_2_hdr.bin.zst
WIFI_RAM_CODE_MT7961_1.bin.zst
WIFI_MT7961_patch_mcu_1_2_hdr.bin.zst

I gave it another try without compressing the files:

[41864.599670] usb 1-9.3: USB disconnect, device number 16
[41868.761868] usb 1-9.3: new high-speed USB device number 17 using xhci_hcd
[41868.893554] usb 1-9.3: New USB device found, idVendor=0e8d, idProduct=7961, bcdDevice= 1.00
[41868.893561] usb 1-9.3: New USB device strings: Mfr=6, Product=7, SerialNumber=8
[41868.893563] usb 1-9.3: Product: Wireless_Device
[41868.893565] usb 1-9.3: Manufacturer: MediaTek Inc.
[41868.893567] usb 1-9.3: SerialNumber: 000000000
[41869.006650] Bluetooth: hci0: HW/SW Version: 0x008a008a, Build Time: 20231109191416

same result.

BTW: Regardless of whether the new firmware has been compressed with zstd, and regardless of which port the device is connected to (USB2 or USB3), after a while I got this error:

connected to USB2 port
[41930.422262] usb 1-9.3: device not accepting address 17, error -71

connected to USB3 port
[41961.868976] usb 1-4: device descriptor read/64, error -110

I'm on kernel

$ uname -r
6.6.3-arch1-1
morrownr commented 9 months ago

$ uname -r
6.5.0-0.deb12.1-amd64

I haven't gone to kernel 6.6 yet on this system, which is my main dev box, but will investigate doing a test with 6.6 on another box.

Keep in mind that loading the bluetooth firmware is basically worthless as you can't run USB3 and bluetooth together on a USB WiFi adapter. I'm pretty sure the communications between Mediatek and the makers was poor in this respect as the bt firmware should not be loaded if the adapter is in USB3 mode.

ZerBea commented 9 months ago

At the moment, I'm running out of ideas.

morrownr commented 9 months ago

I got nothing. I've never seen firmware compressed but if you say it works, it probably works. Got any other distros to try?

ZerBea commented 9 months ago

zstd compression is not something new: https://www.phoronix.com/news/2021-Linux-Zstd-Firmware and it is working like a charm.

BTW: At every test my reference is an ALFA AWUS036ACM (mt76x2u) and an ALFA AWUS036ACHM (mt76x0u). If both devices/drivers are working as expected, everything seems to be fine. After that, I'll go hunting for the problems of the new device/driver.

morrownr commented 9 months ago

I wonder if you had a bad download of one of the wifi firmware files?

ZerBea commented 9 months ago

Good point - let's compare it:

$ md5sum BT_RAM_CODE_MT7961_1_2_hdr.bin
f8e386541ca02a6311d7c0d9441fbab7  BT_RAM_CODE_MT7961_1_2_hdr.bin

$ md5sum WIFI_MT7961_patch_mcu_1_2_hdr.bin
0a4d833efe94a56c502de8a38405d8fe  WIFI_MT7961_patch_mcu_1_2_hdr.bin

$ md5sum WIFI_RAM_CODE_MT7961_1.bin
8d0a4f6dc2d01a8b442ae0b8d76d9122  WIFI_RAM_CODE_MT7961_1.bin

morrownr commented 9 months ago

Here are my results from /lib/firmware/mediatek

$ md5sum WIFI_MT7961_patch_mcu_1_2_hdr.bin
0a4d833efe94a56c502de8a38405d8fe  WIFI_MT7961_patch_mcu_1_2_hdr.bin

$ md5sum WIFI_RAM_CODE_MT7961_1.bin
8d0a4f6dc2d01a8b442ae0b8d76d9122  WIFI_RAM_CODE_MT7961_1.bin

I can't check the BT firmware because it does not exist on my system. I delete it to prevent it from loading and using resources. It should not load given that BT is turned off in our adapters (Alfa AXML) but it does... that is a programming mistake that needs to be corrected.

ZerBea commented 9 months ago

Thanks. The md5 hashes match. I'll compile kernel 6.5 and give it another try.

morrownr commented 9 months ago

Alright @ZerBea , you know better than to change 2 variables at the same time.

ZerBea commented 9 months ago

Unfortunately the system on which I compiled the kernel does not have USB3 hardware. I am now compiling the kernel on a USB3 system. When finished, we will have all combinations of kernels, ehci, xhci and firmware.

Conclusion: the new firmware loads fine on kernel 6.5

the ERROR is back (after a while): [ 4213.904348] usb 1-2: device not accepting address 21, error -71

and packet injection is still not working.

ZerBea commented 9 months ago

As a final test I compiled kernel 6.1 and got the same results. Now I give up and wait for a driver update.

ZerBea commented 9 months ago

To make sure it is not a malfunction of my device: Is packet injection working on your system (kernel 6.5 and latest firmware)?

$ sudo hcxdumptool -i INTERFACENAME --rds=1 -F

I guess that my device is fine, because the problem occurs on openwrt as well: https://github.com/ZerBea/hcxdumptool/issues/376

morrownr commented 9 months ago

Interesting. It looks to me that you have the makings of additional bug reports. It is also possible there is one source that is the cause. Hard to say.

The USB subsystem drivers, and especially the USB3 drivers, are not mankind's greatest invention.

I'm going to try to set up a test with kernel 6.6 and 6.7 tomorrow if I feel better.

I have two test systems in my lab but only one is setup and it is using secure boot which is not going to work with this very well at all so I need to rethink my setup. Will report.

ZerBea commented 9 months ago

Great, thanks. My test systems:

2 x Intel (ehci)
2 x AMD (xhci)
5 x Raspberry Pi zero
2 x Raspberry Pi A
2 x Raspberry Pi B

Linux kernel 6.1, 6.5 and 6.6

All tested devices/drivers (the most recently tested device only with a driver patch) passed the tests on all systems: https://github.com/ZerBea/hcxdumptool/discussions/361 The exception is the mt7921u, which suggests to me that my testing environment is OK. Unfortunately, testing the mt7921u is time-consuming, because in every case there are two variables left to adjust (driver and firmware). Right now, I don't know which of them causes the trouble.

ZerBea commented 9 months ago

I found the problem. Unfortunately it is similar to this one: https://github.com/openwrt/mt76/issues/778

Driver reports that active monitor mode is possible:

$ iw list | grep active
    Device supports active monitor (which will ACK incoming frames)

But if hcxdumptool sets active monitor mode, it stops working.

If active monitor mode is disabled, everything's fine:

0 ERROR(s) during runtime
638 Packet(s) captured by kernel
0 Packet(s) dropped by kernel
1 SHB written to pcapng dumpfile
1 IDB written to pcapng dumpfile
1 ECB written to pcapng dumpfile
83 EPB written to pcapng dumpfile

exit on sigterm

I don't think the problem is related to hcxdumptool, because it can be reproduced with iw, ip link and tshark, too:

$ sudo ip link set wlp22s0f0u4i3 down
$ sudo iw dev wlp22s0f0u4i3 set type monitor
$ sudo ip link set wlp22s0f0u4i3 up
$ tshark -i wlp22s0f0u4i3
22 packets captured

$ sudo ip link set wlp22s0f0u4i3 down
$ sudo iw dev wlp22s0f0u4i3 set monitor active
$ sudo ip link set wlp22s0f0u4i3 up 
$ tshark -i wlp22s0f0u4i3
Capturing on 'wlp22s0f0u4i3'
^C
0 packets captured
morrownr commented 9 months ago

Have you modified the original message to reflect this finding?

How does this finding reflect overall? Is packet injection working with active monitor mode off?

I'm a little fuzzy after being sick for so many days. Why is active monitor mode needed?

ZerBea commented 9 months ago

The head line has been modified.

Packet injection is working like a charm: https://github.com/ZerBea/hcxdumptool/discussions/361#discussioncomment-7567045

Background: In active monitor mode, the device ACKs incoming frames addressed to the virtual MAC of the device. This feature is really useful for performing PMKID attacks. At the moment, active monitor mode is working on:

mt76x0u
mt76x2u

It is not working on:

mt7601u
mt7921u

I see three options:

1. hcxdumptool does not set active monitor mode by default, even if the driver reports that it is supported. That has been done by this commit: https://github.com/ZerBea/hcxdumptool/commit/8d3f24e5a10ebdcc75211ae9214ee30ff9e4b517

2. The active monitor mode capability should not be reported by the driver:

mt7601u:
$ iw list | grep active
    Device supports active monitor (which will ACK incoming frames)

mt7921u:
$ iw list | grep active
    Device supports active monitor (which will ACK incoming frames)

3. Active monitor mode should be fixed by the driver.

morrownr commented 9 months ago

The head line has been modified.

It might help, since this post has followed a long path to get where it is, if you put "Edit:" at the top of the original post and then add what you wrote in your last 2 posts, so that a person who might fix it can understand it without having to track things down. As an alternative, it might also work to close this report and start a clean new post. I'm going to try to consolidate the information and add it to my main mt7921u bug list at my site. It seems quite clear at this point that active monitor is broken and is the cause of the problem.

@morrownr

ZerBea commented 9 months ago

Done. Important information and steps how to reproduce is now mentioned in the first comment. Thanks for pointing me into this direction.

morrownr commented 9 months ago

Looks good. I borrowed some of your work. I'm reworking the BUG thread over at my site. This bug is now posted as the top:

https://github.com/morrownr/USB-WiFi/issues/107

Hopefully this can be fixed.

ZerBea commented 9 months ago

I hope so, too. The performance of the interface is enormous (when running latest git head hcxdumptool). It is now number one: https://github.com/ZerBea/hcxdumptool/discussions/361

ZerBea commented 9 months ago

Let me explain the difference between active monitor mode and passive monitor mode. Suppose hcxdumptool requests e.g. an ASSOCIATION by transmitting an ASSOCIATIONREQUEST frame:

active monitor mode: the target AP responds with an ASSOCIATIONRESPONSE frame and the device that hcxdumptool uses ACKs it

passive monitor mode: the target AP responds with an ASSOCIATIONRESPONSE frame but, due to the missing ACK, it transmits up to 7 retries (which will spam the entire channel)

That is a huge performance impact, because the channel is busy 7 times longer.

ZerBea commented 6 months ago

We can close this report. As a workaround, hcxdumptool's active monitor mode is not running by default. Even if the driver reports that it is possible, the user must allow it by a command line option.

morrownr commented 6 months ago

I haven't seen any action on your report over at OpenWRT. This is something that needs to be fixed. I am neck deep in work right now but when I have a little time, I think I know a guy that can track this down and suggest a patch. Remind me in a couple of weeks if we see no action.

Nick

morrownr commented 6 months ago

@ZerBea

FYI: I saw new firmware flow into linux-wireless last week so it should be posted for download this week or next. We have been making enough noise that the devs may have taken a look and found a fix.

As usual, the firmware guide is menu item 8, look at section 3 for this chipset:

https://github.com/morrownr/USB-WiFi

ZerBea commented 6 months ago

@morrownr Problem is not the firmware, because it is the same on all kernels. On 6.6 it is loaded, on 6.7 not.

morrownr commented 6 months ago

Active monitor mode is a nice add on. Latest hcxdumptool received a workaround if the driver reports active monitor mode capabilities but fails on it.

Are you sure you don't want to keep this open?

ZerBea commented 6 months ago

Once my remaining machine receives the update to kernel 6.7, I will unfortunately no longer be able to run further tests of active monitor mode. But I'll reopen this with the next kernel, 6.8 (or 6.9, or whenever the problem mentioned above has been fixed). hcxdumptool doesn't need active monitor mode any longer.

ZerBea commented 6 months ago

At least I found a firmware version that is working (for me) on all kernel versions:

Bluetooth: hci1: HW/SW Version: 0x008a008a, Build Time: 20230526131214
mt7921u 1-2:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a
mt7921u 1-2:1.3: WM Firmware Version: ____010000, Build Time: 20230526130958

Now that I can test active monitor mode again (it is still not working), I am reopening this issue report as mentioned above.

It's not easy to find the real problem, because there are several variables to adjust (driver/firmware combinations).

morrownr commented 6 months ago

Wow! I have been seeing a pickup in people reporting problems related to this. It is reasonable to assume if the driver is reporting active monitor mode support, then it will work. They try to use it and end up with a mess.

ZerBea commented 6 months ago

@morrownr Unfortunately the same applies to the firmware.

ZerBea commented 1 month ago

I got my solution:

$ hcxdumptool -L
...
* active monitor mode available (reported by driver - do not trust it)

From now on, hcxdumptool does not set active monitor mode by default. This has to be done by the user:

$ hcxdumptool -h
-A             : ACK incoming frames
                  INTERFACE must support active monitor mode

Closed, because there are serious issues as this one.

alexl83 commented 4 weeks ago

Dear All, I'm impacted by this matter too, on mt7921e (pcie MT7922) - coming from here: https://github.com/morrownr/USB-WiFi/issues/73

As I'm running kali with some wireless utilities on SBCs, it isn't always feasible to patch/update each single package to ignore driver advertising of NL80211_FEATURE_ACTIVE_MONITOR, especially as I'm not a developer - I tried digging in kernel driver to understand what's happening.

By cobbling together superficial understanding, common sense, gut feeling and GPT-4, I came up with a patch. More on this later down this post. It's not production-quality, and it's likely to disable active monitor from the list of advertised (and usable) features for ALL mt76-based drivers (so basically everything except MT7601U)

Now a little background on my research, perhaps someone in the community can jump in

Please note: my understanding can be flawed as it's based on the list above of cobbled things: common sense, GPT-4 etc etc

all Mediatek drivers in mainline kernel are based on mt76 framework, except MT7601U which lives in its own subdir outside mt76

mt76 declares supported features and flags in wiphy->features and wiphy->flags in drivers/net/wireless/mediatek/mt76/mac80211.c

code snippet from mt76_phy_init(struct mt76_phy *phy, struct ieee80211_hw *hw):

    wiphy->features |= NL80211_FEATURE_ACTIVE_MONITOR |
               NL80211_FEATURE_AP_MODE_CHAN_WIDTH_CHANGE;

The features member of the wiphy structure is a bitmask of advertised driver features; each bit is either 1 (feature advertised) or 0 (feature not advertised), set via logic sorcery.

The |= and | operators set a feature's bit to 1 (feature present), while &= combined with ~ in front of an NL80211 feature, e.g. wiphy->features &= ~NL80211_FEATURE_ACTIVE_MONITOR;, clears that bit to 0, thus marking the feature as not available.

This code snippet/bitmask setup seems to apply to all drivers under the mt76 tree. I tried clearing the bit for NL80211_FEATURE_ACTIVE_MONITOR in drivers/net/wireless/mediatek/mt76/mt7921/init.c

as well as in

drivers/net/wireless/mediatek/mt76/mt792x_core.c, all to no avail: after recompiling and testing, active monitor was still advertised by the driver (checked using iw list).

What I came up with as a possible conceptual solution is to clear the relevant bit in drivers/net/wireless/mediatek/mt76/mac80211.c

by patching the code snippet from above and changing it to the following in mt76_phy_init(struct mt76_phy *phy, struct ieee80211_hw *hw):

    wiphy->features &= ~NL80211_FEATURE_ACTIVE_MONITOR;
    wiphy->features |= NL80211_FEATURE_AP_MODE_CHAN_WIDTH_CHANGE;

It works: my MT7922 PCIe card using the mt7921e driver no longer reports NL80211_FEATURE_ACTIVE_MONITOR (active monitor) - and likely all drivers in the drivers/net/wireless/mediatek/mt76 tree behave the same, thus disabling a useful feature for hardware that supports it: according to this post, mt7612u (WiFi 5) and mt7610u (WiFi 5) come to mind. mt7601u (WiFi 4) should be "immune" to this, as it's not mt76-based and enables active monitoring in drivers/net/wireless/mediatek/mt7601u/init.c

Code snippet from int mt7601u_register_device(struct mt7601u_dev *dev):

    wiphy->features |= NL80211_FEATURE_ACTIVE_MONITOR;
    wiphy->interface_modes = BIT(NL80211_IFTYPE_STATION);
    wiphy->flags |= WIPHY_FLAG_SUPPORTS_TDLS;

I want to try a similar approach as mt7601u and declare wiphy->features |= NL80211_FEATURE_ACTIVE_MONITOR; for individual cards/chipsets that really support it, like mt7612 and mt7610 (perhaps all variants, usb/pcie/sdio/etc, support active monitor).

Unfortunately I only own mt7922 Mediatek cards and cannot really test. On top of that, my "assumption-by-common-logic" could be wrong, and the only usable place to declare the active monitor feature available or not is perhaps drivers/net/wireless/mediatek/mt76/mac80211.c. In that case, I'm not skilled enough to provide a proper solution.

If someone interested in fixing this in a proper way could be so kind to jump in and provide me some help, perhaps we can manage it for good.

If we disable active monitor in mac80211.c by default and then enable it on a per-chipset basis, I think potential places for doing it would be drivers/net/wireless/mediatek/mt76/<CHIPSETSUBDIR>/init.c and perhaps drivers/net/wireless/mediatek/mt76/mt792x_core.c

Now, for those courageous enough to read all this possibly wrong logical rant, here's my quick&dirty patch: mt76_disable_active_monitor_feature_advertising.patch

From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: John Doe <john.doe@somewhere.on.planet>
Date: Sun, 11 Aug 2024 02:00:03 +0000
Subject: Patching kernel file drivers/net/wireless/mediatek/mt76/mac80211.c:
 MT76: disable advertising NL80211_FEATURE_ACTIVE_MONITOR

Signed-off-by: Alessandro Lannocca <john.doe@somewhere.on.planet>
---
 drivers/net/wireless/mediatek/mt76/mac80211.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/mediatek/mt76/mac80211.c b/drivers/net/wireless/mediatek/mt76/mac80211.c
index e8ba2e4e8..beb19c294 100644
--- a/drivers/net/wireless/mediatek/mt76/mac80211.c
+++ b/drivers/net/wireless/mediatek/mt76/mac80211.c
@@ -430,12 +430,12 @@ mt76_phy_init(struct mt76_phy *phy, struct ieee80211_hw *hw)
    spin_lock_init(&phy->tx_lock);

    SET_IEEE80211_DEV(hw, dev->dev);
    SET_IEEE80211_PERM_ADDR(hw, phy->macaddr);

-   wiphy->features |= NL80211_FEATURE_ACTIVE_MONITOR |
-              NL80211_FEATURE_AP_MODE_CHAN_WIDTH_CHANGE;
+   wiphy->features &= ~NL80211_FEATURE_ACTIVE_MONITOR;
+   wiphy->features |= NL80211_FEATURE_AP_MODE_CHAN_WIDTH_CHANGE;
    wiphy->flags |= WIPHY_FLAG_HAS_CHANNEL_SWITCH |
            WIPHY_FLAG_SUPPORTS_TDLS |
            WIPHY_FLAG_AP_UAPSD;

    wiphy_ext_feature_set(wiphy, NL80211_EXT_FEATURE_CQM_RSSI_LIST);
-- 

It works for me, but keep in mind I'm unable to ensure it works for the right reason, and I cannot assess any possible side effects on other mt76-based HW. Please don't push this to the Linux kernel, as it is a whacky hack at best.

It'd be wonderful if this could be the first step at a proper fix - anyone willing to help with code/suggestions/ideas is very welcome

Grazie 🙏🏻

Ale

morrownr commented 3 weeks ago

@alexl83 @ZerBea

This issue needs attention. I have no problem making some real noise if we need to do so to get it fixed.

I may be wrong but I think Active Monitor worked in this driver at one point. Can either of you confirm that?

Where I am going with this is if we can show that it worked at one point, then this is a REGRESSION which would add importance to getting a fix in place. If we were able to bisect it and show exactly what patch broke it, then we can clean up a report that I will send to linux-wireless and the Mediatek devs.

@ZerBea

I know you know how to do a bisect. I know it takes a lot of time. I have never done one but would like to learn someday but right now I'm trying to get a couple of new drivers out of the door. Maybe @alexl83 can work with you on a bisect while you teach him how to do it. If you guys can sort this out, I will send the message.

@morrownr

alexl83 commented 3 weeks ago

@morrownr thanks for your support - I share the approach. I can't say if it ever worked in the past: I never had any mediatek usb cards, only very old ralink ones. Lately (say, 4 months ago) I started tinkering with armbian, bootstrapping kali from it in order to customize my own distro for my various SBCs, so I bought MT7922 (m2 ngff) cards; I think they go by an RZ616-something name.

Since I started tinkering with MT7922 cards, it has never worked - the moment you try to use active monitoring, you get "mute" output.

Happy to help. Up to now I have tried kernels 6.6, 6.8 and 6.10 (multiple releases, all aarch64) - always the same. I wonder if that's a firmware topic, and why mt76 sets active monitoring for all cards sharing the same codebase even when they do not support it - I cannot understand.

I had to patch or pass arguments to the tools I use (hcxdumptool v6.3.1, angryoxide) so that they ignore active monitoring, which has always been advertised by the driver and never worked - wifite pmkid collection doesn't work even with my "poor man's" patch.

Never tried a bisect, I know it exists, happy to learn and try out

ZerBea commented 3 weeks ago

Active monitor mode is working on mt76x0u.

$ hcxdumptool -l
  2   5 fc3497321ca3    980ee4f3e212    *   wlp48s0f4u2u4       mt76x0u NETLINK

$ sudo hcxdumptool -A --rcascan=active
...
^C
69 Packet(s) captured by kernel
0 Packet(s) dropped by kernel
22 PROBERESPONSE(s) captured

It is not working on mt76x2u:

$ hcxdumptool -l
  3   6 000fc913411c    a4a6a98a90ae    *   wlp48s0f4u2u4       mt76x2u NETLINK

$ sudo hcxdumptool -A --rcascan=active
...
0 Packet(s) captured by kernel
0 Packet(s) dropped by kernel
Warning: too less packets received (monitor mode may not work as expected)
Possible reasons:
 driver is broken (most likely)
 no transmitter in range
 frames are filtered out by BPF
Warning: no PROBERESPONSES received (frame injection may not work as expected)
Possible reasons:
 no AP in range
 frames are filtered out by BPF
 driver is broken
 driver does not support frame injection

and it is not working on mt7921u:

$ hcxdumptool -l
  4   7 00c0cab5742a    00c0cab5742a    *   wlp22s0f0u9u3i3     mt7921u NETLINK

$ sudo hcxdumptool -A --rcascan=active
...
^C
0 Packet(s) captured by kernel
0 Packet(s) dropped by kernel
Warning: too less packets received (monitor mode may not work as expected)
Possible reasons:
 driver is broken (most likely)
 no transmitter in range
 frames are filtered out by BPF
Warning: no PROBERESPONSES received (frame injection may not work as expected)
Possible reasons:
 no AP in range
 frames are filtered out by BPF
 driver is broken
 driver does not support frame injection

So we shouldn't disable it in general for all drivers of the mt76 series (like the patch does), because at least one driver of the mt76 series is working as expected. I guess this was the first one to be released, and only this one includes the code for active monitor mode.

Bisecting is useless, because it has never worked on the rest of the mt76 series drivers.

morrownr commented 3 weeks ago

@ZerBea

I certainly thought you had reported it was working with mt7612u.

Are you sure it is a lack of code or is it broken code?

As we have talked, I really want to get some adapters in the hands of Dubhater so he can take a look at this and a couple of other things. He is busy right now and has not answered my last email addressing this issue.

alexl83 commented 3 weeks ago

Active monitor mode is working on mt76x0u.

$ hcxdumptool -l
  2     5 fc3497321ca3    980ee4f3e212    *   wlp48s0f4u2u4       mt76x0u NETLINK

$ sudo hcxdumptool -A --rcascan=active
...
^C
69 Packet(s) captured by kernel
0 Packet(s) dropped by kernel
22 PROBERESPONSE(s) captured

It is not working on mt76x2u:

$ hcxdumptool -l
  3     6 000fc913411c    a4a6a98a90ae    *   wlp48s0f4u2u4       mt76x2u NETLINK

$ sudo hcxdumptool -A --rcascan=active
...
0 Packet(s) captured by kernel
0 Packet(s) dropped by kernel
Warning: too less packets received (monitor mode may not work as expected)
Possible reasons:
 driver is broken (most likely)
 no transmitter in range
 frames are filtered out by BPF
Warning: no PROBERESPONSES received (frame injection may not work as expected)
Possible reasons:
 no AP in range
 frames are filtered out by BPF
 driver is broken
 driver does not support frame injection

and it is not working on mt7921u:

$ hcxdumptool -l
  4     7 00c0cab5742a    00c0cab5742a    *   wlp22s0f0u9u3i3     mt7921u NETLINK

$ sudo hcxdumptool -A --rcascan=active
...
^C
0 Packet(s) captured by kernel
0 Packet(s) dropped by kernel
Warning: too less packets received (monitor mode may not work as expected)
Possible reasons:
 driver is broken (most likely)
 no transmitter in range
 frames are filtered out by BPF
Warning: no PROBERESPONSES received (frame injection may not work as expected)
Possible reasons:
 no AP in range
 frames are filtered out by BPF
 driver is broken
 driver does not support frame injection

So we shouldn't disable it in general for all drivers of the mt76 series (like the patch does), because at least one driver of the mt76 series is working as expected. I guess this was the first one to be released, and only this one includes the code for active monitor mode.

Bisecting is useless, because it has never worked on the rest of the mt76 series drivers.

I'd love to disable it selectively for MT7921{e,u}, but I failed at every attempt. It seems I'm unable to remove NL80211_FEATURE_ACTIVE_MONITOR anywhere other than mt76 mac80211.c
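As long as the flag cannot be cleared per chipset in the driver, a userspace tool can refuse to request active monitor mode for drivers that advertise it but were reported broken in this thread. The following is only a sketch under that assumption: the function names are hypothetical, the deny list reflects nothing more than the test results posted here, and the matched string is the one printed by `iw list`.

```shell
#!/bin/sh
# Succeeds if the `iw list` output on stdin advertises active monitor support.
supports_active_monitor() {
    grep -q 'Device supports active monitor'
}

# Hypothetical helper: decide whether to request active monitor mode.
#   $1    = driver name as shown by `hcxdumptool -l` (e.g. mt76x0u)
#   stdin = output of `iw list` for the device
should_use_active_monitor() {
    driver="$1"
    case "$driver" in
        # Advertised by the driver, but reported as not working in this thread.
        mt7921u|mt7601u) return 1 ;;
    esac
    supports_active_monitor
}
```

Example: `iw list | should_use_active_monitor mt7921u || echo "falling back to plain monitor mode"`.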

ZerBea commented 3 weeks ago

@morrownr

Thanks for pointing me in the right direction.

I have two USB hubs. One of them (USB3 hub) is connected to a USB3 port of my AMD motherboard, while the other one (USB2 hub) is connected to a USB2 port of my AMD motherboard. Mostly I run hcxdumptool tests in automatic mode (the USB port is detected by hcxdumptool). I've never considered that the problem could be related to USB. Why on earth should a USB3 problem have an impact on active monitor mode?

$ sudo hcxdumptool -A --rcascan=active

It looks like this is an epic fail. Please take a look at this. Same test as above, but now the device is connected to a USB2 port:

$ lsusb
Bus 001 Device 008: ID 0e8d:7612 MediaTek Inc. MT7612U 802.11a/b/g/n/ac Wireless Adapter

$ hcxdumptool -l
  3   6 000fc913411c    806d97a8f78d    *   wlp48s0f4u2u4       mt76x2u NETLINK
...
^C
115 Packet(s) captured by kernel
0 Packet(s) dropped by kernel
12 PROBERESPONSE(s) captured

Conclusion:
Active monitor mode on mt76x0u is working on USB2 and USB3 (ASUS USB AC51 is a USB2 adapter).
Active monitor mode on mt76x2u is only working on USB2 (ALLNET WA1200AC is a USB3 adapter).
Active monitor mode on mt7921u is working neither on USB2 nor on USB3 (ALFA AWUS 036AXML is a USB3 C adapter).

Looks like we have to take a look at the USB part of the driver.
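Since the mt76x2u result depends on the port, it helps to record the negotiated link speed next to each test run. This is a minimal sketch, assuming `/sys/class/net/<iface>/device` resolves to the USB interface directory, whose parent holds the `speed` attribute in Mbit/s. The function names are made up for illustration.

```shell
#!/bin/sh
# Map the value of a sysfs USB "speed" attribute (Mbit/s) to a label.
usb_speed_label() {
    case "$1" in
        1.5)         echo "USB1 low-speed" ;;
        12)          echo "USB1 full-speed" ;;
        480)         echo "USB2 high-speed" ;;
        5000)        echo "USB3 SuperSpeed" ;;
        10000|20000) echo "USB3 SuperSpeed+" ;;
        *)           echo "unknown" ;;
    esac
}

# Print the negotiated speed of the USB device behind a network interface.
# Returns non-zero if the interface is missing or is not a USB device.
iface_usb_speed() {
    iface="$1"
    dev=$(readlink -f "/sys/class/net/$iface/device") || return 1
    speed_file="$dev/../speed"
    [ -r "$speed_file" ] || return 1
    usb_speed_label "$(cat "$speed_file")"
}
```

Example: `iface_usb_speed wlp48s0f4u2u4` before each `hcxdumptool` run. An adapter behind a USB2 hub reports 480 even if the adapter itself is USB3 capable.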

@alexl83 That is exactly the problem. The entire mt76 driver series uses the same core. Inside the core, the ACTIVE MONITOR flag is set to 1. Unfortunately, a change in the core applies to all chipsets.

morrownr commented 3 weeks ago

@ZerBea

I've never considered that the problem could be related to USB.

While helping users over the last few years, I have come upon a lot of usb related problems. It is common for the usb wifi adapters to be blamed for the problems. My rules regarding usb:

Why on earth should a USB3 problem have an impact on active monitor mode?

I can't answer that but it does not surprise me.

Looks like we have to take a look at the USB part of the driver.

Yes, we really need to get adapters into the hands of Dubhater.

But keep in mind, different systems use different USB chipsets and it may be those chipsets and their drivers that are the problem. Test in different systems before blaming the adapter and always make sure to test in a USB2 port as well.

Remind me of the command line to test active monitor and I will try it tomorrow with multiple adapters on multiple systems.

Nick
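For reference, the test sequence used throughout this thread can be collected in one place. The sketch below only prints the commands for a given interface and executes nothing. The function name is hypothetical; the commands themselves are the ones already posted in this issue.

```shell
#!/bin/sh
# Print the command sequence used in this thread to test active monitor mode.
# Nothing is executed here; review the output before running it.
active_monitor_test_cmds() {
    iface="$1"
    [ -n "$iface" ] || { echo "usage: active_monitor_test_cmds <iface>" >&2; return 1; }
    cat <<EOF
sudo ip link set $iface down
sudo iw dev $iface set monitor active
sudo ip link set $iface up
sudo hcxdumptool -A --rcascan=active -i $iface
EOF
}
```

Example: `active_monitor_test_cmds wlp22s0f0u4i3` prints the four commands. A working adapter should report captured packets and PROBERESPONSEs, while a failing one reports 0 packets captured.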

ZerBea commented 3 weeks ago

@morrownr Just to make sure it is not my AMD notebook or my AMD desktop, I got my hands on an Intel system running the latest stable Linux kernel, 6.10.4. The ALFA AWUS 036AXML is connected to one of the USB3 ports (Intel chipset this time):

$ cat /proc/cpuinfo
model name  : Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz

$ lspci 
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)

same error regarding BT firmware:

[   11.030763] Bluetooth: hci0: Opcode 0x0c03 failed: -110
[   13.164071] Bluetooth: hci0: Failed to read MSFT supported features (-110)
[   15.297181] Bluetooth: hci0: AOSP get vendor capabilities (-110)
[   15.306403] mt7921u 1-2:1.3: probe with driver mt7921u failed with error -5
[   15.307338] usbcore: registered new interface driver mt7921u
[   15.580500] usb 1-2: new low-speed USB device number 7 using xhci_hcd
[   15.710514] usb 1-2: device descriptor read/64, error -71
[   15.940554] usb 1-2: device descriptor read/64, error -71
[   16.170720] usb 1-2: new high-speed USB device number 8 using xhci_hcd
[   21.394105] usb 1-2: device descriptor read/64, error -110

The same happens if it is connected to one of the USB2 ports.
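The negative numbers in the log above are kernel errno values. A small sketch for decoding the three that appear here (-5, -71, -110) might look like this. The function names are made up for illustration and the table covers only these codes.

```shell
#!/bin/sh
# Decode the Linux errno values that appear in the log above.
errno_name() {
    case "${1#-}" in
        5)   echo "EIO (Input/output error)" ;;
        71)  echo "EPROTO (Protocol error)" ;;
        110) echo "ETIMEDOUT (Connection timed out)" ;;
        *)   echo "unknown" ;;
    esac
}

# Extract a trailing error code such as "-110" or "(-110)" from one dmesg
# line and print it together with its name. Fails if the line has no code.
decode_dmesg_line() {
    code=$(printf '%s\n' "$1" | sed -n 's/.*\(-[0-9][0-9]*\))\{0,1\}$/\1/p')
    [ -n "$code" ] || return 1
    echo "$code $(errno_name "$code")"
}
```

Read this way, the Bluetooth commands and the USB descriptor reads time out (-110), the earlier descriptor reads fail with a protocol error (-71), and the mt7921u probe ends with a generic I/O error (-5).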

After the firmware /usr/lib/firmware/mediatek/BT_RAM_CODE_MT7961_1_2_hdr.bin.zst has been removed, the WiFi part of the device is working.
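Instead of deleting the firmware file, it can be renamed so that the change is easy to undo. This is a sketch only: the `.disabled` suffix and the function names are arbitrary choices, it assumes the firmware loader looks the file up by its exact name, and a distribution firmware update may put the original file back.

```shell
#!/bin/sh
# Rename a firmware file so it is no longer found under its original name.
set_aside_firmware() {
    fw="$1"
    [ -f "$fw" ] || { echo "not found: $fw" >&2; return 1; }
    mv -- "$fw" "$fw.disabled"
}

# Undo set_aside_firmware.
restore_firmware() {
    fw="$1"
    [ -f "$fw.disabled" ] || { echo "not found: $fw.disabled" >&2; return 1; }
    mv -- "$fw.disabled" "$fw"
}
```

Run as root with `/usr/lib/firmware/mediatek/BT_RAM_CODE_MT7961_1_2_hdr.bin.zst` as the argument, then replug the adapter.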

Unfortunately the same issue as on the AMD systems:

$ sudo hcxdumptool -A --rcascan=active -i wlp0s20f0u2i3
^C
0 Packet(s) captured by kernel
0 Packet(s) dropped by kernel
Warning: too less packets received (monitor mode may not work as expected)
Possible reasons:
 driver is broken (most likely)
 no transmitter in range
 frames are filtered out by BPF
Warning: no PROBERESPONSES received (frame injection may not work as expected)
Possible reasons:
 no AP in range
 frames are filtered out by BPF
 driver is broken
 driver does not support frame injection

I fully agree: "It is common for the usb wifi adapters to be blamed for the problems." But I'm sure this time it is the usb wifi adapter or its driver.

morrownr commented 3 weeks ago

Hi @ZerBea

But I'm sure this time it is the usb wifi adapter or its driver.

I'm going to go with it being the axml adapter. I think Alfa screwed something up. I don't know if it is an eeprom (efuse) issue or a bad connection, but I think they left the adapter in a situation where Linux is told that BT is supported, but when it is time to turn BT on, nothing is there and the btusb driver does not handle that in a reasonable way. I am an old guy that learned programming with FORTRAN and am certainly not a wireless expert. It can take me a long time to sort out wireless coding issues... on the other hand, we really, really need to get some mt adapters into Dubhater's hands, as we have a few issues that he can sort out. He sent the new drivers in to linux-wireless yesterday, so now the check-it-then-fix-it stuff starts, which should allow some time here to figure out how to get adapters to him.

I have 3 adapters based on the mt7921au chipset. One is the AXML, then I have an Edup and a Comfast. I'll see about doing some comparative testing later today.