Closed ZerBea closed 4 years ago
I can confirm the issue with 5.4
Some additional information: Running hcxdumptool on a Raspberry Pi with kernel 5.4.51 caused the same issue. After a while, no packets arrived via RAW_SOCKET. Wireshark/tshark shows only outgoing packets from hcxdumptool (with the 14-byte hcxdumptool radiotap header). No error message or warning appears in the dmesg log. iw list works, as does iw dev, but iw scan returns an empty scan list.
A simple loop will cause the driver to stop:
counter=1
while [ $counter -le 20 ]
do
sudo iw dev wlp5s0f4u2 scan
((counter++))
done
Running the script a few times we get the first error message: command failed: Device or resource busy (-16)
Running the script more times, the error messages increase. command failed: Device or resource busy (-16) command failed: Device or resource busy (-16)
until we get the message 20 times: command failed: Device or resource busy (-16) command failed: Device or resource busy (-16) command failed: Device or resource busy (-16) command failed: Device or resource busy (-16) command failed: Device or resource busy (-16) command failed: Device or resource busy (-16) command failed: Device or resource busy (-16) command failed: Device or resource busy (-16) command failed: Device or resource busy (-16) command failed: Device or resource busy (-16) command failed: Device or resource busy (-16) command failed: Device or resource busy (-16) command failed: Device or resource busy (-16) command failed: Device or resource busy (-16) command failed: Device or resource busy (-16) command failed: Device or resource busy (-16) command failed: Device or resource busy (-16) command failed: Device or resource busy (-16) command failed: Device or resource busy (-16) command failed: Device or resource busy (-16)
Unfortunately, the dmesg log shows no error or warning.
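To put a number on "the error messages increase", the scan loop can be wrapped so that it counts the failures per run. This is only a minimal sketch: the function names `count_busy` and `scan_loop` are made up for illustration, and it assumes that iw reports the failure with exactly the text shown above.

```shell
#!/bin/sh
# count_busy: read iw output on stdin and print how many lines report
# "Device or resource busy (-16)". The name is hypothetical.
count_busy() {
    # grep -c exits non-zero when the count is 0, so mask the status.
    grep -c 'Device or resource busy (-16)' || true
}

# scan_loop IFACE [ROUNDS]: run the scan ROUNDS times (default 20, as in
# the loop above) and print the number of failed scans.
# Needs root privileges and the iw tool.
scan_loop() {
    iface=$1
    rounds=${2:-20}
    i=1
    while [ "$i" -le "$rounds" ]; do
        # iw writes "command failed: ..." to stderr, so merge the streams.
        iw dev "$iface" scan 2>&1
        i=$((i + 1))
    done | count_busy
}
```

Calling `scan_loop wlp5s0f4u2` as root after each test run would print a value between 0 and 20, which makes it easier to compare kernels.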
I ran the same test on 5.7.8-200 (the latest Fedora kernel) using the same adapter (ASUSTek Computer, Inc. AC51 802.11a/b/g/n/ac Wireless Adapter [Mediatek MT7610U]) and it works fine for me. Moreover, yesterday I ran 1h of iperf traffic with both 5.7.8-200 and the mt76 wireless tree and it worked fine. Can you please check if you are able to trigger the issue even with the mt76 wireless tree? Are you running the device in sta mode or are you doing something different (e.g. injecting traffic)?
@LorenzoBianconi thanks for your fast reply. I'll check the mt76 wireless tree instead of 5.7.9-arch1-1 to make sure it isn't Arch related. I opened the issue here and not on bugzilla.kernel.org because I have no idea what went wrong. Additionally, I can't trust the USB 3.0 ports due to the xhci issue. Running hcxdumptool (injecting traffic) on kernel 5.4 in monitor mode (Raspberry Pi - @RealEnder) caused the driver to stop receiving packets very soon. Switching often between monitor mode and managed mode caused the driver to stop, too. Is there an option to activate an enhanced debug mode on the mt76 wireless tree, or is it better to add some additional printk() calls?
@RealEnder, Alex, is it possible for you to run hcxdumptool in parallel with tshark to monitor the traffic from/to the device until it stops receiving packets?
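The second trigger mentioned above, frequent switching between monitor mode and managed mode, could be scripted roughly like this. It is a sketch, not the exact procedure used in the tests: `mode_cmds` and `toggle_plan` are hypothetical helper names, and the functions only print the standard `ip` and `iw` commands so the plan can be inspected before it is run.

```shell
#!/bin/sh
# mode_cmds IFACE MODE: print the commands that switch IFACE to MODE
# ("monitor" or "managed"). The interface is taken down first, because
# the type usually cannot be changed while the link is up.
mode_cmds() {
    printf 'ip link set dev %s down\n' "$1"
    printf 'iw dev %s set type %s\n' "$1" "$2"
    printf 'ip link set dev %s up\n' "$1"
}

# toggle_plan IFACE ROUNDS: print the command sequence for ROUNDS
# monitor -> managed round trips.
toggle_plan() {
    i=1
    while [ "$i" -le "$2" ]; do
        mode_cmds "$1" monitor
        mode_cmds "$1" managed
        i=$((i + 1))
    done
}
```

To actually execute the plan, the output can be piped into a root shell, for example `toggle_plan wlp5s0f4u2 50 | sudo sh -x`. This assumes that no network manager is reconfiguring the interface at the same time.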
I can try to run hcxdumptool. Can you please provide me with a reproducer? Moreover, is the dongle connected to a USB 3.0 port or a USB 2.0 port?
That will be great. The device is connected to USB 2.0.
hcxdumptool test command:
$ hcxdumptool -i interface -o test.pcapng --enable_status=95 --active_beacon
Every minute you'll receive an additional status message like this:
11:28:00 1 ERROR:0 INCOMING:5041 OUTGOING:2125 PMKIDROGUE:2 PMKID:0 M1M2ROGUE:0 M1M2:0 M2M3:0 M3M4:0 M3M4ZEROED:0 GPS:0
You can monitor outgoing and incoming traffic by running tshark in parallel:
$ tshark -i interface -w test2.pcapng
BTW: Running the same command on a mt7601u, everything is working as expected.
$ hcxdumptool -i interface -o test.pcapng --enable_status=95 --active_beacon
Hunting for issues like this one is very tricky, because it can be caused by hcxdumptool (injecting traffic), the USB host (xhci), a poor cable connection between device and hub, device overheating and more. Unfortunately, neither hcxdumptool (ERROR:0) nor dmesg shows an error. Even if I handle SIGPIPE in hcxdumptool, I get no error message.
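Since neither hcxdumptool nor dmesg reports an error, one way to pin down when the receive path stops is to watch the INCOMING counter in the status lines. The sketch below assumes that INCOMING is a cumulative counter and that the status lines have the format shown above. `incoming_count` and `first_stall` are hypothetical helper names.

```shell
#!/bin/sh
# incoming_count LINE: print the INCOMING value from one status line,
# or nothing if the line carries no INCOMING field.
incoming_count() {
    printf '%s\n' "$1" | sed -n 's/.*INCOMING:\([0-9][0-9]*\).*/\1/p'
}

# first_stall: read status lines on stdin and print the timestamp
# (first field) of the first line whose INCOMING counter did not grow
# since the previous status line. Prints nothing if it kept growing.
first_stall() {
    prev=
    while IFS= read -r line; do
        cur=$(incoming_count "$line")
        [ -n "$cur" ] || continue   # skip lines that are not status lines
        if [ -n "$prev" ] && [ "$cur" -le "$prev" ]; then
            printf '%s\n' "${line%% *}"
            return 0
        fi
        prev=$cur
    done
    return 0
}
```

With the status output saved to a file (the file name is up to you), `first_stall < status.log` would print the time of the first minute without new incoming packets. On a quiet channel this can also trigger without any driver problem, so it is a hint rather than proof.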
Isn't mt76 wireless ready for kernel 5.7?
$ make -C /lib/modules/`uname -r`/build M=$PWD
make: Entering directory '/usr/lib/modules/5.7.9-arch1-1/build'
CC [M] /home/zerobeat/temp/mt76/agg-rx.o
/home/zerobeat/temp/mt76/agg-rx.c: In function 'mt76_rx_aggr_stop':
/home/zerobeat/temp/mt76/agg-rx.c:293:2: error: implicit declaration of function 'rcu_swap_protected' [-Werror=implicit-function-declaration]
293 | rcu_swap_protected(wcid->aggr[tidno], tid,
| ^~~~~~~~~~~~~~~~~~
/home/zerobeat/temp/mt76/agg-rx.c:294:7: error: implicit declaration of function 'lockdep_is_held'; did you mean 'lockdep_rtnl_is_held'? [-Werror=implicit-function-declaration]
294 | lockdep_is_held(&dev->mutex));
| ^~~~~~~~~~~~~~~
| lockdep_rtnl_is_held
cc1: all warnings being treated as errors
make[1]: *** [scripts/Makefile.build:267: /home/zerobeat/temp/mt76/agg-rx.o] Error 1
make: *** [Makefile:1732: /home/zerobeat/temp/mt76] Error 2
I ran hcxdumptool for ~20min and it worked as expected; I was able to sniff traffic with tcpdump. How often does the issue occur?
what I mean is the full wireless-driver-next tree
Ok, my fault. Now cloning the full wireless-driver-next. This issue occurs mostly randomly. Sometimes it takes up to one hour. How is the temperature of your AC51? Maybe it is a heat failure and the device stops.
Power consumption of the device is ok, too. Measured: 4.83 V and 0.12 A. Should be ok for a USB hub.
Now running the wireless-driver-next. Still looking fine after 3 minutes.
Stopped hcxdumptool (everything is working as expected), went back to managed mode and started the scan script (several times):
counter=1
while [ $counter -le 20 ]
do
sudo iw dev wlp39s0f3u1u1u2 scan
((counter++))
done
No " Device or resource busy (-16)" appeared! wireless-driver-next is running fine for me.
for me
sorry, I did not get what you mean here. is wireless-driver-next working? if not, how long does it take to stop?
on 5.4 the issue occurs often
on 5.7 the issue occurs seldom
on wireless-driver-next no error occurred
This should make it less confusing: "for me" means that I can only speak for myself. We have another participant here (@RealEnder) with a similar issue. It would be great if he could confirm it, too. If so, we have a 100% solution.
ack, let's keep testing a little bit more and if you do not have any issue with wireless-drivers-next tree let's close it
I was hoping you would say that. It's a good idea to run more tests. I'll do some more tests on the Raspberry Pi. Here I noticed, too, that the RPI sometimes doesn't start if a mt76x0u device is plugged in before system power on.
My reference devices are:
an EDIMAX EW-7711UAN, ID 7392:7710 Edimax Technology Co., Ltd, mt7601u https://github.com/ZerBea/hcxdumptool/wiki/Penetration-testing-system-2
and an ALLNET ALLWA0150, ID 148f:7601 Ralink Technology, Corp. MT7601U Wireless Adapter, mt7601u https://github.com/ZerBea/hcxdumptool/wiki/Penetration-testing-system-1
Both of them run perfectly.
BTW: Now 5 GHz injection on the mt76x0u device is working fine, too. It took me a while to find the CRDA (in combination with udev) issue.
I am not even sure that this issue is related to the driver only, because I noticed that on other devices (kernel <= 4.19), too: rt2800usb https://bugzilla.kernel.org/show_bug.cgi?id=202243#c19 or ath9k_htc https://github.com/ZerBea/hcxdumptool/issues/80
The only indication that it is possibly a driver issue is that the mt7601u is working fine under the same circumstances.
First test series finished on notebook and desktop running kernel 5.7 with wireless-drivers-next. No issues. That is really good. But I noticed that the devices get warm. @LorenzoBianconi do you know something about the thermal design of the chipset? Could a thermal watchdog shut down the device without informing the kernel about it? The next step is to compile the mt76 driver for kernel 5.4 and start 2 tests using a Raspberry Pi Zero. I'll put one of the systems into a bag and observe the temperature of the device. I know the RPI will not shut down, due to the thermal design of the case, so let's see what happens with the WiFi adapter. If everything is working as expected, I'll close the issue report.
@LorenzoBianconi I could use some advice: Sometimes, after power on with a plugged-in mt76x0 device, I get this message:
$ journalctl | grep cfg80211
cfg80211: Process '/usr/bin/set-wireless-regdom' failed with exit code 1.
result:
$ cat /sys/module/cfg80211/parameters/ieee80211_regdom
00
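I think this caused that:
* 5GHz injection isn't working
* the device sometimes stops working as expected - if setchannel() arrived on a "not allowed channel".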
Could be a timing issue during boot, but I'm not sure. Do you have an idea?
I decided to add a task to Arch issue tracker: https://bugs.archlinux.org/task/67371
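A quick check for this condition before starting a test run could look like the sketch below. It relies only on the sysfs path shown above and on "00" being the world regulatory domain. The function names `is_world_regdom` and `check_regdom` are made up for illustration.

```shell
#!/bin/sh
# is_world_regdom VALUE: succeed if VALUE is "00", the world
# regulatory domain.
is_world_regdom() {
    [ "$1" = "00" ]
}

# check_regdom [FILE]: read the cfg80211 regdom parameter and report it.
# Returns 0 for a country code, 1 for the world regdomain, 2 if the
# file cannot be read. FILE defaults to the sysfs path quoted above.
check_regdom() {
    file=${1:-/sys/module/cfg80211/parameters/ieee80211_regdom}
    value=$(cat "$file" 2>/dev/null) || { echo "cannot read $file"; return 2; }
    if is_world_regdom "$value"; then
        echo "regdom is $value (world regulatory domain)"
        return 1
    fi
    echo "regdom is $value"
}
```

Running `check_regdom` right after boot and again before starting hcxdumptool would show whether the failed set-wireless-regdom call left the system on the world regdomain.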
If nothing speaks against it, I would like to leave this issue report open for a while. Unfortunately I encountered so many issues (xhci, crda, possibly libnl) that have to be solved before I'm able to do more tests on the driver.
@LorenzoBianconi I could use some advice: Sometimes, after power on with a plugged-in mt76x0 device, I get this message:
$ journalctl | grep cfg80211
cfg80211: Process '/usr/bin/set-wireless-regdom' failed with exit code 1.
result:
$ cat /sys/module/cfg80211/parameters/ieee80211_regdom
00
This issue does not seem to be related to mt76x0u.
I think this caused that:
* 5GHz injection isn't working
Injection does not work on World regdomain since active scanning is forbidden on 5GHz IIRC
* the device sometimes stops working as expected - if setchannel() arrived on a "not allowed channel".
@LorenzoBianconi thanks for the information. I'm sure that none of the issues are related to the mt76 driver. Unfortunately I discovered them after reporting the mt76 issues. I'm sorry for that, but without your ideas I would still be hunting for them. It looks like the whole crda system is as weak as the xhci system. Over the past years many issues have been reported.
I'll close this issue report, because the driver is working as expected. Thanks for your help.
Cheers Mike
mt76x0u driver stops working after a while, running under heavy load.
Discovered on kernel 5.4.51 and kernel 5.7.9, on AMD, Intel and Raspberry Pi.
Test devices:
Bus 003 Device 003: ID 0b05:17d1 ASUSTek Computer, Inc. AC51 802.11a/b/g/n/ac Wireless Adapter [Mediatek MT7610U]
Bus 005 Device 006: ID 148f:761a Ralink Technology, Corp. MT7610U ("Archer T2U" 2.4G+5G WLAN Adapter
I have absolutely no idea what exactly happened, because there are no warnings or error messages. But I can rule out a hardware error, because the same issue happened on different systems, different devices and different USB ports.
Also I can rule out an xhci issue as reported, here: https://bugzilla.kernel.org/show_bug.cgi?id=202541 because the issue happened on a Raspberry Pi, too.
After running under heavy load, the network scan list is empty, as is the iw scan list:
$ sudo iw dev wlp5s0f4u2 scan
$
dmesg doesn't show an error or warning. The device simply doesn't work:
Reloading module doesn't help. System must be rebooted.
After a reboot, everything is working as expected (for a while):