morrownr / USB-WiFi

USB WiFi Adapter Information for Linux

Netgear A8000 fails after approx 4 hours of uptime #381

Open Tom-Shaw opened 4 months ago

Tom-Shaw commented 4 months ago

I bought an A8000 (0846:9060), upgraded my kernel to 6.5 (from Debian bookworm-backports x86-64), and was happy that it worked immediately and transferred data faster than my old USB Wi-Fi adapter.

However, the wireless connection consistently fails after about 4 hours, sometimes closer to 5 hours. It has never failed at less than 4 hours and it has never stayed up for longer than 5 hours. Connectivity drops completely and the device can no longer even scan for networks. The failure is accompanied by this message in dmesg (note: these are separate events on separate boots of the OS):

[17475.172817] wlx9418655ec8xx: deauthenticated from f2:9f:c2:d5:52:xx (Reason: 2=PREV_AUTH_NOT_VALID)

[17241.102265] wlx9418655ec8xx: deauthenticated from fe:9f:c2:d5:52:xx (Reason: 2=PREV_AUTH_NOT_VALID)

[16858.652256] wlx9418655ec8xx: deauthenticated from fa:9f:c2:d4:52:xx (Reason: 2=PREV_AUTH_NOT_VALID)
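Since dmesg timestamps are seconds since boot, the three failures above can be converted to uptimes directly. This is a minimal sketch, not part of the original report; the function name deauth_uptimes is made up for the example, and it assumes the default dmesg format with the timestamp in square brackets.

```shell
#!/bin/sh
# deauth_uptimes (hypothetical helper): read dmesg-style lines on stdin and
# print the uptime, as hours and minutes, at which each
# "deauthenticated from" event happened.
deauth_uptimes() {
    awk '/deauthenticated from/ {
        # The timestamp sits between the first "[" and the first "]".
        a = index($0, "[")
        b = index($0, "]")
        secs = int(substr($0, a + 1, b - a - 1))
        printf "%dh%02dm\n", int(secs / 3600), int((secs % 3600) / 60)
    }'
}
```

On a live system this would be run as sudo dmesg | deauth_uptimes. For the three lines above it gives 4h51m, 4h47m and 4h40m, which matches the 4 to 5 hour window.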

In all cases it seems my only option is to reboot at that point.

What I've tried

Upgrading the device firmware.

It was originally:

driver: mt7921u
version: 6.5.0-0.deb12.4-amd64
firmware-version: ____010000-20230117170942

After adding the new firmware to /lib/firmware/mediatek and rebooting:

driver: mt7921u
version: 6.5.0-0.deb12.4-amd64
firmware-version: ____010000-20231109190959

Still no change in behaviour. It fails after 4 to 5 hours with the "deauthenticated" message in dmesg.

Restarting the device

sudo systemctl stop NetworkManager
sudo rmmod mt7921u
sudo modprobe mt7921u
sudo systemctl start NetworkManager

If I do this BEFORE the issue has occurred, it is successful and seems to reset the time until it fails:

(initial messages from 11886 to 11909 triggered by the above commands)

[11886.945159] wlx9418655ec8xx: deauthenticating from fa:9f:c2:d4:52:xx by local choice (Reason: 3=DEAUTH_LEAVING)
[11895.733617] usbcore: deregistering interface driver mt7921u
[... snip ...]
[11909.583990] wlx9418655ec8xx: associated

Then 14738 seconds later, the network device fails as usual with the same message!

[26647.892679] wlx9418655ec8xx: deauthenticated from fa:9f:c2:d4:52:xx (Reason: 2=PREV_AUTH_NOT_VALID)
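The 14738-second figure is just the gap between the "associated" line and the next "deauthenticated from" line. A small sketch for measuring that gap on any log follows; the function name assoc_to_deauth is hypothetical, and the patterns assume the message wording shown in the excerpts above.

```shell
#!/bin/sh
# assoc_to_deauth (hypothetical helper): read dmesg-style lines on stdin and
# print the whole seconds between each "associated" line and the next
# "deauthenticated from" line.
assoc_to_deauth() {
    awk '
        # Return the numeric timestamp found between "[" and "]".
        function ts(line,    a, b) {
            a = index(line, "[")
            b = index(line, "]")
            return substr(line, a + 1, b - a - 1) + 0
        }
        /: associated$/        { start = ts($0); have = 1 }
        /deauthenticated from/ {
            if (have) {
                printf "%.0f\n", ts($0) - start
                have = 0
            }
        }
    '
}
```

Usage would be sudo dmesg | assoc_to_deauth. The "deauthenticating ... by local choice" lines caused by a manual restart are deliberately not matched.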

If I try to run sudo systemctl stop NetworkManager or sudo rmmod mt7921u AFTER the issue has occurred, it doesn't work and the console keeps repeating:

kernel:[76150.619079] unregister_netdevice: waiting for wlx9418655ec8xx to become free. Usage count = 2

No difference with rmmod -f.

No difference with different orders of this and of unplugging and plugging in the device.

Scatter-gather

I have tried echo 1 > /sys/module/mt76_usb/parameters/disable_usb_sg, with no effect.

5GHz vs 2.4GHz

My AP (Unifi) supports 2.4GHz and 5GHz. I get the same behaviour whether I use the "floating" SSID (which could be 2.4GHz or 5GHz) or if I choose the specific 2.4GHz or 5GHz-specific SSID.

Any tips on what to try next?

morrownr commented 4 months ago

@Tom-Shaw

Good report.

I have 2 adapters based on the same chipset and am not seeing this. After pondering the issue for most of a day, I think what I would do is find another AP/wifi router to test with. If you don't see the problem on an alternate AP, then maybe it is time to take a hard look at the settings in your current AP. Start changing things one at a time with testing in between.

Cheers,

@morrownr

Tom-Shaw commented 4 months ago

Thanks @morrownr

I feel that even if something in the AP triggers the issue, there's definitely something getting stuck on the OS side too, given that it can't recover from the issue, the kernel module can't be removed, and even unplugging and plugging back in doesn't help.

I'll probably upgrade my AP sometime this year so I'll check back in then. In the meantime I've worked around the issue by setting a cron job to run every 3 hours to restart the device as above (stop, rmmod, modprobe, start). It's not ideal to have 5-10 seconds of network outage every 3 hours, but better than needing a full reboot!
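For anyone wanting to copy the workaround, here is a rough sketch of what such a script could look like. It uses only the four commands listed earlier in this issue; the run and restart_wifi names, the DRY_RUN switch, the --run argument and the install path mentioned below are all made up for this example.

```shell
#!/bin/sh
# Sketch of the periodic restart workaround. Must run as root when not in
# dry-run mode.

# run: execute a command, or only print it when DRY_RUN=1 (for testing).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# restart_wifi: same sequence as above (stop, rmmod, modprobe, start).
restart_wifi() {
    run systemctl stop NetworkManager
    run rmmod mt7921u
    run modprobe mt7921u
    run systemctl start NetworkManager
}

# Only act when explicitly asked to, so sourcing the file is harmless.
case "${1:-}" in
    --run) restart_wifi ;;
esac
```

Saved as, say, /usr/local/sbin/restart-wifi.sh (a hypothetical path), a line such as 0 */3 * * * /usr/local/sbin/restart-wifi.sh --run in root's crontab would run it every 3 hours. Note that this only helps while the adapter is still healthy; as described above, rmmod hangs once the failure has already happened.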

Appreciate your time and suggestions.

morrownr commented 4 months ago

@Tom-Shaw

I saw new firmware flow into linux-wireless last week so it should be posted for download sometime this week or next. They don't post fixes with the firmware so I don't know if it will help.

Is your AP one that is supported by OpenWRT? Tell me what brand and model it is and I will check. If it is supported, OpenWRT could be a solution to this problem.

@morrownr

Tom-Shaw commented 4 months ago

@morrownr

Thanks for the heads-up. I've applied the patch from the mailing list and turned off the workaround, so I should know in a few hours.

driver: mt7921u
version: 6.5.0-0.deb12.4-amd64
firmware-version: ____010000-20240219111038

It's a Netgear Nighthawk AXE3000 WiFi 6E USB 3.0 Adapter, Item No A8000-100PAS, in Australia - I can't see it in the OpenWRT table of hardware.

morrownr commented 4 months ago

It's a Netgear Nighthawk AXE3000 WiFi 6E USB 3.0 Adapter, Item No A8000-100PAS, in Australia - I can't see it in the OpenWRT table of hardware.

I guess I was not clear or maybe I am confused. Your adapter will work with OpenWRT because OpenWRT has a driver for it but that was not what I was trying to say. It seemed to me from what you said that your AP/router might need to be upgraded so I thought OpenWRT might help...as long as it is a supported AP/router.

@morrownr

Tom-Shaw commented 4 months ago

Oh sorry I read AP as adapter, my fault. The AP is an Amplifi HD with an AmpliFi MeshPoint HD extending the range.

Tom-Shaw commented 4 months ago

Sadly the updated firmware has not fixed the issue for me.

morrownr commented 4 months ago

FYI: I noticed the 6.6 kernel is available in the Debian update system now. I upgraded to 6.6 from 6.5 last night with my Debian 12 installation. Don't know if that has anything to do with this problem.

I looked at OpenWRT and I don't see support for your router.

[17475.172817] wlx9418655ec8xx: deauthenticated from f2:9f:c2:d5:52:xx (Reason: 2=PREV_AUTH_NOT_VALID)

If it were me with this problem, I would be in the web interface of the router checking the settings. I can't tell you what I would be looking for but I would be looking and researching the settings that are available. I'd probably also temporarily disable the AmpliFi MeshPoint HD extending the range to see if that has anything to do with it. WiFi is a cool thing but it is complicated and there are little incompatibilities here and there and sometimes we just have to find a configuration that works around problem issues. I really don't think your adapter is the problem but I could be wrong. Also, is the firmware in your router at the most current level?

I've been helping people at this site for a few years. I've seen thousands of problem reports. In the last few years as technology has changed with updates including WPA3 and WiFi 6/6E/7, there have been growing pains for all operating systems. It has been a massive effort to get to where we are. I'm hoping that the bugs and problems stabilize over the next 2 years. For Linux, wireless support and especially USB adapter support has improved greatly over the last 5 years and will improve even more over the next year as more and more adapters are supported with good quality in-kernel drivers.

Anyway, more than you wanted to know.

@morrownr

Tom-Shaw commented 4 months ago

No joy with the new kernel, same issue:

driver: mt7921u
version: 6.6.13+bpo-amd64
firmware-version: ____010000-20240219111038

I'll probably leave it at that for a while, given that I have a reasonable (if dirty) workaround of restarting the kernel module every 3 hours.

I agree that changing the AP settings or firmware level might work around the problem, but given the Windows PC next to me doesn't have the same issue with the same adapter and the same AP, I think the root cause is most likely on the Linux kernel side. If this issue pops up for other people and is worth looking at deeper, then I'm happy to collect crash dumps or run debug-enabled modules as required.

Thanks for your effort. I've spent many years dealing with paid support much less responsive and helpful than you!

morrownr commented 4 months ago

the root cause is most likely on the Linux kernel side.

Certainly may be. I'll keep an eye out. There are others that stop by here that have this adapter.

Cheers and thanks for the kind words.

mousseq commented 3 months ago

I am experiencing similar behavior. I have the A8000 installed on a headless Raspberry Pi 4b running the latest Pi OS (Bookworm). This has the 6.6 kernel:

Linux pi4-1r 6.6.20+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.20-1+rpt1 (2024-03-07) aarch64 GNU/Linux

My Pi stays up for variable periods of time but eventually becomes incommunicado with errors like these:

Apr 04 05:37:08 pi4-1r dnsmasq[1249]: reading /etc/resolv.conf
Apr 04 05:37:08 pi4-1r dnsmasq[1249]: using nameserver 75.75.75.75#53
Apr 04 05:37:08 pi4-1r dnsmasq[1249]: using nameserver 75.75.76.76#53
Apr 04 05:37:08 pi4-1r dnsmasq[1249]: using nameserver 2001:558:feed::1#53
Apr 04 05:37:08 pi4-1r dnsmasq[1249]: using nameserver 2001:558:feed::2#53
Apr 04 05:37:08 pi4-1r dnsmasq[1249]: cleared cache
Apr 04 05:57:23 pi4-1r systemd-timesyncd[491]: Timed out waiting for reply from 104.131.155.175:123 (2.debian.pool.ntp.org).
Apr 04 05:57:33 pi4-1r systemd-timesyncd[491]: Timed out waiting for reply from 69.10.223.134:123 (2.debian.pool.ntp.org).
Apr 04 05:57:44 pi4-1r systemd-timesyncd[491]: Timed out waiting for reply from 168.215.194.18:123 (2.debian.pool.ntp.org).
Apr 04 05:57:54 pi4-1r systemd-timesyncd[491]: Timed out waiting for reply from 162.159.200.1:123 (2.debian.pool.ntp.org).

My configuration has wlan0 (the internal NIC) configured as an AP. When the problem occurs, both network interfaces fail. The journalctl indicates that the Pi stays up (cron entries appear), but there is no way to access the machine.

morrownr commented 2 months ago

Hi @mousseq

While I do not have an A8000, I do have 2 USB WiFi adapters with the same chipset that use the same driver and firmware. I also run my Pi4B headless.

My configuration has wlan0 (the internal NIC) configured as an AP.

What do I do when I burn a clean SD card? The first thing I do is turn off the internal wifi of my Pi4B. I don't think twice about it, I just do it. It is a quality of driver issue. Broadcom... that is all I need to say.

I don't see anything in your log that tells me the A8000 and its driver have anything to do with this problem. My Pi4B is also in AP mode. It serves up WiFi 6 on the 5 GHz band. It is ultra dependable. I can explain more about my setup if you wish. I may not be doing exactly what you are doing but you are welcome to ask.

mousseq commented 2 months ago

@morrownr , thanks for the response.

My experience is a bit more complex. I have several Pi4s configured the same way, only with differing external NICs (Brostrend, Comfast, Netgear A6210). Only this one shows signs of instability. Frankly, I've had extensive success with this configuration. I recently installed another A8000 on a Pi4 running Ubuntu 23.10. It has been stable over the period that the Pi in question has restarted several times. Do you know if the Ubuntu driver is different from the Pi OS driver? - for either the internal or external NICs?

morrownr commented 2 months ago

@mousseq

Do you know if the Ubuntu driver is different from the Pi OS driver?

The version of the driver, mt7921u, is based on the kernel version. If your Ubuntu installation is running kernel 6.5 and PiOS is running kernel 6.1, they have different versions of mt7921u.

There is also the issue of firmware version. The firmware for the mt7921au has been updated several times over the last 2 years, which is historically high for firmware. The MediaTek devs work this hard but we have gotten to the point that wifi is very complicated in modern WiFi 6 and 7 drivers. Things seem to have slowed down lately but knowing how to update your firmware is a good idea. To check on the driver info:

$ ethtool -i wlan0
driver: mt7921u
version: 6.6.20+rpt-rpi-v8
firmware-version: ____010000-20240219111038
expansion-rom-version:
bus-info: 2-2:1.3
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

Notice how the driver version is the kernel version. Also note the version/date on the firmware. Modern in-kernel, standards-compliant Linux wifi drivers consist of multiple files: one or more driver files and one or more firmware files. The firmware is not part of the kernel; it is part of the distro and is easily upgraded by users or the distro's maintainers. To upgrade your firmware:

See section 3:

https://github.com/morrownr/USB-WiFi/blob/main/home/How_to_Install_Firmware_for_Mediatek_based_USB_WiFi_adapters.md
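When comparing machines it can help to pull just the firmware date out of the ethtool output. The sketch below is only an illustration: the function name fw_build_date is made up, and it assumes the 14 digits after the last hyphen in firmware-version are a build timestamp in YYYYMMDDhhmmss form, which is how the values in this thread appear to be laid out.

```shell
#!/bin/sh
# fw_build_date (hypothetical helper): read `ethtool -i <iface>` output on
# stdin and print the firmware build date as YYYY-MM-DD.
# Assumption: the text after the last "-" in firmware-version is a
# YYYYMMDDhhmmss timestamp.
fw_build_date() {
    awk -F': ' '$1 == "firmware-version" {
        n = split($2, parts, "-")
        stamp = parts[n]
        printf "%s-%s-%s\n", substr(stamp, 1, 4), substr(stamp, 5, 2), substr(stamp, 7, 2)
    }'
}
```

Usage would be ethtool -i wlan0 | fw_build_date, with wlan0 replaced by your interface name.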

When I work on my AP guide, I try out various ways of handling the networking. I always work toward the most stable setup possible. There are a lot of ways to set up an AP. How stable any particular setup is depends a lot on how well maintained the various components are. The networking may trigger what appears to be a bug in a driver and drivers are capable of taking down a system. Like I said, I will not use the Pi internal wifi as I do not trust the driver and I have little tolerance for unstable setups. I also use systemd-networkd for networking. It is a rock... I mean 24/7/365 solid. You can see my setup in the AP guide I have on the Main Menu here. My suggestions:

@morrownr

mousseq commented 2 months ago

@morrownr:

This is very useful. My Pi4 running Pi OS has this stack:

$ ethtool -i wlan1
driver: mt7921u
version: 6.6.20+rpt-rpi-v8
firmware-version: ____010000-20230526130958
expansion-rom-version:
bus-info: 2-1:1.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

The Pi4 running Ubuntu 23.10 has this stack:

$ ethtool -i wlx9418655efcae
driver: mt7921u
version: 6.5.0-1013-raspi
firmware-version: ____010000-20231109190959
expansion-rom-version:
bus-info: 2-1:1.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

Neither one has the later stack you are running. I looked at the link to Section 3.2. My reading is that I should follow the instructions of the first mt7921u section. I tried this on the Ubuntu installation and A8000 was not recognized. I removed the new firmware and rebooted. The A8000 is still not recognized.

morrownr commented 2 months ago

The A8000 is still not recognized.

Look in the directory where you copied the files to. If you see a .bin version and a .nz (compressed) version of the files you copied, delete the compressed files.
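A quick way to spot such duplicates without deleting anything is sketched below. The function name list_compressed_duplicates is made up, the default directory is the /lib/firmware/mediatek path mentioned earlier in this issue, and the .xz and .zst suffixes are an assumption about which compressed forms your distro ships; adjust them to whatever you actually see in the directory.

```shell
#!/bin/sh
# list_compressed_duplicates (hypothetical helper): for every *.bin file in
# DIR, print the path of any compressed sibling that sits next to it.
# Assumption: compressed firmware uses a .xz or .zst suffix.
list_compressed_duplicates() {
    dir="${1:-/lib/firmware/mediatek}"
    for f in "$dir"/*.bin; do
        # Skip the unexpanded pattern when no .bin files exist.
        [ -e "$f" ] || continue
        for ext in xz zst; do
            if [ -e "$f.$ext" ]; then
                echo "$f.$ext"
            fi
        done
    done
    return 0
}
```

It only lists the files; review the output before removing anything.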

mousseq commented 2 months ago

The Pi descended into a pretty random state. I wound up burning the SD card with a new Ubuntu image, updating it, and then adding the patched firmware. This worked. Thanks for all your help.

morrownr commented 2 months ago

The Pi descended into a pretty random state.

COVID for computers.

I wound up burning the SD card with a new Ubuntu image, updating it, and then adding the patched firmware. This worked. Thanks for all your help.

You are welcome. I hope this is a stable setup that works well with the A8000.

mousseq commented 2 months ago

Sadly, the new firmware did not change the system behavior. After a day or so, the machine falls into the incommunicado state. I enabled tracing in NetworkManager and managed to capture the transition in the attached journalctl transcript. Unfortunately, the log does not indicate why the link drops (around line 3389). Curiously, there is no mention of wlan0 (the internal NIC configured as the AP/hotspot). link-drop-wlan1.txt

morrownr commented 2 months ago

@mousseq

Unfortunately, the log does not indicate why the link drops (around line 3389).

That doesn't really help with what is causing the problem. Have you been able to test using a different AP/WiFi router?

mousseq commented 1 month ago

Please pardon the hiatus. I was away ...

I have now tried several NICs (Netgear A8000, A6210, Comfast AX1800, and Panda AC1200) on a different router. All (repeat all) of these fail in the same way. The Pi stays up for some period of time and then the external NIC ceases to respond. I note that the internal NIC (configured as an access point) remains up. I can ssh in via the internal NIC. I can see that wlan1 exists but I cannot change its state. Neither iwlist nor nmcli succeed. If I restart the wpa_supplicant service, all networks stop and they do not restart. I'm beginning to suspect NetworkManager.

bjlockie commented 1 month ago

The problem is likely due to the Pi's buggy USB implementation. Are you using extension cables or a hub? What OS is the Pi?

I think you're right about NetworkManager. Disable it if you want to use an AP.

mousseq commented 1 month ago

Thanks for the input. The external NICs are directly connected (no cable, no hub). The OS is the latest Pi Bookworm. I am about to try Ubuntu just to see if the problem exists there as well.

mousseq commented 3 weeks ago

I found a reference to a scatter/gather problem associated with the Netgear NICs. When I disabled scatter/gather, the NIC became stable and the system remains up. https://github.com/morrownr/7612u/blob/main/mt76_usb.conf
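For reference, a minimal sketch of making that setting persistent across reboots. It assumes the option line is options mt76_usb disable_usb_sg=1 (the parameter name comes from the /sys path quoted in the original report) and that your distro reads module options from /etc/modprobe.d; the function names are made up for this example.

```shell
#!/bin/sh
# write_sg_conf (hypothetical helper): write a modprobe option file that
# disables scatter-gather in the mt76_usb module.
# DIR defaults to /etc/modprobe.d, which needs root to write to.
write_sg_conf() {
    dir="${1:-/etc/modprobe.d}"
    printf 'options mt76_usb disable_usb_sg=1\n' > "$dir/mt76_usb.conf"
}

# current_sg_setting (hypothetical helper): print the live value of the
# parameter (Y or N) if the module is loaded, nothing otherwise.
current_sg_setting() {
    cat /sys/module/mt76_usb/parameters/disable_usb_sg 2>/dev/null
    return 0
}
```

After writing the file as root, reboot or reload the driver, then check the result with current_sg_setting.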

Tom-Shaw commented 3 weeks ago

Glad you were able to fix your issue. I mentioned the scatter gather workaround in my original description. Definitely a different root cause then.

mousseq commented 3 weeks ago

I was going to report that the problem resurfaced after several days. However, that problem turned out to be an issue with wayvnc. The network connections remain stable.

morrownr commented 3 weeks ago

I just reread this thread. I have been watching for similar issues and fixes but am not having much luck.

It is very possible that your problems are not with the mt7921u driver. It takes more than the wifi driver to make these things go.

Have you tried the adapter in a USB2 port? If it stabilizes in a USB2 port, that could indicate a problem with the USB3 hardware driver or with the USB3 hardware.

What hardware are you running? Do you know the USB3 chip?

mousseq commented 3 weeks ago

My previous declaration of solved was premature. The scatter/gather patch fixes the problem for the A6210 NIC but not the A8000 NIC. With the A8000 in place, the network stack ceases to work after about a day. I am running a Pi 4 with the A8000 NIC. The lsusb -t output indicates:

/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Vendor Specific Class, Driver=mt7921u, 5000M

The NIC is in one of the USB3 ports. This link indicates that the chip is a VL805 USB 3.0 controller.

morrownr commented 3 weeks ago

@mousseq

The infamous VL805 USB 3.0 controller. RasPi has made many bad hardware selections over the years and this is one of their worst.

Try sticking the A8000 into one of the USB2 ports. It won't be as fast but if it works and stays up, that will give us an idea of where the problem is.

mousseq commented 2 weeks ago

I swapped the NIC to a USB 2 port. This made no difference (the communications dropped after a day). That suggests (but is not conclusive) that the problem is not with the USB controller.
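To double-check which speed the adapter actually negotiated after a port swap, the speed column of lsusb -t can be filtered for the mt7921u driver. This is only a sketch; the function name adapter_link_speed is made up, and it assumes the speed is the last field on the line, as in the lsusb -t output posted earlier.

```shell
#!/bin/sh
# adapter_link_speed (hypothetical helper): read `lsusb -t` output on stdin
# and print the negotiated speed (last field) of every interface bound to
# the mt7921u driver.
adapter_link_speed() {
    awk '/Driver=mt7921u/ { print $NF }'
}
```

Usage would be lsusb -t | adapter_link_speed. A value of 480M means the adapter is running at USB 2 speed, while 5000M means it is still on a USB 3 link.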

morrownr commented 2 weeks ago

@mousseq

What you say is spot on. This type of problem can be hard to solve and there are a lot of things that are involved. It could be a setting or bug in your AP/router. It could be a bug in the driver or in the distro. One of the key things that I address in the README of the out-of-kernel driver repos here is to set the AP/router so that each SSID has a different name so you can control which band you are connected to.

If you are using a DFS channel on 5 GHz, that could be an issue if a conflicting radio signal happens and the AP/router does not handle it well, and many do not. Actually there are a lot of AP/router settings that could contribute to a problem like this. Could it be problems with power saving settings in the BIOS of your computer? Yes. Does it happen when you are using the system or do you only see this after the system has gone down into a power saving mode?