morrownr / USB-WiFi

USB WiFi Adapter Information for Linux

CF-953AX crashes under sustained load, AWUS036AXML is fine #328

Closed ViRb3 closed 8 months ago

ViRb3 commented 8 months ago

So I've been casually using CF-953AX on a Raspberry Pi 4 for almost a year now, and it worked fine. However, a few days ago I wanted to upload a large archive (~300GB) and came across an easily reproducible workflow that crashes the adapter.

First, some specs:

Now, for the workload: I deploy https://github.com/9001/copyparty, a fancy webserver, behind HTTPS with nginx. The webserver lets me upload files from a browser, and crucially, split them into multiple streams to maximize speeds. I set the parallelization factor to 16, and we're off to a good start at about 30MB/s. But after 15-30 minutes, the adapter crashes. Unplugging it and plugging it back again makes it work, but it's stuck at ~5Mbit speeds. Only a full system restart brings it back to normal.

I tried this workload 3 times and reproduced the crash each time within 30 minutes.

Now I got my hands on an AWUS036AXML and ran it through the same workload. All 300GB of data uploaded with zero issues. Strangely, it's also stuck at ~5Mbit speeds when first plugged in, until I restart the system, and then it goes to 500Mbit+. Also, the LED glows green on the AWUS036AXML; I saw some reports that it doesn't work, but it definitely does for me.

I don't know if this is the Comfast's fault, but wanted to share and get some thoughts :)

morrownr commented 8 months ago

Hi @ViRb3

Please run the following and post the results in a reply:

$ sudo dmesg | grep usb
$ sudo dmesg | grep mt7

What port are the CF-953AX and AXML plugged into? On the Pi or on the powered hub? I think you said the hub for the CF-953AX... the same with the AXML?

Plugged into USB3 or USB2 port?

Do you have Scatter-Gather turned off?

Over the years, I have seen a lot of problems with USB3 capable wifi adapters using USB3 on the RasPi4B.
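One way to answer the USB2-versus-USB3 question without tracing cables is to read the negotiated link speed from sysfs. The following is a minimal sketch, assuming the adapter's bus path is passed as the first argument; the `2-1.4` default is only a placeholder and the helper name `usb_generation` is made up for illustration.

```shell
# Map the value of a sysfs USB "speed" file (in Mbit/s) to a USB generation.
usb_generation() {
    case "$1" in
        1.5|12) echo "USB1" ;;
        480) echo "USB2" ;;
        5000|10000|20000) echo "USB3" ;;
        *) echo "unknown" ;;
    esac
}

# Bus path is an assumption; find yours with `lsusb -t` or in dmesg.
dev="/sys/bus/usb/devices/${1:-2-1.4}"
if [ -r "$dev/speed" ]; then
    printf '%s: %s\n' "${dev##*/}" "$(usb_generation "$(cat "$dev/speed")")"
else
    echo "no such device: $dev" >&2
fi
```

A USB3 capable adapter that reports 480 is running as a USB2 device, whichever port it is physically plugged into.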

ViRb3 commented 8 months ago

These are the errors that directly correlate with the adapter crashing. There are no other logs for minutes before:

Nov 03 01:37:24 server kernel: usb 2-1.4: USB disconnect, device number 7
Nov 03 01:37:24 server kernel: wlan0: deauthenticating from xx:xx:xx:xx:xx:xx by local choice (Reason: 3=DEAUTH_LEAVING)
Nov 03 01:37:25 server kernel: wlan0: failed to remove key (0, xx:xx:xx:xx:xx:xx) from hardware (-19)
Nov 03 01:37:26 server kernel: wlan0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-19)
Nov 03 01:37:26 server kernel: mt7921u 2-1.4:1.3: timed out waiting for pending tx
Nov 03 01:37:28 server kernel: usb 2-1.4: new SuperSpeed USB device number 8 using xhci_hcd
Nov 03 01:37:28 server kernel: usb 2-1.4: New USB device found, idVendor=0e8d, idProduct=7961, bcdDevice= 1.00
Nov 03 01:37:28 server kernel: usb 2-1.4: New USB device strings: Mfr=6, Product=7, SerialNumber=8
Nov 03 01:37:28 server kernel: usb 2-1.4: Product: Wireless_Device
Nov 03 01:37:28 server kernel: usb 2-1.4: Manufacturer: MediaTek Inc.
Nov 03 01:37:28 server kernel: usb 2-1.4: SerialNumber: 000000000
Nov 03 01:37:30 server kernel: Bluetooth: hci0: Opcode 0x c03 failed: -110
Nov 03 01:37:30 server kernel: usb 2-1.4: reset SuperSpeed USB device number 8 using xhci_hcd
Nov 03 01:37:30 server kernel: mt7921u 2-1.4:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230117170855a
Nov 03 01:37:31 server kernel: mt7921u 2-1.4:1.3: WM Firmware Version: ____010000, Build Time: 20230117170942
Nov 03 01:37:32 server kernel: Bluetooth: hci0: Opcode 0x c03 failed: -110
Nov 03 01:37:34 server kernel: Bluetooth: hci0: Opcode 0x c03 failed: -110
Nov 03 01:37:35 server kernel: wlan0: authenticate with xx:xx:xx:xx:xx:xx
Nov 03 01:37:35 server kernel: wlan0: 80 MHz not supported, disabling VHT
Nov 03 01:37:35 server kernel: wlan0: send auth to xx:xx:xx:xx:xx:xx (try 1/3)
Nov 03 01:37:35 server kernel: wlan0: authenticated
Nov 03 01:37:35 server kernel: wlan0: associate with xx:xx:xx:xx:xx:xx (try 1/3)
Nov 03 01:37:35 server kernel: wlan0: RX AssocResp from xx:xx:xx:xx:xx:xx (capab=0x1411 status=0 aid=1)
Nov 03 01:37:35 server kernel: wlan0: associated
Nov 03 01:37:35 server kernel: wlan0: Limiting TX power to 20 (20 - 0) dBm as advertised by xx:xx:xx:xx:xx:xx
Nov 03 01:37:35 server kernel: IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready

Both adapters were/are plugged into the exact same USB3 port, which is on the powered hub. Only the powered hubs themselves go into the Raspberry Pi, so as to avoid a shortage of current.

Have not disabled Scatter-Gather for either adapter; will try now and report back.

ViRb3 commented 8 months ago

By the way, I read somewhere that the Comfast may be overheating and shutting down under sustained load. I'm about 15 minutes through the workflow, and to be honest, it does feel noticeably warm to the touch. Didn't experience such an effect with the Alfa, but it could be due to the different casing. Test continues...

morrownr commented 8 months ago

Nov 03 01:37:31 kernel: mt7921u 2-1.4:1.3: WM Firmware Version: ____010000, Build Time: 20230117170942

Your firmware is a few versions behind.

Go to:

https://github.com/morrownr/USB-WiFi

Menu item 8 gives instructions for adding or, in your case, updating firmware.

You need section 2. MT7921 - mt7921au, mt7921, and mt7921k (AMD RZ608) chipsets

If you are not using the Bluetooth from the adapters, which you won't be since it is turned off, you can delete the Bluetooth firmware:

sudo rm /lib/firmware/mediatek/BT_RAM_CODE_MT7961_1_2_hdr.bin

Retest and report the dmesg info as requested earlier.
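To confirm that a firmware update actually took effect, it can help to pull just the build timestamp out of the kernel log and compare it before and after. This is a rough sketch that assumes the log line format quoted above ("WM Firmware Version: ..., Build Time: ..."); the function name `fw_build_time` is hypothetical.

```shell
# Print the 14-digit build timestamp from any "WM Firmware Version" line on stdin.
fw_build_time() {
    sed -n 's/.*WM Firmware Version: .*Build Time: \([0-9]\{14\}\).*/\1/p'
}

# Typical use (needs permission to read the kernel log):
#   sudo dmesg | fw_build_time
```

The timestamp reads as YYYYMMDDhhmmss, so a larger number means a newer firmware build.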

morrownr commented 8 months ago

I read somewhere that the Comfast may be overheating and shutting down under sustained load.

That could be true but our testing with power usage seems to indicate it would be hard to get this chipset to heat up to the point that it will cut out. It uses very low amounts of power even at full speed. About 250 mA. The rtl8812bu chipset will use around 500 mA at full speed and the rtl8814au chipset will use over 800 mA at full speed. It is possible that Comfast has particularly bad thermal characteristics in that particular case and over time it could be a problem. Alfa has a history of adapters that have good thermal characteristics. Overall, Alfa just makes better quality adapters than other companies. Yes, they cost more but good quality can actually save you money in the long run.

The usb wifi adapter industry considers usb wifi to be a home product, and that means light usage to them. When I look at the analytics of this site, I see far too many folks that are in IT at big companies and government agencies, as well as tech companies. We need a reclassification, and the adapter maker that does it first can reap the benefits. We need adapter designs that work for a lot of wearable products as well as robotics type situations.
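To put those current figures in terms of heat, they can be converted to watts. This is a back-of-the-envelope sketch assuming the nominal 5 V USB bus voltage and treating all drawn power as heat, which overstates it somewhat since part of it is radiated as RF; the helper name `ma_to_watts` is made up for illustration.

```shell
# Convert a current draw in mA to watts at a given bus voltage (default 5 V).
ma_to_watts() {
    awk -v ma="$1" -v v="${2:-5}" 'BEGIN { printf "%.2f\n", ma * v / 1000 }'
}

# Current figures are the approximate full-speed values quoted above.
for entry in "mt7921au:250" "rtl8812bu:500" "rtl8814au:800"; do
    printf '%s: %s W\n' "${entry%%:*}" "$(ma_to_watts "${entry##*:}")"
done
```

That works out to roughly 1.25 W for the mt7921au against 2.5 W and 4 W or more for the two Realtek chipsets, which is why the case design rather than the chipset looks like the more likely thermal suspect.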

Let me know when you get the results of the tests with the new firmware. I have more ideas.

ViRb3 commented 8 months ago

Sanity checked the old firmware: it took 250GB for the crash to happen. Updated the firmware, confirmed it worked, then got the exact same error, this time after just 1GB:

[ 2623.765788] usb 2-1.4: USB disconnect, device number 7
[ 2623.781817] wlan0: deauthenticating from xx:xx:xx:xx:xx:xx by local choice (Reason: 3=DEAUTH_LEAVING)
[ 2624.285951] wlan0: failed to remove key (0, xx:xx:xx:xx:xx:xx) from hardware (-19)
[ 2625.021865] wlan0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-19)
[ 2625.241775] mt7921u 2-1.4:1.3: timed out waiting for pending tx
[ 2627.174064] usb 2-1.4: new SuperSpeed USB device number 8 using xhci_hcd
[ 2627.196396] usb 2-1.4: New USB device found, idVendor=0e8d, idProduct=7961, bcdDevice= 1.00
[ 2627.196407] usb 2-1.4: New USB device strings: Mfr=6, Product=7, SerialNumber=8
[ 2627.196413] usb 2-1.4: Product: Wireless_Device
[ 2627.196418] usb 2-1.4: Manufacturer: MediaTek Inc.
[ 2627.196423] usb 2-1.4: SerialNumber: 000000000
[ 2629.213821] Bluetooth: hci0: Opcode 0x c03 failed: -110
[ 2629.294913] usb 2-1.4: reset SuperSpeed USB device number 8 using xhci_hcd
[ 2629.356316] mt7921u 2-1.4:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a
[ 2629.642519] mt7921u 2-1.4:1.3: WM Firmware Version: ____010000, Build Time: 20230526130958
[ 2631.325843] Bluetooth: hci0: Opcode 0x c03 failed: -110
[ 2633.341881] Bluetooth: hci0: Opcode 0x c03 failed: -110
[ 2633.957451] wlan0: authenticate with xx:xx:xx:xx:xx:xx
[ 2633.957491] wlan0: 80 MHz not supported, disabling VHT
[ 2634.097516] wlan0: send auth to xx:xx:xx:xx:xx:xx (try 1/3)
[ 2634.100831] wlan0: authenticated
[ 2634.101882] wlan0: associate with xx:xx:xx:xx:xx:xx (try 1/3)
[ 2634.107111] wlan0: RX AssocResp from xx:xx:xx:xx:xx:xx (capab=0x1411 status=0 aid=1)
[ 2634.139076] wlan0: associated
[ 2634.144506] wlan0: Limiting TX power to 20 (20 - 0) dBm as advertised by xx:xx:xx:xx:xx:xx
[ 2634.200876] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready

Testing with Scatter-Gather disabled now... Just to confirm, this is the correct method, right?

echo "options mt76_usb disable_usb_sg=1" >> /etc/modprobe.d/mt76_usb.conf

Honestly, kind of hoping it crashes again, so I can finally accept that the cheap Comfast is no match for the 3 times more expensive Alfa, and just call it a day.
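A note on the modprobe line above: appending with `>>` adds a duplicate entry every time it is re-run, and it only works from a root shell. Here is a small sketch of an idempotent variant, assuming the same option string; the function name `write_sg_option` is hypothetical, and the target file is passed as a parameter so it can be tried on a scratch file first.

```shell
# Append the mt76_usb Scatter-Gather option to a modprobe.d-style file,
# but only if that exact line is not already there.
write_sg_option() {
    conf="$1"
    line="options mt76_usb disable_usb_sg=1"
    grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
}

# Typical use, as root:
#   write_sg_option /etc/modprobe.d/mt76_usb.conf
```

The option is read when the mt76_usb module is loaded, so it takes effect after the module is reloaded or the system is rebooted.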

ViRb3 commented 8 months ago

And it died again, this time after about 120GB. Will now re-run the entire transfer a few times with the Alfa, and if it doesn't crash, I think we have an answer.

morrownr commented 8 months ago

Testing with Scatter-Gather disabled now... Just to confirm, this is the correct method, right?

Try testing the setting with:

$ grep '[[:alnum:]]' /sys/module/mt76_usb/parameters/*

Bluetooth: hci0: Opcode 0x c03 failed: -110

If you delete the Bluetooth firmware file, you won't have this in your log.

Honestly, kind of hoping it crashes again, so I can finally accept that the cheap Comfast is no match for the 3 times more expensive Alfa, and just call it a day.

I have a CF-951AX with the same chipset. There are some quality control or engineering issues but it does some things well so I added it to the Plug and Play list and documented the problems so shoppers will know. It will not work with any powered hub or extension cable.

I've got something else for you to test if interested. Plug the adapters into a USB2 port on the Pi4B and test. Plug the adapters into a USB3 port on the Pi4B and test. Tell me the results of both tests.

Now, why am I asking the above:

Over the last few years, since the Pi4B has been available, I have seen so many problems with USB3 capable adapters using USB3 ports...both Mediatek and Realtek. The rtl8812bu chipset is particularly bad and is noted in the driver docs.

There are suspects. The USB3 chipset that RasPi picked for the Pi4B is suspected to have silicon errors that have not been corrected in software and it may not be possible. There may be other problems in the Pi usb subsystem. It just does not seem to be able to handle sustained high loads in many setups. Then there is the issue of powered hubs which you are probably aware of. If you had just told me your setup and plan, I would have replied "good luck with that." Seriously, I am amazed that it is working as well as it is. I really don't think it is the adapter driver or firmware but rather something in the usb subsystem of the Pi4B.

I'm actually pretty good at stabilizing things out but it might leave you with a system that is not doing what you want. The Pi4B really needed another way to hook up fast storage as the usb subsystem is a weak link that cannot handle some setups.

Nick

morrownr commented 8 months ago

And it died again, this time after about 120GB. Will now re-run the entire transfer a few times with the Alfa, and if it doesn't crash, I think we have an answer.

I am interested in knowing the results. Please keep me posted.

ViRb3 commented 8 months ago

Try testing the setting with: $ grep [[:alnum:]] /sys/module/mt76_usb/parameters/*

cat /sys/module/mt76_usb/parameters/disable_usb_sg
Y

Seems like Scatter-Gather disable worked. It shows N when I remove the modprobe entry.
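For anyone scripting this check, the Y/N value can be wrapped in a small test. This is a minimal sketch that assumes the sysfs path shown above; the function name `sg_disabled` is made up, and the path can be overridden with an argument.

```shell
# Succeed (exit 0) if the mt76_usb disable_usb_sg parameter reads "Y".
sg_disabled() {
    param="${1:-/sys/module/mt76_usb/parameters/disable_usb_sg}"
    [ -r "$param" ] && [ "$(cat "$param")" = "Y" ]
}

if sg_disabled; then
    echo "Scatter-Gather is disabled"
else
    echo "Scatter-Gather is enabled (or mt76_usb is not loaded)"
fi
```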

If you delete the Bluetooth firmware file, you won't have this in your log.

Ah I know, I just couldn't be bothered as I'll forget to revert it (not that I'll ever need it but still). This shouldn't have any impact on the crashing, right?

I am interested in knowing the results. Please keep me posted.

Will certainly do!

The USB3 chipset that RasPi picked for the Pi4B is suspected to have silicon errors that have not been corrected in software and it may not be possible.

Couldn't be more excited for my Pi 5 to arrive with its in-house I/O controller :)

There are suspects. The USB3 chipset that RasPi picked for the Pi4B is suspected to have silicon errors that have not been corrected in software and it may not be possible. There may be other problems in the Pi usb subsystem. It just does not seem to be able to handle sustained high loads in many setups. Then there is the issue of powered hubs which you are probably aware of. If you had just told me your setup and plan, I would have replied "good luck with that." Seriously, I am amazed that it is working as well as it is. I really don't think it is the adapter driver or firmware but rather something in the usb subsystem of the Pi4B.

I may be a small sample pool, but I have two more RPI4s running with different powered hubs and external HDDs, except they are connected over ethernet. They've had 100% uptime for the past 2 years and haven't crashed on me a single time with the Raspberry Pi OS. I've done some serious sustained I/O on them too, although that may be limited due to the CPU bottlenecking on the LUKS encryption overhead. By the way, before that I was on Ubuntu, and I was getting kernel panic every week. I think RPi Foundation are really on top of their hardware bug patching.

bjlockie commented 8 months ago

I see far too many folks that are in IT at big companies and government agencies, as well as tech companies.

I suspect many people come here AFTER someone has a non plug and play adapter. :-(

morrownr commented 8 months ago

I suspect many people come here AFTER someone has a non plug and play adapter. :-(

You are spot on. In fact, I'll go further: not only have they picked bad adapters, but they have spent a lot of money and their boss is after their ass. A lot of the SBCs out there have very poor software support, and companies get in trouble buying their products. How often I see someone who has bought SBCs that only have support for kernel 4.4, and they buy something like adapters based on rtl8812bu or rtl8832bu and want WPA3 support or WiFi 6 capability. After telling them that is not going to happen, I get "but we are willing to pay." They don't understand. My only advice in cases like this is that in the future, get a person that knows Linux and wifi involved in the planning.

I get contacted at times by companies and it is almost never to recommend something for a project; it is almost always because things have blown up and they want someone to fix their problem. My answer is "give me the details and I'll give you a quote."

It is good to make a plan but it is bad to fall in love with the plan. Purchase small quantities and test your plan until you find out what works well.

ViRb3 commented 8 months ago

Here's the update - I transferred about 600GB with the Alfa and there were zero issues. I made absolutely no other changes to the setup. I think this answers all of our questions. For any future readers, don't cheap out on a Comfast and risk obscure issues months or years later. Get an Alfa (or other adapter with good track history) and call it a day :)

Thanks for the help with this investigation @morrownr!

kasinjsh commented 8 months ago

Here's the update - I transferred about 600GB with the Alfa and there were zero issues. I made absolutely no other changes to the setup. I think this answers all of our questions. For any future readers, don't cheap out on a Comfast and risk obscure issues months or years later. Get an Alfa (or other adapter with good track history) and call it a day :)

Thanks for the help with this investigation @morrownr!

I tested my Fenvi ax1800 with the same chipset and I did notice it getting hot on upload (as the device is transmitting, not receiving, it puts the radio to work). I tested with iperf3 for 20 minutes and was not comfortable continuing my testing up to 30 minutes as I did on download. The issue here is the heatsink, or to be more precise, the lack of one. I know for a fact that the Fenvi device has no heatsink, and one is required if you do sustained uploads. I'm quite sure the Alfa has at least something on that chip. For the average Joe this is not an issue, but for your use case it is. If you feel comfortable opening the device and tinkering, you can try adding a heatsink on your own.
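For anyone wanting to repeat this kind of sustained test, the iperf3 invocations might look like the sketch below. It assumes an iperf3 server (`iperf3 -s`) is already running on another machine; the address `192.0.2.10` is a documentation placeholder and the helper name `iperf_cmd` is hypothetical. By default the iperf3 client sends (upload from the adapter's side), and `-R` reverses the direction.

```shell
# Print the iperf3 command for a sustained test of the given length.
# Arguments: server host, duration in minutes, direction (upload|download).
iperf_cmd() {
    host="$1"; minutes="$2"; direction="${3:-upload}"
    seconds=$((minutes * 60))
    if [ "$direction" = "download" ]; then
        echo "iperf3 -c $host -t $seconds -R"
    else
        echo "iperf3 -c $host -t $seconds"
    fi
}

iperf_cmd 192.0.2.10 20 upload
iperf_cmd 192.0.2.10 30 download
```

Running the printed commands while watching `dmesg -w` in a second terminal should show whether a USB disconnect lines up with the adapter heating up.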