greearb / ath10k-ct

Stand-alone ath10k driver based on Candela Technologies Linux kernel.

Low download throughput on Netgear r7800 (qca9984) with ath10k-ct firmware #138

Status: Open. Opened by rickkdotnet 4 years ago.

rickkdotnet commented 4 years ago

Description of the problem (how to configure, how to reproduce, how often it happens).

Opening a separate issue for https://github.com/greearb/ath10k-ct/issues/136#issuecomment-641095620:

A Netgear R7800 running OpenWRT with -ct firmware consistently gives significantly lower download throughput on 5 GHz VHT80 than the same setup with mainline firmware.

A single-stream iperf TCP download from the AP, or from a host connected via Ethernet, shows an unstable 150-300 Mbps, while mainline firmware hovers around 400 Mbps. Upload speeds on both firmwares are comparable and well over 300 Mbps under the same conditions.

Multi-stream tests such as Flent RRUL show even more dramatic differences, with throughput more than 5x lower. These results are typical: [two attached Flent RRUL plots]

In real world use the connection feels "laggy" at times but usable - no disconnects. Bitrate as reported by the OpenWRT GUI is constantly changing.

Software (OS, Firmware version, kernel, driver, etc):

I've tried at least the following combinations:

OpenWRT 19.07.03 (Hnyman build from his dropbox)

OpenWRT r13399-559b338466 (Hnyman build from his dropbox)

OpenWRT r13517-6fca1646dd (own build based on Hnyman scripts)

I've also tried both the full and the full-htt-mgt variants, as well as version 012 of the firmware, but my logs for those are incomplete.

All tests were done on 5 GHz in the ETSI (NL) regulatory domain against a MacBook Pro (15-inch, 2018) with an AirPort Extreme (0x14E4, 0x7BF), Feb 28 2020 15:24:56 version 9.30.357.35.32.5.47 FWID 01-9ce4adf3, 802.11 a/b/g/n/ac card. I'll try a different client soon.

Channels, regulatory domain and distance to the AP do not seem to be factors.

Hardware (NIC chipset, platform, etc)

A recently bought Netgear R7800 with a QCA9984.

Logs (dmesg, maybe supplicant and/or hostap)

I've attached a dmesg of OpenWRT r13399-559b338466 (Hnyman build from his dropbox). dmesg.boot-r13399-559b338466-10.4b-ct-9984-fW-013-d81f62d97.txt

During use I occasionally see: ath10k_pci 0000:01:00.0: Invalid VHT mcs 15 peer stats

As a user I'm currently happy with the mainline firmware, but seeing all the work that went (and goes) into ath10k-ct, it would be nice if this could be fixed.

rickkdotnet commented 4 years ago

I did some more testing with the -ct firmware. Right after rebooting into -ct, everything seemed fine: throughput was perhaps a bit lower, but latency was also lower than with the mainline firmware. All good.

I started downstairs, close to the AP, and everything was good. I moved upstairs and performance was still reasonable. I was already starting to wonder whether I had been imagining things, until suddenly the RRUL download throughput on my primary client went south again:

netperf-20200628-1328-MXM-5Ghz-lug2gw-ct-rebburo

I started testing with UDP streams (netperf -t udp_stream), and these showed completely random results for received packets: anywhere from 0% to 100% loss.

Now, to make things interesting, I switched to another laptop (a 2015 MacBook Pro sitting on the same desk). RRUL download throughput on this laptop was still reasonable, but here latency was all over the place.

netperf-20200617-1317-MXM-5Ghz-ct-mbp-gw-buro

The logs show nothing really spectacular, apart from these messages every minute:

kern/20200610.log:Jun 10 12:06:08 nighthawk kernel: [ 2096.656937] ath10k_pci 0000:01:00.0: Invalid VHT mcs 15 peer stats
kern/20200610.log:Jun 10 12:06:10 nighthawk kernel: [ 2098.908147] ath10k_pci 0000:01:00.0: Invalid VHT mcs 15 peer stats
kern/20200610.log:Jun 10 12:06:11 nighthawk kernel: [ 2099.613435] ath10k_pci 0001:01:00.0: Invalid peer id 1 peer stats buffer
kern/20200610.log:Jun 10 12:06:11 nighthawk kernel: [ 2099.618800] ath10k_pci 0001:01:00.0: Invalid peer id 1 peer stats buffer
kern/20200610.log:Jun 10 12:06:12 nighthawk kernel: [ 2100.342861] ath10k_pci 0000:01:00.0: Invalid VHT mcs 15 peer stats

But these are also there when performance is reasonable.
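Since the same messages show up in both good and bad periods, counting them per device makes that comparison less anecdotal. Below is a minimal Python sketch, assuming log lines in the grep-prefixed syslog format quoted above; the helper name count_ath10k_messages is hypothetical, and the sample data is just the five lines from this comment. Note that the messages come from two different PCI devices (0000:01:00.0 and 0001:01:00.0).

```python
import re
from collections import Counter

# The five log lines quoted above, including the grep file prefix.
SAMPLE_LOG = """\
kern/20200610.log:Jun 10 12:06:08 nighthawk kernel: [ 2096.656937] ath10k_pci 0000:01:00.0: Invalid VHT mcs 15 peer stats
kern/20200610.log:Jun 10 12:06:10 nighthawk kernel: [ 2098.908147] ath10k_pci 0000:01:00.0: Invalid VHT mcs 15 peer stats
kern/20200610.log:Jun 10 12:06:11 nighthawk kernel: [ 2099.613435] ath10k_pci 0001:01:00.0: Invalid peer id 1 peer stats buffer
kern/20200610.log:Jun 10 12:06:11 nighthawk kernel: [ 2099.618800] ath10k_pci 0001:01:00.0: Invalid peer id 1 peer stats buffer
kern/20200610.log:Jun 10 12:06:12 nighthawk kernel: [ 2100.342861] ath10k_pci 0000:01:00.0: Invalid VHT mcs 15 peer stats
"""

# Capture the PCI address after "ath10k_pci" and the message text after it.
LINE_RE = re.compile(r"ath10k_pci (\S+): (.+)$")


def count_ath10k_messages(text):
    """Return a Counter keyed by (pci_address, message)."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts


if __name__ == "__main__":
    for (device, message), n in count_ath10k_messages(SAMPLE_LOG).most_common():
        print(f"{n:3d}  {device}  {message}")
```

Running it over logs from a good period and from a bad period, then comparing the counts, would show whether the message rate changes along with throughput.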

I couldn't do anything to recover from this state. Neither rebooting OpenWRT nor rebooting the clients helped. The only thing that worked was... reverting to the mainline firmware.

Perhaps there's something happening in my radio spectrum that's handled differently by -ct?

When looking at my SNR graphs, I see similar results for mainline and -ct firmware until a sudden drop in "power" (that is, signal as reported by iwinfo) at 12:36. That is likely because I moved my laptop back upstairs to the spot where I do most of my testing, but... I think it's a pretty sharp drop (~-90 dBm) compared to what the mainline firmware reports later (~-77 dBm).

I went back to mainline at 13:36, still in the same spot, and everything is back to normal.


So in the end it seems to be just a plain old signal strength issue, with the presence of low-rate clients probably messing up the earlier tests, and so on.

So I switched off all other stations, rebooted into -ct again, and moved between the AP and the 'bad' spot. Sure enough, close to the AP throughput is fine, but a few meters out the problems start. In my normal spot upstairs, performance with -ct is just bad, and the signal reported by iwinfo is ~-90 dBm.

I can understand bad performance at -90 dBm, but I don't understand why the mainline firmware reports ~-77 dBm and does great under exactly the same circumstances. Also note that throughput to the AP is just fine.
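For scale, the gap between the two readings is 13 dB, which corresponds to roughly a 20x difference in received power. The small Python sketch below only converts the reported iwinfo numbers; it does not tell which firmware's reading is the accurate one. The function name db_to_power_ratio is hypothetical.

```python
def db_to_power_ratio(delta_db):
    """Convert a difference in dB into a linear power ratio."""
    return 10 ** (delta_db / 10)


# iwinfo signal readings from the same spot, as reported above.
MAINLINE_DBM = -77
CT_DBM = -90

if __name__ == "__main__":
    delta = MAINLINE_DBM - CT_DBM  # 13 dB
    print(f"{delta} dB gap = {db_to_power_ratio(delta):.1f}x received power")
```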

Perhaps (this generation of?) the R7800 is a bit different from others on the radio side?

greearb commented 4 years ago

Hello, are you willing to test a series of builds to bisect this problem? I can build you an image of my firmware that is very close to the upstream QCA firmware, and if that does not exhibit the problem, you can bisect to find where the problem was introduced.
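Bisecting here means taking an ordered series of builds where the earliest is good and the latest is bad, testing the midpoint, and halving the range each time. Because throughput in this thread is noisy, a single run per build can mislabel it, so this minimal Python sketch takes the median of several runs. It assumes a single regression point; the measure callback, the threshold, and the build names are all hypothetical and not taken from the actual bisect tarball.

```python
import random
from statistics import median


def first_bad_build(builds, measure, threshold_mbps, runs=5):
    """Binary-search an ordered list of builds for the first bad one.

    Assumes builds[0] is good, builds[-1] is bad, and every build after the
    regression is also bad. `measure(build)` is a caller-supplied function
    (hypothetical) that loads `build`, runs one throughput test and returns
    Mbit/s. The median of `runs` measurements damps run-to-run noise.
    """
    def is_bad(build):
        return median(measure(build) for _ in range(runs)) < threshold_mbps

    good, bad = 0, len(builds) - 1  # invariant: builds[good] ok, builds[bad] slow
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(builds[mid]):
            bad = mid
        else:
            good = mid
    return builds[bad]


if __name__ == "__main__":
    # Hypothetical build names and a simulated, noisy throughput test.
    builds = [f"build-{i:02d}" for i in range(16)]

    def simulated_measure(build):
        # Pretend the regression landed in build-09.
        base = 170 if int(build.split("-")[1]) >= 9 else 470
        return base + random.uniform(-60, 60)

    print(first_bad_build(builds, simulated_measure, threshold_mbps=320))
```

With 16 builds this needs about four tested builds (each tested `runs` times) instead of testing all of them.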

Fail-Safe commented 3 years ago

@rickkdotnet Curious how this is going for you now. I have similar experiences with the -ct firmware and would like to know if you and @greearb made headway on it.

xupefei commented 3 years ago

Any updates on this issue? @greearb, I can try to bisect the firmware if @rickkdotnet cannot.

I'm using a 15-inch MacBook Pro, with a Wi-Fi upload speed of 700 Mbps and a download speed (which has been cut in half) of 150-300 Mbps.

gsustek commented 3 years ago

Please try this firmware: https://www.candelatech.com/downloads/ath10k-9984-10-4/firmware-5-ct-non-commercial-full-htt-mgt-11.bin That was the last firmware I used, and with it I achieved 80-90 MB/s.

xupefei commented 3 years ago

@gsustek Thank you! I tried this firmware and see a small improvement over the default OpenWrt firmware. The speed is quite unstable, but it peaks at 500 Mbps and averages ~350 Mbps.

greearb commented 3 years ago

I uploaded a new set of binaries that you can bisect with.

http://www.candelatech.com/downloads/ath10k-9984-10-4b/bisect/all_builds-9984-H-dec-7-2020.tar.gz

xupefei commented 3 years ago

Hi @greearb, I tried a few firmwares in your package but was unable to pin down the first bad version due to unstable TX throughput. Here is my record: https://www.icloud.com/numbers/0Y5PrlO8nDED_1kZ7ulZf4aTw#atk10k_bisect

My setup:

My observations:

For comparison, I flashed Hnyman's non-CT image R7800-master-r15207-c29f6121fd-20201213-2136-sysupgrade.bin, moved IRQ 32 to CPU1, and saw a stable TX speed of >500 Mbps.
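One common way to move an interrupt to another CPU on Linux is to write a hex CPU bitmask to /proc/irq/<n>/smp_affinity. The Python sketch below only shows how that mask is formed; IRQ 32 is taken from the comment above and may be numbered differently on other builds (check /proc/interrupts). Python is usually not installed on the router, so in practice the shell one-liner in the comment is what would be used. The helper names are hypothetical.

```python
def cpu_mask(cpus):
    """Build the hex CPU bitmask written to /proc/irq/<n>/smp_affinity."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # bit N selects CPU N
    return format(mask, "x")


def smp_affinity_path(irq):
    """Path of the affinity file for a given IRQ number."""
    return f"/proc/irq/{irq}/smp_affinity"


def pin_irq(irq, cpus):
    """Pin an IRQ to the given CPUs. Requires root; not called below."""
    with open(smp_affinity_path(irq), "w") as f:
        f.write(cpu_mask(cpus) + "\n")


if __name__ == "__main__":
    # IRQ 32 to CPU1, as in the comment above. Shell equivalent:
    #   echo 2 > /proc/irq/32/smp_affinity
    print(smp_affinity_path(32), cpu_mask([1]))
```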

Any suggestion for me to move forward?

ahart241 commented 3 years ago

Hello @xupefei @greearb @gsustek @rickkdotnet

I've been following this thread, as I too was experiencing similar issues (high latency, low transmit throughput of 150-300 Mbps) on an EA8500 running OpenWRT 19.07.03.

I am happy to say that I am seeing much improved throughput and lower latency after updating hostapd from 2019-08-08-ca8c2bd2-3 to 2019-08-08-ca8c2bd2-4. Based on this, and on what @greearb mentioned about the ath10k-ct firmware versions being the same between 19.07.01 and 19.07.03, I am inclined to think that this may not be a firmware bug but rather a regression in hostapd. Is anyone else able to validate this? Perhaps the next step is to test this against different versions of hostapd to see if/when this regression may have been reintroduced, and then report an issue to the hostapd maintainer?

My current configuration:

OpenWRT 19.07.03

xupefei commented 3 years ago

I am happy to say that I am seeing much improved throughput and lower latency after updating hostapd 2019-08-08-ca8c2bd2-3 » 2019-08-08-ca8c2bd2-4

@ahart241 Could you elaborate? The 2019-08-08-ca8c2bd2-4 version has been around for a while. What throughput did you get before and after?

ahart241 commented 3 years ago

Previous throughput was 150-300 Mbps. After upgrading hostapd from 2019-08-08-ca8c2bd2-3 to 2019-08-08-ca8c2bd2-4, and with Software Flow Offloading enabled, I am achieving download speeds topping out around 480 Mbps. Testing with iperf3 between my phone (Samsung Galaxy S10) and the EA8500, I get 470-520 Mbps RX and 270-290 Mbps TX from the router's perspective. Ping response times also seem better after upgrading hostapd. I know that these are older (August 2019) builds and that this issue is being reported on later (development snapshot?) builds, but I am wondering whether this might be a regression in hostapd.

igiannakas commented 3 years ago

I am experiencing the same issue with my R7800.

  1. Router configured as a dumb access point.
  2. Installed release 19.07.5.
  3. Configured the 5 GHz band at 80 MHz, channel 32 (no conflicting APs on it).
  4. Ran an iperf3 test between my MacBook Pro Touch Bar 2016 and my home server wired to the AP. Download speed close to the AP was ~700 Mbps and upload ~550-600 Mbps.
  5. Ran an iperf3 test between my MacBook Pro Touch Bar 2019 and my home server. Upload speed was ~550-600 Mbps, but download speed starts at ~350-400 Mbps and drops to ~150 Mbps.
  6. Reverted to the latest stock Netgear firmware with the same channel config. Both laptops now download at full speed (~700-750 Mbps).

greearb commented 3 years ago

Comparing against the Netgear firmware doesn't help me much; there could be a huge number of other factors that affect its performance vs. OpenWrt. Comparing against OpenWrt with the upstream QCA firmware instead of the ath10k-ct firmware (and/or the upstream driver vs. the ath10k-ct driver) may be more helpful.

timkgh commented 3 years ago

I just ran an iperf3 TCP test using hnyman's build master-r15759-ce4cb8e51d-20210214, which comes with the CT driver and firmware 10.4b-ct-9984-fW-13-5ae337bb1. One side is a wired Linux server, the other side a 2019 MacBook Pro on the 5 GHz radio. The only difference between the tests is the firmware; otherwise the setup was exactly the same, in the same location.

Mainline firmware 10.4-3.9.0.2-00139:

[ ID] Interval           Transfer     Bitrate         Retr
[  7]   0.00-10.01  sec   571 MBytes   478 Mbits/sec  110             sender
[  7]   0.00-10.00  sec   568 MBytes   476 Mbits/sec                  receiver

CT 10.4b-ct-9984-fW-13-5ae337bb1:

[ ID] Interval           Transfer     Bitrate         Retr
[  7]   0.00-10.00  sec   205 MBytes   172 Mbits/sec  250             sender
[  7]   0.00-10.00  sec   202 MBytes   170 Mbits/sec                  receiver

The performance difference is significant.
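To put a number on "significant": the receiver-side bitrate drops from 476 to 170 Mbits/sec, about 2.8x. The minimal Python sketch below parses the summary lines above and also re-derives each bitrate from the transfer size, assuming iperf3's default output, where Transfer uses binary units (1 MByte = 2^20 bytes) and Bitrate uses decimal units (1 Mbit = 10^6 bits). The helper names parse_summary and recompute_mbits are hypothetical.

```python
import re

# Summary lines copied from the two iperf3 runs above.
MAINLINE = """\
[  7]   0.00-10.01  sec   571 MBytes   478 Mbits/sec  110             sender
[  7]   0.00-10.00  sec   568 MBytes   476 Mbits/sec                  receiver
"""
CT = """\
[  7]   0.00-10.00  sec   205 MBytes   172 Mbits/sec  250             sender
[  7]   0.00-10.00  sec   202 MBytes   170 Mbits/sec                  receiver
"""

SUMMARY_RE = re.compile(
    r"([\d.]+)-([\d.]+)\s+sec\s+([\d.]+) MBytes\s+([\d.]+) Mbits/sec.*?(sender|receiver)"
)


def parse_summary(text):
    """Map 'sender'/'receiver' to (seconds, MBytes, reported Mbit/s)."""
    result = {}
    for start, end, mbytes, mbits, role in SUMMARY_RE.findall(text):
        result[role] = (float(end) - float(start), float(mbytes), float(mbits))
    return result


def recompute_mbits(seconds, mbytes):
    # Transfer is in binary MBytes (2**20 bytes); Bitrate is in decimal Mbit/s.
    return mbytes * 2**20 * 8 / seconds / 1e6


if __name__ == "__main__":
    main_rx = parse_summary(MAINLINE)["receiver"]
    ct_rx = parse_summary(CT)["receiver"]
    for name, (secs, mb, reported) in (("mainline", main_rx), ("ct", ct_rx)):
        print(f"{name}: reported {reported:.0f}, recomputed {recompute_mbits(secs, mb):.1f} Mbit/s")
    print(f"mainline/ct receiver ratio: {main_rx[2] / ct_rx[2]:.2f}x")
```

The recomputed values agree with the reported ones to within about 1 Mbit/s; the small differences come from iperf3 rounding the transfer size to whole MBytes.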