Open rickkdotnet opened 4 years ago
I did some more testing with the -ct firmware. Right after reboot into -ct, everything seemed fine. A bit lower throughput perhaps, but also lower latency than with the mainline firmware. All good.
I started downstairs, close to the AP, and everything was good. I moved upstairs and performance was still reasonable. I was already starting to wonder if I had been imagining things until suddenly rrul download throughput on my primary client went south again:
I started testing with UDP streams (netperf -t udp_stream) and these showed completely random results for received packets. Anywhere from 0% to 100% loss.
Now.. to make things interesting.. I switched to another laptop (2015 Macbook pro sitting on the same desk) now.. RRUL download throughput on this laptop was still reasonable, but here latency was all over the place.
The logs show nothing really spectacular, apart from these every minute:
kern/20200610.log:Jun 10 12:06:08 nighthawk kernel: [ 2096.656937] ath10k_pci 0000:01:00.0: Invalid VHT mcs 15 peer stats
kern/20200610.log:Jun 10 12:06:10 nighthawk kernel: [ 2098.908147] ath10k_pci 0000:01:00.0: Invalid VHT mcs 15 peer stats
kern/20200610.log:Jun 10 12:06:11 nighthawk kernel: [ 2099.613435] ath10k_pci 0001:01:00.0: Invalid peer id 1 peer stats buffer
kern/20200610.log:Jun 10 12:06:11 nighthawk kernel: [ 2099.618800] ath10k_pci 0001:01:00.0: Invalid peer id 1 peer stats buffer
kern/20200610.log:Jun 10 12:06:12 nighthawk kernel: [ 2100.342861] ath10k_pci 0000:01:00.0: Invalid VHT mcs 15 peer stats
But these are also there when performance is reasonable.
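To check whether these messages become more frequent when performance drops, they can be tallied per device and message. A rough sketch, assuming the log format shown above; the three sample lines are copied from the excerpt, and in practice the function would be fed the whole kernel log:

```python
import re
from collections import Counter

# Matches the PCI device and the message text of an ath10k_pci kernel log line.
LINE_RE = re.compile(r"ath10k_pci (?P<dev>\S+?): (?P<msg>.+)$")

def tally(lines):
    """Count ath10k_pci messages per (device, message) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match:
            counts[(match["dev"], match["msg"])] += 1
    return counts

sample = [
    "Jun 10 12:06:08 nighthawk kernel: [ 2096.656937] ath10k_pci 0000:01:00.0: Invalid VHT mcs 15 peer stats",
    "Jun 10 12:06:11 nighthawk kernel: [ 2099.613435] ath10k_pci 0001:01:00.0: Invalid peer id 1 peer stats buffer",
    "Jun 10 12:06:12 nighthawk kernel: [ 2100.342861] ath10k_pci 0000:01:00.0: Invalid VHT mcs 15 peer stats",
]

counts = tally(sample)
for (dev, msg), n in counts.items():
    print(f"{n:3d}  {dev}  {msg}")
```

Comparing the tallies from a "good" and a "bad" period would show whether the message rate tracks the throughput problem at all.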
I couldn't do anything to recover from this state. Neither rebooting OpenWrt nor rebooting the clients helped. The only thing that worked was... reverting to the mainline firmware.
Perhaps there's something happening in my radio spectrum that's handled differently by -ct?
When looking at my SNR graphs I see similar results for mainline and ct firmware, until a sudden drop in "power" (that's the signal as reported by iwinfo) at 12:36. That is likely because I moved my laptop back upstairs to the spot where I do most of my testing, but... I think it's a pretty sharp drop (~-90 dBm) compared to what the mainline firmware reports later (~-77 dBm).
I'm back on mainline at 13:36, I'm still in the same spot, and everything is back to normal.
So in the end it seems to be just a plain old signal strength issue, with the presence of low-rate clients probably messing up earlier tests, blabla.
So I switched off all other stations, rebooted into -ct again, moving between the AP and the 'bad' spot and sure enough, close to the AP throughput is fine but a few meters out problems start. In my normal spot upstairs performance is just bad with -ct, and signal reported by iwinfo is ~-90dBm.
I can imagine bad performance with -90 dBm, but I don't understand why mainline firmware reports ~-77dBm and is doing great in the exact same circumstances. Also, note that throughput to the AP is just fine.
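For a sense of scale, the gap between the two reported signal levels can be converted to a linear power ratio. This is only a sketch, and it assumes the iwinfo readings are directly comparable between the two firmwares, which is exactly what is in question here:

```python
def db_to_power_ratio(delta_db: float) -> float:
    """Convert a difference in dB to a linear power ratio."""
    return 10 ** (delta_db / 10)

mainline_dbm = -77  # approximate signal reported with mainline firmware
ct_dbm = -90        # approximate signal reported with -ct firmware

delta_db = mainline_dbm - ct_dbm
ratio = db_to_power_ratio(delta_db)
print(f"{delta_db} dB difference, about {ratio:.0f}x in received power")
```

A 13 dB gap in the same spot is far more than normal measurement jitter, so either the radio really receives less power under -ct or the two firmwares report signal differently.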
Perhaps (this generation of?) the R7800 is a bit different from others on the radio side?
Hello, are you willing to test a series of builds to bisect this problem? I can build you the image from my FW that is very close to an upstream QCA FW, and if that does not exhibit the problem, then you can bisect to find where the problem was introduced....
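The bisection proposed here is a binary search over an ordered list of firmware builds. A minimal sketch of the procedure: the build names and the `is_bad` check are hypothetical placeholders, and it assumes the first build is known good, the last is known bad, and every build after the first bad one is also bad:

```python
from typing import Callable, Sequence

def first_bad_build(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest build for which is_bad() is true.

    Assumes builds are ordered oldest to newest, builds[0] is known good,
    builds[-1] is known bad, and the problem persists once introduced.
    """
    lo, hi = 0, len(builds) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid
        else:
            lo = mid
    return builds[hi]

# Hypothetical example: build names and the first bad build are made up.
builds = [f"build-{i:02d}" for i in range(1, 17)]
bad_from = 11
result = first_bad_build(builds, lambda b: int(b.split("-")[1]) >= bad_from)
print(result)
```

In practice `is_bad` means flashing the build and running a throughput test. If the results are noisy, each verdict should rest on several runs, since one wrong answer sends the search down the wrong half.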
@rickkdotnet Curious how this is going for you now. I have similar experiences with the -ct firmware and would like to know if you and @greearb made headway on it.
Any updates on this issue? @greearb I can try to bisect the firmware if @rickkdotnet cannot make it.
I'm using a MacBook Pro 15", with WiFi upload speed of 700 Mbps and download (which has been cut in half) of 150-300 Mbps.
Please try with this firmware: https://www.candelatech.com/downloads/ath10k-9984-10-4/firmware-5-ct-non-commercial-full-htt-mgt-11.bin That was the last firmware I used, and I achieved 80-90MB/s with it.
@gsustek Thank you! I tried this firmware and see a small improvement over the default OpenWrt firmware. The speed is quite unstable but the peak can be 500mbps and average ~350mbps.
I uploaded a new set of binaries that you can bisect with.
http://www.candelatech.com/downloads/ath10k-9984-10-4b/bisect/all_builds-9984-H-dec-7-2020.tar.gz
Hi @greearb, I tried a few firmwares in your package but was unable to pin down the first bad version due to unstable TX throughput. Here is my record: https://www.icloud.com/numbers/0Y5PrlO8nDED_1kZ7ulZf4aTw#atk10k_bisect
My setup:
R7800-master-r15237-fca0eb2d92-20201218-1651-sysupgrade.bin
My observations:
For comparison, I flashed Hnyman's non-CT image R7800-master-r15207-c29f6121fd-20201213-2136-sysupgrade.bin, moved IRQ32 to CPU1, and can see a stable >500mbps TX speed.
Any suggestion for me to move forward?
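For anyone repeating the comparison: on Linux, an IRQ is moved to a specific CPU by writing a hexadecimal CPU bitmask to `/proc/irq/<n>/smp_affinity`. A small sketch of the mask arithmetic; the `pin_irq` helper is illustrative only, needs root, and is not called here. IRQ 32 is simply the number reported above and may differ on other builds:

```python
from pathlib import Path

def cpu_mask(cpu: int) -> str:
    """Hex bitmask selecting a single CPU, in the form smp_affinity expects."""
    return format(1 << cpu, "x")

def pin_irq(irq: int, cpu: int, proc_root: str = "/proc") -> None:
    """Write the mask to <proc_root>/irq/<irq>/smp_affinity (requires root)."""
    Path(proc_root, "irq", str(irq), "smp_affinity").write_text(cpu_mask(cpu))

print(cpu_mask(1))  # mask for CPU1
```

On the router itself the same effect is normally achieved by writing that mask value to the file from a shell. `/proc/interrupts` shows which IRQ number belongs to which device.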
Hello @xupefei @greearb @gsustek @rickkdotnet
I've been following this thread as I too was experiencing similar issues (high latency, low transmit throughput of 150~300Mbps) on an EA8500 running OpenWRT 19.07.03.
I am happy to say that I am seeing much improved throughput and lower latency after updating hostapd 2019-08-08-ca8c2bd2-3 » 2019-08-08-ca8c2bd2-4. Based on this and what @greearb mentioned about the ath10k-ct firmware versions being the same between 19.07.01->19.07.03, I am inclined to think that this may not be a firmware bug, but rather a regression in hostapd. Is anyone else able to validate this? Perhaps the next step is to test this against different versions of hostapd to see if/when this regression may have been reintroduced, and then report an issue to the hostapd maintainer?
My current configuration:
I am happy to say that I am seeing much improved throughput and lower latency after updating hostapd 2019-08-08-ca8c2bd2-3 » 2019-08-08-ca8c2bd2-4
@ahart241 Could you elaborate? The 2019-08-08-ca8c2bd2-4 version has been there for a while. What is the throughput you get before and after?
Previous throughput was between 150-300Mbps. After upgrading from hostapd 2019-08-08-ca8c2bd2-3 » 2019-08-08-ca8c2bd2-4 and with Software Flow Offloading enabled, I am achieving download speeds topping out around 480Mbps. Testing with iperf3 between my phone (Samsung Galaxy S10) and the EA8500, I am getting 470-520Mbps RX and 270-290Mbps TX from the perspective of the router. Also, the ping response time is seemingly better after upgrading hostapd. I know that these are older (August 2019) builds and that this is being reported on later (development snapshot?) builds, but I am wondering if this might be a regression in hostapd.
experiencing the same issue with my R7800.
Comparing against netgear firmware doesn't help me much, there could be huge number of other factors that affect performance vs openwrt. Comparing against owrt with upstream qca firmware instead of ath10k-ct firmware (and/or upstream driver vs ath10k-ct driver) may be more helpful.
I just ran an iperf3 TCP test using hnyman's build master-r15759-ce4cb8e51d-20210214, which comes with the CT driver and firmware 10.4b-ct-9984-fW-13-5ae337bb1. One side is a wired Linux server, the other side a 2019 MacBook Pro on 5GHz radio. The only difference between the tests is the firmware, otherwise the setup was exactly the same, same location.
Mainline firmware 10.4-3.9.0.2-00139:
[ ID] Interval Transfer Bitrate Retr
[ 7] 0.00-10.01 sec 571 MBytes 478 Mbits/sec 110 sender
[ 7] 0.00-10.00 sec 568 MBytes 476 Mbits/sec receiver
CT 10.4b-ct-9984-fW-13-5ae337bb1:
[ ID] Interval Transfer Bitrate Retr
[ 7] 0.00-10.00 sec 205 MBytes 172 Mbits/sec 250 sender
[ 7] 0.00-10.00 sec 202 MBytes 170 Mbits/sec receiver
The performance difference is significant.
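To put a number on it, the sender bitrates from the two summaries can be compared directly. A quick sketch that parses the iperf3 lines quoted above (the helper name is made up for illustration, and one 10-second run per firmware is a small sample):

```python
import re

def sender_mbits(iperf_summary: str) -> float:
    """Extract the sender bitrate in Mbits/sec from an iperf3 summary."""
    for line in iperf_summary.splitlines():
        if line.rstrip().endswith("sender"):
            return float(re.search(r"([\d.]+) Mbits/sec", line).group(1))
    raise ValueError("no sender line found")

mainline = """[ 7] 0.00-10.01 sec 571 MBytes 478 Mbits/sec 110 sender
[ 7] 0.00-10.00 sec 568 MBytes 476 Mbits/sec receiver"""
ct = """[ 7] 0.00-10.00 sec 205 MBytes 172 Mbits/sec 250 sender
[ 7] 0.00-10.00 sec 202 MBytes 170 Mbits/sec receiver"""

m, c = sender_mbits(mainline), sender_mbits(ct)
print(f"CT reaches {c / m:.0%} of mainline throughput ({m / c:.1f}x slower)")
```

That works out to roughly a third of the mainline throughput, with more than twice the retransmits (250 vs 110) in the same 10 seconds.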
Description of the problem (how to configure, how to reproduce, how often it happens).
Opening a separate issue for https://github.com/greearb/ath10k-ct/issues/136#issuecomment-641095620:
A Netgear R7800 running OpenWRT with -ct firmware consistently gives significantly lower download throughput on 5GHz VHT80 than the same setup with mainline firmware.
An iperf single-stream TCP download from the AP or a host connected to ethernet shows an unstable 150-300 Mb/s, while mainline firmware hovers around 400 Mb/s. Upload speeds on both firmwares are comparable and well over 300 Mb/s in the same circumstances.
Multiple streams such as the Flent RRUL tests show more dramatic differences of over 5x lower throughput, these are typical:
In real world use the connection feels "laggy" at times but usable - no disconnects. Bitrate as reported by the OpenWRT GUI is constantly changing.
Software (OS, Firmware version, kernel, driver, etc):
I've tried at least the following combinations:
OpenWRT 19.07.03 (Hnyman build from his dropbox)
OpenWRT r13399-559b338466 (Hnyman build from his dropbox)
OpenWRT r13517-6fca1646dd (own build based on Hnyman scripts)
I've also tried both the full and the full-htt-mgt variants, and a version 012 of the firmware, but my logs are incomplete.
All tests were done in ETSI NL 5GHz against a MacBook Pro (15-inch, 2018) with an AirPort Extreme (0x14E4, 0x7BF), Feb 28 2020 15:24:56 version 9.30.357.35.32.5.47 FWID 01-9ce4adf3, 802.11 a/b/g/n/ac card. I'll try a different client soon.
Channels, regulatory domain and distance to the AP do not seem to be factors.
Hardware (NIC chipset, platform, etc)
Recently bought Netgear r7800 with qca9984.
Logs (dmesg, maybe supplicant and/or hostap)
I've attached a dmesg of OpenWRT r13399-559b338466 (Hnyman build from his dropbox). dmesg.boot-r13399-559b338466-10.4b-ct-9984-fW-013-d81f62d97.txt
In use I occasionally see ath10k_pci 0000:01:00.0: Invalid VHT mcs 15 peer stats.
As a user I'm currently happy with the mainline firmware, but seeing all the work that went/goes into ath10k-ct it would be nice if this can be fixed.