greearb / ath10k-ct

Stand-alone ath10k driver based on Candela Technologies Linux kernel.

5 GHz wifi is really laggy on R7800 running OpenWRT 19.07.3 #136

Closed: graysky2 closed this issue 4 years ago

graysky2 commented 4 years ago

Please provide this info. See this link for more info on how to gather debug info: http://www.candelatech.com/ath10k-bugs.php

Description of the problem (how to configure, how to reproduce, how often it happens).

This problem only affects devices connected to my 5 GHz WiFi (seen on two different iPhones so far): they eventually become really sluggish to load data when browsing in Safari or Chrome, or when using the Facebook app. When things work properly, there is minimal lag when, for example, googling and waiting for the results to return, or simply scrolling through the feed in the Facebook app.

When I experience this "sluggish to load data" effect, I can wait 5-20+ seconds to see the results after hitting "go" on a Google search. Or I can see the Facebook app trying to load more data but eventually timing out.

I can also see this if I ssh into the OpenWRT device and ping one of the iPhones. Example:

# ping 10.9.8.167
PING 10.9.8.167 (10.9.8.167): 56 data bytes
64 bytes from 10.9.8.167: seq=0 ttl=64 time=1451.634 ms
64 bytes from 10.9.8.167: seq=1 ttl=64 time=450.724 ms
64 bytes from 10.9.8.167: seq=2 ttl=64 time=1433.493 ms
64 bytes from 10.9.8.167: seq=3 ttl=64 time=432.479 ms
64 bytes from 10.9.8.167: seq=7 ttl=64 time=4728.605 ms
64 bytes from 10.9.8.167: seq=8 ttl=64 time=3727.968 ms
64 bytes from 10.9.8.167: seq=9 ttl=64 time=2726.992 ms
64 bytes from 10.9.8.167: seq=10 ttl=64 time=1725.764 ms
64 bytes from 10.9.8.167: seq=11 ttl=64 time=724.267 ms
64 bytes from 10.9.8.167: seq=12 ttl=64 time=2.335 ms
64 bytes from 10.9.8.167: seq=13 ttl=64 time=2.113 ms
64 bytes from 10.9.8.167: seq=27 ttl=64 time=4191.774 ms
64 bytes from 10.9.8.167: seq=28 ttl=64 time=3189.923 ms
64 bytes from 10.9.8.167: seq=29 ttl=64 time=2189.990 ms
64 bytes from 10.9.8.167: seq=30 ttl=64 time=1189.751 ms
64 bytes from 10.9.8.167: seq=31 ttl=64 time=189.554 ms
64 bytes from 10.9.8.167: seq=32 ttl=64 time=1.675 ms
64 bytes from 10.9.8.167: seq=33 ttl=64 time=662.067 ms
64 bytes from 10.9.8.167: seq=34 ttl=64 time=744.757 ms
^C
--- 10.9.8.167 ping statistics ---
36 packets transmitted, 19 packets received, 47% packet loss
round-trip min/avg/max = 1.675/1566.624/4728.605 ms

If I run /etc/init.d/network restart the problem goes away but will eventually return. Sometimes it takes hours, other times, it returns more quickly.
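For anyone comparing the "bad" and "good" states, a rough way to put numbers on a ping trace like the one above is to count slow replies and skipped sequence numbers. This is only a sketch: the function name `ping_summary` and the 500 ms default threshold are my own assumptions, and it only expects BusyBox-style reply lines (`seq=N ... time=T ms`).

```shell
#!/bin/sh
# Summarise BusyBox-style ping output read from stdin.
# Assumption: reply lines look like "... seq=N ttl=64 time=T ms".
# The threshold (first argument, default 500 ms) is arbitrary, not an ath10k value.
ping_summary() {
    slow_ms="${1:-500}"
    awk -v slow="$slow_ms" '
        /seq=[0-9]+/ && /time=[0-9.]+/ {
            s = $0; sub(/.*seq=/, "", s); sub(/ .*/, "", s)    # sequence number
            t = $0; sub(/.*time=/, "", t); sub(/ .*/, "", t)   # round-trip time in ms
            # Count sequence numbers skipped between two consecutive replies.
            if (seen && s + 0 > last + 1) missing += s - last - 1
            if (t + 0 > slow + 0) slowcnt++
            last = s + 0; seen = 1; replies++
        }
        END { printf "replies=%d slow=%d missing=%d\n", replies, slowcnt, missing }
    '
}
```

Usage would be something like `ping -c 30 10.9.8.167 | ping_summary 500`. Note that it only counts gaps between the first and last reply it sees, so losses at the very start or end of a run are not included.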

Software (OS, Firmware version, kernel, driver, etc)

Hardware (NIC chipset, platform, etc)

Logs (dmesg, maybe supplicant and/or hostap)

I increased the debugging level, here is dmesg.

After I restarted:

# ping 10.9.8.167
PING 10.9.8.167 (10.9.8.167): 56 data bytes
64 bytes from 10.9.8.167: seq=0 ttl=64 time=5.615 ms
64 bytes from 10.9.8.167: seq=1 ttl=64 time=101.718 ms
64 bytes from 10.9.8.167: seq=2 ttl=64 time=9.849 ms
64 bytes from 10.9.8.167: seq=3 ttl=64 time=1.649 ms
64 bytes from 10.9.8.167: seq=4 ttl=64 time=1.963 ms
64 bytes from 10.9.8.167: seq=5 ttl=64 time=1.590 ms
64 bytes from 10.9.8.167: seq=6 ttl=64 time=2.952 ms
64 bytes from 10.9.8.167: seq=7 ttl=64 time=1.963 ms
64 bytes from 10.9.8.167: seq=8 ttl=64 time=1.807 ms
64 bytes from 10.9.8.167: seq=9 ttl=64 time=2.283 ms
64 bytes from 10.9.8.167: seq=10 ttl=64 time=1.558 ms
64 bytes from 10.9.8.167: seq=11 ttl=64 time=1.323 ms
64 bytes from 10.9.8.167: seq=12 ttl=64 time=1.431 ms
64 bytes from 10.9.8.167: seq=13 ttl=64 time=2.324 ms
64 bytes from 10.9.8.167: seq=14 ttl=64 time=103.414 ms
64 bytes from 10.9.8.167: seq=15 ttl=64 time=1.608 ms
64 bytes from 10.9.8.167: seq=16 ttl=64 time=2.002 ms
64 bytes from 10.9.8.167: seq=17 ttl=64 time=1.769 ms
64 bytes from 10.9.8.167: seq=18 ttl=64 time=1.745 ms
^C
--- 10.9.8.167 ping statistics ---
19 packets transmitted, 19 packets received, 0% packet loss
round-trip min/avg/max = 1.323/13.082/103.414 ms

This seems similar to https://github.com/greearb/ath10k-ct/issues/61

graysky2 commented 4 years ago

I updated to a snapshot of OpenWRT (R7800-master-r13313-6934b20912-20200520-1850) which is using the following:

Unfortunately, the problem is present here too (upon booting, the 5 GHz wifi comes up but googling on the phone takes 5-20+ sec for the hits to be displayed). As well, here is a result pinging the phone from the R7800:

# ping chaos
PING chaos (10.9.8.167): 56 data bytes
64 bytes from 10.9.8.167: seq=0 ttl=64 time=659.117 ms
64 bytes from 10.9.8.167: seq=1 ttl=64 time=1.368 ms
64 bytes from 10.9.8.167: seq=2 ttl=64 time=1.872 ms
64 bytes from 10.9.8.167: seq=3 ttl=64 time=1.722 ms
64 bytes from 10.9.8.167: seq=4 ttl=64 time=3419.010 ms
64 bytes from 10.9.8.167: seq=5 ttl=64 time=2416.988 ms
64 bytes from 10.9.8.167: seq=6 ttl=64 time=1417.295 ms
64 bytes from 10.9.8.167: seq=7 ttl=64 time=415.008 ms
64 bytes from 10.9.8.167: seq=8 ttl=64 time=1.893 ms
64 bytes from 10.9.8.167: seq=9 ttl=64 time=2.555 ms
64 bytes from 10.9.8.167: seq=10 ttl=64 time=2.820 ms
64 bytes from 10.9.8.167: seq=11 ttl=64 time=2.161 ms
64 bytes from 10.9.8.167: seq=30 ttl=64 time=1996.805 ms
64 bytes from 10.9.8.167: seq=31 ttl=64 time=996.353 ms
64 bytes from 10.9.8.167: seq=32 ttl=64 time=1.653 ms
64 bytes from 10.9.8.167: seq=33 ttl=64 time=1440.373 ms
64 bytes from 10.9.8.167: seq=34 ttl=64 time=439.170 ms
64 bytes from 10.9.8.167: seq=49 ttl=64 time=96.257 ms
^C
--- chaos ping statistics ---
56 packets transmitted, 18 packets received, 67% packet loss
round-trip min/avg/max = 1.368/739.578/3419.010 ms

Now, if I run /etc/init.d/network restart (which I did after approx 684 sec of uptime), the issue is no longer present.

Here is dmesg.

Here is an example pinging the phone after the restart of the network daemon:

# ping chaos
PING chaos (10.9.8.167): 56 data bytes
64 bytes from 10.9.8.167: seq=0 ttl=64 time=2.038 ms
64 bytes from 10.9.8.167: seq=1 ttl=64 time=2.448 ms
64 bytes from 10.9.8.167: seq=2 ttl=64 time=108.918 ms
64 bytes from 10.9.8.167: seq=3 ttl=64 time=25.438 ms
64 bytes from 10.9.8.167: seq=4 ttl=64 time=48.248 ms
64 bytes from 10.9.8.167: seq=5 ttl=64 time=69.391 ms
64 bytes from 10.9.8.167: seq=6 ttl=64 time=91.695 ms
64 bytes from 10.9.8.167: seq=7 ttl=64 time=10.660 ms
64 bytes from 10.9.8.167: seq=8 ttl=64 time=33.962 ms
64 bytes from 10.9.8.167: seq=9 ttl=64 time=57.349 ms
64 bytes from 10.9.8.167: seq=10 ttl=64 time=81.011 ms
64 bytes from 10.9.8.167: seq=11 ttl=64 time=101.629 ms
64 bytes from 10.9.8.167: seq=12 ttl=64 time=18.565 ms
64 bytes from 10.9.8.167: seq=13 ttl=64 time=37.803 ms
64 bytes from 10.9.8.167: seq=14 ttl=64 time=58.859 ms
64 bytes from 10.9.8.167: seq=15 ttl=64 time=81.862 ms
64 bytes from 10.9.8.167: seq=16 ttl=64 time=104.704 ms
64 bytes from 10.9.8.167: seq=17 ttl=64 time=25.902 ms
64 bytes from 10.9.8.167: seq=18 ttl=64 time=49.472 ms
64 bytes from 10.9.8.167: seq=19 ttl=64 time=72.071 ms
64 bytes from 10.9.8.167: seq=20 ttl=64 time=97.184 ms
64 bytes from 10.9.8.167: seq=21 ttl=64 time=1.519 ms
64 bytes from 10.9.8.167: seq=22 ttl=64 time=40.347 ms
64 bytes from 10.9.8.167: seq=23 ttl=64 time=64.262 ms
64 bytes from 10.9.8.167: seq=24 ttl=64 time=105.543 ms
64 bytes from 10.9.8.167: seq=25 ttl=64 time=110.265 ms
64 bytes from 10.9.8.167: seq=26 ttl=64 time=2.247 ms
64 bytes from 10.9.8.167: seq=27 ttl=64 time=54.976 ms
64 bytes from 10.9.8.167: seq=28 ttl=64 time=79.637 ms
64 bytes from 10.9.8.167: seq=29 ttl=64 time=879.577 ms
64 bytes from 10.9.8.167: seq=30 ttl=64 time=2.055 ms
64 bytes from 10.9.8.167: seq=31 ttl=64 time=39.129 ms
^C
--- chaos ping statistics ---
33 packets transmitted, 32 packets received, 3% packet loss
round-trip min/avg/max = 1.519/79.961/879.577 ms

greearb commented 4 years ago

Was your device still connected to the same 5 GHz radio after restarting the network?

The top trace makes it look like you have 4+ second beacons or something.

graysky2 commented 4 years ago

@greearb - Yes, it reconnected to the same SSID. The R7800 is only broadcasting two SSIDs @ 5 GHz (primary and guest). This particular iphone has auto-join enabled for just the primary.

greearb commented 4 years ago

I'm not too sure how to debug this. It seems others have reported similar issues in the past. It looks like some problem with power-save, but I'm not sure why restarting the firmware would help. I'm curious whether the problem comes back after some time, or whether that single network restart fixes things. Please add info on the exact iPhone model(s) you are using.

graysky2 commented 4 years ago

@greearb - thanks for the help.

It seems that this problem occurs on EVERY fresh boot of the router and persists until I run /etc/init.d/network restart which usually but not always fixes it. I can also log in to luci then go to network>wireless and hit the restart button for radio0.

I see it on the following devices, based on pinging each device from the OW router: iPhone 7, iPhone 8 Plus, iPad Air 2, MacBook Air.

For whatever reason, I do not see it on the following: Lenovo P1, Raspberry Pi 4B (connected via wireless).

cvillabrille commented 4 years ago

Same here: iPhone XR, iPhone 11, iPad.

Not on the MacBook, though.

Solved (apparently) after a network restart. Every fresh reboot causes the same connection issue. The iPhone connects, but suddenly drops the connection.

I also have plenty of IoT devices, and all of them lose connection from time to time until I restart the network or restart radio0 through LuCI.

The router is a GL-B1300 with OpenWrt 19.07.3 r11063-85e04e9f46, with nothing to upgrade.

The log is full of:

kern.warn kernel: [41818.853953] ath10k_ahb a000000.wifi: failed to increase tx pending count: -16, dropping

greearb commented 4 years ago

As I am sure you are aware, this doesn't give me a lot of clues as to what is the core bug, and it may not be entirely firmware related since the firmware shouldn't change between restarts. cvillabrille, do you also see the long 'ping chaos' times? Can you sniff the air pre restart to see if beacons are coming out at expected ~100ms intervals?
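A rough sketch of how the beacon-interval check could be done from a third Linux device follows. The capture commands in the comments are standard `iw`/`tcpdump` usage, but the names `phy0`, `mon0`, the channel, and the helper `beacon_gaps` with its 150 ms default are assumptions for illustration, not something from this thread. With more than one SSID on the radio, the capture would also need to be narrowed to a single BSSID, otherwise the beacons interleave.

```shell
#!/bin/sh
# Sketch only. Capture side (run on a third device, not exercised here;
# "phy0", "mon0" and channel 36 are assumptions about that device):
#   iw phy phy0 interface add mon0 type monitor
#   ip link set mon0 up
#   iw dev mon0 set channel 36
#   tcpdump -i mon0 -tt -n 'type mgt subtype beacon' > beacons.txt
#
# Analysis side: read lines whose first field is an epoch timestamp
# (as printed by tcpdump -tt) and report gaps longer than max_ms.
beacon_gaps() {
    max_ms="${1:-150}"
    awk -v max="$max_ms" '
        $1 ~ /^[0-9]+(\.[0-9]+)?$/ {
            if (have) {
                gap = ($1 - prev) * 1000          # seconds to milliseconds
                if (gap > max + 0) printf "gap of %.0f ms before %s\n", gap, $1
            }
            prev = $1; have = 1
        }
    '
}
```

Running `beacon_gaps 150 < beacons.txt` on a capture taken before the restart should print nothing if beacons really go out every ~100 ms, and should list the long gaps if they do not.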

cvillabrille commented 4 years ago

I understand, I will do some tests and let you know my findings.

I do not restart the network very often; my house lights/plugs depend on it :D

Thanks for your quick reply.

ptln22 commented 4 years ago

Saw the same issue a while ago when upgrading a TP-Link C2600 (ipq8064, qca9980). OpenWrt is master-r12792-b74386acc6 (2020-04-01).

Tested firmware versions 015 (2020-01-29) and 017 (2020-03-25). Both had this issue with an iPhone 6, iPhone XR and iPad. Reverted to 012 (2019-11-04), which does not show any problems. I have not tested versions 013 and 014.

graysky2 commented 4 years ago

I did a factory reset and set up a single 5 GHz SSID. I did not experience the lag. I then set up a guest network and guest SSID, and I began to experience the lag.

I restored my "old" config files, rebooted, and disabled my guest network. I am not experiencing the lag.

Conclusion: this bug is triggered by having both a regular SSID and guest SSID. Thoughts to mitigate and retain the regular+guest WiFis are welcomed!

cvillabrille commented 4 years ago

I do not have a guest network set up, but I am still having the same issue. Some time after a network restart, the issue reappeared.

I reset the wireless config to defaults (rm /etc/config/wireless), created a new one (wifi config), and manually filled in only the WiFi name and password (channel set to auto and security to WPA2-PSK).

I have noticed that when I set the channel to auto (it selects channel 149), the 5 GHz network is not detected by any of the MacBooks (tested with three); I must select channel 36, and then it works.

ethtool shows:

root@GLB1300WRT:~# ethtool -i wlan0
driver: ath10k_ahb
version: 4.14.180
firmware-version: 10.4b-ct-4019-fW-012-17ba98334
expansion-rom-version:
bus-info: a000000.wifi
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

Ping to a wired device shows timings between 0.2 and 2.3 ms:

# ping 10.0.2.230
PING 10.0.2.230 (10.0.2.230): 56 data bytes
64 bytes from 10.0.2.230: seq=0 ttl=64 time=0.707 ms
64 bytes from 10.0.2.230: seq=1 ttl=64 time=0.562 ms
64 bytes from 10.0.2.230: seq=2 ttl=64 time=2.357 ms
64 bytes from 10.0.2.230: seq=3 ttl=64 time=2.271 ms

Ping to an iPhone shows timings between 2.988 and 103.935 ms:

# ping 10.0.2.132
PING 10.0.2.132 (10.0.2.132): 56 data bytes
64 bytes from 10.0.2.132: seq=0 ttl=64 time=101.389 ms
64 bytes from 10.0.2.132: seq=1 ttl=64 time=19.516 ms
64 bytes from 10.0.2.132: seq=2 ttl=64 time=33.501 ms
64 bytes from 10.0.2.132: seq=3 ttl=64 time=47.828 ms
64 bytes from 10.0.2.132: seq=4 ttl=64 time=61.676 ms
64 bytes from 10.0.2.132: seq=5 ttl=64 time=76.208 ms
64 bytes from 10.0.2.132: seq=6 ttl=64 time=89.625 ms
64 bytes from 10.0.2.132: seq=7 ttl=64 time=103.935 ms
64 bytes from 10.0.2.132: seq=8 ttl=64 time=16.620 ms
64 bytes from 10.0.2.132: seq=9 ttl=64 time=31.449 ms
64 bytes from 10.0.2.132: seq=10 ttl=64 time=45.405 ms
64 bytes from 10.0.2.132: seq=11 ttl=64 time=58.614 ms
64 bytes from 10.0.2.132: seq=12 ttl=64 time=73.366 ms
64 bytes from 10.0.2.132: seq=13 ttl=64 time=86.562 ms
64 bytes from 10.0.2.132: seq=14 ttl=64 time=2.988 ms
64 bytes from 10.0.2.132: seq=15 ttl=64 time=3.484 ms
64 bytes from 10.0.2.132: seq=16 ttl=64 time=24.697 ms

graysky2 commented 4 years ago

When my iPhone 7 is behaving normally, the average ping is in the mid double digits (50-90 ms). See here and above. I guess that is to be expected for a low-power, battery-powered device?

PING chaos (10.9.8.167): 56 data bytes
64 bytes from 10.9.8.167: seq=0 ttl=64 time=36.362 ms
64 bytes from 10.9.8.167: seq=1 ttl=64 time=45.990 ms
64 bytes from 10.9.8.167: seq=2 ttl=64 time=105.328 ms
64 bytes from 10.9.8.167: seq=3 ttl=64 time=2.888 ms
64 bytes from 10.9.8.167: seq=4 ttl=64 time=14.686 ms
64 bytes from 10.9.8.167: seq=5 ttl=64 time=78.995 ms
64 bytes from 10.9.8.167: seq=6 ttl=64 time=94.431 ms
64 bytes from 10.9.8.167: seq=7 ttl=64 time=109.057 ms
64 bytes from 10.9.8.167: seq=8 ttl=64 time=22.429 ms
64 bytes from 10.9.8.167: seq=9 ttl=64 time=38.790 ms
64 bytes from 10.9.8.167: seq=10 ttl=64 time=55.098 ms
^C
--- chaos ping statistics ---
11 packets transmitted, 11 packets received, 0% packet loss
round-trip min/avg/max = 2.888/54.914/109.057 ms

graysky2 commented 4 years ago

@ptln22 -

> Tested firmware version 015 (2020-01-29) and 017 (2020-03-25). Both had this issue with Iphone 6, Iphone XR and Ipad. Reverted back to 012 (2019-11-04) which does not show any problems. I have not tested the versions 013 and 014.

It seems that the 19.07.3 package for ath10k-firmware-qca9984-ct is providing firmware-5-ct-full-community-12.bin-lede.013 based on downloading all of the firmware files at http://www.candelatech.com/downloads/ath10k-9984-10-4b/ and comparing their md5sums to the one provided by the package.

# md5sum /lib/firmware/ath10k/QCA9984/hw1.0/firmware-5.bin
7a5f71886362f2d600fa86c0e1de52ba  /lib/firmware/ath10k/QCA9984/hw1.0/firmware-5.bin

And

% md5sum firmware-5-ct-*|grep 7a5f71886362f2d600fa86c0e1de52ba
7a5f71886362f2d600fa86c0e1de52ba  firmware-5-ct-full-community-12.bin-lede.013

Are you simply replacing /lib/firmware/ath10k/QCA9984/hw1.0/firmware-5.bin on OW with firmware-5-ct-full-community-12.bin-lede.012 from the URL above? Did you replace /lib/firmware/ath10k/QCA9984/hw1.0/board-2.bin on OW at all?
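In case it helps others repeat the md5sum comparison above on their own hardware, here is a small sketch that prints which downloaded image matches the installed file. The function name `find_matching_firmware` and the directory layout are assumptions; it relies only on `md5sum` and `awk`.

```shell
#!/bin/sh
# Print the names of files in a directory whose MD5 matches a reference file.
# Assumes md5sum (coreutils or BusyBox) is available.
find_matching_firmware() {
    ref_file="$1"   # e.g. the firmware-5.bin installed on the router
    cand_dir="$2"   # directory holding the downloaded candidate images
    ref_sum=$(md5sum "$ref_file" | awk '{ print $1 }')
    for f in "$cand_dir"/*; do
        [ -f "$f" ] || continue
        sum=$(md5sum "$f" | awk '{ print $1 }')
        [ "$sum" = "$ref_sum" ] && basename "$f"
    done
    return 0
}
```

For example, after copying the router's firmware-5.bin to a PC: `find_matching_firmware ./firmware-5.bin ./downloads`, where `./downloads` is a hypothetical directory holding the files fetched from the Candela download page.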

cvillabrille commented 4 years ago

I ended up returning to OW 19.07.1 (r10911-c155900f66), which has ath10k-firmware-qca4019-ct (2019-10-03). OW 19.07.2 has the same issues with WiFi.

I tested all possible resolution steps with no luck. Thanks for the development of the module, much appreciated, but too many devices depend on the WiFi and I cannot make it stable. I hope ath10k will get a quick review by 19.07.4. Happy to help in case further tests are required.

graysky2 commented 4 years ago

@cvillabrille - I have been using the updated firmware and drivers built against the 19.07 branch since last night but am still experiencing the issue.

I am now trying the "old" mainline ath10k drivers for a few days to see if the problem is with the -ct ones or something else entirely. After rebooting into the build with the mainline driver/firmware, I have not yet seen the bug. It will take several reboots and days of uptime before I draw any conclusions.

In any case, if you want to build OW with the latest (the 018 release), you can apply https://github.com/graysky2/openwrt/commit/587911062614470c61335f1ab4c8260c7b6a85da to the openwrt-19.07 branch.

rickkdotnet commented 4 years ago

Just wanted to chime in that I also experience lag on 5 GHz on my R7800 with the CT firmware, both on hnyman master and on 19.07.3. I run 80 MHz, with a Touch Bar MacBook Pro. Signal strength or channel doesn't seem to matter much.

If I run the flent rrul test the download speed is consistently about 1/10th of the mainline firmware. Upload speeds are fine.

I'll dig a bit deeper.

cvillabrille commented 4 years ago

Thanks @graysky2 for the suggestion.

I am now fine with 19.07.1. I would like to "enjoy" 19.07.x, but it is not a requirement, to be honest. I can wait.

Now that I am working from home due to the current COVID-19 situation, I need a stable connection rather than a flawless new version of OW :) In addition to the issues with Apple products, 2.4 GHz was unstable too: my IoT devices weren't able to maintain a stable connection with the new firmware.

I have my router on extroot USB with btrfs, and I saved a snapshot of the overlay before downgrading. If I need to return to 19.07.3 to install an ath10k FW upgrade, I can do it relatively fast (compared with reinstalling everything from scratch).

Thanks for your efforts, I will keep receiving updates here.

greearb commented 4 years ago

cvillabrille: On your stable 19.07.1, what ath10k firmware is in use?

If one of you can get a monitor-mode capture of the problem state using a third device, please let me know. Maybe I can find a clue.

graysky2 commented 4 years ago

@greearb - I believe 19.07.1 as well as 19.07.2 and also 19.07.3 all use the 12 release of the firmware:
https://github.com/openwrt/openwrt/commit/0bb4733e6727b05d6f6ecd82ae18b5246ad801da#diff-7767c6da016b31deb2fd133a9a1a4b59

Here is the history of package/firmware/ath10k-firmware/Makefile in the openwrt-19.07 branch: https://github.com/openwrt/openwrt/commits/openwrt-19.07/package/firmware/ath10k-firmware/Makefile

greearb commented 4 years ago

I make multiple sub-releases of each version 12, 13, etc. Please show me dmesg from the latest openwrt ath10k-ct stable release that works; that should have details of the exact commit id the firmware is built from.

And, assuming there is a stable ath10k-ct firmware, have you tried copying the latest firmware over that image (and not adjusting the rest of openwrt)? That would test if it is firmware problem or something else in the system.

Another way to test is to move the working firmware image to the later openwrt images that break, again to test if it is firmware related or not.
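A minimal sketch of the swap described above, keeping a backup so the original image can be restored. The helper name `swap_firmware` and the `.orig` suffix are made up for illustration; which file to replace depends on the chip and is an assumption you need to check on your own device.

```shell
#!/bin/sh
# Replace a firmware file while keeping a backup of the original.
# Hypothetical helper; the target path must be verified per device.
swap_firmware() {
    new_image="$1"   # image to test
    target="$2"      # file the driver loads
    [ -f "$new_image" ] || { echo "missing $new_image" >&2; return 1; }
    # Keep the first backup only, so repeated swaps do not overwrite it.
    [ -e "$target.orig" ] || cp "$target" "$target.orig" || return 1
    cp "$new_image" "$target"
}
```

On an R7800 the target would presumably be the /lib/firmware/ath10k/QCA9984/hw1.0/firmware-5.bin path quoted earlier in this thread, followed by a reboot so the driver loads the new file. Copying `$target.orig` back restores the packaged firmware.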

cvillabrille commented 4 years ago

> cvillabrille: On your stable 19.07.1, what ath10k firmware is in use?

Good question indeed, not sure if this is the answer though...

# opkg list-installed | grep ath10k
ath10k-firmware-qca4019-ct - 2019-10-03-d622d160-1
kmod-ath10k-ct - 4.14.167+2019-09-09-5e8cd86f-1

# opkg info ath10k-firmware-qca4019-ct
Package: ath10k-firmware-qca4019-ct
Version: 2019-10-03-d622d160-1
Depends: libc
Provides: ath10k-firmware-qca4019
Status: install user installed
Section: firmware
Architecture: arm_cortex-a7_neon-vfpv4
Size: 468551
Filename: ath10k-firmware-qca4019-ct_2019-10-03-d622d160-1_arm_cortex-a7_neon-vfpv4.ipk
Description: Alternative ath10k firmware for IPQ4019 radio from Candela Technologies.
 Enables IBSS and other features.  Works with standard or ath10k-ct driver.
 See:  http://www.candelatech.com/ath10k-10.4.php
Installed-Time: 1580313935

# opkg info kmod-ath10k-ct
Package: kmod-ath10k-ct
Version: 4.14.167+2019-09-09-5e8cd86f-1
Depends: kernel (= 4.14.167-1-fa00c1231ac7d7840ec6ffe62dcad926), kmod-mac80211, kmod-ath, kmod-hwmon-core
Provides: kmod-ath10k
Status: install user installed
Section: kernel
Architecture: arm_cortex-a7_neon-vfpv4
Size: 205697
Filename: kmod-ath10k-ct_4.14.167+2019-09-09-5e8cd86f-1_arm_cortex-a7_neon-vfpv4.ipk
Description: ath10k-ct driver optimized for CT ath10k firmware
Installed-Time: 1580313935

My device info is:

# ethtool -i wlan0
driver: ath10k_ahb
version: 4.14.167
firmware-version: 10.4b-ct-4019-fW-012-17ba98334

However, now that I have checked in detail, I have found the following versions available (not sure if they apply to my router model, GL-B1300):

ath10k-firmware-qca4019     20190416-1              ath10k qca4019 firmware 
ath10k-firmware-qca6174     20190416-1              ath10k qca6174 firmware 
ath10k-firmware-qca9887     2019-10-03-d622d160-1   ath10k firmware for QCA9887 devices 
ath10k-firmware-qca9887-ct  2019-10-03-d622d160-1   Alternative ath10k firmware for QCA9887 from Candela Technologies.… 
ath10k-firmware-qca9888     20190416-1              ath10k qca9888 firmware 
ath10k-firmware-qca9888-ct  2019-10-03-d622d160-1   Alternative ath10k firmware for QCA9886 and QCA9888 from Candela Technologies.… 
ath10k-firmware-qca988x     2019-10-03-d622d160-1   ath10k firmware for QCA988x devices 
ath10k-firmware-qca988x-ct  2019-10-03-d622d160-1   Alternative ath10k firmware for QCA988X from Candela Technologies.… 
ath10k-firmware-qca9984     20190416-1              ath10k qca9984 firmware 
ath10k-firmware-qca9984-ct  2019-10-03-d622d160-1   Alternative ath10k firmware for QCA9984 from Candela Technologies.… 
ath10k-firmware-qca99x0     20190416-1              ath10k qca99x0 firmware 
ath10k-firmware-qca99x0-ct  2019-10-03-d622d160-1   Alternative ath10k firmware for QCA99x0 from Candela Technologies.…

> If one of you can get a monitor-mode capture of the problem state using a third device, please let me know. Maybe I can find a clue.

I tried enabling like this: echo 0xc0000032 > /sys/kernel/debug/ieee80211/phy0/ath10k/debug_level but the info was really vague so I returned to the original value. If I can find something useful, I will add it here.

Not sure how I could upgrade the firmware since I do not have most of the folders referred in this documentation.

cvillabrille commented 4 years ago

> I make multiple sub-releases of each version 12, 13, etc. Please show me dmesg from the latest openwrt ath10k-ct stable release that works, that should have details of the exact commit id the firmware is built from.

I will, once I reboot it. I have provided info in my previous comment as well; I hope that helps.

> And, assuming there is a stable ath10k-ct firmware, have you tried copying the latest firmware over that image (and not adjusting the rest of openwrt)? That would test if it is firmware problem or something else in the system.

No. My troubleshooting was limited to testing entire OW versions. To do so, I would have to play with installing all the versions, and as mentioned this is now my "only working internet connection". But it is a good idea indeed for those who could do it. Any volunteers?

> Another way to test is to move the working firmware image to the later openwrt images that break, again to test if it is firmware related or not.

Not sure even how to do that, but I could investigate.

I was under the impression that the kmod module depends on a specific kernel:

# opkg info kmod-ath10k-ct
Package: kmod-ath10k-ct
Version: 4.14.167+2019-09-09-5e8cd86f-1
Depends: kernel (= 4.14.167-1-fa00c1231ac7d7840ec6ffe62dcad926), kmod-mac80211, kmod-ath, kmod-hwmon-core
[...]

not on the firmware though.

# opkg info ath10k-firmware-qca4019-ct
Package: ath10k-firmware-qca4019-ct
Version: 2019-10-03-d622d160-1
Depends: libc

cvillabrille commented 4 years ago

> @cvillabrille - Try this:
>
> dmesg | grep 'firmware ver'

I will, but the log is now full of:

[15078.207157] ath10k_ahb a000000.wifi: failed to increase tx pending count: -16, dropping

I will reboot it tonight.

graysky2 commented 4 years ago

@cvillabrille - Try this:

dmesg | grep 'firmware ver'

graysky2 commented 4 years ago

@cvillabrille - When I boot into 19.07.3, I get:

[   12.615332] ath10k_pci 0000:01:00.0: firmware ver 10.4b-ct-9984-fW-012-17ba98334 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 877928bc
[   22.044806] ath10k_pci 0001:01:00.0: firmware ver 10.4b-ct-9984-fW-012-17ba98334 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 877928bc

Do you not get that under 19.07.1? I believe @greearb wants the commit hash and info:

firmware ver 10.4b-ct-9984-fW-012-17ba98334
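To pull just that string out of a long dmesg without the feature list, something along these lines should work. It is a sketch: `fw_versions` is a name I made up, and the pattern assumes the "ath10k_<bus> <device>: firmware ver <version> ..." layout seen in the lines above.

```shell
#!/bin/sh
# Print "<device> <firmware version>" for each ath10k "firmware ver" line on stdin.
# Assumes the dmesg layout "ath10k_<bus> <device>: firmware ver <version> ...".
fw_versions() {
    sed -n 's/^.*ath10k_[a-z]* \([^ ]*\): firmware ver \([^ ]*\) .*$/\1 \2/p'
}
```

Usage: `dmesg | fw_versions`, which prints one line per radio with the device name followed by the version string, including the trailing commit id.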

aedancullen commented 4 years ago

Yep, can confirm 19.07.1 has 10.4b-ct-4019-fW-012-17ba98334, which is the same as 19.07.3.

@cvillabrille are you really absolutely sure that 19.07.3 has this problem on GL-B1300 and 19.07.1 does not? If so, that wouldn't be a firmware issue.

I have a GL-B1300 also, but I can't yet confirm whether or not there's a significant difference between 19.07.1 and 19.07.3. During normal usage after a fresh flash, both of them have similar-looking logs, which show occasional deauth/disassociate for phones (iOS and Android) and an iPad on occasion. Sometimes it's missing ACKs, sometimes with that "timer DEAUTH/REMOVE", and sometimes just a STA-DISCONNECTED followed by it immediately reconnecting. Since I haven't reproduced the objective failure above (high ping), I'm not sure if these are evidence of a problem. I'm running both 2.4 and 5GHz radios with the same SSID. If I collect any concrete evidence of instability I will post logs.

graysky2 commented 4 years ago

@rickkdotnet -

> I am now trying the "old" mainline ath10k drivers for a few days to see if the problem is with the -ct ones or something else entirely. After rebooting into the build with the mainline driver/firmware, I have not yet seen the bug. It will take several reboots and days of uptime before I draw any conclusions.

As you too are running the R7800, please try the ath10k drivers/firmware (not ath10k-ct) and see if that fixes your problem. I believe you can do this on 19.07.3 without custom building anything. Something like this should work:

opkg update
opkg remove kmod-ath10k-ct ath10k-firmware-qca9984-ct
opkg install kmod-ath10k ath10k-firmware-qca9984
reboot

cvillabrille commented 4 years ago

> @cvillabrille - Try this:
>
> dmesg | grep 'firmware ver'

# dmesg | grep 'firmware ver'
[   26.909735] ath10k_ahb a000000.wifi: firmware ver 10.4b-ct-4019-fW-012-17ba98334 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 8f2e0e09
[   29.218756] ath10k_ahb a800000.wifi: firmware ver 10.4b-ct-4019-fW-012-17ba98334 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 8f2e0e09

cvillabrille commented 4 years ago

> @cvillabrille are you really absolutely sure that 19.07.3 has this problem on GL-B1300 and 19.07.1 does not? If so, that wouldn't be a firmware issue.

I am sure, otherwise I would not have downgraded my FW.

Everything started with the iPhone issues. After I removed the whole setup (rm /etc/config/wifi; wifi config), 5 GHz WiFi became unstable. It auto-selects channel 149, so it is not detected by the iPhone/Mac; I must manually select a low channel. 2.4 GHz is also unstable: IoT devices disconnect from time to time and there is no way of reconnecting them. I must manually select 20 MHz width and low channels to return to a stable connection, but again, after a while, it just disconnects.

> I'm running both 2.4 and 5GHz radios with the same SSID. If I collect any concrete evidence of instability I will post logs.

With the same SSID name you do not know whether your devices are logging in on 5 GHz or 2.4 GHz (unless, obviously, you go to the OW wireless summary page).

Anyhow, I report what I see. No more, no less. Whether it is related to the FW or the module, I cannot say for sure. Thanks anyway to everyone for your efforts/updates to find a resolution (whatever it is).

cvillabrille commented 4 years ago

> opkg update
> opkg remove kmod-ath10k-ct ath10k-firmware-qca9984-ct
> opkg install kmod-ath10k ath10k-firmware-qca9984
> reboot

I will and let you know.... thanks for the suggestion.

cvillabrille commented 4 years ago

Confirmed, same FW version with 19.07.3 r11063-85e04e9f46

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt 19.07.3, r11063-85e04e9f46
 -----------------------------------------------------
# dmesg | grep 'firmware ver'
[   23.921514] ath10k_ahb a000000.wifi: firmware ver 10.4b-ct-4019-fW-012-17ba98334 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 8f2e0e09
[   26.281396] ath10k_ahb a800000.wifi: firmware ver 10.4b-ct-4019-fW-012-17ba98334 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 8f2e0e09

cvillabrille commented 4 years ago

> Something like this should work:
>
> opkg update
> opkg remove kmod-ath10k-ct ath10k-firmware-qca9984-ct
> opkg install kmod-ath10k ath10k-firmware-qca9984
> reboot

# opkg remove kmod-ath10k-ct  ath10k-firmware-qca4019-ct
Removing package kmod-ath10k-ct from root...
Removing package ath10k-firmware-qca4019-ct from root...

# opkg install kmod-ath10k ath10k-firmware-qca4019
Installing kmod-ath10k (4.14.180+4.19.120-1-1) to root...
Downloading http://downloads.openwrt.org/releases/19.07.3/targets/ipq40xx/generic/kmods/4.14.180-1-fa00c1231ac7d7840ec6ffe62dcad926/kmod-ath10k_4.14.180%2b4.19.120-1-1_arm_cortex-a7_neon-vfpv4.ipk
Installing ath10k-firmware-qca4019 (20190416-1) to root...
Downloading http://downloads.openwrt.org/releases/19.07.3/packages/arm_cortex-a7_neon-vfpv4/base/ath10k-firmware-qca4019_20190416-1_arm_cortex-a7_neon-vfpv4.ipk
Configuring kmod-ath10k.
Configuring ath10k-firmware-qca4019.

# reboot

Same issue. In auto, 5 GHz selects channel 149, so the network does not appear on the MacBook. The iPad and iPhone see the WiFi, but they ask for the password (even though it did not change). I need to go back to manually selecting channel 36.

I need some days/reboots to see the issue.... I'll be back :)

cvillabrille commented 4 years ago

I saw the messages below after rebooting with kmod-ath10k and ath10k-firmware-qca4019...

Interesting messages are:

[   22.384548] ath10k_ahb a000000.wifi: Falling back to user helper
[   22.428797] firmware ath10k!QCA4019!hw1.0!firmware-6.bin: firmware_loading_store: map pages failed
[...]
[   24.113184] ath10k_ahb a800000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/firmware-6.bin failed with error -2
[   24.113230] ath10k_ahb a800000.wifi: Falling back to user helper
[   24.347821] firmware ath10k!QCA4019!hw1.0!firmware-6.bin: firmware_loading_store: map pages failed
[...]
[ 1933.507168] ath10k_ahb a800000.wifi: peer-unmap-event: unknown peer id 1
[ 1933.507225] ath10k_ahb a800000.wifi: peer-unmap-event: unknown peer id 1

Full log:

# dmesg| grep ath
[   22.384495] ath10k_ahb a000000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/firmware-6.bin failed with error -2
[   22.384548] ath10k_ahb a000000.wifi: Falling back to user helper
[   22.428797] firmware ath10k!QCA4019!hw1.0!firmware-6.bin: firmware_loading_store: map pages failed
[   22.441772] ath10k_ahb a000000.wifi: qca4019 hw1.0 target 0x01000000 chip_id 0x003900ff sub 0000:0000
[   22.441821] ath10k_ahb a000000.wifi: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 1
[   22.453646] ath10k_ahb a000000.wifi: firmware ver 10.4-3.6-00140 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps crc32 ba79b746
[   22.511111] ath10k_ahb a000000.wifi: board_file api 2 bmi_id 0:16 crc32 bcebe54c
[   23.911028] ath10k_ahb a000000.wifi: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 512 raw 0 hwcrypto 1
[   23.925307] ath: EEPROM regdomain: 0x0
[   23.925317] ath: EEPROM indicates default country code should be used
[   23.925323] ath: doing EEPROM country->regdmn map search
[   23.925333] ath: country maps to regdmn code: 0x3a
[   23.925340] ath: Country alpha2 being used: US
[   23.925344] ath: Regpair used: 0x3a
[   24.113184] ath10k_ahb a800000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/firmware-6.bin failed with error -2
[   24.113230] ath10k_ahb a800000.wifi: Falling back to user helper
[   24.347821] firmware ath10k!QCA4019!hw1.0!firmware-6.bin: firmware_loading_store: map pages failed
[   24.348158] ath10k_ahb a800000.wifi: qca4019 hw1.0 target 0x01000000 chip_id 0x003900ff sub 0000:0000
[   24.355796] ath10k_ahb a800000.wifi: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 1
[   24.369493] ath10k_ahb a800000.wifi: firmware ver 10.4-3.6-00140 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps crc32 ba79b746
[   24.416786] ath10k_ahb a800000.wifi: board_file api 2 bmi_id 0:17 crc32 bcebe54c
[   25.816638] ath10k_ahb a800000.wifi: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 512 raw 0 hwcrypto 1
[   25.832978] ath: EEPROM regdomain: 0x0
[   25.832990] ath: EEPROM indicates default country code should be used
[   25.832994] ath: doing EEPROM country->regdmn map search
[   25.833004] ath: country maps to regdmn code: 0x3a
[   25.833011] ath: Country alpha2 being used: US
[   25.833016] ath: Regpair used: 0x3a
[ 1933.507168] ath10k_ahb a800000.wifi: peer-unmap-event: unknown peer id 1
[ 1933.507225] ath10k_ahb a800000.wifi: peer-unmap-event: unknown peer id 1
cvillabrille commented 4 years ago

Tested kmod-ath10k-ct-smallbuffers

Same result: in Auto the WiFi picks channel 149 or above, and the MacBooks still cannot see the 5 GHz network. There is a little improvement on the iPhone/iPad, since they are able to see the WiFi and log in correctly.

log messages after boot:

# dmesg | grep ath
[   23.865757] ath10k_ahb a000000.wifi: Direct firmware load for ath10k/fwcfg-ahb-a000000.wifi.txt failed with error -2
[   23.865808] ath10k_ahb a000000.wifi: Falling back to user helper
[   23.928960] firmware ath10k!fwcfg-ahb-a000000.wifi.txt: firmware_loading_store: map pages failed
[   23.932524] ath10k_ahb a000000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/ct-firmware-5.bin failed with error -2
[   23.936873] ath10k_ahb a000000.wifi: Falling back to user helper
[   24.006165] firmware ath10k!QCA4019!hw1.0!ct-firmware-5.bin: firmware_loading_store: map pages failed
[   24.006644] ath10k_ahb a000000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/ct-firmware-2.bin failed with error -2
[   24.014480] ath10k_ahb a000000.wifi: Falling back to user helper
[   24.083341] firmware ath10k!QCA4019!hw1.0!ct-firmware-2.bin: firmware_loading_store: map pages failed
[   24.083787] ath10k_ahb a000000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/firmware-6.bin failed with error -2
[   24.091613] ath10k_ahb a000000.wifi: Falling back to user helper
[   24.152436] firmware ath10k!QCA4019!hw1.0!firmware-6.bin: firmware_loading_store: map pages failed
[   24.166197] ath10k_ahb a000000.wifi: qca4019 hw1.0 target 0x01000000 chip_id 0x003900ff sub 0000:0000
[   24.166272] ath10k_ahb a000000.wifi: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[   24.179492] ath10k_ahb a000000.wifi: firmware ver 10.4-3.6-00140 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps crc32 ba79b746
[   24.235602] ath10k_ahb a000000.wifi: board_file api 2 bmi_id 0:16 crc32 bcebe54c
[   25.602460] ath10k_ahb a000000.wifi: 10.4 wmi init: vdevs: 16  peers: 528  tid: 102
[   25.602509] ath10k_ahb a000000.wifi: msdu-desc: 2500  skid: 32
[   25.640629] ath10k_ahb a000000.wifi: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 512 raw 0 hwcrypto 1
[   25.655025] ath: EEPROM regdomain: 0x0
[   25.655034] ath: EEPROM indicates default country code should be used
[   25.655039] ath: doing EEPROM country->regdmn map search
[   25.655050] ath: country maps to regdmn code: 0x3a
[   25.655057] ath: Country alpha2 being used: US
[   25.655062] ath: Regpair used: 0x3a
[   25.842363] ath10k_ahb a800000.wifi: Direct firmware load for ath10k/fwcfg-ahb-a800000.wifi.txt failed with error -2
[   25.842411] ath10k_ahb a800000.wifi: Falling back to user helper
[   26.119653] firmware ath10k!fwcfg-ahb-a800000.wifi.txt: firmware_loading_store: map pages failed
[   26.121310] ath10k_ahb a800000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/ct-firmware-5.bin failed with error -2
[   26.127523] ath10k_ahb a800000.wifi: Falling back to user helper
[   26.224986] firmware ath10k!QCA4019!hw1.0!ct-firmware-5.bin: firmware_loading_store: map pages failed
[   26.225318] ath10k_ahb a800000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/ct-firmware-2.bin failed with error -2
[   26.233265] ath10k_ahb a800000.wifi: Falling back to user helper
[   26.294787] firmware ath10k!QCA4019!hw1.0!ct-firmware-2.bin: firmware_loading_store: map pages failed
[   26.295195] ath10k_ahb a800000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/firmware-6.bin failed with error -2
[   26.303070] ath10k_ahb a800000.wifi: Falling back to user helper
[   26.373679] firmware ath10k!QCA4019!hw1.0!firmware-6.bin: firmware_loading_store: map pages failed
[   26.373957] ath10k_ahb a800000.wifi: qca4019 hw1.0 target 0x01000000 chip_id 0x003900ff sub 0000:0000
[   26.381615] ath10k_ahb a800000.wifi: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[   26.395872] ath10k_ahb a800000.wifi: firmware ver 10.4-3.6-00140 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps crc32 ba79b746
[   26.442643] ath10k_ahb a800000.wifi: board_file api 2 bmi_id 0:17 crc32 bcebe54c
[   27.807930] ath10k_ahb a800000.wifi: 10.4 wmi init: vdevs: 16  peers: 528  tid: 102
[   27.807973] ath10k_ahb a800000.wifi: msdu-desc: 2500  skid: 32
[   27.848087] ath10k_ahb a800000.wifi: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 512 raw 0 hwcrypto 1
[   27.864340] ath: EEPROM regdomain: 0x0
[   27.864351] ath: EEPROM indicates default country code should be used
[   27.864356] ath: doing EEPROM country->regdmn map search
[   27.864366] ath: country maps to regdmn code: 0x3a
[   27.864373] ath: Country alpha2 being used: US
[   27.864378] ath: Regpair used: 0x3a
[   35.445573] ath10k_ahb a000000.wifi: 10.4 wmi init: vdevs: 16  peers: 528  tid: 102
[   35.445617] ath10k_ahb a000000.wifi: msdu-desc: 2500  skid: 32
[   35.557844] ath10k_ahb a000000.wifi: Firmware lacks feature flag indicating a retry limit of > 2 is OK, requested limit: 4
[   36.443728] ath10k_ahb a000000.wifi: NOTE:  Firmware DBGLOG output disabled in debug_mask: 0x10000000
[   37.141150] ath10k_ahb a800000.wifi: 10.4 wmi init: vdevs: 16  peers: 528  tid: 102
[   37.141225] ath10k_ahb a800000.wifi: msdu-desc: 2500  skid: 32
[   37.254889] ath10k_ahb a800000.wifi: Firmware lacks feature flag indicating a retry limit of > 2 is OK, requested limit: 4
graysky2 commented 4 years ago

@cvillabrille - I am not sure what to tell you regarding the cause of your issue, but it seems clear that it's not related to the ath10k-ct code since:

@greearb -

I am now trying the "old" mainline ath10k drivers for a few days to see if the problem is with the -ct ones or something else entirely. After rebooting into the build with the mainline driver/firmware, I have not yet seen the bug. It will take several reboots and days of uptime before I draw any conclusions.

I have been using the ath10k driver provided by linux-firmware for about 3 days now and have not experienced the original problem I posted about. I have rebooted it multiple times in an attempt to trigger it. Do you have any thoughts as to what in your code could be causing this or have any ideas what else I can try to debug further?

rickkdotnet commented 4 years ago

As you too are running the R7800, please try the ath10k drivers/firmware (not ath10k-ct) and see if that fixes your problem. I believe you can do this on 19.07.3 without custom building anything. Something like this should work:

Thanks, but I already went down the custom building rabbithole :)

Anyway, I tried a number of different combinations: 19.07.3 and master, -ct kmod + -ct firmware, -ct kmod + mainline firmware, mainline fw + -ct firmware, a few different -ct firmware flavors, -ct version 12 and the latest. With the -ct firmware, the download performance is consistently (much) worse and latency is unstable. I also messed around with wmm, region etc. to no avail.

The only thing I haven't thoroughly tested yet is a different client (this is a 2019 touchbar MBP). Subjective tests aren't promising though.

The performance of the driver also makes a subtle difference, but my problem is definitely on the firmware side.

I'm including two rrul results here to give you an idea. The download throughput is typical for all tests with -ct firmware involved. Interestingly, the lag doesn't really show up in the latency graph, but it is noticeable in real-world use.

[Images: two rrul result plots, netperf-20200619-1319-MXM-5Ghz-r13373-e8fbb98c6d-buro and netperf-20200649-1249-MXM-5Ghz-r13500-4e65838871-ct-buro]

However, I'm not sure anymore whether I'm having the same problem as OP; perhaps this is better at home in one of the issues centering around download performance?

greearb commented 4 years ago

I have lost track of all the different combinations. Some people are testing on ipq4019 systems, some on Netgear 4x4; some have latency issues, others throughput issues. At the least, the firmware is not the cause of all of the problems.

For the person who sees different behaviour using the same ath10k-ct firmware, your bug is likely not in the firmware at all; maybe a general OpenWrt bug is appropriate, and/or bisect the problem.

For the throughput problem that does appear to be -ct firmware related, please open a new bug specifically about throughput on your particular platform, and try UDP throughput as well as TCP to see if it is specific to TCP.

If some of you can NOT reproduce one of these bugs on the exact same hardware and a similar configuration, maybe work with the reporter to try to understand whether there are differences in configuration that make a difference?

In general, I see 700+Mbps throughput when testing OTA in not-very-optimal RF conditions on a 9984 chipset AP with the latest -ct firmware/driver and recent stable OpenWrt, and 500+Mbps on a 2x2 IPQ4019 system:

9984 platform: http://www.candelatech.com/examples/cicd/ben-home-ecw5410/
4019 platform: http://www.candelatech.com/examples/cicd/ferndale-basic-01/basic/
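One way to run the TCP versus UDP comparison suggested above is with iperf3. The tool choice is an assumption, as no specific tool is named in this thread. The sketch below assumes `iperf3 -s` is already running on a wired host and that the function is run on the wireless client; the name `tcp_vs_udp` and the 500M default bitrate are made up for illustration.

```shell
# Run a TCP and then a UDP download test against an iperf3 server.
# Assumes `iperf3 -s` is already running on a wired host at $1.
# IPERF3 can be overridden (e.g. IPERF3=echo) for a dry run.
tcp_vs_udp() {
    server="$1"
    secs="${2:-30}"
    rate="${3:-500M}"
    # -R reverses the direction, so the server sends and this client
    # receives (i.e. a download over the air to the WiFi client).
    "${IPERF3:-iperf3}" -c "$server" -t "$secs" -R
    # -u selects UDP; -b sets the target bitrate, which matters because
    # iperf3 defaults UDP to only 1 Mbit/s.
    "${IPERF3:-iperf3}" -c "$server" -t "$secs" -R -u -b "$rate"
}

# Example (not run automatically):
#   tcp_vs_udp 192.168.1.10 30 500M
```

If UDP gets close to the requested rate with little loss while TCP stays low, that would point towards something TCP-specific and away from raw airtime problems.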

cvillabrille commented 4 years ago

Today's dmesg error:

ath10k-4.19/htt_rx.c:1206 0xbf610ca0 [ath10k_core@bf5f5000+0x52000]

After this, the 5 GHz network just stopped working on Apple devices. I had to restart wifi.

[30840.180486] ------------[ cut here ]------------
[30840.180535] WARNING: CPU: 0 PID: 0 at /builder/shared-workdir/build/build_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/linux-ipq40xx_generic/ath10k-ct-smallbuffers/ath10k-ct-2019-09-09-5e8cd86f/ath10k-4.19/htt_rx.c:1206 0xbf610ca0 [ath10k_core@bf5f5000+0x52000]
[30840.184943] Modules linked in: pppoe ppp_async l2tp_ppp ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_policy xt_nat xt_multiport xt_mark xt_mac xt_limit xt_esp xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack libcrc32c iptable_mangle iptable_filter ipt_ah ip_tables hwmon crc_ccitt compat fuse xt_set ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport
[30840.257166]  ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 l2tp_netlink l2tp_core udp_tunnel ip6_udp_tunnel xfrm6_mode_tunnel xfrm6_mode_transport xfrm6_mode_beet ipcomp6 xfrm6_tunnel esp6 ah6 xfrm4_tunnel xfrm4_mode_tunnel xfrm4_mode_transport xfrm4_mode_beet ipcomp esp4 ah4 tunnel6 tunnel4 tun xfrm_user xfrm_ipcomp af_key xfrm_algo exfat raid10 raid1 raid0 md_mod sha1_generic md5 echainiv authenc uas usb_storage uhci_hcd ohci_platform ohci_hcd sd_mod scsi_mod ext4 mbcache jbd2 btrfs xor zstd_decompress zstd_compress xxhash xor_neon raid6_pq crc32c_generic leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_of_simple gpio_button_hotplug
[30840.327714] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.180 #0
[30840.349885] Hardware name: Generic DT based system
[30840.356144] Function entered at [<c030e2a8>] from [<c030a7a8>]
[30840.360826] Function entered at [<c030a7a8>] from [<c0740174>]
[30840.366643] Function entered at [<c0740174>] from [<c031db58>]
[30840.372457] Function entered at [<c031db58>] from [<c031dc24>]
[30840.378271] Function entered at [<c031dc24>] from [<bf610ca0>]
[30840.384156] Function entered at [<bf610ca0>] from [<bf6121a0>]
[30840.389906] Function entered at [<bf6121a0>] from [<bf612960>]
[30840.395721] Function entered at [<bf612960>] from [<bf64b30c>]
[30840.401548] Function entered at [<bf64b30c>] from [<c063c7c8>]
[30840.407355] Function entered at [<c063c7c8>] from [<c0301520>]
[30840.413170] Function entered at [<c0301520>] from [<c0321c7c>]
[30840.418986] Function entered at [<c0321c7c>] from [<c035b1b0>]
[30840.424802] Function entered at [<c035b1b0>] from [<c030140c>]
[30840.430615] Function entered at [<c030140c>] from [<c030b30c>]
[30840.436433] Exception stack(0xc0a01f40 to 0xc0a01f88)
[30840.442255] 1f40: 00000001 00000000 00000000 c0313960 ffffe000 c0a03cb8 c0a03c6c 00000000
[30840.447378] 1f60: 00000000 00000001 cfffce00 c092da28 c0a01f88 c0a01f90 c0307d88 c0307d8c
[30840.455533] 1f80: 60000013 ffffffff
[30840.463686] Function entered at [<c030b30c>] from [<c0307d8c>]
[30840.466988] Function entered at [<c0307d8c>] from [<c0351ee8>]
[30840.472892] Function entered at [<c0351ee8>] from [<c0352208>]
[30840.478705] Function entered at [<c0352208>] from [<c0900c04>]
[30840.484597] ---[ end trace f75be32d170c24ee ]---
[37383.626713] ath10k_ahb a800000.wifi: Invalid peer id 22 or peer stats buffer, peer:   (null)  sta:   (null)

What is the meaning of the error messages below about "Falling back to user helper" and "map pages failed"?

[   25.842363] ath10k_ahb a800000.wifi: Direct firmware load for ath10k/fwcfg-ahb-a800000.wifi.txt failed with error -2
[   25.842411] ath10k_ahb a800000.wifi: Falling back to user helper
[   26.119653] firmware ath10k!fwcfg-ahb-a800000.wifi.txt: firmware_loading_store: map pages failed
[   26.121310] ath10k_ahb a800000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/ct-firmware-5.bin failed with error -2
[   26.127523] ath10k_ahb a800000.wifi: Falling back to user helper
[   26.224986] firmware ath10k!QCA4019!hw1.0!ct-firmware-5.bin: firmware_loading_store: map pages failed
[   26.225318] ath10k_ahb a800000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/ct-firmware-2.bin failed with error -2
[   26.233265] ath10k_ahb a800000.wifi: Falling back to user helper
[   26.294787] firmware ath10k!QCA4019!hw1.0!ct-firmware-2.bin: firmware_loading_store: map pages failed
[   26.295195] ath10k_ahb a800000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/firmware-6.bin failed with error -2
[   26.303070] ath10k_ahb a800000.wifi: Falling back to user helper
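For what it's worth, error -2 is ENOENT (no such file or directory), so each "Direct firmware load ... failed" line names a file the kernel could not find on disk. Since the "firmware ver" line still shows up afterwards in the full logs earlier in this thread, it looks like the driver tries several candidate file names until one loads, though that is an inference from the logs and not something confirmed here. A minimal sketch for listing the files that were tried; the function name `failed_fw_loads` is made up for illustration:

```shell
# Read dmesg text on stdin and print, one per line, each firmware file
# the kernel failed to load directly, followed by the error code.
# Uses only basic sed syntax, so it should also work with busybox sed.
failed_fw_loads() {
    sed -n 's/.*Direct firmware load for \(.*\) failed with error \(-[0-9]*\).*/\1 \2/p'
}

# Example (not run automatically):
#   dmesg | failed_fw_loads
```

If a file in that list is one you expect to be installed, that is worth a closer look; if the radios come up anyway, the messages may just be noise from the lookup order.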

It could be something within OW itself, as you mentioned @graysky2, but I cannot figure out what.

I will try to perform a new fresh installation/configuration (instead of configuring everything from backup/snapshot) and "let you rest in peace" :)

Thanks for your suggestions here. Much appreciated everyone's help.

graysky2 commented 4 years ago

https://forum.openwrt.org/t/r7800-reporting-firmware-load-errors-normal/64798

graysky2 commented 4 years ago

@rickkdotnet - How did you collect and plot the data like this?

graysky2 commented 4 years ago

@greearb - I agree that there is a lot of extraneous info in this issue. I just opened #139 now trying to keep this more on topic. Please have a look and thank you.

rickkdotnet commented 4 years ago

@rickkdotnet - How did you collect and plot the data like this?

I created those graphs using https://flent.org/.
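A minimal sketch of producing a comparable rrul plot with flent follows. It assumes netperf's `netserver` is running on a wired host, that flent and its plotting dependencies are installed on the wireless client, and that a 60 second run is long enough; the function name `run_rrul` is made up for illustration.

```shell
# Run the rrul test against a netserver host and write a plot.
# $1 = host running netperf's netserver, $2 = title used in the plot
# and as the output file name.
# FLENT can be overridden (e.g. FLENT=echo) for a dry run.
run_rrul() {
    host="$1"
    title="$2"
    "${FLENT:-flent}" rrul -p all_scaled -l 60 -H "$host" \
        -t "$title" -o "${title}.png"
}

# Example (not run automatically):
#   run_rrul 192.168.1.10 ct-firmware
```

Running it once per driver/firmware combination with a distinct title gives plots that can be compared side by side.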

FWIW, I just found out that the -ct firmware reports much lower signal levels (on the R7800 side) than the non-ct firmware when stations are a bit further away from the AP. Perhaps something to check out.

gearhead commented 4 years ago

Hello, everyone. I'm here as well. I have not noticed the throughput issues, but I have noticed a partial loss of connectivity. Most of my connections are via cable, so it took me a while to figure out that something was going on. I noticed it with my 5 GHz radio as well as my 2.4 GHz one.

What I have seen is flaky LAN connections to a wireless device (an RPi running a web server and MPD). The RPi boots and gets an IP address on 5 GHz, and I can connect to it from my LAN, and from it to the internet and to other LAN devices. After a time (~24 hrs or so) it drops the connection and will not reconnect (I use connman/iwd on the RPi to manage its WiFi connection, which should try to reconnect if it is dropped). Since it is only WiFi connected, I cannot really query the log, because after a reboot the log starts over (TMP file).

Sometimes it still has a connection and can reach the internet, but no LAN devices can connect to it (http, ssh, ping all fail), while it happily sees the internet and can do what it does (stream music). The only way to recover functionality is to reboot the router.

I also note that all mDNS traffic is thwarted from any WiFi 5 GHz connected devices. On my phone, I can run an app which can browse Avahi/Bonjour, and when the router is in this state, it cannot see anything on the LAN. If I go to a cable-connected device, it can see all cable-connected mDNS devices but no WiFi-connected devices.

I just connected my other RPi (a Zero on 2.4 GHz) to this router and the same thing happens. After a while it just drops and I can no longer connect to it. I cannot 'see' anything on this one as it is headless, but I can no longer ping it or anything. This is in the kernel log:

[  136.154234] ath10k_pci 0000:01:00.0: mac flush null vif, drop 0 queues 0xffff
[  136.221835] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[  136.221984] br-lan: port 3(wlan0) entered blocking state
[  136.227352] br-lan: port 3(wlan0) entered forwarding state
[  179.607168] ath10k_pci 0000:01:00.0: htt tx: fixing invalid VHT TX rate code 0xff
[  342.222723] ath10k_pci 0000:01:00.0: Invalid VHT mcs 15 peer stats
[32273.381390] ath10k_pci 0001:01:00.0: wmi: fixing invalid VHT TX rate code 0xff
[90944.814819] ath10k_pci 0001:01:00.0: Invalid peer id 0 or peer stats buffer, peer: 284f5023  sta: 00000000

I also have posted issues here: https://forum.openwrt.org/t/ipv6-assigned-but-not-connect-able-on-wireless/66052

I am now running the snapshot, as it seems better than 19.07.3, which is unusable on WiFi, IMO.

greearb commented 4 years ago

Hello,

Unless your problem bisects to the same problem patch as the reporter, please open a new bug.

Thanks, Ben

greearb commented 4 years ago

Sorry, I got this confused with bug 139. Either way, I'd like to resolve 139 first since it has been bisected.

graysky2 commented 4 years ago

I have lost track of all the different combinations. Some people are testing on ipq4019 systems, some on Netgear 4x4, some have latency, others throughput issues. At the least, the firmware is not cause of all of the problems.

and

Sorry, I got this confused with bug 139. Either way, I'd like to resolve 139 first since it has been bisected.

Agreed. I think this issue has too many related/off-topic posts. Might be worthwhile to close it.

graysky2 commented 4 years ago

@cvillabrille @gearhead @rickkdotnet - If you're up to trying, see the firmware here. Not saying it will solve your issues, but seems to have solved mine: https://github.com/greearb/ath10k-ct/issues/139#issuecomment-660312103

jorgeperezhidalgo commented 1 year ago

Hi, this post is quite old already. I now have a GL-B1300 running OpenWrt 22.03 and I'm experiencing the same issues. Is there a way I can make my GL-B1300 WiFi work properly? Thanks