greearb / ath10k-ct

Stand-alone ath10k driver based on Candela Technologies Linux kernel.

Openwrt on Archer C7: non-CT version better performance and stability #145

Open ulrich-mayer opened 4 years ago

ulrich-mayer commented 4 years ago

After reading about DAWN, I switched from 19.07 to current snapshots. When I saw kernel failures on my Archer C7 v5, I filed this bug about firmware crashes related to ath10k-ct with the default firmware from the OpenWrt snapshot: https://bugs.openwrt.org/index.php?do=details&task_id=3228&order=dateopened&sort=desc

After I switched to the non-CT version of the ath10k driver, stability and performance improved significantly, as described in this forum post: https://forum.openwrt.org/t/dawn-on-archer-c7-minor-problems/69809

Even though the non-CT driver also has issues (see the end of the dmesg), the user experience is much better (as of July 2020).

In the forum discussion, @greearb recommended opening an issue here and asked for more detail on the chipset of the failing device. From dmesg (with the non-CT driver):

[ 15.550417] ath10k_pci 0000:00:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000

In this discussion https://forum.openwrt.org/t/archer-c7-2-4-ghz-wireless-dies-in-24-48-hours/44163/63 another firmware version is recommended:

uweklatt (reply to Pocket_Sevens, Apr 14): "Hello Pocket_Sevens, actually I am using ath10k-firmware-qca988x-ct-htt and kmod-ath10k-ct. Uwe"

@greearb, could you roughly describe the difference between CT and CT-HTT (or point me to where I can find that information)?

Thanks, Uli

Please find the full dmesg attached: dmesg.txt

ulrich-mayer commented 4 years ago

I found

root@aps:/lib/firmware/ath10k/QCA988X# opkg list *firmware*qca988x*
ath10k-firmware-qca988x - 2019-10-03-d622d160-1 - ath10k firmware for QCA988x devices
ath10k-firmware-qca988x-ct - 2020-04-24-2 - Alternative ath10k firmware for QCA988X from Candela Technologies.
 Enables IBSS and other features.  See:
 http://www.candelatech.com/ath10k-10.1.php
 This firmware will NOT be used unless the standard ath10k-firmware-qca988x
 is un-selected since the driver will try to load firmware-5.bin before
 firmware-2.bin
ath10k-firmware-qca988x-ct-full-htt - 2020-04-24-2 - Alternative ath10k firmware for QCA988X from Candela Technologies.
 Uses normal HTT TX data path for management frames, which improves
 stability in busy networks and fixes .11r authentication.
 Enables IBSS and other features.  See:
 http://www.candelatech.com/ath10k-10.1.php
 This firmware selects and requires the ath10k-ct driver.

Note the "fixes .11r authentication" in the description of the full-htt version.
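The package text above implies a simple preference order: the driver tries firmware-5.bin first and only falls back to firmware-2.bin when the former is absent. A minimal sketch of that order in shell, assuming the files live in /lib/firmware/ath10k/QCA988X/hw2.0 (the directory is an assumption; adjust it to your image). It only mimics the order stated in the package description; the driver's real lookup is more involved (for example, it also honours a fwcfg override file).

```sh
#!/bin/sh
# Sketch of the preference order stated in the package description:
# firmware-5.bin is tried before firmware-2.bin.
# The default directory is an assumption, not taken from the driver.

# pick_qca988x_fw [dir]: print the first firmware file that exists.
pick_qca988x_fw() {
    dir=${1:-/lib/firmware/ath10k/QCA988X/hw2.0}
    for fw in firmware-5.bin firmware-2.bin; do
        if [ -f "$dir/$fw" ]; then
            echo "$dir/$fw"
            return 0
        fi
    done
    echo "no firmware-5.bin or firmware-2.bin in $dir" >&2
    return 1
}
```

If this prints firmware-5.bin on the router, the standard ath10k-firmware-qca988x package is still installed and would shadow a CT firmware-2.bin.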

I guess the reason for the bad user experience last time was the lack of 802.11r support in the default ath10k-ct firmware on OpenWrt, since I use DAWN and therefore 802.11r/k/v.

I gave the ath10k-ct driver another try, and now I think the user experience is on par with the non-CT driver/FW.

I still hit a lot of

Sun Aug  2 16:52:21 2020 daemon.err dawn[2508]: not enough memory (4019496580 @ 104)
Sun Aug  2 16:52:21 2020 daemon.err dawn[2508]: not enough memory (4084566110 @ 104)
Sun Aug  2 16:52:21 2020 daemon.err dawn[2508]: not enough memory (1561658278 @ 104)
Sun Aug  2 16:52:21 2020 daemon.err dawn[2508]: not enough memory (3292162862 @ 104)

when roaming while transferring big files.
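To put a number on "a lot of" these messages, a small sketch that counts DAWN's "not enough memory" lines in syslog text read from stdin. The match pattern is copied from the log lines above; the function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: count DAWN "not enough memory" errors in log
# text on stdin. Pattern copied from the lines quoted in this thread.
count_dawn_oom() {
    # grep -c prints the count; "|| true" keeps a zero count from
    # being reported as a failure exit status.
    grep -c 'dawn\[[0-9]*]: not enough memory' || true
}
```

On the router this could be run as `logread | count_dawn_oom` before and after a roaming test with a large file transfer.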

Since I saw various firmware versions on http://www.candelatech.com/ath10k-10.1.php, I tried the one for machines with tight memory, firmware-2-ct-full-nrcc-community.bin, after setting up /lib/firmware/ath10k/fwcfg-pci-0000:00:00.0.txt according to the instructions at the bottom of http://www.candelatech.com/ath10k-10.4.php#config. It loads properly. As expected, the DAWN "not enough memory" messages are still there.
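For reference, a sketch of writing such a fwcfg file. The key names fwname and fwver are my reading of the configuration section on the Candela ath10k-10.4 page and should be double-checked there; the helper name and its default directory are illustrative assumptions.

```sh
#!/bin/sh
# Hypothetical helper: write an ath10k-ct fwcfg file that points the
# driver at an alternate firmware image. The keys fwname/fwver are
# assumed from the Candela config docs; verify before relying on them.

# write_fwcfg <pci-id> <firmware-file> <api-version> [dir]
write_fwcfg() {
    pci_id=$1
    fw_file=$2
    fw_api=$3
    dir=${4:-/lib/firmware/ath10k}
    cfg="$dir/fwcfg-pci-$pci_id.txt"
    # One key=value pair per line.
    printf 'fwname=%s\nfwver=%s\n' "$fw_file" "$fw_api" > "$cfg" || return 1
    echo "$cfg"
}
```

For the device above this would be `write_fwcfg 0000:00:00.0 firmware-2-ct-full-nrcc-community.bin 2`; the .bin itself still has to be copied next to the other QCA988X firmware files, and the driver reloaded or the router rebooted.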

@greearb, I'll keep an eye on the kernel messages and report if I hit another crash with firmware-2-ct-full-nrcc-community.bin. Also, I'd suggest rephrasing the description on http://www.candelatech.com/ath10k-10.1.php a little. It took me a while to realize that "compiled out" in "swbmiss, beacon filtering, roaming code, descriptor-mgt compiled out" applies to every item in the list, not just to "descriptor management", but that might just be me...

greearb commented 4 years ago

Hello,

I'd expect any of the different variants of ath10k-ct firmware to support 11kvr.

Can you test again to see if changing just the firmware variant causes some reproducible problem?

Thanks, Ben

-- Ben Greear greearb@candelatech.com Candela Technologies Inc http://www.candelatech.com

ulrich-mayer commented 4 years ago

Hey @greearb, sorry for the delay. Yesterday I installed a new snapshot because I saw a number of promising commits: OpenWrt SNAPSHOT r14181-1e696c6ced / LuCI Master git-20.226.17214-f2e9031, which includes commits eff8c76aa02275a7b325e0fa93cc349380299fde and b7727a8005635a46255518bdf19eb016e160278a.

Initially, all I did was switch to the smallbuffers variant of the driver:

opkg update
opkg remove wpad-basic kmod-ath10k-ct
opkg install avahi-daemon-service-http avahi-nodbus-daemon \
             wpad-openssl  \
             kmod-ath10k-ct-smallbuffers
opkg install luci luci-app-dawn

With that, I quickly hit a firmware issue again. First, several warnings like

kern.warn kernel: [ 1375.187294] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon

and eventually

kern.err kernel: [ 1378.051356] ath10k_pci 0000:00:00.0: Cannot communicate with firmware, previous wmi cmds: 36904:268747 36954:268721 36904:268715 36904:268715, jiffies: 269504, attempting to fake crash and restart firmware, dev-flags: 0x42

Log attached: apw-kernel.log

Then I changed to the htt version of the firmware, ath10k-firmware-qca988x-ct-full-htt. I don't know whether it's related to the htt version, but with it I hit another familiar firmware issue:

kern.err kernel: [25474.893049] ath10k_pci 0000:00:00.0: firmware crashed! (guid 218d97e0-f6d9-4ed5-b942-5d1b908ecbf5)

Log attached: aps-reconnect.log
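When comparing driver and firmware variants, it helps to tally these events per run. A sketch that counts the three ath10k messages quoted in these reports from kernel log text on stdin; the patterns are copied from the messages above, so other failure modes are not counted, and the function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: tally the ath10k firmware events quoted in this
# thread from log text on stdin. Patterns copied from the messages above.
ath10k_fw_events() {
    awk '
        /ath10k_pci/ && /SWBA overrun/                     { swba++ }
        /ath10k_pci/ && /Cannot communicate with firmware/ { nocomm++ }
        /ath10k_pci/ && /firmware crashed!/                { crash++ }
        END {
            # Unset awk counters print as 0 with %d.
            printf "swba_overrun=%d no_comm=%d fw_crashed=%d\n", swba, nocomm, crash
        }'
}
```

Running `logread | ath10k_fw_events` after each test period gives one summary line per firmware variant.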

Please note that there seems to be a significant difference between the Archer C7 v2 and v5. I have observed NO failures so far with CT on the v2, which is still running the CT htt firmware. Its SoC: Qualcomm Atheros QCA9558 ver 1 rev 0

I see the failures on the Archer C7 v5 units, which I have switched back to the non-CT version. Their SoC: Qualcomm Atheros QCA956X ver 1 rev 0

The firmware identifies the same Wi-Fi hardware on both the Archer C7 v2 and v5: ath10k_pci 0000:00:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000
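Since the SoC differs while the radio is reported identically, a sketch that extracts both lines from boot log text makes it easy to put reports from v2 and v5 units side by side. It assumes the line formats quoted above ("SoC: ..." and "ath10k_pci <pci-id>: qca988x hw..."); the function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print the SoC and the ath10k radio identification
# from boot log text on stdin, assuming the line formats quoted above.
soc_and_radio() {
    # sed -n prints only lines where a substitution succeeded.
    sed -n \
        -e 's/.*SoC: \(.*\)$/soc=\1/p' \
        -e 's/.*ath10k_pci [^ ]*: \(qca988x hw[^ ]* .*\)$/radio=\1/p'
}
```

For example, `dmesg | soc_and_radio` on each unit.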

greearb commented 4 years ago

The 'aps-reconnect.log' crash is the same as or similar to bug 123: a dereference of some 0xdeadbeff pointer. I don't know why that would happen, but it looks pretty serious, so I think crashing and restarting the firmware is the best option. Could it be a board-file issue on the v5?

real-t0mg commented 4 years ago

Hi, I believe I'm having the same issue. The hardware is a TP-Link Archer C7 v5 with CT ath10k-firmware-qca988x-ct - 2019-10-03-d622d160-1. I've opened a bug in the OpenWrt tracker too (bug report in openwrt).

Let me know what I can do to help.

ath10k_firmware_crash.txt

EDIT: adding more info.
crash.txt

ulrich-mayer commented 4 years ago

@greearb, I agree: restarting the firmware is a graceful way out of that situation with little collateral damage. I've diffed board.bin from the CT and non-CT packages: they are identical. Would you expect them to differ? Bug 123 is about the Archer C7 v2, which is my only Archer C7 running stably with the CT firmware. (It could still be the same bug, of course, although there the dump shows 0xDEADC0DE.)
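A byte-for-byte check like the board.bin comparison above can be scripted with cmp. A sketch that reports whether two board files are identical; the paths are wherever the files were extracted from the CT and non-CT packages (none are assumed), and the function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: report whether two board files are byte-identical.
# Exit status: 0 identical, 1 different, 2 a file is missing.
same_board_file() {
    if [ ! -f "$1" ] || [ ! -f "$2" ]; then
        echo "missing file: $1 or $2" >&2
        return 2
    fi
    # cmp -s prints nothing and only sets the exit status (0 = same bytes).
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}
```

The same check on the board files actually loaded on a v2 and a v5 unit would help rule the board file in or out as the cause.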