Closed by jeffsf 5 years ago
Note that the above testing was done prior to the two potentially significant commits below. Additional testing is underway.
commit c6caa7a27a38929f6d7e76795df6c3dbba7d7351
Author: Felix Fietkau <redacted>
Date: Fri Mar 1 14:54:31 2019 +0100
mac80211: add a fix to prevent unsafe queue wake calls during restart
Signed-off-by: Felix Fietkau <redacted>
commit 82d306b595b374277fd04c158d4cc7ddf5cf0b37
Author: Felix Fietkau <redacted>
Date: Fri Mar 1 13:10:53 2019 +0100
mac80211: backport tx queue start/stop fix
Among other things, it fixes a race condition on calling ieee80211_restart_hw
Signed-off-by: Felix Fietkau <redacted>
Edit: The wireless appears "stable" with current builds, which are based on device-specific commits off a point on OpenWrt master after these commits.
There is no firmware crash, but the firmware does appear to just go away. There are no obvious errors in the DBGLOG output from the firmware. This is the last interesting message before the FW goes away. It might be interesting if this command is always the last one before failure. Also, you could turn on 'wmi' debugging to get a more precise idea of the last messages before the FW goes away. If you can find a pattern, maybe it would provide a clue.
Sat Feb 23 18:08:48 2019 kern.warn kernel: [ 336.641926] ath10k_pci 0000:01:00.0: bss channel survey timed out
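One way to look for such a pattern is to print the last few log lines before each failure marker. This is a minimal Python sketch, assuming a syslog in the format of the line above. The function name `context_before` is made up for illustration, and the marker strings are simply the messages quoted in this issue.

```python
import re
from collections import deque

# Failure markers quoted in this issue; adjust them for your own logs.
MARKERS = (
    re.compile(r"bss channel survey timed out"),
    re.compile(r"wmi command \d+ timeout"),
)

def context_before(lines, n=5, markers=MARKERS):
    """Yield (marker_line, previous_lines) for each line matching a marker.

    previous_lines holds up to n lines that immediately preceded the match.
    """
    recent = deque(maxlen=n)
    for line in lines:
        line = line.rstrip("\n")
        if any(m.search(line) for m in markers):
            yield line, list(recent)
        recent.append(line)
```

Feeding it an open log file (for example `context_before(open("syslog.txt"), n=10)`, where the file name is hypothetical) and comparing the context across several failing runs should show whether the same WMI command always comes last.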
Changing debug_mask from 0x3f to debug_mask=0x203f in /etc/modules.d/ath10k_core
Adding debug_mask=0x203f to /etc/modules.d/ath10k-ct (not sure if that does anything)
If those aren't "the right" values, I can easily change them for later tests.
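To see what a given mask enables, the value can be split into its individual bits. The sketch below is in Python; the bit names are as I recall them from `enum ath10k_debug_mask` in the upstream ath10k `debug.h`, and ath10k-ct may define them differently, so treat the table as an assumption and check it against the driver source actually being built. `decode_mask` is an illustrative helper, not part of any driver tooling.

```python
# ASSUMPTION: bit values recalled from upstream ath10k debug.h
# (enum ath10k_debug_mask); verify against the ath10k-ct source in use.
ATH10K_DBG_BITS = {
    0x0001: "PCI",
    0x0002: "WMI",
    0x0004: "HTC",
    0x0008: "HTT",
    0x0010: "MAC",
    0x0020: "BOOT",
    0x0040: "PCI_DUMP",
    0x0080: "HTT_DUMP",
    0x0100: "MGMT",
    0x0200: "DATA",
    0x0400: "BMI",
    0x0800: "REGULATORY",
    0x1000: "TESTMODE",
    0x2000: "WMI_PRINT",
}

def decode_mask(mask):
    """Return (names, unknown): known bit names set in mask, low bit first,
    and an integer holding any set bits that are not in the table."""
    names = []
    unknown = mask
    for bit, name in sorted(ATH10K_DBG_BITS.items()):
        if mask & bit:
            names.append(name)
            unknown &= ~bit
    return names, unknown

if __name__ == "__main__":
    for value in (0x3F, 0x203F):
        names, unknown = decode_mask(value)
        print(hex(value), names, hex(unknown))
```

If the table is right, 0x3f already includes the WMI bit, and 0x203f differs from it only by adding WMI_PRINT.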
I'll run with the current, seemingly functional build for a while, then go back to a "failing" build.
I'll also try to confirm that the above-mentioned commits are "responsible" for the change in behavior.
Just so I understand: is it generally working OK for you now, or is it still failing after 90 sec? If it is mostly fixed but other issues remain, it may be worth opening specific bugs for the remaining issues.
Correct, working as well as I would expect for a half-complete bring-up of a new device.
At least for me, I worry when things "fix themselves", so I'm trying to at least confirm that the commits mentioned above resolved the issue.
I'll close this out and, one way or another, report if I can confirm that was the case.
I wasn't able to convince myself of which specific commit resolved this. The issue has not reoccurred with the current OpenWrt master branch.
Originally reported on the ath10k mailing list as "QCA9888: Driver/Firmware Crash After Initialization", with a follow-on direct email of the same title on February 16, 2019, containing logs and the driver/firmware files used at that time.
Summary:
master and Linux 4.14
Failure rate is somewhere above 90%
QCA9888 on IPQ4019 platform, attached over PCIe; hardware is virtually unloaded.
Additional data was collected with ath10k_core debug_mask=0x3f
One example showing exceptionally long-term success, including authentication, is the 2019-02-23_1821-PST run, for which dmesg and syslog are available.
Some common errors seen in the logs are:
bss channel survey timed out
wmi command <various decimal numbers> timeout, restarting hardware
Hardware restart was requested
Hardware became unavailable during restart.
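To compare runs, these messages can be tallied per log, along with the kernel timestamp at which each first appears. This is a rough Python sketch; the patterns are copied from the messages listed above, the `[ seconds.micros]` kernel timestamp format is assumed from the syslog line quoted in this issue, and the names `SIGNATURES` and `tally` are illustrative.

```python
import re

# Patterns taken from the error messages listed in this issue.
SIGNATURES = {
    "survey_timeout": re.compile(r"bss channel survey timed out"),
    "wmi_timeout": re.compile(r"wmi command \d+ timeout, restarting hardware"),
    "restart_requested": re.compile(r"Hardware restart was requested"),
    "unavailable": re.compile(r"Hardware became unavailable during restart"),
}

# Kernel timestamp such as "[  336.641926]" (seconds since boot).
KTIME = re.compile(r"\[\s*(\d+\.\d+)\]")

def tally(lines):
    """Return {name: (count, first_time)} for each signature.

    first_time is the kernel timestamp of the first matching line that
    carries one, or None if no matching line had a timestamp.
    """
    result = {name: (0, None) for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                count, first = result[name]
                if first is None:
                    m = KTIME.search(line)
                    first = float(m.group(1)) if m else None
                result[name] = (count + 1, first)
    return result
```

The first-seen timestamps give a quick view of how long after boot each kind of failure shows up in a given run.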
firmware ver 10.4b-ct-9888-fW-012-5815a26a api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT crc32 4a66be6f
Later versions have been tried.
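When comparing firmware versions, it can help to pull the version, API level, and feature list out of the driver's firmware line. This is a small Python sketch that assumes the line layout shown above (`firmware ver ... api ... features ... crc32 ...`); `parse_firmware_line` is a hypothetical helper name.

```python
import re

# Layout assumed from the firmware line quoted in this issue.
FW_LINE = re.compile(
    r"firmware ver (?P<ver>\S+) api (?P<api>\d+) "
    r"features (?P<features>\S+) crc32 (?P<crc32>[0-9a-fA-F]+)"
)

def parse_firmware_line(line):
    """Return a dict with ver, api, features (list) and crc32, or None."""
    m = FW_LINE.search(line)
    if m is None:
        return None
    return {
        "ver": m.group("ver"),
        "api": int(m.group("api")),
        "features": m.group("features").split(","),
        "crc32": m.group("crc32"),
    }
```

Taking the set difference of the `features` lists from two builds shows which firmware features changed between them.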
Snapshot-in-time OpenWrt source at https://github.com/jeffsf/openwrt-ea8300/
DTS segment showing QCA9888 attachment (inherits from #include "qcom-ipq4019.dtsi")