Closed stintel closed 2 years ago
@stintel, did you notice #156 and #178?
They are pretty similar firmware crashes.
I also see crashes with my travel router, a GL-AR750, which is a MIPS ath79-based device with a QCA9887, rather similar to your device. (See the crash log extract below; note the older -ct firmware, 2020-07-02-1.)
But I have seen no similarly frequent crashes with my main router, an ARM-based ipq806x R7800, which is very popular with OpenWrt (and for which I author a community build).
So I think ath79 combined with QCA988x might be a common theme that makes these ath10k-ct firmware crashes more frequent. #156 and #178 also seem to concern such ath79 devices.
This is with GL-AR750 / OpenWrt SNAPSHOT r14869-9867d08e07 / ath10k-firmware-qca9887-ct 2020-07-02-1
Fri Apr 9 23:34:46 2021 kern.warn kernel: [ 279.896985] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
Fri Apr 9 23:34:46 2021 kern.warn kernel: [ 279.999390] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
Fri Apr 9 23:34:46 2021 kern.err kernel: [ 280.085469] ath10k_pci 0000:00:00.0: Cannot communicate with firmware, previous wmi cmds: 36904:-5695 36954:-5792 36904:-5821 36904:-5948, jiffies: -4992, attempting to fake crash and restart firmware, dev-flags: 0x42
Fri Apr 9 23:34:46 2021 kern.warn kernel: [ 280.105485] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
Fri Apr 9 23:34:46 2021 kern.warn kernel: [ 280.113563] ath10k_pci 0000:00:00.0: failed to send wmi nop: -143
Fri Apr 9 23:34:46 2021 kern.warn kernel: [ 280.119905] ath10k_pci 0000:00:00.0: could not request stats (type -268435456 ret -143 specifier 1)
Fri Apr 9 23:34:47 2021 kern.err kernel: [ 280.129541] ath10k_pci 0000:00:00.0: firmware crashed! (guid n/a)
Fri Apr 9 23:34:47 2021 kern.info kernel: [ 280.135894] ath10k_pci 0000:00:00.0: qca9887 hw1.0 target 0x4100016d chip_id 0x004000ff sub 0000:0000
Fri Apr 9 23:34:47 2021 kern.info kernel: [ 280.145517] ath10k_pci 0000:00:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
Fri Apr 9 23:34:47 2021 kern.info kernel: [ 280.159086] ath10k_pci 0000:00:00.0: firmware ver 10.1-ct-87-__fW-022-538f0906 api 2 features wmi-10.x,has-wmi-mgmt-tx,mfp,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,get-temp-CT,tx-rc-CT,cust-stats-CT,retry-gt2-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT crc32 e27449db
From #178
I cannot make progress on firmware crashes that are for this reason: [ 3702.764066] ath10k_pci 0000:00:00.0: Cannot communicate with firmware, previous wmi cmds: 40859:849904 36904:849785 36904:849780 36904:849775, jiffies: 850688, attempting to fake crash and restart firmware, dev-flags: 0x42
This doesn't appear in my logs, so I'd say it's definitely a different problem.
I don't think I can make much more progress on wave-1 ath10k firmware. There is the memory corruption and crash bug, which I do not know how to fix, and there is the 'firmware hangs and driver forces restart' issue, which I also do not know how to fix. In general, a fast crash and auto recovery may just be the best we can do here. If stock firmware works better in certain cases or certain platforms, then please use it there. In other uses and other platforms, -ct firmware/driver may be better, so then use it in those cases.
Too bad. I guess I'll just be replacing my QCA-based APs then.
Early testing with an MTK ea8450 looks promising; it has good OpenWrt support.
This is with GL-AR750 / OpenWrt SNAPSHOT r14869-9867d08e07 / ath10k-firmware-qca9887-ct 2020-07-02-1
Is the QCA9887 actually a wave 1 device? Because Qualcomm sells it as wave 2.
Closing this bug, not something I can fix (wave-1 mem corruption and/or timeout talking to firmware)
Very unfortunate, but understandable. As of now I will advise people to stay away from anything QCA as far as possible.
Hi Ben,
We've briefly talked about this one on IRC before.
This seems to be happening relatively often now, whereas last year it happened only very rarely. I'm seeing it on OpenWrt master r16423-bcdf600fc5 on a D-Link DAP-2695 (ath79).
To give an idea of how often it happens now (and probably incomplete due to limited buffer):
I'll try reverting the last ath10k-ct firmware in OpenWrt master to see if that changes anything.
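For anyone who wants to compare how often this happens before and after a firmware change, here is a minimal Python sketch that counts the two messages discussed in this thread per day. It assumes the log lines have the same layout as the extract quoted earlier (date, priority, `kernel:`, uptime in brackets, message). The function name `count_events` and the labels `timeout` and `crashed` are my own and not part of ath10k-ct or OpenWrt. Note that a "Cannot communicate with firmware" timeout is normally followed by its own "firmware crashed!" line, since the driver fakes a crash to restart the firmware, so the two counts overlap.

```python
import re
from collections import Counter

# Layout assumed from the log extract in this thread, e.g.
# "Fri Apr 9 23:34:47 2021 kern.err kernel: [ 280.129541] ath10k_pci ...: firmware crashed! (guid n/a)"
LINE_RE = re.compile(
    r"^\w{3} (?P<mon>\w{3}) +(?P<day>\d{1,2}) "
    r"\d{2}:\d{2}:\d{2} (?P<year>\d{4}) "
    r"\S+ kernel: \[\s*[\d.]+\] (?P<msg>.*)$"
)

# Message fragments copied from the logs quoted above.
TIMEOUT_MARKER = "Cannot communicate with firmware"
CRASH_MARKER = "firmware crashed!"


def count_events(lines):
    """Return a Counter keyed by (day, kind), kind being 'timeout' or 'crashed'."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a kernel log line in the assumed layout
        day = f"{m['mon']} {int(m['day'])} {m['year']}"
        msg = m["msg"]
        if TIMEOUT_MARKER in msg:
            counts[(day, "timeout")] += 1
        elif CRASH_MARKER in msg:
            counts[(day, "crashed")] += 1
    return counts
```

To use it, save the log to a file (for example with `logread > log.txt`, where the file name is just a placeholder), call `count_events(open("log.txt"))`, and print the resulting counter. Since the log buffer is limited, the counts only cover whatever is still in the buffer.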