greearb / ath10k-ct

Stand-alone ath10k driver based on Candela Technologies Linux kernel.

ath10k fails to load on Fritzbox 5490 #185

Closed jschwartzenberg closed 3 years ago

jschwartzenberg commented 3 years ago

Please provide this info. See this link for more info on how to gather debug info: http://www.candelatech.com/ath10k-bugs.php

Description of the problem (how to configure, how to reproduce, how often it happens): I don't know if this is the best spot to report this, but the module fails to load with OpenWRT on a Fritzbox 5490 (https://github.com/kestrel1974/openwrt/pull/1). This router is not yet officially supported; we're working on getting it to run. It was suggested that I write here for possible help regarding the ath10k module. The ath9k module does load correctly. The device has two SoCs; the ath10k is attached to a dedicated ath79 WiFi SoC (there is also a main Lantiq-based SoC).

Software (OS, Firmware version, kernel, driver, etc)
OpenWRT is built from here: https://github.com/kestrel1974/openwrt/pull/1
Linux kernel version: 5.10.31
ath10k 5.10 driver, optimized for CT firmware, probing pci device: 0x3c.

Hardware (NIC chipset, platform, etc)
ath10k_pci 0000:00:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000

Logs (dmesg, maybe supplicant and/or hostap)

[ 4401.235230] ath10k 5.10 driver, optimized for CT firmware, probing pci device: 0x3c.
[ 4401.244455] ath10k_pci 0000:00:00.0: pci irq legacy oper_irq_mode 1 irq_mode 0 reset_mode 0
[ 4402.220420] ath10k_pci 0000:00:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000
[ 4402.220439] ath10k_pci 0000:00:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[ 4402.224405] ath10k_pci 0000:00:00.0: firmware ver 10.1-ct-8x-__fH-022-ecad3248 api 2 features wmi-10.x,mfp,txstatus-noack,wmi-10.x-CT,ratemask-CT,txrate-CT,get-temp-CT,tx-rc-CT,cust-stats-CT,retry-gt2-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT crc32 1b2a161c
[ 4402.553905] ath10k_pci 0000:00:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
[ 4408.581778] ath10k_pci 0000:00:00.0: failed to receive control response completion, polling..
[ 4409.605780] ath10k_pci 0000:00:00.0: ctl_resp never came in (-145)
[ 4409.612075] ath10k_pci 0000:00:00.0: failed to connect to HTC: -145
[ 4409.721186] ------------[ cut here ]------------
[ 4409.726016] WARNING: CPU: 0 PID: 5 at kernel/workqueue.c:3037 __flush_work.isra.51+0x22c/0x234
[ 4409.734865] Modules linked in: ath10k_pci(+) ath10k_core ath9k ath9k_common iptable_nat ath9k_hw ath xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD nf_nat nf_flow_table nf_conntrack mac80211 ipt_REJECT cfg80211 ath9k_pci_owl_loader xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG nf_reject_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_filter ip_tables compat nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 gpio_button_hotplug [last unloaded: ath10k_core]
[ 4409.787984] CPU: 0 PID: 5 Comm: kworker/u2:0 Tainted: G        W         5.10.31 #0
[ 4409.795854] Workqueue: ath10k_wq ath10k_core_stop [ath10k_core]
[ 4409.801890] Stack : 80618a5c 80099e28 00000009 00000000 806c0000 800bb058 81429720 00000017
[ 4409.810428]         81416e3c 8143dc74 8143dc44 8014d370 806c0000 00000001 8143dc18 a83cb3fd
[ 4409.819016]         00000000 00000000 80618a5c 8143da98 ffffefff 8072595c 00000000 00000000
[ 4409.827603]         00000000 fffe3520 00000000 000c24ce 00000000 00000000 00000009 80099e28
[ 4409.836199]         00000009 00000000 806c0000 806c0000 00000018 80363a98 00000000 80bb0000
[ 4409.844742]         ...
[ 4409.847251] Call Trace:
[ 4409.849755] [<80066f18>] show_stack+0x30/0x100
[ 4409.854287] [<80082efc>] __warn+0xc0/0xe8
[ 4409.858409] [<80082f80>] warn_slowpath_fmt+0x5c/0xac
[ 4409.863520] [<80099e28>] __flush_work.isra.51+0x22c/0x234
[ 4409.869024] [<8009a004>] __cancel_work_timer+0x15c/0x208
[ 4409.874551] [<81c22248>] ath10k_htc_stop_hl+0x1c/0x5c [ath10k_core]
[ 4409.880978] [<81c2a264>] ath10k_htt_set_rx_ops+0xc8/0x120 [ath10k_core]
[ 4409.887741] 
[ 4409.889256] ---[ end trace d6f2e27f2ea4299f ]---
[ 4409.893957] ------------[ cut here ]------------
[ 4409.898675] WARNING: CPU: 0 PID: 5 at kernel/workqueue.c:3037 __flush_work.isra.51+0x22c/0x234
[ 4409.907423] Modules linked in: ath10k_pci(+) ath10k_core ath9k ath9k_common iptable_nat ath9k_hw ath xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD nf_nat nf_flow_table nf_conntrack mac80211 ipt_REJECT cfg80211 ath9k_pci_owl_loader xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG nf_reject_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_filter ip_tables compat nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 gpio_button_hotplug [last unloaded: ath10k_core]
[ 4409.960624] CPU: 0 PID: 5 Comm: kworker/u2:0 Tainted: G        W         5.10.31 #0
[ 4409.968503] Workqueue: ath10k_wq ath10k_core_stop [ath10k_core]
[ 4409.974602] Stack : 80618a5c 80099e28 00000009 00000000 806c0000 800bb058 81429720 00000017
[ 4409.983132]         81416e3c 8143dc74 8143dc44 8014d370 806c0000 00000001 8143dc18 a83cb3fd
[ 4409.991659]         00000000 00000000 80618a5c 8143da98 ffffefff 8072595c 00000000 00000000
[ 4410.000185]         00000000 fffe3bb4 00000000 000ec737 00000000 00000000 00000009 80099e28
[ 4410.008704]         00000009 00000000 806c0000 806c0000 00000018 80363a98 00000000 80bb0000
[ 4410.017221]         ...
[ 4410.019721] Call Trace:
[ 4410.022299] [<80066f18>] show_stack+0x30/0x100
[ 4410.026857] [<80082efc>] __warn+0xc0/0xe8
[ 4410.030947] [<80082f80>] warn_slowpath_fmt+0x5c/0xac
[ 4410.036114] [<80099e28>] __flush_work.isra.51+0x22c/0x234
[ 4410.041633] [<8009a004>] __cancel_work_timer+0x15c/0x208
[ 4410.047093] [<81c22254>] ath10k_htc_stop_hl+0x28/0x5c [ath10k_core]
[ 4410.053516] [<81c2a264>] ath10k_htt_set_rx_ops+0xc8/0x120 [ath10k_core]
[ 4410.060260] 
[ 4410.061830] ---[ end trace d6f2e27f2ea429a0 ]---
[ 4410.066564] ath10k_pci 0000:00:00.0: could not init core (-145)
[ 4410.077412] ath10k_pci 0000:00:00.0: could not probe fw (-145)

Apologies if this is not the right spot to ask on this. Do you have any idea what I could try? Any help or pointer in the right direction would be appreciated!

greearb commented 3 years ago

It appears the radio never wakes up and responds to the driver. Possibly there are pci issues on this board, or something like that?

jschwartzenberg commented 3 years ago

That might be possible, a similar board appears to work fine. @kestrel1974 do you think it could be a PCI issue?

kestrel1974 commented 3 years ago

Other comments I found when searching for similar problems point to hardware problems, but if your board works with stock firmware, then I wonder why the device does not wake up. Or it could be another endianness problem, like the one for the Renesas PCIe USB that is specific to the 5490 and PCI, but on the ath79 target. What speaks against that theory is that the 7490 firmware works unchanged on the 5490 according to your observations.

jschwartzenberg commented 3 years ago

Yeah, I can't vouch 100% that WiFi is working optimally with the stock firmware, but I don't believe there were issues. I understand there are some differences between the 7490 and 5490 hardware, as the project to run 7490 stock firmware on the 5490 is also running into differences with WiFi. I will ask there once more.

I don't know if the 7490 stock firmware works, but ath79 kernel messages have AVM FritzBox 7490 - Target in the support log of the stock 5490 firmware.

jschwartzenberg commented 3 years ago

I put the stock firmware on (7.27) and 5 GHz is working with it, so it does not seem to be a hardware issue.

jschwartzenberg commented 3 years ago

The size of the 7490 and 5490 WASP files matches up exactly, so they are likely equal: https://boxmatrix.info/listings/FRITZ.Box_5490-07.27.image--b932deeeac29ef0b933928549f7f36ef--listing2.txt

-rw-rw-rw-    13380 2021-05-11 /lib/firmware/ath_tgt_fw1.fw
-r--r--r-- 11946244 2021-05-11 /lib/firmware/ath_tgt_fw2.fw

https://boxmatrix.info/listings/FRITZ.Box_7490-07.27.image--1ef9123ae71aab36f617ee4a9f88241a--listing2.txt

-rw-rw-rw-    13380 2021-05-12 /lib/firmware/ath_tgt_fw1.fw
-r--r--r-- 11946244 2021-05-12 /lib/firmware/ath_tgt_fw2.fw
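
Matching sizes are a strong hint but not proof that the files are identical. A minimal sketch of a content comparison by hash, assuming both images have been unpacked and the files read into memory; the helper names `sha256_of` and `same_content` are hypothetical and the buffers below are synthetic stand-ins, not the real firmware:

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def same_content(a: bytes, b: bytes) -> bool:
    """True only if both buffers have the same length and the same digest."""
    return len(a) == len(b) and sha256_of(a) == sha256_of(b)


# Synthetic buffers for illustration: same size, one byte differs.
fw_a = bytes(13380)
fw_b = bytes(13379) + b"\x01"

print(same_content(fw_a, fw_a))  # identical buffers
print(same_content(fw_a, fw_b))  # same size, different content
```

With real files, the buffers would come from something like `pathlib.Path(path).read_bytes()` on each extracted `ath_tgt_fw1.fw` and `ath_tgt_fw2.fw`.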

kestrel1974 commented 3 years ago

@greearb Is there any indicator that the caldata extracted from the router's read-only area is too small? For the 5490, the same offset and length as for the 7490 are used and there is some output, but could it be too small? Is there a hex indicator for the length or the end of the caldata?

dmascord commented 3 years ago

I am having the same issue on a Ruckus R500 with qca988x hw2.0 target 0x4100016c chip_id 0x043222ff sub 168c:3223

Stock firmware has no issue loading the firmware, but ath10k-ct (and also ath10k_pci) both fail with the same timeout error.

[ 14.921598] ath10k_pci 0000:00:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043222ff sub 168c:3223
[ 14.931001] ath10k_pci 0000:00:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[ 14.943232] ath10k_pci 0000:00:00.0: firmware ver 10.1-ct-8x-__fW-022-ecad3248 api 2 features wmi-10.x,has-wmi-mgmt-tx,mfp,txstatus-noack,wmi-10.x-CT,ratemask-CT,txrate-CT,get-temp-CT,tx-rc-CT,cust-stats-CT,retry-gt2-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT crc32 3e4cf97f
[ 15.379953] ath10k_pci 0000:00:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
[ 17.377421] ath10k_pci 0000:00:00.0: failed to receive control response completion, polling..
[ 18.417419] ath10k_pci 0000:00:00.0: ctl_resp never came in (-145)
[ 18.423690] ath10k_pci 0000:00:00.0: failed to connect to HTC: -145
[ 18.560680] ------------[ cut here ]------------
[ 18.565380] WARNING: CPU: 0 PID: 5 at kernel/workqueue.c:3039 __flush_work.isra.51+0x22c/0x234
[ 18.574137] Modules linked in: ath10k_pci(+) ath10k_core ath xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject_bridge nft_reject nft_redir nft_quota nft_objref nft_numgen nft_meta_bridge nft_log nft_limit nft_hash nft_ct nft_counter nf_tables_set nf_tables nf_nat nf_flow_table_hw nf_flow_table nf_conntrack mac80211 ipt_REJECT ebtable_nat ebtable_filter ebtable_broute cfg80211 xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG slhc nfnetlink nf_reject_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_filter ip_tables ebtables ebt_vlan ebt_stp ebt_snat ebt_redirect ebt_pkttype ebt_mark_m ebt_mark ebt_limit ebt_ip6 ebt_ip ebt_dnat ebt_arpreply ebt_arp ebt_among ebt_802_3 crc_ccitt compat nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd
[ 18.574294] gpio_button_hotplug usbcore nls_base usb_common crc32c_generic crypto_hash
[ 18.671164] CPU: 0 PID: 5 Comm: kworker/u2:0 Not tainted 5.4.124 #0
[ 18.677586] Workqueue: ath10k_wq ath10k_core_stop [ath10k_core]
[ 18.683583] Stack : 805b4d10 8009a1e4 00000009 00000000 80650000 800b814c 805b4d10 00000000
[ 18.692067]         00000017 8fc1913c 805a57d8 8fc43c04 80650000 8fc19153 8fc43bd8 1fb5af75
[ 18.700547]         00000000 00000000 00000000 000000b1 0000005d 00000000 6831306b 5f636f72
[ 18.709030]         000000b1 807c0000 00000000 000a56d2 00000000 00000009 00000000 8009a1e4
[ 18.717511]         00000009 00000000 80650000 80650000 00000002 80307bac 00000000 807a0000

kestrel1974 commented 3 years ago

@greearb I got more reports of the same issue, for example with a Fritzbox 7490, while my own Fritzbox 7490 works. Could there be a hardware revision that does not work with the driver (the non-CT driver is reported not to work either)? Is there any trace or printk that can be added to figure out why the hardware is not responding? In the reported cases, the stock firmware, which is based on a very old kernel version, is reported to work. Or is this just another issue that can probably never be solved due to missing specs etc.?

Thanks in advance.

greearb commented 3 years ago

Maybe it is a rev-1 card, or something like that. Support for that was never working well in ath10k and was removed by upstream maintainer almost immediately. Either way, this is not something I have time or interest in trying to debug, and probably it is some HW related thing that is outside my ability to fix.

kestrel1974 commented 3 years ago

@greearb Thanks for your comment. It looks like the caldata was extracted to the wrong file name and directory, not to QCA988X/hw2.0 directory and overwriting the board.bin that comes from the ath10k firmware package. So we still need to check, but not using the caldata provided by the package seems to solve the problem.

kestrel1974 commented 3 years ago

@greearb Sorry, it is actually the other way around. The caldata extracted from some of the hardware causes the failed to connect to HTC: -145 error. So I thought about a patch to fall back to board.bin if that error happens after loading caldata. Is there any way to go back to initialization if that happens?

jschwartzenberg commented 3 years ago

So it was discovered that replacing the calibration data allows ath10k to operate successfully: https://github.com/kestrel1974/openwrt/pull/1#issuecomment-876724900

Maybe the extracted calibration data is somehow incompatible with the ath10k driver and is only suitable for AVM's driver. Someone mentioned to me that AVM's driver appears to be based on the BSD driver; sadly, its source is not available. I don't know whether this extracted calibration data would give a better result if it could be made to work with the ath10k driver.

dmascord commented 3 years ago

@greearb - I am working on an Extreme Networks AP-3935i-ROW, which has two ath10k cards in it. The 5 GHz card works fine with the precal data from 0x5000 in ART, but the 2.4 GHz card fails to initialize with the data from 0x1000:

[  151.227031] ath10k_pci 0000:01:00.0: assign IRQ: got 34
[  151.227084] ath10k 5.10 driver, optimized for CT firmware, probing pci device: 0x40.
[  151.232424] ath10k_pci 0000:01:00.0: enabling bus mastering
[  151.239753] ath10k_pci 0000:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[  151.634457] ath10k_pci 0000:01:00.0: qca99x0 hw2.0 target 0x01000000 chip_id 0x003801ff sub 168c:0002
[  151.634495] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[  151.644556] ath10k_pci 0000:01:00.0: firmware ver 10.4b-ct-9980-fW-13-5ae337bb1 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 b36a12bf
[  151.715409] ath10k_pci 0000:01:00.0: failed to fetch board data for bus=pci,bmi-chip-id=1,bmi-board-id=28 from ath10k/QCA99X0/hw2.0/board-2.bin
[  151.715501] ath10k_pci 0000:01:00.0: board_file api 1 bmi_id 1:28 crc32 7e56fd07
[  152.875664] ath10k_pci 0000:01:00.0: 10.4 wmi init: vdevs: 16  peers: 48  tid: 96
[  152.875696] ath10k_pci 0000:01:00.0: msdu-desc: 2500  skid: 32
[  152.954024] ath10k_pci 0000:01:00.0: wmi print 'P 48/48 V 16 K 144 PH 176 T 186  msdu-desc: 2500  sw-crypt: 0 ct-sta: 0'
[  152.954767] ath10k_pci 0000:01:00.0: wmi print 'free: 31080 iram: 23028 sram: 9596'
[  153.230000] ath10k_pci 0000:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 32 raw 0 hwcrypto 1
[  153.320600] ath10k_pci 0000:01:00.0: invalid MAC address; choosing random
[  153.320660] ath: EEPROM regdomain: 0xb000
[  153.326488] ath: EEPROM indicates we should expect a country code
[  153.330363] ath: invalid regulatory domain/country code 0xb000
[  153.336528] ath: Invalid EEPROM contents

The factory dmesg shows the usage of AR900B/hw.2/boardData_AR900B_CUS240_2GMipiHigh_v2_006.bin . If I use that boardData file in place of the pre-cal-pci-0000:01:00.0.bin file, ath10k loads fine with 2G:

[ 1055.166417] ath10k_pci 0000:01:00.0: assign IRQ: got 34
[ 1055.166469] ath10k 5.10 driver, optimized for CT firmware, probing pci device: 0x40.
[ 1055.171618] ath10k_pci 0000:01:00.0: enabling bus mastering
[ 1055.179153] ath10k_pci 0000:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[ 1055.512508] ath10k_pci 0000:01:00.0: qca99x0 hw2.0 target 0x01000000 chip_id 0x003801ff sub 168c:0002
[ 1055.512548] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[ 1055.522585] ath10k_pci 0000:01:00.0: firmware ver 10.4b-ct-9980-fW-13-5ae337bb1 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 b36a12bf
[ 1055.593583] ath10k_pci 0000:01:00.0: board_file api 2 bmi_id 1:6 crc32 08fa09f2
[ 1056.742275] ath10k_pci 0000:01:00.0: 10.4 wmi init: vdevs: 16  peers: 48  tid: 96
[ 1056.742307] ath10k_pci 0000:01:00.0: msdu-desc: 2500  skid: 32
[ 1056.823498] ath10k_pci 0000:01:00.0: wmi print 'P 48/48 V 16 K 144 PH 176 T 186  msdu-desc: 2500  sw-crypt: 0 ct-sta: 0'
[ 1056.824244] ath10k_pci 0000:01:00.0: wmi print 'free: 31080 iram: 23028 sram: 9596'
[ 1057.105641] ath10k_pci 0000:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 32 raw 0 hwcrypto 1
[ 1057.190584] ath: EEPROM regdomain sanitized
[ 1057.190614] ath: EEPROM regdomain: 0x64
[ 1057.193570] ath: EEPROM indicates we should expect a direct regpair map
[ 1057.197504] ath: Country alpha2 being used: 00
[ 1057.203986] ath: Regpair used: 0x64

Is there a sane way to validate that the precal data is actually valid?

EDIT - Looks like the precal data should start with "202f". The data at ART @ 0x1000 and ART @ 0x5000 does not start with that, so neither appears to contain valid precal data.

@kestrel1974 / @jschwartzenberg - perhaps the data that is being loaded as precal data is corrupt, and the precal therefore needs to be taken from elsewhere? When the board.bin is replaced, does the bmi_id show something sensible?
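
One cheap sanity check, based only on the observation above that valid precal data starts with "202f", is to compare the first bytes of a candidate blob against that marker before handing it to the driver. This is a minimal sketch, assuming "202f" means the two raw bytes 0x20 0x2f; the function name `starts_with_magic` is hypothetical, and passing the check does not prove the rest of the data is valid:

```python
def starts_with_magic(blob: bytes, magic_hex: str = "202f") -> bool:
    """Return True if blob begins with the bytes given as a hex string."""
    magic = bytes.fromhex(magic_hex)
    return blob[:len(magic)] == magic


# Synthetic buffers for illustration only; real data would be read
# from the ART partition at the candidate offset.
good = bytes.fromhex("202f") + bytes(16)
bad = bytes.fromhex("ffff") + bytes(16)

print(starts_with_magic(good))  # True
print(starts_with_magic(bad))   # False
```

The same helper works for other markers by passing a different `magic_hex`.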

kestrel1974 commented 3 years ago

@greearb @jschwartzenberg @dmascord It turned out that units of the same hardware model with different manufacturing dates have different offsets at which the caldata, starting with x'4408', begins. Modifying the script that extracts the caldata to actually search for x'4408' instead of using hard-coded offsets seems to have solved the issue. I think this could be closed.
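
The search-instead-of-fixed-offset idea can be sketched roughly as follows. This is not the actual extraction script; the function name `find_caldata` is hypothetical. The default length of 2116 bytes is an assumption: x'4408' read as a little-endian 16-bit value is 0x0844 = 2116, which I believe matches the QCA988x calibration data length used by ath10k, but verify that against your driver source.

```python
from typing import Optional, Tuple


def find_caldata(
    dump: bytes, magic_hex: str = "4408", length: int = 2116
) -> Optional[Tuple[int, bytes]]:
    """Locate the first occurrence of the magic bytes in a partition dump.

    Returns (offset, caldata) or None if the magic is missing or the dump
    ends before `length` bytes are available. The default length is an
    assumption, see the note above.
    """
    magic = bytes.fromhex(magic_hex)
    offset = dump.find(magic)
    if offset == -1 or offset + length > len(dump):
        return None
    return offset, dump[offset:offset + length]


# Synthetic dump for illustration: 0xff padding, then a fake caldata block.
fake_cal = bytes.fromhex("4408") + bytes(2114)
dump = b"\xff" * 0x100 + fake_cal + b"\xff" * 16

result = find_caldata(dump)
print(result is not None and hex(result[0]))
```

A two-byte marker can also occur by chance in unrelated data, so a real script would probably want to restrict the search to plausible, aligned offsets and sanity-check the result before writing it out.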

jschwartzenberg commented 3 years ago

Yep, ath10k was loaded properly with the right data provided. I'll close this. Thanks a lot for looking along @greearb!