greearb / ath10k-ct

Stand-alone ath10k driver based on Candela Technologies Linux kernel.

Firmware Crashing on 9984, 9888 due to rate-ctrl related mem corruption. #38

Closed dougmoscrop closed 5 years ago

dougmoscrop commented 6 years ago

Description

Losing 5GHz from what appears to be a firmware crash. I have tried several builds of OpenWrt. Most recently I built from OpenWrt master (6ef1c978), and in a second build I additionally updated ath10k-ct to b9989fbd and ath10k-firmware to 4ed74b59 (I probably did something wrong, but the result is the same.)

This is the same as bug #30, which was seen under high load on 9888 radios. Based on that bug, I suspect it may be related to stations in power-save mode. --Ben

Edit: I also did notice https://github.com/greearb/ath10k-ct/issues/35 but the reporter made mention of some beta hardware; to my knowledge I do not have beta hardware but I did buy it off eBay so maybe?

Network

[x86 pfSense Router] - [TG-SG3210 Switch] - [R7800 configured as AP - no firewall, etc.]

Hardware

Netgear R7800 (QCA 9984)

ethtool -i wlan0

driver: ath10k_pci
version: 4.14.74
firmware-version: 10.4-ct-9984-fW-011-cf79c7f
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

ethtool -i wlan1

driver: ath10k_pci
version: 4.14.74
firmware-version: 10.4-ct-9984-fW-011-cf79c7f
expansion-rom-version:
bus-info: 0001:01:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

Logs

This is before I changed the logging level:

log.txt

I have since increased logging verbosity as per http://www.candelatech.com/ath10k-bugs.php and will keep an eye out for anything else.

Dump

Sorry, I don't see anything in /sys/kernel/debug/ieee80211/wiphy0/ath10k -- how can I help?

greearb commented 6 years ago

I think you have found a crash I have not seen before. It looks like a div-by-zero or something like that in the prefetch scheduler:

0x009c15ae RAM: __subsf3_aux /home/customer/tree/RD-2011.2/tools/swtools-x86-linux/xtensa-elf/src/libgcc-xcc/config/xtensa/ieee754-sf.S:277
0x409c15ae RAM: __subsf3_aux /home/customer/tree/RD-2011.2/tools/swtools-x86-linux/xtensa-elf/src/libgcc-xcc/config/xtensa/ieee754-sf.S:277
0x8099744e RAM: tx_pfsched_atf_token_scheduler_hdlr /home/greearb/git/embedd/qca-10-4-3-new-9984/wlan/mac_core/src/wal/AR/tx_sched/tx_prefetch_sched.c:152
0x8099848a RAM: tx_pfsched_sched_command_done /home/greearb/git/embedd/qca-10-4-3-new-9984/wlan/mac_core/src/wal/AR/tx_sched/tx_prefetch_sched.c:1687
0x80997ec6 RAM: tx_pfsched_completion_callback /home/greearb/git/embedd/qca-10-4-3-new-9984/wlan/mac_core/src/wal/AR/tx_sched/tx_prefetch_sched.c:1282
0x8099a225 RAM: tx_sched_atf_num_peer_zero_tokens /home/greearb/git/embedd/qca-10-4-3-new-9984/wlan/mac_core/src/wal/AR/tx_sched/tx_sched_wifi_ip02.c:643
0x80995441 RAM: _tx_send_seq_trig_dsr_done /home/greearb/git/embedd/qca-10-4-3-new-9984/wlan/mac_core/src/wal/AR/tx/wifi_ip02/ar_wal_tx_seq.c:1767
0x80993066 RAM: _tx_send_completion_dsr_hdlr /home/greearb/git/embedd/qca-10-4-3-new-9984/wlan/mac_core/src/wal/AR/tx/wifi_ip02/ar_wal_tx_send.c:8141
0x8098ed50 RAM: _tx_send_completion_dsr_hdlr_wrapper /home/greearb/git/embedd/qca-10-4-3-new-9984/wlan/mac_core/src/wal/AR/tx/wifi_ip02/ar_wal_tx_send.c:1149
0x80963ad3 ROM: cmnos_intr_handle_pending_dsrs /local/mnt/workspace/CRMBuilds/CNSS.BL.3.0-00058-S-1_20150213_182825/b/cnss_proc/wlan/mac_core/src/os/common/cmnos_intrinf.c:335
0x80960e80 ROM: check_idle /local/mnt/workspace/CRMBuilds/CNSS.BL.3.0-00058-S-1_20150213_182825/b/cnss_proc/wlan/mac_core/src/os/athos/athos_main.c:2017

Is this problem reproducible (with -ct driver/software)?
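A side note on reading these addresses: the decoded trace resolves __subsf3_aux both as 0x009c15ae and as 0x409c15ae, and the register dumps later in this thread show pairs such as 0x009A638E and 0xC09A638E. That pattern is consistent with the Xtensa windowed-register ABI, where CALL4/CALL8/CALL12 keep the window increment in the top two bits of the saved return address. The helper below is a sketch under that assumption (its name is made up and it is not part of any ath10k tooling); it clears those bits before a symbol lookup, which is only valid because this firmware's code lives below 0x40000000.

```python
# Assumption: the firmware follows the Xtensa windowed-register ABI, where a
# CALL4/CALL8/CALL12 stores the window increment (1, 2 or 3) in the top two
# bits of the saved return address.
CALL_NAMES = {1: "call4", 2: "call8", 3: "call12"}


def decode_return_address(word: int) -> tuple[int, str]:
    """Split a saved return address into (code address, call type)."""
    increment = (word >> 30) & 0x3
    # Clearing the top two bits is only correct because this firmware's code
    # sits below 0x40000000; in general those bits come from the caller's PC.
    address = word & 0x3FFFFFFF
    return address, CALL_NAMES.get(increment, "none")


if __name__ == "__main__":
    # Words copied from the decoded trace above.
    for word in (0x409C15AE, 0x8099744E, 0x8099848A):
        address, call = decode_return_address(word)
        print(f"0x{word:08X} -> 0x{address:08X} ({call})")
```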

greearb commented 6 years ago

Upon further poking around, that backtrace doesn't really make sense. Here is the very latest binary, please test with it and send me more logs if you see any crashes. If it is reproducible, I can add custom debugging to further track it down. firmware-5-full-community.bin.gz

dougmoscrop commented 6 years ago

@greearb Thanks for the reply; I am a bit out of my element here (mostly a casual home user).

Just some more info, hopefully not noise: when I used the OpenWrt 18.06 'regular' build, I was losing 5GHz, not with a crash but with messages like received unexpected .... in push mode. I then loaded the "hnyman build" of 18.06 and got the crashes instead. Then I tried master: crashes. Then I compiled my own: crashes. So each time they were 'reproducible' in the sense that they happened ... but it can take a while before they do. I want to say that it seems to correlate with devices on my network being inactive, but I also don't want to create a red herring. I can't reproduce it "on demand". It just happens, and at some point I'll pull out my Samsung S8 and it will say "internet not available" on the Wi-Fi side. Same with the MacBook. To "fix" it, I must disconnect/reconnect. It never appears to heal on its own on either device type. They think they're connected, but with no internet.

Thinking out loud: Since I bought this off eBay, is there a possibility it's just bad/abused hardware?

Also, I installed the latest DD-WRT (using TFTP recovery just to sort of be sure I wasn't screwing anything up) to see if it was any different and I see stuff like this:

[57319.943056] ath10k_warn: 7 callbacks suppressed
[57319.943075] ath10k_pci 0001:01:00.0: received unexpected tx_fetch_ind event: in push mode
...
[58535.157396] ath10k_pci 0001:01:00.0: received unexpected tx_fetch_ind event: in push mode
[68058.248728] ath10k_warn: 18 callbacks suppressed
[68058.248786] ath10k_pci 0001:01:00.0: received unexpected tx_fetch_ind event: in push mode
...

Which is what I was seeing on the 18.06 install of OpenWrt.

This is what DD-WRT is running:

[   12.008227] ath10k_pci 0001:01:00.0: enabling device (0140 -> 0142)
[   12.008956] ath10k_pci 0001:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[   12.177048] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/pre-cal-pci-0001:01:00.0.bin failed with error -2
[   12.177100] ath10k_pci 0001:01:00.0: Falling back to user helper
[   12.188898] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/firmware-6.bin failed with error -2
[   12.192819] ath10k_pci 0001:01:00.0: Falling back to user helper
[   12.204904] ath10k_pci 0001:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
[   12.209490] ath10k_pci 0001:01:00.0: kconfig debug 1 debugfs 1 tracing 0 dfs 0 testmode 0
[   12.228901] ath10k_pci 0001:01:00.0: firmware ver 10.4-3.6-00144 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps crc32 e385d6b0
[   14.474193] ath10k_pci 0001:01:00.0: failed to fetch board data for bus=pci,vendor=168c,device=0046,subsystem-vendor=168c,subsystem-device=cafe from ath10k/QCA9984/hw1.0/board-2.bin
[   14.474590] ath10k_pci 0001:01:00.0: board_file api 1 bmi_id N/A crc32 fc1e3b6a
[   15.879304] ath10k_pci 0001:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal file max-sta 512 raw 0 hwcrypto 1

I am going to TFTP-install the latest snapshot build of OpenWrt and then copy in that firmware until I hear further instructions.

dougmoscrop commented 6 years ago

Comparison of what firmware was there vs. what you gave me (.bak is what came with OpenWRT):

root@OpenWrt:/tmp# ls -la /lib/firmware/ath10k/QCA9984/hw1.0/
drwxr-xr-x    1 root     root           232 Oct  9 01:32 .
drwxr-xr-x    1 root     root           224 Oct  8 08:03 ..
-rw-r--r--    1 root     root        145520 Oct  8 08:03 board-2.bin
-rw-r--r--    1 root     root        600496 Oct  8 08:03 firmware-5.bak
-rw-r--r--    1 root     root        598296 Oct  9 01:29 firmware-5.bin

From OpenWrt:

[ 13.034313] ath10k_pci 0000:01:00.0: firmware ver 10.4-ct-9984-fW-011-cf79c7f api 5 features peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,cust-stats-CT crc32 25783e66

From you:

[ 21.103125] ath10k_pci 0001:01:00.0: firmware ver 10.4-ct-9984-fW-011-a0654c8 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT crc32 40acab71

Other than killing firewall, DHCP, dnsmasq, etc. I merely enabled Wi-Fi and set a password, didn't fiddle with anything yet. Normally I set my country (Canada) and so on. Will keep you posted!
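To see exactly which feature flags changed between the stock build and the test build, the "firmware ver ... features ... crc32 ..." boot lines can be compared mechanically. A minimal sketch, assuming the line format shown in the two logs above; the regex and function names are illustrative and not part of ath10k:

```python
import re

# Format of the ath10k boot line as seen in the logs above (assumed stable).
FW_LINE = re.compile(
    r"firmware ver (?P<ver>\S+) api (?P<api>\d+) "
    r"features (?P<features>\S+) crc32 (?P<crc>[0-9a-f]{8})"
)


def parse_fw_line(line: str) -> dict:
    """Extract version, API level, feature set and CRC from one boot line."""
    match = FW_LINE.search(line)
    if match is None:
        raise ValueError("no 'firmware ver' line found")
    return {
        "version": match["ver"],
        "api": int(match["api"]),
        "features": set(match["features"].split(",")),
        "crc32": match["crc"],
    }


def diff_features(old_line: str, new_line: str) -> tuple[set, set]:
    """Return (added, removed) feature flags going from old_line to new_line."""
    old = parse_fw_line(old_line)["features"]
    new = parse_fw_line(new_line)["features"]
    return new - old, old - new
```

Fed the two boot lines above, diff_features() reports mfp and tx-rc-CT as added and nothing as removed.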

dougmoscrop commented 6 years ago

OK, after ^ I watched some TV with my wife, after it ended I pulled my phone out, tried fast.com and saw 300-400Mbps on my Samsung S8. I surfed a few websites and lo suddenly nothing is loading. Sure enough there's a crash.

Attached is the entirety of the output of dmesg

crash.txt

Edit: As an aside, the "original" issue I was seeing is something like what is mentioned here: https://github.com/openwrt/openwrt/pull/1374#issuecomment-425316030 -- the crash came as a result of me trying new firmware/builds from master

dougmoscrop commented 6 years ago

This is probably a very stupid thing for me to have done, but I used the 3.6.0.1 non-CT firmware with the CT driver, and I have not had a single crash or disconnect. dmesg has a bunch of messages about unknown events, etc., but it's so far been the best experience.

Edit: I take it back, my wife just told me Wi-Fi is down again; it was up for days this time, I will see what the logs say when I get home from work.

greearb commented 6 years ago

I think I found the problem in my firmware, could you please test the attached and let me know how it works? firmware-5-full-community.bin.gz

dougmoscrop commented 6 years ago

OK, I have loaded that.

The "[ 38.860077] print_req_error: I/O error, dev mtdblock0, sector 0" messages seem new?

[   27.452859] ath10k_pci 0001:01:00.0: firmware ver 10.4-ct-9984-fW-011-a0654c8 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT crc32 152c166e
[   29.771843] ath10k_pci 0001:01:00.0: board_file api 2 bmi_id 0:2 crc32 cf58c3bc
[   35.594173] ath10k_pci 0001:01:00.0: 10.4 wmi init: vdevs: 16  peers: 48  tid: 96
[   35.594204] ath10k_pci 0001:01:00.0: msdu-desc: 2500  skid: 32
[   35.677027] ath10k_pci 0001:01:00.0: wmi print 'P 48/48 V 16 K 144 PH 176 T 186  msdu-desc: 2500  sw-crypt: 0'
[   35.677730] ath10k_pci 0001:01:00.0: wmi print 'free: 92428 iram: 16068 sram: 23552'
[   35.923275] ath10k_pci 0001:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 32 raw 0 hwcrypto 1
[   36.012220] ath: EEPROM regdomain: 0x0
[   36.012231] ath: EEPROM indicates default country code should be used
[   36.012241] ath: doing EEPROM country->regdmn map search
[   36.012257] ath: country maps to regdmn code: 0x3a
[   36.012269] ath: Country alpha2 being used: US
[   36.012278] ath: Regpair used: 0x3a
[   36.022040] kmodloader: done loading kernel modules from /etc/modules.d/*
[   38.860071] print_req_error: 14 callbacks suppressed
[   38.860077] print_req_error: I/O error, dev mtdblock0, sector 0
[   38.864671] print_req_error: I/O error, dev mtdblock0, sector 8
[   38.870290] print_req_error: I/O error, dev mtdblock0, sector 16
[   38.876317] print_req_error: I/O error, dev mtdblock0, sector 24
[   38.883316] print_req_error: I/O error, dev mtdblock0, sector 0
[   38.887901] Buffer I/O error on dev mtdblock0, logical block 0, async page read
[   38.898992] print_req_error: I/O error, dev mtdblock0, sector 0
[   38.900830] Buffer I/O error on dev mtdblock0, logical block 0, async page read
[   38.907856] print_req_error: I/O error, dev mtdblock1, sector 0
[   38.914575] print_req_error: I/O error, dev mtdblock1, sector 8
[   38.920416] print_req_error: I/O error, dev mtdblock1, sector 16
[   38.926458] print_req_error: I/O error, dev mtdblock1, sector 24
[   38.932687] Buffer I/O error on dev mtdblock1, logical block 0, async page read
[   38.941360] Buffer I/O error on dev mtdblock1, logical block 0, async page read
[   40.010182] Generic PHY fixed-0:01: attached PHY driver [Generic PHY] (mii_bus:phy_addr=fixed-0:01, irq=POLL)
[   40.011459] dwmac1000: Master AXI performs any burst length
[   40.019201] ipq806x-gmac-dwmac 37400000.ethernet eth1: IEEE 1588-2008 Advanced Timestamp supported
[   40.025407] ipq806x-gmac-dwmac 37400000.ethernet eth1: registered PTP clock
[   40.033670] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[   40.045223] br-lan: port 1(eth1.1) entered blocking state
[   40.046250] br-lan: port 1(eth1.1) entered disabled state
[   40.052015] device eth1.1 entered promiscuous mode
[   40.057265] device eth1 entered promiscuous mode
[   40.064839] IPv6: ADDRCONF(NETDEV_UP): br-lan: link is not ready
[   40.993855] ath: EEPROM regdomain: 0x807c
[   40.993879] ath: EEPROM indicates we should expect a country code
[   40.996967] ath: doing EEPROM country->regdmn map search
[   41.002915] ath: country maps to regdmn code: 0x3a
[   41.008388] ath: Country alpha2 being used: CA
[   41.012902] ath: Regpair used: 0x3a
[   41.017417] ath: regdomain 0x807c dynamically updated by user
[   41.020742] ath: EEPROM regdomain: 0x807c
[   41.026690] ath: EEPROM indicates we should expect a country code
[   41.030607] ath: doing EEPROM country->regdmn map search
[   41.036773] ath: country maps to regdmn code: 0x3a
[   41.042142] ath: Country alpha2 being used: CA
[   41.046792] ath: Regpair used: 0x3a
[   41.051148] ath: regdomain 0x807c dynamically updated by user
[   41.135016] ipq806x-gmac-dwmac 37400000.ethernet eth1: Link is Up - 1Gbps/Full - flow control off
[   41.135089] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[   41.144770] br-lan: port 1(eth1.1) entered blocking state
[   41.148959] br-lan: port 1(eth1.1) entered forwarding state
[   41.163625] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
[   48.598556] ath10k_pci 0000:01:00.0: 10.4 wmi init: vdevs: 16  peers: 48  tid: 96
[   48.598588] ath10k_pci 0000:01:00.0: msdu-desc: 2500  skid: 32
[   48.679664] ath10k_pci 0000:01:00.0: wmi print 'P 48/48 V 16 K 144 PH 176 T 186  msdu-desc: 2500  sw-crypt: 0'
[   48.680350] ath10k_pci 0000:01:00.0: wmi print 'free: 92428 iram: 16068 sram: 23552'
[   48.966426] ath10k_pci 0000:01:00.0: Firmware lacks feature flag indicating a retry limit of > 2 is OK, requested limit: 4
[   48.966704] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[   49.597639] ath10k_pci 0000:01:00.0: NOTE:  Firmware DBGLOG output disabled in debug_mask: 0x10000000
[   54.900556] ath10k_pci 0001:01:00.0: 10.4 wmi init: vdevs: 16  peers: 48  tid: 96
[   54.900589] ath10k_pci 0001:01:00.0: msdu-desc: 2500  skid: 32
[   54.983642] ath10k_pci 0001:01:00.0: wmi print 'P 48/48 V 16 K 144 PH 176 T 186  msdu-desc: 2500  sw-crypt: 0'
[   54.984351] ath10k_pci 0001:01:00.0: wmi print 'free: 92428 iram: 16068 sram: 23552'
[   55.426453] ath10k_pci 0001:01:00.0: Firmware lacks feature flag indicating a retry limit of > 2 is OK, requested limit: 4
[   55.426715] IPv6: ADDRCONF(NETDEV_UP): wlan1: link is not ready
[   55.436632] br-lan: port 2(wlan0) entered blocking state
[   55.442221] br-lan: port 2(wlan0) entered disabled state
[   55.448233] device wlan0 entered promiscuous mode
[   55.457961] br-lan: port 3(wlan1) entered blocking state
[   55.457991] br-lan: port 3(wlan1) entered disabled state
[   55.463277] device wlan1 entered promiscuous mode
[   56.138460] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[   56.138750] br-lan: port 2(wlan0) entered blocking state
[   56.143929] br-lan: port 2(wlan0) entered forwarding state
[   57.568785] IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready
[   57.569138] br-lan: port 3(wlan1) entered blocking state
[   57.574348] br-lan: port 3(wlan1) entered forwarding state

greearb commented 6 years ago

Sorry, I must have uploaded the wrong thing. The firmware version should be: 10.4-ct-9984-fW-011-c3d1394 firmware-5-full-community.bin.gz

dougmoscrop commented 6 years ago

OK, looks good:

[   27.285911] ath10k_pci 0001:01:00.0: firmware ver 10.4-ct-9984-fW-011-c3d1394 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT crc32 53f2e496

-- oof, that crashed fast: crash.txt

greearb commented 6 years ago

So, interesting... this is a different thing. It looks like rate-ctrl has a mem corruption of some kind, and this looks very similar to bug #30. I need to write a hunk of debugging code to narrow down this corruption and then let those of you who can reproduce it do so over and over as I keep bisecting down in the code. I'll get started on that as soon as I can... probably best to go back to your stable FW until I have something new to test.
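The "keep bisecting down in the code" approach can be modelled on a host: replay a fixed sequence of steps from a clean state, check an integrity condition after the first k steps, and binary-search for the smallest k at which the check fails. This is a hypothetical sketch, not greearb's firmware debugging code; the step functions and guard layout are invented, and it assumes the corruption is deterministic and, once present, stays present.

```python
from typing import Callable, Optional, Sequence

Step = Callable[[bytearray], None]


def first_corrupting_step(
    steps: Sequence[Step],
    make_state: Callable[[], bytearray],
    is_corrupt: Callable[[bytearray], bool],
) -> Optional[int]:
    """Return the index of the step that first makes is_corrupt() true.

    Assumes the corruption is deterministic and persists once introduced, so
    "corrupt after the first k steps" is monotonic in k and can be bisected.
    Returns None if the corruption never shows up.
    """
    def corrupt_after(k: int) -> bool:
        state = make_state()  # replay from a clean state every time
        for step in steps[:k]:
            step(state)
        return is_corrupt(state)

    if corrupt_after(0):
        raise ValueError("state is already corrupt before any step runs")
    if not corrupt_after(len(steps)):
        return None
    lo, hi = 0, len(steps)  # invariant: clean after lo steps, corrupt after hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if corrupt_after(mid):
            hi = mid
        else:
            lo = mid
    return hi - 1  # steps[hi - 1] is the first step that breaks the check


# Toy example: a 16-byte buffer followed by a 4-byte guard word.
GUARD = b"\xde\xad\xbe\xef"


def fresh_state() -> bytearray:
    return bytearray(16) + GUARD


def guard_broken(state: bytearray) -> bool:
    return state[16:20] != GUARD


def in_bounds_write(state: bytearray) -> None:
    state[15] = 0xFF


def off_by_one_write(state: bytearray) -> None:
    state[16] = 0x00  # one byte past the 16-byte buffer


demo_steps = [in_bounds_write] * 4 + [off_by_one_write] + [in_bounds_write] * 2
```

Each probe replays the steps from scratch, so n steps need roughly log2(n) runs plus the two sanity checks at the ends.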

notgood commented 6 years ago

Not sure if it's the same bug, but the ath10k-ct firmware crashes for me as well; I can't even bring the 5GHz interface up. Same R7800, latest OpenWrt master. log.txt

The only non-standard thing in my setup is a custom regdb

iw reg get
global
country 00: DFS-UNSET
    (2400 - 2500 @ 40), (N/A, 30), (N/A)
    (4900 - 5900 @ 160), (N/A, 30), (N/A)

greearb commented 6 years ago

dougmoscrop, please try this image, it has some debugging that hopefully will point me towards the root cause. notgood: I'll check your crash next.... firmware-5-full-community.bin.gz

greearb commented 6 years ago

notgood, yours fails with a 'channel invalid' error out of the firmware. Does stock firmware work on this system? Please open a new bug, and gather logs with DBGLOG firmware logging on (see the directions on gathering logs here: http://www.candelatech.com/ath10k-bugs.php).

dougmoscrop commented 6 years ago

Will do - @greearb what logging level etc should I set?

greearb commented 6 years ago

@dougmoscrop I think default logging will be enough, I'll let you know if I think otherwise after seeing the next log.

huaracheguarache commented 6 years ago

I too just had a crash on my R7800, but I'm using the latest beta firmware, and I'm running the latest OpenWrt master (39e5e17045):

root@telia:~# ethtool -i wlan0
driver: ath10k_pci
version: 4.14.75
firmware-version: 10.4b-ct-9984-fH-011-620dfcc
expansion-rom-version: 
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

Here's the log:

ct_crash.txt

greearb commented 6 years ago

That ct_crash.txt appears to be the same bug as #30, but with the 'b' firmware. I will need to update my bug catcher logic slightly to detect this variant and will upload a new debugging binary when I get a chance.

dougmoscrop commented 6 years ago

Sorry for the delay @greearb - loaded the firmware and did a reboot, but Wi-Fi never came up. Got this from wired:

test.log.txt

greearb commented 6 years ago

Please test this one, I fixed that debugging related crash and verified it at least loads. firmware-5-full-community.bin.gz

FYI: Rate-ctrl was re-worked to get better performance, possibly it will make this bug act differently now.

greearb commented 6 years ago

So, here is an un-tested image with more bug-check. Wouldn't be surprised if it crashes on startup since I'm out of time to test it on hardware at the moment. But, if someone has time, please see if it works at all, and post crashes if you see them... firmware-5-full-htt-mgt-community.bin.gz

greearb commented 6 years ago

A user mentioned the above firmware works for him. Here is an updated one with similar logic but better debugging in case the mem corruption is detected. firmware-5-full-htt-mgt-community.bin.gz

dougmoscrop commented 6 years ago

Thanks, I will load this today.

greearb commented 6 years ago

Here is a better one I think firmware-5-full-htt-mgt-community.bin.gz ...sorry about that!

huaracheguarache commented 6 years ago

Anyone else testing the latest firmware? I have not yet been able to crash it despite running iperf3 to stress the network and turning on wifi power management on my laptop.

Although somewhat unrelated, I would like to mention that the debugging firmware feels sluggish for some reason. Websites load slowly and even the OpenWrt Luci admin page takes a long time to bring up.

EDIT: this is the firmware I'm talking about: firmware-5-full-htt-mgt-community.bin.gz

dougmoscrop commented 6 years ago

I am sorry I have not had a chance to test yet, my ISP is taking a shit and I've had plenty of distractions.

I will as soon as possible.

@greearb if you don't mind my asking, how are you able to make these? Did you reverse engineer the firmware, or are you privy to the source code somehow? Sorry if this is common knowledge or answered elsewhere.

greearb commented 6 years ago

The debugging firmware is likely sluggish due to doing sanity checks every (NIC) IRQ, wmi command, etc. I'm basically checking for memory corruption between known fence posts in the firmware, and if corruption is found, then I can hopefully quickly narrow it down to what code is causing it.

I do have access to QCA firmware source under NDA and that is how I can make modified firmware images.
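The fence-post checking described above is essentially the classic guard-word (canary) technique. Below is a toy host-side model of it, assuming a simple layout where every region is followed by a known guard pattern; the region names, sizes and the 0xdeadbeef pattern are made up, and the real firmware checks are not public. A broken guard only shows that something overran that region since the last check, which is why running the check frequently is what narrows down the culprit.

```python
from dataclasses import dataclass

GUARD = bytes.fromhex("deadbeef")  # hypothetical fence-post pattern


@dataclass
class Region:
    name: str
    offset: int  # start of the region's payload in the memory image
    size: int    # payload size; the guard word follows immediately


class FencedMemory:
    """Toy memory image with a guard word after every region."""

    def __init__(self, layout: list[tuple[str, int]]):
        self.regions: list[Region] = []
        offset = 0
        for name, size in layout:
            self.regions.append(Region(name, offset, size))
            offset += size + len(GUARD)
        self.mem = bytearray(offset)
        for region in self.regions:
            end = region.offset + region.size
            self.mem[end:end + len(GUARD)] = GUARD

    def write(self, name: str, at: int, data: bytes) -> None:
        """Write into a region without a size check, like a buggy memcpy."""
        region = next(r for r in self.regions if r.name == name)
        start = region.offset + at
        if start < 0 or start + len(data) > len(self.mem):
            raise IndexError("write outside the whole memory image")
        self.mem[start:start + len(data)] = data

    def check_fences(self) -> list[str]:
        """Return the names of regions whose trailing guard was overwritten."""
        return [
            r.name
            for r in self.regions
            if self.mem[r.offset + r.size:r.offset + r.size + len(GUARD)] != GUARD
        ]
```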

huaracheguarache commented 6 years ago

Speak of the devil and he shall appear; I just had a crash. I tried to find the binary dump, but I'm unable to find a file named fw_crash_dump in either /sys/kernel/debug/ieee80211/phy0/ath10k/ or /sys/kernel/debug/ieee80211/phy1/ath10k/. Am I looking for the right file?

Anyhow, here's the dmesg log:

crash_231018.txt

greearb commented 6 years ago

Sorry, that is one I can't debug "Cannot communicate with firmware, attempting to fake crash and restart firmware."

And yeah, it is damn near impossible to find the binary dump now that it uses the 'coredump' API in the 4.16+ driver. I'll have to write up some new instructions when I get a chance...
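Until those instructions exist, a hedged pointer: assuming the 4.16+ driver submits its crash dump through the kernel's generic devcoredump framework, each dump should appear as /sys/class/devcoredump/devcd<N>/data rather than under the ath10k debugfs directory, and the kernel discards it again after a timeout, so it has to be copied soon after the crash. The script below only copies whatever it finds there; the function names and output file naming are made up, and it needs root.

```python
import tempfile
import time
from pathlib import Path
from typing import Optional

# Assumption: crash dumps are published by the kernel devcoredump framework.
DEVCD_ROOT = Path("/sys/class/devcoredump")


def find_dumps(root: Path = DEVCD_ROOT) -> list[Path]:
    """Return the 'data' files of all device coredumps currently pending."""
    return sorted(root.glob("devcd*/data"))


def save_dumps(root: Path = DEVCD_ROOT, dest: Optional[Path] = None) -> list[Path]:
    """Copy every pending dump into dest (a fresh temp directory by default)."""
    if dest is not None:
        out_dir = Path(dest)
    else:
        out_dir = Path(tempfile.mkdtemp(prefix="ath10k-dump-"))
    stamp = time.strftime("%Y%m%d-%H%M%S")
    saved = []
    for data in find_dumps(root):
        target = out_dir / f"{data.parent.name}-{stamp}.bin"
        # read_bytes() reads until EOF instead of trusting the reported size.
        target.write_bytes(data.read_bytes())
        saved.append(target)
    # The dumps are left in place; the kernel frees them after its timeout.
    return saved


if __name__ == "__main__":
    for path in save_dumps():
        print(f"saved {path} ({path.stat().st_size} bytes)")
```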

huaracheguarache commented 6 years ago

Oh, that's a bummer...

TheIvanSusanin commented 6 years ago

Also on an R7800, with firmware crashes (on two APs). I think the trace is from the debug firmware that was posted in this thread:

[704435.410105] ath10k_pci 0000:01:00.0: firmware crashed! (guid 2c539331-1458-4693-810a-0ef718ce1697)
[704435.410175] ath10k_pci 0000:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
[704435.417975] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[704435.432673] ath10k_pci 0000:01:00.0: firmware ver 10.4b-ct-9984-fH-011-3b8413e api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,htt-mgt-CT,set-special-CT,tx-rc-CT,cust-stats-CT crc32 efc1660c
[704435.440058] ath10k_pci 0000:01:00.0: board_file api 2 bmi_id 0:1 crc32 dd6d039c
[704435.461303] ath10k_pci 0000:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 32 raw 0 hwcrypto 1
[704435.470708] ath10k_pci 0000:01:00.0: firmware register dump:
[704435.478834] ath10k_pci 0000:01:00.0: [00]: 0x0000000A 0x000015B3 0x009A638E 0x00975B31
[704435.484378] ath10k_pci 0000:01:00.0: [04]: 0x009A638E 0x00060B30 0x00000000 0x00446E54
[704435.492287] ath10k_pci 0000:01:00.0: [08]: 0x000000FF 0x00000000 0x00412CE0 0xFFFFFFFF
[704435.500251] ath10k_pci 0000:01:00.0: [12]: 0x00000009 0x00000000 0x00973ABC 0x00973AD2
[704435.508162] ath10k_pci 0000:01:00.0: [16]: 0x00973AB0 0x00960E8D 0x009606CA 0x00000000
[704435.516245] ath10k_pci 0000:01:00.0: [20]: 0x409A638E 0x004065BC 0x0045F5E4 0x00000000
[704435.524212] ath10k_pci 0000:01:00.0: [24]: 0x809A64E5 0x0040661C 0x00446E5A 0xC09A638E
[704435.532214] ath10k_pci 0000:01:00.0: [28]: 0x809A7B0D 0x0040669C 0x0045F084 0x0045F550
[704435.540207] ath10k_pci 0000:01:00.0: [32]: 0x809A891A 0x0040672C 0x0045F58C 0x00406850
[704435.548092] ath10k_pci 0000:01:00.0: [36]: 0x809A894A 0x0040684C 0x0042D854 0x0042EF60
[704435.556179] ath10k_pci 0000:01:00.0: [40]: 0x80985E4B 0x0040689C 0x00000000 0x0045A914
[704435.564168] ath10k_pci 0000:01:00.0: [44]: 0x809949B7 0x004068BC 0x0042ED10 0x0045A914
[704435.572137] ath10k_pci 0000:01:00.0: [48]: 0x8098FC20 0x004068DC 0x0042ED10 0x00000000
[704435.580131] ath10k_pci 0000:01:00.0: [52]: 0x80963AD3 0x00406A7C 0x0042ED10 0x0098FC18
[704435.588022] ath10k_pci 0000:01:00.0: [56]: 0x80960E80 0x00406A9C 0x0000001F 0x00400000
[704435.596100] ath10k_pci 0000:01:00.0: Copy Engine register dump:
[704435.604102] ath10k_pci 0000:01:00.0: [00]: 0x0004a000 9 9 3 3
[704435.610347] ath10k_pci 0000:01:00.0: [01]: 0x0004a400 15 15 53 54
[704435.616583] ath10k_pci 0000:01:00.0: [02]: 0x0004a800 3 3 66 67
[704435.623184] ath10k_pci 0000:01:00.0: [03]: 0x0004ac00 28 28 30 28
[704435.629682] ath10k_pci 0000:01:00.0: [04]: 0x0004b000 7390 7390 138 113
[704435.636117] ath10k_pci 0000:01:00.0: [05]: 0x0004b400 6 6 101 102
[704435.643064] ath10k_pci 0000:01:00.0: [06]: 0x0004b800 8 8 8 8
[704435.649394] ath10k_pci 0000:01:00.0: [07]: 0x0004bc00 1 1 1 1
[704435.655820] ath10k_pci 0000:01:00.0: [08]: 0x0004c000 0 0 127 0
[704435.662423] ath10k_pci 0000:01:00.0: [09]: 0x0004c400 1 1 1 1
[704435.668919] ath10k_pci 0000:01:00.0: [10]: 0x0004c800 0 0 0 0
[704435.675351] ath10k_pci 0000:01:00.0: [11]: 0x0004cc00 0 0 0 0
[704435.683979] ath10k_pci 0000:01:00.0: debug log header, dbuf: 0x4237f0 dropped: 0
[704435.689392] ath10k_pci 0000:01:00.0: [0] next: 0x423808 buf: 0x419010 sz: 1500 len: 28 count: 1 free: 0
[704435.696944] ath10k_pci 0000:01:00.0: ath10k_pci ATH10K_DBG_BUFFER:
[704435.705655] ath10k: [0000]: 048C4850 17FC0001 009A638E 000015B3 000015B3 004064AC 91104569
[704435.711619] ath10k_pci 0000:01:00.0: ATH10K_END
[704435.720981] ath10k_pci 0000:01:00.0: [1] next: 0x4237f0 buf: 0x419600 sz: 1500 len: 0 count: 0 free: 0

greearb commented 6 years ago

That is the same crash that another user reported, please try this FW, it has more debugging in this area (but not the very slow debugging I posted previously). This is a 'b' firmware: firmware-5-full-htt-mgt-community.bin.gz

TheIvanSusanin commented 6 years ago

I started getting a lot of "failed to send pdev bss chan info request: -108" messages with the last posted firmware. I never saw this error before, in the month I've been running the R7800.

Edit: really A LOT

TheIvanSusanin commented 5 years ago

Been running this firmware for 4 days now. Not sure whether there are any crashes or not, as the kernel log quickly (within about 20 min) fills up with "failed to send pdev bss chan info request: -108". It probably takes about 20 min to get 1022 messages like this. It also seems like there are fewer stations connected to this access point than usual (I am running two APs).

Edit: there is a lot of corresponding " hostapd: Failed to set beacon parameters" errors
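To put a number on the flood (1022 messages in roughly 20 minutes is about 50 per minute), the matching dmesg lines can be counted and divided by the span of their timestamps. A minimal sketch, assuming the standard "[seconds.microseconds]" dmesg prefix and a log saved to a file first (for example with dmesg > log.txt); the function name is illustrative. For reference, error -108 is -ESHUTDOWN on Linux.

```python
import re
import sys
from typing import Iterable

# Assumes the usual dmesg timestamp prefix, e.g. "[395963.120706] ...".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def message_rate(lines: Iterable[str], needle: str) -> tuple[int, float]:
    """Count timestamped lines containing needle; return (count, per minute).

    The rate is the count divided by the time between the earliest and the
    latest match, so it is 0.0 unless there are two distinct timestamps.
    """
    stamps = []
    for line in lines:
        if needle in line:
            match = TIMESTAMP.match(line)
            if match:
                stamps.append(float(match.group(1)))
    if len(stamps) < 2 or max(stamps) == min(stamps):
        return len(stamps), 0.0
    minutes = (max(stamps) - min(stamps)) / 60.0
    return len(stamps), len(stamps) / minutes


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        count, per_min = message_rate(log, "failed to send pdev bss chan info request")
    print(f"{count} messages, ~{per_min:.1f} per minute")
```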

TheIvanSusanin commented 5 years ago

Not sure if related: I got a hard lock and had to reset, then got a crash on boot. This never happened before.

[   46.059104] ------------[ cut here ]------------
[   46.059231] WARNING: CPU: 0 PID: 0 at /var/lib/buildbot/slaves/slashdirt-03/MAIN/build/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x/ath10k-ct-2018-09-29-b9989fbd/ath10k-4.16/htt_rx.c:901 ath10k_htt_t2h_msg_handler+0xdc4/0x1d0c [ath10k_core]
[   46.063575] Modules linked in: pppoe ppp_async ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat ledtrig_usbport ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_of_simple ohci_platform ohci_hcd phy_qcom_dwc3 ahci ehci_platform
[   46.134427]  sd_mod ahci_platform libahci_platform libahci libata scsi_mod ehci_hcd gpio_button_hotplug
[   46.156558] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.76 #0
[   46.165740] Hardware name: Generic DT based system
[   46.172019] [] (unwind_backtrace) from [] (show_stack+0x14/0x20)
[   46.176617] [] (show_stack) from [] (dump_stack+0x88/0x9c)
[   46.184510] [] (dump_stack) from [] (__warn+0xf0/0x11c)
[   46.191536] [] (__warn) from [] (warn_slowpath_null+0x20/0x28)
[   46.198432] [] (warn_slowpath_null) from [] (ath10k_htt_t2h_msg_handler+0xdc4/0x1d0c [ath10k_core])
[   46.206149] [] (ath10k_htt_t2h_msg_handler [ath10k_core]) from [] (ath10k_htt_t2h_msg_handler+0xfb0/0x1d0c [ath10k_core])
[   46.216795] [] (ath10k_htt_t2h_msg_handler [ath10k_core]) from [] (ath10k_htt_txrx_compl_task+0x115c/0x11e0 [ath10k_core])
[   46.229615] [] (ath10k_htt_txrx_compl_task [ath10k_core]) from [] (ath10k_pci_napi_poll+0x7c/0x108 [ath10k_pci])
[   46.242272] [] (ath10k_pci_napi_poll [ath10k_pci]) from [] (net_rx_action+0x144/0x31c)
[   46.254301] [] (net_rx_action) from [] (__do_softirq+0xf0/0x264)
[   46.263762] [] (__do_softirq) from [] (irq_exit+0xdc/0x148)
[   46.271657] [] (irq_exit) from [] (__handle_domain_irq+0xa8/0xc8)
[   46.278686] [] (__handle_domain_irq) from [] (gic_handle_irq+0x6c/0xb8)
[   46.286671] [] (gic_handle_irq) from [] (__irq_svc+0x6c/0x90)
[   46.294822] Exception stack(0xc0b01f48 to 0xc0b01f90)
[   46.302474] 1f40: 00000001 00000000 00000000 c0315600 ffffe000 c0b03c74
[   46.307518] 1f60: c0b03c28 00000000 00000000 c0a2da28 00000000 00000000 c0b01f90 c0b01f98
[   46.315667] 1f80: c0308924 c0308928 60000013 ffffffff
[   46.323826] [] (__irq_svc) from [] (arch_cpu_idle+0x38/0x44)
[   46.328863] [] (arch_cpu_idle) from [] (do_idle+0xe8/0x1bc)
[   46.336324] [] (do_idle) from [] (cpu_startup_entry+0x1c/0x20)
[   46.343364] [] (cpu_startup_entry) from [] (start_kernel+0x400/0x40c)
[   46.350998] ---[ end trace 7fc36435088192ef ]---

TheIvanSusanin commented 5 years ago

Finally, a few proper crashes:

[395963.120706] ath10k_pci 0000:01:00.0: firmware crashed! (guid 58e5a691-5700-4f4a-a75e-1908e5b525d8)
[395963.120827] ath10k_pci 0000:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
[395963.128624] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[395963.143561] ath10k_pci 0000:01:00.0: firmware ver 10.4b-ct-9984-fH-011-2a847fb api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,htt-mgt-CT,set-special-CT,tx-rc-CT,cust-stats-CT crc32 b4fd66f3
[395963.150687] ath10k_pci 0000:01:00.0: board_file api 2 bmi_id 0:1 crc32 dd6d039c
[395963.171854] ath10k_pci 0000:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 32 raw 0 hwcrypto 1
[395963.181241] ath10k_pci 0000:01:00.0: firmware register dump:
[395963.189349] ath10k_pci 0000:01:00.0: [00]: 0x0000000A 0x000015B3 0x009A5D55 0x00975B31
[395963.194994] ath10k_pci 0000:01:00.0: [04]: 0x009A5D55 0x00060B30 0x00000005 0x00000001
[395963.202806] ath10k_pci 0000:01:00.0: [08]: 0x00000033 0x0045F430 0x00000001 0x00000000
[395963.210791] ath10k_pci 0000:01:00.0: [12]: 0x00000009 0x00000000 0x00973D28 0x00973D33
[395963.218778] ath10k_pci 0000:01:00.0: [16]: 0x00973AB0 0x009606CA 0x009606CA 0x00000000
[395963.226764] ath10k_pci 0000:01:00.0: [20]: 0x409A5D55 0x0040664C 0x0045E894 0x0045FA2C
[395963.234751] ath10k_pci 0000:01:00.0: [24]: 0x809A6B8C 0x004066AC 0x0000FFFF 0xC09A5D55
[395963.242736] ath10k_pci 0000:01:00.0: [28]: 0x809A72AD 0x0040671C 0x00412CE0 0x00000001
[395963.250722] ath10k_pci 0000:01:00.0: [32]: 0x809A8922 0x0040676C 0x00406850 0x0045E894
[395963.258708] ath10k_pci 0000:01:00.0: [36]: 0x809A8A1E 0x0040684C 0x0042CFF4 0x004068DC
[395963.266695] ath10k_pci 0000:01:00.0: [40]: 0x80985E47 0x0040689C 0x00000000 0x0045A24C
[395963.274681] ath10k_pci 0000:01:00.0: [44]: 0x809949B3 0x004068BC 0x0042E4B0 0x0045A24C
[395963.282666] ath10k_pci 0000:01:00.0: [48]: 0x8098FC1C 0x004068DC 0x0042E4B0 0x00000000
[395963.290656] ath10k_pci 0000:01:00.0: [52]: 0x80963AD3 0x00406A7C 0x0042E4B0 0x0098FC14
[395963.298641] ath10k_pci 0000:01:00.0: [56]: 0x80960E80 0x00406A9C 0x0000001F 0x00400000
[395963.306626] ath10k_pci 0000:01:00.0: Copy Engine register dump:
[395963.314616] ath10k_pci 0000:01:00.0: [00]: 0x0004a000 7 7 3 3
[395963.320864] ath10k_pci 0000:01:00.0: [01]: 0x0004a400 6 6 204 205
[395963.327203] ath10k_pci 0000:01:00.0: [02]: 0x0004a800 14 14 77 78
[395963.333711] ath10k_pci 0000:01:00.0: [03]: 0x0004ac00 17 17 19 17
[395963.340222] ath10k_pci 0000:01:00.0: [04]: 0x0004b000 6636 6636 32 248
[395963.346733] ath10k_pci 0000:01:00.0: [05]: 0x0004b400 22 22 373 374
[395963.353588] ath10k_pci 0000:01:00.0: [06]: 0x0004b800 13 13 13 13
[395963.359925] ath10k_pci 0000:01:00.0: [07]: 0x0004bc00 1 1 1 1
[395963.366436] ath10k_pci 0000:01:00.0: [08]: 0x0004c000 0 0 127 0
[395963.372946] ath10k_pci 0000:01:00.0: [09]: 0x0004c400 1 1 1 1
[395963.379457] ath10k_pci 0000:01:00.0: [10]: 0x0004c800 0 0 0 0
[395963.385967] ath10k_pci 0000:01:00.0: [11]: 0x0004cc00 0 0 0 0
[395963.394506] ath10k_pci 0000:01:00.0: debug log header, dbuf: 0x422fa8 dropped: 0
[395963.400007] ath10k_pci 0000:01:00.0: [0] next: 0x422f90 buf: 0x4195c0 sz: 1500 len: 324 count: 12 free: 0
[395963.407566] ath10k_pci 0000:01:00.0: ath10k_pci ATH10K_DBG_BUFFER:
[395963.416172] ath10k: [0000]: 05014052 17FC4C07 9110A004 00000000 00000000 00000001 00000001 05014052
[395963.422339] ath10k: [0008]: 17FC4C07 9110A005 00000079 00000033 00088776 00000033 05014052 13FC4C07
[395963.431712] ath10k: [0016]: 0000A0B5 0045F430 0045E894 0042CFF4 05014052 0FFC4C07 0000A0B6 0042CFF4
[395963.440827] ath10k: [0024]: 00000006 05014052 14044C01 71108880 00000000 00000000 00000000 00000FF0
[395963.449942] ath10k: [0032]: 05014052 14044C01 71108880 00010000 00000000 00000000 000000FF 05014052
[395963.459056] ath10k: [0040]: 14044C01 71108880 00020000 00000000 00000000 000000FF 05014052
14044C01 [395963.468170] ath10k: [0048]: 71108880 00030000 00000000 00000000 000001FF 05014052 14044C01 71108880 [395963.477286] ath10k: [0056]: 00040000 00000000 00000000 000001FF 05014052 14044C01 71108880 00050000 [395963.486399] ath10k: [0064]: 00000000 00000000 000003FF 05014052 14044C01 71108880 00060000 00000000 [395963.495514] ath10k: [0072]: 00000000 00000000 05014052 17FC0001 009A5D55 000015B3 000015B3 0040653C

[395963.513731] ath10k_pci 0000:01:00.0: ATH10K_END
[395963.518316] ath10k_pci 0000:01:00.0: [1] next: 0x422fa8 buf: 0x418fd0 sz: 1500 len: 0 count: 0 free: 0


[427745.895342] ath10k_pci 0000:01:00.0: firmware crashed! (guid 384e87a1-9180-46db-bbe5-47e186aedac7)
[427745.895415] ath10k_pci 0000:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
[427745.903233] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[427745.918295] ath10k_pci 0000:01:00.0: firmware ver 10.4b-ct-9984-fH-011-2a847fb api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,htt-mgt-CT,set-special-CT,tx-rc-CT,cust-stats-CT crc32 b4fd66f3
[427745.925418] ath10k_pci 0000:01:00.0: board_file api 2 bmi_id 0:1 crc32 dd6d039c
[427745.946540] ath10k_pci 0000:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 32 raw 0 hwcrypto 1
[427745.955950] ath10k_pci 0000:01:00.0: firmware register dump:
[427745.964067] ath10k_pci 0000:01:00.0: [00]: 0x0000000A 0x000015B3 0x009A504F 0x00975B31
[427745.969618] ath10k_pci 0000:01:00.0: [04]: 0x009A504F 0x00060730 0x00000002 0x00446B68
[427745.977524] ath10k_pci 0000:01:00.0: [08]: 0x0042D0FC 0x00412CE0 0x0045F430 0x00406740
[427745.985490] ath10k_pci 0000:01:00.0: [12]: 0x00000009 0x00000000 0x009A78DC 0x009A78E4
[427745.993488] ath10k_pci 0000:01:00.0: [16]: 0x00973AB0 0x00960E65 0x009606CA 0x00000000
[427746.001389] ath10k_pci 0000:01:00.0: [20]: 0x409A504F 0x0040669C 0xFFFFFFFF 0x00412CE0
[427746.009451] ath10k_pci 0000:01:00.0: [24]: 0x809A785E 0x004066FC 0x00000001 0xC09A504F
[427746.017448] ath10k_pci 0000:01:00.0: [28]: 0x809A89EE 0x0040672C 0x00412CE0 0x00406850
[427746.025415] ath10k_pci 0000:01:00.0: [32]: 0x809A8A1E 0x0040684C 0x0042D0FC 0x0042E700
[427746.033412] ath10k_pci 0000:01:00.0: [36]: 0x80985E47 0x0040689C 0x00000000 0x00459FDC
[427746.041319] ath10k_pci 0000:01:00.0: [40]: 0x809949B3 0x004068BC 0x0042E4B0 0x00459FDC
[427746.049407] ath10k_pci 0000:01:00.0: [44]: 0x8098FC1C 0x004068DC 0x0042E4B0 0x00000000
[427746.057374] ath10k_pci 0000:01:00.0: [48]: 0x80963AD3 0x00406A7C 0x0042E4B0 0x0098FC14
[427746.065371] ath10k_pci 0000:01:00.0: [52]: 0x80960E80 0x00406A9C 0x0000001F 0x00400000
[427746.073262] ath10k_pci 0000:01:00.0: [56]: 0x80960E51 0x00406ACC 0x00400000 0x00000000
[427746.081337] ath10k_pci 0000:01:00.0: Copy Engine register dump:
[427746.089338] ath10k_pci 0000:01:00.0: [00]: 0x0004a000 7 7 3 3
[427746.095586] ath10k_pci 0000:01:00.0: [01]: 0x0004a400 11 11 49 50
[427746.101825] ath10k_pci 0000:01:00.0: [02]: 0x0004a800 50 50 113 114
[427746.108424] ath10k_pci 0000:01:00.0: [03]: 0x0004ac00 21 21 23 21
[427746.114922] ath10k_pci 0000:01:00.0: [04]: 0x0004b000 1070 1070 65 25
[427746.121355] ath10k_pci 0000:01:00.0: [05]: 0x0004b400 30 30 221 222
[427746.128292] ath10k_pci 0000:01:00.0: [06]: 0x0004b800 22 22 22 22
[427746.134634] ath10k_pci 0000:01:00.0: [07]: 0x0004bc00 1 1 1 1
[427746.141060] ath10k_pci 0000:01:00.0: [08]: 0x0004c000 0 0 127 0
[427746.147661] ath10k_pci 0000:01:00.0: [09]: 0x0004c400 1 1 1 1
[427746.154160] ath10k_pci 0000:01:00.0: [10]: 0x0004c800 0 0 0 0
[427746.160591] ath10k_pci 0000:01:00.0: [11]: 0x0004cc00 0 0 0 0
[427746.169214] ath10k_pci 0000:01:00.0: debug log header, dbuf: 0x422f90 dropped: 0
[427746.174733] ath10k_pci 0000:01:00.0: [0] next: 0x422fa8 buf: 0x418fd0 sz: 1500 len: 28 count: 1 free: 0
[427746.182183] ath10k_pci 0000:01:00.0: ath10k_pci ATH10K_DBG_BUFFER:
[427746.190893] ath10k: [0000]: 01F08A38 17FC0001 009A504F 000015B3 000015B3 0040658C 91104569
[427746.196856] ath10k_pci 0000:01:00.0: ATH10K_END
[427746.206221] ath10k_pci 0000:01:00.0: [1] next: 0x422f90 buf: 0x4195c0 sz: 1500 len: 0 count: 0 free: 0


[573142.158305] ath10k_pci 0000:01:00.0: firmware crashed! (guid ff717379-fb5d-44ce-8198-5698820f092f)
[573142.158377] ath10k_pci 0000:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
[573142.166206] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[573142.180954] ath10k_pci 0000:01:00.0: firmware ver 10.4b-ct-9984-fH-011-2a847fb api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,htt-mgt-CT,set-special-CT,tx-rc-CT,cust-stats-CT crc32 b4fd66f3
[573142.188275] ath10k_pci 0000:01:00.0: board_file api 2 bmi_id 0:1 crc32 dd6d039c
[573142.209530] ath10k_pci 0000:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 32 raw 0 hwcrypto 1
[573142.218832] ath10k_pci 0000:01:00.0: firmware register dump:
[573142.227018] ath10k_pci 0000:01:00.0: [00]: 0x0000000A 0x000015B3 0x009A6447 0x00975B31
[573142.232579] ath10k_pci 0000:01:00.0: [04]: 0x009A6447 0x00060B30 0x00000005 0x00000007
[573142.240481] ath10k_pci 0000:01:00.0: [08]: 0x00406764 0x000000FF 0x00400000 0x00000006
[573142.248447] ath10k_pci 0000:01:00.0: [12]: 0x00000009 0x00000000 0x00973ABC 0x00973AD2
[573142.256365] ath10k_pci 0000:01:00.0: [16]: 0x00973AB0 0x009A5677 0x009606CA 0x00000000
[573142.264442] ath10k_pci 0000:01:00.0: [20]: 0x409A6447 0x0040659C 0x0045FC0C 0x00000000
[573142.272408] ath10k_pci 0000:01:00.0: [24]: 0x809A65B9 0x004065FC 0x00000001 0xC09A6447
[573142.280406] ath10k_pci 0000:01:00.0: [28]: 0x809A7BE1 0x0040669C 0x0045E894 0x0045FB78
[573142.288404] ath10k_pci 0000:01:00.0: [32]: 0x809A89EE 0x0040672C 0x0045FBB4 0x00406850
[573142.296294] ath10k_pci 0000:01:00.0: [36]: 0x809A8A1E 0x0040684C 0x0042CFF4 0x0042E700
[573142.304366] ath10k_pci 0000:01:00.0: [40]: 0x80985E47 0x0040689C 0x00000000 0x0045A24C
[573142.312333] ath10k_pci 0000:01:00.0: [44]: 0x809949B3 0x004068BC 0x0042E4B0 0x0045A24C
[573142.320331] ath10k_pci 0000:01:00.0: [48]: 0x8098FC1C 0x004068DC 0x0042E4B0 0x00000000
[573142.328328] ath10k_pci 0000:01:00.0: [52]: 0x80963AD3 0x00406A7C 0x0042E4B0 0x0098FC14
[573142.336225] ath10k_pci 0000:01:00.0: [56]: 0x80960E80 0x00406A9C 0x0000001F 0x00400000
[573142.344291] ath10k_pci 0000:01:00.0: Copy Engine register dump:
[573142.352295] ath10k_pci 0000:01:00.0: [00]: 0x0004a000 7 7 3 3
[573142.358544] ath10k_pci 0000:01:00.0: [01]: 0x0004a400 31 31 421 422
[573142.364786] ath10k_pci 0000:01:00.0: [02]: 0x0004a800 59 59 58 59
[573142.371381] ath10k_pci 0000:01:00.0: [03]: 0x0004ac00 17 17 19 17
[573142.377879] ath10k_pci 0000:01:00.0: [04]: 0x0004b000 1728 1728 138 98
[573142.384318] ath10k_pci 0000:01:00.0: [05]: 0x0004b400 24 24 503 504
[573142.391250] ath10k_pci 0000:01:00.0: [06]: 0x0004b800 25 25 25 25
[573142.397591] ath10k_pci 0000:01:00.0: [07]: 0x0004bc00 1 1 1 1
[573142.404022] ath10k_pci 0000:01:00.0: [08]: 0x0004c000 0 0 127 0
[573142.410617] ath10k_pci 0000:01:00.0: [09]: 0x0004c400 1 1 1 1
[573142.417116] ath10k_pci 0000:01:00.0: [10]: 0x0004c800 0 0 0 0
[573142.423552] ath10k_pci 0000:01:00.0: [11]: 0x0004cc00 0 0 0 0
[573142.432170] ath10k_pci 0000:01:00.0: debug log header, dbuf: 0x422f90 dropped: 0
[573142.437595] ath10k_pci 0000:01:00.0: [0] next: 0x422fa8 buf: 0x418fd0 sz: 1500 len: 104 count: 4 free: 0
[573142.445149] ath10k_pci 0000:01:00.0: ath10k_pci ATH10K_DBG_BUFFER:
[573142.453850] ath10k: [0000]: 00DF9095 17FC4C07 91107000 000000FF 000000D8 000B0000 02000006 00DF9095
[573142.459909] ath10k: [0008]: 17FC4C07 9110700A 00000FF0 00000000 00000000 00000003 00DF9095 0FFC4C07
[573142.469281] ath10k: [0016]: 9110700B 00000003 00000003 00DF9095 17FC0001 009A6447 000015B3 000015B3
[573142.478400] ath10k: [0024]: 0040648C 91104569
[573142.487523] ath10k_pci 0000:01:00.0: ATH10K_END
[573142.492713] ath10k_pci 0000:01:00.0: [1] next: 0x422f90 buf: 0x4195c0 sz: 1500 len: 0 count: 0 free: 0
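All three crashes above come from the same firmware build (10.4b-ct-9984-fH-011-2a847fb, crc32 b4fd66f3), so their register dumps can be compared word for word. The Python sketch below does that mechanically. It rests on assumptions read off these dumps, not on any documented ath10k-ct format. First, every row from [20] onward is taken to begin with a return address in the Xtensa windowed-call encoding, where the top two bits hold the caller's window increment. Clearing those bits should give the code address, which is consistent with the symbolized trace at the top of this issue listing both 0x009c15ae and 0x409c15ae for the same location. Second, in all three dumps the third word of row [00] equals that masked first frame, so it is plausibly the faulting PC. The names `parse_register_dump`, `candidate_frames` and `common_frames` are hypothetical helpers, not part of the driver.

```python
from __future__ import annotations

import re

# One "firmware register dump" row: "[NN]: 0x........ 0x........ 0x........ 0x........".
# Copy Engine rows ("[00]: 0x0004a000 7 7 3 3") and debug-buffer rows ("[0000]: ...")
# do not match this pattern, so they are skipped.
ROW = re.compile(r"\[(\d{2})\]: " + " ".join([r"(0x[0-9A-Fa-f]{8})"] * 4))

# Assumption: clears the window-increment bits of an Xtensa windowed return address.
XTENSA_ADDR_MASK = 0x3FFFFFFF


def parse_register_dump(text: str) -> dict[int, list[int]]:
    """Map each row offset ([00], [04], ...) of ONE crash to its four 32-bit words."""
    rows: dict[int, list[int]] = {}
    for m in ROW.finditer(text):
        rows[int(m.group(1))] = [int(w, 16) for w in m.group(2, 3, 4, 5)]
    return rows


def candidate_frames(rows: dict[int, list[int]], first_frame_row: int = 20) -> list[int]:
    """Assumption: every row from [20] on starts with a windowed-ABI return address."""
    return [rows[off][0] & XTENSA_ADDR_MASK for off in sorted(rows) if off >= first_frame_row]


def common_frames(stacks: list[list[int]]) -> list[int]:
    """Addresses present in every stack, kept in the order of the first stack."""
    shared = set(stacks[0]).intersection(*stacks[1:])
    return [addr for addr in stacks[0] if addr in shared]


# A few lines of the first crash, including one Copy Engine row that must be ignored.
SAMPLE = """\
[395963.189349] ath10k_pci 0000:01:00.0: [00]: 0x0000000A 0x000015B3 0x009A5D55 0x00975B31
[395963.226764] ath10k_pci 0000:01:00.0: [20]: 0x409A5D55 0x0040664C 0x0045E894 0x0045FA2C
[395963.234751] ath10k_pci 0000:01:00.0: [24]: 0x809A6B8C 0x004066AC 0x0000FFFF 0xC09A5D55
[395963.314616] ath10k_pci 0000:01:00.0: [00]: 0x0004a000 7 7 3 3
"""

# First word of rows [20]..[56] in each of the three crashes above, copied from the logs.
RAW_FRAMES = {
    "395963": [0x409A5D55, 0x809A6B8C, 0x809A72AD, 0x809A8922, 0x809A8A1E,
               0x80985E47, 0x809949B3, 0x8098FC1C, 0x80963AD3, 0x80960E80],
    "427745": [0x409A504F, 0x809A785E, 0x809A89EE, 0x809A8A1E, 0x80985E47,
               0x809949B3, 0x8098FC1C, 0x80963AD3, 0x80960E80, 0x80960E51],
    "573142": [0x409A6447, 0x809A65B9, 0x809A7BE1, 0x809A89EE, 0x809A8A1E,
               0x80985E47, 0x809949B3, 0x8098FC1C, 0x80963AD3, 0x80960E80],
}

if __name__ == "__main__":
    rows = parse_register_dump(SAMPLE)
    print("frames in sample:", [f"0x{a:08X}" for a in candidate_frames(rows)])
    stacks = [[ra & XTENSA_ADDR_MASK for ra in frames] for frames in RAW_FRAMES.values()]
    for addr in common_frames(stacks):
        print(f"shared frame: 0x{addr:08X}")
```

On these three dumps the innermost frames differ (0x009A5D55, 0x009A504F, 0x009A6447), while six outer frames, from 0x009A8A1E down to 0x00960E80, are shared by all of them. That pattern fits several different failure points reached through one common call chain. Naming the functions would still require the symbol information for this particular build.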

greearb commented 5 years ago

On 11/25/2018 09:47 AM, TheIvanSusanin wrote:

Finally, a few proper crashes:

Thanks for reporting this.

The first problem is one we have been tracking; this provides extra debug info. It points to the peer's phymode being changed incorrectly and/or corrupted (the phymode was HT40, but the current rate was VHT). I will try to figure out how this can happen.
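The diagnosis above depends on one invariant. Any rate chosen for a peer has to be one that the peer's negotiated phymode can carry. A VHT rate on an HT40 peer therefore means that either the phymode or the rate-control state is wrong. The sketch below expresses that sanity check in Python. Everything in it is hypothetical and simplified: the phymode table, `Caps`, and `rate_fits_phymode` are illustrative names only, not the firmware's actual enums or data structures.

```python
from enum import IntEnum
from typing import NamedTuple


class Gen(IntEnum):
    """Rate families, ordered so that newer generations compare greater."""
    LEGACY = 0  # 802.11a/b/g OFDM/CCK rates
    HT = 1      # 802.11n MCS rates
    VHT = 2     # 802.11ac MCS rates


class Caps(NamedTuple):
    gen: Gen
    bw_mhz: int


# Hypothetical, simplified phymode table; not the firmware's real enum or values.
PHYMODES = {
    "11A": Caps(Gen.LEGACY, 20),
    "HT20": Caps(Gen.HT, 20),
    "HT40": Caps(Gen.HT, 40),
    "VHT20": Caps(Gen.VHT, 20),
    "VHT40": Caps(Gen.VHT, 40),
    "VHT80": Caps(Gen.VHT, 80),
}


def rate_fits_phymode(phymode: str, rate: Caps) -> bool:
    """A rate is usable only if the peer's phymode covers both its generation and its width."""
    peer = PHYMODES[phymode]
    return rate.gen <= peer.gen and rate.bw_mhz <= peer.bw_mhz


if __name__ == "__main__":
    # The inconsistency described above: an HT40 peer with a VHT current rate.
    print(rate_fits_phymode("HT40", Caps(Gen.VHT, 40)))  # False
    print(rate_fits_phymode("HT40", Caps(Gen.HT, 40)))   # True
```

Ordering the generations as an `IntEnum` lets a single comparison express that newer peers also accept older rate families.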

The second one was a txbf-related assert I don't recall seeing before. I added debugging to get more info if it happens again.

The third was a crash I have seen elsewhere, but your firmware didn't have some additional logging I had already added there, so I am ignoring that crash.

I will upload a new image to try soon.

Thanks, Ben

-- Ben Greear greearb@candelatech.com Candela Technologies Inc http://www.candelatech.com

greearb commented 5 years ago

Here is an updated image that should fix at least one of these rate-ctrl-related bugs and adds more debugging for the others. It should not have excessive debugging code, however, so I expect it to run normally as far as performance goes.

firmware-5-full-htt-mgt-community.bin.gz

TheIvanSusanin commented 5 years ago

BTW, something extra interesting (maybe it doesn't belong in this thread): I have Blink security cameras, which are based on TI SimpleLink. When the AP runs the -ct firmware/kernel module, they fail to perform the 3-way handshake and become essentially useless.

greearb commented 5 years ago

Please open a new bug for this, and maybe try the older non-b firmware version (10.4-ct rather than 10.4b-ct) to see if it has the same issue.
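The "b" here is the suffix after 10.4 in the firmware version string. The crash logs in this thread report 10.4b-ct-9984-fH-011-2a847fb, while the ethtool output at the top of the issue shows 10.4-ct-9984-fW-011-cf79c7f. The Python sketch below tells the two apart from a reported string. The field layout and names are guesses inferred only from those two strings. They are not a documented ath10k-ct naming scheme, and `parse_fw_version` is a hypothetical helper.

```python
import re

# Field layout inferred only from the two version strings in this issue
# ("10.4b-ct-9984-fH-011-2a847fb" and "10.4-ct-9984-fW-011-cf79c7f"); names are guesses.
FW_VERSION = re.compile(
    r"^(?P<base>\d+\.\d+)(?P<branch>b?)-ct-(?P<chip>\d+)"
    r"-(?P<flavor>f\w)-(?P<release>\d+)-(?P<commit>[0-9a-f]+)$"
)


def parse_fw_version(version: str) -> dict:
    """Split an ath10k-ct firmware version string and flag the 'b' branch."""
    m = FW_VERSION.match(version)
    if m is None:
        raise ValueError(f"unrecognized firmware version: {version!r}")
    fields = m.groupdict()
    fields["is_b_branch"] = fields.pop("branch") == "b"
    return fields


if __name__ == "__main__":
    for v in ("10.4b-ct-9984-fH-011-2a847fb", "10.4-ct-9984-fW-011-cf79c7f"):
        info = parse_fw_version(v)
        print(v, "->", "b branch" if info["is_b_branch"] else "non-b")
```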

Thanks, Ben

On 11/29/2018 03:53 PM, TheIvanSusanin wrote:

BTW, something extra interesting (maybe it doesn't belong in this thread): I have Blink security cameras, which are based on TI SimpleLink. When the AP runs the -ct firmware/kernel module, they fail to perform the 3-way handshake and become essentially useless.



huaracheguarache commented 5 years ago

@dougmoscrop, have you tried the latest OpenWrt trunk? The stable firmware was recently updated.

Things have been running very smoothly for me.

dougmoscrop commented 5 years ago

Not yet, but I will! Thanks for the heads-up.

On Mon, Dec 17, 2018, 9:24 AM huaracheguarache, notifications@github.com wrote:

@dougmoscrop, have you tried the latest OpenWrt trunk? The stable firmware was recently updated: https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=cc5c63f217e6ca29959b5c61ed5de690a42d9fbf

Things have been running very smoothly for me.


greearb commented 5 years ago

Bug 58 has morphed into this bug and is against newer code, so I am closing this one and will continue working on the issue in bug 58.