greearb / ath10k-ct

Stand-alone ath10k driver based on Candela Technologies Linux kernel.
111 stars, 41 forks

Netgear R7800 crash #79

Closed: timkgh closed this issue 5 years ago.

timkgh commented 5 years ago

Description of the problem (how to configure, how to reproduce, how often it happens): Crash in dmesg.

Software (OS, Firmware version, kernel, driver, etc): OpenWRT. Firmware as reported by the driver:
ath10k_pci 0000:01:00.0: firmware ver 10.4b-ct-9984-fW-012-38a5514c3 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT crc32 7de96fb7

Hardware (NIC chipset, platform, etc): Netgear R7800

Logs (dmesg, maybe supplicant and/or hostap):

[ 2843.339572] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.105 #0
[ 2843.361722] Hardware name: Generic DT based system
[ 2843.367743] [<c030f2b4>] (unwind_backtrace) from [<c030b490>] (show_stack+0x14/0x20)
[ 2843.372426] [<c030b490>] (show_stack) from [<c079d338>] (dump_stack+0x88/0x9c)
[ 2843.380322] [<c079d338>] (dump_stack) from [<c0322be8>] (__warn+0xf0/0x11c)
[ 2843.387342] [<c0322be8>] (__warn) from [<c0322cd4>] (warn_slowpath_null+0x20/0x28)
[ 2843.394246] [<c0322cd4>] (warn_slowpath_null) from [<bf8069d4>] (ath10k_htt_t2h_msg_handler+0x1674/0x2650 [ath10k_core])
[ 2843.402016] [<bf8069d4>] (ath10k_htt_t2h_msg_handler [ath10k_core]) from [<bf8077d0>] (ath10k_htt_t2h_msg_handler+0x2470/0x2650 [ath10k_core])
[ 2843.412960] [<bf8077d0>] (ath10k_htt_t2h_msg_handler [ath10k_core]) from [<bf8080b4>] (ath10k_htt_txrx_compl_task+0x6f8/0xbb8 [ath10k_core])
[ 2843.425513] [<bf8080b4>] (ath10k_htt_txrx_compl_task [ath10k_core]) from [<bf856040>] (ath10k_pci_napi_poll+0x7c/0x11c [ath10k_pci])
[ 2843.438266] [<bf856040>] (ath10k_pci_napi_poll [ath10k_pci]) from [<c069b8e0>] (net_rx_action+0x144/0x31c)
[ 2843.450108] [<c069b8e0>] (net_rx_action) from [<c03015c8>] (__do_softirq+0xf0/0x264)
[ 2843.459574] [<c03015c8>] (__do_softirq) from [<c0327018>] (irq_exit+0xdc/0x148)
[ 2843.467469] [<c0327018>] (irq_exit) from [<c0363e90>] (__handle_domain_irq+0xa8/0xc8)
[ 2843.474493] [<c0363e90>] (__handle_domain_irq) from [<c0301488>] (gic_handle_irq+0x6c/0xb8)
[ 2843.482480] [<c0301488>] (gic_handle_irq) from [<c030c08c>] (__irq_svc+0x6c/0x90)
[ 2843.490631] Exception stack(0xc0b01f48 to 0xc0b01f90)
[ 2843.498289] 1f40:                   00000001 00000000 00000000 c0315040 ffffe000 c0b03cbc
[ 2843.503335] 1f60: c0b03c70 00000000 00000000 c0a2ca28 00000000 00000000 c0b01f90 c0b01f98
[ 2843.511480] 1f80: c030878c c0308790 60000013 ffffffff
[ 2843.519635] [<c030c08c>] (__irq_svc) from [<c0308790>] (arch_cpu_idle+0x38/0x44)
[ 2843.524675] [<c0308790>] (arch_cpu_idle) from [<c0359c68>] (do_idle+0xe8/0x1bc)
[ 2843.532134] [<c0359c68>] (do_idle) from [<c0359fb0>] (cpu_startup_entry+0x1c/0x20)
[ 2843.539165] [<c0359fb0>] (cpu_startup_entry) from [<c0a00cd0>] (start_kernel+0x3fc/0x408)
[ 2843.546922] ---[ end trace 65e4239a90934cf0 ]---
[ 2843.556942] ath10k_pci 0001:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[ 2843.561124] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
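When comparing logs like the one above (or the full dmesg attachments later in this thread), it can help to count how often each radio reports "SWBA overrun" and how many kernel warning traces occur. The following is a minimal, hypothetical Python sketch. It is not part of ath10k-ct or OpenWRT. It assumes only the message formats visible in the excerpt above, and the function name summarize_dmesg is illustrative.

```python
import re
from collections import Counter

# Matches lines such as:
# [ 2843.556942] ath10k_pci 0001:01:00.0: SWBA overrun on vdev 0, skipped old beacon
# The pattern is based only on the log excerpt in this thread (an assumption).
SWBA_RE = re.compile(
    r"ath10k_pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): "
    r"SWBA overrun on vdev (?P<vdev>\d+)"
)
# In the excerpt, the kernel warning backtrace ends with "---[ end trace <hex> ]---".
END_TRACE_RE = re.compile(r"---\[ end trace [0-9a-f]+ \]---")


def summarize_dmesg(text):
    """Return (SWBA overrun counts keyed by (PCI device, vdev), number of warning traces)."""
    swba = Counter()
    traces = 0
    for line in text.splitlines():
        m = SWBA_RE.search(line)
        if m:
            swba[(m.group("dev"), int(m.group("vdev")))] += 1
        elif END_TRACE_RE.search(line):
            traces += 1
    return swba, traces


if __name__ == "__main__":
    # Three lines copied from the log excerpt above.
    sample = """\
[ 2843.546922] ---[ end trace 65e4239a90934cf0 ]---
[ 2843.556942] ath10k_pci 0001:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[ 2843.561124] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
"""
    counts, traces = summarize_dmesg(sample)
    for (dev, vdev), n in sorted(counts.items()):
        print(f"{dev} vdev {vdev}: {n} SWBA overrun(s)")
    print(f"warning traces: {traces}")
```

To use it on a saved log such as dmesg.log, read the file and pass its contents to summarize_dmesg in place of the sample string. Counting per PCI device keeps the two radios (0000:01:00.0 and 0001:01:00.0) apart.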
psyborg55 commented 5 years ago

Can you post the full dmesg, starting from boot?

timkgh commented 5 years ago

Here's a fresh full dmesg: dmesg.log

greearb commented 5 years ago

I think I have already fixed this. Please open a new bug if you see more problems in the future.

ValdikSS commented 4 years ago

@greearb, is this fixed in the 4.4 kernel branch?

greearb commented 4 years ago

Not sure, I haven't done any testing on 4.4 in some time. Does it crash for you? And, why use such an old kernel?

ValdikSS commented 4 years ago

@greearb I'm using a Turris Omnia router, which runs the 4.4 kernel on its stable firmware (the next firmware, which is not yet stable, uses 4.14). It has a Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (Compex WLE900VX card).

root@turris:/lib/firmware/ath10k/QCA988X/hw2.0# ls -lah
drwxr-xr-x    1 root     root          46 Oct  4  2018 .
drwxr-xr-x    1 root     root          10 Oct  4  2018 ..
-rw-r--r--    1 root     root        2.1K Sep 25  2018 board.bin
-rw-r--r--    1 root     root      242.4K Sep 25  2018 firmware-5.bin
root@turris:/lib/firmware/ath10k/QCA988X/hw2.0# md5sum *
ab36ef267d15cfc02317ceeb38e8f548  board.bin
cba1adbd64b243750270d8af88db1099  firmware-5.bin

I experience two problems:

  1. My server, which is always connected to the router via Wi-Fi, becomes unavailable and cannot reconnect to the Wi-Fi network until the router or its Wi-Fi subsystem is restarted (by running the wifi command in OpenWRT). Other Wi-Fi devices keep working fine. This happens about once every 1-2 weeks. The server has an Intel Wireless 8265 PCIe card. No new kernel messages appear when this happens.

  2. I see the same dmesg messages that @timkgh reported. I'm not sure whether they are related to the issue above.

dmesg-omnia.zip

Turris support can't help me. If you have an idea of what could be wrong and how to fix it, I will gladly test it.

greearb commented 4 years ago

Please try the latest OpenWRT and see whether it works better.

ValdikSS commented 4 years ago

@greearb Right now the router runs Turris OS, a fork of OpenWRT 15.05, and it is not physically near me, so I can't install stock OpenWRT on it. Turris OS does get updates: the 4.4 LTS kernel and the firmware version are updated from time to time.

greearb commented 4 years ago

I don't have time to debug old forks of OpenWRT, so you can either hope that Turris fixes it for you or try some other AP with better support. If you reproduce ath10k-ct problems on recent OpenWRT, please open bugs.