greearb / ath10k-ct

Stand-alone ath10k driver based on Candela Technologies Linux kernel.

FW crash - QCA 9984 -Netgear R7800 #54

Closed shelterx closed 5 years ago

shelterx commented 5 years ago

Here's a crash I got on the R7800

[ 6206.587463] ath10k_pci 0001:01:00.0: firmware crashed! (guid n/a)
[ 6206.587522] ath10k_pci 0001:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
[ 6206.592551] ath10k_pci 0001:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[ 6206.606636] ath10k_pci 0001:01:00.0: firmware ver 10.4b-ct-9984-fW-012-81e1edd54 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT crc32 0391c067
[ 6206.614334] ath10k_pci 0001:01:00.0: board_file api 2 bmi_id 0:2 crc32 cf58c3bc
[ 6206.635504] ath10k_pci 0001:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 32 raw 0 hwcrypto 1
[ 6206.644591] ath10k_pci 0001:01:00.0: firmware register dump:
[ 6206.652628] ath10k_pci 0001:01:00.0: [00]: 0x0000000A 0x000015B3 0x0099BA07 0x00975B31
[ 6206.658587] ath10k_pci 0001:01:00.0: [04]: 0x0099BA07 0x00060930 0x00000005 0x00000007
[ 6206.666303] ath10k_pci 0001:01:00.0: [08]: 0x00430738 0x004471DC 0x00000000 0x00447F2C
[ 6206.674207] ath10k_pci 0001:01:00.0: [12]: 0x00000009 0x00000000 0x009C1BCD 0x009C1BD1
[ 6206.682043] ath10k_pci 0001:01:00.0: [16]: 0x0099BA07 0x009606CA 0x009606CA 0x00000000
[ 6206.690016] ath10k_pci 0001:01:00.0: [20]: 0x4099BA07 0x004067CC 0x0044719C 0x00000000
[ 6206.697919] ath10k_pci 0001:01:00.0: [24]: 0x8099E381 0x0040682C 0x0042ED20 0xC099BA07
[ 6206.705826] ath10k_pci 0001:01:00.0: [28]: 0x809972CE 0x0040686C 0x00430738 0x004450AC
[ 6206.713730] ath10k_pci 0001:01:00.0: [32]: 0x809949B2 0x0040689C 0x00000001 0x00430738
[ 6206.721543] ath10k_pci 0001:01:00.0: [36]: 0x8098FC30 0x004068DC 0x0042ED20 0x00000000
[ 6206.729537] ath10k_pci 0001:01:00.0: [40]: 0x80963AD3 0x00406A7C 0x0042ED20 0x0098FC28
[ 6206.737441] ath10k_pci 0001:01:00.0: [44]: 0x80960E80 0x00406A9C 0x0000001F 0x00400000
[ 6206.745323] ath10k_pci 0001:01:00.0: [48]: 0x80960E51 0x00406ACC 0x00400000 0x00000000
[ 6206.753141] ath10k_pci 0001:01:00.0: [52]: 0x80960E9D 0x00406AEC 0x00000000 0x00400600
[ 6206.761123] ath10k_pci 0001:01:00.0: [56]: 0x40960024 0x00406B0C 0x00403D08 0x00403D08
[ 6206.769024] ath10k_pci 0001:01:00.0: Copy Engine register dump:
[ 6206.776940] ath10k_pci 0001:01:00.0: [00]: 0x0004a000  13  13   3   3
[ 6206.782661] ath10k_pci 0001:01:00.0: [01]: 0x0004a400  17  17 407 408
[ 6206.789337] ath10k_pci 0001:01:00.0: [02]: 0x0004a800  20  20  83  84
[ 6206.795773] ath10k_pci 0001:01:00.0: [03]: 0x0004ac00  19  19  21  19
[ 6206.802103] ath10k_pci 0001:01:00.0: [04]: 0x0004b000 575 575  36 252
[ 6206.808613] ath10k_pci 0001:01:00.0: [05]: 0x0004b400  11  11 105 107
[ 6206.815016] ath10k_pci 0001:01:00.0: [06]: 0x0004b800  14  14  14  14
[ 6206.821374] ath10k_pci 0001:01:00.0: [07]: 0x0004bc00   1   1   1   1
[ 6206.827887] ath10k_pci 0001:01:00.0: [08]: 0x0004c000   0   0 127   0
[ 6206.834291] ath10k_pci 0001:01:00.0: [09]: 0x0004c400   0   0   0   0
[ 6206.840644] ath10k_pci 0001:01:00.0: [10]: 0x0004c800   0   0   0   0
[ 6206.847162] ath10k_pci 0001:01:00.0: [11]: 0x0004cc00   0   0   0   0
[ 6206.855525] ath10k_pci 0001:01:00.0: debug log header, dbuf: 0x423818  dropped: 0
[ 6206.860934] ath10k_pci 0001:01:00.0: [0] next: 0x423800 buf: 0x419610 sz: 1500 len: 368 count: 14 free: 0
[ 6206.868488] ath10k_pci 0001:01:00.0: ath10k_pci ATH10K_DBG_BUFFER:
[ 6206.877022] ath10k: [0000]: 00603C21 10005881 00003112 0042FE90 0000000A 00000000 00603C22 10005881
[ 6206.883000] ath10k: [0008]: 00003112 0042FE90 0000000E 00000001 00603E43 1400581D 00000000 00455CD4
[ 6206.892014] ath10k: [0016]: 000F6450 00000006 00000000 00603E43 1000581B 0000F1C5 00000000 00000000
[ 6206.901043] ath10k: [0024]: 000F6450 00603E44 14006402 71103332 532818A0 0000C5F1 0042D80C 00000002
[ 6206.910071] ath10k: [0032]: 00603E44 13FC4C07 211000A1 000009B3 00000009 00447F2C 00603E44 1000587B
[ 6206.919100] ath10k: [0040]: 0042FE90 00455CD4 00000001 0042D80C 00603E45 1400587C 51100001 000F64E8
[ 6206.928129] ath10k: [0048]: 000003FC 00000007 00456004 00603E45 1000587A 0042FE90 00456004 01000000
[ 6206.937156] ath10k: [0056]: 00000002 00603E45 14006403 532818A0 C5F10000 00000002 00000000 00456004
[ 6206.946184] ath10k: [0064]: 00603E58 17FC587D 51100002 000F6450 00000001 00000006 00455CD4 00603E58
[ 6206.955213] ath10k: [0072]: 17FC4C07 711050A2 000009B3 00000000 000009B4 00000039 00603E58 17FC4C07
[ 6206.964241] ath10k: [0080]: 711057A2 000009B3 00000000 000009B4 00000001 00603E58 17FC0001 0099BA07
[ 6206.973193] ath10k: [0088]: 000015B3 000015B3 004066BC 91104569
[ 6206.982296] ath10k_pci 0001:01:00.0: ATH10K_END
[ 6206.989226] ath10k_pci 0001:01:00.0: [1] next: 0x423818 buf: 0x419020 sz: 1500 len: 0 count: 0 free: 0
[ 6206.993133] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[ 6207.013776] ath10k_pci 0001:01:00.0: removing peer, cleanup-all, deleting: peer dcda1a00 vdev: 0 addr: a0:18:28:53:f1:c5
[ 6207.013807] ath10k_pci 0001:01:00.0: removing peer, cleanup-all, deleting: peer db68a200 vdev: 0 addr: 7c:1c:4e:96:27:bf
[ 6207.023782] ath10k_pci 0001:01:00.0: removing peer, cleanup-all, deleting: peer d951f800 vdev: 0 addr: b8:27:eb:f5:57:d1
[ 6207.034718] ath10k_pci 0001:01:00.0: removing peer, cleanup-all, deleting: peer dbbbac00 vdev: 0 addr: a0:40:a0:7c:c5:84
[ 6207.127506] ieee80211 phy1: Hardware restart was requested
[ 6207.387354] ath10k_pci 0001:01:00.0: Invalid state: 3 in ath10k_htt_tx_32, warning will not be repeated.
[ 6207.387423] ------------[ cut here ]------------
[ 6207.396018] WARNING: CPU: 1 PID: 0 at /home/ftp/Archive_2/.build/openwrt-basedir/master/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x/ath10k-ct-2018-12-20-118e16da/ath10k-4.19/htt_tx.c:1250 ath10k_htt_tx_32+0xf0/0x9e0 [ath10k_core]
[ 6207.400592] Modules linked in: ath10k_pci ath10k_core ath mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_esp xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_bpf xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY ts_kmp ts_fsm ts_bm nf_reject_ipv4 nf_nat_rtsp nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv4 nf_conntrack_rtsp nf_conntrack_rtcache nf_conntrack_netlink iptable_raw iptable_mangle iptable_filter ipt_ah ipt_ECN ip_tables crc_ccitt compat chaoskey fuse sch_cake act_skbedit act_mirred em_u32 cls_u32 cls_tcindex
[ 6207.471238]  cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress ledtrig_usbport xt_set ip_set_list_set ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6t_NPT ip6t_MASQUERADE nf_nat_masquerade_ipv6 ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 nf_nat nf_conntrack ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables msdos ip_gre gre ifb sit tunnel4 ip_tunnel tun vfat fat cifs nls_utf8 nls_iso8859_15 nls_iso8859_1 nls_cp850 nls_cp437 nls_cp1250 sha1_generic md5 md4 usb_storage leds_gpio xhci_plat_hcd
[ 6207.542790]  xhci_pci xhci_hcd dwc3 dwc3_of_simple ohci_platform ohci_hcd phy_qcom_dwc3 ahci ehci_platform sd_mod ahci_platform libahci_platform libahci libata scsi_mod ehci_hcd gpio_button_hotplug ext4 jbd2 mbcache exfat crc32c_generic
[ 6207.564959] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.90 #0
[ 6207.585836] Hardware name: Generic DT based system
[ 6207.591862] [<c030f58c>] (unwind_backtrace) from [<c030b7a4>] (show_stack+0x14/0x20)
[ 6207.596458] [<c030b7a4>] (show_stack) from [<c078f818>] (dump_stack+0x88/0x9c)
[ 6207.604352] [<c078f818>] (dump_stack) from [<c0318ec4>] (__warn+0xf0/0x11c)
[ 6207.611375] [<c0318ec4>] (__warn) from [<c0318fb0>] (warn_slowpath_null+0x20/0x28)
[ 6207.618292] [<c0318fb0>] (warn_slowpath_null) from [<bf7d8b98>] (ath10k_htt_tx_32+0xf0/0x9e0 [ath10k_core])
[ 6207.626063] [<bf7d8b98>] (ath10k_htt_tx_32 [ath10k_core]) from [<bf7bd0f8>] (ath10k_mac_op_set_bitrate_mask+0xc40/0xdac [ath10k_core])
[ 6207.635618] [<bf7bd0f8>] (ath10k_mac_op_set_bitrate_mask [ath10k_core]) from [<bf7c2f24>] (ath10k_mac_tx_push_txq+0x234/0x290 [ath10k_core])
[ 6207.647774] [<bf7c2f24>] (ath10k_mac_tx_push_txq [ath10k_core]) from [<bf7c31b4>] (ath10k_mac_op_wake_tx_queue+0x88/0x12c [ath10k_core])
[ 6207.660601] [<bf7c31b4>] (ath10k_mac_op_wake_tx_queue [ath10k_core]) from [<bf746fa4>] (ieee80211_unreserve_tid+0x658/0x718 [mac80211])
[ 6207.672881] [<bf746fa4>] (ieee80211_unreserve_tid [mac80211]) from [<bf748be0>] (ieee80211_tx_prepare_skb+0x21c/0x264 [mac80211])
[ 6207.684725] [<bf748be0>] (ieee80211_tx_prepare_skb [mac80211]) from [<bf748d34>] (ieee80211_xmit+0x10c/0x124 [mac80211])
[ 6207.696528] [<bf748d34>] (ieee80211_xmit [mac80211]) from [<bf749e34>] (__ieee80211_subif_start_xmit+0x8c8/0x978 [mac80211])
[ 6207.707466] [<bf749e34>] (__ieee80211_subif_start_xmit [mac80211]) from [<bf74a1d4>] (ieee80211_subif_start_xmit+0x2f0/0x310 [mac80211])
[ 6207.718597] [<bf74a1d4>] (ieee80211_subif_start_xmit [mac80211]) from [<c068d06c>] (dev_hard_start_xmit+0xc8/0x154)
[ 6207.730743] [<c068d06c>] (dev_hard_start_xmit) from [<c068d838>] (__dev_queue_xmit+0x630/0x7b0)
[ 6207.740892] [<c068d838>] (__dev_queue_xmit) from [<c076ef04>] (br_dev_queue_push_xmit+0x118/0x13c)
[ 6207.749567] [<c076ef04>] (br_dev_queue_push_xmit) from [<c076f04c>] (deliver_clone+0x54/0x68)
[ 6207.758596] [<c076f04c>] (deliver_clone) from [<c0770c30>] (br_handle_frame_finish+0x50c/0x55c)
[ 6207.767189] [<c0770c30>] (br_handle_frame_finish) from [<c0770eec>] (br_handle_frame+0x26c/0x2b4)
[ 6207.775699] [<c0770eec>] (br_handle_frame) from [<c06888a0>] (__netif_receive_skb_core+0x71c/0xbdc)
[ 6207.784729] [<c06888a0>] (__netif_receive_skb_core) from [<c068aa24>] (process_backlog+0xb0/0x164)
[ 6207.793587] [<c068aa24>] (process_backlog) from [<c068e620>] (net_rx_action+0x144/0x31c)
[ 6207.802610] [<c068e620>] (net_rx_action) from [<c03015c8>] (__do_softirq+0xf0/0x264)
[ 6207.810861] [<c03015c8>] (__do_softirq) from [<c031d284>] (irq_exit+0xdc/0x148)
[ 6207.818581] [<c031d284>] (irq_exit) from [<c030e7c0>] (handle_IPI+0xb4/0x1a8)
[ 6207.825607] [<c030e7c0>] (handle_IPI) from [<c03014b8>] (gic_handle_irq+0x9c/0xb8)
[ 6207.832897] [<c03014b8>] (gic_handle_irq) from [<c030c38c>] (__irq_svc+0x6c/0x90)
[ 6207.840354] Exception stack(0xdd461f80 to 0xdd461fc8)
[ 6207.847933] 1f80: 00000001 00000000 00000000 c0315300 ffffe000 c0b03c74 c0b03c28 00000000
[ 6207.852971] 1fa0: 00000000 512f04d0 00000000 00000000 dd461fc8 dd461fd0 c030884c c0308850
[ 6207.861108] 1fc0: 60000013 ffffffff
[ 6207.869269] [<c030c38c>] (__irq_svc) from [<c0308850>] (arch_cpu_idle+0x38/0x44)
[ 6207.872576] [<c0308850>] (arch_cpu_idle) from [<c034fd58>] (do_idle+0xe8/0x1bc)
[ 6207.880210] [<c034fd58>] (do_idle) from [<c03500a0>] (cpu_startup_entry+0x1c/0x20)
[ 6207.887238] [<c03500a0>] (cpu_startup_entry) from [<423017cc>] (0x423017cc)
[ 6207.894935] ---[ end trace cabfc6ffadbdca29 ]---
[ 6207.901744] ath10k_pci 0001:01:00.0: failed to transmit packet, dropping: -19
[ 6207.906609] ath10k_pci 0001:01:00.0: failed to submit frame: -19
[ 6207.913633] ath10k_pci 0001:01:00.0: failed to push frame: -19
[ 6208.392916] ath10k_pci 0001:01:00.0: failed to transmit packet, dropping: -19
[ 6208.392967] ath10k_pci 0001:01:00.0: failed to submit frame: -19
[ 6208.399130] ath10k_pci 0001:01:00.0: failed to push frame: -19
[ 6209.390495] ath10k_pci 0001:01:00.0: failed to transmit packet, dropping: -19
[ 6209.390572] ath10k_pci 0001:01:00.0: failed to submit frame: -19
[ 6209.396695] ath10k_pci 0001:01:00.0: failed to push frame: -19
[ 6210.394639] ath10k_pci 0001:01:00.0: failed to transmit packet, dropping: -19
[ 6210.394711] ath10k_pci 0001:01:00.0: failed to submit frame: -19
[ 6210.400777] ath10k_pci 0001:01:00.0: failed to push frame: -19
[ 6211.392626] ath10k_pci 0001:01:00.0: failed to transmit packet, dropping: -19
[ 6211.392713] ath10k_pci 0001:01:00.0: failed to submit frame: -19
[ 6211.398770] ath10k_pci 0001:01:00.0: failed to push frame: -19
[ 6212.395945] ath10k_pci 0001:01:00.0: failed to transmit packet, dropping: -19
[ 6212.395972] ath10k_pci 0001:01:00.0: failed to submit frame: -19
[ 6212.402117] ath10k_pci 0001:01:00.0: failed to push frame: -19
[ 6213.046556] ath10k_pci 0001:01:00.0: 10.4 wmi init: vdevs: 16  peers: 48  tid: 96
[ 6213.046583] ath10k_pci 0001:01:00.0: msdu-desc: 2500  skid: 32
[ 6213.130448] ath10k_pci 0001:01:00.0: wmi print 'P 48/48 V 16 K 144 PH 176 T 186  msdu-desc: 2500  sw-crypt: 0 ct-sta: 0'
[ 6213.131288] ath10k_pci 0001:01:00.0: wmi print 'free: 87020 iram: 26788 sram: 18240'
[ 6213.394527] ath10k_pci 0001:01:00.0: failed to transmit packet, dropping: -19
[ 6213.394617] ath10k_pci 0001:01:00.0: failed to submit frame: -19
[ 6213.400679] ath10k_pci 0001:01:00.0: failed to push frame: -19
[ 6213.505390] ath10k_pci 0001:01:00.0: Firmware lacks feature flag indicating a retry limit of > 2 is OK, requested limit: 4
[ 6213.803008] ath10k_pci 0001:01:00.0: device successfully recovered
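
The `-19` repeated in the "failed to transmit packet" lines above looks like a negative errno value, which is how the kernel conventionally reports errors. A minimal sketch for decoding it, assuming Linux errno numbering (the helper name `decode_kernel_errno` is made up for illustration):

```python
import errno
import os

def decode_kernel_errno(code):
    """Map a negative kernel return code such as -19 to (symbol, message)."""
    number = abs(code)
    # errno.errorcode maps numeric values to names like 'ENODEV'.
    symbol = errno.errorcode.get(number, "UNKNOWN")
    return symbol, os.strerror(number)

symbol, message = decode_kernel_errno(-19)
print(symbol, "-", message)
```

On Linux, 19 is `ENODEV`, which would be consistent with frames being pushed while the device was down between the crash and the "device successfully recovered" line.
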
greearb commented 5 years ago

This is a crash I have seen before and I added debugging. I think I understand the problem now. For reference, the issue is that the CT firmware cleans up some schedule items on peer deletion, and then later the schedule gets 'completed'. Simplistic 'fifo' sched handling logic caused us to look at the wrong schedule object. I have enabled a search over all existing schedule items in case somehow an item is not handled in fifo manner, and another bit of code that should just re-kick the scheduler and ignore the mis-matched sched-id for the case that you hit. This bug was a regression added in previous attempts to fix some use-after-free bugs in the scheduler code.
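
The failure mode described above can be modelled in a few lines. This is only a toy Python sketch, not the firmware's C code (which is not shown in this thread): the class `SchedTracker`, its methods, and the idea of tracking bare sched ids are all assumptions made for illustration.

```python
from collections import deque

class SchedTracker:
    """Toy model of outstanding schedule items (hypothetical, not firmware code)."""

    def __init__(self):
        self.pending = deque()  # sched ids in submission order
        self.rekicks = 0        # times a stale completion led to a re-kick

    def submit(self, sched_id):
        self.pending.append(sched_id)

    def peer_deleted(self, sched_ids):
        """Peer cleanup removes that peer's items, wherever they sit in the queue."""
        doomed = set(sched_ids)
        self.pending = deque(s for s in self.pending if s not in doomed)

    def complete_fifo(self, sched_id):
        """Naive handling: assume the completion belongs to the oldest item."""
        if not self.pending:
            return False
        head = self.pending.popleft()
        return head == sched_id  # False means the wrong item was consumed

    def complete_search(self, sched_id):
        """Tolerant handling: search all items; on a mismatch, ignore and re-kick."""
        if sched_id in self.pending:
            self.pending.remove(sched_id)
            return True
        self.rekicks += 1  # completion for an item that was already cleaned up
        return False

tracker = SchedTracker()
for sid in (1, 2, 3):
    tracker.submit(sid)
tracker.peer_deleted([1])          # item 1 is cleaned up with its peer
print(tracker.complete_search(1))  # late completion for item 1 arrives
print(list(tracker.pending), tracker.rekicks)
```

With the FIFO variant, the same late completion would pop item 2 and treat it as item 1, which is the "wrong schedule object" case; the search variant leaves items 2 and 3 untouched and only counts a re-kick.
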

Please try the attached firmware for 9984 to see if it works better for you. [deleted, it was invalid, see next comment]

FW stack trace:

0x0099ba07 RAM: tx_pfsched_completion_callback /home/greearb/git/digitalpath/qca-ct-3.5.3.50-9984/wlan/mac_core/src/wal/AR/tx_sched/tx_prefetch_sched.c:1470
0x4099ba07 RAM: tx_pfsched_completion_callback /home/greearb/git/digitalpath/qca-ct-3.5.3.50-9984/wlan/mac_core/src/wal/AR/tx_sched/tx_prefetch_sched.c:1470
0x8099e381 RAM: _tx_sch_sched_cmd_done /home/greearb/git/digitalpath/qca-ct-3.5.3.50-9984/wlan/mac_core/src/wal/AR/tx_sched/tx_sched_wifi_ip02.c:649
0x809972ce RAM: _tx_send_seq_trig_dsr_done /home/greearb/git/digitalpath/qca-ct-3.5.3.50-9984/wlan/mac_core/src/wal/AR/tx/wifi_ip02/ar_wal_tx_seq.c:2052
0x809949b2 RAM: _tx_send_completion_dsr_hdlr /home/greearb/git/digitalpath/qca-ct-3.5.3.50-9984/wlan/mac_core/src/wal/AR/tx/wifi_ip02/ar_wal_tx_send.c:9050
0x8098fc30 RAM: _tx_send_completion_dsr_hdlr_wrapper /home/greearb/git/digitalpath/qca-ct-3.5.3.50-9984/wlan/mac_core/src/wal/AR/tx/wifi_ip02/ar_wal_tx_send.c:1452
0x80963ad3 ROM: cmnos_intr_handle_pending_dsrs /local/mnt/workspace/CRMBuilds/CNSS.BL.3.0-00058-S-1_20150213_182825/b/cnss_proc/wlan/mac_core/src/os/common/cmnos_intrinf.c:335
0x80960e80 ROM: check_idle /local/mnt/workspace/CRMBuilds/CNSS.BL.3.0-00058-S-1_20150213_182825/b/cnss_proc/wlan/mac_core/src/os/athos/athos_main.c:2017
0x80960e51 ROM: athos_main /local/mnt/workspace/CRMBuilds/CNSS.BL.3.0-00058-S-1_20150213_182825/b/cnss_proc/wlan/mac_core/src/os/athos/athos_main.c:1998
0x80960e9d ROM: main /local/mnt/workspace/CRMBuilds/CNSS.BL.3.0-00058-S-1_20150213_182825/b/cnss_proc/wlan/mac_core/src/os/athos/athos_main.c:2051
0x40960024 ROM: _stext /local/mnt/workspace/CRMBuilds/CNSS.BL.3.0-00058-S-1_20150213_182825/b/cnss_proc/wlan/mac_core/src/os/athos/xtos/crt1-tiny.S:90
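
The addresses in this trace also appear as words in the "firmware register dump" section of the original crash log. Both `0x0099ba07` and `0x4099ba07` resolve to the same function and line, which suggests the top two bits are not part of the code address. The Python sketch below relies on that inference only: the mask `0x3FFFFFFF` and the helper names are assumptions, not something the driver or firmware documents in this thread.

```python
import re

ADDR_MASK = 0x3FFFFFFF  # assumption: ignore the top two bits (inferred from the trace)

def register_words(dump_lines):
    """Yield every 32-bit 0x........ word found in register dump rows."""
    for line in dump_lines:
        for word in re.findall(r"0x([0-9A-Fa-f]{8})\b", line):
            yield int(word, 16)

def same_code_address(a, b):
    """True if two words are equal once the top two bits are masked off."""
    return (a & ADDR_MASK) == (b & ADDR_MASK)

# Three rows copied from the register dump in the first crash log.
rows = [
    "[ 6206.652628] ath10k_pci 0001:01:00.0: [00]: 0x0000000A 0x000015B3 0x0099BA07 0x00975B31",
    "[ 6206.690016] ath10k_pci 0001:01:00.0: [20]: 0x4099BA07 0x004067CC 0x0044719C 0x00000000",
    "[ 6206.697919] ath10k_pci 0001:01:00.0: [24]: 0x8099E381 0x0040682C 0x0042ED20 0xC099BA07",
]
top_frame = 0x0099BA07  # first address in the stack trace
aliases = [hex(w) for w in register_words(rows)
           if w != top_frame and same_code_address(w, top_frame)]
print(aliases)
```

This kind of grouping makes it easier to spot which dump words are likely return addresses before feeding them to a symbol lookup.
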

greearb commented 5 years ago

Sorry, previous binary attachment was not correct, please test this one instead. firmware-5-full-community.bin.gz

shelterx commented 5 years ago

Will try and report back. (It doesn't happen very often and seems to be more frequent depending on what I stream so it might take 2-3 days).

shelterx commented 5 years ago

No good at all. This FW crashes constantly, see provided file for more log output. kernellog.txt

[ 1435.435997] ath10k_pci 0001:01:00.0: firmware crashed! (guid n/a)
[ 1435.436096] ath10k_pci 0001:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
[ 1435.441165] ath10k_pci 0001:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[ 1435.455569] ath10k_pci 0001:01:00.0: firmware ver 10.4b-ct-9984-fW-012-bb3d19701 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT crc32 1279b325
[ 1435.462901] ath10k_pci 0001:01:00.0: board_file api 2 bmi_id 0:2 crc32 cf58c3bc
[ 1435.483979] ath10k_pci 0001:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 32 raw 0 hwcrypto 1
[ 1435.493129] ath10k_pci 0001:01:00.0: firmware register dump:
[ 1435.501159] ath10k_pci 0001:01:00.0: [00]: 0x0000000A 0x00000000 0x0099B9FE 0x00000000
[ 1435.507062] ath10k_pci 0001:01:00.0: [04]: 0x00000000 0x00060024 0x00000000 0x00000000
[ 1435.514787] ath10k_pci 0001:01:00.0: [08]: 0x00000000 0x00000000 0x00000000 0x00000000
[ 1435.522688] ath10k_pci 0001:01:00.0: [12]: 0x00000000 0x00000000 0x00000000 0x00000000
[ 1435.530588] ath10k_pci 0001:01:00.0: [16]: 0x00985E47 0x009606CA 0x009606CA 0x0099B9FE
[ 1435.538487] ath10k_pci 0001:01:00.0: [20]: 0x00000000 0x00401C10 0x00000000 0x00000000
[ 1435.546384] ath10k_pci 0001:01:00.0: [24]: 0x00000000 0x00000000 0x00000000 0x00000000
[ 1435.554284] ath10k_pci 0001:01:00.0: [28]: 0x00000000 0x00000000 0x00000000 0x00000000
[ 1435.562184] ath10k_pci 0001:01:00.0: [32]: 0x00000000 0x00000000 0x00000000 0x00000000
[ 1435.570085] ath10k_pci 0001:01:00.0: [36]: 0x00000000 0x00000000 0x00000000 0x00000000
[ 1435.577995] ath10k_pci 0001:01:00.0: [40]: 0x00000000 0x00000000 0x00000000 0x00000000
[ 1435.585883] ath10k_pci 0001:01:00.0: [44]: 0x00000000 0x00000000 0x00000000 0x00000000
[ 1435.593782] ath10k_pci 0001:01:00.0: [48]: 0x00000000 0x00000000 0x00000000 0x00000000
[ 1435.601679] ath10k_pci 0001:01:00.0: [52]: 0x00000000 0x00000000 0x00000000 0x00000000
[ 1435.609580] ath10k_pci 0001:01:00.0: [56]: 0x00000000 0x00000000 0x00000000 0x00000000
[ 1435.617478] ath10k_pci 0001:01:00.0: Copy Engine register dump:
[ 1435.625387] ath10k_pci 0001:01:00.0: [00]: 0x0004a000  11  11   3   3
[ 1435.631200] ath10k_pci 0001:01:00.0: [01]: 0x0004a400  31  31 421 422
[ 1435.637814] ath10k_pci 0001:01:00.0: [02]: 0x0004a800  62  62  61  62
[ 1435.644221] ath10k_pci 0001:01:00.0: [03]: 0x0004ac00   0   0   2   0
[ 1435.650643] ath10k_pci 0001:01:00.0: [04]: 0x0004b000 453 453  40   0
[ 1435.657066] ath10k_pci 0001:01:00.0: [05]: 0x0004b400  23  23 118 119
[ 1435.663490] ath10k_pci 0001:01:00.0: [06]: 0x0004b800  22  22  22  22
[ 1435.669913] ath10k_pci 0001:01:00.0: [07]: 0x0004bc00   1   1   1   1
[ 1435.676337] ath10k_pci 0001:01:00.0: [08]: 0x0004c000   0   0 127   0
[ 1435.682761] ath10k_pci 0001:01:00.0: [09]: 0x0004c400   0   0   0   0
[ 1435.689184] ath10k_pci 0001:01:00.0: [10]: 0x0004c800   0   0   0   0
[ 1435.695607] ath10k_pci 0001:01:00.0: [11]: 0x0004cc00   0   0   0   0
[ 1435.704055] ath10k_pci 0001:01:00.0: debug log header, dbuf: 0x422fb8 dropped: 0
[ 1435.709472] ath10k_pci 0001:01:00.0: [0] next: 0x422fa0 buf: 0x4195d0 sz: 1500 len: 28 count: 1 free: 0

greearb commented 5 years ago

Sorry about that, I had a logic flaw in the last patch. Please try this one instead. And, please run with debug-level of 0xc0000020 and send me 'dmesg' output after the system has been running for a bit even if it doesn't crash or have obvious issues.
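
The debug-level value `0xc0000020` is a bitmask. The thread does not say what each bit enables, so the Python sketch below only lists which bit positions are set and makes no claim about their meaning (the helper name `set_bits` is made up for illustration).

```python
def set_bits(mask):
    """Return the positions of the bits set in mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

print(set_bits(0xC0000020))
```
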

firmware-5-full-community.bin.gz

shelterx commented 5 years ago

Wifi went dead with that firmware, devices got connected but no internet, couldn't ping them either. dmesg.txt

greearb commented 5 years ago

On 1/3/19 3:01 PM, shelterx wrote:

Wifi went dead with that firmware, devices got connected but no internet, couldn't ping them either. dmesg.txt https://github.com/greearb/ath10k-ct/files/2725550/dmesg.txt

Seems some sort of bad interaction with powersave. Can you get another log sooner after startup where dmesg still shows at least some of the initial bootup text? I am hoping to better understand how it gets to this broken state.

Thanks, Ben


greearb commented 5 years ago

Here is another image. It will likely assert early in your test case, but hopefully the resulting logs will let me better understand the problem. Can you also let me know the device(s) that connect to your AP? Maybe we can reproduce the issue locally.

firmware-5-full-community.bin.gz

shelterx commented 5 years ago

The dmesg buffer fills up so quickly. I tried to pipe it to a file, but the file ended up empty. Here's some debug info together with a crash, though. Connected devices are usually an iPhone 8 Plus, Apple TV 4K, Chromecast Ultra, iPad Air and Raspberry Pi 2. dmesg.txt

shelterx commented 5 years ago

Here's another log right after start, no crash but no working wifi. First part is from logread, it's continued in the 2_dmesg.txt file. 1_logread.txt.txt 2_dmesg.txt

greearb commented 5 years ago

I backed out part of the code that originally triggered these issues. This probably means there is still a use-after-free bug in the code, but it is probably quite rare, and maybe I can find some other way to work around it that does not have the tx-stall and related issues. Please try the attached firmware: [edit, snip] Here is a proper image; the previous one was missing the intended change.

firmware-5-full-community.bin.gz

shelterx commented 5 years ago

Wifi is dead with that image. Connects but nothing works.

greearb commented 5 years ago

Please post dmesg so I can double-check that it is the expected version, etc. I'll go back and back out more of the previous troublesome commit later today.
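
Checking the version by hand means finding the `firmware ver` line in dmesg. A minimal Python sketch of that check, assuming the line keeps the layout seen in the logs in this thread (the regular expression and the function name `firmware_versions` are illustrative, not part of any ath10k tooling):

```python
import re

# Matches lines such as: "... firmware ver <version> api <n> features ... crc32 <hex>"
FW_LINE = re.compile(
    r"firmware ver (?P<ver>\S+) api (?P<api>\d+) .*crc32 (?P<crc>[0-9a-f]{8})"
)

def firmware_versions(dmesg_text):
    """Return (version, api, crc32) for each 'firmware ver' line found."""
    return [(m["ver"], int(m["api"]), m["crc"]) for m in FW_LINE.finditer(dmesg_text)]

# Line copied from the first crash log in this issue.
sample = (
    "[ 6206.606636] ath10k_pci 0001:01:00.0: firmware ver 10.4b-ct-9984-fW-012-81e1edd54 "
    "api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,"
    "txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,"
    "cust-stats-CT,txrate2-CT crc32 0391c067"
)
print(firmware_versions(sample))
```

The trailing hash in the version string (`81e1edd54`, `bb3d19701`, `51585cf99` in the logs here) differs between the test images, so comparing it is a quick way to confirm which image was actually loaded.
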

shelterx commented 5 years ago

Can't test right now, but it's the version you posted above. I'm 99% sure of it.

shelterx commented 5 years ago

Here it is. dmesg-2019-01-08.txt

greearb commented 5 years ago

I found yet another logic bug in the code in question. I am going to try to fix that and test with a co-worker's iPhone to see if we can verify at least basic functionality. Hopefully I will have something worth testing tomorrow.

greearb commented 5 years ago

Ok, here is another attempt. It works with my Android phone, at least. firmware-5-full-community.bin.gz

shelterx commented 5 years ago

Nope, no go with firmware ver 10.4b-ct-9984-fW-012-51585cf99 api 5. Devices show as connected both in OpenWrt and on the device itself. The iPad Air loaded a page in Safari, then every connection went dead and it couldn't connect anywhere. I also noticed that the Apple TV connects at lower rates than with the official firmware-5.bin_10.4-3.9.0.1-00008. The official firmware is actually flawless for me.

greearb commented 5 years ago

Sorry, I wish I could reproduce it. Here is another build...this disables the 'reorder' logic in the sched callback....maybe that was the problem. firmware-5-full-community.bin.gz

Lu-Fi commented 5 years ago

@greearb I also had daily crashes on my Archer C5 when it had a lot of traffic and many clients. I have now installed a backup router (Archer C7) with the latest stable OpenWrt and moved all clients there. With only 1 client using the C5 on the current Git build, the router has not crashed in the last 5 days. Normally there are ~20 systems (4 PCs, 16 IoT devices) connected.

shelterx commented 5 years ago

Disabling the reorder logic seems to have fixed it. Tested with Apple TV, iPhone, iPad, LG webOS TV and Raspberry Pi. Now we'll have to wait and see how it does in the long run. I hope you can get the real cause of the bug fixed, though.

greearb commented 5 years ago

The upstream code completely ignored the reordering. That seemed wrong to me, but maybe it works well enough anyway. Possibly I only need to pay attention to reordering in a few specific cases. Let me know if you see more crashes or problems.

shelterx commented 5 years ago

Yes, I ran the AppleTV for a while today which caused issues before. I also played around with the iPad, no crashes or issues yet.

shelterx commented 5 years ago

I'd say it works now. I haven't seen the CT firmware this stable on my R7800 before. All connected devices' TX/RX rates look normal too. I haven't done any benchmarking, so I can't really say anything about that, but normal internet bandwidth measurements are all good.

shelterx commented 5 years ago

Just crashed again. dmesg-2019-01-13.txt

shelterx commented 5 years ago

Again dmesg-2019-01-13-2.txt

greearb commented 5 years ago

Those crashes are the same as for bug 58 it seems. Please try this image, it has more debugging to help track down this issue. firmware-5-full-community.bin.gz

shelterx commented 5 years ago

No crash yet with the above image, oddly enough.

greearb commented 5 years ago

Closing this bug, will track the rate-ctrl crash in bug 58.