greearb / ath10k-ct

Stand-alone ath10k driver based on Candela Technologies Linux kernel.
111 stars 40 forks source link

skb_over_panic kernel panic #83

Open hauke opened 5 years ago

hauke commented 5 years ago

I was running into a kernel panic in ath10k-ct 4.19 driver with OpenWrt master and hostapd 2.8.

The client (Intel 6300 running Linux 4.9) was connected to the ath9k card and did a scan when his happened., no client was connected to the ath10k card.

I used today's OpenWrt master + and updated hostapd to version 2.8 I used the sae-mixed mode and I was not able to reproduce this problem.

The following ath10k-ct FW was used: [ 15.324289] ath10k_pci 0000:02:00.0: firmware ver 10.1-ct-8x-__fW-022-883e26a8 api 2 features wmi-10.x,has-wmi-mgmt-tx,mfp,txstatus-noack,wmi-10.x-CT,ratemask-CT,txrate-CT,get-temp-CT,tx-rc-CT,cust-stats-CT,retry-gt2-CT,txrate2-CT,beacon-cb-CT crc32 7a67ce60

Sat May  4 13:43:31 2019 kern.info kernel: [ 1321.924448] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
Sat May  4 13:43:31 2019 kern.info kernel: [ 1321.930185] br-lan: port 2(wlan0) entered blocking state
Sat May  4 13:43:31 2019 kern.info kernel: [ 1321.934823] br-lan: port 2(wlan0) entered forwarding state
Sat May  4 13:43:31 2019 daemon.notice hostapd: wlan0: interface state HT_SCAN->ENABLED
Sat May  4 13:43:31 2019 daemon.notice hostapd: wlan0: AP-ENABLED
Sat May  4 13:43:31 2019 daemon.notice netifd: Network device 'wlan1' link is up
Sat May  4 13:43:31 2019 daemon.notice netifd: Network device 'wlan0' link is up
Sat May  4 13:43:39 2019 daemon.info hostapd: wlan0: STA 00:24:d7:5f:d4:e4 IEEE 802.11: authenticated
Sat May  4 13:43:39 2019 daemon.info hostapd: wlan0: STA 00:24:d7:5f:d4:e4 IEEE 802.11: associated (aid 1)
Sat May  4 13:43:39 2019 daemon.notice hostapd: wlan0: AP-STA-CONNECTED 00:24:d7:5f:d4:e4
Sat May  4 13:43:39 2019 daemon.info hostapd: wlan0: STA 00:24:d7:5f:d4:e4 RADIUS: starting accounting session B790DB21F2FA74EC
Sat May  4 13:43:39 2019 daemon.info hostapd: wlan0: STA 00:24:d7:5f:d4:e4 WPA: pairwise key handshake completed (RSN)
Sat May  4 13:43:57 2019 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED 00:24:d7:5f:d4:e4
Sat May  4 13:44:01 2019 daemon.info hostapd: wlan1: STA 00:24:d7:5f:d4:e4 IEEE 802.11: authenticated
Sat May  4 13:44:01 2019 daemon.info hostapd: wlan1: STA 00:24:d7:5f:d4:e4 IEEE 802.11: associated (aid 1)
Sat May  4 13:44:01 2019 daemon.notice hostapd: wlan1: AP-STA-CONNECTED 00:24:d7:5f:d4:e4
Sat May  4 13:44:01 2019 daemon.info hostapd: wlan1: STA 00:24:d7:5f:d4:e4 RADIUS: starting accounting session E0AF7FFDC8A32053
Sat May  4 13:44:01 2019 daemon.info hostapd: wlan1: STA 00:24:d7:5f:d4:e4 WPA: pairwise key handshake completed (RSN)
Sat May  4 13:44:01 2019 daemon.info dnsmasq-dhcp[1937]: DHCPREQUEST(br-lan) 192.168.1.222 00:24:d7:5f:d4:e4
Sat May  4 13:44:01 2019 daemon.info dnsmasq-dhcp[1937]: DHCPACK(br-lan) 192.168.1.222 00:24:d7:5f:d4:e4 hauke-x201s
[ 1398.148330] skbuff: skb_over_panic: text:86820d28 len:360 put:360 head:  (null) data:  (null) tail:0x168 end:0x0 dev:<NULL>
[ 1398.158120] Kernel bug detected[#1]:
[ 1398.161599] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.114 #0
[ 1398.167687] task: 806427c0 task.stack: 80630000
[ 1398.172202] $ 0   : 00000000 00000001 0000006f 00000000
[ 1398.177424] $ 4   : 8110e2fc 8110e2fc 81112f38 8063fcc0
[ 1398.182646] $ 8   : 00010000 00000000 00000007 00000000
[ 1398.187868] $12   : 00000177 50800000 00000176 00000000
[ 1398.193090] $16   : 00000000 86e724f0 8655701c 00000408
[ 1398.198313] $20   : 0000012c 87c0dda4 00000000 00000000
[ 1398.203536] $24   : 00000002 00000000                  
[ 1398.208758] $28   : 80630000 87c0dd20 00000780 8038bed4
[ 1398.213982] Hi    : 0000000e
[ 1398.216855] Lo    : d47ae183
[ 1398.219752] epc   : 8038bed4 skb_panic+0x58/0x5c
[ 1398.224352] ra    : 8038bed4 skb_panic+0x58/0x5c
[ 1398.228951] Status: 1100ff03 KERNEL EXL IE 
[ 1398.233132] Cause : 10800024 (ExcCode 09)
[ 1398.237137] PrId  : 00019556 (MIPS 34Kc)
[ 1398.241049] Modules linked in: ath9k ath9k_common ath9k_hw ath10k_pci ath10k_core ath pppoe nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD pppox ppp_async owl_loader nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack ltq_deu_vr9 iptable_mangle iptable_filter ip_tables crc_ccitt compat drv_dsl_cpe_api drv_mei_cpe nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 pppoatm ppp_generic slhc br2684 atm drv_ifxos dwc2 gpio_button_hotplug
[ 1398.311301] Process swapper/0 (pid: 0, threadinfo=80630000, task=806427c0, tls=00000000)
[ 1398.319385] Stack : 86453560 8054b7e0 86820d28 00000168 00000168 00000000 00000000 00000168
[ 1398.327740]         00000000 805fe4e4 87c18600 8038bf64 00000010 00000010 00000000 00000000
[ 1398.336096]         81113674 86820d28 86f40000 86843580 00008000 ffffffac 00000024 80640000
[ 1398.344452]         81113674 86e78598 80640000 86e715a0 86e7259c 80640000 86e78598 00000000
[ 1398.352808]         00000004 8655701c 8655701c 00000001 86e715a0 86e725e4 84ba0248 865cd780
[ 1398.361163]         ...
[ 1398.363600] Call Trace:
[ 1398.366055] [<8038bed4>] skb_panic+0x58/0x5c
[ 1398.370326] [<8038bf64>] skb_put+0x48/0x54
[ 1398.374480] [<86820d28>] ath10k_htt_t2h_msg_handler+0x2438/0x2bcc [ath10k_core]
[ 1398.381762] Code: 00602825  0c02045c  2484fa78 <000c000d> 8c8200a4  8c880054  8c8700a0  00451023  01054021 
[ 1398.391455] 
[ 1398.393020] ---[ end trace 65975cbfff07e903 ]---

It looks like the skb returned by ath10k_htt_rx_netbuf_pop() is empty while we expect it to have at least 360 bytes of free space.

(gdb) list *(ath10k_htt_t2h_msg_handler+0x2438)
0x20d58 is in ath10k_htt_rx_handle_amsdu (/home/hauke/openwrt/openwrt/build_dir/target-mips_24kc_musl/linux-lantiq_xrx200/ath10k-ct-2019-04-08-9cd701a4/ath10k-4.19/htt_rx.c:374).
369 
370         /* FIXME: we must report msdu payload since this is what caller
371          * expects now
372          */
373         skb_put(msdu, offsetof(struct htt_rx_desc, msdu_payload));
374         skb_pull(msdu, offsetof(struct htt_rx_desc, msdu_payload));
375 
376         /*
377          * Sanity check - confirm the HW is finished filling in the
378          * rx data.
(gdb) 

This is my hostapd configuration:

driver=nl80211
logger_syslog=127
logger_syslog_level=2
logger_stdout=127
logger_stdout_level=2
hw_mode=a
beacon_int=100
channel=36

ieee80211n=1
ht_coex=0
ht_capab=[HT40+][LDPC][SHORT-GI-20][SHORT-GI-40][TX-STBC][RX-STBC1][MAX-AMSDU-7935][DSSS_CCK-40]
vht_oper_chwidth=1
vht_oper_centr_freq_seg0_idx=42
ieee80211ac=1
vht_capab=[RXLDPC][SHORT-GI-80][TX-STBC-2BY1][RX-ANTENNA-PATTERN][TX-ANTENNA-PATTERN][RX-STBC-1][MAX-MPDU-11454][MAX-A-MPDU-LEN-EXP7]

interface=wlan0
ctrl_interface=/var/run/hostapd
ap_isolate=1
bss_load_update_period=60
chan_util_avg_period=600
disassoc_low_ack=1
preamble=1
wmm_enabled=1
ignore_broadcast_ssid=0
uapsd_advertisement_enabled=1
utf8_ssid=1
multi_ap=0
sae_require_mfp=1
wpa_passphrase=XXXXXXXX
auth_algs=1
wpa=2
wpa_pairwise=CCMP
ssid=OpenWrt5G
bridge=br-lan
wpa_disable_eapol_key_retries=0
wpa_key_mgmt=WPA-PSK WPA-PSK-SHA256 SAE
okc=0
disable_pmksa_caching=1
ieee80211w=1
group_mgmt_cipher=AES-128-CMAC
bssid=00:37:b7:36:dc:0d
ynezz commented 5 years ago

@hauke I'm probably seeing the same issue and Ben has suggested to add this patch to aid him with debugging:

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 108da3ceeff6..9ad21f3bb1d5 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -328,6 +328,12 @@ static inline struct sk_buff *ath10k_htt_rx_netbuf_pop(struct ath10k_htt *htt)
         htt->rx_ring.netbufs_ring[idx] = NULL;
         ath10k_htt_reset_paddrs_ring(htt, idx);

+       if (!msdu->data) {
+               ath10k_err(ar, "htt-rx-netbuf-pop, msdu: %p has null data.  idx: %i  ring-size: %i  fill-count: %i",
+                          msdu, idx, htt->rx_ring.size_mask, htt->rx_ring.fill_cnt);
+               BUG_ON(1);
+       }
+
         idx++;
         idx &= htt->rx_ring.size_mask;
         htt->rx_ring.sw_rd_idx.msdu_payld = idx;

I've recompiled the kernel with this patch, but wasn't able to trigger it anymore so far.