greearb / ath10k-ct

Stand-alone ath10k driver based on Candela Technologies Linux kernel.

WDS (4addr) doesn't work (wlanX.staY interface doesn't get created) #67

Closed enryIT closed 5 years ago

enryIT commented 5 years ago

Possibly related to issue #13

Description of the problem (how to configure, how to reproduce, how often it happens). Set up WDS (4addr) AP. When a station connects, the wlanX.staY interface is neither created nor added to the bridge specified in the hostapd configuration.

Software (OS, Firmware version, kernel, driver, etc)

Hardware (NIC chipset, platform, etc) Netgear R7800 - IPQ8065 SoC with QCA9984

Logs (dmesg, maybe supplicant and/or hostap) Nothing useful in syslog or dmesg.

Tomorrow I'll try with atheros debug on and with OWRT 18.0.6 stable with ct driver and firmware.

greearb commented 5 years ago

Can you see if ct driver and stock firmware works? I'm curious to know if bug is in the firmware or driver.

enryIT commented 5 years ago

OWRT 18.0.6 with the ath10k-ct driver and firmware 10.4-3.9.0.2-00021 works.
With OWRT 18.0.6, the ath10k-ct driver and firmware 10.4b-ct-9984-fW-012-ddc348c02, the wlanX.staY interface gets created, but then the firmware crashes.

This, I think, is the main error:

Wed Feb 6 21:34:20 2019 kern.info kernel: [ 137.871183] wlan0.sta1: HW problem - can not stop rx aggregation for 30:b5:c2:08:b9:e0 tid 0

Some other errors:

daemon.err hostapd: Failed to set beacon parameters
ath10k_pci 0000:01:00.0: failed to send pdev bss chan info request

Attached full log + crash dump. log_and_dump.zip

greearb commented 5 years ago

Please see if this is a regression in my firmware and/or driver by checking if QCA firmware has the same issue.

enryIT commented 5 years ago

OWRT 18.0.6, with ath10k-ct driver and stock firmware 10.4-3.9.0.2-00021 works.

greearb commented 5 years ago

OK, then please try to bisect to find out where I introduced the bug:

A tarball of images to bisect is here: http://www.candelatech.com/downloads/ath10k-9984-10-4b/ath10k-fw-beta/all_builds-9984b-H-feb-6-2019.tar.gz

How to bisect is described at bottom of this page: http://www.candelatech.com/ath10k-bugs.php
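For readers unfamiliar with bisecting a series of numbered firmware builds, the idea can be sketched as a binary search. This is only an illustration, not the procedure from the linked page: the function name `first_bad_build` and the `is_bad` callback are hypothetical, and the sketch assumes the builds are ordered oldest to newest, the oldest is good, the newest is bad, and the bug stays present once introduced.

```python
from typing import Callable, Sequence


def first_bad_build(builds: Sequence[int], is_bad: Callable[[int], bool]) -> int:
    """Return the first build for which is_bad() is true.

    Assumes builds[0] is known good and builds[-1] is known bad.
    """
    lo, hi = 0, len(builds) - 1  # builds[lo] is good, builds[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return builds[hi]


if __name__ == "__main__":
    # Simulated run: pretend the regression appeared in build 40 of 100.
    # In practice is_bad() would mean "flash this image and test WDS by hand".
    tested = []

    def simulated_is_bad(build: int) -> bool:
        tested.append(build)
        return build >= 40

    print(first_bad_build(list(range(1, 101)), simulated_is_bad))
    print("builds tested:", len(tested))
```

Each step halves the remaining range, so even a tarball with on the order of a hundred images needs only a handful of flash-and-test cycles.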

enryIT commented 5 years ago

Found it: the last partially working commit is number 61, firmware-version: 10.4b-ct-9984-fW-003-b1ef15e9b

61 can create the wlan0.sta1 interface but it crashes after some minutes. 62 can't create the interface at all.

Commit 50 works almost perfectly, but it keeps spamming the syslog with ath10k_pci 0000:01:00.0: Invalid VHT mcs 15 peer stats and it still crashes, though less often than 61; I would say it crashes under heavy load.

Commit 50 crash log with a clear culprit: cfg80211_calculate_bitrate

Fri Feb 8 23:54:57 2019 kern.warn kernel: [ 421.751204] ------------[ cut here ]------------
Fri Feb 8 23:54:57 2019 kern.warn kernel: [ 421.751336] WARNING: CPU: 0 PID: 9887 at backports-2017-11-01/net/wireless/util.c:1144 cfg80211_calculate_bitrate+0x1c8/0x234 [cfg80211]
Fri Feb 8 23:54:57 2019 kern.warn kernel: [ 421.754912] invalid rate bw=2, mcs=5, nss=3
[...omitting linked modules...]
Fri Feb 8 23:54:57 2019 kern.warn kernel: [ 422.056073] CPU: 0 PID: 9887 Comm: hostapd Not tainted 4.14.97 #0
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.070522] Hardware name: Generic DT based system
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.076640] [] (unwind_backtrace) from [] (show_stack+0x14/0x20)
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.081315] [] (show_stack) from [] (dump_stack+0x88/0x9c)
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.089213] [] (dump_stack) from [] (warn+0xf0/0x11c)
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.096234] [] (warn) from [] (warn_slowpath_fmt+0x34/0x4c)
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.103149] [] (warn_slowpath_fmt) from [] (cfg80211_calculate_bitrate+0x1c8/0x234 [cfg80211])
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.110966] [] (cfg80211_calculate_bitrate [cfg80211]) from [] (nl80211_put_sta_rate+0x48/0x1f4 [cfg80211])
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.121146] [] (nl80211_put_sta_rate [cfg80211]) from [] (nl80211_start_ap+0xaa4/0x1310 [cfg80211])
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.132442] [] (nl80211_start_ap [cfg80211]) from [] (nl80211_get_station+0xb0/0x11c [cfg80211])
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.143172] [] (nl80211_get_station [cfg80211]) from [] (genl_rcv_msg+0x2ec/0x3a0)
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.153876] [] (genl_rcv_msg) from [] (netlink_rcv_skb+0x94/0x110)
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.162986] [] (netlink_rcv_skb) from [] (genl_rcv+0x2c/0x48)
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.170883] [] (genl_rcv) from [] (netlink_unicast+0x164/0x224)
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.178435] [] (netlink_unicast) from [] (netlink_sendmsg+0x334/0x390)
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.185913] [] (netlink_sendmsg) from [] (sock_sendmsg+0x18/0x34)
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.194242] [] (sock_sendmsg) from [] (_sys_sendmsg+0x214/0x250)
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.202143] [] (___sys_sendmsg) from [] (sys_sendmsg+0x48/0x78)
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.209956] [] (__sys_sendmsg) from [] (ret_fast_syscall+0x0/0x54)
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.217954] ---[ end trace 8a33807041e0ac5e ]---
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.226135] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.230998] ath10k_pci 0001:01:00.0: SWBA overrun on vdev 0, skipped old beacon
Fri Feb 8 23:54:58 2019 kern.warn kernel: [ 422.237619] ath10k_pci 0001:01:00.0: SWBA overrun on vdev 0, skipped old beacon

greearb commented 5 years ago

Thanks for the bisect. The change in 62 reorders some struct objects for better packing to save memory. And, it appears your crash is due to an assumption that the mac-addr is 32-bit aligned. Please try this image and let me know how it goes. Post the crash log if it still crashes.

firmware-5-full-community.bin.gz
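To illustrate how reordering struct members for better packing can break an assumption that a MAC address is 32-bit aligned, here is a small sketch using Python's `ctypes`. The firmware's real structs do not appear in this thread, so `PeerBefore`, `PeerAfter` and every field name are hypothetical; only the mechanism is the point.

```python
import ctypes


class PeerBefore(ctypes.Structure):
    # Hypothetical original layout: wastes padding, but mac lands on offset 12.
    _fields_ = [
        ("state", ctypes.c_uint8),
        ("flags", ctypes.c_uint32),
        ("tid", ctypes.c_uint8),
        ("vdev_id", ctypes.c_uint16),
        ("mac", ctypes.c_uint8 * 6),
    ]


class PeerAfter(ctypes.Structure):
    # Same hypothetical fields, reordered so that no padding is needed
    # between members. The struct shrinks, but mac moves to offset 6.
    _fields_ = [
        ("flags", ctypes.c_uint32),
        ("vdev_id", ctypes.c_uint16),
        ("mac", ctypes.c_uint8 * 6),
        ("state", ctypes.c_uint8),
        ("tid", ctypes.c_uint8),
    ]


def mac_is_word_aligned(struct_type) -> bool:
    """True if the mac field starts on a 4-byte boundary within the struct."""
    return struct_type.mac.offset % 4 == 0


if __name__ == "__main__":
    for layout in (PeerBefore, PeerAfter):
        print(
            layout.__name__,
            "size:", ctypes.sizeof(layout),
            "mac offset:", layout.mac.offset,
            "word aligned:", mac_is_word_aligned(layout),
        )
```

Code that reads the address with 32-bit loads keeps working on the first layout and can fault or read garbage on the second, even though both layouts hold exactly the same fields.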

enryIT commented 5 years ago

Thank you, using your attached firmware 10.4b-ct-9984-fW-012-1a8f0f760 it works flawlessly. Let me test this version for a couple days in order to check for instabilities but I think you've already resolved the issue.

djangoa commented 5 years ago

I'm experiencing a similar issue with a D-Link DAP-2695/Devolo 1750e - QCA9880 and firmware 10.1-ct-8x-__fW-022-1bbfa151 on OWRT master:

Sun Feb 10 13:15:33 2019 daemon.info hostapd: wlan0: STA 00:11:22:33:44:55 IEEE 802.11: authenticated
Sun Feb 10 13:15:33 2019 daemon.info hostapd: wlan0: STA 00:11:22:33:44:55 IEEE 802.11: associated (aid 1)
Sun Feb 10 13:15:33 2019 daemon.err hostapd: Failed to create interface wlan0.sta1: -122 (Not supported)
Sun Feb 10 13:15:42 2019 daemon.info hostapd: wlan0: STA 00:11:22:33:44:55 IEEE 802.11: deauthenticated due to local deauth request
Sun Feb 10 13:15:42 2019 daemon.err hostapd: nl80211: NL80211_ATTR_STA_VLAN (addr=00:11:22:33:44:55 ifname=wlan0 vlan_id=0) failed: -2 (No such file or directory)
Sun Feb 10 13:15:42 2019 daemon.err hostapd: Failed to remove interface (ifidx=0)
Sun Feb 10 13:15:42 2019 daemon.notice hostapd: wlan0: WDS-STA-INTERFACE-REMOVED ifname=wlan0.sta1 sta_addr=00:11:22:33:44:55

I've also tried compiling with the stock ath10k driver and firmware and get the same errors in the log. Is the CT 10.1 firmware affected by the same issue?

greearb commented 5 years ago

The wave-1 issue looks different, probably missing support instead of buggy support. Please open a new bug, but since stock FW doesn't support it either, it is at least not a regression. I'm not adding a lot of new features to wave-1 these days unless someone wants to fund it, so not sure if or when I'll get that feature supported.

enryIT commented 5 years ago

Ok, now we have a stable experience on stable OpenWRT 18.0.6.

The same firmware 10.4b-ct-9984-fW-012-1a8f0f760 still crashes on the trunk version:

[ 120.827254] ------------[ cut here ]------------
[ 120.827328] WARNING: CPU: 0 PID: 3357 at /home/enrico/owrt_r7800/trunk/master/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x/ath10k-ct-2018-12-20-118e16da/ath10k-4.19/htt_rx.c:903 ath10k_htt_t2h_msg_handler+0x11c8/0x2c68 [ath10k_core]
[...omitted linked modules...]
[ 121.153565] CPU: 0 PID: 3357 Comm: hostapd Not tainted 4.14.98 #0
[ 121.175722] Hardware name: Generic DT based system
[ 121.181902] [] (unwind_backtrace) from [] (show_stack+0x14/0x20)
[ 121.186590] [] (show_stack) from [] (dump_stack+0x88/0x9c)
[ 121.194485] [] (dump_stack) from [] (warn+0xf0/0x11c)
[ 121.201510] [] (warn) from [] (warn_slowpath_null+0x20/0x28)
[ 121.208392] [] (warn_slowpath_null) from [] (ath10k_htt_t2h_msg_handler+0x11c8/0x2c68 [ath10k_core])
[ 121.216145] [] (ath10k_htt_t2h_msg_handler [ath10k_core]) from [] (ath10k_htt_t2h_msg_handler+0x13b8/0x2c68 [ath10k_core])
[ 121.227075] [] (ath10k_htt_t2h_msg_handler [ath10k_core]) from [] (ath10k_htt_t2h_msg_handler+0x2898/0x2c68 [ath10k_core])
[ 121.239660] [] (ath10k_htt_t2h_msg_handler [ath10k_core]) from [] (ath10k_htt_txrx_compl_task+0x710/0xc5c [ath10k_core])
[ 121.252409] [] (ath10k_htt_txrx_compl_task [ath10k_core]) from [] (ath10k_pci_napi_poll+0x7c/0x11c [ath10k_pci])
[ 121.265179] [] (ath10k_pci_napi_poll [ath10k_pci]) from [] (net_rx_action+0x144/0x31c)
[ 121.277036] [] (net_rx_action) from [] (do_softirq+0xf0/0x264)
[ 121.286499] [] (__do_softirq) from [] (irq_exit+0xdc/0x148)
[ 121.294395] [] (irq_exit) from [] (handle_domain_irq+0xa8/0xc8)
[ 121.301423] [] (handle_domain_irq) from [] (gic_handle_irq+0x6c/0xb8)
[ 121.309408] [] (gic_handle_irq) from [] (irq_usr+0x50/0x80)
[ 121.317566] Exception stack(0xdb113fb0 to 0xdb113ff8)
[ 121.325207] 3fa0: b6f4f2c0 00000000 b6f50448 00000000
[ 121.330248] 3fc0: 00000064 b6eb2a4a 00000145 00000000 00000014 00000001 00000014 00000b1f
[ 121.338406] 3fe0: 00800000 becb8f04 00000000 b6f01734 60000010 ffffffff
[ 121.346633] ---[ end trace 081d8f295394393e ]---
[ 121.353198] ath10k_pci 0001:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[ 121.357866] ath10k_pci 0001:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[ 121.477496] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[ 121.477880] br-lan: port 3(wlan0) entered blocking state
[ 121.483076] br-lan: port 3(wlan0) entered listening state
[ 123.511484] br-lan: port 3(wlan0) entered learning state
[ 125.591228] br-lan: port 3(wlan0) entered forwarding state
[ 125.591308] br-lan: topology change detected, propagating

and after the crash hostapd reports this error:

Sun Feb 10 19:05:40 2019 kern.info kernel: [ 123.511484] br-lan: port 3(wlan0) entered learning state
Sun Feb 10 19:05:41 2019 daemon.info hostapd: wlan0: STA 30:b5:c2:08:b9:e0 IEEE 802.11: authenticated
Sun Feb 10 19:05:41 2019 daemon.info hostapd: wlan0: STA 30:b5:c2:08:b9:e0 IEEE 802.11: associated (aid 1)
Sun Feb 10 19:05:41 2019 daemon.err hostapd: Failed to create interface wlan0.sta1: -95 (Not supported)
Sun Feb 10 19:05:41 2019 daemon.info hostapd: wlan0: STA 30:b5:c2:08:b9:e0 RADIUS: starting accounting session 7BCE2EC8A5348F0D
Sun Feb 10 19:05:41 2019 daemon.info hostapd: wlan0: STA 30:b5:c2:08:b9:e0 WPA: pairwise key handshake completed (RSN)
Sun Feb 10 19:05:43 2019 kern.info kernel: [ 125.591228] br-lan: port 3(wlan0) entered forwarding state
Sun Feb 10 19:05:43 2019 kern.info kernel: [ 125.591308] br-lan: topology change detected, propagating

In this configuration the radio link on the station is up and running, the only thing missing is the wlan0.sta1 interface on AP side.
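A quick way to confirm whether any wlanX.staY interface exists on the AP is to filter the list of network interface names. The sketch below is only an illustration: the helper name `wds_sta_interfaces` is hypothetical, and the regular expression assumes the `wlanX.staY` naming seen in the logs in this thread.

```python
import re

# Matches names such as "wlan0.sta1": parent interface, then ".sta", then an index.
WDS_STA_RE = re.compile(r"^(wlan\d+)\.sta(\d+)$")


def wds_sta_interfaces(names):
    """Return (parent, sta_index) pairs for every name of the form wlanX.staY."""
    found = []
    for name in names:
        match = WDS_STA_RE.match(name)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


if __name__ == "__main__":
    # Example input; on a Linux system the names could come from
    # os.listdir("/sys/class/net") instead.
    print(wds_sta_interfaces(["lo", "br-lan", "wlan0", "wlan0.sta1"]))
```

An empty result after a 4addr station has associated matches the symptom described here: the radio link is up, but the per-station interface was never created.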

greearb commented 5 years ago

That is a driver warning, not a firmware crash. And the hostapd error is coming from userspace. So it is likely not a firmware issue, since the same firmware works on the other OpenWRT version. You will need to bisect OpenWRT and/or deal with bugs in the stack to fix the problem above.
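The distinction between a kernel or driver warning and a firmware crash can be sketched as a simple log triage helper. This is a rough illustration, not part of any tool: the function name `classify_kernel_line` is hypothetical, `WARNING: CPU:` is the marker visible in the traces in this thread, and the `firmware crashed` marker is an assumption about what ath10k prints when the firmware actually dies (no such line is quoted in this thread).

```python
def classify_kernel_line(line: str) -> str:
    """Roughly classify one kernel log line.

    "firmware-crash": assumed ath10k firmware crash marker is present.
    "kernel-warning": a WARN() splat from the host kernel or driver.
    "other":          anything else.
    """
    if "firmware crashed" in line:  # assumed marker, see lead-in
        return "firmware-crash"
    if "WARNING: CPU:" in line:
        return "kernel-warning"
    return "other"


if __name__ == "__main__":
    sample = (
        "[ 421.751336] WARNING: CPU: 0 PID: 9887 at "
        "backports-2017-11-01/net/wireless/util.c:1144 "
        "cfg80211_calculate_bitrate+0x1c8/0x234 [cfg80211]"
    )
    print(classify_kernel_line(sample))
```

A WARN() splat prints a backtrace that looks alarming, but the host keeps running afterwards, which is why the trace above is not by itself evidence of a firmware problem.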

enryIT commented 5 years ago

That ath10k_htt_t2h_msg_handler trace is not a crash?

Anyway, do you have any suggestions on how to proceed? I checked the OpenWRT issue tracker, but it is barely used and there is no real activity there. Do you know of another platform?

greearb commented 5 years ago

Comments in OpenWRT IRC indicate that this is a kernel bug and that someone has a fix they are working to get upstream. Not sure if or when it will be in the OpenWRT repo. It is not a FW bug now though, so I am closing this bug. Please ask on #openwrt-devel (freenode IRC) and maybe you can get a patch to try.