openwrt / openwrt

This repository is a mirror of https://git.openwrt.org/openwrt/openwrt.git It is for reference only and is not active for check-ins. We will continue to accept Pull Requests here. They will be merged via staging trees then into openwrt.git.

WiFi stalls for a few minutes after the 21.02.2 update #9455

Open timkgh opened 2 years ago

timkgh commented 2 years ago

Device: Netgear R7800 (IPQ8065, QCA9984)
WiFi driver + firmware: mainline (not CT)
Band: 5 GHz

When some devices such as phones leave the network, other devices left on the wifi network experience high latency and heavy packet loss for a few minutes.

This started happening after the update to 21.02.2; it was very stable before.

Others are reporting the same issue, see this thread: https://forum.openwrt.org/t/ipq806x-nss-drivers/12613/2557

Workaround that appears to help:

echo 0 > /sys/kernel/debug/ieee80211/phy0/aql_enable
echo 0 > /sys/kernel/debug/ieee80211/phy1/aql_enable
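The two commands above hard-code phy0 and phy1. As a minimal sketch, the same write can be applied to every phy that exposes an aql_enable file. The function name disable_aql and its optional root-directory argument are hypothetical, added only for illustration; the default path is the one used in the workaround.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenWrt): write 0 to aql_enable for every
# phy found under the given debugfs root. The default root is the path used
# in the workaround above.
disable_aql() {
    root="${1:-/sys/kernel/debug/ieee80211}"
    for f in "$root"/phy*/aql_enable; do
        # Skip the unexpanded glob pattern when no phy matches.
        [ -e "$f" ] || continue
        echo 0 > "$f" && echo "AQL disabled: $f"
    done
}
```

Calling disable_aql with no argument targets the real debugfs path. debugfs values do not survive a reboot, so the call would have to be repeated after each boot; one option might be /etc/rc.local, assuming the phys already exist by the time it runs.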

Potentially related: https://lists.infradead.org/pipermail/ath10k/2022-February/013341.html

I suspect the cause is one of the hostapd/mac80211 backports introduced between 21.02.1 and 21.02.2.

Not sure whether it affects only ath10k or other devices also.

timkgh commented 2 years ago

This is what I see in the log when the issue happens:

Wed Mar  9 09:45:50 2022 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED xx:xx:xx:xx:xx:xx
Wed Mar  9 09:45:50 2022 daemon.info hostapd: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: disassociated due to inactivity
Wed Mar  9 09:45:51 2022 daemon.info hostapd: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Wed Mar  9 09:45:51 2022 kern.warn kernel: [150007.197590] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 16 tid 0
Wed Mar  9 09:45:51 2022 kern.warn kernel: [150007.197631] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 16 tid 0
Wed Mar  9 09:45:51 2022 kern.warn kernel: [150007.203782] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 16 tid 0
Wed Mar  9 09:45:51 2022 kern.warn kernel: [150007.211061] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 16 tid 0
Wed Mar  9 09:45:51 2022 kern.warn kernel: [150007.218685] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 16 tid 7
Wed Mar  9 09:45:51 2022 kern.warn kernel: [150007.225663] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 16 tid 7
Wed Mar  9 09:45:51 2022 kern.warn kernel: [150007.232951] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 16 tid 7
Wed Mar  9 09:45:51 2022 kern.warn kernel: [150007.240228] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 16 tid 7
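To compare occurrences across devices, the warnings can be counted per peer_id and tid. This is a rough sketch: the function name summarize_txq_failures is hypothetical, and it assumes the message format stays exactly as in the log above.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and count the
# "failed to lookup txq" warnings per peer_id/tid pair.
summarize_txq_failures() {
    awk '/failed to lookup txq for peer_id/ {
        # Find the values that follow the "peer_id" and "tid" keywords.
        for (i = 1; i < NF; i++) {
            if ($i == "peer_id") peer = $(i + 1)
            if ($i == "tid") tid = $(i + 1)
        }
        count["peer_id " peer " tid " tid]++
    }
    END { for (k in count) print k ": " count[k] }' | sort
}
```

On the router this could be fed from the system log, for example with logread | summarize_txq_failures.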

intersectRaven commented 2 years ago

I've experienced this as well on the Xiaomi 4a GBE. I'll try the workaround for a few days and report back on whether it helps.

intersectRaven commented 2 years ago

It still happens but a lot less frequently than without the workaround. There must be some other issue as well.

justindoherty commented 2 years ago

The workaround seems to have helped somewhat, but I am still seeing occasional problems with my Archer C7 and RE450 mesh: traffic stalls, either node to node or for a client on another access point, and sometimes never recovers until reconnected. I'm hoping it's related to this.

timkgh commented 2 years ago

I agree: it helps, but it does not solve the problem. It just happened to me after a few days of being stable.

The only thing I can do for now is go back to 21.02.1.

timkgh commented 2 years ago

21.02.1 is stable: 6 days of uptime and no WiFi issues.

timkgh commented 2 years ago

@nbd168 any thoughts on this? Seems related to https://github.com/openwrt/openwrt/commit/a5888ad6b33840d913438ce664c0e7da7e7f53e6

timkgh commented 2 years ago

It seems the issue is fixed in mainline and 22.03.0-rc6. Can we please have it backported to the next 21.02.x release also?