openwrt / mt76

mac80211 driver for MediaTek MT76x0e, MT76x2e, MT7603, MT7615, MT7628 and MT7688

MT7981 5GHz occasionally cannot disconnect clients that have left and causes bad performance. #922

Open victor186 opened 1 month ago

victor186 commented 1 month ago

I'm testing an AX3000T in a restaurant for a future network upgrade, but I've noticed poor speeds on 5 GHz at random times. Restarting the radio fixes it, but when it occurs, the network goes down due to low speed/high latency.

The AP is running in 80 MHz / AX mode on OpenWrt 23.05.5.

[Screenshot: Screenshot_20241019-203131_Chrome~2]

[Screenshot: Screenshot_20241020-195607_Speedtest]

lukasz1992 commented 1 month ago

Is the second device also connected to the network? I see a really bad signal from it, and communicating with such a device can significantly degrade performance.

victor186 commented 1 month ago

> Is the second device also connected to the network? I see a really bad signal from it, and communicating with such a device can significantly degrade performance.

The devices on that list are on 2.4 GHz.

romanovj commented 3 weeks ago

Can you list your Wi-Fi clients (device models)?

victor186 commented 3 weeks ago

> Can you list your Wi-Fi clients (device models)?

I can't, because this device is running as an AP in a restaurant, serving both administrative and customer Wi-Fi.

romanovj commented 2 weeks ago

It looks like a Qualcomm QCA9377 with the Windows 10 driver on 5 GHz can cause this. There are no problems on the 2.4 GHz band.

lukasz1992 commented 2 weeks ago

Do you have driver 10.0.0.1272 for Windows installed?

victor186 commented 2 weeks ago

> It looks like a Qualcomm QCA9377 with the Windows 10 driver on 5 GHz can cause this. There are no problems on the 2.4 GHz band.

I don't understand: a 5 GHz Wi-Fi adapter with a QCA9377 is causing bad performance on the 5 GHz network? I don't have a QCA9377 on the network, and the router is MediaTek.

romanovj commented 2 weeks ago

@victor186

> I don't have a QCA9377 on the network

How can you be sure?

> this device is running as an AP in a restaurant, serving both administrative and customer Wi-Fi

victor186 commented 2 weeks ago

> How can you be sure?

The clients only use smartphones. The only PC on Wi-Fi uses a Realtek Wi-Fi adapter.

romanovj commented 1 week ago

> The clients only use smartphones. The only PC on Wi-Fi uses a Realtek Wi-Fi adapter.

If a QCA9377 client can affect a 5 GHz AP on mt76 + mt7915 (MT7981), then other clients might be able to do the same.

I don't own a QCA9377. I just helped a user isolate the problem on an OpenWrt 23.05.5 MT7981 device.

@nbd168 what do you think about this?

nbd168 commented 1 week ago

One thing you could try is copying the latest MT7981 firmware from https://github.com/openwrt/mt76/tree/master/firmware to your device. If that doesn't help, trying a recent snapshot might also be a good idea.

romanovj commented 1 week ago

> One thing you could try is copying the latest MT7981 firmware from https://github.com/openwrt/mt76/tree/master/firmware to your device.

Already done this, it didn't help.

> If that doesn't help, trying a recent snapshot might also be a good idea.

That user didn't want to experiment with a snapshot. Connecting the QCA9377 to the 2.4 GHz AP solved the 5 GHz AP issue for him.

lukasz1992 commented 1 week ago

I'd say there are too few details for us to help you.

IrineSistiana commented 2 days ago

OpenWrt 23.05.5, H3C Magic NX30 Pro.

Same issue here. I have encountered it several times.

Almost zero throughput (1 kb/s) over 5 GHz Wi-Fi. It is enough for DHCP, but everything else breaks, even ping.

I noticed that when this happens, there are 2 dead clients (which may have left Wi-Fi range at the same time) on the LuCI Wi-Fi page, with an RX rate / TX rate of 6.0 Mbit/s, 20 MHz. If I manually click the "Disconnect" button, Wi-Fi works again immediately.

IrineSistiana commented 2 days ago

More info

Also, the log keeps showing AP-STA-POLL-OK for the two offline clients. This started when they went out of Wi-Fi range and continued until I clicked the LuCI "Disconnect" button.

P.S. OFFLINE:MAC:1 and OFFLINE:MAC:2 are the clients that went away.

Wed Nov 20 19:33:33 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **OFFLINE:MAC:1**
Wed Nov 20 19:35:31 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **OFFLINE:MAC:2**
Wed Nov 20 19:38:44 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **OFFLINE:MAC:1**
Wed Nov 20 19:40:51 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **OFFLINE:MAC:2**
Wed Nov 20 19:44:03 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **OFFLINE:MAC:1**
...
Wed Nov 20 20:06:42 2024 daemon.notice hostapd: phy1-ap0: AP-STA-DISCONNECTED **OFFLINE:MAC:1**
Wed Nov 20 20:06:44 2024 daemon.notice hostapd: phy1-ap0: AP-STA-DISCONNECTED **OFFLINE:MAC:2**
Wed Nov 20 20:06:47 2024 daemon.info hostapd: phy1-ap0: STA **OFFLINE:MAC:1** IEEE 802.11: deauthenticated due to local deauth request
Wed Nov 20 20:06:49 2024 daemon.info hostapd: phy1-ap0: STA **OFFLINE:MAC:2** IEEE 802.11: deauthenticated due to local deauth request
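
To check whether a station is stuck in this state, one can count hostapd's AP-STA-POLL-OK events per MAC in the system log. The following is a minimal sketch. It assumes the `logread` line format shown above, where the station MAC is the last field on the line. `count_polls` is a hypothetical helper, not an OpenWrt command:

```shell
# Hypothetical helper: count hostapd AP-STA-POLL-OK events per station MAC.
# Reads syslog text on stdin and prints "<count> <mac>", highest count first.
count_polls() {
        awk '
                /AP-STA-POLL-OK/ {
                        # assumes the station MAC is the last field of the line
                        n[$NF]++
                }
                END {
                        for (mac in n) print n[mac], mac
                }
        ' | sort -rn
}
```

On the router, `logread | count_polls` lists the stations. A client that keeps accumulating polls long after it has physically left is a candidate for this bug.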

When I restarted the 5 GHz Wi-Fi a few minutes later, I got another suspicious log:

Wed Nov 20 20:13:06 2024 kern.warn kernel: [2135649.716364] Ignoring NSS change in VHT Operating Mode Notification from **OFFLINE:MAC:1** with invalid nss 2
Wed Nov 20 20:13:06 2024 kern.info kernel: [2143605.339316] device phy1-ap0 left promiscuous mode
Wed Nov 20 20:13:06 2024 kern.info kernel: [2143605.354371] br-lan: port 5(phy1-ap0) entered disabled state
Wed Nov 20 20:13:07 2024 daemon.notice wpa_supplicant[1538]: Set new config for phy phy1
Wed Nov 20 20:13:07 2024 daemon.notice hostapd: Set new config for phy phy1: /var/run/hostapd-phy1.conf
Wed Nov 20 20:13:07 2024 daemon.notice hostapd: Reload config for bss 'phy1-ap0' on phy 'phy1'
Wed Nov 20 20:13:07 2024 daemon.notice hostapd: phy1-ap0: AP-STA-DISCONNECTED **AN:ONLINE:CLIENT:MAC:1**
Wed Nov 20 20:13:08 2024 daemon.notice hostapd: Reloaded settings for phy phy1
Wed Nov 20 20:13:08 2024 daemon.notice netifd: Wireless device 'radio1' is now up
Wed Nov 20 20:13:08 2024 daemon.notice netifd: Network device 'phy1-ap0' link is up
Wed Nov 20 20:13:08 2024 kern.info kernel: [2143607.148600] br-lan: port 5(phy1-ap0) entered blocking state
Wed Nov 20 20:13:08 2024 kern.info kernel: [2143607.154384] br-lan: port 5(phy1-ap0) entered disabled state
Wed Nov 20 20:13:08 2024 kern.info kernel: [2143607.160337] device phy1-ap0 entered promiscuous mode
Wed Nov 20 20:13:08 2024 kern.info kernel: [2143607.165646] br-lan: port 5(phy1-ap0) entered blocking state
Wed Nov 20 20:13:08 2024 kern.info kernel: [2143607.171424] br-lan: port 5(phy1-ap0) entered forwarding state
Wed Nov 20 20:13:09 2024 daemon.info dnsmasq[1]: read /etc/hosts - 12 names
Wed Nov 20 20:13:09 2024 daemon.info dnsmasq[1]: read /tmp/hosts/dhcp.cfg01411c - 4 names
Wed Nov 20 20:13:09 2024 daemon.info dnsmasq-dhcp[1]: read /etc/ethers - 0 addresses
...

Wireless config

cat /etc/config/wireless

config wifi-device 'radio0'
        option type 'mac80211'
        option path 'platform/18000000.wifi'
        option channel '1'
        option band '2g'
        option htmode 'HT20'
        option country 'CN'
        option cell_density '0'

config wifi-iface 'default_radio0'
        option device 'radio0'
        option network 'lan'
        option mode 'ap'
        option ssid 'ssid1'
        option encryption 'psk2+ccmp'
        option key 'WIFIPASSWD'

config wifi-device 'radio1'
        option type 'mac80211'
        option path 'platform/18000000.wifi+1'
        option channel '149'
        option band '5g'
        option htmode 'HE80'
        option country 'CN'
        option cell_density '0'
        option txpower '27'

config wifi-iface 'default_radio1'
        option device 'radio1'
        option network 'lan'
        option mode 'ap'
        option ssid 'ssid2'
        option encryption 'sae-mixed'
        option key 'WIFIPASSWD'

Possibly related: https://github.com/openwrt/openwrt/issues/14415

IrineSistiana commented 2 days ago

I reproduced this bug. If a client leaves Wi-Fi coverage, there is some chance that the bug above occurs.

It looks almost identical to https://github.com/openwrt/openwrt/issues/14415.

The log keeps showing the following (note: I added option max_inactivity '60'):

Thu Nov 21 09:25:38 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **WENT:AWAY:CLIENT:MAC**
Thu Nov 21 09:26:46 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **WENT:AWAY:CLIENT:MAC**
Thu Nov 21 09:27:56 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **WENT:AWAY:CLIENT:MAC**
Thu Nov 21 09:29:04 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **WENT:AWAY:CLIENT:MAC**
Thu Nov 21 09:30:24 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **WENT:AWAY:CLIENT:MAC**
Thu Nov 21 09:31:33 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **WENT:AWAY:CLIENT:MAC**
Thu Nov 21 09:32:39 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **WENT:AWAY:CLIENT:MAC**
Thu Nov 21 09:33:44 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **WENT:AWAY:CLIENT:MAC**
Thu Nov 21 09:34:51 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **WENT:AWAY:CLIENT:MAC**
...
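
For reference, `max_inactivity` is set per `wifi-iface` in /etc/config/wireless. It corresponds to hostapd's `ap_max_inactivity`, which defaults to 300 s. Once a station has sent nothing for that long, hostapd sends it an empty data frame and disconnects it only if that frame is not ACKed. AP-STA-POLL-OK is logged when the poll is reported as acknowledged. One possible reading of the roughly 60-70 s spaced log lines above is therefore that the driver or firmware reports ACKs from a station that is no longer there.

This is a minimal fragment, assuming the `default_radio1` section from the config posted earlier:

```
config wifi-iface 'default_radio1'
        option device 'radio1'
        option network 'lan'
        option mode 'ap'
        option ssid 'ssid2'
        option encryption 'sae-mixed'
        option key 'WIFIPASSWD'
        # poll idle stations after 60 s instead of hostapd's default 300 s
        option max_inactivity '60'
```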
iw dev phy1-ap0 station dump

Station **WENT:AWAY:CLIENT:MAC** (on phy1-ap0)
        inactive time:  46190 ms
        rx bytes:       7315589
        rx packets:     52352
        tx bytes:       66444699
        tx packets:     69473
        tx retries:     6987
        tx failed:      7033
        rx drop misc:   2
        signal:         -95 [-97, -99] dBm
        signal avg:     -91 [-93, -95] dBm
        tx bitrate:     6.0 MBit/s
        tx duration:    83677141 us
        rx bitrate:     6.0 MBit/s
        rx duration:    4720659 us
        last ack signal:-96 dBm
        avg ack signal: -95 dBm
        airtime weight: 256
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       short
        WMM/WME:        yes
        MFP:            no
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        short preamble: yes
        short slot time:yes
        connected time: 8708 seconds
        associated at [boottime]:       2183028.795s
        associated at:  1732143676976 ms
        current time:   1732152384528 ms

P.S. The device above is a smartphone with a Snapdragon FastConnect 6800 (though I believe other clients can do the same). It left Wi-Fi range an hour ago and is kilometers away from the AP.

If I manually click the "Disconnect" button in LuCI, Wi-Fi works again immediately (no restart needed).
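
The same symptom can be checked from `iw` output. The stale station above shows a very low `signal avg` and a high share of failed transmissions: 7033 `tx failed` against 69473 `tx packets`, about 10%. The following is a minimal sketch, assuming the `iw dev <if> station dump` layout shown above. `weak_stations` is a hypothetical helper, and the threshold is passed in dBm:

```shell
# Hypothetical helper: read `iw dev <if> station dump` output on stdin and print
# "<mac> <signal avg> dBm <tx failed / tx packets in %>" for every station
# whose combined "signal avg" is below the threshold given as $1 (in dBm).
weak_stations() {
        awk -v thr="$1" '
                $1 == "Station" { mac = $2; txp = 0; txf = 0 }
                $1 == "tx" && $2 == "packets:" { txp = $3 }
                $1 == "tx" && $2 == "failed:" { txf = $3 }
                # "signal avg:     -91 [-93, -95] dBm" -> $3 is the combined value
                $1 == "signal" && $2 == "avg:" {
                        if ($3 + 0 < thr + 0) {
                                pct = (txp > 0) ? 100 * txf / txp : 0
                                printf "%s %s dBm %.1f%%\n", mac, $3, pct
                        }
                }
        '
}
```

Run it as `iw dev phy1-ap0 station dump | weak_stations -90`. It only reads the dump and never disconnects anyone, so it can be used to sanity-check a threshold before relying on a kick script.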

I'm using the official, unmodified OpenWrt 23.05.5 image. https://github.com/openwrt/openwrt/issues/14415 seems to use an OpenWrt fork ~with a modified driver(?)~ (I misunderstood: they enabled /sys/module/mt7915e/parameters/wed_enable).

I did not set the wed_enable.

cat /sys/module/mt7915e/parameters/wed_enable
N

victor186 commented 1 day ago

> I reproduced this bug. If a client leaves Wi-Fi coverage, there is some chance that the bug above occurs.

That makes sense, because as a public Wi-Fi router it has clients joining and leaving the network all the time. I also noticed in LuCI some clients with a -9x dBm signal that never disconnect. As in your example, a client that is out of range never disappears.

rx78gp01 commented 1 day ago

You can try this patch from mtk

victor186 commented 1 day ago

> You can try this patch from mtk

I don't know how to use this

IrineSistiana commented 1 day ago

Sorry, my router is my main device, so it is hard for me to experiment with it. But I can provide logs if needed.

@victor186 I suspect this is a common bug affecting all MT7981 devices, but it happens only occasionally, so it is hard to reproduce and notice.

Maybe we could change the title to make it easier for more users to find?

"MT7981 5GHz occasionally cannot disconnect clients that have left and causes bad performance."

victor186 commented 1 day ago

> Maybe we could change the title to make it easier for more users to find?
>
> "MT7981 5GHz occasionally cannot disconnect clients that have left and causes bad performance."

Done

IrineSistiana commented 1 day ago

A dirty temporary fix. I tested it and it works for me, but I don't know whether it has any side effects.

Run this script every minute via cron.

It "disconnects" all clients with very low signal strength. These should be clients that have already left Wi-Fi coverage but are still incorrectly shown as "associated".

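#!/bin/sh
# Kick stations whose signal is below a threshold (likely stale "ghost" clients).

# threshold (dBm)
thr=-90
# add other interface names if any, e.g. "phy1-ap0 phy1-ap1 phy1-ap2"
wlanlist="phy1-ap0"

disconnect() {
        mac=$1
        wlan=$2
        rssi=$3
        echo "disconnecting client at $wlan $mac with $rssi dBm (thr=$thr)" | logger -t disconnected-client-killer
        # reason 5 = "disassociated because AP is unable to handle all currently associated STAs"
        # "ban_time" prevents the client from reassociating for the given number of milliseconds.
        ubus call "hostapd.$wlan" del_client "{\"addr\":\"$mac\", \"reason\":5, \"deauth\":true, \"ban_time\":1000}"
}

for wlan in $wlanlist; do
        # assoclist lines look like: "<MAC>  <signal> dBm / <noise> dBm (SNR <n>)  <age> ms ago"
        iwinfo "$wlan" assoclist | grep SNR | while read -r line; do
                mac=$(echo "$line" | awk '{ print $1 }')
                rssi=$(echo "$line" | awk '{ print $2 }')
                # skip entries whose signal is not an integer (e.g. "unknown")
                case "$rssi" in
                        ''|*[!0-9-]*) continue ;;
                esac
                if [ "$rssi" -lt "$thr" ]; then
                        disconnect "$mac" "$wlan" "$rssi"
                fi
        done
done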
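
To run the script every minute, a crontab entry along these lines should work. This assumes the script was saved as /root/kick-weak-clients.sh and made executable with `chmod +x`. That path is hypothetical:

```
# /etc/crontabs/root
* * * * * /root/kick-weak-clients.sh
```

After editing the file, restart cron with `/etc/init.d/cron restart` so it picks up the new entry.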