openwrt / openwrt

This repository is a mirror of https://git.openwrt.org/openwrt/openwrt.git It is for reference only and is not active for check-ins. We will continue to accept Pull Requests here. They will be merged via staging trees then into openwrt.git.
Other
20.16k stars 10.46k forks source link

FS#2396 - Clients become isolated via hairpin_mode=0 after DFS radar detection #6110

Open openwrt-bot opened 5 years ago

openwrt-bot commented 5 years ago

petersv:

Summary: After a while clients become isolated, even though "isolate clients" is not enabled. The DFS radar deteion somehow leads to hairpin_mode being set to 0.

Devices: Tested on several TP-Link Archer C7 rev 5 devices. All exhibit the same behavior.

Build: Running trunk e05310b9 from 2019-07-12. Configured with wpad-basic.

Configuration: The devices are configured with the ath10k 5GHz radio on frequencies requireing DFS. "Isolate clients" is disabled in LuCI. There are two SSIDs on each radio brdiging into br-lan and br-guest.

Steps to reproduce: At startup the device works and all clients can see each other. The file /sys/devices/virtual/net/br-lan/lower_wlan0/brport/hairpin_mode contains "1" as expected.

Every now and then (normally several days) there is a "DFS-RADAR-DETECTED" event in the log. The hostapd immediately starts to move to a new frequency as expected. One minute later the interface comes back up and clients reassociate, again as expected.

At this point clients are no longer able to reach each other. The effect is as if the "isolate clients" was enabled. Looking at the /sys/.../brport/hairpin_mode file of the radio interface it now contains "0".

The log around that time is: un Jul 21 20:02:43 2019 daemon.notice hostapd: wlan0: DFS-RADAR-DETECTED freq=5520 ht_enabled=0 chan_offset=0 chan_width=3 cf1=5530 cf2=0 Sun Jul 21 20:02:43 2019 daemon.notice hostapd: wlan0: DFS-NEW-CHANNEL freq=5260 chan=52 sec_chan=1 Sun Jul 21 20:02:43 2019 daemon.warn hostapd: DFS failed to schedule CSA (-22) - trying fallback Sun Jul 21 20:02:43 2019 daemon.notice hostapd: wlan0: AP-DISABLED Sun Jul 21 20:02:43 2019 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED 58:c5:cb:cd:95:5d Sun Jul 21 20:02:43 2019 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED c8:f7:33:d6:b7:5d Sun Jul 21 20:02:43 2019 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED d8:9c:67:75:4a:c5 Sun Jul 21 20:02:43 2019 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED 94:b8:6d:04:cc:8e Sun Jul 21 20:02:43 2019 daemon.notice netifd: Network device 'wlan0-1' link is down Sun Jul 21 20:02:43 2019 kern.info kernel: [193131.259734] br-guest: port 3(wlan0-1) entered disabled state Sun Jul 21 20:02:43 2019 kern.info kernel: [193131.268551] device wlan0-1 left promiscuous mode Sun Jul 21 20:02:43 2019 kern.info kernel: [193131.273453] br-guest: port 3(wlan0-1) entered disabled state Sun Jul 21 20:02:43 2019 daemon.notice hostapd: nl80211: Failed to remove interface wlan0-1 from bridge br-guest: No such device Sun Jul 21 20:02:43 2019 daemon.notice hostapd: nl80211: deinit ifname=wlan0 disabled_11b_rates=0 Sun Jul 21 20:02:43 2019 kern.info kernel: [193131.354845] device wlan0 left promiscuous mode Sun Jul 21 20:02:43 2019 kern.info kernel: [193131.359702] br-lan: port 2(wlan0) entered disabled state Sun Jul 21 20:02:43 2019 daemon.notice netifd: Network device 'wlan0' link is down Sun Jul 21 20:02:43 2019 daemon.notice hostapd: wlan0: interface state ENABLED->DISABLED Sun Jul 21 20:02:44 2019 kern.warn kernel: [193132.555691] ath10k_pci 0000:00:00.0: 10.1 wmi init: vdevs: 16 peers: 127 tid: 256 Sun Jul 21 20:02:44 2019 kern.info kernel: [193132.573394] ath10k_pci 0000:00:00.0: wmi print 'P 128 V 8 T 410' Sun Jul 21 20:02:44 2019 kern.info kernel: [193132.579989] ath10k_pci 0000:00:00.0: wmi print 'msdu-desc: 1424 sw-crypt: 0 ct-sta: 0' Sun Jul 21 20:02:44 2019 kern.info kernel: [193132.588382] ath10k_pci 0000:00:00.0: wmi print 'alloc rem: 24632 iram: 26408' Sun Jul 21 20:02:44 2019 kern.warn kernel: [193132.657322] ath10k_pci 0000:00:00.0: pdev param 0 not supported by firmware Sun Jul 21 20:02:44 2019 kern.info kernel: [193132.673165] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready Sun Jul 21 20:02:44 2019 kern.info kernel: [193132.685679] br-lan: port 2(wlan0) entered blocking state Sun Jul 21 20:02:44 2019 kern.info kernel: [193132.691285] br-lan: port 2(wlan0) entered disabled state Sun Jul 21 20:02:44 2019 kern.info kernel: [193132.697181] device wlan0 entered promiscuous mode Sun Jul 21 20:02:44 2019 daemon.notice hostapd: wlan0: interface state DISABLED->COUNTRY_UPDATE Sun Jul 21 20:02:44 2019 daemon.notice hostapd: wlan0: interface state COUNTRY_UPDATE->HT_SCAN Sun Jul 21 20:02:44 2019 daemon.notice hostapd: wlan0: interface state HT_SCAN->DFS Sun Jul 21 20:02:44 2019 daemon.notice hostapd: wlan0: DFS-CAC-START freq=5260 chan=52 sec_chan=1, width=1, seg0=58, seg1=0, cac_time=60s Sun Jul 21 20:03:46 2019 daemon.notice hostapd: wlan0: DFS-CAC-COMPLETED success=1 freq=5260 ht_enabled=0 chan_offset=0 chan_width=3 cf1=5290 cf2=0 Sun Jul 21 20:03:46 2019 daemon.err hostapd: Using interface wlan0 with hwaddr b0:be:76:f4:c2:ad and ssid "MP 5GHz" Sun Jul 21 20:03:46 2019 kern.info kernel: [193194.981050] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready Sun Jul 21 20:03:46 2019 kern.info kernel: [193194.988086] br-lan: port 2(wlan0) entered blocking state Sun Jul 21 20:03:46 2019 kern.info kernel: [193194.993690] br-lan: port 2(wlan0) entered forwarding state Sun Jul 21 20:03:46 2019 daemon.notice netifd: Network device 'wlan0' link is up Sun Jul 21 20:03:46 2019 kern.info kernel: [193195.005990] br-guest: port 3(wlan0-1) entered blocking state Sun Jul 21 20:03:46 2019 kern.info kernel: [193195.012050] br-guest: port 3(wlan0-1) entered disabled state Sun Jul 21 20:03:46 2019 kern.info kernel: [193195.018539] device wlan0-1 entered promiscuous mode Sun Jul 21 20:03:46 2019 kern.info kernel: [193195.040940] IPv6: ADDRCONF(NETDEV_UP): wlan0-1: link is not ready Sun Jul 21 20:03:46 2019 kern.info kernel: [193195.047485] br-guest: port 3(wlan0-1) entered blocking state Sun Jul 21 20:03:46 2019 kern.info kernel: [193195.053432] br-guest: port 3(wlan0-1) entered forwarding state Sun Jul 21 20:03:46 2019 daemon.err hostapd: Using interface wlan0-1 with hwaddr b2:be:76:f4:c2:ad and ssid "MPGuest 5GHz" Sun Jul 21 20:03:46 2019 daemon.notice hostapd: wlan0: interface state DFS->ENABLED Sun Jul 21 20:03:46 2019 daemon.notice hostapd: wlan0: AP-ENABLED Sun Jul 21 20:03:46 2019 kern.info kernel: [193195.110775] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0-1: link becomes ready

Iniital analysis: I found that of the programs compiled in my build only netifd and hostapd writes to the hairpin_mode file. I instrumented those two programs to log when they write to hairpin mode. At startup only netifd writes a "1" to the hairpin_mode file for the bridge port.

No writes are made to the hairpin_mode file at the time of failure. Since it takes several days for the DFS radar detection to fire I have not yet checked if e.g. hostapd or netifd removes and re-adds the interfaces to the bridge without restoring the hairpin_mode. Maybe someone knows?

From looking at the logs and the code I am not sure if hostapd, netifd or both thinks they are responsible for managing the bridging of the interfaces.

openwrt-bot commented 5 years ago

petersv:

I have enabled more debug output from hostapd and netifd to see what actions they are taking after a DFS detection event. I am waiting for the next spurious DFS event.

openwrt-bot commented 5 years ago

NerdyProjects:

I had hairpin_mode being set to 0 sometimes on my Archer C5 v1 some months ago (running 18.06). Although I don't know (it seemed very random to me) if it was the same cause, it seems very likely to me. So see this as a "there is likely another person affected by this".

Thanks for looking into it!

openwrt-bot commented 4 years ago

petersv:

I finally had some spurious DFS detections again. I am still working my way through the interaction between hostapd and netifd.

It seems that system_bridge_addif() in netifd gets triggered when the wifi interface are originally brought up, and this sets hairpin_mode=1. However, this does not happen after the DFS event.

In both cases the actual addtion of the interface to the bridge seems to be done by hostapd in i802_check_bridge(). This does not modify the hairpin_mode bit.

I do not yet understand what path leads to system_bridge_addif() in the first place. Is that triggered by the hostapd actions or by the wifi interface being enabled or something else?

An easy workaround would be to set the hairpin mode from hostapd. But I feel there is something unclear about the interaction between hostapd and netifd.

openwrt-bot commented 4 years ago

janhoffmann:

This issue also occured for me with OpenWrt 19.07.2 running on a FRITZ!Box 4040.

It looks like the issue can also be easily reproduced by manually switching the channel via ubus without reloading the configuration (a non-DFS channel needs to be specified for this):

ubus call hostapd.wlan1 switch_chan '{"freq": 5180}'

Afterwards, //hairpin_mode// and //multicast_to_unicast// are both 0 on the bridge port for the wireless interface.

openwrt-bot commented 3 years ago

barath:

How can we get this marked confirmed?

Running 19.07.7, r11306-c4a6851c72.

For SEO/other fellow travelers: I'm pretty sure this is why my raspberry pi sometimes loses contact to other clients on the same Wifi network, but not to the internet or the router. ARP requests aren't fowarded. When this happens, the OpenWRT log shows similar DFS events as described in the OP and setting hairpin_mode to 1 manually fixes the problem.

openwrt-bot commented 3 years ago

skemper:

Hi all,

I have the same issue here on dlink dir-825-c1 with 21.02-rc4. I saw in your logs that you have ath10h. For me 5GHz device is using ath9k.

I never knew about this issue before I installed 21.02-rc4. I was using 19.07 all the time. But with 21.02-rc4 I experience these connection issues quite often. As far as I've seen it occasionally happens after DFS radar detection. I cannot remember seeing DFS radar detection events in logs, before coming to 21.02-rc4. That doesn't mean there weren't any. With 21.02-rc4 I see quite a few, though.

I even found a way to trigger DFS radar being detected. I have a mobile that when I move away from AP and it gets disconnected it sporadically triggers DFS radar detection on AP. No clue why. I played with that on Saturday and I saw DFS radar detection in log. And the AP did switch channels. But afterward the AP was still working fine.

Sat Aug 28 17:44:54 2021 daemon.notice hostapd: wlan1: DFS-RADAR-DETECTED freq=5580 ht_enabled=1 chan_offset=1 chan_width=2 cf1=5590 cf2=0 Sat Aug 28 17:44:54 2021 daemon.notice hostapd: wlan1: DFS-NEW-CHANNEL freq=5180 chan=36 sec_chan=1 Sat Aug 28 17:44:54 2021 daemon.info hostapd: wlan1: IEEE 802.11 driver starting channel switch: freq=5180, ht=1, vht_ch=0x0, offset=1, width=2 (40 MHz), cf1=5190, cf2=0 Sat Aug 28 17:44:54 2021 daemon.notice hostapd: wlan1: CTRL-EVENT-STARTED-CHANNEL-SWITCH freq=5180 ht_enabled=1 ch_offset=1 ch_width=40 MHz cf1=5190 cf2=0 dfs=0 Sat Aug 28 17:44:54 2021 daemon.info hostapd: wlan1-1: IEEE 802.11 driver starting channel switch: freq=5180, ht=1, vht_ch=0x0, offset=1, width=2 (40 MHz), cf1=5190, cf2=0 Sat Aug 28 17:44:54 2021 daemon.notice hostapd: wlan1-1: CTRL-EVENT-STARTED-CHANNEL-SWITCH freq=5180 ht_enabled=1 ch_offset=1 ch_width=40 MHz cf1=5190 cf2=0 dfs=0 Sat Aug 28 17:44:55 2021 daemon.info hostapd: wlan1-1: IEEE 802.11 driver had channel switch: freq=5180, ht=1, vht_ch=0x0, offset=1, width=2 (40 MHz), cf1=5190, cf2=0 Sat Aug 28 17:44:55 2021 daemon.notice hostapd: wlan1-1: CTRL-EVENT-CHANNEL-SWITCH freq=5180 ht_enabled=1 ch_offset=1 ch_width=40 MHz cf1=5190 cf2=0 dfs=0 Sat Aug 28 17:44:55 2021 daemon.notice hostapd: wlan1-1: AP-CSA-FINISHED freq=5180 dfs=0 Sat Aug 28 17:44:55 2021 daemon.info hostapd: wlan1: IEEE 802.11 driver had channel switch: freq=5180, ht=1, vht_ch=0x0, offset=1, width=2 (40 MHz), cf1=5190, cf2=0 Sat Aug 28 17:44:55 2021 daemon.notice hostapd: wlan1: CTRL-EVENT-CHANNEL-SWITCH freq=5180 ht_enabled=1 ch_offset=1 ch_width=40 MHz cf1=5190 cf2=0 dfs=0 Sat Aug 28 17:44:55 2021 daemon.notice hostapd: wlan1: AP-CSA-FINISHED freq=5180 dfs=0

root@hank2:/sys# find -name hairpin_mode ./devices/pci0000:00/0000:00:00.0/net/wlan1/brport/hairpin_mode ./devices/pci0000:00/0000:00:00.0/net/wlan1-1/brport/hairpin_mode ./devices/platform/ahb/18100000.wmac/net/wlan0-1/brport/hairpin_mode ./devices/platform/ahb/18100000.wmac/net/wlan0/brport/hairpin_mode ./devices/virtual/net/eth0.1/brport/hairpin_mode root@hank2:/sys# find -name hairpin_mode -exec cat {} \; 1 1 1 1 0 root@hank2:/sys#

But this morning I couldn't connect to a PC in LAN anymore (ssh). Both PC and laptop connected to same 5G wifi ssid. I had seen this before and ARP requests from laptop were received on AP but never answered. So I go and check the logs:

Mon Aug 30 01:29:10 2021 daemon.notice hostapd: wlan1: AP-STA-DISCONNECTED so:me:ma:ca:dd:re

Provider CPE reboots:

Mon Aug 30 03:19:30 2021 kern.info kernel: [18411.098723] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 5 is down Mon Aug 30 03:19:38 2021 kern.info kernel: [18419.290310] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 5 is up Mon Aug 30 03:19:40 2021 kern.info kernel: [18421.338546] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 5 is down Mon Aug 30 03:19:42 2021 kern.info kernel: [18423.386273] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 5 is up Mon Aug 30 03:20:15 2021 kern.info kernel: [18456.154035] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 5 is down Mon Aug 30 03:20:18 2021 kern.info kernel: [18459.225722] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 5 is up Mon Aug 30 03:20:43 2021 kern.info kernel: [18483.801657] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 5 is down Mon Aug 30 03:20:45 2021 kern.info kernel: [18485.849374] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 5 is up

A few minutes later radar gets detected. I doubt there were any active wifi connections to the AP at this time of night at all, otherwise there would've been AP-STA-DISCONNECTED messages in the log.

Mon Aug 30 03:22:47 2021 daemon.notice hostapd: wlan1: DFS-RADAR-DETECTED freq=5580 ht_enabled=1 chan_offset=1 chan_width=2 cf1=5590 cf2=0 Mon Aug 30 03:22:47 2021 daemon.notice hostapd: wlan1: DFS-NEW-CHANNEL freq=5620 chan=124 sec_chan=1 Mon Aug 30 03:22:47 2021 daemon.warn hostapd: DFS failed to schedule CSA (-22) - trying fallback Mon Aug 30 03:22:47 2021 daemon.notice hostapd: wlan1: AP-DISABLED Mon Aug 30 03:22:47 2021 kern.info kernel: [18608.416268] br-guest: port 2(wlan1-1) entered disabled state Mon Aug 30 03:22:47 2021 daemon.notice netifd: Network device 'wlan1-1' link is down Mon Aug 30 03:22:47 2021 kern.info kernel: [18608.429875] device wlan1-1 left promiscuous mode Mon Aug 30 03:22:47 2021 kern.info kernel: [18608.436128] br-guest: port 2(wlan1-1) entered disabled state Mon Aug 30 03:22:47 2021 daemon.notice hostapd: nl80211: deinit ifname=wlan1 disabled_11b_rates=0 Mon Aug 30 03:22:47 2021 kern.info kernel: [18608.451785] device wlan1 left promiscuous mode Mon Aug 30 03:22:47 2021 kern.info kernel: [18608.456490] br-lan: port 2(wlan1) entered disabled state Mon Aug 30 03:22:47 2021 daemon.notice netifd: Network device 'wlan1' link is down Mon Aug 30 03:22:47 2021 daemon.notice hostapd: wlan1: interface state ENABLED->DISABLED Mon Aug 30 03:22:48 2021 kern.info kernel: [18608.533860] br-lan: port 2(wlan1) entered blocking state Mon Aug 30 03:22:48 2021 kern.info kernel: [18608.539352] br-lan: port 2(wlan1) entered disabled state Mon Aug 30 03:22:48 2021 kern.info kernel: [18608.545172] device wlan1 entered promiscuous mode Mon Aug 30 03:22:48 2021 kern.info kernel: [18608.550183] br-lan: port 2(wlan1) entered blocking state Mon Aug 30 03:22:48 2021 kern.info kernel: [18608.555629] br-lan: port 2(wlan1) entered forwarding state Mon Aug 30 03:22:48 2021 daemon.notice hostapd: wlan1: interface state DISABLED->COUNTRY_UPDATE Mon Aug 30 03:22:48 2021 daemon.notice hostapd: wlan1: interface state COUNTRY_UPDATE->HT_SCAN Mon Aug 30 03:22:48 2021 daemon.notice hostapd: wlan1: DFS-RADAR-DETECTED freq=5580 ht_enabled=1 chan_offset=1 chan_width=2 cf1=5590 cf2=0 Mon Aug 30 03:22:48 2021 daemon.notice hostapd: wlan1: DFS-RADAR-DETECTED freq=5580 ht_enabled=1 chan_offset=1 chan_width=2 cf1=5590 cf2=0 Mon Aug 30 03:22:48 2021 daemon.notice hostapd: wlan1: DFS-RADAR-DETECTED freq=5580 ht_enabled=1 chan_offset=1 chan_width=2 cf1=5590 cf2=0 Mon Aug 30 03:22:48 2021 daemon.notice hostapd: wlan1: interface state HT_SCAN->DFS Mon Aug 30 03:22:48 2021 daemon.notice hostapd: wlan1: DFS-CAC-START freq=5620 chan=124 sec_chan=1, width=0, seg0=0, seg1=0, cac_time=60s Mon Aug 30 03:22:48 2021 kern.info kernel: [18609.462771] br-lan: port 2(wlan1) entered disabled state Mon Aug 30 03:23:49 2021 daemon.notice hostapd: wlan1: DFS-CAC-COMPLETED success=1 freq=5620 ht_enabled=1 chan_offset=1 chan_width=2 cf1=5630 cf2=0 Mon Aug 30 03:23:49 2021 kern.info kernel: [18670.346946] IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready Mon Aug 30 03:23:49 2021 kern.info kernel: [18670.353723] br-lan: port 2(wlan1) entered blocking state Mon Aug 30 03:23:49 2021 kern.info kernel: [18670.359196] br-lan: port 2(wlan1) entered forwarding state Mon Aug 30 03:23:49 2021 daemon.notice netifd: Network device 'wlan1' link is up Mon Aug 30 03:23:49 2021 kern.info kernel: [18670.375926] br-guest: port 2(wlan1-1) entered blocking state Mon Aug 30 03:23:49 2021 kern.info kernel: [18670.381996] br-guest: port 2(wlan1-1) entered disabled state Mon Aug 30 03:23:49 2021 kern.info kernel: [18670.388189] device wlan1-1 entered promiscuous mode Mon Aug 30 03:23:49 2021 kern.info kernel: [18670.402761] br-guest: port 2(wlan1-1) entered blocking state Mon Aug 30 03:23:49 2021 kern.info kernel: [18670.408580] br-guest: port 2(wlan1-1) entered forwarding state Mon Aug 30 03:23:49 2021 kern.info kernel: [18670.433267] IPv6: ADDRCONF(NETDEV_CHANGE): wlan1-1: link becomes ready Mon Aug 30 03:23:49 2021 daemon.notice netifd: Network device 'wlan1-1' link is up Mon Aug 30 03:23:49 2021 daemon.notice hostapd: wlan1: interface state DFS->ENABLED Mon Aug 30 03:23:49 2021 daemon.notice hostapd: wlan1: AP-ENABLED Mon Aug 30 03:54:04 2021 daemon.notice hostapd: wlan1: DFS-NOP-FINISHED freq=5580 ht_enabled=0 chan_offset=0 chan_width=0 cf1=5580 cf2=0 Mon Aug 30 03:54:04 2021 daemon.notice hostapd: wlan1: DFS-NOP-FINISHED freq=5600 ht_enabled=0 chan_offset=0 chan_width=0 cf1=5600 cf2=0

So I check hairpin:

root@hank2:/sys# find -name hairpin_mode ./devices/pci0000:00/0000:00:00.0/net/wlan1/brport/hairpin_mode ./devices/pci0000:00/0000:00:00.0/net/wlan1-1/brport/hairpin_mode ./devices/platform/ahb/18100000.wmac/net/wlan0-1/brport/hairpin_mode ./devices/platform/ahb/18100000.wmac/net/wlan0/brport/hairpin_mode ./devices/virtual/net/eth0.1/brport/hairpin_mode root@hank2:/sys# find -name hairpin_mode -exec cat {} \; 0 1 1 1 0 root@hank2:/sys#

Then I did this to make the connection work again:

root@hank2:/sys# echo 1 > devices/pci0000:00/0000:00:00.0/net/wlan1/brport/hairpin_mode

In Peter's log I see the same log entry when it fails:

DFS failed to schedule CSA (-22) - trying fallback

So it looks like the issue may be in the fallback method I guess. If it's not triggered everything seems fine.

Kind regards, Seb