libremesh / lime-packages

LibreMesh packages configuring OpenWrt for wireless mesh networking
https://libremesh.org/
GNU Affero General Public License v3.0

MESH-SAE-AUTH-FAILURE #837

Open rallep71 opened 3 years ago

rallep71 commented 3 years ago

I have three routers in my Lime mesh network, TP WDR 4300, TP Archer C50 V3 and V4.

I built the firmware for all of them according to the instructions. What is the error?

Thu Dec 24 11:54:04 2020 daemon.notice wpa_supplicant[2606]: wlan0-mesh: MESH-SAE-AUTH-FAILURE addr=b0:4e:26:45:63:ac
Thu Dec 24 11:54:23 2020 daemon.notice wpa_supplicant[2606]: wlan0-mesh: MESH-SAE-AUTH-FAILURE addr=b0:4e:26:45:63:ac
Thu Dec 24 11:54:38 2020 daemon.notice wpa_supplicant[2606]: wlan0-mesh: MESH-SAE-AUTH-FAILURE addr=b0:4e:26:45:63:ac
Thu Dec 24 11:54:38 2020 daemon.notice wpa_supplicant[2606]: wlan0-mesh: MESH-SAE-AUTH-BLOCKED addr=b0:4e:26:45:63:ac duration=300
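
To see which peer is failing and how often, log entries like these can be tallied per address. This is only a minimal sketch, assuming the log format shown above; the function name count_sae_failures is hypothetical, and on a router the input would come from logread.

```shell
#!/bin/sh
# Tally MESH-SAE-AUTH-FAILURE events per peer address.
# Reads log text on stdin and prints one "<count> <addr>" line per peer.
count_sae_failures() {
    grep -o 'MESH-SAE-AUTH-FAILURE addr=[0-9A-Fa-f:]*' |
        sed 's/.*addr=//' |
        sort | uniq -c |
        awk '{ print $1, $2 }'
}

# Usage on a router: logread | count_sae_failures
```

MESH-SAE-AUTH-BLOCKED lines are deliberately not counted, since they report the resulting block rather than a separate failure.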

ghost commented 3 years ago

MANUAL WORKAROUND FOUND: "DISABLE, WAIT FOR 5+ MINUTES, RE-ENABLE MESH"

Aug 31 09:26:03 Disabled the mesh SSID interface via LUCI manually.

(Waited for 6 minutes, "more than 5 minutes", then re-enabled the mesh SSID interface via LUCI)

Aug 31 09:34:11 HST-WifiAP-01 hostapd: wlan0-1: AP-ENABLED
Aug 31 09:34:18 WifiAP-01 netifd: Interface 'nwi_mesh0' is now up
Aug 31 09:34:19 WifiAP-01 wpa_supplicant[7331]: wlan0: new peer notification for 68:ff:7b:0e:xx:xx
Aug 31 09:34:19 WifiAP-01 wpa_supplicant[7331]: wlan0: new peer notification for 68:ff:7b:0e:xx:xx
Aug 31 09:34:19 WifiAP-01 wpa_supplicant[7331]: wlan0: new peer notification for 68:ff:7b:0e:xx:xx
Aug 31 09:34:19 WifiAP-01 wpa_supplicant[7331]: wlan0: new peer notification for 68:ff:7b:0e:xx:xx
Aug 31 09:34:19 WifiAP-01 wpa_supplicant[7331]: wlan0: new peer notification for 68:ff:7b:0e:xx:xx
Aug 31 09:34:19 WifiAP-01 wpa_supplicant[7331]: wlan0: mesh plink with 68:ff:7b:0e:xx:xx established
Aug 31 09:34:19 WifiAP-01 wpa_supplicant[7331]: wlan0: MESH-PEER-CONNECTED 68:ff:7b:0e:xx:xx

The mesh link came up immediately and was usable.

ilario commented 3 years ago

Would this be enough to do once per day?

ghost commented 3 years ago

Well, if you'd like to do a workaround script like a watchdog, then I'd suggest:

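#!/bin/sh
# Watchdog: when wpa_supplicant blocks a mesh peer for 300 s,
# take the interface down, wait out the block, then bring it back up.
logread -f | while read -r line; do
    case "$line" in
        *"MESH-SAE-AUTH-BLOCKED addr="*"duration=300"*)
            ifconfig wlan0 down
            sleep 330
            ifconfig wlan0 up
            ;;
    esac
done

nemesifier commented 3 years ago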

I observed this again today. I have two interfaces:

  • mesh0 on 2GHz
  • mesh1 on 5GHz

mesh1 on 5GHz works, and the connection to the rest of the LAN works. It looks to me as if the mesh0 link gets disabled automatically due to inactivity:

mesh0     ESSID: "*******"
          Access Point: ***********************
          Mode: Mesh Point  Channel: 11 (2.462 GHz)
          Center Channel 1: 11 2: unknown
          Tx-Power: 20 dBm  Link Quality: 46/70
          Signal: -64 dBm  Noise: unknown
          Bit Rate: 1.0 MBit/s
          Encryption: WPA3 SAE (CCMP)
          Type: nl80211  HW Mode(s): 802.11bgn
          Hardware: 14C3:7603 14C3:7603 [MediaTek MT7603E]
          TX power offset: none
          Frequency offset: none
          Supports VAPs: yes  PHY name: phy0

Look at Bit Rate: 1.0 MBit/s. I've looked around and it seems that's the result of the mac80211 rate control algorithm when it detects inactive WiFi links. So could the MESH-SAE-AUTH-FAILURE in this case just be a result of the minstrel_ht rate control tuning down this WiFi interface because it's not being used? BTW, here's the output of iw dev mesh0 station dump:

Station *********** (on mesh0)
    inactive time:  36 ms
    rx bytes:   111090
    rx packets: 1380
    tx bytes:   3584
    tx packets: 28
    tx retries: 8
    tx failed:  0
    rx drop misc:   173
    signal:     -64 [-64, -71] dBm
    signal avg: -63 [-63, -71] dBm
    Toffset:    29762706465 us
    tx bitrate: 1.0 MBit/s
    tx duration:    54336 us
    rx duration:    0 us
    airtime weight: 256
    mesh llid:  0
    mesh plid:  0
    mesh plink: BLOCKED
    mesh airtime link metric: -1
    mesh connected to gate: no
    mesh connected to auth server:  no
    mesh local PS mode: UNKNOWN
    mesh peer PS mode:  UNKNOWN
    mesh non-peer PS mode:  ACTIVE
    authorized: no
    authenticated:  no
    associated: no
    preamble:   long
    WMM/WME:    yes
    MFP:        yes
    TDLS peer:  no
    DTIM period:    2
    beacon interval:100
    connected time: 71 seconds
    associated at [boottime]:   0.000s
    associated at:  1625990421471 ms
    current time:   1626024600677 ms

Notice: mesh plink: BLOCKED.
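
A station dump like the one above can be scanned for blocked peers automatically. The following is a rough sketch, assuming the field layout shown in the dump; the function name blocked_peers is hypothetical.

```shell
#!/bin/sh
# List stations whose mesh peer link is in the BLOCKED state.
# Reads `iw dev <iface> station dump` output on stdin.
blocked_peers() {
    awk '
        # Start of a new station record: remember its address.
        $1 == "Station" { addr = $2 }
        # The "mesh plink:" field of the current record.
        $1 == "mesh" && $2 == "plink:" && $3 == "BLOCKED" { print addr }
    '
}

# Usage on a router: iw dev mesh0 station dump | blocked_peers
```

An empty result means no peer on that interface is currently blocked.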

An update on this: I upgraded to OpenWrt 21.02 RC4 and added option cell_density '1' to the radio configuration, which should avoid the problem of minstrel HT tuning down the radio so much that the mesh would not be able to connect.
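
For reference, one way to apply that option from the command line is sketched below. The section name radio0 is an assumption; check /etc/config/wireless for the name of the radio that carries the mesh interface.

```shell
# Assumption: the radio used for the mesh is the section named 'radio0'.
uci set wireless.radio0.cell_density='1'
uci commit wireless
# Reload the wireless configuration so the change takes effect.
wifi reload
```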

Another important update on this subject!

After the changes mentioned in my previous comments I didn't get many of these error messages anymore. However, I am still having connection issues from time to time (say once or twice a week) and the OpenWrt log shows no trace of them.

I dug deeper and found that when I have connection issues, the routing table of the mesh router gets filled with garbage, which causes a black hole.

So I wonder if more people like me who have been having issues may have been confused by the MESH-SAE-AUTH-FAILURE log lines and thought that's the cause of the issue they are having, while maybe the real problem is elsewhere.

I have described more in detail the issue here: https://forum.openwrt.org/t/mesh-802-11s-routing-table-gets-filled-with-garbage-21-02-rc4/104808

I am using plain 802.11s. I wonder whether this bug shows up only with the default 802.11s routing protocol, or whether it could also happen with the protocols used by LibreMesh.

If anyone reading this is having a similar issue, please read the forum post above and check your routing table with iw <interface_name> mpath dump and let us know. If this bug is affecting us also when the default mesh forwarding protocol is replaced with another protocol, we will have a lot of issues on OpenWrt 21.02 with mesh!

nemesifier commented 2 years ago

Today I had this MESH-SAE-AUTH-BLOCKED issue again. I can confirm that the issue I described in my previous comment is a different beast.

nemesifier commented 2 years ago

I opened a bug report in the OpenWrt bug tracker. If you have any useful information (e.g. device/hardware info where this is happening), please add it there: https://bugs.openwrt.org/index.php?do=details&task_id=4098

CC: @dangowrt @ilario @Catfriend1 @djStolen

mickeyreg commented 2 years ago

Hi,

After half a year, I returned to the problem...

Again with my C2, but this time I wanted to set up an encrypted mesh on the 2.4GHz interface of this device. At first, again without success :( But after changing all the options and upgrading OpenWrt to the newest 19.07, I changed my Google query and found this:

https://github.com/openwrt/mt76/issues/72

And in particular:

I had to make the following change: in /etc/modules.d/rt2800-soc change it from: rt2800soc to: rt2800soc nohwcrypt=1

The 2.4GHz interface in the C2 is based on rt2880. So I think I turned off hardware encryption for this module, and after this change the encrypted mesh started working. So it looks like SAE encryption is not supported, or not properly supported, by the hardware or driver, and the solution is to switch to software mode? With a performance decrease?
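
The edit quoted above can be scripted so that it is applied only once. This is a minimal sketch, not a tested procedure for any particular device: the function name set_nohwcrypt is hypothetical, and it assumes the modules.d file holds one module name per line, optionally followed by parameters.

```shell
#!/bin/sh
# Append nohwcrypt=1 to a module's line in a modules.d file.
# $1: path to the modules.d file, $2: module name as written in that file.
# Lines that already contain nohwcrypt=1 are left untouched.
set_nohwcrypt() {
    sed -i "/nohwcrypt=1/!s/^$2\( .*\)*\$/& nohwcrypt=1/" "$1"
}

# Usage on a router: set_nohwcrypt /etc/modules.d/rt2800-soc rt2800soc
```

The parameter is read when the module is loaded, so the change presumably only takes effect after a reboot or a module reload.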

I don't know if the mt76 problem is similar, but it looks like for some chipsets the problem is in the driver or even in the hardware. Is it possible to make a similar change for the mt76 hardware?

Regards, Mickey

dangowrt commented 2 years ago

The problem with using SAE/WPA3 with rt2800 based hardware (such as MT7620 WiSoC) is the lack of Management Frame Protection (IEEE 802.11w) which is mandatory when used with SAE/WPA3. And yes, doing all crypto in software significantly lowers the performance.

Newer hardware does not have this problem, it's only MT7620 and older Ralink chips which do not support MFP.

ilario commented 1 year ago

Sorry for ignoring this issue over the last years... Do we have any conclusion, at least partial? Something like "never use wolfssl for encrypting mesh, always use openssl for that"?

mickeyreg commented 1 year ago

The problem with using SAE/WPA3 with rt2800 based hardware (such as MT7620 WiSoC) is the lack of Management Frame Protection (IEEE 802.11w) which is mandatory when used with SAE/WPA3. And yes, doing all crypto in software significantly lowers the performance.

Newer hardware does not have this problem, it's only MT7620 and older Ralink chips which do not support MFP.

Based on my recent experience, it is not only a problem with the chips mentioned. I wanted to set up a mesh link using an old TL-WA901ND v2: https://openwrt.org/toh/tp-link/tl-wa901nd. I prepared OpenWrt 19.07 with wpad-mesh-wolfssl using the imagebuilder. I had to remove some packages, but as a result I had a working ar71xx image.

I configured the mesh without encryption and it worked fine. Then I configured SAE encryption and my mesh link stopped working. But in the log I saw wpa_supplicant: wlan0: MESH-PEER-CONNECTED, suggesting that everything was working :/ Further analysis showed that pings pass, but well below 1% of them. So, based on my previous experience, I put ath9k nohwcrypt=1 into /etc/modules.d/ath9k, and the mesh link works fine with encryption. It is half as fast as without encryption, but the link is reliable.

My questions are:

  1. Can I use nohwcrypt=1 with any chipset to make SAE encryption work?
  2. How can I check whether Management Frame Protection (IEEE 802.11w) is fully supported by the driver?
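
As a partial aid for the second question, one can at least check whether the driver advertises the CMAC (BIP) cipher that 802.11w uses for protecting group management frames. This is a hedged sketch: it only reports what appears in the cipher list printed by iw, it does not prove that MFP works reliably with SAE on a given chipset, and the function name has_mfp_cipher is hypothetical.

```shell
#!/bin/sh
# Succeed if an `iw phy` capability listing advertises the CMAC (BIP) cipher.
# Reads `iw phy <phy> info` output on stdin.
has_mfp_cipher() {
    grep -q 'CMAC (00-0f-ac:6)'
}

# Usage on a router (phy name is an assumption):
#   iw phy phy0 info | has_mfp_cipher && echo "CMAC advertised"
```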