xoseperez / espurna

Home automation firmware for ESP8266-based devices
http://tinkerman.cat
GNU General Public License v3.0
2.99k stars 637 forks

Periodically scan WiFi networks to find the best one #2064

Closed ghost closed 3 years ago

ghost commented 4 years ago

Implement new functionality to force a wireless reconnection every X seconds.

In a wireless environment with several APs with the same SSID, I enable wifiScan. However, the devices stay connected to the first AP they connected to.

If a new AP with better RSSI is activated, devices do not change BSSID. They never connect to this new AP.

With this feature each X seconds devices will force a reconnect and use the wifiScan algorithm to connect to the AP with best RSSI.

This scenario also happens if one of the APs is rebooted. Devices reconnect to another AP with a low stability link, but they never connect back to the best AP.

davebuk commented 4 years ago

This would be useful. I used to have the access point at the bottom of my garden reboot at night. The devices near it would connect to the weaker AP of the same SSID, and never reconnect to the stronger signal. A reboot would sometimes be required as I couldn't reliably get to the webUI. Could it be added to the scheduler section?

Personally, I know I shouldn't have to, but a device reboot auto scheduled for say, every week or so would be good. Very occasionally I'll have a device that stops responding and it needs a reboot.

ghost commented 4 years ago

I was suggesting only a wifi reconnect of the "espurna" device; A "reboot" of the espurna device would trigger a power cycle if it is an S20, or some other kind of plug/relay/power functionality.

I have a similar scenario to yours. If I reboot the AP and the espurna device connects to the lower-strength AP, TCP manages to work, but NTP (UDP) does not manage to sync, and the scheduling of the espurna devices does not work.

mcspr commented 4 years ago

Maybe related to https://github.com/xoseperez/justwifi/issues/17 as well, but I am not sure about this part:

"With this feature each X seconds devices will force a reconnect and use the wifiScan algorithm to connect to the AP with best RSSI."

On what condition? RSSI below certain threshold? Or it needs to remember best RSSI variable across re-connections instead of just at the moment of scan and try to match it again?

davebuk commented 4 years ago

Can the device do a scan every, say, 10/15 mins? If it finds more than one SSID of the same name, connect to the stronger one. If another known SSID is configured and stronger, connect to that one. If already connected to the strongest SSID out of the ones configured, don't do anything.
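The decision rule described above could be sketched roughly as follows. This is a host-side illustration only: `ScanEntry`, `strongestKnown` and `shouldSwitch` are hypothetical names, not part of JustWifi or espurna, and the scan results are assumed to be already collected.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One entry of a scan result. Field names are illustrative, not JustWifi's.
struct ScanEntry {
    std::string ssid;
    std::string bssid;  // e.g. "AA:BB:CC:DD:EE:01"
    int rssi;           // dBm, closer to 0 is stronger
};

// Returns a pointer to the strongest scanned AP whose SSID is configured,
// or nullptr when no configured SSID was seen. The pointer refers to an
// element of `scan`, so it is only valid while `scan` is alive.
const ScanEntry* strongestKnown(const std::vector<ScanEntry>& scan,
                                const std::vector<std::string>& configured) {
    const ScanEntry* best = nullptr;
    for (const auto& entry : scan) {
        const bool known = std::find(configured.begin(), configured.end(),
                                     entry.ssid) != configured.end();
        if (!known) continue;
        if (best == nullptr || entry.rssi > best->rssi) best = &entry;
    }
    return best;
}

// True only when the scan found a configured AP that differs from the one
// currently in use; an empty or unhelpful scan means "do nothing".
bool shouldSwitch(const ScanEntry* best, const std::string& currentBssid) {
    return best != nullptr && best->bssid != currentBssid;
}
```

Comparing by BSSID rather than SSID is what lets this tell apart two APs that share one network name.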

ghost commented 4 years ago

I was thinking of using it to do a full reconnect to the wifi each hour. This should not hurt too much. It would trigger an MQTT reconnect, an NTP reconnect, a DHCP refresh, etc. I have noticed that different versions of the ESP core behave differently with the same APs, that some APs have trouble keeping the connection under certain circumstances, etc. A full reconnect at regular intervals, initiated by the client, is the best effort to keep the connectivity to the network in working order.
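For illustration, a timer check for such an hourly reconnect might look like the sketch below. It assumes a `millis()`-style 32-bit millisecond counter that wraps around, and it treats an interval of 0 as "disabled"; the function name `reconnectDue` is made up for this example.

```cpp
#include <cstdint>

// Rollover-safe "is it time yet?" check for a 32-bit millisecond counter.
// intervalMs == 0 is treated as "disabled" (an assumption of this sketch).
bool reconnectDue(std::uint32_t nowMs, std::uint32_t lastMs,
                  std::uint32_t intervalMs) {
    if (intervalMs == 0) return false;
    // Unsigned subtraction still yields the elapsed time after the counter
    // has wrapped past its maximum value.
    return static_cast<std::uint32_t>(nowMs - lastMs) >= intervalMs;
}
```

Comparing the elapsed difference, rather than `nowMs >= lastMs + intervalMs`, avoids a missed or early trigger when the counter overflows.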

mcspr commented 4 years ago

Sure, but I am leaning towards periodic scan. The only issue is with JustWifi expecting it to always happen as part of the connection loop, thus it needs to learn to do it whenever we want to and update internal networks rating with new BSSID based on RSSI values. Then it is just a matter of disconnecting from the current AP and the connection loop will automatically choose a better one when deciding which network to pick.

Regarding Core versions and AP problems (Mikrotik?), that's just a workaround. And I would expect Core versions 2.6.2+ to behave much better than 2.3.0 if you are having issues with unresponsive devices. I'd rather not do it if we don't really have to.

Another option is to have forced BSSID & channel as part of the settings, so it will never pick up a wrong AP
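As a rough sketch of what a forced "BSSID & channel" setting would need: the BSSID has to be turned from text into six bytes before it can be handed to the WiFi stack. The `PinnedAp` struct and `parseBssid` helper below are hypothetical; on the ESP8266 Arduino core the resulting values would go to the `WiFi.begin(ssid, passphrase, channel, bssid)` overload.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

using Bssid = std::array<std::uint8_t, 6>;

// Hypothetical settings record for an AP pinned by BSSID and channel.
struct PinnedAp {
    std::string ssid;
    Bssid bssid;
    int channel;
};

// Returns the value of a hex digit, or -1 if the character is not one.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Parses "AA:BB:CC:DD:EE:FF" into six bytes. Returns false on malformed
// input and leaves `out` untouched in that case.
bool parseBssid(const std::string& text, Bssid& out) {
    if (text.size() != 17) return false;
    Bssid parsed{};
    for (std::size_t i = 0; i < 6; ++i) {
        const std::size_t pos = i * 3;
        const int hi = hexValue(text[pos]);
        const int lo = hexValue(text[pos + 1]);
        if (hi < 0 || lo < 0) return false;
        if (i < 5 && text[pos + 2] != ':') return false;
        parsed[i] = static_cast<std::uint8_t>((hi << 4) | lo);
    }
    out = parsed;
    return true;
}
```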

ghost commented 4 years ago

I am recompiling the code of 1.14.0 myself and using 2.6.2. When connecting to an Asus access point, the ESP devices tend to work properly. With TP-Link routers they get stuck sometimes and only a power cycle helps. With core 2.3.0 the issues were happening with any kind of AP.

However, even when using 2.6.2 and setting aside connectivity problems from the wifi stack of the Core version, this functionality would still be useful (reconnect at cyclic intervals). I observe that many ESP devices stay connected to an SSID with lower RSSI. Imagine an apartment block with APs from neighbors with a scheduled on/off wireless. The list of SSIDs, the channel occupancy, and which SSID/BSSID is on which channel fluctuate quite often. I force the channel in my APs to mitigate this.

As for a forced "BSSID & channel", that is the same as having an independent SSID per AP.

mcspr commented 4 years ago

By stuck, do you mean they lose the connection and never connect again, or lose the IP but keep the connection (#614, supposedly fixed by #1877)? There are a lot of SDK options in the current Core: https://github.com/esp8266/Arduino/blob/99aeeadb4d672778952cfd1a23d94a998f930ee0/tools/platformio-build.py#L167-L197 I have not had a lot of issues with either, but the Tasmota maintainers proposed using NONOSDK22x_190703 as the default, based on their own and their users' experience.

My only issue with a forced disconnect before any scanning is the possible loss of any connection, as I routinely see the esp8266 missing the target SSID in an apartment WiFi environment with a lot of other APs, despite there being only 2 meters between them. JustWifi would need to change for this too, so that it reconnects to the previous AP when no new networks are found.

ghost commented 4 years ago

I am using 2.6.2 and have connectivity issues. It is much better than 2.3.0, but they still happen with certain APs.

mcspr commented 4 years ago

What I meant is to try out a different WiFi SDK included with the Core. The default is July's, but Espressif claims to have fixed some connectivity issues in the November one. The script source that I linked checks for specific C defines that are supposed to go into the build_flags = ... variable in platformio.ini (e.g. how we change the LWIP version - https://docs.platformio.org/en/latest/platforms/espressif8266.html#lwip-variant). The Arduino IDE has a menu option for this if you select the Generic board.

How do these connectivity issues manifest?


On the topic... I am still trying out different approaches for scanning.

ghost commented 4 years ago

Hello,

I would like to clarify the topics:

mcspr commented 4 years ago

Yes, sorry for offtopic. When you do have some more info about build configuration, please do report that in a separate issue. I hope we can figure out some way to avoid those.

For the issue at hand, I need some real device to test things first, as I just played around with justwifi state machine so far. Will push an update asap

davebuk commented 4 years ago

• Need for espurna devices to reconnect to the AP with the best RSSI: I believe this is not handled in hardware, the SDK, the Core, or espurna (justwifi). I have APs that can reboot, disconnect, or suffer interference. The espurna device loses the signal and connects to an AP with weak RSSI, with extremely slow pings and lost UDP packets. The espurna device never reconnects to the strong-RSSI AP when it is back. I would like a mechanism that tries to reconnect to a stronger AP from time to time (with an interval of 0, this regular reconnection is disabled).

WRT this topic, can the RPN rules be used here? Either using RSSI value as a trigger, uptime or daily at a specific time and run the terminal command wifi.reset. Can the rules module send terminal commands?

ghost commented 4 years ago

I tried to check the documentation on RPN... But there is a need for a global counter to store the last reconnect (for example, by copying the uptime value) and for a way to execute terminal commands. Neither of these features is implemented in RPN.

mcspr commented 4 years ago

See #2088

JustWifi callback now sends MESSAGE_FOUND_BETTER_NETWORK as message whenever jw.setPeriodicScanInterval() was called with some >= 0 value (in milliseconds) at setup. Library default is 5 minutes, PR has it disabled. If connection drops to fallback mode, timer resets and it should try to use basic scan.

Right now it will compare new RSSI with the current one and if the difference is >= 20 (wifiScanRSSI), it will reconnect. wifiScanIntvl (ms) setting controls rescan interval.
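The comparison described here amounts to something like the following sketch. The function name `foundBetterNetwork` and its default of 20 are illustrative, taken from the numbers in this comment rather than from the actual PR code.

```cpp
// Returns true when the candidate AP is stronger than the current one by at
// least `minGain` dB. RSSI values are in dBm (negative, closer to 0 is
// stronger). The default mirrors the ">= 20" figure quoted in the thread.
bool foundBetterNetwork(int currentRssi, int candidateRssi, int minGain = 20) {
    return (candidateRssi - currentRssi) >= minGain;
}
```

Requiring a minimum gain, instead of switching on any improvement, keeps a device from bouncing between two APs whose readings differ by only a few dB from scan to scan.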

Have not tested besides dummy host program in justwifi test folder and a very quick test with 2 APs

mcspr commented 4 years ago

Regarding RPN, it is true that there should be some temporary storage for temporary statistics. There are variables, but I don't see any way to store a variable from a rule; the only way is through MQTT topics.

I guess a basic rssi ≥ -75 can also work, but there needs to be a new operator returning WiFi.RSSI() value.
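To make the idea concrete, here is a toy RPN evaluator with such an operator. It is not the RPN library espurna uses and does not reflect its API: the `rssi` token, the `evalRpn` function and the injected `readRssi` callback are all assumptions of this sketch (on a device the callback would wrap `WiFi.RSSI()`).

```cpp
#include <cstddef>
#include <functional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Evaluates a space-separated RPN expression. Supports integer literals,
// the ">=" comparison, and a hypothetical "rssi" token that pushes the
// value returned by readRssi. Returns false on malformed input.
bool evalRpn(const std::string& expr, const std::function<int()>& readRssi,
             int& result) {
    std::vector<int> stack;
    std::istringstream tokens(expr);
    std::string token;
    while (tokens >> token) {
        if (token == "rssi") {
            stack.push_back(readRssi());
        } else if (token == ">=") {
            if (stack.size() < 2) return false;
            const int rhs = stack.back();
            stack.pop_back();
            const int lhs = stack.back();
            stack.pop_back();
            stack.push_back(lhs >= rhs ? 1 : 0);
        } else {
            try {
                std::size_t used = 0;
                const int value = std::stoi(token, &used);
                if (used != token.size()) return false;
                stack.push_back(value);
            } catch (const std::exception&) {
                return false;  // not a number and not a known operator
            }
        }
    }
    if (stack.size() != 1) return false;
    result = stack.back();
    return true;
}
```

With this, the rule from the comment would be written as `rssi -75 >=`, leaving 1 on the stack when the signal is at least -75 dBm and 0 otherwise.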

mcspr commented 4 years ago

* see #2088 to fix bug in JustWifi branch, when scan function immediately disconnected STA

mcspr commented 4 years ago

Any feedback for the #2088 so far? 👀

ghost commented 4 years ago

I have recompiled 1.14.1 with this branch of the justwifi library, https://github.com/mcspr/justwifi.git#better-networks, and core 2.6.3.

I notice an overall improvement with the situation. Overall, devices seem to connect to a better AP.

However, I notice that some devices get disconnected and are unable to reconnect properly to the wifi. I see in the monitoring logs that they try to reconnect every 10-15 min but are unable to stay connected. The AP is 3 m away with good RSSI. Unplugging and replugging the device solves the situation. It seems that there are still issues with Core 2.6.3. However, with ESPEasy and custom developments I do not get this issue. In both of those cases I force a reconnect each hour.

I also notice some Espurna devices that are unable to connect to MQTT, although NTP is connected. Forcing a MQTT.reset command via telnet does not bring any improvement. Sometimes a Wifi reset command works and afterwards there is MQTT connectivity; other times a full reset is needed.

mcspr commented 4 years ago

Do you mean that you used the #2088 changes on top of 1.14.1? Which scan time have you used? Any other settings related to wifi? I am not sure what you meant, because the library itself does not trigger re-connection; espurna needs to handle a specific message from justwifi when it finds a network with better RSSI.

Any specific characteristics of those devices (vendor, board type, location, etc.)? Are they losing the connection after the periodic scan added in the new justwifi branch, or do you mean router logs? The WiFi sleep setting comes to mind, since ESPEasy and default Arduino use MODEM sleep, while we use NONE: https://github.com/xoseperez/espurna/blob/62ad7da332f3f904ad8241a2b738be9820b196e4/code/espurna/config/general.h#L457 https://github.com/xoseperez/espurna/blob/62ad7da332f3f904ad8241a2b738be9820b196e4/code/espurna/wifi.ino#L105-L108 (wifiSleep 0 for NONE, 1 for LIGHT and 2 for MODEM; the light one is tricky performance-wise though) https://github.com/letscontrolit/ESPEasy/blob/b747fa571ef77f0f82ffa77997fbba7a2e140837/src/Custom-sample.h#L60
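The wifiSleep numbering mentioned above can be pictured with the small host-side sketch below. The enum and helper names are made up for illustration; the numeric values match the ESP8266 Arduino core's `WIFI_NONE_SLEEP`, `WIFI_LIGHT_SLEEP` and `WIFI_MODEM_SLEEP` constants, which is what would be passed to `WiFi.setSleepMode()`. Falling back to NONE for out-of-range values is an assumption of this sketch.

```cpp
#include <string>

// Illustrative mirror of the sleep modes selectable through wifiSleep.
enum class SleepMode { None = 0, Light = 1, Modem = 2 };

// Maps the numeric wifiSleep setting to a mode. Unknown values fall back to
// None here (an assumption; None is the default mentioned in the thread).
SleepMode sleepModeFromSetting(int value) {
    switch (value) {
        case 1: return SleepMode::Light;
        case 2: return SleepMode::Modem;
        default: return SleepMode::None;
    }
}

// Human-readable name, e.g. for a debug log line.
std::string sleepModeName(SleepMode mode) {
    switch (mode) {
        case SleepMode::Light: return "LIGHT";
        case SleepMode::Modem: return "MODEM";
        default: return "NONE";
    }
}
```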

ghost commented 4 years ago

Please see the attached file for the json settings, "wifiSleep": "0", iot-plug-office-printer.randomizer.space.json.txt

I use sonoff devices (pow, pow2, rfbridge, s20, slampher)

I use espurna 1.14.1 + justwifi branch https://github.com/xoseperez/justwifi/compare/master...mcspr:better-networks + SDK version 2.2.2-dev(38a443e) + Core version 2.6.3

BTW, the SDK version message in the web UI main page could be modified to use ESP.getFullVersion() instead of ESP.getSdkVersion()

mcspr commented 4 years ago

That's what I meant: looking at iot-plug-office-printer.randomizer.space.json.txt, there is no wifiScanIntvl or wifiScanRSSI, so I am not seeing those (unless you set them as the flags WIFI_PERIODIC_SCAN_INTERVAL or WIFI_PERIODIC_SCAN_RANGE; the default scan time is 0 ms == disabled). I might set some default scan time though and have an additional "enabled" flag, not sure which one makes sense here.
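A sketch of how a runtime setting could fall back to a build flag, so that a settings dump without these keys still yields defined values. The `Settings` map, `settingOr` and `loadScanConfig` are hypothetical helpers, not espurna's settings API; the pairing of wifiScanRSSI with WIFI_PERIODIC_SCAN_RANGE and its default of 20 are assumptions based on the order and numbers given in this thread.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Compile-time defaults; a real build could override these with -D flags.
#ifndef WIFI_PERIODIC_SCAN_INTERVAL
#define WIFI_PERIODIC_SCAN_INTERVAL 0  // ms, 0 == disabled
#endif
#ifndef WIFI_PERIODIC_SCAN_RANGE
#define WIFI_PERIODIC_SCAN_RANGE 20    // assumed default RSSI difference
#endif

// Settings are kept as strings, as in the exported JSON ("wifiSleep": "0").
using Settings = std::map<std::string, std::string>;

// Returns the numeric value stored under `key`, or `fallback` when the key
// is missing or does not hold a number.
long settingOr(const Settings& settings, const std::string& key, long fallback) {
    const auto it = settings.find(key);
    if (it == settings.end()) return fallback;
    try {
        return std::stol(it->second);
    } catch (const std::exception&) {
        return fallback;
    }
}

struct ScanConfig {
    long intervalMs;
    long rssiGain;
    bool enabled;
};

ScanConfig loadScanConfig(const Settings& settings) {
    ScanConfig config{};
    config.intervalMs = settingOr(settings, "wifiScanIntvl", WIFI_PERIODIC_SCAN_INTERVAL);
    config.rssiGain = settingOr(settings, "wifiScanRSSI", WIFI_PERIODIC_SCAN_RANGE);
    config.enabled = config.intervalMs > 0;  // 0 ms means no periodic scan
    return config;
}
```

Deriving `enabled` from the interval is one of the two options floated above; a separate "enabled" flag would simply add a third key.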

But I still need to look at justwifi again to make sure it does not lock things somehow so new networks appear.

mcspr commented 4 years ago

I have synced #2088 with the dev and set default to 3 minute scan interval. So far my understanding was that the scanning works and it does not cause any serious issues with the existing connectivity (since network stack is busy flipping channels, not managing any traffic).

Any updates on reliability of the whole thing? I assume the OT SDK issues were solved in the meantime

mcspr commented 3 years ago

Note from https://gitter.im/tinkerman-cat/espurna?at=6072fa89969f8b38ee702b4f: the initial commit introduced a bug where trying a non-existent AP locked the device, since the SDK function did not report that the connection failed :/ In addition to that, reloading the configuration (webui or reload) while disconnected, but just before the reconnect timer finished, caused a logic error and started the reconnect too soon, without any helper objects present.

Per #2088 and the recent commits, that should be fixed. Closing this since the feature is merged; any issues should be tracked separately.