rstrouse / ESPSomfy-RTS

A controller for Somfy RTS shades and blinds
The Unlicense

Wifi connection timeout #377

Closed alka79 closed 2 weeks ago

alka79 commented 1 month ago

Describe the new feature you'd like

My wifi access point sometimes has a life of its own. It randomly resets the 2.4 GHz band every day or two! There must be too many IoT devices on my wifi! When this happens, 2.4 GHz wifi is lost for about 40 seconds. All the other devices on my network recover almost immediately when wifi comes back, but ESPSomfy took several more minutes. That made me investigate, and it highlighted an ESPSomfy wifi reconnection behaviour that is not optimal IMHO, which leads to the suggestions below.

The way it works, as far as I could understand:

So in the end, the board recovers from a temporary wifi failure but takes some extra time.

IMHO this could be improved.

I have done some minimal changes to the code to accommodate:

I understand the need to re-enter SoftAP mode in case the wifi settings must be changed. Fortunately this occurs rarely and is predictable, so I don't mind waiting a few extra minutes when it happens.

BTW, as I understand it, an easy way to force a restart in SoftAP mode is to wipe the SSID in the settings and reset. The question was asked once before, but I am not sure what the answer was at the time.

rstrouse commented 1 month ago

Yeah, I kinda wandered my way here. The timing I ended up with pretty much came from requests like these to accommodate some challenging environments. Originally, the software checked the boot button and would pop into AP mode, and the LED would flash. This proved to be a no-go given the myriad of boards out there that use different pins for these things. I looked really hard at preserving this, but using the built-ins is a pipe dream without compiling for every board variant. Github would probably shut it down.

So perhaps it is time to revisit this again given the way wired connections work. If a wired connection is established, it immediately pops out of AP mode and establishes a link. I think I may be able to do the same thing with AP_STA mode, but it won't be as elegant as simply dropping the AP mode and connecting. Where this gets weird is that there is only one wifi radio, and I am unsure what it does when changing modes.

I really have three conditions that need to be taken care of. The first is a non-existent SSID, which is detected by checking whether one has been entered or whether that SSID shows up in a scan. The second is where the SSID exists but a connection cannot be established. This could be the result of the AP publishing its SSID before it is ready to issue IPs, a changed passphrase, or some general error in the link negotiation.
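The first two conditions described above can be told apart with a small classification step. This is a minimal sketch in plain C++, not the project's code: `WifiProblem`, `classifyWifiProblem`, and the idea of passing the scan result as a list of SSID strings are hypothetical names and choices made for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical classification of the two failure conditions named above.
enum class WifiProblem {
    None,             // station link is up
    SsidUnavailable,  // no SSID entered, or SSID absent from the scan
    ConnectFailed     // SSID is visible but the connection does not complete
};

WifiProblem classifyWifiProblem(const std::string& configuredSsid,
                                const std::vector<std::string>& scannedSsids,
                                bool staConnected) {
    if (staConnected) return WifiProblem::None;
    // An empty SSID can never be "visible", so it falls into the first condition.
    const bool visible =
        !configuredSsid.empty() &&
        std::find(scannedSsids.begin(), scannedSsids.end(), configuredSsid) !=
            scannedSsids.end();
    return visible ? WifiProblem::ConnectFailed : WifiProblem::SsidUnavailable;
}
```

Separating the cases this way would let the "SSID visible but not ready" case keep retrying, while only the "no SSID" case goes straight to the SoftAP.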

Currently, the SoftAP is opened when a connection cannot be made within 20 seconds. However, catching the SoftAP is not as hard as you might think. If a client connects to it, the device will not drop AP mode until there are 0 attached clients or the device is rebooted (at which time there will be 0 clients). This network will show up rather quickly as the ESP32 announces its AP.

alka79 commented 1 month ago

We have two subjects in one thread: LED and wifi connection!

Arduino folks introduced LED_BUILTIN constant to accommodate for the board variants. This constant is correct on my ESP32 board, but indeed it has not been properly maintained over the years for all the ESP boards. You would not have to create separate builds, just add a LedPin setting to override it in case LED_BUILTIN is missing or wrong. The LED is a nice-to-have. It is useful when we start playing with the system. Once up and running, the board will be forgotten, except by some HA automation ;) I like the visual feedback and had some fun adding the LED (also a brief flash when the board sends a command). It is surely not a requirement for all users.

It really makes sense to go straight to softAP the first time, or when no SSID is in the settings. After that, we rarely need to go to softAP again. Who regularly changes the SSID or passphrase? And if one plans to change them, one may simply modify the settings in the WebUI before resetting the board.

My point was that automatically switching to softAP after the STA-AP connection has been lost for 20 sec is too short IMHO. The AP may become unreachable for a minute or so (most probably a reboot). In that case, the board switches to softAP after 20 sec and stays there for 3 minutes, waiting for a client. Then softAP is closed and it tries to connect to STA-AP again. And so on. I just tested an AP out of reach for 40 seconds: my other DIY board reconnects immediately, but ESPSomfy is unavailable for a total of 4 minutes. It may go unnoticed by most users, but it feels suboptimal. I would prefer a 2 or 3 minute tolerance before switching to softAP.
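The timing described above can be checked with a small model. This is a rough sketch under stated assumptions, not the firmware's logic: it assumes the board alternates between a station retry window and a fixed SoftAP hold, that no client ever joins the SoftAP, and that reconnection is instantaneous once the AP is reachable during a retry window. `FallbackPolicy` and `recoveryTimeSec` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical model of the fallback timing; values are in seconds.
struct FallbackPolicy {
    std::uint32_t fallbackAfterSec;  // STA retry window before SoftAP opens (must be > 0)
    std::uint32_t softApHoldSec;     // how long SoftAP waits for a client
};

// Returns the time (seconds after the link was lost) at which the station
// link is back, for an AP outage lasting outageSec.
std::uint32_t recoveryTimeSec(const FallbackPolicy& p, std::uint32_t outageSec) {
    std::uint32_t t = 0;  // start of the current STA retry window
    for (;;) {
        // The AP returns before this retry window expires: reconnect as soon
        // as both the AP is back and the window has started.
        if (outageSec < t + p.fallbackAfterSec) {
            return outageSec > t ? outageSec : t;
        }
        // Otherwise the window expires, SoftAP opens and is held, then retry.
        t += p.fallbackAfterSec + p.softApHoldSec;
    }
}
```

With the values quoted in the comment, `recoveryTimeSec({20, 180}, 40)` gives 200 seconds, which is of the same order as the roughly 4 minutes observed, while a 120 second retry window gives 40 seconds for the same outage.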

I am not familiar with the dual AP_STA mode. It seems that the ESP32 can handle both at the same time. An elegant solution as you suggest might be to keep trying to connect to STA-AP in the background and kill softAP if STA-AP connection becomes active again and no client is connected to softAP. Again, don't overthink this. It is so rarely needed that it does not deserve the additional work. In addition, testing this is painful! The behavior and timing should be properly described in the docs, and we can live with it.
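The "keep trying in the background, drop softAP once the station link is back and nobody is attached" idea can be written as a two-state transition function. This is only a sketch of the proposal, not what the firmware does: `Mode`, `LinkInputs`, and `nextMode` are hypothetical, and the grace period is a parameter. On a device, the inputs would presumably come from Arduino-ESP32 calls such as `WiFi.status()` and `WiFi.softAPgetStationNum()`.

```cpp
// Hypothetical modes: station only, or SoftAP open while the station keeps retrying.
enum class Mode { StaOnly, ApSta };

struct LinkInputs {
    bool staConnected;            // station link to the router is up
    int softApClients;            // clients currently attached to the SoftAP
    unsigned secondsWithoutLink;  // time since the station link was lost
};

Mode nextMode(Mode current, const LinkInputs& in, unsigned graceSec) {
    switch (current) {
    case Mode::StaOnly:
        // Open the SoftAP only after the link has been down for the grace period.
        return (!in.staConnected && in.secondsWithoutLink >= graceSec)
                   ? Mode::ApSta
                   : Mode::StaOnly;
    case Mode::ApSta:
        // Drop the SoftAP once the router is back and nobody is attached to it.
        return (in.staConnected && in.softApClients == 0) ? Mode::StaOnly
                                                          : Mode::ApSta;
    }
    return current;
}
```

Keeping the decision in a pure function like this makes the painful part (testing) possible on a desktop, without waiting for a real router to reboot.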

rstrouse commented 1 month ago

> Arduino folks introduced LED_BUILTIN constant to accommodate for the board variants.

Unfortunately that constant is a define that is resolved during the compile process. The board selection sets that value, so it is fixed at compile time.
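Both points can coexist: the define supplies a compile-time default, and a stored setting overrides it at runtime. A minimal sketch, assuming a `LedPin` setting as proposed earlier in the thread; `kCompiledLedPin`, `resolveLedPin`, and the convention that `-1` means "no LED" are hypothetical.

```cpp
// LED_BUILTIN, when the selected board defines it, is fixed at compile time.
#ifdef LED_BUILTIN
constexpr int kCompiledLedPin = LED_BUILTIN;
#else
constexpr int kCompiledLedPin = -1;  // assumption: -1 means "no LED available"
#endif

// configuredLedPin is the hypothetical LedPin setting; negative means "not set".
int resolveLedPin(int configuredLedPin) {
    return configuredLedPin >= 0 ? configuredLedPin : kCompiledLedPin;
}
```

A single binary built for a generic board would then still drive the LED correctly on any board where the user enters the right pin number.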

> An elegant solution as you suggest might be to keep trying to connect to STA-AP in the background and kill softAP if STA-AP connection becomes active again and no client is connected to softAP.

I am going to do some research with this. If this can operate the way a wired connection does then it will be very smooth. If not then I definitely think a change is in order for when the connection is simply lost.

alka79 commented 1 month ago

OK.

While you look at it, the APs are scanned in setup() just to print the list on the serial output. useful ? (it takes 4 to 6 seconds - doubling the setup time in my case)

There comes another curiosity question : why change to connect to wifi async and not in the setup as in 2.4.2 ?

rstrouse commented 1 month ago

> While you look at it, the APs are scanned in setup() just to print the list on the serial output. useful ? (it takes 4 to 6 seconds - doubling the setup time in my case)

This is an artifact of the original attempts at getting it to connect to the strongest AP in a mesh. The original docs indicated that simply setting the sort would make it connect properly, but that wasn't the entire truth. At this point it should be removed. The other thing that it did was make sure the buffers used to scan APs were not allocated in the middle of the heap.
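For readers unfamiliar with the mesh problem: several APs advertise the same SSID, and the goal is to join the one with the best signal. The thread does not say how the firmware ends up selecting it, so the following is only an illustrative sketch of the selection step over a scan result. `ScanEntry` and `strongestIndex` are hypothetical names.

```cpp
#include <string>
#include <vector>

// Hypothetical record for one scanned AP.
struct ScanEntry {
    std::string ssid;
    std::string bssid;  // MAC address of the specific AP
    int rssi;           // signal strength in dBm, closer to 0 is stronger
};

// Returns the index of the strongest AP advertising ssid, or -1 if none does.
int strongestIndex(const std::vector<ScanEntry>& scan, const std::string& ssid) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(scan.size()); ++i) {
        if (scan[i].ssid != ssid) continue;
        if (best < 0 || scan[i].rssi > scan[best].rssi) best = i;
    }
    return best;
}
```

The BSSID of the chosen entry is what would then be used to connect to that specific AP rather than to whichever one answers first.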

> There comes another curiosity question : why change to connect to wifi async and not in the setup as in 2.4.2 ?

This has to do with several things. First, there are delays in wired configurations. The PHY reports the link up long before it actually starts communicating. Not sure if that is a bug or if the expectation is that it should be considered connected only after an IP is issued.

Next, the watchdog needs to be fed in a standard way. Moving this to an async operation allows the dog to be fed in standard places. While this appears to be fundamentally different from v2.4.2, it really isn't, since the associated services are not started until a connection is established. It just gets triggered from the loop instead of creating a loop in the setup. This eliminates the need to add delays and yields for other resources to get a slice.

Finally, reconnecting to wifi, changing of AP, and even fallback from a wired connection can all be done in the same fashion, and there is no performance penalty for doing it this way. However, the changeAP code has yet to be turned over, partly because this operation is a sub-second connection directly to the MAC. It will be transitioned, though, and the code will then simply set the connection target and the connecting flag. This is the toughest thing to test, and operating the initial connection in a way that acts like the change allows the code to operate in a similar way with similar steps.
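The "set a flag, then let the loop drive it" pattern described above can be sketched in plain C++. This is an illustration of the pattern only, not the project's implementation: `AsyncConnector` and its members are hypothetical, and the link check is injected as a callable so the example runs without hardware.

```cpp
#include <functional>
#include <utility>

// Hypothetical non-blocking connector, polled once per pass of the main loop.
class AsyncConnector {
public:
    explicit AsyncConnector(std::function<bool()> linkReady)
        : linkReady_(std::move(linkReady)) {}

    // Requesting a connection only sets flags and returns immediately.
    void requestConnect() {
        connecting_ = true;
        connected_ = false;
    }

    // Never blocks, so the watchdog can be fed by the loop as usual.
    void poll() {
        if (!connecting_) return;
        if (linkReady_()) {
            connecting_ = false;
            connected_ = true;
            ++servicesStarted_;  // stand-in for starting the dependent services
        }
    }

    bool connecting() const { return connecting_; }
    bool connected() const { return connected_; }
    int servicesStarted() const { return servicesStarted_; }

private:
    std::function<bool()> linkReady_;
    bool connecting_ = false;
    bool connected_ = false;
    int servicesStarted_ = 0;
};
```

An initial connection, a reconnect, and a change of AP would all go through the same `requestConnect()` then `poll()` sequence, which is the "similar steps" property mentioned above.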

alka79 commented 1 month ago

Thanks for the detailed explanation.

rstrouse commented 2 weeks ago

I am going to close this one for now as v2.4.4 has improved wifi fallback and AP management. It should reconnect as soon as the router starts issuing IP addresses.