rstrouse / ESPSomfy-RTS

A controller for Somfy RTS shades and blinds
The Unlicense
428 stars 32 forks source link

Never reconnect to WiFi after a router restart #410

Open Jezza34000 opened 1 week ago

Jezza34000 commented 1 week ago

Hardware

ESP32

Firmware version

v2.4.4

Application version

v2.4.4

What happened? What did you expect to happen?

Hi @rstrouse Since version 2.4.4, when my router restarts or updates its firmware, the ESPSomfy RTS starts the AP and never reconnects to the defined SSID even if it is available.

How to reproduce it (step by step)

- When ESPSomfyRTS is connected
- Restart router
- Soft AP start and it never reconnect

Logs

No response

alka79 commented 1 week ago

same issue here with 2.4.4. Opens softAP and never re-connects to the AP as the 2.4.4 fix intended to. Reset manually the board restarts in STA mode.

Gehilfe commented 1 week ago

Same problem here with 2.4.4 and 2.4.5

alka79 commented 1 week ago

Automatically switch to local SoftAP when connexion is lost seems to cause more trouble than needed. I was wondering about the rationale to do that. The only situation I can think of, is when the user changes the credentials of the AP, which is rare and predictable. Wouldn't it be easier to ask user to reset the board in that (rare) case ?

rstrouse commented 1 week ago

Do you by chance have any devices where you need to forget the ESPSomfyRTS network? ESPSomfy RTS will not shut down the AP if it has a client connected to it for obvious reasons.

However, if you have a wifi device that you previously connected to the ESPSomfyRTS SSID and did not uncheck the remember this network, that client will automatically connect to it. ESPSomfy RTS will not drop the AP while it has a client connected to it.

Jezza34000 commented 1 week ago

I just did the test and I actually "forgot" the network on my smartphone to prevent it from connecting to the ESPSomfyRTS. And I confirm I have reproduced the bug: restarting the router (~15sec) the AP starts and the ESPSomfyRTS no longer reconnects to the intended SSID

alka79 commented 6 days ago

On my side, I am positive that the only device that ever connected to the softAP (my phone) was in my hands and connected to the main AP when I noticed that the board was stucked in softAP. I will later try to reproduce with serial port plugged to get a better trace.

rstrouse commented 6 days ago

I think I know what is going on. I will try some things out as I managed to duplicate what is going on with yours. Apparently shutting down wifi with a wired backhaul means disabling all the mesh nodes on my network.

Jezza34000 commented 6 days ago

In the same Wifi subject, I noticed that when you have a board with ETHERNET and the Wifi Fallback is activated. The fallback is done well, but on the other hand the fallback is not done at startup. It is not possible to boot directly to Wifi if Ethernet has not been connected at least once

alka79 commented 6 days ago

I reproduced. Here is the trace. You may notice that I added more serial prints.

The board was reset at 11:18. All went normal. When quiet for a few minutes, I stopped Wifi AP. I only have one, no mesh. AP was turned off at 11:24:00 and turned on again at 11:25:00.

The hotspot started at 11:24:46 and was never closed. I did not connect to SoftAP and it was removed from my wifi AP list to avoid any autoconnect chance.

11:24:23.730 -> Connection Lost...A0:...:D9 CH:7 (-79dbm)  
11:24:23.730 -> (evt) Disconnected from WiFi access point
11:24:23.870 -> Set hostname to:ESPSomfyRTS
11:24:23.870 -> Starting AP scan...
11:24:26.867 -> (evt) Completed scan for access points
11:24:26.867 -> Timing Net: 3120ms
11:24:29.457 -> (evt) Disconnected from WiFi access point
11:24:32.230 -> (evt) Disconnected from WiFi access point
11:24:35.239 -> (evt) Disconnected from WiFi access point
11:24:38.209 -> (evt) Disconnected from WiFi access point
11:24:41.222 -> (evt) Disconnected from WiFi access point
11:24:44.197 -> (evt) Disconnected from WiFi access point
11:24:46.878 -> 
11:24:46.878 -> Turning the HotSpot On
11:24:46.878 -> (evt) WiFi SoftAP Started IP:192.168.4.1
11:24:46.878 -> Initializing AP for credentials modification
11:24:46.878 -> (evt) WiFi SoftAP Stopped
11:24:46.878 -> (evt) WiFi SoftAP Started IP:192.168.4.1
11:24:47.067 -> Timing Net: 204ms
11:24:47.067 -> Socket [0] Disconnected!
11:26:23.760 -> (evt) Lost IP address and IP address is reset to 0

socket[0] is homeassistant.

I also added traces when connexion occurs to softAP. These events were not triggered.

case ARDUINO_EVENT_WIFI_AP_STACONNECTED:    Serial.println("(evt) Device Connexion to SoftAP"); break;
case ARDUINO_EVENT_WIFI_AP_STADISCONNECTED: Serial.println("(evt) Device Disconnexion from SoftAP"); break;

I may have missed, but I don't see in the code where it tries to regularly check if AP is available again and tries to reconnect when softAP is started.

rstrouse commented 6 days ago

The code for the station mode scan is happening on another thread. However, it appears my test case was flawed. Simply dropping the SSID from my network doesn't reset the BSSID (and it shouldn't) so it never fell into the scan. If you install the v2.4.6 pre-release it would be appreciated if you could run it through your test case. My test case now involves shutting down all the nodes in my mesh.

ultratoto14 commented 4 days ago

For me, my deco mesh system contains a high number of mesh devices (6 APs) and takes time to restart and get into a ready state. I have a scheduled reboot every night at 4 am and every morning, the ESPSomfy-RTS is not connected. I installed the 2.4.6 but it does not fixed the issue, it remains in softAP mode. I'll try to get the logs tomorrow morning.

rstrouse commented 3 days ago

I added some more aggressive wifi scanning so update your 2.4.6 to the current pre-release.

Nickduino commented 3 days ago

Do you by chance have any devices where you need to forget the ESPSomfyRTS network? ESPSomfy RTS will not shut down the AP if it has a client connected to it for obvious reasons.

Nope. But after noticing there was an ESPSomfyRTS network, I connected to it with my laptop and disconnected. ESPSomfyRTS behaved as expected: it reconnected to my AP (which confirms there was no other device connected to it).

After that, I updated to the v2.4.6 Pre

alka79 commented 3 days ago

The code for the station mode scan is happening on another thread. However, it appears my test case was flawed. Simply dropping the SSID from my network doesn't reset the BSSID (and it shouldn't) so it never fell into the scan. If you install the v2.4.6 pre-release it would be appreciated if you could run it through your test case. My test case now involves shutting down all the nodes in my mesh.

I will early next week. Nickduino's finding seem to indicate that it is required to first connect to softAP to get the proper indication of number of stations connected. I'll add some traces to confirm WiFi.softAPgetStationNum() works as expected. Also, do we have any indication in the main thread that ESP32 actually tries to connect to wifi AP in the other thread ? is this a feature of the dual APSTA mode ?

Nickduino commented 3 days ago

Nickduino's finding seem to indicate that it is required to first connect to softAP to get the proper indication of number of stations connected.

If you can log it, it will be interesting to know what this says just in case. But I didn't say it was "required", I meant that the fact ESPSomfy connected to my AP right after I disconnected from its soft AP tends to show that my laptop was the only device connected to it (and that there were zero device connected 3 minutes earlier).

What surprises me is that it never happened with the v2.4.2 or v2.4.3 (and it has happened with v2.4.4 and v2.4.5 although, before today I didn't think of checking if an ESPSomfy network popped up).

Also, I can't confirm if that was after a router restart or not.

paede81 commented 3 days ago

Bug also affect me since I upgraded from 2.4.0 to 2.4.5

rstrouse commented 2 days ago

Install the current build of v2.4.6 pre-release. I previously relied on timing but that was not predictable in all environments. The code base now looks at the availability of the Ssid which is now continuously scanned.

Nickduino commented 2 days ago

@rstrouse Any idea why this seems to have started happening with v2.4.4?

rstrouse commented 2 days ago

Actually it started happening for as of 2.4.1 but it didn't matter for many since there was a 5-10 second delay caused by an initial wifi scan at the start that then changed the point at which the Soft AP would start. It appears that not all channels associated with an SSID and ESPSomfy RTS was picking the channel that was available at boot and not the strongest channel published by the SSID. ESP32 does not automatically hop to a stronger channel. My network boots very quickly so I didn't quite gather what folks were experiencing.

Ultimately, I went through the code and realized that I cannot rely on just waiting long enough to start scanning. This was not the way to go and needed to rework how the device boots when the SSID cannot be found at boot. This also means that it needs to periodically look for the strongest channel. I am pretty sure I got to the bottom of it in v2.4.6 but then again my environment never really experienced the goofy channel selection or the perpetual Soft AP.

alka79 commented 2 days ago

Some improvements in 246: scan is performed every 5 secs continuously when SoftAP is open, but board does not reconnect when AP is back.

here is the serial output: AP stopped at 20H32, restarted at 20H34. I did not connect to SoftAP. Socket[0] is Homeassistant.

``` 20:32:16.257 -> Connection Lost...A0:xxxx:D9 CH:7 (-51dbm) 20:32:16.257 -> (evt) Disconnected from WiFi access point 20:32:16.351 -> Set hostname to:ESPSomfyRTS 20:32:16.398 -> Starting AP scan... 20:32:19.183 -> (evt) Completed scan for access points 20:32:19.183 -> Timing Net: 2920ms 20:32:21.933 -> (evt) Disconnected from WiFi access point 20:32:24.751 -> (evt) Disconnected from WiFi access point 20:32:27.536 -> (evt) Disconnected from WiFi access point 20:32:30.331 -> (evt) Disconnected from WiFi access point 20:32:32.924 -> (evt) Disconnected from WiFi access point 20:32:35.531 -> (evt) Disconnected from WiFi access point 20:32:38.112 -> (evt) Disconnected from WiFi access point 20:32:39.153 -> 20:32:39.153 -> Turning the HotSpot On 20:32:39.153 -> (evt) WiFi SoftAP Started IP:192.168.4.1 20:32:39.153 -> Initializing AP for credentials modification 20:32:39.200 -> (evt) WiFi SoftAP Stopped 20:32:39.200 -> (evt) WiFi SoftAP Started IP:192.168.4.1 20:32:39.387 -> Socket [0] Disconnected! 20:32:39.387 -> Timing Net: 208ms 20:32:44.003 -> (evt) Completed scan for access points 20:32:48.665 -> (evt) Completed scan for access points 20:32:53.270 -> (evt) Completed scan for access points 20:32:57.923 -> (evt) Completed scan for access points 20:33:02.576 -> (evt) Completed scan for access points 20:33:07.189 -> (evt) Completed scan for access points 20:33:11.854 -> (evt) Completed scan for access points 20:33:16.455 -> (evt) Completed scan for access points 20:33:21.124 -> (evt) Completed scan for access points 20:33:25.737 -> (evt) Completed scan for access points 20:33:30.387 -> (evt) Completed scan for access points 20:33:35.034 -> (evt) Completed scan for access points 20:33:39.678 -> (evt) Completed scan for access points 20:33:44.312 -> (evt) Completed scan for access points 20:33:48.929 -> (evt) Completed scan for access points 20:33:53.558 -> (evt) Completed scan for access points 20:33:58.202 -> (evt) Completed scan for access points 20:34:02.853 -> (evt) Completed scan for access points 20:34:07.493 -> (evt) Completed scan for access points 20:34:12.108 -> (evt) Completed scan for access points 20:34:16.266 -> (evt) Lost IP address and IP address is reset to 0 20:34:16.737 -> (evt) Completed scan for access points 20:34:21.374 -> (evt) Completed scan for access points 20:34:26.048 -> (evt) Completed scan for access points 20:34:30.646 -> (evt) Completed scan for access points 20:34:35.290 -> (evt) Completed scan for access points 20:34:39.930 -> (evt) Completed scan for access points 20:34:44.573 -> (evt) Completed scan for access points 20:34:49.217 -> (evt) Completed scan for access points 20:34:53.856 -> (evt) Completed scan for access points 20:34:58.505 -> (evt) Completed scan for access points 20:35:03.101 -> (evt) Completed scan for access points etc. ```
rstrouse commented 1 day ago

Did you update the firmware again since yesterday?

ultratoto14 commented 1 day ago

Hi @rstrouse, updated to new 2.4.6-pre yesterday and the problem did not occur this morning, well reconnected to my mesh system.

alka79 commented 1 day ago

Did you update the firmware again since yesterday?

yes, I updated yesterday evening just before the tests. I do not have a wifi mesh and I have roaming option unchecked.

Nickduino commented 1 day ago

I am pretty sure I got to the bottom of it in v2.4.6 but then again my environment never really experienced the goofy channel selection or the perpetual Soft AP.

Couldn't you replicate it by turning your WiFi network off long enough for ESPSomfy to turn the soft AP on, then turning your WiFi on again and see if it reconnects?

alka79 commented 1 day ago

I ran another test, this time with roaming option checked, eventhough it is useless in my config

behaviour is different : the scans stop when AP is back online, but still no reconnect.

AP stopped at 12:40:20 AP back at 12:42:00.

``` 12:40:34.352 -> (evt) Disconnected from WiFi access point 12:40:34.352 -> Connection Lost...A0:40:A0:A5:DC:D9 CH:7 (-61dbm) 12:40:34.352 -> --- start blinking wifi connect 12:40:34.445 -> Set hostname to:ESPSomfyRTS 12:40:34.493 -> Starting AP scan... 12:40:37.276 -> (evt) Completed scan for access points 12:40:37.276 -> Timing Net: 2915ms 12:40:39.849 -> (evt) Disconnected from WiFi access point 12:40:42.651 -> (evt) Disconnected from WiFi access point 12:40:45.232 -> (evt) Disconnected from WiFi access point 12:40:48.006 -> (evt) Disconnected from WiFi access point 12:40:50.798 -> (evt) Disconnected from WiFi access point 12:40:53.589 -> (evt) Disconnected from WiFi access point 12:40:56.389 -> (evt) Disconnected from WiFi access point 12:40:57.243 -> 12:40:57.243 -> Turning the HotSpot On 12:40:57.243 -> (evt) WiFi SoftAP Started IP:192.168.4.1 12:40:57.290 -> Initializing AP for credentials modification 12:40:57.290 -> (evt) WiFi SoftAP Stopped 12:40:57.290 -> (evt) WiFi SoftAP Started IP:192.168.4.1 12:40:57.477 -> Socket [0] Disconnected! 12:40:57.477 -> Timing Net: 207ms 12:41:02.079 -> (evt) Completed scan for access points 12:41:06.719 -> (evt) Completed scan for access points 12:41:11.365 -> (evt) Completed scan for access points 12:41:16.006 -> (evt) Completed scan for access points 12:41:20.645 -> (evt) Completed scan for access points 12:41:25.271 -> (evt) Completed scan for access points 12:41:29.934 -> (evt) Completed scan for access points 12:41:34.576 -> (evt) Completed scan for access points 12:41:39.174 -> (evt) Completed scan for access points 12:41:43.843 -> (evt) Completed scan for access points 12:42:34.330 -> (evt) Lost IP address and IP address is reset to 0 stops there. No more outputs. ```
rstrouse commented 1 day ago

Are there any special characters in your SSID?

alka79 commented 1 day ago

Are there any special characters in your SSID? nope. just letters.

rstrouse commented 1 day ago

And there is absolutely no log at all after it hits the lost IP message? Is the Soft AP still available?

Bear in mind that although the SSID can show up in the list it may not be ready to serve connections. It may take some time for the router to stand up the DHCP, service other connections in the queue and authenticate them. However, the lost IP message comes nearly a full minute after the attempt to connect. It stops scanning when it finds an SSID.

rstrouse commented 1 day ago

@Nickduino

Couldn't you replicate it by turning your WiFi network off long enough for ESPSomfy to turn the soft AP on, then turning your WiFi on again and see if it reconnects?

Forunately, (unfortunately), I have a wired backhaul and simply shutting down the radio for the AP that I thought I was connected to beamformed to another AP. I finally replicated it by unplugging all the APs on the mesh and rebooting the router. My mesh also virtualizes the channel so it doesn't change channels went hopping between APs. In the end I needed to drop the mesh entirely. The recent change to publish the Soft AP to the hostname also helped because I have 10 devices that were all publishing a Soft AP on ESPSomfyRTS.

I finally was able to replicate the scenario and get a better idea of what is happening.