letscontrolit / ESPEasy

Easy MultiSensor device based on ESP8266/ESP32
http://www.espeasy.com

ESP node can't reconnect to WiFi AP with weak signal after warm boot #2757

Closed ghtester closed 4 years ago

ghtester commented 4 years ago

As already discussed in some other issues, several recent firmware releases appear to have a problem that prevents an ESP node from reconnecting to a WiFi AP with a weak signal (RSSI about -90 dBm) after a warm boot. After a cold boot (power off / on), the node connects quickly to the same AP without any issue. I tested various WiFi settings on the ESP, but the behaviour stays the same. The issue is reproducible in an environment where many WiFi APs are visible at different distances (with different signal levels). The WIFISCAN command invoked from the serial console shows many APs after a cold boot. After a warm (re)boot caused by a node crash or the REBOOT command, WIFISCAN returns only a reduced AP list. So it looks like the node's WiFi sensitivity is significantly reduced after a warm (re)boot, which makes reconnecting to an AP with a weak signal impossible. Latest test performed with the official build:
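One way to make the "reduced AP list" observation concrete is to diff a cold-boot scan against a warm-boot scan. The following is only a minimal Python sketch: it assumes the scan results have already been copied by hand into SSID-to-RSSI mappings, the AP names and values are made up for illustration, and the function name is hypothetical. It does not parse the real WIFISCAN output.

```python
def missing_after_warm_boot(cold_scan, warm_scan):
    """Return the APs seen after a cold boot but absent after a warm boot,
    sorted from strongest to weakest cold-boot RSSI."""
    lost = {ssid: rssi for ssid, rssi in cold_scan.items() if ssid not in warm_scan}
    return sorted(lost.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scan results (SSID -> RSSI in dBm), not taken from a real node.
cold = {"ap-near": -55, "ap-mid": -72, "ap-far": -90}
warm = {"ap-near": -56, "ap-mid": -74}

lost = missing_after_warm_boot(cold, warm)
for ssid, rssi in lost:
    print(f"{ssid} ({rssi} dBm) disappeared after the warm boot")
```

If only the weakest APs show up in the result, that would be consistent with reduced receive sensitivity rather than with a problem specific to one AP.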

Firmware

Build: 20104 - Mega
System Libraries: ESP82xx Core 2_6_0, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support
Git Build: mega-20191119
Plugins: 79 [Normal] [Testing]
Build Md5: 2a40b605d5a65d4f9cee7fc88ad790
Md5 check: passed.
Build Time: Nov 19 2019 22:04:11
Binary Filename: ESP_Easy_mega-20191119_test_core_260_sdk222_alpha_ESP8266_4M1M.b

Sasch600xt commented 4 years ago

I can confirm.

One of my ESPs has an RSSI of about -87 to -90 and can't connect after a warm reboot.

TD-er commented 4 years ago

@Sasch600xt What core version do you use?

Sasch600xt commented 4 years ago

ESP_Easy_mega-20191119_normal_core_260_sdk222_alpha_ESP8266_4M1M

TD-er commented 4 years ago

I did change the platformio.ini file structure, so things may have a slightly different name. The default is now core 2.6.1 (SDK 2.2.2), and I also have some build definitions with core 2.6.1 SDK3.

Maybe you can also test core 2.6.1 with both SDK versions for this issue?

Sasch600xt commented 4 years ago

Sure. So the next release then? Tonight?

TD-er commented 4 years ago

I started a test build. Will be ready in 45 minutes I guess.

Sasch600xt commented 4 years ago

I will be out of the house until tonight, so I will check as soon as possible. Maybe I can manage it tonight after I come back.

TD-er commented 4 years ago

Here is the test build: ESPEasy_mega-20191119-22-g17fbb474.zip

Sasch600xt commented 4 years ago

I have to leave now, but I had time for 2 quick tests.

ESP_Easy_mega-20191119-22-g17fbb474_test_beta_ESP8266_4M1M and ESP_Easy_mega-20191119-22-g17fbb474_normal_ESP8266_4M1M

were not working. They did not connect after a warm reboot. Only a hard reset brought them back online.

TD-er commented 4 years ago

OK, good to know. It makes me curious about the tests with core 2.6.1 SDK3

Sasch600xt commented 4 years ago

I did an OTA update to ESP_Easy_mega-20191119_test_core_260_sdk3_alpha_ESP8266_4M1M again and it came up after the update, so all is fine with this firmware.

ghtester commented 4 years ago

Yes, it looks like core 2.6.0 with sdk3 is a working combination, while sdk2.2.2 has the WiFi issue with both core 2.6.0 and core 2.6.1. I am currently building a customized firmware using Vagrant, so perhaps I can share some quick first experience with core 2.6.1 & sdk3 soon...

ghtester commented 4 years ago

OK, so the first impressions with this custom build:

Entry Info
Build: 20104 - Mega
System Libraries: ESP82xx Core 2_6_1, NONOS SDK 3.0.0-dev(c0f7b44), LWIP: 2.1.2 PUYA support
Git Build: My Build: Nov 21 2019 17:32:57
Plugins: 37 [Normal]
Build Md5: e2d52b9dca1ae3c9e7ca431e929220
Md5 check: passed.
Build Time: Nov 21 2019 17:34:24
Binary Filename: ESP_Easy_20191121_vagrant_custom_sdk3_ESP8266_4M1M.bin

In general it somehow works: a WiFi connection to a remote AP (RSSI about -84 dBm) is made even after a warm reboot, but not very quickly, not even after a cold boot. That could be due to the AP model, to be tested with another AP in a different location. The worse part is that I have experienced several WDT reboots. Better testing needs more time. But I would definitely like to create a custom build based on core 2.6.0 and sdk3.

thomastech commented 4 years ago

I believe I have this problem too. Using ESP_Easy_mega-20191108-36-PR_2728_test_core_260_sdk222_alpha_ESP on a NodeMCU.

Cold boots experience fast WiFi connection. Warm boots fail to connect. The typical RSSI at this device location is usually -80dBm or stronger (currently -74dBm).

Just speculation, but perhaps signal quality temporarily gets worse in my operating environment (walk near device, RF interference, bad mojo, etc). Then the "reboot issue" occurs and that starts this warm boot failure mode.

BTW, despite the warm reboots, I didn't experience WiFi re-connection problems with a late August build using the 260_sdk3_alpha core. So at this point I think that the recent test_core_260_sdk222_alpha is involved.

Sasch600xt commented 4 years ago

Which bin from today can I use? I'm missing an SDK3 build for 4M1M, and I'm not sure whether I can use the custom one.

TD-er commented 4 years ago

Today's build has changed the SDK version back to the July version. So when you use a version which doesn't have a core version mentioned (or has core 2.6.1 explicitly mentioned), it has SDK 2.2.2 from July.

See discussion here: https://github.com/esp8266/Arduino/issues/5784#issuecomment-557500450

Sasch600xt commented 4 years ago

I see.

ghtester commented 4 years ago

Thanks for the info. Hopefully the bad WiFi sensitivity after a warm boot will be fixed somehow in the future (if it's the same for the SDK from July). BTW, it looks like sdk3 helped significantly with this issue, but I also experienced more unexpected reboots.

TD-er commented 4 years ago

I have 4 nodes running that core version with uptimes over 55 days and 2 with over 20 days. So the core version is capable of running stable. But the number of WiFi reconnects on those nodes is quite low, so maybe not entirely on-topic in this issue.

ghtester commented 4 years ago

I'm not sure which core / sdk combination you mean. I think that if the signal from the AP is strong and stable, it (almost) does not matter which core / sdk is used for the ESP Easy mega build; it works well and is stable. Nevertheless, the different WiFi sensitivity after a cold versus a warm boot is a really bad issue IMHO... I just updated one of my nodes with the fresh official build:

Firmware

Build: 20104 - Mega
System Libraries: ESP82xx Core 2.7.0-dev stage, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support
Git Build: mega-20191123
Plugins: 79 [Normal] [Testing]
Build Md5: a667330ae76d2cfa961f72db502680
Md5 check: passed.
Build Time: Nov 23 2019 03:49:28
Binary Filename: ESP_Easy_mega-20191123_test_beta_ESP8266_4M1M.bin

The WiFi issue is there again, node can't reconnect to AP after a warm reboot (RSSI -89). After a cold boot it's connected immediately to the same AP. I'll keep it running to test the stability with core 2.7.0.

TD-er commented 4 years ago

That's not core 2.6.1. That's running the beta version. Please test with a version without "_beta".

ghtester commented 4 years ago

That's OK. ;-) I think beta versions should be tested as well. So far every tested core with sdk 2.2.2 had the same WiFi "warm boot" issue. Perhaps I should find a solution how to automatically perform a cold boot right after a warm one... :-)

ghtester commented 4 years ago

Let me share the test results on 2 nodes with the firmware mentioned above (ESP82xx Core 2.7.0-dev stage, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support). So far it looks very good, seems to be quite stable under weak RSSI and performing very well. The reconnect issue after a warm reboot is there but the unexpected reboot did not happen so far.

The ESP Easy mega node with only the BMX280 plugin and the Home Assistant (openHAB) MQTT Controller, RSSI about -82, sending data from the BMP280 to the MQTT Controller every 15 secs:
367533284: WD : Uptime 6126 ConnectFailures 144 FreeMem 15776 WiFiStatus 3

The ESP Easy mega node with more plugins, but most of the time only listening to MQTT import, RSSI about -91:
535412697: WD : Uptime 8924 ConnectFailures 884 FreeMem 12624 WiFiStatus 3
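To compare the two nodes, the WD log lines can be turned into a connect-failure rate. A minimal Python sketch follows. It assumes that the number before "WD" is a millisecond timestamp and that "Uptime" is counted in minutes; the two values in each line above agree with that reading, but it is an assumption here, not a documented fact. The function names are hypothetical.

```python
import re

# Matches the WD log lines quoted in this thread.
WD_PATTERN = re.compile(
    r"(?P<millis>\d+):\s+WD\s*:\s+Uptime\s+(?P<uptime>\d+)\s+"
    r"ConnectFailures\s+(?P<failures>\d+)\s+FreeMem\s+(?P<freemem>\d+)\s+"
    r"WiFiStatus\s+(?P<status>\d+)"
)


def parse_wd_line(line):
    """Return the numeric fields of a WD log line as a dict, or None if no match."""
    match = WD_PATTERN.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def failures_per_hour(entry):
    """Connect failures per hour of uptime (assumes Uptime is in minutes)."""
    return entry["failures"] / (entry["uptime"] / 60)


node_a = parse_wd_line(
    "367533284: WD : Uptime 6126 ConnectFailures 144 FreeMem 15776 WiFiStatus 3")
node_b = parse_wd_line(
    "535412697: WD : Uptime 8924 ConnectFailures 884 FreeMem 12624 WiFiStatus 3")

for name, node in (("RSSI -82 node", node_a), ("RSSI -91 node", node_b)):
    print(f"{name}: {failures_per_hour(node):.2f} connect failures per hour")
```

Under these assumptions, the node with the weaker signal shows roughly four times as many connect failures per hour as the other one.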

TD-er commented 4 years ago

In another repo, I came across some comment next to the WiFi.disconnect(); call in the Setup() function. See the PR I just made: https://github.com/letscontrolit/ESPEasy/pull/2789

Maybe it can be tested to see if it does fix this issue? Please try this test build ESPEasy_mega-20191130-2-PR_2789.zip

thomastech commented 4 years ago

Please try this test build ESPEasy_mega-20191130-2-PR_2789.zip

I've installed ESP_Easy_mega-20191130-2-PR_2789_test_ESP8266_4M_VCC.bin on a NodeMCU and will report back on the test results.

ghtester commented 4 years ago

As already mentioned in another thread, I quickly tested ESP_Easy_mega-20191130-3-PR_2792_test_beta_ESP8266_4M1M.bin and have to confirm that the issue with limited WiFi connectivity after a warm boot is still there (it looks to be related to SDK 2.2.2).

thomastech commented 4 years ago

Test update:

After ~3.5 days my ESP_Easy_mega-20191130-2-PR_2789_test_ESP8266_4M_VCC.bin (on a NodeMCU) has experienced a warm / soft reboot.

The device appears to have rebooted with partial WiFi connectivity because I received an email from it that announced the reboot. But web access is broken.

Initially I saw partial ESPEasy information from the browser. But after a couple refreshes all web access stopped (browser access times out). A cold power reset has restored operation.

TD-er commented 4 years ago

I just got an idea about what may be different between a cold and a warm boot for wifi reconnects. Can you test a few times (with some minutes interval in between) to run a wifi scan from the tools page? Preferably with Eco mode enabled to increase the effect I'm thinking about. Does the AP you've configured appear in the list? (both if you setup more than one)

When running the most recent builds (test build, not even nightly's, for example this one: ESPEasy_mega-20191130-17-PR_2798.zip) then the wifi scan will also store in RTC memory the strongest AP you have configured. So when you then click the wifi disconnect button (or command WifiDisconnect from serial), then the unit will disconnect and reconnect to the last found strongest AP. (reconnect takes about 300 msec)

If you do this too frequently (within 5 minutes), the "next" configured AP will be selected, even though it is not the strongest signal.

So in short:

OK, now the idea I have. What if we should perform a "passive" scan instead of an "active" scan? A passive scan just waits for the duration of the timeout (default 200 msec for ESP8266, 300 msec for ESP32) for an AP to send its beacon signal (typically at a 102.4 msec interval, but this may differ between brands). The active scan (which is what we perform) sends out a "ping" asking all APs to announce themselves. So an active scan can be shorter than the timeout, but it can also result in fewer APs being found.
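The timing behind the passive scan can be checked with a little arithmetic. This is a rough Python sketch, assuming the timeout is the listening window on a single channel, that beacons are strictly periodic, and that none are lost; the function name is hypothetical.

```python
import math


def beacons_in_window(dwell_ms, beacon_interval_ms=102.4):
    """Return (minimum, maximum) number of periodic beacons that can start
    inside a listening window of dwell_ms, depending on the beacon phase."""
    ratio = dwell_ms / beacon_interval_ms
    return math.floor(ratio), math.ceil(ratio)


for chip, timeout_ms in (("ESP8266", 200), ("ESP32", 300)):
    low, high = beacons_in_window(timeout_ms)
    print(f"{chip}: {low} to {high} beacons within {timeout_ms} msec")
```

With the default timeouts, at least one beacon (ESP8266) or two beacons (ESP32) from an AP using the typical interval fall inside the window. With a weak signal some of those beacons may still not be received, so this is an upper bound on what a passive scan can hear, not a guarantee.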

Now what I am curious about: when you perform a wifi scan from the tools page and one or both of the APs you have configured are not listed, what happens when you force a WiFi disconnect? Will it try for a long time to connect to something that can hardly be reached?

What I can change:

What may affect the tests:

So a lot to consider and I hope my braindump here is not too chaotic to follow ;)

ghtester commented 4 years ago

Well, thanks for the hints to test. I'll try to find some time to read your message carefully and test at least part of the suggested things. I am not sure if the RTC can even help with the bad WiFi sensitivity after a warm boot when the SDK2.2.2 is used for a firmware build. Yes in general it's a nice feature for a quick reconnecting, if it will reliably work as designed. But couldn't somebody (of developers) find the related difference between SDK2.2.2 and other SDKs without this issue?

thomastech commented 4 years ago

Can you test a few times (with some minutes interval in between) to run a wifi scan from the tools page? Preferably with Eco mode enabled to increase the effect I'm thinking about. Does the AP you've configured appear in the list?

I tried several times over a two hour period, Eco Mode temporarily enabled. My WiFi router always appears in the list (only one router is configured on my devices).

TD-er commented 4 years ago

But couldn't somebody (of developers) find the related difference between SDK2.2.2 and other SDKs without this issue?

Well, I have not been digging deep in the differences between SDK2.x and SDK3. And even if I did, I cannot look into the WiFi code as that's proprietary and closed source. I've been debugging WiFi issues the last 20 months with "black box debugging", which does resemble the debugging style of "writeln("blaat"); writeln("blaat2");" and looking at the output.

I am not sure if the RTC can even help with the bad WiFi sensitivity after a warm boot when the SDK2.2.2 is used for a firmware build. Yes in general it's a nice feature for a quick reconnecting, if it will reliably work as designed.

The main reason I added it (apart from the possibility to save energy on battery powered nodes) is to try and fix this issue we're dealing with here. The WiFi settings stored in RTC do remain intact with warm reboots (crashes included) and remove the need for scanning for WiFi networks. It simply knows the last BSSID and channel used and also what SSID settings were used. So the first 2 attempts will be to connect to the same AP as the last successful connection before the reboot or lost connection. This also means you are not depending on whether the AP will react during the short scan interval, which can sometimes be an issue.
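The reconnect order described here can be illustrated roughly as follows. This is a Python sketch for illustration only, not the ESPEasy implementation: the `StoredAp` type, the function name and the tuple layout are hypothetical, and falling back to a scan after the first 2 attempts is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredAp:
    """Hypothetical stand-in for the WiFi details kept in RTC memory."""
    ssid_index: int  # which configured SSID settings were used
    bssid: str       # BSSID of the last successfully connected AP
    channel: int     # channel of that AP


def choose_connect_plan(attempt: int, stored: Optional[StoredAp],
                        fast_attempts: int = 2):
    """Decide how connection attempt number `attempt` (1-based) proceeds."""
    if stored is not None and attempt <= fast_attempts:
        # Skip scanning: go straight to the last known BSSID and channel.
        return ("direct", stored.ssid_index, stored.bssid, stored.channel)
    # Assumption: without stored data, or after the fast attempts, scan first.
    return ("scan", None, None, None)


last_ap = StoredAp(ssid_index=0, bssid="AA:BB:CC:DD:EE:FF", channel=6)
for attempt in range(1, 4):
    print(attempt, choose_connect_plan(attempt, last_ap))
```

The point of the direct path is that a connection attempt no longer depends on the AP showing up in a short scan.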

I tried several times over a two hour period, Eco Mode temporarily enabled. My WiFi router always appears in the list (only one router is configured on my devices).

OK, so at least for your setup it may not be a factor to change the scan mode from active to passive.

ghtester commented 4 years ago

Let me share an update with some recent FW builds:

BAD = bad WiFi sensitivity after a warm (re)boot
OK = WiFi sensitivity is still the same (good) after a cold or a warm (re)boot

BAD
Build: 20106 - Mega
System Libraries: ESP82xx Core a04c3244, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support
Git Build:
Plugin Count: 82 [Normal] [Testing]
Build Md5: 138327be07fcd8e807a677412dc247
Md5 check: passed.
Build Time: Mar 29 2020 04:22:08
Binary Filename: ESP_Easy_mega-20200328-6-PR_2972_test_beta_ESP8266_4M1M.bin

BAD
Build: 20105 - Mega
System Libraries: ESP82xx Core a04c3244, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support
Git Build: mega-20200328
Plugin Count: 82 [Normal] [Testing]
Build Md5: b8bb1bd39cd2df423cee65ea1b81fcc
Md5 check: passed.
Build Time: Mar 28 2020 02:33:26
Binary Filename: ESP_Easy_mega-20200328_test_beta_ESP8266_4M1M.bin

OK
Build: 20104 - Mega
System Libraries: ESP82xx Core 2_6_3, NONOS SDK 3.0.0-dev(c0f7b44), LWIP: 2.1.2 PUYA support
Git Build: My Build: Mar 11 2020 10:22:28
Plugin Count: 37 [Normal]
Build Md5: 53f44a927343c969efcca48142a883c
Md5 check: passed.
Build Time: Mar 11 2020 10:24:06
Binary Filename: ESP_Easy_20200311_vagrant_custom_sdk3_ESP8266_4M1M.bin

OK
Build: 20105 - Mega
System Libraries: ESP82xx Core 3d128e5c, NONOS SDK 2.2.2-dev(a58da79), LWIP: 2.1.2 PUYA support
Git Build: mega-20200328
Plugin Count: 13 [Normal] [Minimal, IR with AC]
Build Md5: 4d3a6ba6ad3029a3ed908269f9c98d83
Md5 check: passed.
Build Time: Mar 28 2020 02:09:47
Binary Filename: ESP_Easy_mega-20200328_minimal_IRext_ESP8266_4M1M.bin

OK
Build: 20106 - Mega
System Libraries: ESP82xx Core 3d128e5c, NONOS SDK 2.2.2-dev(a58da79), LWIP: 2.1.2 PUYA support
Git Build:
Plugin Count: 16 [Normal]
Build Md5: e47e18c32043ebacc36819bc61a8eed
Md5 check: passed.
Build Time: Mar 29 2020 03:38:56
Binary Filename: ESP_Easy_mega-20200328-6-PR_2972_custom_ESP8266_4M1M.bin

So it looks like SDK 2.2.2-dev(a58da79) fixed the WiFi issue reported above. Until now I had to use SDK 3.0.0 (which was not recommended for use) for custom firmware builds to avoid the bad WiFi sensitivity after a warm (re)boot, which happens to me rather often due to exceptions.
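The verdicts listed above can be grouped by SDK commit to make the pattern easier to see. A minimal Python sketch, assuming the "System Libraries" text always contains a `NONOS SDK <version>-dev(<commit>)` fragment as in the five entries above; the function name is hypothetical, and the input is just those five entries retyped.

```python
import re
from collections import defaultdict

SDK_PATTERN = re.compile(r"NONOS SDK (?P<version>[\d.]+)-dev\((?P<commit>[0-9a-f]+)\)")

# (verdict, "System Libraries" fragment) for the five builds listed above.
results = [
    ("BAD", "ESP82xx Core a04c3244, NONOS SDK 2.2.2-dev(bb83b9b)"),
    ("BAD", "ESP82xx Core a04c3244, NONOS SDK 2.2.2-dev(bb83b9b)"),
    ("OK", "ESP82xx Core 2_6_3, NONOS SDK 3.0.0-dev(c0f7b44)"),
    ("OK", "ESP82xx Core 3d128e5c, NONOS SDK 2.2.2-dev(a58da79)"),
    ("OK", "ESP82xx Core 3d128e5c, NONOS SDK 2.2.2-dev(a58da79)"),
]


def group_by_sdk(entries):
    """Map (SDK version, SDK commit) to the set of verdicts seen for it."""
    grouped = defaultdict(set)
    for verdict, libraries in entries:
        match = SDK_PATTERN.search(libraries)
        if match is not None:
            grouped[(match["version"], match["commit"])].add(verdict)
    return dict(grouped)


summary = group_by_sdk(results)
for (version, commit), verdicts in summary.items():
    print(f"SDK {version} ({commit}): {', '.join(sorted(verdicts))}")
```

Five builds are a small sample, and the builds also differ in core commit and plugin count, so the grouping only shows a correlation.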

TD-er commented 4 years ago

Another striking correlation is that a high plugin count correlates with bad WiFi stability.

Not sure that it has something to do with it, just that it is a surprising correlation seen in your tests.

ghtester commented 4 years ago

It's interesting to me that the latest custom build also reconnects OK even after warm (re)boot with the same SDK release... So hopefully the issue is fixed and we could close this case?


OK
Build: 20106 - Mega
System Libraries: ESP82xx Core 5511180c, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support
Git Build: My Build: Apr 11 2020 10:05:33
Plugin Count: 36 [Normal]
Build Md5: bc161e1b8b7984d07379d96b34972be5
Md5 check: passed.
Build Time: Apr 11 2020 10:07:20
Binary Filename: ESP_Easy_20200411_vagrant_custom_beta_ESP8266_4M1M.bin

TD-er commented 4 years ago

Let's hope so. Maybe you can also test a few nightly build files, to make sure it isn't a build issue.