Closed ghtester closed 4 years ago
I can confirm.
One of my ESPs has an RSSI of about -87 to -90 and can't connect after a warm reboot.
@Sasch600xt What core version do you use?
ESP_Easy_mega-20191119_normal_core_260_sdk222_alpha_ESP8266_4M1M
I did change the platformio.ini file structure, so things may have a slightly different name. The default is now core 2.6.1 (SDK 2.2.2), and I also have some build definitions with core 2.6.1 SDK3
Maybe you can also test core 2.6.1 with both SDK versions for this issue?
Sure... so next release then? Tonight?
I started a test build. Will be ready in 45 minutes I guess.
I will be out of the house until tonight, so I will check as soon as possible. Maybe I can manage it tonight after I come back.
Here is the test build: ESPEasy_mega-20191119-22-g17fbb474.zip
I have to leave now, but I had time for 2 quick tests.
ESP_Easy_mega-20191119-22-g17fbb474_test_beta_ESP8266_4M1M and ESP_Easy_mega-20191119-22-g17fbb474_normal_ESP8266_4M1M
were not working. They did not connect after a warm reboot; only a hard reset brought them online again.
OK, good to know. It makes me curious about the tests with core 2.6.1 SDK3
I did an OTA update to ESP_Easy_mega-20191119_test_core_260_sdk3_alpha_ESP8266_4M1M again and it came up after the update, so all is fine with this firmware.
Yes it looks core 2.6.0 with sdk3 is a working combination while sdk2.2.2 together with both core 2.6.0 and core 2.6.1 has the WiFi issue. I am just building the customized firmware using Vagrant so perhaps I could share a fresh quick experience with core 2.6.1 & sdk3 soon...
OK, so the first impressions with this custom build:
Entry | Info |
---|---|
Build:⋄ | 20104 - Mega |
System Libraries:⋄ | ESP82xx Core 2_6_1, NONOS SDK 3.0.0-dev(c0f7b44), LWIP: 2.1.2 PUYA support |
Git Build:⋄ | My Build: Nov 21 201917:32:57 |
Plugins:⋄ | 37 [Normal] |
Build Md5: | e2d52b9dca1ae3c9e7ca431e929220 |
Md5 check: | passed. |
Build Time:⋄ | Nov 21 2019 17:34:24 |
Binary Filename:⋄ | ESP_Easy_20191121_vagrant_custom_sdk3_ESP8266_4M1M.bin |
In general it somehow works, WiFi connection is made even after a warm reboot with a remote AP (RSSI about -84dB) but not so quickly, even after cold boot. It could be due to AP model type, to be tested with another AP in different location. The worse thing is that I have experienced several wdt reboots. It needs more time to do a better testing. But for sure I would like to create a custom build based on core 2.6.0 and sdk3.
I believe I have this problem too. Using ESP_Easy_mega-20191108-36-PR_2728_test_core_260_sdk222_alpha_ESP on a NodeMCU.
Cold boots experience fast WiFi connection. Warm boots fail to connect. The typical RSSI at this device location is usually -80dBm or stronger (currently -74dBm).
Just speculation, but perhaps signal quality temporarily gets worse in my operating environment (walk near device, RF interference, bad mojo, etc). Then the "reboot issue" occurs and that starts this warm boot failure mode.
BTW, despite the warm reboots, I didn't experience WiFi re-connection problems with a late August build using the 260_sdk3_alpha core. So at this point I think that the recent test_core_260_sdk222_alpha is involved.
Which bin from today can I use? I miss an SDK3 build for 4M1M. I am not sure whether I can use the custom one?
Today's build has changed the SDK version back to July version. So when you use a version which doesn't have a core version mentioned (or core 2.6.1 explicit mentioned), then it has SDK 2.2.2 from July.
See discussion here: https://github.com/esp8266/Arduino/issues/5784#issuecomment-557500450
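To check quickly whether a given build name carries an explicit core/SDK tag, a small sketch like the following may help. It is only an assumption based on the file names quoted in this thread (for example `..._test_core_260_sdk3_alpha_...`); the helper name `core_sdk_tag` and the regular expression are hypothetical and are not taken from the ESPEasy build scripts.

```python
import re

# Hypothetical pattern, inferred only from file names quoted in this thread.
_TAG = re.compile(r"_core_(?P<core>\d+)_sdk(?P<sdk>\d+)_")


def core_sdk_tag(filename):
    """Return (core, sdk) as strings if the name has an explicit tag, else None."""
    match = _TAG.search(filename)
    if match is None:
        # No tag in the name: per the comment above, such builds use the default SDK.
        return None
    return match.group("core"), match.group("sdk")


if __name__ == "__main__":
    for name in (
        "ESP_Easy_mega-20191119_test_core_260_sdk3_alpha_ESP8266_4M1M",
        "ESP_Easy_mega-20191119-22-g17fbb474_normal_ESP8266_4M1M",
    ):
        print(name, "->", core_sdk_tag(name))
```

A result of `None` only means the name has no explicit tag; which SDK that implies depends on the build date, as explained above.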
I see.
Thanks for the info. Hopefully the bad WiFi sensitivity after a warm boot will be fixed somehow in future (if it's the same for SDK from July). BTW. it looks sdk3 significantly helped with this issue but I also experienced more unexpected reboots.
I have 4 nodes running that core version with uptimes over 55 days and 2 with over 20 days. So the core version is capable of running stable. But the number of WiFi reconnects on those nodes is quite low, so maybe not entirely on-topic in this issue.
I'm not sure which core / sdk combination you mean. I think if the signal from the AP is strong and stable, it (almost) does not matter which core / sdk is used for the ESP Easy mega build; it works quite well and is stable. Nevertheless, the different WiFi sensitivity after a cold versus a warm boot is a really bad issue IMHO... I just updated one of my nodes with the fresh official build:
Build:⋄ | 20104 - Mega |
---|---|
System Libraries:⋄ | ESP82xx Core 2.7.0-dev stage, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support |
Git Build:⋄ | mega-20191123 |
Plugins:⋄ | 79 [Normal] [Testing] |
Build Md5: | a667330ae76d2cfa961f72db502680 |
Md5 check: | passed. |
Build Time:⋄ | Nov 23 2019 03:49:28 |
Binary Filename:⋄ | ESP_Easy_mega-20191123_test_beta_ESP8266_4M1M.bin |
The WiFi issue is there again, node can't reconnect to AP after a warm reboot (RSSI -89). After a cold boot it's connected immediately to the same AP. I'll keep it running to test the stability with core 2.7.0.
That's not core 2.6.1. That's running the beta version. Please test with a version without "_beta".
That's OK. ;-) I think beta versions should be tested as well. So far every tested core with sdk 2.2.2 had the same WiFi "warm boot" issue. Perhaps I should find a solution how to automatically perform a cold boot right after a warm one... :-)
Let me share the test results on 2 nodes with the firmware mentioned above (ESP82xx Core 2.7.0-dev stage, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support). So far it looks very good, seems to be quite stable under weak RSSI and performing very well. The reconnect issue after a warm reboot is there but the unexpected reboot did not happen so far.
The ESP Easy mega node with only the BMX280 plugin and the Home Assistant (openHAB) MQTT Controller, RSSI about -82:
`367533284: WD : Uptime 6126 ConnectFailures 144 FreeMem 15776 WiFiStatus 3`
Sending data from the BMP280 to the MQTT Controller every 15 secs.
The ESP Easy mega node with more plugins, but most of the time only listening to MQTT import, RSSI about -91:
`535412697: WD : Uptime 8924 ConnectFailures 884 FreeMem 12624 WiFiStatus 3`
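To compare the two nodes on equal terms, the WD lines can be parsed and the connect failures related to the uptime counter. This is just a rough sketch: the field layout is taken from the two log lines above, while the helper names `parse_wd` and `failures_per_uptime_unit` are made up for illustration, and the unit of `Uptime` is left as whatever the firmware reports.

```python
import re

# Field layout copied from the two WD log lines quoted above.
WD_LINE = re.compile(
    r"WD\s*:\s*Uptime\s+(?P<uptime>\d+)"
    r"\s+ConnectFailures\s+(?P<failures>\d+)"
    r"\s+FreeMem\s+(?P<freemem>\d+)"
    r"\s+WiFiStatus\s+(?P<status>\d+)"
)


def parse_wd(line):
    """Return the WD counters as a dict of ints, or None if the line does not match."""
    match = WD_LINE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def failures_per_uptime_unit(stats):
    """Connect failures divided by the uptime counter (hypothetical metric)."""
    return stats["failures"] / stats["uptime"]


if __name__ == "__main__":
    lines = [
        "367533284: WD : Uptime 6126 ConnectFailures 144 FreeMem 15776 WiFiStatus 3",
        "535412697: WD : Uptime 8924 ConnectFailures 884 FreeMem 12624 WiFiStatus 3",
    ]
    for line in lines:
        stats = parse_wd(line)
        print(stats, round(failures_per_uptime_unit(stats), 4))
```

Dividing by uptime makes it easier to see that the node at about -91 accumulates failures noticeably faster than the one at about -82, which the raw counters alone only hint at.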
In another repo, I came across some comment next to the `WiFi.disconnect();` call in the `Setup()` function.
See the PR I just made: https://github.com/letscontrolit/ESPEasy/pull/2789
Maybe it can be tested to see if it does fix this issue? Please try this test build ESPEasy_mega-20191130-2-PR_2789.zip
I've installed ESP_Easy_mega-20191130-2-PR_2789_test_ESP8266_4M_VCC.bin on a NodeMCU and will report back on the test results.
As already mentioned in another thread, I tested quickly the ESP_Easy_mega-20191130-3-PR_2792_test_beta_ESP8266_4M1M.bin and have to confirm that the issue with a limited WiFi connectivity after a warm boot is still there (looks to be related with SDK 2.2.2).
Test update:
After ~3.5 days my ESP_Easy_mega-20191130-2-PR_2789_test_ESP8266_4M_VCC.bin (on a NodeMCU) has experienced a warm / soft reboot.
The device appears to have rebooted with partial WiFi connectivity because I received an email from it that announced the reboot. But web access is broken.
Initially I saw partial ESPEasy information from the browser. But after a couple refreshes all web access stopped (browser access times out). A cold power reset has restored operation.
I just got an idea about what may be different between a cold and a warm boot for wifi reconnects. Can you test a few times (with some minutes interval in between) to run a wifi scan from the tools page? Preferably with Eco mode enabled to increase the effect I'm thinking about. Does the AP you've configured appear in the list? (both if you setup more than one)
When running the most recent builds (test builds, not even nightlies, for example this one: ESPEasy_mega-20191130-17-PR_2798.zip), the wifi scan will also store in RTC memory the strongest AP you have configured.
So when you then click the wifi disconnect button (or issue the command `WifiDisconnect` from serial), the unit will disconnect and reconnect to the last found strongest AP. (The reconnect takes about 300 msec.)
If you do this too frequently (within 5 minutes), the "next" configured AP will be selected, even though it is not the strongest signal.
So in short: when `attempt > 1 && attempt modulo 2 == 0`, the RTC preferred AP is deleted and the 'next' AP is chosen.
OK, now the idea I have. What if we need to perform not an "active" scan, but a "passive" scan? A passive scan is when we just wait for as long as the timeout (default 200 msec for ESP8266, 300 for ESP32) for an AP to send its beacon signal (typically at a 102.4 msec interval, but this may differ between brands). The active scan (which we perform) does send out a "ping" (a probe request) asking all APs to announce themselves. So an active scan can be shorter than the timeout, but it can also result in fewer APs found.
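The timing argument for a passive scan can be checked with a bit of arithmetic. The sketch below assumes that the whole timeout is spent listening on one channel and that beacons arrive strictly periodically; both are simplifications, and the function name `beacons_in_window` is made up for illustration.

```python
import math

# Typical beacon interval quoted above; it may differ between AP brands.
BEACON_INTERVAL_MS = 102.4


def beacons_in_window(timeout_ms, interval_ms=BEACON_INTERVAL_MS):
    """Return (fewest, most) beacons one AP can send during a listening window.

    Assumes ideal periodic beacons with an arbitrary phase relative to the
    start of the window; lost frames and channel switching are ignored.
    """
    ratio = timeout_ms / interval_ms
    return math.floor(ratio), math.ceil(ratio)


if __name__ == "__main__":
    for chip, timeout_ms in (("ESP8266", 200), ("ESP32", 300)):
        fewest, most = beacons_in_window(timeout_ms)
        print(f"{chip}: {timeout_ms} ms window -> {fewest} to {most} beacons")
```

Under these assumptions a 200 msec window always contains at least one beacon, so a passive scan would hear every AP that is within range at least once, provided that single beacon is actually received.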
Now what I am curious about: suppose you perform a wifi scan from the tools page and one or both of the APs you have configured are not listed. What happens then when you force a WiFi disconnect? Will it try for a long time to connect to something that can hardly be reached?
What I can change:
What may affect the tests:
Restart WiFi Lost Conn:
), which does turn WiFi off and on again
So a lot to consider and I hope my braindump here is not too chaotic to follow ;)
Well, thanks for the hints to test. I'll try to find some time to read your message carefully and test at least part of the suggested things. I am not sure if the RTC can even help with the bad WiFi sensitivity after a warm boot when SDK2.2.2 is used for a firmware build. Yes, in general it's a nice feature for quick reconnecting, if it reliably works as designed. But couldn't somebody (of the developers) find the related difference between SDK2.2.2 and other SDKs without this issue?
> Can you test a few times (with some minutes interval in between) to run a wifi scan from the tools page? Preferably with Eco mode enabled to increase the effect I'm thinking about. Does the AP you've configured appear in the list?
I tried several times over a two hour period, Eco Mode temporarily enabled. My WiFi router always appears in the list (only one router is configured on my devices).
> But couldn't somebody (of the developers) find the related difference between SDK2.2.2 and other SDKs without this issue?
Well, I have not been digging deep in the differences between SDK2.x and SDK3. And even if I did, I cannot look into the WiFi code as that's proprietary and closed source. I've been debugging WiFi issues the last 20 months with "black box debugging", which does resemble the debugging style of "writeln("blaat"); writeln("blaat2");" and looking at the output.
> I am not sure if the RTC can even help with the bad WiFi sensitivity after a warm boot when SDK2.2.2 is used for a firmware build. Yes, in general it's a nice feature for quick reconnecting, if it reliably works as designed.
The main reason I added it (apart from the possibility to save energy on battery powered nodes) is to try and fix this issue we're dealing with here. The WiFi settings stored in RTC remain intact across warm reboots (crashes included) and remove the need to scan for WiFi networks. The unit simply knows the last BSSID and channel used and also which SSID settings were used. So the first 2 attempts will be to connect to the same AP as the last successful connection before the reboot or lost connection. This also means you do not depend on whether the AP reacts during the short scan interval, which can sometimes be an issue.
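The retry behaviour described here, together with the `attempt > 1 && attempt modulo 2 == 0` rule from the earlier comment, could be modelled roughly as follows. This is only an illustration and not the ESPEasy code: the function names, the AP labels, the attempt counter starting at 0 and the wrap-around to the first AP are all assumptions.

```python
def drops_rtc_preference(attempt):
    """Rule quoted in this thread: attempt > 1 && attempt modulo 2 == 0."""
    return attempt > 1 and attempt % 2 == 0


def simulate(configured_aps, rtc_index, attempts):
    """Return the AP that would be tried on each attempt.

    Assumptions (not confirmed by the source): attempts are counted from 0,
    and selecting the 'next' AP wraps around the configured list.
    """
    chosen = []
    index = rtc_index
    for attempt in range(attempts):
        if drops_rtc_preference(attempt):
            # The RTC preferred AP is dropped and the next configured AP is used.
            index = (index + 1) % len(configured_aps)
        chosen.append(configured_aps[index])
    return chosen


if __name__ == "__main__":
    # "AP-A" and "AP-B" are placeholder names for two configured access points.
    print(simulate(["AP-A", "AP-B"], rtc_index=0, attempts=6))
```

With a counter starting at 0, the first two attempts go to the AP remembered in RTC, which matches the description above; with a counter starting at 1, the switch would already happen on the second attempt.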
> I tried several times over a two hour period, Eco Mode temporarily enabled. My WiFi router always appears in the list (only one router is configured on my devices).
OK, so at least for your setup it may not be a factor to change the scan mode from active to passive.
Let me share an update with some recent FW builds:
BAD = bad WiFi sensitivity after a warm (re)boot
OK = WiFi sensitivity is still the same (good) after a cold or a warm (re)boot
BAD:

Build:⋄ | 20106 - Mega |
---|---|
System Libraries:⋄ | ESP82xx Core a04c3244, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support |
Git Build:⋄ | |
Plugin Count:⋄ | 82 [Normal] [Testing] |
Build Md5: | 138327be07fcd8e807a677412dc247 |
Md5 check: | passed. |
Build Time:⋄ | Mar 29 2020 04:22:08 |
Binary Filename:⋄ | ESP_Easy_mega-20200328-6-PR_2972_test_beta_ESP8266_4M1M.bin |

BAD:

Build:⋄ | 20105 - Mega |
---|---|
System Libraries:⋄ | ESP82xx Core a04c3244, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support |
Git Build:⋄ | mega-20200328 |
Plugin Count:⋄ | 82 [Normal] [Testing] |
Build Md5: | b8bb1bd39cd2df423cee65ea1b81fcc |
Md5 check: | passed. |
Build Time:⋄ | Mar 28 2020 02:33:26 |
Binary Filename:⋄ | ESP_Easy_mega-20200328_test_beta_ESP8266_4M1M.bin |

OK:

Build:⋄ | 20104 - Mega |
---|---|
System Libraries:⋄ | ESP82xx Core 2_6_3, NONOS SDK 3.0.0-dev(c0f7b44), LWIP: 2.1.2 PUYA support |
Git Build:⋄ | My Build: Mar 11 202010:22:28 |
Plugin Count:⋄ | 37 [Normal] |
Build Md5: | 53f44a927343c969efcca48142a883c |
Md5 check: | passed. |
Build Time:⋄ | Mar 11 2020 10:24:06 |
Binary Filename:⋄ | ESP_Easy_20200311_vagrant_custom_sdk3_ESP8266_4M1M.bin |

OK:

Build:⋄ | 20105 - Mega |
---|---|
System Libraries:⋄ | ESP82xx Core 3d128e5c, NONOS SDK 2.2.2-dev(a58da79), LWIP: 2.1.2 PUYA support |
Git Build:⋄ | mega-20200328 |
Plugin Count:⋄ | 13 [Normal] [Minimal, IR with AC] |
Build Md5: | 4d3a6ba6ad3029a3ed908269f9c98d83 |
Md5 check: | passed. |
Build Time:⋄ | Mar 28 2020 02:09:47 |
Binary Filename:⋄ | ESP_Easy_mega-20200328_minimal_IRext_ESP8266_4M1M.bin |

OK:

Build:⋄ | 20106 - Mega |
---|---|
System Libraries:⋄ | ESP82xx Core 3d128e5c, NONOS SDK 2.2.2-dev(a58da79), LWIP: 2.1.2 PUYA support |
Git Build:⋄ | |
Plugin Count:⋄ | 16 [Normal] |
Build Md5: | e47e18c32043ebacc36819bc61a8eed |
Md5 check: | passed. |
Build Time:⋄ | Mar 29 2020 03:38:56 |
Binary Filename:⋄ | ESP_Easy_mega-20200328-6-PR_2972_custom_ESP8266_4M1M.bin |
So it looks like SDK 2.2.2-dev(a58da79) fixed the WiFi issue reported above. So far I had to use SDK 3.0.0 (which was not recommended for use) for custom firmware builds to avoid the bad WiFi sensitivity after a warm (re)boot, which happens to me rather often due to Exceptions.
Another striking correlation is that a high plugin count correlates with bad WiFi stability.
Not sure that it has something to do with it, just that it is a surprising correlation seen in your tests.
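To make that correlation easier to inspect, the five results listed above can be tabulated by SDK revision and by plugin count. The data below is copied from those build reports; the structure and the helper name `verdicts_by` are only a sketch for illustration.

```python
from collections import defaultdict

# (SDK revision, plugin count, verdict) copied from the five build reports above.
RESULTS = [
    {"sdk": "2.2.2-dev(bb83b9b)", "plugins": 82, "verdict": "BAD"},
    {"sdk": "2.2.2-dev(bb83b9b)", "plugins": 82, "verdict": "BAD"},
    {"sdk": "3.0.0-dev(c0f7b44)", "plugins": 37, "verdict": "OK"},
    {"sdk": "2.2.2-dev(a58da79)", "plugins": 13, "verdict": "OK"},
    {"sdk": "2.2.2-dev(a58da79)", "plugins": 16, "verdict": "OK"},
]


def verdicts_by(results, key):
    """Group the set of observed verdicts by the given column."""
    groups = defaultdict(set)
    for row in results:
        groups[row[key]].add(row["verdict"])
    return dict(groups)


if __name__ == "__main__":
    print(verdicts_by(RESULTS, "sdk"))
    print(verdicts_by(RESULTS, "plugins"))
```

In these five rows the SDK revision and the plugin count change together: both BAD builds use bb83b9b and have 82 plugins, while every OK build has 37 plugins or fewer and a different SDK revision. So this data alone cannot tell which of the two factors matters.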
It's interesting to me that the latest custom build also reconnects OK even after warm (re)boot with the same SDK release... So hopefully the issue is fixed and we could close this case?
OK:

Build:⋄ | 20106 - Mega |
---|---|
System Libraries:⋄ | ESP82xx Core 5511180c, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support |
Git Build:⋄ | My Build: Apr 11 2020 10:05:33 |
Plugin Count:⋄ | 36 [Normal] |
Build Md5: | bc161e1b8b7984d07379d96b34972be5 |
Md5 check: | passed. |
Build Time:⋄ | Apr 11 2020 10:07:20 |
Binary Filename:⋄ | ESP_Easy_20200411_vagrant_custom_beta_ESP8266_4M1M.bin |
Let's hope so. Maybe you can also test a few nightly build files, to make sure it isn't a build issue.
As already discussed in some other issue topics, it looks like there's an issue in several of the latest firmware releases which prevents an ESP node from reconnecting to a WiFi AP with a weak signal (RSSI about -90 dB) after a warm boot. After a cold boot (turning power off / on) the node connects quickly to the same AP without issue. I tested various WiFi settings on the ESP but the behavior is still the same. The issue is reproducible in an environment where a lot of WiFi APs are visible at different distances (with different signal levels). The WIFISCAN command invoked from the serial console shows many APs after a cold boot. After a warm (re)boot due to a node crash or the REBOOT command, only a reduced AP list is returned by WIFISCAN. So it looks like the node's WiFi sensitivity is significantly reduced after a warm (re)boot, which makes reconnecting to an AP with a weak signal impossible. Latest test performed with the official build:
Firmware
Build:⋄ | 20104 - Mega |
---|---|
System Libraries:⋄ | ESP82xx Core 2_6_0, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support |
Git Build:⋄ | mega-20191119 |
Plugins:⋄ | 79 [Normal] [Testing] |
Build Md5: | 2a40b605d5a65d4f9cee7fc88ad790 |
Md5 check: | passed. |
Build Time:⋄ | Nov 19 2019 22:04:11 |
Binary Filename:⋄ | ESP_Easy_mega-20191119_test_core_260_sdk222_alpha_ESP8266_4M1M.b |