Open hutch120 opened 7 years ago
If I read this correctly you are storing a copy of the connection parameters elsewhere and restoring them when they are mysteriously disappearing. Is that correct?
If so, connection parameters shouldn't be mysteriously disappearing. One possibility is that they are disappearing due to flash corruption. Flash corruption has been an annoying problem due to a bug in the Espressif library. I found the failure modes to be many and varied and some flash corruptions will never be fixed during normal use and reprogramming. If that happens there are a few ways to erase the flash, one of which is a sketch at https://github.com/kentaylor/EraseEsp8266Flash .
@hutch120, I'm wondering if your underlying problem is corrupted flash.
Hi @kentaylor, thanks for the feedback!
I'm not ruling out flash corruption, but this issue seems to be reproducible, and seems to be occurring on a few of the units I have. I'm still putting the final touches on the code, and will be rolling out to a few more units over the coming days, so will report on findings.
Note that once I implemented the code above (writing details to EEPROM) it never forgets, and always connects on restart.
@hutch120, flash corruption will not be fixed by reprogramming. Once it has occurred its effects will be reproducible, so be sure to try erasing flash.
Connection parameters are in a part of flash that user code does not access and should not mysteriously disappear. They are meant to be permanent until altered, similar to your store. If they are not permanent until changed through calls to the Espressif library then it would be better if that was fixed rather than having another more permanent store to back them up, which of course would be a useful workaround until the underlying problem was fixed. Previously I've found it was sometimes impossible to write connection parameters at all, which I referred to as bricking the ESP. I had to erase the flash first, so it is also possible your workaround will fail if the underlying problem manifests itself this way for you.
If connection parameters are being lost and it is not due to previous flash corruption then there is an error in the Espressif libraries that should be reported and @tablatronix has had some success with reporting issues previously. I am not seeing this loss of connection parameters across reboots but that doesn't mean it doesn't occur. It could be just rare as was the previous problem or you are doing something different to trigger it. I was amazed that the previous issue was not discovered and fixed earlier. It meant thousands of people have used these things without noticing.
You should be using version 2.3.0 of the Arduino library which is the first to have this workaround for the underlying bug that as far as I'm aware is still in the Espressif library. That workaround was supposed to fix the flash corruption problem but maybe it hasn't in all circumstances.
In my case, the loss of saved credentials is immediate. I don't need to restart the ESP twice to lose them. From the WiFiManager configuration I press save, and WiFiManager switches to STA mode and connects (successfully). After that, I reboot the ESP, and it goes back to AP mode because the SSID and psk are blank.
Do you have the flash size set correctly in the IDE? It certainly does sound like one of the flash corruption bugs in the SDK.
I will check tonight. But, anyway, this article shows a way to check the flash size of our ESP modules: http://www.esp8266.com/viewtopic.php?p=14601&sid=0e7cccec1d8cb4d757ce0c391f4cab7c#p14601
I had the wrong size selected, indeed (it was configured for 512k flash). Now it is correctly selected (4M), and I verified it with the CheckFlashConfig script, but WiFi persistence still doesn't work at all.
Did you erase the flash with esptool to fix the corruption?
No, how should I do this?
@hutch120 it has occurred to me that if you have lost faith in the permanence of the Espressif WiFi connection parameter store you could improve your algorithm to not use it at all.
If you call WiFi.persistent(false) before you call WiFi.begin(SSID,password) it will not store the WiFi parameters. Then there is no need to check if your parameters match those in the Espressif store and you remove the risk of bricking that I referred to previously. The cost is, it takes a little longer to connect to WiFi as the connection process normally commences prior to the Arduino sketch being launched.
@kentaylor Thanks for the tip!! You're right, I do feel much better knowing my code is now responsible for save/load of credentials. I didn't consider disabling the other method; I was thinking it was nice to have redundancy, but given the bricking possibility, I will disable it. BTW: Sorry I've gone quiet on this, I've been dragged onto another project for the next couple of weeks. I'll definitely look into this, and into erasing the flash memory, when I get back.
I have experienced the exact problem you are referring to. My esp8266 devices will drop their saved SSID & password if they fail to connect to the AP. I can reproduce this easily. This really seems to be a problem if there is a power outage. My access point takes several minutes to boot up, but my ESP devices are nearly instant.
I am thinking of doing something similar to you and save the credentials manually. As to not reinvent the wheel though, did you settle on some code that you would be willing to share?
Hi @erocm123
It looks like there are others working on this issue and committing back to master branch, so maybe look at that in the first instance.
If you are still keen to go your own path like I did, then essentially the code I posted above is what I'm running and it has been running non-stop on about 10 ESP modules since I wrote it about 3 weeks ago, so I'm pretty happy with it.
If this is still an open issue early next year, or there is more interest, then when I get some time I'll write a pull request, but please don't wait, I'm not sure when I'll be able to do that, probably not until at least January, and as I said, it looks like this is being examined in the core code anyway, and hopefully eventually will be fixed and this workaround will be redundant.
@hutch120 , thanks for your response. I ended up using much of your code but with just a slightly different approach. I call getWIFIConfig() after wifiManager makes a successful connection. That way, the credentials are saved in EEPROM right after the device connects instead of on the next reboot. I have also put some safeguards in to make sure the wifi settings aren't written if they have not been changed.
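The "don't write if unchanged" safeguard described above can be sketched roughly as follows. This is only a host-side illustration, not the code anyone in this thread is running: `Credentials`, `CredentialStore` and `saveIfChanged` are hypothetical names, the store is an in-memory stand-in for the EEPROM, and the refusal to save a blank SSID is an added assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical record; the thread does not show the real EEPROM layout.
struct Credentials {
    std::string ssid;
    std::string psk;
    bool operator==(const Credentials& other) const {
        return ssid == other.ssid && psk == other.psk;
    }
};

// In-memory stand-in for an EEPROM-backed store so the logic runs on a PC.
// On the ESP8266 the assignment below would become the actual EEPROM write.
class CredentialStore {
public:
    // Writes only when the values differ from what is already stored.
    // Returns true if a write actually happened.
    bool saveIfChanged(const Credentials& fresh) {
        if (fresh.ssid.empty()) return false;            // assumption: never store a blank SSID
        if (hasData_ && stored_ == fresh) return false;  // unchanged, skip the write
        stored_ = fresh;
        hasData_ = true;
        ++writeCount_;
        return true;
    }
    const Credentials& stored() const { return stored_; }
    std::size_t writeCount() const { return writeCount_; }

private:
    Credentials stored_;
    bool hasData_ = false;
    std::size_t writeCount_ = 0;
};
```

Called right after every successful connection, a function like this would only touch the store the first time or when the SSID or password really changed.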
Nice one! @erocm123 if you have time consider adding a pull request. :)
I don't understand these notes. Aren't these things already done in the Arduino SDK? Aren't credentials only saved if they don't match? Also, the credentials are not saved after reboot; they are just not read until the next reboot.
Discussed here is a workaround to a problem that occurs when wifi config is being "erased". Yesterday was a good example for me. I had a power outage which caused all of my devices to reboot. My access point takes about 5 minutes to boot up while my ESP devices boot and connect (try to connect) almost immediately. When I got home all ESP devices were broadcasting their AP for configuration. I tried rebooting them and they still were broadcasting and it was required that I connect to them and re-configure them to connect to the AP.
That doesn't seem like appropriate behavior (maybe it is?). I would expect it to timeout connecting and start the AP (but not delete the saved credentials until new ones are tried). Then a timeout could happen on the AP and it could try to connect again. This process in a loop would be desirable.
So, as a work around I am saving the credentials myself so they do not get "erased" after failed connection attempts. This allows me to replicate the looping behavior above.
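The looping behaviour described above (connect timeout, then AP, then AP timeout, then try to connect again, without ever deleting the saved credentials) can be written down as a tiny state machine. This is a host-side sketch with made-up names (`Mode`, `Event`, `Device`, `step`), not WiFiManager code.

```cpp
// Hypothetical states and events for illustration only.
enum class Mode { Connecting, ConfigPortal, Connected };
enum class Event { ConnectOk, ConnectTimeout, PortalTimeout, PortalSaved };

struct Device {
    Mode mode = Mode::Connecting;
    bool hasSavedCredentials = true;
};

// Advance the device by one event. Saved credentials are never cleared here;
// only a successful save from the portal sets them.
inline void step(Device& d, Event e) {
    switch (d.mode) {
    case Mode::Connecting:
        if (e == Event::ConnectOk) {
            d.mode = Mode::Connected;
        } else if (e == Event::ConnectTimeout) {
            d.mode = Mode::ConfigPortal;   // AP not reachable yet: open the portal
        }
        break;
    case Mode::ConfigPortal:
        if (e == Event::PortalSaved) {
            d.hasSavedCredentials = true;
            d.mode = Mode::Connecting;
        } else if (e == Event::PortalTimeout && d.hasSavedCredentials) {
            d.mode = Mode::Connecting;     // nobody configured it: retry the old credentials
        }
        break;
    case Mode::Connected:
        break;
    }
}
```

In the power outage scenario the device would bounce between Connecting and ConfigPortal until the access point has finished booting, and then connect with the credentials it already had.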
Yeah, this is interesting. Have you tried the stable branch and Ken's fork?
Sounds like something we could reproduce easily to find the bug. I am guessing this has something to do with WiFiManager trying to deal with connecting manually and not letting the SDK do it.
In my case I can reproduce it 100%. If I unplug my access point and reboot my ESP devices then wait several minutes, they fail to connect and their wifi settings disappear.
I will try the branch / fork you mention to see if that makes a difference. Thank you for the help.
So it sounds like if the connection fails for any reason after boot, the settings are erased. Umm, is the source calling WiFi.disconnect at all? I'll try to duplicate this tomorrow.
Has there been any progress on this issue? I'm seeing the same behavior. After a power outage, the devices go into config mode and have to be manually reconfigured.
> I call getWIFIConfig() after wifiManager makes a successful connection. That way, the credentials are saved in EEPROM right after the device connects instead of on the next reboot.
@erocm123 I like that idea. Can I talk you into posting your working code here?
> I will try the branch / fork you mention to see if that makes a difference. Thank you for the help.
@erocm123 were you able to test that fork?
Well, I found out shortly afterwards that my change caused additional problems and I had to reverse it. I am still looking for a solution.
@hutch120 how is your solution holding up?
@Daemach Thanks for the inquiry, unfortunately I don't have much good news.
Firstly, I haven't had an opportunity to look at this in the detail it needs, so the following are just some general comments.
I think the solution I presented here is working fine. However, since getting some beta test sites running, they have reported that the devices fail to report events to the web server after a while (weeks maybe), and need a full power cycle to come back, despite having an internal timer that reboots them once a week. I'm not certain why this is, and I doubt it has anything to do with this module.
Further, I'm finding it really hard to work on this product without any device feedback in the wild. That is, some way to provide error logs when it isn't on my bench. Which is why it is hard to make any definitive comment about this new issue, which as an engineer irritates me no end.
If I do get back to this, it won't be for at least another couple of months.
Are you guys setting timeouts on the config portal?
wifiManager.setConfigPortalTimeout(60);
If you don't, the default is to wait for config indefinitely on connect failure.
Also consider increasing the setConnectTimeout value.
I did wifiManager.setConfigPortalTimeout(180);
It is wdt crashing on time out though...
Do you get an exception? Which branch?
WDT exception. I am using the master branch.
What's the exception? It must be something in your code, not WiFiManager.
I will double check to verify, but I'm pretty sure that using setConfigPortalTimeout:
a) prevents the AP from appearing after initial flash for the duration used with that method.
b) still has the problem mentioned in this thread. If I set the timeout to 60 seconds, but it takes an access point 240 seconds to fully boot, then the credentials still get wiped.
Like I said, I will double check, but I believe this is what I discovered when I first started troubleshooting this problem.
The timeout just keeps it from getting stuck in config after an outage. Ideally you can do as the timeout example does and simply restart the ESP, or use your own loop or condition to start the portal, which is what one of the problems described above (Daemach) was about. It does exactly what it's called: it times out the config portal after it starts.
Not sure why you have AP-appearing issues, and I have yet to reproduce flash credential wipes, so I am narrowing it down until I can reproduce. No one has produced a simplified sketch to reproduce this issue yet.
Here's my test sketch; I cannot reproduce the loss of credentials. Using the master branch.
#include <ESP8266WiFi.h>          //https://github.com/esp8266/Arduino

//needed for library
#include <DNSServer.h>
#include <ESP8266WebServer.h>
#include <WiFiManager.h>          //https://github.com/tzapu/WiFiManager

void espInfo(Stream & consolePort){
  system_print_meminfo();
  consolePort.print(F("system_get_sdk_version(): "));
  consolePort.println(system_get_sdk_version());
  consolePort.print(F("system_get_boot_version(): "));
  consolePort.println(system_get_boot_version());
}

void setup() {
  // put your setup code here, to run once:
  Serial.begin(115200);
  Serial.setDebugOutput(true);
  delay(500);
  Serial.println("\nStartup");
  espInfo(Serial);
  WiFi.printDiag(Serial);

  //WiFiManager
  //Local initialization. Once its business is done, there is no need to keep it around
  WiFiManager wifiManager;

  //reset settings - for testing
  // wifiManager.resetSettings();

  //sets timeout until configuration portal gets turned off
  wifiManager.setTimeout(60);

  //fetches ssid and pass and tries to connect
  //if it does not connect it starts an access point with the specified name
  //here "AutoConnectAP"
  //and goes into a blocking loop awaiting configuration
  if(!wifiManager.autoConnect("AutoConnectAP")) {
    Serial.println("failed to connect and hit timeout");
    delay(3000);
    //reset and try again, or maybe put it to deep sleep
    ESP.reset();
  }

  //if you get here you have connected to the WiFi
  Serial.println("connected...yeey :)");
}

void loop() {
  // put your main code here, to run repeatedly:
}
The only reason for loss of credentials is if something in your code is calling disconnect or one of the other methods, like disabling persistent.
If you have devices in the field and need to debug them, you can get information such as the reboot reason, or log reboots, use analog in to measure internal voltage, and record various things to EEPROM or send them to you when online, I suppose.
Are you using esp8266 stable?
Hi Guys,
The most obvious way I've been able to see this issue in action is to view the output from WiFi.SSID().
Consider adding the output from WiFi.SSID() to the espInfo function in the test sketch. Note that WiFi.SSID() is used by WiFiManager.cpp line 245 to determine if there are saved credentials.
So, given WiFi.SSID() is being output, then to reproduce using the test sketch above you might follow these steps: 1) Connect to an Access Point in under the timeout (60 seconds in the test sketch), 2) Check it worked (reboot). It should output the SSID of the last connected AP on boot. 3) Try to break it, reboot/AP powercycle/etc... if WiFi.SSID() ever returns nothing then you've reproduced the issue.
Note that in the workaround I've presented I extracted the WiFi.SSID() code from ESP8266WiFiSTA.cpp line 476 in order to detect the SSID, but you could just use WiFi.SSID(). I haven't tested the following code, but it should drop into the test sketch.
if (WiFi.SSID() == "") {
  Serial.println("============= OH NO!!! LOST SSID =============");
} else {
  Serial.println("SSID still available...yeey :)");
}
As far as reproducing the issue goes, my sketch already outputs info using printDiag(). The problem isn't debugging output, it is reproducing the issue; I have yet to see any problems.
Agreed, reproducing it is the issue. I didn't see you had that code; yes, WiFi.printDiag would also show the missing SSID... I just wanted to be clear on the steps I used, and it sometimes helps to be super clear in the debug messages if the issue occurred, using the exact same check used in the module.
I have not tested Ken's branch yet, but I have rebooted hundreds of times in and out of the config portal with no problems.
Ken's branch is unstable for me: my ESP fails to connect after every boot, then flash gets wiped. There is something going on with memory; I think it has something to do with doing the scanNetworks while the ESP is connecting. I can reproduce with his branch. No idea why he is doing a WiFi.scanNetworks(); in the constructor.
I have a feeling there is something wrong with that code, or there is another memory overflow, because as soon as I put his branch on my board it went to crap: exceptions all over the place. Gonna try erasing flash and doing some memory dumps.
I still cannot narrow down wtf is causing it to not connect, and what is subsequently wiping the config.
Sigh, what a damn waste of time: scanNetworks calls disconnect! I thought I added debug there but I must have reverted it. Well, there you go.
int status = wifi_station_get_connect_status();
if(status != STATION_GOT_IP && status != STATION_IDLE) {
  WiFi.disconnect(false);
}
OK, so this is a race condition or some other SDK bug.
Basically, if you call scanNetworks either BEFORE the STA has connected (race) or when you hit one of the issues where you never get an IP (STATION_GOT_IP, which is essentially WL_CONNECTED), then scanNetworks will do a disconnect and wipe your flash.
I say "or some other bug" because I have seen issues where my WiFi status is always 0 for some whack reason, maybe channel incompatibility (see the many issues in the SDK about failure to connect to an AP after doing scans, or using AP; there's even one with bunk beacon frames from some routers).
Ken's branch does an implicit scanNetworks in the constructor (as soon as you load the library, basically). This occurs BEFORE his connect timeout loop! He seems to be caching it up front for some unknown reason; it can probably be fixed since I doubt it is necessary.
The workaround is to wrap scanNetworks in a persistent false block.
I opened this: https://github.com/esp8266/Arduino/issues/2946
Proof (it is a race condition, so it might not always work, depending on router speed and handling):
void startup(){
  Serial.println(WiFi.begin("MYAP",""));
  WiFi.printDiag(Serial);

  // remove this wait loop and watch your flash go bye bye
  while (WiFi.waitForConnectResult() != WL_CONNECTED) {
    delay(500);
    Serial.print(F("."));
  }

  int n = WiFi.scanNetworks();
  WiFi.printDiag(Serial); // blank if you did not wait
}
Oh, and a good sign that there is some kind of problem is when you see:
*WM: AutoConnect
*WM: After waiting
*WM: 0.00 <-- 0 seconds , da fuq????
@tablatronix Thanks for your persistence (sorry for the pun)!
Maybe I'm wrong, but I thought persistent was just to do with whether flash should be written every time it connects or only if the values of SSID and password are different; maybe it should have been called persistAlways(true|false) (that is, it does a read first, rather than just blindly writing). So it should almost always be set to false to avoid flash wear issues.
Also, I thought WiFi.disconnect(false) was safe, but WiFi.disconnect(true) wiped the settings?
I've been hunting around trying to find some evidence for these assumptions (source code), but I can't seem to locate references. I did find this doc regarding use of WiFi.persistent: https://github.com/esp8266/Arduino/blob/4897e0006b5b0123a2fa31f67b14a3fff65ce561/doc/esp8266wifi/generic-class.md
The documentation might need to be clearer on this, maybe I'll fix it. For the most part, persistent means save to flash anything passed to begin. It also means it erases flash when doing stuff like calling disconnect.
The problem with persistent is that disconnect wipes flash when it is true. There are no flash wear issues unless you constantly switch modes or disconnect and begin over and over. These are side effects.
Using persistent with a normal begin is fine: it checks before saving, or never saves if persistent is false.
But if you have advanced stuff like this lib then you have to take this into consideration or use SDK-level code. In this case it was an oversight, as disconnect might be used in various places where no one had been looking.
@tablatronix Thanks for the explanation.
So, if I've got this right, the disconnect function in ESP8266WiFiSTA.cpp either calls wifi_station_set_config which clears flash if persistent is true, else calls wifi_station_set_config_current, which doesn't clear the flash settings. And disconnect is called unwittingly by this module via the WiFi.scanNetworks() function call.
In that case, maybe a fix implemented in this module could be achieved by manipulating WiFi.persistent before and after the WiFi.scanNetworks() function call?
Along these lines in WiFiManager.cpp
WiFi.persistent(false);
n = WiFi.scanNetworks();
WiFi.persistent(true);
Yeah, that's what I suggested:
> Work around is to wrap scannetworks in a persistent false block.
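To make the "persistent false block" idea concrete, here is a small host-side model plus a scope guard. `FakeWiFi` is NOT the ESP8266 core; it only imitates the behaviour described in this thread (a scan while neither connected nor idle triggers a disconnect, and a disconnect clears the flash copy when persistent is on). `NonPersistentScope` and `scanWithoutTouchingFlash` are hypothetical names, and the guard assumes persistence was on beforehand because it cannot read the previous setting.

```cpp
#include <string>

// Simplified model of the behaviour discussed above, for illustration only.
enum class Status { Idle, Connecting, GotIp };

struct FakeWiFi {
    bool persistentFlag = true;       // persistence assumed on, as in this thread
    std::string flashSsid = "MYAP";   // credentials saved in flash
    Status status = Status::Connecting;

    void persistent(bool p) { persistentFlag = p; }

    // With persistent on, the cleared config is also written to flash;
    // with it off, only the current (in-RAM) config changes.
    void disconnect() {
        if (persistentFlag) flashSsid.clear();
        status = Status::Idle;
    }

    // Mirrors the quoted check: scanning while neither connected nor idle
    // calls disconnect first.
    int scanNetworks() {
        if (status != Status::GotIp && status != Status::Idle) disconnect();
        return 0;  // number of networks found, irrelevant for this model
    }
};

// Scope guard: persistence off while the object lives, back on afterwards.
// Assumption: persistence was on before the guard was created.
template <typename W>
class NonPersistentScope {
public:
    explicit NonPersistentScope(W& wifi) : wifi_(wifi) { wifi_.persistent(false); }
    ~NonPersistentScope() { wifi_.persistent(true); }
    NonPersistentScope(const NonPersistentScope&) = delete;
    NonPersistentScope& operator=(const NonPersistentScope&) = delete;

private:
    W& wifi_;
};

// Scan with persistence temporarily disabled so a hidden disconnect
// cannot clear the saved credentials.
template <typename W>
int scanWithoutTouchingFlash(W& wifi) {
    NonPersistentScope<W> guard(wifi);
    return wifi.scanNetworks();
}
```

In the model, an unguarded scan during connection empties `flashSsid`, while the guarded one leaves it intact. The same template should in principle accept the real WiFi object, since it only needs `persistent(bool)` and `scanNetworks()`, but that has not been tested here.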
Hi Guys,
Firstly, I'm a big fan of this library, thanks @tzapu and @kentaylor
I'm currently running the @kentaylor version of the manager, but I believe this issue is relevant for @tzapu version too.
I've found that the WiFi config does not persist across multiple restarts. That is, WiFi.SSID() returns nothing after a couple of failed connection attempts/timeouts and hence gets into a state where it will never connect to the last known WiFi and needs to be manually reconfigured. Note that this can also be observed by looking at the output from WiFi.printDiag(Serial); and turning off the AP you are trying to connect to, restart x2.
I have tried a few things to get this going in a simple way such as calling WiFi.persistent(true); and ensuring WiFi.disconnect(true); (which clears settings) is not run, but none of these things seem to have any effect.
So, in my efforts to get this going, I've essentially implemented my own SSID and Password persist code and thought I'd share.
The code below are extracts from my working code to show just essential functions.
Firstly I modified the WiFiManager function startConfigPortal to allow the caller to optionally specify a Station SSID and Password to be used only if the timeout is reached (does not run if normal connect occurs).
WiFiManager.h:
And added this into startConfigPortal just where it drops out from the while loop.
WiFiManager.cpp
Then, I added the following to my module (but maybe this could be encapsulated in the WiFiManager?)