xoseperez / espurna

Home automation firmware for ESP8266-based devices
http://tinkerman.cat
GNU General Public License v3.0

Sonoff POW factory resets by itself, probably when power goes out #1778

Open schweini opened 5 years ago

schweini commented 5 years ago

Maybe this is related to #584.

I have many Sonoff POWs in the field. At some locations, various (but not all) units just stop sending data, and when I go there to investigate, they seem to have been completely factory reset; in particular, the wifi configuration is lost. That seems especially weird to me, because I was under the impression that the ESP8266 has its own small storage area for the last valid wifi config.

I have noticed that, if multiple units fail at a location, they often do it simultaneously. So some external factor, probably power issues (I'm in a developing country), has something to do with this.

They are running the latest dev branch. All the config options I set at compile time (e.g. MQTT server and so on) are OK, but settings made after compilation are lost.

This has happened many, many times, and is quite a nuisance.

Although it would be sub-optimal, would pre-configuring a wifi network at compile time (if possible) and doing a custom build for the affected locations help in this case?

mcspr commented 5 years ago

What do the info and crash commands say about the latest boot status and exception after that happens? If it is related to #584 and the crash recorder, the stack trace might be quite large. Is it a POW R2 or the old POW? Which Arduino Core version is used to build the firmware?

BTW, there are build-time wifi settings; see WIFI1_SSID, WIFI1_PASS, etc. The Espressif SDK does have its own wifi settings storage, but we use JustWifi instead to support multiple network configurations (similar to the WiFiManager and Core's WiFiMulti libraries).

schweini commented 5 years ago

This is happening to POW version 1 units right now. I suspect I've seen it on POW2 units too, but am not 100% sure right now.

Sadly, I can't access the crash logs, because the units are far away, offline and connected to AC.

Several units stop sending data at exactly the same time, as if there were a power surge or a mini blackout in some, but not all, rooms.

The strange thing is that, in the same building, some units are super stable while others are problem children that end up stuck in factory-reset mode about once a week, even though they all run the same firmware and were all bought at the same time.

What I don't understand is how, after e.g. a watchdog-induced reboot, the wifi settings can be gone without a factory reset having been executed.

philkry commented 5 years ago

I have experienced the same symptoms, but I am very certain that in my case it has nothing to do with a power outage (I live in Germany). I'm using two zhilde-eu44-w power strips connected to a dual wall socket. Today only one of them was no longer accessible, and I found the ESPURNA wifi available again: it had completely reset itself to factory defaults. The second power strip does not show this behaviour; both are running 1.13.3.

I just realized that I re-wired my wifi AP yesterday; maybe that is related (e.g. failed to reconnect -> factory reset?).

schweini commented 5 years ago

Interesting!

Because, just like in my case, SOME units are affected by this (deleting their wifi config) and some are not.

Have you tried 'hard coding' the wifi config at compile time?

mcspr commented 5 years ago

Huh. If both 1.13.3 and 1.13.6-dev are affected, I think I messed up in fixing the crash log. We can try to turn off the stack trace writing part, or limit how much of it is written via some runtime flag.

@schweini did you use self-built binaries or the releases from here? If self-built, which version of ESPAsyncTCP did you use, and does it include this https://github.com/me-no-dev/ESPAsyncTCP/commit/0450e61 change? (based on #1752)

@philkry If it is indeed crash-related, which exception was the latest? crash shows the last recorded exception and stack trace; info only shows the current boot status.

ristomatti commented 5 years ago

I've experienced this with a BlitzWolf SHP6 running 1.13.5 downloaded from the releases page. The device was plugged in at our summer home where brief power outages are common (either lights dimming for a fraction of a second or 5-60 minutes if a power line is broken by a storm nearby).

I've got three Sonoff Pow R2s, one Sonoff Basic and two D1 Minis with a relay shield running Espurna, and none of them were affected. Unfortunately I cannot pinpoint the day it happened, as I was just testing the device and had not linked it to any external system.

The same happened once with the Sonoff Basic a year ago, running version 1.12.6a, but not since.

I wonder if this could be a hardware-related issue, e.g. a brief power spike erasing the EEPROM?

schweini commented 5 years ago

I have tried setting at least one wifi config in custom.h, and that seems to have worked, i.e. hardcoded wifi settings do not reset themselves. OTOH, this might just be because the recompiled units are running a slightly more up-to-date version.

ristomatti commented 5 years ago

I wonder if every user interface setting can be defined as a build flag/parameter. I'm thinking the issue could be circumvented altogether by orchestrating flashing & OTA updates for all the devices from the platformio.ini file. In my case I rarely need to touch the settings after the initial configuration, and currently all devices running Espurna are reachable from a single network (through VPN).