xoseperez / espurna

Home automation firmware for ESP8266-based devices
http://tinkerman.cat
GNU General Public License v3.0
2.99k stars 637 forks source link

ai-light turns off after few tens of seconds #599

Closed arturol76 closed 6 years ago

arturol76 commented 6 years ago

Hi, i've flashed an AI-Thinker Wifi Light with espurna ai-thinker ai-light bins from this repo. Everything works as expected and i'm even able to control the light through MQTT (which was my original purpose) but... the light turns off after few tens of seconds. I can turn it on, but it will happen again.

I tried both 1.12.1 and 1.12.3. Same results.

I would say that the HW is fine, as i didn't notice such a strange behavior with the stock FW.

arturol76 commented 6 years ago

Additional info: it seems that the light turns off because of a reset (uptime restarts from zero).

K1nslayer commented 6 years ago

I've seen this before. It was fixed by doing a "factory reset" of the firmware. Log in via telnet and issue a "factory.reset" command, then re-setup your light. https://github.com/xoseperez/espurna/wiki/Terminal

arturol76 commented 6 years ago

Thanks! But how to connect via telnet? I've enabled telnet in the admin section of the webpage and restarted the light... Then I'm connecting with a telnet client to port 22 of the light but the connection is refused. What do i miss? Thank you for your kind help

arturol76 commented 6 years ago

nevermind, it works, it was my fault (telnet port is 23 not 22...) I've issued a factory.reset now. Let's see if the light is stable now...

arturol76 commented 6 years ago

Yep, the factory reset seems to have solved the issue. Thanks!

arturol76 commented 6 years ago

Hello. It happens again. Any clues?

Cabalist commented 6 years ago

Are you still using MQTT to turn on and off the lights? Perhaps a message is being published to turn off the light?

arturol76 commented 6 years ago

Yes I'm using mqtt to turn it on but I'm sure that the off command is not sent. I've checked it by subscribing to the topic. Moreover, when the light turns off the uptime counter resets to zero (and this suggests that something weird has happened).

Cabalist commented 6 years ago

The uptime counter resetting would point to a power reset. Does this happen across multiple bulbs? If you use a different socket does the problem follow?

arturol76 commented 6 years ago

I have only that bulb...

Cabalist commented 6 years ago

No worries! I just ask in an effort to narrow what the issue could be. Right now it very much sounds power related.

When you telnet in is there anything in the crash log?

arturol76 commented 6 years ago

Not checked actually. How to check the log?

Cabalist commented 6 years ago

Telnet in and type crash . If Espurna has crashed this will show the stack dump. Please post the results here.

arturol76 commented 6 years ago

Here it is. Btw, this time i turned it on using the web interface and not mqtt.

[TELNET] Client #0 connected crash -ERROR unknown command crash [DEBUG] Crash at 300759 ms [DEBUG] Reason of restart: 2 [DEBUG] Exception cause: 3 [DEBUG] epc1=0x40229b12 epc2=0x00000000 epc3=0x00000000 [DEBUG] excvaddr=0x40026a81 depc=0x00000000 [DEBUG] >>>stack>>> [DEBUG] 3fff18d0: 00000000 3fff196c 3fff1940 402023a5 [DEBUG] 3fff18e0: 3fff1900 3fff22ac 00000000 3fff0a80 [DEBUG] 3fff18f0: 00000000 00000000 0ccccccc 00000009 [DEBUG] 3fff1900: 00000019 00000001 00000000 3fff0a80 [DEBUG] 3fff1910: 3fffdad0 0000012c 00000000 40229bc8 [DEBUG] 3fff1920: 3fff1ba8 000008a8 000008a8 40216220 [DEBUG] 3fff1930: 3fffdad0 0000012c 3fff196c 40231051 [DEBUG] 3fff1940: 40201054 00000000 00000000 40202148 [DEBUG] 3fff1950: 3fffdad0 0000012c 00000000 4021b6a0 [DEBUG] 3fff1960: 00000000 00000000 00000000 00000000 [DEBUG] 3fff1970: 00000000 00000000 3fff04c9 4021a947 [DEBUG] 3fff1980: 000000ff 00000000 3fff04f0 40224ea6 [DEBUG] 3fff1990: 3fff19b0 3fff22ac 00000000 3fff1a40 [DEBUG] 3fff19a0: 00000085 3fff090c 3fff08f0 40213924 [DEBUG] 3fff19b0: 45532d4d 48435241 48202a20 2f505454 [DEBUG] 3fff19c0: 3fff1ba8 000008a8 000008a8 4010020c [DEBUG] 3fff19d0: 00000000 3ffe8659 00000001 4010068c [DEBUG] 3fff19e0: 402304ed 00000000 3fff0a90 00000000 [DEBUG] 3fff19f0: 3fff090c 3fff1a40 40202ed8 3fff0a90 [DEBUG] 3fff1a00: 3fff6c8c 4774656e 00000064 40202f50 [DEBUG] 3fff1a10: 65636976 0a0d313a 3a6e614d 40202f90 [DEBUG] 3fff1a20: 401047b2 0598fd41 3fff6524 40206387 [DEBUG] 3fff1a30: 4010486e 3ffef0dc 0598fd41 00000000 [DEBUG] 3fff1a40: 40104a40 0598fd41 3fff1b00 00000000 [DEBUG] 3fff1a50: 3ffef100 3fff1b00 3fff0a90 3fff1b00 [DEBUG] 3fff1a60: 3fffdad0 3fff0a80 40202ed8 3fff0a90 [DEBUG] 3fff1a70: 3fffdad0 00000000 3fff04f0 40225038 [DEBUG] 3fff1a80: 00000000 00000000 3fff0a79 4022504c [DEBUG] 3fff1a90: 3fffdad0 00000000 3fff0a79 40202f24 [DEBUG] 3fff1aa0: feefeffe feefeffe 3fff0a90 402045a8 [DEBUG] <<<stack<<< +OK

xoseperez commented 6 years ago

"Crash at 300759 ms" might point to a crash when doing a heartbeat report. I have one bulb that exhibits this behavior. Don't ask me why but that specific bulb crashes when calling ESP.getFreeHeap().

Steps to reproduce:

If this is the case, the is a special "hacky" solution I added for that specific lightbulb:

Fix:

Now try again the steps to reproduce, the heap value will be 9999 (useless) but the bulb should not reboot and keep normal operation.

https://github.com/xoseperez/espurna/blob/68e05343501eb2bae21ec384bd382b156edc04ac/code/espurna/utils.ino#L46-L52

arturol76 commented 6 years ago

Hi. I did as you suggested but the "heap" command did not cause a crash/reboot. I tried several times with the same result (i.e. no crash).

Then, I sent "set wtfHeap 1" command anyway (and "heap" returned "9999" as expected). I turned on the light and it is still on since more than 30 mins. It's probably my record.

That's not in line with your expectations probably but it is an improvement anyway.

I will do some more tests and let you know. Thank you so much for your kind help!

arturol76 commented 6 years ago

Hi. The "wtf" worakround works! As suggested above, i telnet-ed in and sent command "set wtfHeap 1". The uptime is now 4 days, that means no more reboots since then.

For the records, my bulb is a EECOO/Haofy bulb, based of course on AiThinker design.

Do you have plans to make this workaround officially (and more easily) available through the web interface and/or through the API? It would be great to enable/disable it through MQTT for instance.

Thank you so much for your great support!

xoseperez commented 6 years ago

Well, as you can imagine, it's kind of hackish... it simply should not fail but it does. I will raise an issue to the Arduino Core project with this...