visualapproach / WiFi-remote-for-Bestway-Lay-Z-SPA

Hack - ESP8266 as WiFi remote control for Bestway Lay-Z spa Helsinki
GNU General Public License v3.0

4-Wire 54138 Watchdog timer reboot causing E13 #742

Open DeanozUK opened 1 month ago

DeanozUK commented 1 month ago

Describe the bug After install, the board seems stable with no issues until I select 4-wire 54138 for CIO and DSP and reboot. While the board is idle (not even connected to the pump), it periodically reboots, and the reported reset reason is watchdog.

To Reproduce Steps to reproduce the behavior:

  1. After install, go to the hardware config, select 54138 for CIO and DSP, and also select v2b
  2. Reboot the ESP and set up a constant ping to the device
  3. Go to the web home page (maybe change the temperature)
  4. The page might drop out. I left it idle for some time while monitoring via USB; the log is posted below

Expected behavior The board should be stable both when idle and when connected to the unit.

Screenshots ESP WDT Error.txt

Hardware (please complete the following information):

Software (please complete the following information):

Additional context When it's connected to the pump, after startup I just leave it without touching any buttons and only load the web page; after a short time it shows E13. I also tried 2 other ESPs and the beta version, to no effect.

Below is the result from USB Monitor.

Writing cmdq.json (I was changing temp on web page) Done!
Writing cmdq.json Done!
Writing cmdq.json Done!
Writing cmdq.json Done!
Writing cmdq.json Done!
Writing cmdq.json Done!

--------------- CUT HERE FOR EXCEPTION DECODER ---------------

Soft WDT reset

Exception (4): epc1=0x40234f9c epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

Level1Interrupt: Level-1 interrupt as indicated by set level-1 bits in the INTERRUPT register
epc1=0x40234f9c in AsyncTCPbuffer::write(unsigned char const*, unsigned int) at ??:?

stack>>>

ctx: cont sp: 3fff1d40 end: 3fff2040 offset: 0160 3fff1ea0: 00000000 00000066 89374bc6 07f64dd6 3fff1eb0: 4010a85b 0006e2db 00000000 40109bdc 3fff1ec0: 4010a85b 00000066 0000012f 4021f402 3fff1ed0: 0006da50 0000012f 40284928 00000004 3fff1ee0: 40284930 00000004 40284938 40109bac 3fff1ef0: 4010a72c 40109bdc 00000195 4021f6d1 3fff1f00: 00000001 4028467c 00000005 40284684 3fff1f10: 95017e81 00000000 00000000 00000000
3fff1f20: 00000000 40284660 00000004 402846a4 3fff1f30: 00000000 00000004 00000001 00000001 3fff1f40: 00000001 00000001 00000000 402846cc 3fff1f50: 00000006 402846d4 00000004 4010a72c 3fff1f60: 40109bdc 00000195 40109bac 4021e761 3fff1f70: 00000000 402846c4 00000004 40283a0c 3fff1f80: 00000001 4010a000 00000000 3ffefca4 3fff1f90: e1f3b4e6 41a998c4 3fff1fb0 3ffefca4 3fff1fa0: 00000001 3ffef85c 3ffef6bc 4020d163 3fff1fb0: 4010a72c 0195019f 80000000 00000000 3fff1fc0: 00000001 3ffef938 3ffef85c 4021c72d 3fff1fd0: 401081d4 3ffefb14 401081d4 4021b181 3fff1fe0: 00000000 000f000f 00000000 00000000 3fff1ff0: 00000000 611e0a0a feefeffe feefeffe 3fff2000: 401081d4 feefeffe feefeffe 3ffefca4 3fff2010: 3fffdad0 00000000 3ffefc78 3ffefca4 3fff2020: 3fffdad0 00000000 3ffefc78 402310d0 3fff2030: feefeffe feefeffe 3fffdab0 401014a1 <<<stack<<<

0x4021f402 in WebSockets::write(WSclient_t, unsigned char, unsigned int) at ??:?
0x40284928 in etharp_output at ??:?
0x40284930 in etharp_output at ??:?
0x40284938 in etharp_output at ??:?
0x4021f6d1 in WebSockets::sendFrame(WSclient_t, WSopcode_t, unsigned char, unsigned int, bool, bool) at ??:?
0x4028467c in etharp_output at ??:?
0x40284684 in etharp_output at ??:?
0x40284660 in etharp_output at ??:?
0x402846a4 in etharp_output at ??:?
0x402846cc in etharp_output at ??:?
0x402846d4 in etharp_output at ??:?
0x4021e761 in WebSocketsServerCore::broadcastTXT(unsigned char*, unsigned int, bool) at ??:?
0x402846c4 in etharp_output at ??:?
0x40283a0c in etharp_output at ??:?
0x4020d163 in sendWS() at ??:?
0x4021c72d in loop at ??:?
0x4021b181 in setup at ??:?
0x402310d0 in loop_wrapper() at core_esp8266_main.cpp:?
0x401014a1 in cont_wrapper at ??:?

--------------- CUT HERE FOR EXCEPTION DECODER ---------------

rf cal sector: 1020
freq trace enable 0
rf[112] : 0
Start Millis: 91 @ line: 41
startWiFi() @ millis: 328
Setting static IP
WiFi > using static IP t(�?␏ on gateway 10.10.30.1
WiFi > Using WiFiManager Config Portal
*WM:
*WM: AutoConnect
*WM: Connecting as wifi client...
*WM: Status:
*WM: 0
*WM: Using last saved values, should be faster
got IP: 10.10.30.97
start NTP
WS
IRamheap 9296
IRamheap 8048
startmqtt
Failed to read mqtt.json. Using defaults.
*WM: Connection result:
*WM: 3
*WM: IP Address:
*WM: 10.10.30.97
*WM: freeing allocated params!
End of setup() Millis: 3460 @ line: 73
33072

--------------- CUT HERE FOR EXCEPTION DECODER ---------------

Soft WDT reset

Exception (4): epc1=0x402316c9 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

Level1Interrupt: Level-1 interrupt as indicated by set level-1 bits in the INTERRUPT register
epc1=0x402316c9 in __delay at ??:?

stack>>>

ctx: cont sp: 3fff1d60 end: 3fff2040 offset: 0160 3fff1ec0: 4010a846 0000004a 0000013a 4021f41e 3fff1ed0: 00066203 0000013a 40284928 00000004
3fff1ee0: 40284930 00000004 40284938 40109bac 3fff1ef0: 4010a70c 40109bdc 00000184 4021f6d1 3fff1f00: 00000001 4028467c 00000005 40284684 3fff1f10: 84017e81 00000000 00000000 00000000 3fff1f20: 00000000 40284660 00000004 402846a4
3fff1f30: 00000000 00000004 00000001 00000001 3fff1f40: 00000001 00000001 00000000 402846cc 3fff1f50: 00000006 402846d4 00000004 4010a70c 3fff1f60: 40109bdc 00000184 40109bac 4021e761 3fff1f70: 00000000 402846c4 00000004 40283a0c 3fff1f80: 00000001 4010a000 00000000 3ffefca4 3fff1f90: 7b6eb30a 41a998c2 3fff1fb0 3ffefca4 3fff1fa0: 00000000 3ffef85c 3ffef6bc 4020d163 3fff1fb0: 4010a70c 0184018f 80000000 00000000 3fff1fc0: 00000000 3ffef938 3ffef85c 4021c72d 3fff1fd0: 401081d4 3ffefb14 401081d4 4021b181 3fff1fe0: 00000000 000f000f 00000000 00000000 3fff1ff0: 00000000 611e0a0a feefeffe feefeffe 3fff2000: 401081d4 feefeffe feefeffe 3ffefca4 3fff2010: 3fffdad0 00000000 3ffefc78 3ffefca4 3fff2020: 3fffdad0 00000000 3ffefc78 402310d0 3fff2030: feefeffe feefeffe 3fffdab0 401014a1 <<<stack<<<

0x4021f41e in WebSockets::write(WSclient_t, unsigned char, unsigned int) at ??:?
0x40284928 in etharp_output at ??:?
0x40284930 in etharp_output at ??:?
0x40284938 in etharp_output at ??:?
0x4021f6d1 in WebSockets::sendFrame(WSclient_t, WSopcode_t, unsigned char, unsigned int, bool, bool) at ??:?
0x4028467c in etharp_output at ??:?
0x40284684 in etharp_output at ??:?
0x40284660 in etharp_output at ??:?
0x402846a4 in etharp_output at ??:?
0x402846cc in etharp_output at ??:?
0x402846d4 in etharp_output at ??:?
0x4021e761 in WebSocketsServerCore::broadcastTXT(unsigned char*, unsigned int, bool) at ??:?
0x402846c4 in etharp_output at ??:?
0x40283a0c in etharp_output at ??:?
0x4020d163 in sendWS() at ??:?
0x4021c72d in loop at ??:?
0x4021b181 in setup at ??:?
0x402310d0 in loop_wrapper() at core_esp8266_main.cpp:?
0x401014a1 in cont_wrapper at ??:?

--------------- CUT HERE FOR EXCEPTION DECODER ---------------

rf cal sector: 1020
freq trace enable 0
rf[112] : 0
Start Millis: 91 @ line: 41
startWiFi() @ millis: 334
Setting static IP
WiFi > using static IP t(�?␏ on gateway 10.10.30.1
WiFi > Using WiFiManager Config Portal
*WM:
*WM: AutoConnect
*WM: Connecting as wifi client...
*WM: Status:
*WM: 0
*WM: Using last saved values, should be faster
got IP: 10.10.30.97
start NTP
WS
IRamheap 9296
IRamheap 8048
startmqtt
Failed to read mqtt.json. Using defaults.
*WM: Connection result:
*WM: 3
*WM: IP Address:
*WM: 10.10.30.97
*WM: freeing allocated params!
End of setup() Millis: 3469 @ line: 73
33296

--------------- CUT HERE FOR EXCEPTION DECODER ---------------

Soft WDT reset

Exception (4): epc1=0x4023100c epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

Level1Interrupt: Level-1 interrupt as indicated by set level-1 bits in the INTERRUPT register
epc1=0x4023100c in esp_try_delay(unsigned int, unsigned int, unsigned int) at ??:?

stack>>>

ctx: cont sp: 3fff1d60 end: 3fff2040 offset: 0160 3fff1ec0: 4010a77f 00000115 00000073 4021f41e 3fff1ed0: 00031647 00000073 40284928 00000004
3fff1ee0: 40284930 00000004 40284938 40109bac 3fff1ef0: 4010a70c 40109bdc 00000188 4021f6d1 3fff1f00: 00000001 4028467c 00000005 40284684 3fff1f10: 88017e81 00000000 00000000 00000000 3fff1f20: 00000000 40284660 00000004 402846a4
3fff1f30: 00000000 00000004 00000001 00000001 3fff1f40: 00000001 00000001 00000000 402846cc 3fff1f50: 00000006 402846d4 00000004 4010a70c 3fff1f60: 40109bdc 00000188 40109bac 4021e761 3fff1f70: 00000000 402846c4 00000004 40283a0c 3fff1f80: 00000001 4010a000 00000000 3ffefca4 3fff1f90: 9b6a9771 41a998b1 3fff1fb0 3ffefca4 3fff1fa0: 00000000 3ffef85c 3ffef6bc 4020d163 3fff1fb0: 4010a70c 0188018f 80000000 00000000 3fff1fc0: 00000000 3ffef938 3ffef85c 4021c72d 3fff1fd0: 401081d4 3ffefb14 401081d4 4021b181 3fff1fe0: 00000000 000f000f 00000000 00000000 3fff1ff0: 00000000 611e0a0a feefeffe feefeffe 3fff2000: 401081d4 feefeffe feefeffe 3ffefca4 3fff2010: 3fffdad0 00000000 3ffefc78 3ffefca4 3fff2020: 3fffdad0 00000000 3ffefc78 402310d0 3fff2030: feefeffe feefeffe 3fffdab0 401014a1 <<<stack<<<

0x4021f41e in WebSockets::write(WSclient_t, unsigned char, unsigned int) at ??:?
0x40284928 in etharp_output at ??:?
0x40284930 in etharp_output at ??:?
0x40284938 in etharp_output at ??:?
0x4021f6d1 in WebSockets::sendFrame(WSclient_t, WSopcode_t, unsigned char, unsigned int, bool, bool) at ??:?
0x4028467c in etharp_output at ??:?
0x40284684 in etharp_output at ??:?
0x40284660 in etharp_output at ??:?
0x402846a4 in etharp_output at ??:?
0x402846cc in etharp_output at ??:?
0x402846d4 in etharp_output at ??:?
0x4021e761 in WebSocketsServerCore::broadcastTXT(unsigned char*, unsigned int, bool) at ??:?
0x402846c4 in etharp_output at ??:?
0x40283a0c in etharp_output at ??:?
0x4020d163 in sendWS() at ??:?
0x4021c72d in loop at ??:?
0x4021b181 in setup at ??:?
0x402310d0 in loop_wrapper() at core_esp8266_main.cpp:?
0x401014a1 in cont_wrapper at ??:?

--------------- CUT HERE FOR EXCEPTION DECODER ---------------

rf cal sector: 1020
freq trace enable 0
rf[112] : 0
Start Millis: 91 @ line: 41
startWiFi() @ millis: 340
Setting static IP
WiFi > using static IP t(�?␏ on gateway 10.10.30.1
WiFi > Using WiFiManager Config Portal
*WM:
*WM: AutoConnect
*WM: Connecting as wifi client...
*WM: Status:
*WM: 0
*WM: Using last saved values, should be faster
got IP: 10.10.30.97
start NTP
WS
IRamheap 9296
IRamheap 8048
startmqtt
Failed to read mqtt.json. Using defaults.
*WM: Connection result:
*WM: 3
*WM: IP Address:
*WM: 10.10.30.97
*WM: freeing allocated params!
End of setup() Millis: 3494 @ line: 73
33176

deanZZZZZ commented 3 weeks ago

development_v4 yes, no router restart

{"BOOTINFO":"External System 2024-07-10 10:28:31"}
{"BOOTINFO":"Software Watchdog 2024-07-10 12:46:06"}
{"BOOTINFO":"Software Watchdog 2024-07-10 12:56:30"}
{"BOOTINFO":"Software Watchdog 2024-07-10 12:57:34"}
{"BOOTINFO":"Software Watchdog 2024-07-10 12:59:11"}
{"BOOTINFO":"Software Watchdog 2024-07-10 12:59:43"}
{"BOOTINFO":"External System 2024-07-10 13:00:22"}
{"BOOTINFO":"Software Watchdog 2024-07-10 13:26:05"}
{"BOOTINFO":"Software Watchdog 2024-07-10 13:28:40"}
{"BOOTINFO":"Software Watchdog 2024-07-10 20:58:20"}
{"BOOTINFO":"Software Watchdog 2024-07-10 23:29:10"}
{"BOOTINFO":"Software Watchdog 2024-07-11 06:34:21"}
{"BOOTINFO":"Exception 2024-07-11 06:55:50"}
{"BOOTINFO":"Software/System restart 2024-07-11 06:57:02"}
{"BOOTINFO":"Software Watchdog 2024-07-11 07:01:38"}
{"BOOTINFO":"Software Watchdog 2024-07-11 07:04:20"}
{"BOOTINFO":"Software/System restart 2024-07-11 07:07:21"}
{"BOOTINFO":"External System 2024-07-11 07:10:42"}
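A boot log like this is easier to compare between runs once the reset reasons are tallied. The following is only a rough host-side sketch, assuming the log holds one `{"BOOTINFO":"<reason> <date> <time>"}` object per line; `bootReason` and `tallyBootReasons` are hypothetical helper names, not part of the firmware.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Extract the reset reason from one BOOTINFO line by dropping the
// trailing " YYYY-MM-DD HH:MM:SS" timestamp (a space plus 19 characters).
inline std::string bootReason(const std::string& line) {
    const std::string key = "\"BOOTINFO\":\"";
    std::size_t start = line.find(key);
    if (start == std::string::npos) return "";
    start += key.size();
    std::size_t end = line.find('"', start);
    if (end == std::string::npos || end - start <= 20) return "";
    return line.substr(start, end - start - 20);
}

// Count how often each reason appears in a newline-separated log.
inline std::map<std::string, int> tallyBootReasons(const std::string& log) {
    std::map<std::string, int> counts;
    std::istringstream in(log);
    std::string line;
    while (std::getline(in, line)) {
        const std::string reason = bootReason(line);
        if (!reason.empty()) ++counts[reason];
    }
    return counts;
}
```

Feeding it the log text and reading the count for `"Software Watchdog"` gives a quick number to compare before and after a firmware change.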

visualapproach commented 3 weeks ago

Ok. I'll see if I can do anything about it

DeanozUK commented 3 weeks ago

Just a note after seeing @deanZZZZZ's reply: I remoted into my router, forced the ESP to disconnect from WiFi, and blocked it from reconnecting for 1 min. After I unblocked it, it reconnected fine. I checked the ESP and there was no WD reboot; it just reconnected with no issues.

visualapproach commented 3 weeks ago

Yes, @deanZZZZZ just to rule out some things, does it make a difference if you make a clean build (platformio command) and also restart your router?

visualapproach commented 3 weeks ago

I'll process this post later. Had to prioritize the WDT issue. Thanks

Not a problem, would you rather I create a feature request to keep it separate from this issue?

@DeanozUK You should be able to just switch to model NO54173. It has the features you describe and should work.

deanZZZZZ commented 3 weeks ago

Yes, @deanZZZZZ just to rule out some things, does it make a difference if you make a clean build (platformio command) and also restart your router?

Now I tried a full clean build and disconnected the WiFi 3x, and it works fine :) ... I'll let you know how it goes ... With only the pump running, the power shows 1,900 W; with everything off it shows 950 W.

deanZZZZZ commented 3 weeks ago

{"BOOTINFO":"External System 2024-07-12 11:36:23"}
{"BOOTINFO":"Software Watchdog 2024-07-12 23:39:05"}
{"BOOTINFO":"Software Watchdog 2024-07-12 23:41:50"}

visualapproach commented 3 weeks ago

@deanZZZZZ ok. In the development branch you can turn on logging to file by editing platformio.ini and uploading:

    ; -DBWC_DEBUGGING=BWC_DEBUG_OUTPUT_OFF
    ; -DBWC_DEBUGGING=BWC_DEBUG_OUTPUT_SERIAL
    -DBWC_DEBUGGING=BWC_DEBUG_OUTPUT_FILE

This will print the debug text to file and in the long run fill up the flash memory but it should be fine to run until you get a sw wdt. It is not completely finished yet but maybe you can see what is happening before the reset.

Or wait until I have more time to look at it.
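As a rough illustration of how a build flag such as `-DBWC_DEBUGGING=BWC_DEBUG_OUTPUT_FILE` can steer debug output at compile time, here is a minimal host-side sketch. The numeric values given to the `BWC_DEBUG_OUTPUT_*` constants, the `debugLog` helper, and the in-memory `fileSink` are all assumptions made for illustration; the firmware's actual definitions may differ.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical values; the firmware's real definitions may differ.
#define BWC_DEBUG_OUTPUT_OFF    0
#define BWC_DEBUG_OUTPUT_SERIAL 1
#define BWC_DEBUG_OUTPUT_FILE   2

// Normally supplied by the build system, e.g. -DBWC_DEBUGGING=BWC_DEBUG_OUTPUT_FILE
#ifndef BWC_DEBUGGING
#define BWC_DEBUGGING BWC_DEBUG_OUTPUT_FILE
#endif

// Stand-in for a log file on flash; here just an in-memory list of lines.
static std::vector<std::string> fileSink;

// Hypothetical helper: route one debug line to the sink chosen at compile time.
inline void debugLog(const std::string& msg) {
#if BWC_DEBUGGING == BWC_DEBUG_OUTPUT_SERIAL
    std::cout << msg << '\n';   // would be a serial print on the ESP
#elif BWC_DEBUGGING == BWC_DEBUG_OUTPUT_FILE
    fileSink.push_back(msg);    // would append to a file on the ESP
#else
    (void)msg;                  // logging compiled out
#endif
}
```

Because the choice is made by the preprocessor, the unused branches cost nothing at run time, which matters on a device with as little free heap as the logs above show.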

DeanozUK commented 3 weeks ago

I've had some SW WDT resets too now, not sure why all of a sudden.

{"BOOTINFO":"Software/System restart 2024-07-09 22:04:08"}
{"BOOTINFO":"Software Watchdog 2024-07-11 18:24:17"}
{"BOOTINFO":"Software/System restart 2024-07-11 21:17:06"}
{"BOOTINFO":"Software Watchdog 2024-07-12 07:53:19"}
{"BOOTINFO":"Software Watchdog 2024-07-13 15:58:23"}
{"BOOTINFO":"Power On 2024-07-13 17:22:30"}
{"BOOTINFO":"Software Watchdog 2024-07-13 19:55:56"}

I will look into it tomorrow when I get time.

Note: I tried the 54173 options and they do work better in the web GUI, but I noticed that the buttons on the pump just beeped when pressed and didn't do anything. I've switched back to 54138 again now.

visualapproach commented 3 weeks ago

Note: I tried the 54173 options and they do work better in the web GUI, but I noticed that the buttons on the pump just beeped when pressed and didn't do anything. I've switched back to 54138 again now.

@DeanozUK that is very strange! The only difference is that 173 allows more states, like air and jets on at the same time. Did you swap on both the DSP and CIO setting?

visualapproach commented 3 weeks ago

@deanZZZZZ @DeanozUK I have uploaded a new version to development branch. See if it helps with the resets. I frequently give back CPU to the OS, which would in theory help, if the problem is that user code takes too long. Unless I see the crash message it's really difficult to know exactly which part of the code is taking too much time. But if it works it works. There is also a potential fix to a "restore states" bug, for your info.
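The idea of frequently handing the CPU back so the software watchdog does not fire can be sketched as below. This is not the project's code: `processInChunks`, `yieldToSystem`, and the chunk size are hypothetical. On the ESP8266 Arduino core the hook would typically call `yield()` or `delay(0)`; here a counter stands in so the sketch also builds with a desktop compiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts how often control was handed back; stand-in for the real yield.
static std::size_t yieldCount = 0;

// Hypothetical hook. On an ESP8266 this body would call yield() or delay(0)
// so the system can run its background tasks between chunks of user code.
inline void yieldToSystem() { ++yieldCount; }

// Sum a buffer, giving up the CPU after every `chunk` items
// instead of running the whole loop in one go.
inline std::uint32_t processInChunks(const std::vector<std::uint8_t>& data,
                                     std::size_t chunk) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        sum += data[i];
        if (chunk != 0 && (i + 1) % chunk == 0) {
            yieldToSystem();
        }
    }
    return sum;
}
```

The trade-off is a slightly longer total run time for the user code in exchange for never holding the CPU long enough to starve the WiFi stack.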

deanZZZZZ commented 3 weeks ago

Survived the night without restarting :)

DeanozUK commented 2 weeks ago

Hi VA,

I've updated to the latest dev and can confirm it has been stable for 24 hrs now. Damn, it's fast on boot now. If I restart the ESP from the browser, it's back up before the E13 timeout message, which is really nice. Well done.

The only SW WD reset I have encountered was just after the unit had first booted (within 30 secs), when I repeatedly pressed temp up (from the default 20) to get to 37 and it WD reset on me. I have unfortunately been unable to reliably recreate this but will do some more testing. Note, I'm using the alt temp adjust display method instead of sliders.

`Note: I tried the 54173 options and they do work better in the web GUI, but I noticed that the buttons on the pump just beeped when pressed and didn't do anything. I've switched back to 54138 again now.

@DeanozUK that is very strange! The only difference is that 173 allows more states, like air and jets on at the same time. Did you swap on both the DSP and CIO setting?`

Please ignore this, as it did the same on 54138. I've since swapped the board for another and it's fine on both, so it must have been a loose wire or a config issue on my end.

visualapproach commented 2 weeks ago

Super happy to hear that it works good (enough) now! Thanks for testing and reporting! I believe the fastest startup is with static ip. Remember to enter a DNS as well or the NTP won't find the time. Ask me how I know 😀

DeanozUK commented 2 weeks ago

haha, I won't, I've been there myself. When I said it was fast....it was on DHCP too 😀 soooo it will be even faster on static. Nice work!

I always put in an NTP server, but is there a default fallback, or is it a must?

DeanozUK commented 2 weeks ago

The pings look so much better now, much more stable even when clicking around on the web page.

visualapproach commented 2 weeks ago

haha, I won't, I've been there myself. When I said it was fast....it was on DHCP too 😀 soooo it will be even faster on static. Nice work!

I always put in an NTP server, but is there a default fallback, or is it a must?

These addresses will be used, in order: your choice, if any; else pool.ntp.org; else time.nist.gov.

Thanks for your feedback. I also feel the web UI on all models is much more responsive now, thanks to fixing your issue.
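The NTP fallback order mentioned above (the user's choice if any, then pool.ntp.org, then time.nist.gov) can be sketched roughly as follows. `buildNtpServerList` is a hypothetical helper, not taken from the firmware; on the ESP8266 Arduino core a list like this could then be handed to `configTime`, which accepts up to three server names.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: put the user's server first if one was entered,
// then the two public fallbacks named in the comment above.
inline std::vector<std::string> buildNtpServerList(const std::string& userServer) {
    std::vector<std::string> servers;
    if (!userServer.empty()) {
        servers.push_back(userServer);
    }
    servers.push_back("pool.ntp.org");
    servers.push_back("time.nist.gov");
    return servers;
}
```

With a static IP, the fallback host names only resolve if a DNS server is configured as well, which is the pitfall described earlier in the thread.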

DeanozUK commented 1 week ago

Hi VA, it's been 2 weeks and not a single crash/reboot or E13 error. You've cracked it. I haven't updated it since, but 14 days of uptime is a win for me.

I did however get one issue: the DSP started beeping and displayed a flashing 'En8'. It's an odd one, as I've seen this in the past with the old firmware when I was getting loads of connection issues, and I remember one other user reporting it once in a post here. It's not documented as an error code and it's not 'End' either. For now, though, it's not an issue, as everything has been perfect other than this. Well done!

deanZZZZZ commented 1 week ago

hmm... but for me it worked for two days without a problem, then SW watchdog resets 4x a day... then 1 day OK again... and so on