openshwprojects / OpenBK7231T_App

Open source firmware (Tasmota/Esphome replacement) for BK7231T, BK7231N, BL2028N, T34, XR809, W800/W801, W600/W601, BL602 and LN882H
https://openbekeniot.github.io/webapp/devicesList.html

Constant Restart/Reboot and Possible Overheat for 20A SmartPlug BK7231N/CB2S with BL0937 #874

Open viny182 opened 1 year ago

viny182 commented 1 year ago

Hello,

I recently bought a 20A SmartPlug (link below in the Firmware section) that uses the BL0937 for power measurement. I suspect that something in the OpenBeken web interface is causing the CB2S module to overheat, which makes the device restart.

I bought 4 of them, and before flashing OpenBeken they were working fine. After I flashed them (I was dumb and flashed all of them through tuya-cloudcutter, so I do not have the original firmware backup anymore), they started to restart after some time.

First I noticed huge drops in the voltage reported to HomeAssistant through MQTT: the voltage drops to 60V or even 110V on a 127V phase.

Looking at the logs, I noticed that the drops were happening on the lines for the SSDP module, so I disabled it. I noticed a slight improvement in stability (fewer restarts/drops), but they are still occurring.

Then I stopped the ping watchdog to spare some load on the module and see if it helped. The module was stable for a while with no restarts, but still with power drops. After some time troubleshooting, I noticed the drops were happening when I accessed the web interface.

When there is a heavy load on the device (e.g., > 1400W) and I access the web interface, the module restarts completely.

When there is no heavy load (<800W, even with 0 load), I see drops but the device does not restart, with SSDP and PingWatchDog stopped.

I do have other device models using exactly the same chips (BK7231N/BL0937), but they are 10A and run lower loads. Even with an 800W load they do not show drops in the measurements sent to HA.

Also, with a multimeter on the device's output, I noticed that the voltage does not actually drop. Only the measurements sent to HA through MQTT are affected. But since the device still reboots sometimes, it cuts the power to the output, so any appliance connected to it powers off, which makes the module unusable.

After opening the device, I noticed that the power LED is connected in a weird way: it is not controllable through the BK7231N, it is hardwired to the relay command pin.

I was planning to submit a complete device teardown/review on the forums, but I would prefer to solve this issue first.

Firmware:

Drivers: SSDP, NTP and BL0937 on startup (SSDP stopped after troubleshooting)

To Reproduce

1. Set up a Power (W)/Voltage (V) monitoring tool like HomeAssistant with MQTT. The drops are also visible on the web interface, but it's better to monitor with an external tool, as the test requires opening/closing the web interface multiple times.
2. Access the web interface and click on different links to generate load, and observe whether there are any drops in voltage or watts.
3. Repeat the tests with different loads (0W, 800W, 1000W, 1500W).
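The monitoring step above can be approximated offline once the readings have been collected. The following is a minimal sketch, not part of OpenBeken or HomeAssistant: `find_drops`, the 127V nominal value and the 10% tolerance are assumptions chosen to match the numbers mentioned in this report, and the sample readings are made up for illustration.

```python
from typing import List, Sequence, Tuple


def find_drops(
    readings: Sequence[float],
    nominal: float = 127.0,   # assumed mains voltage, taken from the report
    tolerance: float = 0.10,  # assumed: anything >10% below nominal is a "drop"
) -> List[Tuple[int, float]]:
    """Return (index, value) for every reading below the drop threshold."""
    threshold = nominal * (1.0 - tolerance)
    return [(i, v) for i, v in enumerate(readings) if v < threshold]


# Hypothetical voltage samples, as they might be exported from a monitoring tool.
samples = [127.1, 126.8, 60.0, 127.0, 110.0, 126.9]
print(find_drops(samples))  # [(2, 60.0), (4, 110.0)]
```

Comparing the indices of the flagged readings with the times the web interface was opened would show whether the two are correlated.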

Screenshots

With loads, the device restarted the second time (image)

Without loads, the drops have less impact, but the readings are still wrong. I do not have such drops on other devices... (image)

Other relevant info

I've tried to capture the logs before the device restarts, but as far as I can tell nothing related to the issue is being logged. Apparently it restarts before it can send any useful information to the log screen. Also, with the "debug/all" option activated and all the flags checked, there is a lot of output, which makes the analysis difficult. Are there any tips on what to select to filter for this scenario?

image

openshwprojects commented 1 year ago

I am sorry to hear that you've lost the config. Unfortunately, we had to add some more config space for the second SSID field, etc., so a backwards update from v4 config to v3 is not supported. However, we do not plan to alter the flash config any more, so future backwards OTA updates will most likely work correctly.

I never recommend mass-updating devices; it is always worth checking the operation on one first.

If you suspect that something changed in the firmware, please check which version is guilty.

With about 100 versions to check, you should use the bisection approach. Start in the middle, let's say ver 50, and see if it breaks. If it breaks, check ver 25; if not, check 75, and so on.
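The bisection approach can be sketched as a binary search over the ordered list of builds. This is only an illustration: `first_bad_version` and the `is_bad` callback are hypothetical names, and in practice `is_bad` stands for a manual step (flash that build, run the device, see whether it reboots). It assumes that once a build is bad, every later build is bad too.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_bad_version(
    versions: Sequence[T], is_bad: Callable[[T], bool]
) -> Optional[T]:
    """Return the earliest version for which is_bad() is True, or None.

    `versions` must be sorted oldest to newest.
    """
    lo, hi = 0, len(versions)  # the first bad version lies in versions[lo:hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid       # the culprit is at mid or earlier
        else:
            lo = mid + 1   # the culprit is after mid
    return versions[lo] if lo < len(versions) else None


# Hypothetical example: builds 80..180, where build 176 introduced the problem.
builds = list(range(80, 181))
print(first_bad_version(builds, lambda b: b >= 176))  # 176
```

For about 100 builds this needs at most seven flash-and-test rounds instead of a hundred. With an intermittent fault like this one, each round has to run long enough to be confident that a build is actually good.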

Not much has changed in power handling recently. Here is the commit log for drv_bl_shared.c:

https://github.com/openshwprojects/OpenBK7231T_App/commits/main/src/driver/drv_bl_shared.c

On May 2 the simple configurable floating point rounding was added, but could that be the culprit?

Of course, the issue could be in another file...

deltamelter commented 1 year ago

This device was flashed to 1.7.180 but not yet configured. I OTA-downgraded it to 1.7.58, set wifi, started BL0937 manually, added the mqtt broker and set startup to -1. Also set voltage. It lasted longer than the other device, but is still rebooting. (image)

deltamelter commented 1 year ago

This device is running 1.7.175, PowerSave 1, mqtt and startup -1. It has been going >24hrs now without a reboot. This one has been under (light) load, with a few periods of heavier load. It has identical-looking hardware but an older date and older tuya firmware, and was flashed with tuya-cloudcutter; that's the only (obvious) difference. (image)

deltamelter commented 1 year ago

The (freezer) plug on which I set PowerSave 1 and turned off NTP and consumption stats, running 1.7.175, has not rebooted for coming up on 48 hours now!! I have rolled 2 other plugs back to 1.7.175, with a similar effect so far... (image)

image

1.7.58 is better than 1.7.180, but it is still restarting for me...

deltamelter commented 1 year ago

Is there a way to flush the config and storage and just OTA flash new firmware? The 2 plugs that were either serial flashed straight to 1.7.175, or to 1.7.180 and then OTA to 1.7.175, are still going, almost at their 3rd complete day without rebooting:

Build on Jul 8 2023 09:02:55 version 1.17.175 Online for 2 days, 18 hours, 6 minutes and 4 seconds

The other 2 plugs, which were configured with energy stats, NTP and mqtt, were later downgraded from 1.7.180 to 1.7.175 (one via 1.7.58). I stopped NTP, set PowerSave 1, disabled energy stats and retained the config between versions, and they keep on rebooting like crazy. Not as much as with 1.7.180, and sometimes going longer between reboots, but still rebooting, sometimes with just a couple of minutes in between.

viny182 commented 1 year ago

@deltamelter I have not tested this, but take a look in here: https://github.com/openshwprojects/OpenBK7231T_App/issues/500

TL;DR: There is a "ClearConfig" command.

Edit: From the docs https://github.com/openshwprojects/OpenBK7231T_App/blob/d6466ed3a8201599dc354b8ccfdf482dce1d31ea/docs/commands.md

clearConfig |   | Clears all config, including WiFi data
-- | -- | --

viny182 commented 1 year ago

there is also a clearAll command:

clearAll |   | Clears config and all remaining features, like runtime scripts, events, etc | File: cmnds/cmd_main.c, Function: CMD_ClearAll
-- | -- | -- | --
clearAllHandlers |   | This clears all added event handlers | File: cmnds/cmd_eventHandlers.c, Function: CMD_ClearAllHandlers
clearClockEvents |   | Removes all set clock events | File: driver/drv_ntp_events.c, Function: CMD_NTP_ClearEvents
clearConfig |   | Clears all config, including WiFi data |

deltamelter commented 1 year ago

Don't know what else is different between these 2 devices...

image|image

deltamelter commented 1 year ago

Nothing got "Fried" 😀 Needs a reboot? It will do that itself any second...

```
Info:CMD:Fried 0 handlers
Info:CMD:Fried 0 rep. events
Info:CMD:CMD_ClearAll: all clear
Info:CMD:[WebApp Cmd 'clearALL' Result] OK
Info:MAIN:Time 124, idle 147995/s, free 61608, MQTT 1(1), bWifi 1, secondsWithNoPing 53, socks 3/38 POWERSAVE
Info:MAIN:Time 125, idle 75232/s, free 73304, MQTT 1(1), bWifi 1, secondsWithNoPing 54, socks 2/38 POWERSAVE
Info:MAIN:Time 126, idle 72306/s, free 73304, MQTT 1(1), bWifi 1, secondsWithNoPing 55, socks 2/38 POWERSAVE
Info:MAIN:Time 127, idle 76897/s, free 73304, MQTT 1(1), bWifi 1, secondsWithNoPing 56, socks 2/38 POWERSAVE
Info:MAIN:Time 128, idle 76785/s, free 73304, MQTT 1(1), bWifi 1, secondsWithNoPing 57, socks 2/38 POWERSAVE
Info:MAIN:Time 129, idle 74210/s, free 73304, MQTT 1(1), bWifi 1, secondsWithNoPing 58, socks 2/38 POWERSAVE
Info:MAIN:Time 130, idle 82263/s, free 49912, MQTT 1(1), bWifi 1, secondsWithNoPing 59, socks 4/38 POWERSAVE
```
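
Since the device reboots before anything useful reaches the log screen, one option is to save the log and look for reboots after the fact. The sketch below parses `Info:MAIN` lines in the format shown above. It assumes that `Time` is an uptime counter that starts over after a restart; that is an interpretation of the log, not something confirmed in this thread. The function names are hypothetical, and the last sample line is invented to illustrate a reboot.

```python
import re
from typing import Dict, List

# Matches the leading fields of the periodic status line shown in the log above.
MAIN_LINE = re.compile(r"Info:MAIN:Time (\d+), idle (\d+)/s, free (\d+)")


def parse_main_lines(log_text: str) -> List[Dict[str, int]]:
    """Extract time, idle and free-heap values from every Info:MAIN line."""
    return [
        {"time": int(m.group(1)), "idle": int(m.group(2)), "free": int(m.group(3))}
        for m in MAIN_LINE.finditer(log_text)
    ]


def find_reboots(rows: List[Dict[str, int]]) -> List[int]:
    """Return indices of rows whose Time is lower than the previous row's.

    Assumption: a Time value going backwards means the device restarted.
    """
    return [i for i in range(1, len(rows)) if rows[i]["time"] < rows[i - 1]["time"]]


SAMPLE = """\
Info:MAIN:Time 124, idle 147995/s, free 61608, MQTT 1(1), bWifi 1, secondsWithNoPing 53, socks 3/38 POWERSAVE
Info:MAIN:Time 125, idle 75232/s, free 73304, MQTT 1(1), bWifi 1, secondsWithNoPing 54, socks 2/38 POWERSAVE
Info:MAIN:Time 126, idle 72306/s, free 73304, MQTT 1(1), bWifi 1, secondsWithNoPing 55, socks 2/38 POWERSAVE
Info:MAIN:Time 3, idle 90000/s, free 70000, MQTT 0(1), bWifi 1, secondsWithNoPing 1, socks 2/38 POWERSAVE
"""  # the last line is invented for illustration; it is not from the real log

rows = parse_main_lines(SAMPLE)
print(find_reboots(rows))            # [3]
print(min(r["free"] for r in rows))  # 61608
```

The rows just before each reported index are the last status lines before the restart, which is where to look for low free memory or a spike in open sockets.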
deltamelter commented 1 year ago

clearAll seemed to help, but the 2 devices that restart continuously still do so. They seemed to go for 24 hours without rebooting, but now they are back to their unstable ways...
Any chance of a debug utility to see what is running and taking up processing resources, and therefore power? Like an obkTop command :)

openshwprojects commented 1 year ago

I am not sure currently if it's possible to add such a feature.

How is the devices' stability after 2 days?

@deltamelter can you tell me whether those devices restart when run with clear config (no drivers, no power metering, etc)?

deltamelter commented 1 year ago

> @deltamelter can you tell me whether those devices restart when run with clear config (no drivers, no power metering, etc)?

After ClearAll I started power monitoring and powersave 1. These 2 devices didn't improve. One of those is currently running libretiny-esphome to see how it performs, the other I unplugged because the constant click-click was starting to get annoying :grinning:

One of the "older" devices has been running happily on 1.7.175 with BL0937 loaded and mqtt for nearly 6 days: Build on Jul 8 2023 09:02:55 version 1.17.175 Online for 5 days, 22 hours, 17 minutes and 30 seconds

The other one on 1.7.175 (down from unconfigured 1.7.180) is currently unplugged, but had not rebooted on its own.

I will plug in one of the less stable devices, run clearAll again and leave it with no drivers. Should I at least configure the wifi?