openshwprojects / OpenBK7231T_App

Open source firmware (Tasmota/Esphome replacement) for BK7231T, BK7231N, BL2028N, T34, XR809, W800/W801, W600/W601, BL602 and LN882H
https://openbekeniot.github.io/webapp/devicesList.html
Constant Restart/Reboot and Possible Overheat for 20A SmartPlug BK7231N/CB2S with BL0937 #874

Open viny182 opened 1 year ago

viny182 commented 1 year ago

Hello,

I've recently bought a 20A SmartPlug (link below on Firmware section) that uses the BL0937 for power measurement and I suspect that something on the web interface from OpenBeken is causing the CB2S module to overheat which makes the device restart.

I've bought 4 of them, and before flashing OpenBeken they were working fine. After I flashed them (I was dumb and flashed all of them through tuya-cloudcutter, so I do not have the original backup firmware anymore), they started to restart after some time.

First I noticed that there were huge drops in the voltage being reported to HomeAssistant through MQTT, like voltage drops to 60V or even 110V, on a 127V phase.

By looking at the logs I noticed that the drops were happening on the lines for the SSDP module, so I disabled it. I noticed a slight improvement in stability (fewer restarts and drops), but they are still occurring.

Then I stopped the ping watchdog to take some load off the module and see whether it helped. The module was stable for a while with no restarts, but still showed power drops. After some more troubleshooting, I noticed the drops were happening when I accessed the web interface.

When there is a heavy load on the device (e.g. > 1400W) and I access the web interface, the module restarts completely.

When there is no heavy load (<800W, even with 0 load), I see drops but the device does not restart, with SSDP and PingWatchDog Stopped.

I do have other device models using exactly the same chips (BK7231N/BL0937). They are 10A models running lower loads, but even with an 800W load they do not show drops in the measurements sent to HA.

Also, with a multimeter on the device's output, I noticed that the voltage does not actually drop. Only the measurements sent to HA through MQTT are affected. But because the device still reboots sometimes, it cuts the power to the output, so any appliance connected to it powers off, which makes the module unusable.

By opening the device, I noticed that the power LED is connected in a weird way to the relay: it's not controllable through the BK7231N, it is hardwired to the relay command pin.

I was planning to submit a complete device teardown/review on the forums, but I prefer to solve this issue before.

Firmware:

Drivers: SSDP, NTP and BL0937 on startup (SSDP stopped after troubleshooting)

To Reproduce

1. Set up a power (W) / voltage (V) monitoring tool such as HomeAssistant with MQTT. The drops are also visible on the web interface, but it's better to monitor with an external tool as the test requires opening and closing the web interface multiple times.
2. Access the web interface and click on different links to generate load, and observe whether there are any drops in voltage or watts.
3. Repeat the tests with different loads (0W, 800W, 1000W, 1500W).
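
When watching for drops during these tests, it can help to flag them automatically instead of eyeballing a graph. The following is a minimal sketch, assuming the readings have already been collected as (timestamp, voltage) pairs; the 90% threshold, the function name and the sample values are illustrative assumptions, not anything taken from OpenBeken or HomeAssistant.

```python
from typing import List, Tuple

NOMINAL_VOLTAGE = 127.0  # mains voltage reported in this issue
DROP_FRACTION = 0.9      # assumption: below 90% of nominal counts as a drop


def find_voltage_drops(
    readings: List[Tuple[float, float]],
    nominal: float = NOMINAL_VOLTAGE,
    fraction: float = DROP_FRACTION,
) -> List[Tuple[float, float]]:
    """Return the (timestamp, voltage) pairs that fall below nominal * fraction."""
    threshold = nominal * fraction
    return [(t, v) for t, v in readings if v < threshold]


# Hypothetical samples: (seconds since start, volts)
samples = [(0, 126.8), (10, 127.1), (20, 60.0), (30, 126.9), (40, 110.0)]
drops = find_voltage_drops(samples)
print(drops)
```

Comparing the timestamps of the flagged readings with the moments the web interface was opened would show whether the two really coincide.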

Screenshots

With Loads, device restarted on second time image

Without Loads, drops are less impacting, but still wrong. I do not have such drops on other different devices... image

Other relevant info

I've tried to capture the logs before the device restarts, but as far as I can tell nothing related to the issue is being logged. Apparently it restarts before it can send any useful information to the log screen. Also, with the "debug/all" option activated and all the flags checked, there is a lot of output, which makes the analysis difficult. Are there any tips on what to select to filter for this scenario?

image

openshwprojects commented 1 year ago

Hello, first of all, do you use PowerSave 1?

viny182 commented 1 year ago

hi @openshwprojects, I've used on one of them, but I did not see any differences to the other 3 devices...

openshwprojects commented 1 year ago

I know this is a long shot, but powering your device from a reliable 5V power supply (through the 3.3V LDO on the board) could tell us whether the issue is an OBK fault or a power supply fault in your particular plug.

Of course, this has to be done in a specific way, otherwise you'll blow up the plug... they are not isolated from mains.

I can see two potential causes of this issue.

  1. The BL0937 driver relies on an interrupt and maybe it fires too slowly, but that would also affect the other BK7231N/BL0937 plugs you mentioned.
  2. Your plug has a low-quality power supply and it falters during WiFi transmit etc. Maybe the supply also got overloaded by the lack of PowerSave 1 and the capacitors at the output have degraded.

I also noticed: "I've recently bought a 20A SmartPlug (link below on Firmware section) that uses the BL0937 for power measurement and I suspect that something on the web interface from OpenBeken is causing the CB2S module to overheat which makes the device restart." I had exactly the same issue in the past. It was caused by not using PowerSave 1 on a device that was drawing too much current; the tiny Tuya power supply failed and I had to repair it. See this topic (use Google Translate): https://www.elektroda.pl/rtvforum/topic3898805.html I was not aware of it back then, but some Tuya devices have such low-quality power supplies that PowerSave 1 is mandatory.

viny182 commented 1 year ago

The option 2 totally makes sense...

But then are we supposing that the original firmware uses an option like "PowerSave 1" by default? If so, should we evaluate whether OpenBeken should have it enabled by default as well, or at least put this option somewhere more visible, so users are asked to test it before their capacitors degrade?

One thing I forgot to mention: I noticed that the LED attached to the relay command pin, which is controlled by CB2S P26 (PWM5), flashes/blinks randomly... Perhaps there is not enough power being provided to the module and I should monitor this pin with a multimeter?

Also, here is a picture of the device... I accidentally damaged the relay and one fuse for this one, but the issue happens for the other 3 which I never opened... And still this one that was damaged is still working like the others hahaha

image

openshwprojects commented 1 year ago

My plan was to make PowerSave 1 default but then I got that strange complaint that it broke on users device and that caused a delay up to this day, you can see this issue: https://github.com/openshwprojects/OpenBK7231T_App/issues/678 image

I am currently not sure what to do, maybe after reading your devices issue I will decide to push a PowerSave 1 by default commit, but I am not entirely convinced yet. What is your opinion on that?

The thing is, only some devices really require PowerSave. I have many devices which seemed to run fine without PowerSave for months. You must have been unlucky and got the really, really low quality devices.

That LED blinking is very, very strange, are you sure it's connected just to relay? Is LED driven via transistor with relay or via GPIO? I have no idea what else than a serious power supply issue could cause it...

viny182 commented 1 year ago

first of all, I don't even know you but I already have a huge respect for you and a deep admiration on all your work.

Asking the opinion of a user who is reporting an issue to you shows character and a good personality. Thanks for all you've done for this community!!!!

To decide if we should push "PowerSave 1" in a next commit, I see that the edit: majority of the other users reported no issues in the link post you sent, but doing a small risk analysis I would come with this check list:

If the answer is "yes" or "maybe" for any of the questions above, then I would NOT push it as default in a next commit. But if it's no, then I believe we won't have any major issues with it.


The LED seems to be driven through a resistor attached to the relay command pin... like a pull-up resistor maybe? I have not measured it... but I think that will be the next step... In other words, it seems that both the relay command pin and the LED are connected in parallel to P26 (PWM5) of the CB2S module... (but I still need to analyze the tracks on the board more carefully to see if this is really the case...)


As for the fix you suggested... I do have a similar capacitor (470uF 16V) on the board which seems to be fine by visual inspection... I still need to check whether it is directly connected to any power supply... I do however also have 2 other capacitors of 4.7uF / 400V, and a third one that I will have to take off the board to read the values...

To summarize the fix: do you think that replacing the first 3 capacitors and enabling "PowerSave 1" would be something good to try as a good/long shot?

DeDaMrAzR commented 1 year ago

Hi @viny182, can you maybe provide more detailed pictures of the board, both sides, so we can all try to evaluate and help you better?

Thanks.

openshwprojects commented 1 year ago

Thank you for kind words.

Well, we could enable PowerSave only in WiFi client mode, not in AP mode, but that still wouldn't change the fact that beginners won't know what's happening if PowerSave breaks their devices as soon as they configure their SSID.

Maybe I could create some kind of popup dialog or a simple main panel display in yellow font saying "PowerSave not enabled - consider enabling!". I will look into that soon.

Regarding capacitors - well, I am afraid that in the case of switch mode power supplies, you have to get a low ESR capacitor rated for switching usage. You can't just get any capacitor out of electronic scrap boards, it has to be low ESR cap. Otherwise you will have to replace it again after a week or after a month. Trust me, I tried.

The cap that breaks is a low voltage one, 16V/470uF or something. No need to replace 400V ones.

Of course, we don't know yet if it's really a capacitor fault. Maybe it's not.

I will just say that if the LED on GPIO blinks while GPIO is set to constant high (or low) in OBK, then it's bad.

Does that LED on GPIO also blink if you set the GPIO role to "AlwaysHigh" in OBK (or AlwaysLow, choose the proper one)?

viny182 commented 1 year ago

The PopUp idea is very cool!

I've bought the low ESR capacitor and I'll receive it by Friday... As soon as I replace it I'll let you know the results.... Even if it's not at fault, it won't cost much to replace it...

And as for the LED on GPIO, it still blinks randomly while P26 (PWM5) is set to "AlwaysHigh"... so, how bad is it?

Also, I just noticed I made the same mistake twice, saying the relay was on P24 in two different comments above... I will edit them. The relay and LED are (and always have been) on P26 (PWM5), now set to AlwaysHigh, but still blinking...

image

viny182 commented 1 year ago

hey, @DeDaMrAzR here are the pictures as requested (let me know if you need any of them on higher quality, so I'll upload it somewhere)...

@openshwprojects as I had to disassemble the device to take the pictures, I just replaced the 470uf/16v cap by a new one... (not the low ESR yet, it will arrive on friday)... I'll report if I find anything different...

And just FYI, the other 10A devices I have with the same chips config are this model -> link

[18 images of the device and board]

deltamelter commented 1 year ago

Oh, I think I am facing the same issue with some new CB2S based plugs. The biggest problem is not the reboot but the fact that after reboot, the device toggles off instead of keeping the power state. Is there any danger enabling powersave 1? I'm willing to try if it will let my device stay on. Had problems with these connecting to my AP when trying to initially set wifi.

There only appears to be one LED and it gets incorrectly assigned as the wifiLED_n pin, so I have changed it to an ordinary LED and changed the channel from 0 to 1 (same as the button and relay). Can this have any impact?

viny182 commented 1 year ago

@deltamelter if you don't mind that whatever device you have connected will turn off/on when the smart plug's CB2S restarts, then you can set the relay to be always closed/on in the "Configure Startup" screen, by setting your channel to "1".

But be aware that depending on the load you have, your CB2S module and the device connected to it will restart a lot!

In my case, I have a dishwasher connected to one of the modules, so if I let this happen, probably my dishwasher will fry... So for now, no measurements from the dishwasher, until we figure this out...

image

image

DeDaMrAzR commented 1 year ago

Just to interject a bit, if you set your Relay channel (0 for example) start value to -1 it will retain whatever was the state before the restart.

Also @viny182 on the module landing page there is a reboot reason stat, I am curious to see what would be the value on yours.

image

DeDaMrAzR commented 1 year ago

@viny182 I am trying to get one of these devices my self to do some in depth testing

deltamelter commented 1 year ago

@viny182 I had this test plug on a chest freezer, and I'd prefer not to have that power off randomly with a CB2S reboot. 2 of the devices had newer Tuya firmware (1.1.12) and they will reboot on their own with no load. The "older" plug (1.1.8) is identical inside with regard to hardware but seemed to have no problem with the load from the freezer. When a clothes dryer was turned on, off the same plug, it started rebooting, after an hour at first and later more frequently. Total load was around 1.5kW on the dryer.

@DeDaMrAzR I wondered what this means, mine say: Reboot reason: 0 - Pwr

DeDaMrAzR commented 1 year ago

@deltamelter that would be something related to the power supply, basically something like a power cut.

deltamelter commented 1 year ago

I didn't calibrate these yet and the voltage is very erratic, I will set that first. But I had no interruptions to power... One of these devices: image

On a tasmota powermonitor plug that was calibrated: image

viny182 commented 1 year ago

@DeDaMrAzR, I've got the same reboot status as @deltamelter , 0 - Power. image


Well, I've just got the low ESR 16v/470uf cap that @openshwprojects suggested to replace, and already done the replacement on one of the modules.

As a pre-test, I connected a 2000W hair dryer to it and let it run for 5 min.... The voltage is not very stable (unlike on other devices), and when I opened the web interface I saw it dropping to 42V... The LED attached to the relay on P26 is also still blinking/flashing randomly...

The CB2S device was not hot at all, so I think I'll try to reconnect it to my dishwasher and see if it will turn off during a full cycle ....

image

DeDaMrAzR commented 1 year ago

@deltamelter I am afraid that, as @openshwprojects said, it may be the power supply inside the device itself. I'm not sure yet, but that reboot reason info is somewhat telling. You should get "Reboot reason: 3 - Unk" if it was the FW causing those problems.

Again, not 100% sure, just guessing as I haven't had that issue so far and I've been testing similar devices for 60+ days albeit not with that power supply configuration.
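
The reasoning above can be written down as a small lookup. This is a minimal sketch that only covers the two codes quoted in this thread ("0 - Pwr" and "3 - Unk"); the wording of the hints is a paraphrase of the guesses made here, not an official OpenBeken table, and the function names are made up for illustration.

```python
import re
from typing import Optional

# Only the two codes quoted in this thread are listed; the hints paraphrase
# the guesses made in the discussion and are not an official table.
REBOOT_HINTS = {
    0: "power-related reset (e.g. a brownout or power cut)",
    3: "unknown cause, possibly firmware",
}


def parse_reboot_reason(line: str) -> Optional[int]:
    """Extract the numeric code from a line like 'Reboot reason: 0 - Pwr'."""
    match = re.search(r"Reboot reason:\s*(\d+)", line)
    return int(match.group(1)) if match else None


def describe(line: str) -> str:
    """Turn a status line into the hint discussed in the thread."""
    code = parse_reboot_reason(line)
    if code is None:
        return "no reboot reason found"
    return REBOOT_HINTS.get(code, f"code {code}: not discussed in this thread")


print(describe("Reboot reason: 0 - Pwr"))
```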

DeDaMrAzR commented 1 year ago

Also what is the quality of your WiFi signal (RSSI value)? That may impact the consumption as well?

I tried to figure out the topology and what that IC is from the pictures @viny182 provided, but so far no luck.

viny182 commented 1 year ago

@DeDaMrAzR, which IC do you need to check? As I have opened another module, I see that there are some ICs that can be identified more easily.... Can you take a screenshot and show me the IC you need?

Here is the Wifi signal value... Should I test it with lower levels as well?

image

DeDaMrAzR commented 1 year ago

@viny182 if you have a disassembled device can you confirm that the topology looks similar to this datasheet? -https://eu.mouser.com/datasheet/2/277/MP150-3106221.pdf

EDIT: sorry forgot to mention the SOT23-5 device with the marking IADGN

viny182 commented 1 year ago

if you refer to the IC on the pic below, then it is a A1117B 3.3V Edit: Correct link: https://stm32-base.org/assets/pdf/regulators/CJA1117B.pdf

(The other bigger one is the BL0937)

image

viny182 commented 1 year ago

Sorry @DeDaMrAzR , I found the IADGN.... here it is:

image

deltamelter commented 1 year ago

20230713_183615 20230713_183037

DeDaMrAzR commented 1 year ago

> Sorry @DeDaMrAzR , I found the IADGN.... here it is:
>
> image

Can you check the schematic in the datasheet against your board, does it look similar? I suspect its 200mA nominal current output is to blame for your problems (not guaranteed of course - just guessing)

@deltamelter

Not enthusiastic about getting this datasheet - https://www.sekorm.com/doc/2190585.html

deltamelter commented 1 year ago

so powersave 1 is it a good idea here? make any difference?

DeDaMrAzR commented 1 year ago

> so powersave 1 is it a good idea here? make any difference?

Give it a try but try testing it first, meaning don't use it on something critical that fails with consequences (freezer comes to mind 😄 )

deltamelter commented 1 year ago

> Give it a try but try testing it first

Currently trying startup value -1, and I hadn't noticed it had restarted several times until after I plugged in a bulb for load and checked MQTT... image

I'll try powersave 1 next. but is this uncalibrated state going to be causing any problems?

deltamelter commented 1 year ago

I thought online time was resetting at reboot before tho

Build on Jul 11 2023 10:05:11 version 1.17.180
Online for 2 hours, 19 minutes and 26 seconds
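
Since the uptime line is the main clue that a reboot happened, it can be checked mechanically. Below is a minimal sketch, assuming the status text has been copied from the web page at regular intervals: it converts a line such as the one above into seconds and counts how often the uptime went backwards. Only hours, minutes and seconds are handled, because that is the only format shown here; the helper names are hypothetical.

```python
import re
from typing import List

UNIT_SECONDS = {"hour": 3600, "minute": 60, "second": 1}


def uptime_seconds(line: str) -> int:
    """Convert 'Online for 2 hours, 19 minutes and 26 seconds' to seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+) (hour|minute|second)s?", line):
        total += int(value) * UNIT_SECONDS[unit]
    return total


def count_reboots(uptimes: List[int]) -> int:
    """Count how many times the uptime is lower than in the previous sample."""
    return sum(1 for prev, cur in zip(uptimes, uptimes[1:]) if cur < prev)


print(uptime_seconds("Online for 2 hours, 19 minutes and 26 seconds"))
# Hypothetical uptime samples in seconds, taken at regular intervals
print(count_reboots([100, 200, 30, 90, 10]))
```

A reboot that happens and recovers between two samples can still be missed if the new uptime has already passed the previous value, so the sampling interval should be short compared to the time between reboots.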

viny182 commented 1 year ago

@deltamelter the uncalibrated state wouldn't cause any issues...

But startup -1 means that if the device restarts, it will boot remembering the last relay state... So if you do not have anything connected, your only way to check whether it restarted is to look at the uptime status at the bottom of the web interface.... And if you have set -1 and have a device connected, it will always power the device back on after a restart....

Perhaps in a test you should consider setting startup = 0 to see if the device you are connecting gets powered off...
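
The startup values mentioned in this thread (-1 to keep the last state, 0 for off, 1 for on) can be summarized in a few lines. This is only an illustration of the behaviour as described by the commenters, not OpenBeken's actual code, and the function name is made up.

```python
def relay_state_after_boot(startup_value: int, last_state: int) -> int:
    """Model the relay state after a reboot, as described in this thread.

    startup_value: -1 restores the state from before the reboot,
                    0 starts with the relay off, 1 starts with it on.
    last_state:     relay state (0 or 1) just before the reboot.
    """
    if startup_value == -1:
        return last_state
    if startup_value in (0, 1):
        return startup_value
    raise ValueError("only -1, 0 and 1 are covered by this sketch")


# With -1 a load that was on comes back on, which hides the reboot;
# with 0 the load stays off afterwards, which makes the reboot visible.
print(relay_state_after_boot(-1, last_state=1))
print(relay_state_after_boot(0, last_state=1))
```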

viny182 commented 1 year ago

> > Sorry @DeDaMrAzR , I found the IADGN.... here it is: image
>
> Can you check the schematic in the datasheet against your board, does it look similar? I suspect its 200mA nominal current output is to blame for your problems (not guaranteed of course - just guessing)
>
> @deltamelter
>
> Not enthusiastic about getting this datasheet - https://www.sekorm.com/doc/2190585.html

Seems very similar to the schematics, but I'm not an expert... is there any specific pin that I should make sure is equal?

deltamelter commented 1 year ago

-1 works great to keep the plug toggled on, except it does flash off for less than a second. That's fine for "analog" stuff where blips off will not be a problem (probably). PowerSave 1: mixed results so far. It goes slightly longer between reboots; one device that was rebooting after 10-15 mins without load went over an hour before rebooting. Another device (on the freezer) has not restarted in over 9 hours so far since enabling PowerSave...

I had a similar issue with an older tasmota device that started failing to reach the AP after years with no problems. The only change was a new router/AP. To "fix" that (probably temporarily) I used the minimal tasmota without the bells and whistles and power requirement and it was enough. This got me to thinking about whether obk could be "minimalised" and I started with turning off NTP service, because I'm not sure why it needs to be running constantly. Maybe I'm wrong, but I'm not trying to get the plug device to keep its own schedule, just needs to spit out mqtt, does mqtt need accurate times? Does anything else if I am going to use HA to do the "smart" stuff?

Anyway, on the worst rebooting device, last night I ran stopDriver NTP on the console command line and that device has not rebooted since (>9 hours so far). There is no load on the device, but that didn't stop it rebooting before.... Will put some load on this and test a bit longer.

What else can be "minimalised"/"dumbed-down" on the device to reduce the power needed?

deltamelter commented 1 year ago

image Over 11 hrs and then it rebooted... see the top for the difference after stopping NTP though

deltamelter commented 1 year ago

I only stopped NTP on the command line, not in the startup command, so after that reboot it has rebooted again twice...

viny182 commented 1 year ago

Hi, I confirm that stopping the NTP driver gives it more stability... Just like stopping the SSDP driver like I mentioned on the first post... I think we are on the right track... Now we should investigate what else can be disabled...

DeDaMrAzR commented 1 year ago

To my knowledge (not that I have too much of it), NTP is used for consumption_stats, which are problematic in their own way because the implementation is incompatible with HA due to the state limit (@openshwprojects is aware of that). I just did a quick test to check the current consumption and can confirm that enabling the NTP driver caused a spike of almost double the nominal current (from 90 to 170+mA) on connecting.

So I guess disabling NTP and enabling Powersave can be a possible solution?
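
To put those numbers in perspective, here is a rough headroom calculation. It assumes the 200mA nominal output guessed earlier in this thread really is the limit of the plug's supply, which has not been confirmed; the 90mA and 170mA figures are the ones measured above, and the function name is illustrative.

```python
SUPPLY_LIMIT_MA = 200.0  # assumption: nominal output guessed earlier in the thread


def utilisation(load_ma: float, limit_ma: float = SUPPLY_LIMIT_MA) -> float:
    """Fraction of the assumed supply limit used by a given load."""
    return load_ma / limit_ma


for label, load in (("nominal", 90.0), ("NTP spike", 170.0)):
    used = utilisation(load)
    left = SUPPLY_LIMIT_MA - load
    print(f"{label}: {used:.0%} of the limit used, {left:.0f} mA of headroom")
```

Under that assumption the spike leaves very little margin before anything else on the board (relay coil, LED) is counted, which would be consistent with the brownout-style reboots reported here.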

viny182 commented 1 year ago

@DeDaMrAzR can you try to disable EnergyStats to see if there is any relevant consumption drop? (SetupEnergyStats 0 60 60).

I've already disabled it here, and guess what: more stability....

We can live without it as we can use Riemann Calc helper from HomeAssistant to generate the daily/total historical statistics...
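
For reference, this is the kind of calculation such a helper performs: integrating power samples over time to get energy. The sketch below uses a trapezoidal Riemann sum over (time in seconds, power in watts) samples; it is an independent illustration with made-up sample data, not HomeAssistant's implementation.

```python
from typing import List, Tuple


def energy_kwh(samples: List[Tuple[float, float]]) -> float:
    """Trapezoidal Riemann sum of (time_seconds, power_watts) samples, in kWh."""
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        # Area of one trapezoid: average power times elapsed time
        joules += (p0 + p1) / 2.0 * (t1 - t0)
    return joules / 3_600_000.0  # 1 kWh = 3.6 million joules


# Hypothetical data: a constant 1000 W load sampled over one hour
hour_at_1kw = [(0.0, 1000.0), (1800.0, 1000.0), (3600.0, 1000.0)]
print(energy_kwh(hour_at_1kw))
```

Because the integration only needs the instantaneous power values already published over MQTT, the on-device statistics can stay disabled.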

DeDaMrAzR commented 1 year ago

> @DeDaMrAzR can you try to disable EnergyStats to see if there is any relevant consumption drop? (SetupEnergyStats 0 60 60).
>
> I've already disabled it here, and guess what: more stability....
>
> We can live without it as we can use Riemann Calc helper from HomeAssistant to generate the daily/total historical statistics...

@viny182 I'll try that and report results some time later.

DeDaMrAzR commented 1 year ago

@viny182 I am having trouble testing with my socket (AC danger) but will test soon and report back. It takes some time to set that up, so please be patient. Judging by the first test there is a difference: both of you are on the right track and the power supply is 100% to blame. @openshwprojects is correct in that respect.

I'll try testing and recording on my test boards as that may be easier, and the results may be interesting as well.

Thank you both for your effort to make this project better!

DeDaMrAzR commented 1 year ago

This is that short current measurement test but on a test device (not a plug with power meter)!

EDIT: the peak is 153, not 156mA, check the CSV file for details.

Screenshot 2023-07-14 182548 SPDXXXX_data.csv

openshwprojects commented 1 year ago

It looks like a really good piece of research was done here. Good job! I am surprised by that NTP power requirement peak, but our NTP relies on Beken libraries so that requires more investigation.

In the meantime, could anyone with a problematic plug try disconnecting it from mains and powering it from 5V at the 3.3V regulator input to see if the problem persists? Of course, use a good 5V power supply.

viny182 commented 1 year ago

IMPORTANT UPDATE:

I run 15+ OpenBeken devices, and some of them were not updated to the latest FW version, so I decided to update all of them except one that was not being used (I even opened another post on how to do it: link). After I updated them, 3 or 4 devices that were stable started to reboot as well!

So I decided to downgrade all my devices to the FW version of the one I had not updated (1.17.58), and the voltage measurements are way more stable, even for the 20A device that started this issue... So now maybe we have something to investigate between this 1.17.58 version and the newer ones??

Picture below for the measurements. They are not calibrated yet, but they do not vary like they did with the newer FW that was installed before... Also, I still need to test it with high loads to see if it restarts...

PS: The downgrade made me lose all configuration for the devices, so I had to set them up from scratch....

image

deltamelter commented 1 year ago

I have 8 plugs that mostly failed to flash wirelessly, probably because the wifi and power were sketchy. I have 1 on 1.17.175, 4 on 1.17.180 and the rest still to be flashed via serial. Serial is not easy with these because the useful pins can only be accessed "blindly" and I'm not sure where I would need to connect stable power. If to the CB2S 3v3 pin, it would be hard to do consistently.

Can I downgrade via OTA interface? or would I have to try serial flash?

deltamelter commented 1 year ago

also, what's the best way to backup all of the config to apply on multiple devices?

DeDaMrAzR commented 1 year ago

@deltamelter as @viny182 mentioned you will lose your config if you downgrade so be aware of that!! Make screenshots, download templates, config files and autoexec before that!!

Downgrading is not advised for a multitude of reasons, and you have to understand that the project is in rapid development and feedback from both of you is very much appreciated. If you do downgrade via OTA, do it in small increments, version by version. Again, this is NOT advised(!!!), as you may mess up the wifi parameters, for example, and end up needing to flash via UART.

Trouble is that there is no way to test and follow up on all devices and potential problems with them and especially if the device is not in our hands and we can't recreate the problem on the devices that we own.

@viny182 great news that you found the build that works for you, can you "sacrifice" that device now and play with it by updating versions slowly to see which build will start behaving problematically? (1.17.58 is almost 130 builds "old")

@deltamelter try using the flash tool - https://github.com/openshwprojects/BK7231GUIFlashTool/releases/tag/v1.1.1

deltamelter commented 1 year ago

@DeDaMrAzR I'm only looking to try the older fw on one plug to test if it really makes a difference. Only thinking of downgrade to get there so I can test like for like. All the other plugs are hardware identical, so I could flash directly.

> @deltamelter try using the flash tool - https://github.com/openshwprojects/BK7231GUIFlashTool/releases/tag/v1.1.1

EDIT: I used v1.1.0b to serial flash 5 of the plugs already. Another 1 is on the tuya-cloudcut original FW, and 1 was flashed to OBK 1.17.175 via cloudcut. 4 were patched Tuya and had to be serial flashed; they are all on 1.17.180 and are the least stable. It's not the flasher that is hard, it is finding the pins by touch with GND hooked to a continuity meter (so I know my pin prodder is in the right place before I power it)...

When talking about backing up and copying configs, I would only be looking to apply these to the same FW version, to update all the plugs once I have settled on a "working" setup :smile:

viny182 commented 1 year ago

I've downgraded by OTA... Lost all the configs, but at least ALL my 15 devices booted in AP mode, so I did not need to flash through serial....

I'll choose 1 device to update gradually to try to identify which is the "faulty" version and let you guys know....

But as there are 100+ versions to test, it will take some time... I'll make some jumps between the versions to try to narrow down the range with issues... Anyway, I'm not sure I'll be able to do it this weekend because of some other appointments I have....
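
Jumping between versions to narrow the range is essentially a binary search. A minimal sketch follows, assuming build 58 is good, build 181 is bad, and every build after the first faulty one is also faulty; the `is_bad` callback stands in for "flash that build and watch whether the device misbehaves", and the culprit build number 120 is made up purely for the demonstration.

```python
from typing import Callable


def first_bad_build(good: int, bad: int, is_bad: Callable[[int], bool]) -> int:
    """Binary search for the first faulty build between a good and a bad one.

    Assumes `good` works, `bad` fails, and that once a build is faulty
    all later builds are faulty too.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid   # fault is at mid or earlier
        else:
            good = mid  # fault was introduced after mid
    return bad


tested = []


def pretend_test(build: int) -> bool:
    """Stand-in for flashing a build and checking it (hypothetical culprit: 120)."""
    tested.append(build)
    return build >= 120


print(first_bad_build(58, 181, pretend_test), "found after", len(tested), "tests")
```

With 123 builds between the two known versions, this needs at most seven test flashes instead of one per build, provided the assumption about later builds holds.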

Once again, thank you all for the help and support!

deltamelter commented 1 year ago

> I've downgraded by OTA... Lost all the configs, but at least ALL my 15 devices booted in AP mode, so I did not need to flash through serial....

OTA from 181 to 58?

viny182 commented 1 year ago

> > I've downgraded by OTA... Lost all the configs, but at least ALL my 15 devices booted in AP mode, so I did not need to flash through serial....
>
> OTA from 181 to 58?

Yes!