letscontrolit / ESPEasy

Easy MultiSensor device based on ESP8266/ESP32
http://www.espeasy.com

Hardware watchdog... how to find the cause? #1656

Closed giig1967g closed 5 years ago

giig1967g commented 6 years ago

Hi, with the latest versions I am experiencing regular reboots (every other day) with "Hardware watchdog" as the reboot cause. I have also changed from static IP to DHCP. How can I find the cause of the hardware watchdog reset? What exactly does it mean?

s0170071 commented 6 years ago

No free on the string. I meant to check if there is heap available, try to allocate it, free it and then reserve the string buffer.
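A minimal sketch of the check described above: probe the heap with a trial allocation, free it, and only then reserve the string buffer. It is written in standard C++ (`std::malloc`, `std::string`) so it compiles off-device; on the ESP8266 the same pattern would be applied to the Arduino `String` class. The helper name `reserveIfHeapAvailable` is hypothetical and is not an ESPEasy function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of ESPEasy): returns true only if a trial
// allocation of `size` bytes succeeded and the string buffer was reserved.
bool reserveIfHeapAvailable(std::string& str, std::size_t size) {
  if (size == 0) {
    return true;  // nothing to reserve
  }
  // Probe: can the heap currently provide one contiguous block this large?
  void* probe = std::malloc(size);
  if (probe == nullptr) {
    return false;  // not enough contiguous heap; the string is left untouched
  }
  std::free(probe);  // release the probe so the string can use that space

  str.reserve(size);
  return str.capacity() >= size;
}
```

The probe is only a hint: anything that allocates between the `free` and the `reserve` can still take the block, so the return value of the final check is what the caller should rely on.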

uzi18 commented 6 years ago

@TD-er maybe it is possible to just allocate/reserve some big buffer (200+ chars) and use it as a static place to manipulate strings?

TD-er commented 6 years ago

Then you have to implement a lot of operations yourself.
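To illustrate the trade-off, here is a rough sketch of a fixed-capacity text buffer. The class name `FixedBuffer` and its methods are hypothetical and not taken from ESPEasy. It never touches the heap, but even a plain append needs hand-written bounds checks, and every further operation (insert, replace, number formatting) would have to be added the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-capacity text buffer (not an ESPEasy class).
// Storage lives inside the object, so no heap allocation ever happens.
template <std::size_t N>
class FixedBuffer {
  static_assert(N > 0, "need room for at least the terminator");

 public:
  FixedBuffer() { clear(); }

  void clear() {
    len_ = 0;
    data_[0] = '\0';
  }

  // Appends `text`. Returns false and leaves the buffer unchanged
  // if the text does not fit in the remaining space.
  bool append(const char* text) {
    assert(text != nullptr);
    const std::size_t add = std::strlen(text);
    if (add > N - 1 - len_) {
      return false;
    }
    std::memcpy(data_ + len_, text, add + 1);  // copy including terminator
    len_ += add;
    return true;
  }

  const char* c_str() const { return data_; }
  std::size_t length() const { return len_; }

 private:
  char data_[N];
  std::size_t len_;
};
```

A `static FixedBuffer<256> scratch;` would give one shared scratch area, at the cost of permanently occupying that RAM and not being safe to use from two places at once.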

TD-er commented 6 years ago

There is no mention of MQTT in this thread, as far as I can see.

Yesterday I added a delay(1) to the readByte part of MQTT client PubSubClient. Can you please test if this is now still an issue?

And if another plugin is active, please mention that one too.
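For readers unfamiliar with the pattern, this is roughly what a delay inside a polling loop looks like. It is a simplified sketch and not the actual PubSubClient code: the `PollHooks` struct and the function name `readByteWithDelay` are hypothetical stand-ins so the example compiles off-device. On the ESP8266 the hooks would correspond to `millis()`, `delay()` and the network client's `available()`/`read()`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-ins for the platform calls used while polling.
struct PollHooks {
  std::function<uint32_t()> millis;       // current time in milliseconds
  std::function<void(uint32_t)> delayMs;  // sleep and let background tasks run
  std::function<bool()> available;        // true when a byte can be read
  std::function<uint8_t()> read;          // fetch the next byte
};

// Waits for one byte, pausing 1 ms between polls instead of spinning.
// Returns false if nothing arrived within `timeoutMs`.
bool readByteWithDelay(const PollHooks& io, uint32_t timeoutMs, uint8_t* result) {
  assert(result != nullptr);
  const uint32_t start = io.millis();
  while (!io.available()) {
    io.delayMs(1);  // give the system time to run its own tasks
    // Unsigned subtraction stays correct when the millisecond counter wraps.
    if (static_cast<uint32_t>(io.millis() - start) >= timeoutMs) {
      return false;
    }
  }
  *result = io.read();
  return true;
}
```

The point of the pause is that a tight loop waiting on the network can starve the background work that keeps the watchdog fed, whereas a short delay hands control back on every iteration.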

thomastech commented 6 years ago

I've been running ESP_Easy_mega-20181023_dev_ESP8266_4096.bin on two NodeMCU devices. One rebooted today, hardware Wdog reset.

Load: 23.20% (LC=9683)
Free Mem: 10520 (7232 - ruleMatch2)
Free Stack: 3536 (640 - LoadTaskSettings)
Boot: Manual reboot (3)
Reset Reason: Hardware Watchdog

controllers devices

Domosapiens commented 6 years ago

This could be a major game changer: Release mega-20181025: [WDT] Change yield() to delay(0)

thomastech commented 6 years ago

@Domosapiens: Thanks for the heads-up. I will flash ESP_Easy_mega-20181025_dev_ESP8266_4096.bin into my two devices.

Grovkillen commented 6 years ago

Yep, we hope to close this one. :+1:

TD-er commented 6 years ago

@thomastech What uptime did you get on your node? And please have a look at the controller settings. Especially those that may increase memory usage, like Max Queue depth and minimum send interval.

thomastech commented 6 years ago

@TD-er: The device that rebooted had been manually reset (RST button press). Then about 18 hours later it rebooted due to hardware wdog.

MQTT controller settings: controller_1

TD-er commented 6 years ago

Hmm, those are "interesting" settings. No retries, no queue and "ignore new". So in other words, a new sample will be tried once and kept in the queue when there is no wifi connection. Also at first attempt it will be removed from the queue.

I would expect "delete oldest" when using no queue, or else you may prefer an older value when the broker has been unreachable for a while.
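The difference between the two "queue full" behaviours can be sketched with a small bounded queue. This is an illustration only, not ESPEasy's controller queue: the names `BoundedQueue` and `FullPolicy` are hypothetical, and a depth of 1 is assumed here to stand for "no queue".

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// What to do with a new sample when the queue is already full.
enum class FullPolicy { IgnoreNew, DeleteOldest };

// Hypothetical bounded queue (not the ESPEasy implementation).
template <typename T>
class BoundedQueue {
 public:
  BoundedQueue(std::size_t maxDepth, FullPolicy policy)
      : maxDepth_(maxDepth), policy_(policy) {
    assert(maxDepth > 0);
  }

  // Returns true if `value` was stored, false if it was dropped.
  bool push(const T& value) {
    if (items_.size() >= maxDepth_) {
      if (policy_ == FullPolicy::IgnoreNew) {
        return false;  // keep what is queued, drop the new sample
      }
      items_.pop_front();  // make room by discarding the oldest sample
    }
    items_.push_back(value);
    return true;
  }

  bool empty() const { return items_.empty(); }
  std::size_t size() const { return items_.size(); }
  const T& front() const { return items_.front(); }
  void pop() { items_.pop_front(); }

 private:
  std::size_t maxDepth_;
  FullPolicy policy_;
  std::deque<T> items_;
};
```

With depth 1 and `IgnoreNew`, a value queued while the broker is unreachable blocks every later sample until it is sent, so the broker eventually receives a stale reading. With `DeleteOldest`, the single slot always holds the most recent sample.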

thomastech commented 6 years ago

> Hmm, those are "interesting" settings.

They were the defaults when I originally installed the Controller. What should all the settings be for a typical OpenHab MQTT controller?

TD-er commented 6 years ago

You can delete the controller and re-add it. Then you have the new defaults. (make sure to press save after adding it)

Proper defaults are: image

You may lower the minimum send interval if your broker is fast enough. I run 10 msec here on a Raspberry Pi 3.
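As a rough illustration of what a minimum send interval does, the sketch below only lets a message through when enough time has passed since the previous one. The class name `SendGate` is hypothetical and this is not how ESPEasy implements the setting. Timestamps are assumed to come from a 32-bit millisecond counter such as `millis()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical rate limiter (not ESPEasy code): allows at most one send
// per `minIntervalMs` milliseconds.
class SendGate {
 public:
  explicit SendGate(uint32_t minIntervalMs) : minIntervalMs_(minIntervalMs) {}

  // Returns true if a message may be sent at time `nowMs`, and records it.
  bool trySend(uint32_t nowMs) {
    // Unsigned subtraction stays correct when the counter wraps around.
    const uint32_t elapsed = static_cast<uint32_t>(nowMs - lastSendMs_);
    if (sentOnce_ && elapsed < minIntervalMs_) {
      return false;  // too soon after the previous send
    }
    lastSendMs_ = nowMs;
    sentOnce_ = true;
    return true;
  }

 private:
  uint32_t minIntervalMs_;
  uint32_t lastSendMs_ = 0;
  bool sentOnce_ = false;
};
```

A shorter interval drains the queue faster but gives the broker less time per message, which is why the usable value depends on how quickly the broker responds.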

Domosapiens commented 6 years ago

Installed release mega-20181025 yesterday (because it was not available earlier ;) ). No conclusions yet, but with the latest daily releases I have seen neither memory nor stack problems. image

image (so great that you just can paste a snapshot!)

image Uptime still seems to be a problem. But ... I'm also hunting for the cause of excessive RCWL-0516 detections (multiple units in the lab interfering?). As with #1857, I need to use a rule for LDC On/Off, resulting in excessive rule calls. So no conclusions yet.

TD-er commented 6 years ago

Nice to see the free stack is also increasing a few bytes at a time on new builds :)

thomastech commented 6 years ago

> You can delete the controller and re-add it. Then you have the new defaults.

@TD-er: Thanks, MQTT controller has been updated with new defaults.

s0170071 commented 6 years ago

@Domosapiens I have a set of nodes that have been running for several days now. The build from yesterday evening ran all night. Your uptime problems must be due to something else. Try fresh hardware, another power supply, and no devices/plugins. Please report back if that works better.

Domosapiens commented 6 years ago

@s0170071 Thanks for your advice.

I have 4 boxes under test as described here: https://www.letscontrolit.com/forum/viewtopic.php?f=2&t=5955&sid=db230a574377fbb18394ecdcb9e9b75a So fresh HW is not an option, power supply is sufficient and clean, and with no devices/plugins they are useless.

Yes, I can understand your positive experience .... Without hardware there is no reason for the Hardware Watchdog to reboot ;) But I will flash a few bare Wemos units.

One unit is running mega-2080322 for over 141 hr !!! No reboot. No DS18B20 NAN.

With the other 3, I follow the latest developments. One unit did 40 hr, the others less.

Still hunting for the dog!

thomastech commented 6 years ago

Feedback on ESP_Easy_mega-20181025_dev_ESP8266_4096.bin

One NodeMCU still running without reboot. ~28 hrs. Second NodeMCU rebooted at 27 hrs. Details below.

Load: 25.50% (LC=9670)
Free Mem: 10848 (8144 - sendContentBlocking)
Free Stack: 3584 (720 - LoadTaskSettings)
Boot: Manual reboot (2)
Reset Reason: Hardware Watchdog

Thomas

TD-er commented 5 years ago

I think this is no longer an issue. If it still is, please open a new issue.

I will close this one now, since its last post was a year ago.