Closed: SvenLuebke closed this issue 1 year ago
Do you have a stack trace of the WDT reset? I can't see any cause in the attached log. Have you tried using a different pin for the IRQ?
As far as I can see in your settings, both intervals are shorter than the defaults. Can you increase them to check whether they are causing the WDT reset?
Unfortunately I have no stack trace! I remember seeing one in another project. Shouldn't this be standard output? Maybe the wrong baud rate?
I tried some things to get rid of this issue. I just changed the IRQ pin again...let's see. I had changed the intervals to check whether a buffer overrun was occurring. I have reverted the values to the originals.
Yesterday I saw that the system survives more than 9 h in night-time mode, without any SPI traffic and with reduced serial traffic, so the hardware looks OK.
If the log in your initial post is the complete log, then there is no software issue. The baud rate was OK; all the log lines are printed at the same baud rate. Hopefully the IRQ pin change helps.
Of course it's not the complete log, which would be ~800 KB of data, but it is a complete log of the period when the issue happened (before and after). I hope that is fine. I remember that I selected "Printable output" for session logging in PuTTY. I have now changed that to "All session output" to see whether other output (at a different baud rate) is generated.
Regarding the baud rate, I remember that the ESP8266 switches to 74880 baud directly after reset. But I guess the missing stack output is not a core function but some kind of user software function.
The issue just happened again, so changing the IRQ pin didn't help.
Just for fun I will try flashing my own build, but I don't think this will change anything...
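For anyone capturing that early 74880-baud output: the ESP8266 boot ROM prints a line containing `rst cause:` right after a reset, which can help tell a watchdog reset from a software restart. Below is a minimal sketch of how such a line could be decoded. The cause table follows the ESP8266 Arduino core documentation; the sample line is illustrative and not taken from the log in this issue, and `parse_boot_line` is a hypothetical helper name.

```python
import re

# Reset-cause codes as listed in the ESP8266 Arduino core documentation.
# Assumption: the same table applies to the board discussed in this thread.
RST_CAUSES = {
    0: "unknown",
    1: "normal boot (power on)",
    2: "reset pin",
    3: "software reset",
    4: "watchdog reset",
}

# Matches the boot ROM line, e.g. "rst cause:4, boot mode:(3,6)".
BOOT_LINE = re.compile(r"rst cause:\s*(\d+),\s*boot mode:\((\d+),(\d+)\)")


def parse_boot_line(line):
    """Return (cause_code, cause_text) for a boot ROM line, or None if absent."""
    match = BOOT_LINE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, RST_CAUSES.get(code, "undocumented code")


# Illustrative sample only, not from the attached log.
sample = " ets Jan  8 2013,rst cause:4, boot mode:(3,6)"
print(parse_boot_line(sample))
```

Running this over a session log captured with "All session output" would show which reset cause the boot ROM reported, provided the terminal was set to 74880 baud at that moment.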
Are you using MQTT? As far as I can see in your settings, MQTT seems not to be configured. I ran into an issue during testing yesterday and fixed it in the latest development build. Can you try installing that version on your ESP?
Commit 64fb587c from the build Action would be firmware version 0.5.32.
@lumapu we had several reports (at least two or three on the Discord) from users without MQTT who repeatedly experienced such WDT issues in versions prior to 0.5.32. So if you fixed anything in this regard, I would say this is a strong case for @SvenLuebke and others to retry with the latest development build / release. Thanks!
@stefan123t yes, during development I saw an issue regarding MQTT. It happened directly at boot and ended in a boot loop.
Maybe it helps others to get their systems more stable, starting with version 0.5.32.
Unfortunately the proposed version didn't help. I also activated MQTT, which didn't help either. The WDT resets were still happening. After flashing 0.5.32 I tried
https://github.com/lumapu/ahoy/commits/4093be7
which seems to be stable now: Uptime: 4 Days, 12:16:05
So could it be closed now? Can you verify the release version?
Let's wait another day. I flashed 4c52e9c before, which seems to be stable, but I wasn't able to update the system to dec333f via web update. It just said "failed" and 0.5.40 started again (although no reboot happened according to the uptime).
I flashed it via USB serial and it seems to be working for now (Uptime: 0 Days, 04:47:02). Before, the WDT was rebooting the system only when NRF24L01 traffic was happening.
dec333f restarted yesterday at ~10 PM (when no traffic happened) and again a few seconds ago. 4c52e9c was more stable for some reason. But are there so many differences? I guess not, right?
I cannot confirm stability issues using dec333f. My ESP8266-based DTU (however, with CE and IRQ swapped) has been stable for more than four days now. 👍 Perhaps it makes sense to change the power supply. Are you using a capacitor to stabilize the 3.3 V power source?
@SvenLuebke do you have the option to change the power source and/or Micro USB cable? It has been reported that the power supply is a major cause of WDT resets on ESPs in general.
Here is a blog post from a Makerlab in Hannover about tracing the ESP power supply using an oscilloscope with revealing results: https://arduino-hannover.de/2018/07/25/die-tuecken-der-esp32-stromversorgung/
@SvenLuebke A power bank providing a USB 5 V output may also be helpful to check power adapter issues.
@SvenLuebke can you give an update on stability with the latest development or release version?
Hi!
@stefan123t I exchanged
and soldered a 2200 µF capacitor to the 3.3 V power pins. The software still reboots as soon as SPI traffic occurs. Really strange! This happens with all the versions I tested, up to 0.5.76.
Why do you use a 2200 µF capacitor? We encourage using a 10 µF to 100 µF cap for smoothing voltage ripple and sustaining the 3.3 V at the NRF module. Yours is 22 times as large as the upper end of that range; this may be the reason too.
To be honest this capacitor was available in my box. Do you think a 2200µF cap will smooth the voltage worse than a 100µF one? It might be a little bit slower. I thought it's for stabilizing the 3.3 V power of the ESP8266. I'll try to find a 10µF one...and will also attach a ceramic cap.
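To put the sizes in perspective, here is a short worked calculation of how the fitted capacitor compares with both ends of the recommended 10 µF to 100 µF range. It only computes ratios from the values mentioned above and makes no claim about whether the larger value causes the resets; `size_ratio` is a hypothetical helper name.

```python
def size_ratio(fitted_uf, recommended_uf):
    """Return how many times larger the fitted capacitor is."""
    return fitted_uf / recommended_uf


FITTED_UF = 2200.0  # capacitor soldered to the 3.3 V pins in this thread

# Compare against both ends of the recommended range.
for recommended_uf in (10.0, 100.0):
    factor = size_ratio(FITTED_UF, recommended_uf)
    print(f"{FITTED_UF:.0f} uF is {factor:.0f}x the {recommended_uf:.0f} uF value")
```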
I just saw a new reboot_reason (copied the rest for some system information):
sdk: 2.2.2-dev(38a443e)
cpu_freq: 80
heap_free: 16720
sketch_used: 486
version: 0.5.66
wifi_rssi: -53
ts_uptime: 31
esp_type: ESP8266
core_version: 3.0.2
flash_size: 4096
heap_frag: 14
max_free_blk: 7080
reboot_reason: Software/System restart

Radio
nrf24l01+: is connected
Datarate: 250 kbps
Power Level: MIN
I didn't trigger the "Software/System restart". What causes it? I tried to open the "live" page and then it restarted.
Software/system restart could be an indication of a null pointer dereference or running out of memory (OOM). I had seen both with 0.5.66 as well, and I documented them in other bug reports. I believe most of the issues I documented are fixed in 0.5.92. Have you tried it already?
Having said all that, without a stack trace I think it's just guessing. What does your serial output look like when the reboot happens?
Hey @Argafal Thanks for your message! I tried different versions after 0.5.66...and they behaved even more strangely: after some uptime, nearly all pages couldn't be displayed anymore. The menu bar on the left contained only one entry (I don't remember which one) and the rest had vanished. A page refresh often took more than 10 s. I tried some things and then flashed back to 0.5.66, which restarts ~3 times a day but doesn't show this vanishing of pages.
I just installed 0.5.92...it looks much better, but I have to wait for the sun.
BTW: I noticed that WiFi between ESP8266 and my router is not stable (also have this with my laptop). There are more than 20 reconnects a day. Could that lead to my reported reset behaviour?
The first thing you describe about the webUI sounds like issue #660. Is that what it looks like? This should be much better again in 0.5.92/93.
I would hope that an unstable WiFi connection would not cause random reboots of ahoy, and I don't think it does. But without a stack trace it is pure guesswork. So I think you need to find a way to record a stack trace if you want to look into this further. For that I would connect the ESP via USB to a computer; that might be the easiest way.
Yes, that was exactly my issue! I didn't want to create another issue for this, because I thought, I'm the only one having this issue. Nice, thank you!
I now think these reconnect messages were just a consequence of the hourly resets, because with 0.5.92 the disconnects have vanished. Yes...it really looks promising: Uptime: 0 Days, 14:37:37
Yippee! Uptime: 1 Day, 16:31:07
...never saw "1 Day" before...if it reaches 4 I guess we can close the issue.
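Since uptime is the stability criterion used here, a small script can compare successive readings. This is a minimal sketch that assumes the `Uptime: D Days, HH:MM:SS` format quoted in this thread; `parse_uptime` and `count_resets` are hypothetical helper names, not part of ahoy.

```python
import re
from datetime import timedelta

# Matches strings such as "Uptime: 1 Day, 16:31:07" or "Uptime: 0 Days, 04:47:02".
UPTIME = re.compile(r"Uptime:\s*(\d+)\s+Days?,\s*(\d+):(\d+):(\d+)")


def parse_uptime(text):
    """Convert an uptime string in the format shown above to a timedelta."""
    match = UPTIME.search(text)
    if match is None:
        raise ValueError(f"no uptime found in {text!r}")
    days, hours, minutes, seconds = (int(part) for part in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def count_resets(samples):
    """Count how often the uptime drops between consecutive readings."""
    uptimes = [parse_uptime(sample) for sample in samples]
    return sum(1 for earlier, later in zip(uptimes, uptimes[1:]) if later < earlier)


# Readings in chronological order; a drop in uptime implies a reset in between.
readings = [
    "Uptime: 0 Days, 14:37:37",
    "Uptime: 1 Day, 16:31:07",
    "Uptime: 0 Days, 04:47:02",
]
print(count_resets(readings))
```

A drop only proves that at least one reset occurred between two readings, so the count is a lower bound when readings are taken far apart.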
Did it reach 4?
Yes, it reached 4 days and ~18 hours, then it reset again, but that's long enough for me. After 3 days I again got behaviour similar to https://github.com/lumapu/ahoy/issues/660. I had to press refresh two or three times and then it worked again.
Shall I close the ticket?
Cool, it seems that we fixed something. I will close this issue with the next release.
But I'm still thinking about why I was - more or less - the only one with this issue: Are some versions of the ESP8266 less stable? Are some RAM cells (in some memory area...for example at the end) not stable or dead? Are some PCBs less stable? Currently I don't have an explanation for that.
I don't think you are the only one. I have opened a few issues reporting reboots and/or exceptions running ahoy on ESP8266. As to why that doesn't happen to everyone on an ESP8266, I don't know.
My current status: With the current dev 0.5.98, ahoy runs stable for me as long as I don't use the WebUI. If I use the WebUI it occasionally reboots.
Do you have a capacitor in your circuit? I had a very unstable ESP8266 which became stable the moment I placed a capacitor next to its 3.3 V pin.
> My current status: With the current dev 0.5.98, ahoy runs stable for me as long as I don't use the WebUI. If I use the WebUI it occasionally reboots.
Same here...it didn't survive a day.
> Do you have a capacitor in your circuit?
I have two of them connected to 3.3V, one small and one big one. But that didn't change anything regarding the reset behaviour.
@SvenLuebke could you please briefly list your setup (number of inverters, ESP type, capacitor, web interface used or not, heap fragmentation)? Are there any indications of why the ESP gives up?
@SvenLuebke And could you please also mention whether you use MQTT or not, at what interval the inverters are polled (see settings), and at what interval MQTT messages are sent (see settings)? Thanks.
Platform: ESP8266
Model name: LoLin NodeMCU V3 (AliExpr.) 4MB
nRF24L01+ Module: nRF24L01+ plus
Antenna: external antenna
Power Stabilization: nothing
Connection diagram: Connection diagram I used: [image not preserved]
Connection picture: [image not preserved]
Version: 0.5.28
Github Hash: 2e08ee0
Build & Flash Method: ESP Tools (flash)
Desktop: Linux
Setup
Device Host Name
WiFi
Inverter
Inverter 0
General
NTP Server
MQTT
System Config
Pinout (Wemos)
Radio (NRF24L01+)
Serial Console
Debug Serial Log output
Error description
Approximately every hour, my ESP8266 flashed with AhoyDTU resets. See logs! Might this be a hardware issue? Is anyone else experiencing this? Besides that, the software runs quite nicely and I am very satisfied!
Thanks!