Open chris299 opened 10 months ago
Do you have the ability to capture the serial output of your device? There most certainly is valuable information printed through the serial connection right before the device reboots. Please try to capture that moment and post the excerpt from the serial output.
@schlimmchen thank you for your answer. I'm willing to try, but I have no idea how to do this. I assume, I need to connect the ESP32 to my computer, but how to proceed then? (I'm using Ubuntu) Is there a command to create a serial connection and dump the received content to a file?
I'm using Ubuntu
Great!
You will see a device pop up when connecting your ESP32. It may be /dev/ttyUSB0
(or USB1, USB2, etc.) or /dev/ttyACM0
(or 1 or 2 etc.). Connect to it using a serial terminal. I recommend minicom for this application (sudo apt install minicom
). This should do (in a terminal):
minicom -D /dev/ttyUSB0 -C /tmp/capture.txt -b 115200
You should see the output you also see in the Web console. If that's true, just leave it like that until the unwanted reboot occurs. You can also look at /tmp/capture.txt
in parallel to make sure the messages are captured there. Then have a look at the file /tmp/capture.txt
and isolate the messages right before the unwanted reboot. Paste them here as a code block.
@schlimmchen thank you for your advice, however, ~~I can't get it to work.... I attached the ESP32 to the computer but all I get from dmesg is~~ EDIT: got it to work by removing brltty, however... I'll paste the log as soon as it happens again...
@chris299 yes as some people on Discord (you?) mentioned under Linux Mint brltty
(Braille Text Console Service) gets installed initially for Accessibility reasons, i.e. Blind people need that input initially to install / use a computer!
Hence if you need that port /dev/ttyUSB0
free for minicom or other console processes to bind to you may have to stop / uninstall brltty
@stefan123t it was not me on Discord... it took a while but it just happened again attached the full log, reset took place just a few minutes before end of log opendtulog.tar.gz
RX Period End
All missing
Nothing received, resend whole request
TX RealTimeRunData 865.00 MHz --> 15 83 74 81 74 80 16 22 96 80 0B 00 65 99 82 AF 00 00 00 00 00 00 00 00 0D E1 83
RX Period End
All missing
Nothing received, resend whole request
TX RealTimeRunData 865.00 MHz --> 15 83 74 81 74 80 16 22 96 80 0B 00 65 99 82 AF 00 00 00 00 00 00 00 00 0D E1 83
RX Period End
All missing
Nothing received, resend whole request
TX RealTimeRunData 865.00 MHz --> 15 83 74 81 74 80 16 22 96 80 0B 00 65 99 82 AF 00 00 00 00 00 00 00 00 0D E1 83
RX Period End
All missing
Nothing received, resend whole request
TX RealTimeRunData 865.00 MHz --> 15 83 74 81 74 80 16 22 96 80 0B 00 65 99 82 AF 00 00 00 00 00 00 00 00 0D E1 83
RX Period End
All missing
Nothing received, resend count exeeded
TX SystemConfigPara 865.00 MHz --> 15 83 74 81 74 80 16 22 96 80 05 00 65 99 82 AF 00 00 00 00 00 00 00 00 C3 EE 4C
RX Period End
All missing
Nothing received, resend whole request
TX SystemConfigPara 865.00 MHz --> 15 83 74 81 74 80 16 22 96 80 05 00 65 99 82 AF 00 00 00 00 00 00 00 00 C3 EE 4C
Websocket: [/livedata][42] connect
RX Period End
All missing
Nothing received, resend whole request
TX SystemConfigPara 865.00 MHz --> 15 83 74 81 74 80 16 22 96 80 05 00 65 99 82 AF 00 00 00 00 00 00 00 00 C3 EE 4C
RX Period End
All missing
Nothing received, resend whole request
TX SystemConfigPara 865.00 MHz --> 15 83 74 81 74 80 16 22 96 80 05 00 65 99 82 AF 00 00 00 00 00 00 00 00 C3 EE 4C
CORRUPT HEAP: Bad head at 0x3ffde188. Expected 0xabba1234 got 0x00000000
assert failed: multi_heap_free multi_heap_poisoning.c:259 (head != NULL)
Backtrace: 0x40083a41:0x3ffdcfc0 0x4008eca1:0x3ffdcfe0 0x40094f59:0x3ffdd000 0x40094b51:0x3ffdd130 0x40083fe1:0x3ffdd150 0x40094f89:0x3ffdd170 0x4019e2dd:0x3ffdd190 0x400f937a:0x3ffdd1b0 0x400f990b:0x3ffdd1d0 0x400fbb45:0x3ffdd1f0 0x400f6fe7:0x3ffdd210 0x400f6ff3:0x3ffdd230 0x401a16d6:0x3ffdd250 0x401a19f2:0x3ffdd280 0x401a1a08:0x3ffdd2a0 0x401a1b8e:0x3ffdd2c0
ELF file SHA256: 73d74efa8cfd9041
E (9515) esp_core_dump_flash: Core dump flash config is corrupted! CRC=0x7bd5c66f instead of 0x0
Rebooting...
Good job.
@schlimmchen
@schlimmchen as restarts did not happen the last two days I assume the problem is related to the HMS restart in the afternoon / daily yield correction which did not occur as well...
I'm running v24.1.14 with two HMS and have software resets on both cpus too....
(didn't notice until I stumbled upon this ticket)
....and I do use MQTT, but without TLS
....and I have never experienced values being reset to zero unexpectedly
With 24.1.21 I still see the issue, but with a slightly different log output:
assert failed: tcp_update_rcv_ann_wnd IDF/components/lwip/lwip/src/core/tcp.c:951 (new_rcv_ann_wnd <= 0xffff)
Backtrace: 0x40083a41:0x3ffb2cc0 0x4008eca1:0x3ffb2ce0 0x40094f59:0x3ffb2d00 0x4012f4a6:0x3ffb2e30 0x4012f554:0x3ffb2e50 0x401a188a:0x3ffb2e70 0x4012bd60:0x3ffb2e90
ELF file SHA256: e7eb5e57151e0ce0
E (11950) esp_core_dump_flash: Core dump flash config is corrupted! CRC=0x7bd5c66f instead of 0x0
Rebooting...
issue still there with current version 24.1.26:
TX SystemConfigPara 865.00 MHz --> 15 84 67 55 36 80 16 22 96 80 05 00 65 BC 0E 59 00 00 00 00 00 00 00 00 BC BC BC
CORRUPT HEAP: Bad tail at 0x3ffde5a4. Expected 0xbaad5678 got 0x00000000
assert failed: multi_heap_free multi_heap_poisoning.c:259 (head != NULL)
Backtrace: 0x40083a41:0x3ffdd360 0x4008eca1:0x3ffdd380 0x40094f59:0x3ffdd3a0 0x40094b51:0x3ffdd4d0 0x40083fe1:0x3ffdd4f0 0x40094f89:0x3ffdd510 0x4010a66d:0x3ffdd530 0x4010a67d:0x3ffdd550 0x400e0cbd:0x3ffdd570 0x400e0cdd:0x3ffdd590 0x400e0ce9:0x3ffdd5b0 0x400f9e72:0x3ffdd5d0 0x400fc05d:0x3ffdd5f0 0x400f74ff:0x3ffdd610 0x400f750b:0x3ffdd630 0x401a243e:0x3ffdd650 0x401a275a:0x3ffdd680 0x401a2770:0x3ffdd6a0 0x401a28f6:0x3ffdd6c0
ELF file SHA256: ba393eda5744a754
E (599) esp_core_dump_flash: Core dump flash config is corrupted! CRC=0x7bd5c66f instead of 0x0
Rebooting...
it continues to reboot now and then....
Nothing received, resend whole request
TX AlarmData 865.00 MHz --> 15 83 74 81 74 80 16 22 96 80 11 00 65 C2 CD 00 00 00 00 00 00 00 00 00 E1 55 7A
RX Period End
All missing
Nothing received, resend whole request
TX AlarmData 865.00 MHz --> 15 83 74 81 74 80 16 22 96 80 11 00 65 C2 CD 00 00 00 00 00 00 00 00 00 E1 55 7A
Websocket: [/livedata][29] connect
CORRUPT HEAP: Bad tail at 0x3ffde4c0. Expected 0xbaad5678 got 0x00000000
assert failed: multi_heap_free multi_heap_poisoning.c:259 (head != NULL)
Backtrace: 0x40083a41:0x3ffd0290 0x4008eca1:0x3ffd02b0 0x40094f59:0x3ffd02d0 0x40094b51:0x3ffd0400 0x40083fe1:0x3ffd0420 0x40094f89:0x3ffd0440 0x401c71c6:0x3ffd0460 0x40191987:0x3ffd0480 0x401c5d6a:0x3ffd04a0
ELF file SHA256: ba393eda5744a754
E (12014) esp_core_dump_flash: Core dump flash config is corrupted! CRC=0x7bd5c66f instead of 0x0
Rebooting...
Given that your issue appears in multiple different ways and that not a lot of other people report the same problems, I want to suggest to check your hardware. Is the power supply good? Can you switch the ESP32 with a different one to confirm it's not an issue with the ESP32?
@schlimmchen I understand that the issue is difficult to track down.
Currently the ESP32 is powered by the usb of the computer for logging, but I had the issue before with two other power supplies with very short cables. IMHO With an insufficient power supply the ESP32 would not give a debug output on memory issues, it would just reboot.
And I am pretty sure that the issue came with the versions that introduced the daily yield fix...
Unfortunately I can not test it with a different hardware for now.
IMHO With an insufficient power supply the ESP32 would not give a debug output on memory issues, it would just reboot.
This is not true.. If the Device gets Powered down because of a Bad Power Connection it also writes "Software Reset CPU"
@Sn0w3y I'll try other power sources the next days (currently running on USB of my computer, which should be stable an powefull enough actually).
But would a power down really give you a message on the serial port CORRUPT HEAP: Bad tail at 0x3ffde4c0. Expected 0xbaad5678 got 0x00000000
as I see it before rebooting?
@Sn0w3y @schlimmchen I tried now 3 other power supplies, one of them from Apple π and it keeps happening. However, I think there is a noticeable difference between versions regarding the frequency of crashes: 24.1.26 reboots more often than later versions
Unfortunately similar for me. Everything went well for 9 days, although individual packages were missing here and there. After that it got worse day by day and yesterday I had to restart. Since then everything has been back to normal. What I notice is the Time correction error messages that always accompany the problem. As soon as I find more than two of these in the daily log, I know it's time to restart the DTU.
Since last update to 24.3.15 it seems to have gotton worse again. Just after one day I had to restart the DTU again. However, the problem always occurs along with the 'Time calibration'-messages. As a workaround, it might be possible to restart the DTU automatically as soon as this message appears.
Since last update to 24.3.15 it seems to have gotton worse again. Just after one day I had to restart the DTU again. However, the problem always occurs along with the 'Time calibration'-messages. As a workaround, it might be possible to restart the DTU automatically as soon as this message appears.
Ahhhh, now its getting closer. Do you use the API of the DTU to regulate the Inverters?
No, I regulate it via MQTT. The only time I use the API is to invoke a reboot instead of going to the DTU and doing a power cycle. :wink: Just FYI: I also often use my desktop and mobile browsers to check the web interface. Maybe it could be a WebSocket disconnection issue?
Nope, Time calibration as often as in your case for me only appeared ONLY if i wrote 0% Power for example at night or during times where the Inverters have not been reachable. Maybe try to stop regulating them, set them to 100% and see if the Error still persists ;)
@Sn0w3y I use IOBroker with the OpenDTU-Adapter. AFAIK it uses only the API, not MQTT. I also don't regulate the Inverters I have at all. I just watch...
To avoid problems, I only send commands to the inverter when it tells me it's producing. Which, looking at my HA log files, works well. Edit: I could change the permanent limitation for a while and stop sending temporary limitations for testing purposes, but regarding of @chris299's post I assume that this wouldn't change the behavior.
Edit: I could change the permanent limitation for a while and stop sending temporary limitations for testing purposes, but regarding of @chris299's post I assume that this wouldn't change the behavior.
Try it and report :) Do NOT regulate anything, restart the openDTU and report :) In my case it fixed it ;D
@Sn0w3y If it had worked, it would be a workaround in the best case but no fix. π But for me it didn't help. Changed the permanent limit, deactivated all automations and this morning:
@Sn0w3y If it had worked, it would be a workaround in the best case but no fix. π But for me it didn't help. Changed the permanent limit, deactivated all automations and this morning:
In Germany we say "a try was it worth" π
Or as an Austrian put it: "So that nobody can say afterwards... See, I told you it would work!" π
@mr-p666 fein aus gedrΓΌckt und besser als "oida geh ..." π ehrenmann und danke π
What happened?
since I tested and updated to solve #1397 with 23.12.18 I see now "Software reset CPU" every day at random times, which erases all values to 0.
I already changed the power source but the problem persists, anyway, a power-related reset shows a "Vbat power on reset" instead, so I assume a software issue within the firmware.
To Reproduce Bug
happens every day....
Expected Behavior
no reset without reason.... (as it was before)
Install Method
Pre-Compiled binary from GitHub
What git-hash/version of OpenDTU?
23.12.28
Relevant log/trace output
No response
Anything else?
No response