tbnobody / OpenDTU

Software for ESP32 to talk to Hoymiles/TSUN/Solenso Inverters
GNU General Public License v2.0
1.77k stars 495 forks source link

Opendtu since several versions sometimes needs restart to recive data from hm-800 and hm1200 #1587

Open pos-ei-don opened 8 months ago

pos-ei-don commented 8 months ago

What happened?

Some days, after the Inverters start, sometimes opendtu ist still Red and recived no connection from the inverter. A simple restart help and it starts again reciving data (and so sending it via mqtt).

I thought in Might be in connection with a spéciale Version, but i See this since a whole year.

To Reproduce Bug

Just wait some days. Sometimes it works two ore more days. Maybe it is my Hardware combination, but it is no special Hardware and I have two different Devices showing the same effect

Expected Behavior

Trying to establish the connection on its own.

Install Method

Pre-Compiled binary from GitHub

What git-hash/version of OpenDTU?

v23.12.19

Relevant log/trace output

Here you go

Anything else?

No response

DehGit commented 8 months ago

Similar issue here:

Updated from v23.9.18 to v23.12.24 yesterday. This morning, opendtu stayed "offline". Only after a power cycle it connected to the HM-1500. After 60min it went offline again. Again, a power cycle made it connected immediately.

The whole setup was running without a single issue for 5 months, also with versions prior to v23.9.18.

Hardware: ESP32-D0WD-V3 with nRF24L01+ and HM-1500. Always using precompiled bin from GitHub (opendtu-generic.bin) for OTA update.

stefan123t commented 8 months ago

@pos-ei-don @DehGit you both did not attach a log file with the console output during the time of no data / no connection between DTU and inverter.

This should give us some hint on whether it is a communication problem or something with the lately introduced TaskScheduler which may have become out of order. Though my hunch is a simple comm's problem as you mentioned versions far prior to the TaskScheduler introduced in December 2023.

pos-ei-don commented 8 months ago

@stefan123t I tried to upload the log last time, but some how it didnt work. I now have to wait for the next occurance to 'harvest' the logs.

DehGit commented 8 months ago

@stefan123t yes, wanted to get it back running asap, forgot about the log. Currently, it's running fine. In case it happens again, I will try to get the log. In my case the setup wasn't touched (literally) for months. Hence, an RF or contact issue only after a software update seems strange. But I can't exclude it entirely. Only since yesterday I use a version after v23.9.18. So theoretically, it could be the scheduler...

DehGit commented 8 months ago

Up to now it's running smoothly without any issue. Maybe it was only a short hickup after the big upgrade...?

pos-ei-don commented 8 months ago

I had it twice since reporting, I also thought that I had Updates the issue, but now I do. Not See any Logs here. So I have to wait for the next time.

pos-ei-don commented 8 months ago

RX Period End 16:57:16.510 > All missing 16:57:16.510 > Nothing received, resend whole request 16:57:16.510 > TX RealTimeRunData Channel: 40 --> 15 82 80 62 56 80 11 86 48 80 0B 00 65 9C 1B 59 00 00 00 00 00 00 00 00 37 09 72 16:57:17.124 > RX Period End 16:57:17.124 > All missing 16:57:17.124 > Nothing received, resend whole request 16:57:17.124 > TX RealTimeRunData Channel: 61 --> 15 82 80 62 56 80 11 86 48 80 0B 00 65 9C 1B 59 00 00 00 00 00 00 00 00 37 09 72 16:57:17.733 > RX Period End 16:57:17.733 > All missing 16:57:17.733 > Nothing received, resend whole request 16:57:17.733 > TX RealTimeRunData Channel: 75 --> 15 82 80 62 56 80 11 86 48 80 0B 00 65 9C 1B 59 00 00 00 00 00 00 00 00 37 09 72 16:57:18.240 > RX Period End 16:57:18.240 > All missing 16:57:18.240 > Nothing received, resend count exeeded 16:57:18.339 > TX SystemConfigPara Channel: 3 --> 15 82 80 62 56 80 11 86 48 80 05 00 65 9C 1B 59 00 00 00 00 00 00 00 00 F9 06 BD 16:57:18.718 > RX Period End 16:57:18.718 > All missing 16:57:18.718 > Nothing received, resend whole request 16:57:18.718 > TX SystemConfigPara Channel: 23 --> 15 82 80 62 56 80 11 86 48 80 05 00 65 9C 1B 59 00 00 00 00 00 00 00 00 F9 06 BD 16:57:19.027 > RX Period End 16:57:19.027 > All missing 16:57:19.027 > Nothing received, resend whole request 16:57:19.027 > TX SystemConfigPara Channel: 40 --> 15 82 80 62 56 80 11 86 48 80 05 00 65 9C 1B 59 00 00 00 00 00 00 00 00 F9 06 BD 16:57:19.274 > RX Period End 16:57:19.275 > All missing 16:57:19.275 > Nothing received, resend whole request 16:57:19.275 > TX SystemConfigPara Channel: 61 --> 15 82 80 62 56 80 11 86 48 80 05 00 65 9C 1B 59 00 00 00 00 00 00 00 00 F9 06 BD 16:57:19.644 > RX Period End 16:57:19.644 > All missing 16:57:19.644 > Nothing received, resend whole request 16:57:19.644 > TX SystemConfigPara Channel: 75 --> 15 82 80 62 56 80 11 86 48 80 05 00 65 9C 1B 59 00 00 00 00 00 00 00 00 F9 06 BD 16:57:19.783 > RX Period End 16:57:19.783 > All missing 16:57:19.783 > Nothing received, resend count exeeded 16:57:20.200 > Fetch inverter: 116182806256 16:57:20.200 > Request SystemConfigPara 16:57:20.293 > TX RealTimeRunData Channel: 3 --> 15 82 80 62 56 80 11 86 48 80 0B 00 65 9C 1B 5E 00 00 00 00 00 00 00 00 07 2F 63

pos-ei-don commented 8 months ago

I hope this time it works, it is late and the sun is low, but this didnt work the whole Day. When I reboot opendtu, data is recived immediately.

tbnobody commented 8 months ago

Have you installed a capacitor on the nRF module? If the nRF module resets due to some reason it gets only initialized again after a reboot.

pos-ei-don commented 8 months ago

To be honest, I dont know. I use a prebuild Module which I bought on ebay. Can I see such a reset in the Logs, can I initiate a reboot via some automation? It would be Nixe if opendtu automatically restart if it detects such a reset.

tbnobody commented 8 months ago

It would be Nixe if opendtu automatically restart if it detects such a reset.

Cannot detect it as it makes no difference between night hours when is inverter is not reachable or when the wiring is wrong or when there is a defect.

pos-ei-don commented 8 months ago

I dont mind a false positive reboot. Also an Option like "reboot every Day at 10:00" or even "reboot every 1h at daytime" would be a good solution. (or reboot if no data is recived for 30m).

stefan123t commented 8 months ago

As tbnobody mentioned there is no difference between normal night time (naturally no response from sleeping inverters) and your erroneous situation.

@pos-ei-don you can check the USB Serial Console for the logs which should also be sent when a reset occurs. You need to connect a Serial USB cable to your ESP board and keep it connected to your PC in order to receive and store the logfile. See here for the documentation on how to do that: https://tbnobody.github.io/OpenDTU-docs/firmware/howto/serial_console/

pos-ei-don commented 8 months ago

My opendtu is on a Pole mast, so usb is not easy, especially in Winter right now.. But it you Provider a "restart opendtu" in HA as you currently provicde a "restart inverter" Button, I could write an automation to restart it on my own.

001oc commented 8 months ago

maybe similar intermediate fix as in #1468 will help you

pos-ei-don commented 8 months ago

I read the rhewad, but what fix Do you See there?

001oc commented 8 months ago

maybe opendtu gets to much mqtt messages at the wrong time. That was my problem. After making sure that no mqtt messages are going to opendtu nightly it restart at sunrise without any problem. I think opendtu has to solve such problems by itself but ... as long as it doesnt take care of itself ...

mr-p666 commented 7 months ago

Since the update to v24.2.16, which updates the MQTT client and the Async browser, the connection seems to be significantly more stable than before. Since then I have not had any problems with MQTT or the browser (very rarely single MQTT packages are missing). For me there are no more power cycles. I will continue to monitor and report if the behavior changes again.

dhzl84 commented 5 months ago

Also having this issue from time to time... Would try the capacitor, is there any guide for that?

Websocket: [/livedata][6] disconnect 09:00:23.409 > Fetch inverter: 116191025072 09:00:23.410 > Request SystemConfigPara 09:00:23.420 > TX RealTimeRunData Channel: 23 --> 15 91 02 50 72 80 10 92 56 80 0B 00 66 0B AD 05 00 00 00 00 00 00 00 00 4E A4 54 09:00:23.956 > RX Period End 09:00:23.957 > All missing 09:00:23.957 > Nothing received, resend whole request 09:00:23.957 > TX RealTimeRunData Channel: 40 --> 15 91 02 50 72 80 10 92 56 80 0B 00 66 0B AD 05 00 00 00 00 00 00 00 00 4E A4 54 09:00:24.494 > RX Period End 09:00:24.494 > All missing 09:00:24.494 > Nothing received, resend whole request 09:00:24.494 > TX RealTimeRunData Channel: 61 --> 15 91 02 50 72 80 10 92 56 80 0B 00 66 0B AD 05 00 00 00 00 00 00 00 00 4E A4 54 09:00:25.032 > RX Period End 09:00:25.032 > All missing 09:00:25.032 > Nothing received, resend whole request 09:00:25.032 > TX RealTimeRunData Channel: 75 --> 15 91 02 50 72 80 10 92 56 80 0B 00 66 0B AD 05 00 00 00 00 00 00 00 00 4E A4 54 09:00:25.563 > RX Period End 09:00:25.563 > All missing 09:00:25.563 > Nothing received, resend whole request 09:00:25.563 > TX RealTimeRunData Channel: 3 --> 15 91 02 50 72 80 10 92 56 80 0B 00 66 0B AD 05 00 00 00 00 00 00 00 00 4E A4 54 09:00:26.072 > RX Period End 09:00:26.073 > All missing 09:00:26.073 > Nothing received, resend count exeeded 09:00:26.111 > TX SystemConfigPara Channel: 23 --> 15 91 02 50 72 80 10 92 56 80 05 00 66 0B AD 05 00 00 00 00 00 00 00 00 80 AB 9B 09:00:27.473 > RX Period End 09:00:27.473 > All missing 09:00:27.473 > Nothing received, resend whole request 09:00:27.473 > TX SystemConfigPara Channel: 40 --> 15 91 02 50 72 80 10 92 56 80 05 00 66 0B AD 05 00 00 00 00 00 00 00 00 80 AB 9B 09:00:27.499 > RX Period End 09:00:27.499 > All missing 09:00:27.499 > Nothing received, resend whole request 09:00:27.499 > TX SystemConfigPara Channel: 61 --> 15 91 02 50 72 80 10 92 56 80 05 00 66 0B AD 05 00 00 00 00 00 00 00 00 80 AB 9B 09:00:27.524 > RX Period End 09:00:27.524 > All missing 09:00:27.524 > Nothing received, resend whole request 09:00:27.524 > TX SystemConfigPara Channel: 75 --> 15 91 02 50 72 80 10 92 56 80 05 00 66 0B AD 05 00 00 00 00 00 00 00 00 80 AB 9B 09:00:27.543 > RX Period End 09:00:27.543 > All missing 09:00:27.543 > Nothing received, resend whole request 09:00:27.543 > TX SystemConfigPara Channel: 3 --> 15 91 02 50 72 80 10 92 56 80 05 00 66 0B AD 05 00 00 00 00 00 00 00 00 80 AB 9B 09:00:27.570 > RX Period End 09:00:27.570 > All missing 09:00:27.570 > Nothing received, resend count exeeded 09:00:30.395 > Fetch inverter: 114184212942 09:00:30.395 > Request SystemConfigPara 09:00:30.467 > TX RealTimeRunData Channel: 23 --> 15 84 21 29 42 80 10 92 56 80 0B 00 66 0B AD 0C 00 00 00 00 00 00 00 00 1E CE 18 09:00:30.971 > RX Period End 09:00:30.972 > All missing 09:00:30.972 > Nothing received, resend whole request 09:00:30.972 > TX RealTimeRunData Channel: 40 --> 15 84 21 29 42 80 10 92 56 80 0B 00 66 0B AD 0C 00 00 00 00 00 00 00 00 1E CE 18 09:00:31.504 > RX Period End 09:00:31.504 > All missing 09:00:31.504 > Nothing received, resend whole request 09:00:31.504 > TX RealTimeRunData Channel: 61 --> 15 84 21 29 42 80 10 92 56 80 0B 00 66 0B AD 0C 00 00 00 00 00 00 00 00 1E CE 18 09:00:32.032 > RX Period End 09:00:32.033 > All missing 09:00:32.033 > Nothing received, resend whole request 09:00:32.033 > TX RealTimeRunData Channel: 75 --> 15 84 21 29 42 80 10 92 56 80 0B 00 66 0B AD 0C 00 00 00 00 00 00 00 00 1E CE 18 09:00:32.566 > RX Period End 09:00:32.566 > All missing 09:00:32.566 > Nothing received, resend whole request 09:00:32.566 > TX RealTimeRunData Channel: 3 --> 15 84 21 29 42 80 10 92 56 80 0B 00 66 0B AD 0C 00 00 00 00 00 00 00 00 1E CE 18 09:00:33.077 > RX Period End 09:00:33.077 > All missing 09:00:33.077 > Nothing received, resend count exeeded 09:00:33.110 > TX SystemConfigPara Channel: 23 --> 15 84 21 29 42 80 10 92 56 80 05 00 66 0B AD 0C 00 00 00 00 00 00 00 00 D0 C1 D7 09:00:33.335 > RX Period End 09:00:33.335 > All missing 09:00:33.335 > Nothing received, resend whole request 09:00:33.335 > TX SystemConfigPara Channel: 40 --> 15 84 21 29 42 80 10 92 56 80 05 00 66 0B AD 0C 00 00 00 00 00 00 00 00 D0 C1 D7 09:00:33.565 > RX Period End 09:00:33.565 > All missing 09:00:33.565 > Nothing received, resend whole request 09:00:33.565 > TX SystemConfigPara Channel: 61 --> 15 84 21 29 42 80 10 92 56 80 05 00 66 0B AD 0C 00 00 00 00 00 00 00 00 D0 C1 D7 09:00:33.823 > RX Period End 09:00:33.823 > All missing 09:00:33.823 > Nothing received, resend whole request 09:00:33.823 > TX SystemConfigPara Channel: 75 --> 15 84 21 29 42 80 10 92 56 80 05 00 66 0B AD 0C 00 00 00 00 00 00 00 00 D0 C1 D7 09:00:34.055 > RX Period End 09:00:34.055 > All missing 09:00:34.055 > Nothing received, resend whole request 09:00:34.055 > TX SystemConfigPara Channel: 3 --> 15 84 21 29 42 80 10 92 56 80 05 00 66 0B AD 0C 00 00 00 00 00 00 00 00 D0 C1 D7 09:00:34.269 > RX Period End 09:00:34.269 > All missing 09:00:34.269 > Nothing received, resend count exeeded 09:00:37.405 > Fetch inverter: 116191025072 09:00:37.405 > Request SystemConfigPara 09:00:37.429 > TX RealTimeRunData Channel: 23 --> 15 91 02 50 72 80 10 92 56 80 0B 00 66 0B AD 13 00 00 00 00 00 00 00 00 2E 42 C4 09:00:37.974 > RX Period End 09:00:37.974 > All missing 09:00:37.974 > Nothing received, resend whole request 09:00:37.974 > TX RealTimeRunData Channel: 40 --> 15 91 02 50 72 80 10 92 56 80 0B 00 66 0B AD 13 00 00 00 00 00 00 00 00 2E 42 C4 09:00:38.493 > RX Period End 09:00:38.493 > All missing 09:00:38.493 > Nothing received, resend whole request 09:00:38.493 > TX RealTimeRunData Channel: 61 --> 15 91 02 50 72 80 10 92 56 80 0B 00 66 0B AD 13 00 00 00 00 00 00 00 00 2E 42 C4 09:00:39.037 > RX Period End 09:00:39.037 > All missing 09:00:39.037 > Nothing received, resend whole request 09:00:39.037 > TX RealTimeRunData Channel: 75 --> 15 91 02 50 72 80 10 92 56 80 0B 00 66 0B AD 13 00 00 00 00 00 00 00 00 2E 42 C4 09:00:39.582 > RX Period End 09:00:39.583 > All missing 09:00:39.583 > Nothing received, resend whole request 09:00:39.583 > TX RealTimeRunData Channel: 3 --> 15 91 02 50 72 80 10 92 56 80 0B 00 66 0B AD 13 00 00 00 00 00 00 00 00 2E 42 C4 09:00:40.124 > RX Period End 09:00:40.125 > All missing 09:00:40.125 > Nothing received, resend count exeeded 09:00:40.184 > TX SystemConfigPara Channel: 23 --> 15 91 02 50 72 80 10 92 56 80 05 00 66 0B AD 13 00 00 00 00 00 00 00 00 E0 4D 0B 09:00:40.417 > RX Period End 09:00:40.418 > All missing 09:00:40.418 > Nothing received, resend whole request 09:00:40.418 > TX SystemConfigPara Channel: 40 --> 15 91 02 50 72 80 10 92 56 80 05 00 66 0B AD 13 00 00 00 00 00 00 00 00 E0 4D 0B 09:00:40.645 > RX Period End 09:00:40.645 > All missing 09:00:40.645 > Nothing received, resend whole request 09:00:40.645 > TX SystemConfigPara Channel: 61 --> 15 91 02 50 72 80 10 92 56 80 05 00 66 0B AD 13 00 00 00 00 00 00 00 00 E0 4D 0B 09:00:40.900 > RX Period End 09:00:40.900 > All missing 09:00:40.900 > Nothing received, resend whole request 09:00:40.900 > TX SystemConfigPara Channel: 75 --> 15 91 02 50 72 80 10 92 56 80 05 00 66 0B AD 13 00 00 00 00 00 00 00 00 E0 4D 0B 09:00:41.122 > RX Period End 09:00:41.122 > All missing 09:00:41.122 > Nothing received, resend whole request 09:00:41.122 > TX SystemConfigPara Channel: 3 --> 15 91 02 50 72 80 10 92 56 80 05 00 66 0B AD 13 00 00 00 00 00 00 00 00 E0 4D 0B 09:00:41.329 > RX Period End 09:00:41.329 > All missing 09:00:41.329 > Nothing received, resend count exeeded 09:00:44.437 > Fetch inverter: 114184212942 09:00:44.437 > Request SystemConfigPara 09:00:44.511 > TX RealTimeRunData Channel: 23 --> 15 84 21 29 42 80 10 92 56 80 0B 00 66 0B AD 1A 00 00 00 00 00 00 00 00 7E 28 88 09:00:44.954 > RX Period End 09:00:44.954 > All missing 09:00:44.954 > Nothing received, resend whole request 09:00:44.954 > TX RealTimeRunData Channel: 40 --> 15 84 21 29 42 80 10 92 56 80 0B 00 66 0B AD 1A 00 00 00 00 00 00 00 00 7E 28 88 09:00:45.510 > RX Period End 09:00:45.510 > All missing 09:00:45.510 > Nothing received, resend whole request 09:00:45.510 > TX RealTimeRunData Channel: 61 --> 15 84 21 29 42 80 10 92 56 80 0B 00 66 0B AD 1A 00 00 00 00 00 00 00 00 7E 28 88 09:00:46.046 > RX Period End 09:00:46.046 > All missing 09:00:46.046 > Nothing received, resend whole request 09:00:46.046 > TX RealTimeRunData Channel: 75 --> 15 84 21 29 42 80 10 92 56 80 0B 00 66 0B AD 1A 00 00 00 00 00 00 00 00 7E 28 88 09:00:46.610 > RX Period End 09:00:46.610 > All missing 09:00:46.610 > Nothing received, resend whole request 09:00:46.610 > TX RealTimeRunData Channel: 3 --> 15 84 21 29 42 80 10 92 56 80 0B 00 66 0B AD 1A 00 00 00 00 00 00 00 00 7E 28 88 09:00:47.162 > RX Period End 09:00:47.162 > All missing 09:00:47.162 > Nothing received, resend count exeeded 09:00:47.209 > TX SystemConfigPara Channel: 23 --> 15 84 21 29 42 80 10 92 56 80 05 00 66 0B AD 1A 00 00 00 00 00 00 00 00 B0 27 47 09:00:47.397 > RX Period End 09:00:47.397 > All missing 09:00:47.397 > Nothing received, resend whole request 09:00:47.397 > TX SystemConfigPara Channel: 40 --> 15 84 21 29 42 80 10 92 56 80 05 00 66 0B AD 1A 00 00 00 00 00 00 00 00 B0 27 47 09:00:47.595 > RX Period End 09:00:47.595 > All missing 09:00:47.595 > Nothing received, resend whole request 09:00:47.595 > TX SystemConfigPara Channel: 61 --> 15 84 21 29 42 80 10 92 56 80 05 00 66 0B AD 1A 00 00 00 00 00 00 00 00 B0 27 47 09:00:47.881 > RX Period End 09:00:47.881 > All missing 09:00:47.881 > Nothing received, resend whole request 09:00:47.881 > TX SystemConfigPara Channel: 75 --> 15 84 21 29 42 80 10 92 56 80 05 00 66 0B AD 1A 00 00 00 00 00 00 00 00 B0 27 47 09:00:48.060 > RX Period End 09:00:48.060 > All missing 09:00:48.060 > Nothing received, resend whole request 09:00:48.060 > TX SystemConfigPara Channel: 3 --> 15 84 21 29 42 80 10 92 56 80 05 00 66 0B AD 1A 00 00 00 00 00 00 00 00 B0 27 47 09:00:48.283 > RX Period End 09:00:48.283 > All missing 09:00:48.283 > Nothing received, resend count exeeded

001oc commented 5 months ago

maybe #1468 can help you to get further

stefan123t commented 1 month ago

@pos-ei-don @dhzl84 In the logs we can see the TX of the two commands SystemConfigPara and RealTimeRunData which will ask the inverter to send us the inverter info (especially to verify the current ActivePowerLimit) and the expected current data. In both your logfile excerpts we can only see that the inverter is not responding / RX at all.

This usually happens when either the inverter does not receive the commands from the DTU due to several physical or RF difficulties or due to too frequent ActivePowerLimit commands sent to the inverter. The inverter requires a certain time (usually ~15 seconds) to reach the new ActivePowerLimit sent by the OpenDTU and it will resist to provide any other information / answer for up to 15 minutes in case the same or a new ActivePowerLimit or some other request is sent earlier.

Can you please log a longer log file including the time when it still worked as expected and answered until it stops to respond, so we can better understand the sequence of commands / requests that render the inverter unresponsive ?


Please note that you may attach files to the Issue and/or use the [/] slash commands Details and Code block in order to make a collapsible code block which allows the output from the console to be expanded / collapsed at will. Thanks!

<details><summary>Details</summary>
<p>
` ``
all your log lines are belong to us
` `` 
</p>
</details> 

I had to escape the triple backticks with a space please use ``` instead.

pos-ei-don commented 1 month ago

stefan123t Here is a Log extract. I just changed the InverterNo. It didn't fetch values for several days! Right after a reboot, all data got fetched and updated as expected. opendtu.txt

stefan123t commented 1 month ago

@pos-ei-don you did not record the failure in action, ie first running without problems then the issue occurs. It is just the other way around several minutes of the issue and then after reboot working condition again.

The log starts at 19:36 and runs through 19:52. I can only see that it does not communicate with the inverter despite several retries and exceeding the retry count and then trying again with the next/same inverter.

pos-ei-don commented 1 month ago

it will be difficult to catch this moment, sometimes it works a couple of days without any error. Currently it has 5 days uptime without this error.

There is no syslog-feature o opendtu? MQTT would also be possible to record if that is possible.

stefan123t commented 1 month ago

@pos-ei-don you may attach putty or another terminal to the USB port of your DTU and start logging.

pos-ei-don commented 1 month ago

This is far away from my house, no Power for a Laptop there. The usb cable need to be 40m long. I wont be able to do this for several days. Sorry.

dhzl84 commented 1 month ago

I do not experience the issue since switching from a two inverter setup to a one inverter setup. I'll post an update when it happens again.