tbnobody / OpenDTU

Software for ESP32 to talk to Hoymiles/TSUN/Solenso Inverters
GNU General Public License v2.0

no data from hm-800 #1080

Closed kthemall closed 9 months ago

kthemall commented 1 year ago

What happened?

The latest firmware, 23.6.21, loses the connection after a short time (a few minutes).

To Reproduce Bug

turn it on and wait a few moments

Expected Behavior

should send data every 10 seconds as configured in the dtu settings.

Install Method

Pre-Compiled binary from GitHub

What git-hash/version of OpenDTU?

23.6.21

Relevant log/trace output

Websocket: [/livedata][12] disconnect
17:21:08.642 > Fetch inverter: xxxxxxxxxxxxxx
17:21:08.786 > TX RealTimeRunData Channel: 40 --> 15 84 60 73 35 80 11 20 52 80 0B 00 64 98 5B 65 00 00 00 00 00 00 00 00 11 E2 EE 
17:21:08.882 > Interrupt received
17:21:08.939 > RX Channel: 75 --> 95 84 60 73 35 84 60 73 35 01 00 01 00 08 00 01 00 00 01 13 00 04 00 0C 00 00 86 | -80 dBm
17:21:08.994 > Interrupt received
17:21:09.067 > RX Channel: 40 --> 95 84 60 73 35 84 60 73 35 83 00 00 00 00 00 00 01 47 00 6A 4F 21 54 | -80 dBm
17:21:09.175 > RX Period End
17:21:09.175 > Middle missing
17:21:09.175 > Request retransmit: 2
17:21:09.175 > TX RequestFrame Channel: 61 --> 15 84 60 73 35 80 11 20 52 82 D6 
17:21:09.287 > Interrupt received
17:21:09.398 > RX Channel: 23 --> 95 84 60 73 35 84 60 73 35 02 71 7D 00 00 A2 6C 00 00 00 10 09 2C 13 87 00 00 F4 | -80 dBm
17:21:09.497 > RX Period End
17:21:09.497 > Success
17:21:09.572 > TX AlarmData Channel: 75 --> 15 84 60 73 35 80 11 20 52 80 11 00 64 98 5B 65 00 00 00 00 00 00 00 00 CB F9 35 
17:21:09.636 > Interrupt received
17:21:09.695 > RX Channel: 61 --> 95 84 60 73 35 84 60 73 35 02 00 03 2D 28 00 00 00 00 00 00 B0 2F 00 5A 2E 01 7B | -80 dBm
17:21:09.807 > Interrupt received
17:21:09.897 > RX Channel: 3 --> 95 84 60 73 35 84 60 73 35 04 02 26 B0 2F 00 5C 2E 03 2E 03 03 9A 02 29 B0 2F 5B | -80 dBm
17:21:09.963 > Interrupt received
17:21:10.031 > RX Channel: 75 --> 95 84 60 73 35 84 60 73 35 06 2E 03 0A 9F 02 24 B0 2F 00 5F 2E 04 2E 04 03 A8 66 | -80 dBm
17:21:10.095 > Interrupt received
17:21:10.220 > RX Channel: 40 --> 95 84 60 73 35 84 60 73 35 0B 00 65 2E 09 2E 09 03 85 02 27 B0 2F 00 66 2E 09 86 | -80 dBm
17:21:10.423 > RX Period End
17:21:10.423 > Last missing
17:21:10.423 > Request retransmit: 12
17:21:10.423 > TX RequestFrame Channel: 3 --> 15 84 60 73 35 80 11 20 52 8C D8 
17:21:10.518 > Interrupt received
17:21:10.568 > RX Channel: 23 --> 95 84 60 73 35 84 60 73 35 8C 2E 09 03 88 02 25 55 43 84 | -80 dBm
17:21:10.715 > RX Period End
17:21:10.715 > Middle missing
17:21:10.715 > Request retransmit: 1
17:21:10.715 > TX RequestFrame Channel: 23 --> 15 84 60 73 35 80 11 20 52 81 D5 
17:21:10.767 > Interrupt received
17:21:10.829 > RX Channel: 75 --> 95 84 60 73 35 84 60 73 35 01 00 01 B0 01 00 01 2D 20 2D 20 00 00 00 00 20 D1 D4 | -80 dBm
17:21:10.929 > RX Period End
17:21:10.929 > Middle missing
17:21:10.929 > Request retransmit: 3
17:21:10.929 > TX RequestFrame Channel: 40 --> 15 84 60 73 35 80 11 20 52 83 D7 
17:21:11.014 > Interrupt received
17:21:11.077 > RX Channel: 75 --> 95 84 60 73 35 84 60 73 35 03 2E 01 03 9E 02 26 B0 2F 00 5B 2E 03 2E 03 03 91 56 | -80 dBm
17:21:11.329 > RX Period End
17:21:11.329 > Middle missing
17:21:11.329 > Request retransmit: 5
17:21:11.329 > TX RequestFrame Channel: 61 --> 15 84 60 73 35 80 11 20 52 85 D1 
17:21:11.454 > RX Period End
17:21:11.454 > Middle missing
17:21:11.454 > Request retransmit: 5
17:21:11.454 > TX RequestFrame Channel: 75 --> 15 84 60 73 35 80 11 20 52 85 D1 
17:21:11.545 > RX Period End
17:21:11.545 > Middle missing
17:21:11.545 > Retransmit timeout
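
For reading a log like this one, a small script can help. The following is a minimal Python sketch, not OpenDTU code; the names `parse_rx` and `missing_fragments` are made up for illustration. It assumes, purely from what this log shows, that the tenth byte of each RX frame is the fragment counter (with bit 0x80 marking the final fragment) and that the last byte is the XOR of all preceding bytes.

```python
import re

# Matches lines such as:
# "17:21:09.067 > RX Channel: 40 --> 95 84 ... 54 | -80 dBm"
RX_LINE = re.compile(r"RX Channel: (\d+) --> ((?:[0-9A-F]{2} ?)+)\|")


def parse_rx(line):
    """Return (fragment_number, is_last, checksum_ok) for an RX log line, else None."""
    m = RX_LINE.search(line)
    if m is None:
        return None
    frame = bytes.fromhex(m.group(2))
    if len(frame) < 11:
        return None
    # Assumption read off the log: byte index 9 is the fragment counter,
    # and bit 0x80 is set on the final fragment.
    counter = frame[9]
    # Observation from the log: the last byte equals the XOR of all earlier bytes.
    xor = 0
    for b in frame[:-1]:
        xor ^= b
    return counter & 0x7F, bool(counter & 0x80), xor == frame[-1]


def missing_fragments(received):
    """Given (number, is_last) pairs, list the fragment numbers still missing."""
    numbers = {n for n, _ in received}
    last = [n for n, is_last in received if is_last]
    if not last:
        return None  # total count is unknown until the final fragment arrives
    return [n for n in range(1, last[0] + 1) if n not in numbers]
```

Applied to the first two RX lines above, this yields fragment 1 and fragment 3 (final), so fragment 2 is missing, which matches the "Request retransmit: 2" line that follows them.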

Anything else?

Rebooting the HM-800 makes it work again for a few minutes.

Technikfan commented 1 year ago

Hello, I have been observing the same problem for a few weeks. Previously, my HM-1500 and ESP32-PICO-D4 with nRF24 and SH1106 OLED display ran fine. I periodically check for new firmware and install it on my OpenDTU via OTA, so I cannot say since which version the problem has been occurring.

But the behavior is as described above. I get my data values from the OpenDTU via MQTT in my ioBroker, everything seems fine, and I let the system run unattended. Then I happen to look at the OpenDTU display and it says "Offline", and a few seconds later it's fine again. This alternates several times during runtime. Sometimes in the morning the display also shows "Offline" and it does not come back online by itself; then only resetting the ESP32 with the reset button helps.

There is no recognizable pattern.

My console log is attached. Currently loaded firmware is v23.6.21.

console-log_Offline.txt

Technikfan commented 1 year ago

Hello,

a new status report. I've done three things in the last two days.

  1. Due to the known boot problem on (some) ESP32 boards, I put a 10uF capacitor between GND and EN (reset). This solved my problem of the firmware not starting after power-cycling the ESP32 board.
  2. I was trying to find a new place for the openDTU outside on my balcony to get better dBm performance on the channels. Better than -80dBm.
  3. For this I needed a power source other than the mains plug, so I connected a (solar-charged) power pack. But outside, only one channel achieves a better level (-30 dBm), while the other channels stay at -80 dBm. I put the openDTU with the power pack back in its old place in the apartment.

But since these steps I have never seen "Offline" on the display again, neither outside nor inside my apartment. Maybe a power problem?

For a better diagnosis, it would be helpful to see the display's status messages in the console log. And maybe it's possible to show a "clone" of the display view in the info menu of the web interface?

I will keep watching.

Best regards Uwe

jstammi commented 1 year ago

@Technikfan

"I was trying to find a new place for the openDTU outside on my balcony to get better dBm performance on the channels. Better than -80dBm."

As far as I can see in the source code, the RSSI is not really measured for the HM-600/700/800; it can only be -30 or -80 dBm (for all inverters used with the NRF radio). The meaning is: -30 dBm = "strong signal, > -64 dBm", -80 dBm = "weak signal, < -64 dBm" (HoymilesRadio_NRF, line 58).

Since you do receive packets, I guess you are fine in this respect. And if packets are missing ... IMHO this is quite normal and no reason for it to stop working completely.

But concerning the power supply, there are two notes in the troubleshooting section that could be relevant (and I followed their advice right from the beginning).
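
To make the two-value RSSI concrete: the nRF24L01+ has no real RSSI register, only a one-bit "received power detector" (RPD) flag that is set when the received level is above about -64 dBm. Below is a minimal Python sketch of the mapping described above. The function names are made up for illustration, and the -30/-80 values are taken from this comment, not checked against the current source.

```python
RPD_THRESHOLD_DBM = -64  # nRF24L01+ received-power-detector threshold


def rpd_flag(actual_dbm: float) -> bool:
    """Model of the radio's one-bit detector: True above the threshold."""
    return actual_dbm > RPD_THRESHOLD_DBM


def reported_rssi(rpd: bool) -> int:
    """Value shown in the log, assuming the -30/-80 mapping described above."""
    return -30 if rpd else -80


# Very different real signal levels collapse onto just two reported values.
for level in (-40, -63, -65, -90):
    print(level, "dBm ->", reported_rssi(rpd_flag(level)), "dBm")
```

So a reading of -80 dBm only says "below the threshold"; it does not distinguish a marginal link from a hopeless one.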

Did you change the transmit power to a higher level?

If so, you should definitely add the capacitor. Otherwise ... it should not do any harm ;-).

Plus, in any case: by moving your device, you may have disturbed the cabling. Maybe there is a sub-optimal solder joint or a (close-to-)broken cable (really annoying, I just faced this recently). I would double-check with a strong magnifier and additionally measure the cabling.

And then there is the WLAN connection, which I see causing problems. What RSSI is shown for it under Info -> Network?

Technikfan commented 1 year ago

Hello,

No, I did not change the transmit power. I switched it to max from the start. The RSSI of the WLAN (the AP is 5 meters away) and the power cable are fine. The DTU had been working fine for a few months after the "DTU command failed" issue disappeared.

Now the "Offline" problem has come up. And since today the "DTU command failed" message is also back.

What does the message "Offline" mean? Does it stand for transmission via NRF24 or for Wifi or for both? I'll watch it for a few days and then try again with the power supply and the 10uF capacitor.

Best regards Uwe

Technikfan commented 1 year ago

Hello, next update:

My power pack was empty and I connected the power supply (in series) to run the DTU and charge the power pack at the same time.

The offline message was back. I then connected a 10uF capacitor to the 5V line, thinking better safe than sorry.

But the offline message was still on the display.

After a while and various resets and wiggling the cables, marveling and waiting... the offline message was gone, but the display showed 0W for a long time, too long. Then I unplugged the power supply to make the setup mobile and to be able to take it outside again if necessary. At that very moment, the current wattage appeared on the display. I reconnected the power supply and left everything as it was; it still works.

It's difficult to check what's happening all the time without messages in the web interface or the console when the DTU is out of sight of the PC, like mine. Maybe the offline message is just a false alarm? As already asked: what exactly is offline?

Best regards Uwe

jstammi commented 1 year ago

With max from the start ... the capacitor is highly recommended: the one at the NRF, between 3.3V Vcc and GND(!). Or reduce the TX power again. But in order to identify this as the cause, you first need to be able to reproduce the problem, i.e. that it stops working after some minutes.

WLAN RSSI: if you have multiple WLAN APs and the one 5m away becomes inaccessible for whatever reason (e.g. a reboot), at least for me the openDTU locks onto another one with a low-quality connection. And it does not return to the best one until I restart the openDTU or the AP that the openDTU has meanwhile locked onto. But as you wrote that it works for some minutes, and again after a restart, IMHO this should not be the issue for your device.

"DTU command failed": this is something that appears and disappears from time to time here. I have not managed to sort out what it correlates with. Personally, I suspect that something gets blocked because data from the openDTU is not transmitted fast enough over its WLAN connection: MQTT, plus a websocket connection for each browser showing the live view. Reducing the publishing and polling frequencies and/or disabling per-panel publishing on MQTT should then reduce the problem. But I did not test this further, as I have already seen the message stay away for days and weeks and then suddenly show up again. This makes verification close to impossible at the moment.

"Offline": if I understand the sources correctly, this means that the RX failure count for realtime data is >2 for ALL inverters being polled. This counter increases (for each inverter independently) every time you see a report of missing fragment(s) after the "RX Period End". And it is reset to 0, for the affected inverter only, on a subsequent success message. Unfortunately this offline message is not shown in the console. As I do not have a display attached, I do not know whether I ever hit this situation outside of normal operation (I have several console logs).

"Display out of sight": you can open its live view in a browser (easy). The green sometimes turns to yellow/red, followed by blue and green again. But I have not yet had a closer look at this in the sources, so I cannot tell in detail what the colors mean. Or you can capture and save the websocket traffic of the openDTU (advanced, e.g. using wscat) for later offline analysis.
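
The rule as I read it can be written down in a few lines. This is a minimal Python sketch of my understanding only, not the actual OpenDTU implementation; the class and function names and the `threshold` parameter are hypothetical.

```python
class InverterLink:
    """Per-inverter bookkeeping for realtime-data requests (hypothetical name)."""

    def __init__(self):
        self.rx_fail_count = 0

    def on_rx_period_end(self, success: bool) -> None:
        if success:
            # A "Success" line resets the counter for this inverter only.
            self.rx_fail_count = 0
        else:
            # "Middle missing" / "Last missing" / timeout: count one failure.
            self.rx_fail_count += 1


def display_offline(links, threshold: int = 2) -> bool:
    """'Offline' only if every polled inverter has more than `threshold` failures."""
    return bool(links) and all(l.rx_fail_count > threshold for l in links)


hm800 = InverterLink()
for _ in range(3):
    hm800.on_rx_period_end(success=False)
print(display_offline([hm800]))  # three failures in a row

hm800.on_rx_period_end(success=True)
print(display_offline([hm800]))  # counter was reset by the success
```

Under this reading, a single successful realtime-data request is enough to bring the display back from "Offline", which would fit the flickering between "Offline" and normal values described above.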

Technikfan commented 1 year ago

It is difficult to isolate the problem. This morning I had to reset the openDTU; it then showed 0W (this is definitely wrong) and after 10s it showed the correct values. It then runs for a while (1-2 hours), later I have to reset it again, and it runs for the rest of the day. I just looked at the "Last Updated" counter and got confused. The display shows "Online", yet the last synchronization was 130 seconds ago and the counter keeps counting up. Then I refresh the web GUI and, whoops, it refreshes every 3-4 seconds again. Sometimes this happens even without interaction. Overall, this gives me the impression that the communication between the libraries is unstable and misinterpretations occur. I've now slowed down the MQTT interval to 10s. Let's see if that has an impact.

jstammi commented 1 year ago

The last_updated in the MQTT topics shows the time when the openDTU last received valid statistics, aka RealTimeRunData, from the inverter. This time is also used for calculating the duration since the last update in the web UI.
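
For unattended monitoring, the same value can be used to detect a stuck inverter link on the receiving side. The following is a minimal Python sketch under assumptions: it treats the value as a Unix timestamp in seconds, and the function names and the "three missed polls" rule are invented for illustration.

```python
import time


def seconds_since_update(last_update_epoch: float, now_epoch: float) -> float:
    """Age of the last valid inverter data, never negative."""
    return max(0.0, now_epoch - last_update_epoch)


def is_stale(last_update_epoch: float, now_epoch: float,
             poll_interval_s: float = 10, missed_polls: int = 3) -> bool:
    """True once more than `missed_polls` poll intervals passed without new data."""
    age = seconds_since_update(last_update_epoch, now_epoch)
    return age > poll_interval_s * missed_polls


# Example: data last seen 130 s ago, with a 10 s poll interval.
now = time.time()
print(is_stale(now - 130, now))
```

With a 10 s poll interval, an age of 130 s as reported above would clearly count as stale, whatever the display shows at that moment.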

The display shows the current time of the openDTU (ntp sync'ed) and it is completely independent from any inverter data.

The uptime in mqtt and system info (static) is completely independent from any inverters, too, and shows the duration the openDTU is running.

The web UI opens a websocket to the openDTU and receives its data from there. When the browser or browser tab is not active in the foreground, it happens to me, too, that I need to reload the tab to see valid data again; this does not affect the MQTT updates or the communication with the inverter.

Considering this and what you describe, I am quite sure that you are facing a communication problem between the openDTU and the inverter, as the values you describe as being stuck are all related to that communication path.

Did you reduce the TX power and/or add the capacitor, as shown here https://github.com/tbnobody/OpenDTU/issues/1105 between 3.3V and GND, close to the NRF in the lower right?

Technikfan commented 1 year ago

Hello,

The capacitor between 3.3V and GND did not change the situation.

I lowered the transmit power from maximum to high and checked the console. Now the message "Nothing received, resend entire request" appears for the first time. But it seems that I am getting data from the inverter.

I will keep looking.

Enigmatic greetings

2023-07-04_0917_Console-Log.txt

Technikfan commented 1 year ago

There are no changes. The same problems as every day. I have now set the interval of DTU and MQTT to 10s.

I think the tip about a 10uF capacitor on the RF module becomes obsolete with a power supply that delivers enough current.

Maybe the quality of the RF module or of the antenna inside the HM is the problem. As the next step I will replace the NRF24L01+ with PCB antenna with one that has an external antenna.

There is more than one bug report about this here; should they be merged into one? https://github.com/tbnobody/OpenDTU/issues/1105 https://github.com/tbnobody/OpenDTU/issues/1032 https://github.com/tbnobody/OpenDTU/issues/910

Greetings Uwe

Technikfan commented 1 year ago

After a few days of watching it feels like the problems are getting bigger.

More than once in the morning I saw a wattage on the display, but after a while I recognized that it was a frozen value and not the real wattage the inverter delivers. After resetting the ESP it works again.

The "Command Failure" and "Offline" messages appear, disappear, and then appear again.

The only constant is that openDTU does not run stably.

And the version is now 23.7.12

Desperate greetings

jstammi commented 1 year ago

Ok, then please let us start at the very beginning. Could you please provide images of your openDTU and its wiring?

And is it possible for you to provide a serial log, not only the console log? One that covers a few minutes of correct operation and then the loss of the connection?

tbnobody commented 9 months ago

Closing this issue due to inactivity. Feel free to reopen if the error still occurs.

github-actions[bot] commented 7 months ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.