[BUG] Retry on Modbus not working

johny-mnemonic commented 1 year ago

Describe the bug Once there is an error communicating with the inverter the integration gets stuck and can't recover from it.

Mandatory details

Integration Version: 2023.01.1
HA Version: 2023.1.2
Inverter Brand: Solax X3 Gen4 10kW
Plugin Used:
First 6 digits of Serial: H34A10
Connection Method: ModBus TCP using WaveShare RS485 TO ETH (B)

Detailed Error Log

Logger: custom_components.solax_modbus
Source: custom_components/solax_modbus/__init__.py:469
Integration: SolaX Inverter Modbus (documentation, issues)
First occurred: 15:00:52 (113 occurrences)
Last logged: 18:06:45

Something went wrong reading from modbus
Traceback (most recent call last):
  File "/config/custom_components/solax_modbus/__init__.py", line 407, in read_modbus_data
    res = self.read_modbus_registers_all()
  File "/config/custom_components/solax_modbus/__init__.py", line 482, in read_modbus_registers_all
    res = res and self.read_modbus_block(block, 'holding')
  File "/config/custom_components/solax_modbus/__init__.py", line 469, in read_modbus_block
    val = decoder.decode_16bit_uint()
  File "/usr/local/lib/python3.10/site-packages/pymodbus/payload.py", line 402, in decode_16bit_uint
    return unpack(fstring, handle)[0]
struct.error: unpack requires a buffer of 2 bytes

and

Logger: custom_components.solax_modbus
Source: custom_components/solax_modbus/__init__.py:269
Integration: SolaX Inverter Modbus (documentation, issues)
First occurred: 06:15:46 (1080 occurrences)
Last logged: 09:34:49

SolaX: read failed at 0x8d: battery_type
SolaX: read failed at 0x8e: battery_charge_float_voltage
SolaX: read failed at 0x8f: battery_discharge_cut_off_voltage
SolaX: read failed at 0x90: battery_charge_max_current
SolaX: read failed at 0x91: battery_discharge_max_current

Additional context As you can see above, it says in the log that it is doing retries, but they are probably not implemented correctly, as the only way to fix the situation is to restart whole HA...

The reason why there are connection issues on the Modbus is that I have added second client reading few values from the inverter. Which brings me to the wiki which would need an improvement. This part is incorrect according to the Waveshare documentation:

In the unlikely case you have multiple inverters connected to one RS485 adapter, you may need to modify the multi-host flag (untested). The system will increase the gap between messages, which will slow down the communication throughput. Make sure your polling interval is long enough (the act led should not light up all the time).

The multi-host settings actually have nothing to do with connecting multiple devices on Modbus RTU side (i.e. two inverters). It is about connecting multiple clients on the Modbus TCP side (it should support up to 30 according to doc). For multiple TCP clients to work it needs to be switched from ""Simple Modbus TCP to RTU" to "Auto query storage" with both the "timeout" and "gap" configured, where for timeout doc state this:

Timeout is determined according to the length of the back and forth response command. If the sending command is N bytes and the response is M bytes, the recommended value is: "Packet interval" ×(N+M+5)+100

I have started with default of 224ms, but quickly found that Solax integration is transferring much more data. With 608ms it worked fine for last two days, but today it again got stuck, so I now increased it to 896ms. Hope it will fix it.

So to sum it up the error on Modbus is not caused by this integration, but once it encounters it it gets permanently stuck and can't recover from it. I tried also resting the Waveshare to make sure it is not returning some broken buffered response (as it is caching those), but that was not the case.

johny-mnemonic commented 1 year ago

It is interesting that when the connection fails on TCP level, it seems it can retry it just fine:

Logger: custom_components.solax_modbus
Source: custom_components/solax_modbus/__init__.py:317
Integration: SolaX Inverter Modbus (documentation, issues)
First occurred: 18:10:36 (3 occurrences)
Last logged: 18:52:46

Modbus Error: [Connection] ModbusTcpClient(X.X.X.X:502): Connection unexpectedly closed 0.000024 seconds into read of 8 bytes without response from unit before it closed connection: SolaX cannot read holding registers at device 1 position 0x0
Modbus Error: [Connection] ModbusTcpClient(X.X.X.X:502): Connection unexpectedly closed 0.000013 seconds into read of 8 bytes without response from unit before it closed connection: SolaX cannot read holding registers at device 1 position 0x0
Traceback (most recent call last):
  File "/config/custom_components/solax_modbus/__init__.py", line 453, in read_modbus_block
    else:              realtime_data = self.read_holding_registers(unit=self._modbus_addr, address=block.start, count=block.end - block.start)
  File "/config/custom_components/solax_modbus/__init__.py", line 317, in read_holding_registers
    return self._client.read_holding_registers(address, count, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/common.py", line 114, in read_holding_registers
    return self.execute(request)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 109, in execute
    return self.transaction.execute(request)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/transaction.py", line 174, in execute
    response, last_exception = self._transact(
  File "/usr/local/lib/python3.10/site-packages/pymodbus/transaction.py", line 294, in _transact
    result = self._recv(response_length, full)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/transaction.py", line 326, in _recv
    read_min = self.client.framer.recvPacket(min_size)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/framer/__init__.py", line 49, in recvPacket
    return self.client.recv(size)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 89, in recv
    return self._recv(size)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 297, in _recv
    return self._handle_abrupt_socket_close(
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 342, in _handle_abrupt_socket_close
    raise ConnectionException(msg)
pymodbus.exceptions.ConnectionException: Modbus Error: [Connection] ModbusTcpClient(X.X.X.X:502): Connection unexpectedly closed 0.000024 seconds into read of 8 bytes without response from unit before it closed connection

This is what happens when you try to restart the Waveshare. So it looks like only when TCP works fine, but there is error on Modbus level, it can't recover. Maybe the solution would be to close the TCP connection even in case where there is no issue on TCP and there is only an error on Modbus level in case you are not doing so already.

infradom commented 1 year ago

Thanks for the feedback, I will have a look in more detail soon, but I fear that there are more things to consider when you connect 2 masters. Furthermore we recommend to keep the polling interval high, especially with 2 clients connected. Anyway, I will check the documentation and correct it when your observations are correct.

wills106 commented 1 year ago

@johny-mnemonic Are you still having issues with the Integration? Or did these get resolved in later versions?

johny-mnemonic commented 1 year ago

@wills106 it happens from time to time, yes.

As a dirty workaround I have created automation like this:

alias: "Solax not updated "
description: ""
trigger:
  - platform: time_pattern
    minutes: /5
condition:
  - condition: template
    value_template: >-
      {{ as_timestamp(now()) -
      as_timestamp(states.sensor.solax_house_load.last_updated) > 300 }}
action:
  - service: homeassistant.reload_config_entry
    data: {}
    target:
      device_id: MY_INTEGRATION_ID
  - service: notify.mobile_app_XXX
    data:
      message: "Solax Modbus is not updated "
mode: single

As the house load always changes this is not creating false positives and usually it fixes the situation. Not always though. Sometimes I have to restart HA. Fortunately that's really rare.

johny-mnemonic commented 10 months ago

Any chance this will get some attention? 🙏

mvystrcil commented 5 months ago

Same problem here. Unfortunately the issue happens like twice per day. Don't know if longer polling period can help.

Can't see any problem on network layer, seems to be stable enough - no ICMP packet lost.

wills106 / homeassistant-solax-modbus

[BUG] Retry on Modbus not working #294