kellerza / sunsynk

Deye/Sunsynk Inverter Python library and Home Assistant OS Addon
https://kellerza.github.io/sunsynk/
MIT License
206 stars 88 forks source link

Read Errors: USR-W630 #18

Closed Ivan-L closed 2 years ago

Ivan-L commented 2 years ago

Thanks for this great add-on! I am using it with Deye SUN-3.6/5K-SG03LP1-EU and a USR-630.

When I restart the add-on, it does get the first set of values from the USR-630, however shortly afterwards, an error occurs and the add-on goes into a bad state and does not recover. Below is an extract from the logs. Any idea what write function it is trying to call? I am only trying to read sensor data, not write anything.

Relevant config:

PORT: ''
PORT_ADDRESS: 10.0.1.66:8899
PROFILES: 0
SUNSYNK_ID: 'XXXXXX'
SENSORS:
  - battery_soc
  - grid_load
  - grid_ct_load
  - inverter_output
  - load_power
  - pv1_power
  - pv2_power
  - battery_power
  - battery_current
  - battery_voltage

Log:

2022-02-10 21:22:05,449 DEBUG   Inverter Output low=65521 high=0 value=-16W
2022-02-10 21:22:05,450 DEBUG   Load Power low=1596 high=0 value=1597W
2022-02-10 21:22:05,450 DEBUG   send: 0x0 0x5 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xb7 0x0 0x9
2022-02-10 21:22:05,451 DEBUG   Adding transaction 5
2022-02-10 21:22:06,136 DEBUG   recv: 0x0 0x5 0x0 0x0 0x0 0xe 0x1 0x3 0x12 0x13 0x85 0x0 0x51 0x6 0x34 0x0 0x0 0x0 0x2 0x0
2022-02-10 21:22:06,136 DEBUG   Processing: 0x0 0x5 0x0 0x0 0x0 0xe 0x1 0x3 0x12 0x13 0x85 0x0 0x51 0x6 0x34 0x0 0x0 0x0 0x2 0x0
2022-02-10 21:22:06,137 DEBUG   Factory Response[ReadHoldingRegistersResponse: 3]
2022-02-10 21:22:06,138 ERROR   unpack requires a buffer of 2 bytes
2022-02-10 21:22:06,138 ERROR   Fatal error: protocol.data_received() call failed.
handle_traceback: Handle created at (most recent call last):
  File "/usr/src/app/./run.py", line 236, in <module>
    LOOP.run_until_complete(main(LOOP))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 629, in run_until_complete
    self.run_forever()
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 596, in run_forever
    self._run_once()
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1882, in _run_once
    handle._run()
  File "/usr/local/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 754, in _add_reader
    self._loop._add_reader(fd, callback, *args)
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 259, in _add_reader
    handle = events.Handle(callback, args, self, None)
protocol: <pymodbus.client.asynchronous.async_io.ModbusClientProtocol object at 0x7f2207c44f10>
transport: <_SelectorSocketTransport fd=8 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 870, in _read_ready__data_received
    self._protocol.data_received(data)
  File "/usr/local/lib/python3.9/site-packages/pymodbus/client/asynchronous/async_io/__init__.py", line 204, in data_received
    self._dataReceived(data)
  File "/usr/local/lib/python3.9/site-packages/pymodbus/client/asynchronous/async_io/__init__.py", line 151, in _dataReceived
    self.framer.processIncomingPacket(data, self._handleResponse, unit=unit)
  File "/usr/local/lib/python3.9/site-packages/pymodbus/framer/socket_framer.py", line 153, in processIncomingPacket
    self._process(callback)
  File "/usr/local/lib/python3.9/site-packages/pymodbus/framer/socket_framer.py", line 175, in _process
    raise ModbusIOException("Unable to decode request")
pymodbus.exceptions.ModbusIOException: Modbus Error: [Input/Output] Unable to decode request
2022-02-10 21:22:06,142 DEBUG   Client disconnected from modbus server: Modbus Error: [Input/Output] Unable to decode request
2022-02-10 21:22:06,143 DEBUG   Getting transaction 5
2022-02-10 21:22:06,145 INFO    Protocol lost connection.
2022-02-10 21:22:06,151 ERROR   Read Error: Modbus Error: [Connection] Connection lost during request
2022-02-10 21:22:06,154 DEBUG   send: 0x0 0x6 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xa9 0x0 0xa
2022-02-10 21:22:06,155 ERROR   Read Error: 'NoneType' object has no attribute 'write'
2022-02-10 21:22:07,166 DEBUG   send: 0x0 0x7 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xa9 0x0 0xa
2022-02-10 21:22:07,173 ERROR   Read Error: 'NoneType' object has no attribute 'write'
2022-02-10 21:22:08,170 DEBUG   send: 0x0 0x8 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xa9 0x0 0xa
2022-02-10 21:22:08,178 ERROR   Read Error: 'NoneType' object has no attribute 'write'
2022-02-10 21:22:09,174 DEBUG   send: 0x0 0x9 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xa9 0x0 0xa
2022-02-10 21:22:09,176 ERROR   Read Error: 'NoneType' object has no attribute 'write'
2022-02-10 21:22:10,178 DEBUG   send: 0x0 0xa 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xa9 0x0 0xa
2022-02-10 21:22:10,182 ERROR   Read Error: 'NoneType' object has no attribute 'write'
2022-02-10 21:22:11,182 DEBUG   send: 0x0 0xb 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xa9 0x0 0xa
2022-02-10 21:22:11,184 ERROR   Read Error: 'NoneType' object has no attribute 'write'
2022-02-10 21:22:12,187 DEBUG   send: 0x0 0xc 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xa9 0x0 0xa
2022-02-10 21:22:12,187 ERROR   Read Error: 'NoneType' object has no attribute 'write'
2022-02-10 21:22:13,193 DEBUG   send: 0x0 0xd 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xa9 0x0 0xa
2022-02-10 21:22:13,194 ERROR   Read Error: 'NoneType' object has no attribute 'write'
2022-02-10 21:22:14,194 DEBUG   send: 0x0 0xe 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xa9 0x0 0xa
2022-02-10 21:22:14,195 ERROR   Read Error: 'NoneType' object has no attribute 'write'
2022-02-10 21:22:15,198 DEBUG   send: 0x0 0xf 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xa9 0x0 0xa
2022-02-10 21:22:15,199 ERROR   Read Error: 'NoneType' object has no attribute 'write'
2022-02-10 21:22:16,202 DEBUG   send: 0x0 0x10 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xa9 0x0 0xa
2022-02-10 21:22:16,206 ERROR   Read Error: 'NoneType' object has no attribute 'write'
kellerza commented 2 years ago

It seems like the TCP client disconnects and then the "read" fails here

pymodbus.exceptions.ModbusIOException: Modbus Error: [Input/Output] Unable to decode request
2022-02-10 21:22:06,142 DEBUG   Client disconnected from modbus server: Modbus Error: [Input/Output] Unable to decode request
2022-02-10 21:22:06,143 DEBUG   Getting transaction 5
2022-02-10 21:22:06,145 INFO    Protocol lost connection.
2022-02-10 21:22:06,151 ERROR   Read Error: Modbus Error: [Connection] Connection lost during request

As part of reading a register you write the address and that is likely where the error about "write" comes from.

With the previous versions I was not so lenient to catch errors and the watchdog would have restarted... so here, we should either try to srestart or let the application exit and the atchdog do the reconnect/restart

so add the following logic there

catch pymodbus.exceptions.ModbusIOException:
   // TCP: try to reconnect since it got a fairly serious error
catch AttributeError:
  // it is (still) disconnected!!

The other option would be to terminate the program after 3-4 failed read attempts, then the watchdog can restart. I'll look into this on monday evening!

If you can roll back to a previous version it might help in the interim

kellerza commented 2 years ago

Made a quick update with the ideas above, let me know

Ivan-L commented 2 years ago

Thanks for the quick update!

Looks like the add-on is killing itself now and the watchdog restarts it. These errors are consistently occurring after a restart though, far too often in my opinion.

2022-02-11 02:51:12,623 DEBUG   publish homeassistant/select/XXXX/system_mode_voltages/config
2022-02-11 02:51:12,625 DEBUG   PUBLISH 1R homeassistant/select/XXXX/system_mode_voltages/config, {"unique_id": “XXXX_system_mode_voltages", "device": {"identifiers": ["XXXX"], "manufacturer": "Sunsynk", "model": "Inverter XXXX", "name": "Sunsynk Inverter XXXX"}, "state_topic": "SUNSYNK/status/XXXX/system_mode_voltages", "name": " System Mode Voltages", "availability": [{"topic": "SUNSYNK/status/XXXX/availability"}], "command_topic": "SUNSYNK/status/XXXX/system_mode_voltages_set", "options": ["UPDATE"]}
2022-02-11 02:51:13,012 DEBUG   send: 0x0 0x2 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xa9 0x0 0xa
2022-02-11 02:51:13,015 DEBUG   Adding transaction 2
2022-02-11 02:51:13,283 DEBUG   recv: 0x0 0x2 0x0 0x0 0x0 0xe 0x1 0x3 0x14 0x1 0x42 0x1 0x42 0x0 0x0 0x1 0x42 0xff 0xf0 0x0
2022-02-11 02:51:13,284 DEBUG   Processing: 0x0 0x2 0x0 0x0 0x0 0xe 0x1 0x3 0x14 0x1 0x42 0x1 0x42 0x0 0x0 0x1 0x42 0xff 0xf0 0x0
2022-02-11 02:51:13,284 DEBUG   Factory Response[ReadHoldingRegistersResponse: 3]
2022-02-11 02:51:13,284 ERROR   unpack requires a buffer of 2 bytes
2022-02-11 02:51:13,295 ERROR   Fatal error: protocol.data_received() call failed.
handle_traceback: Handle created at (most recent call last):
  File "/usr/src/app/./run.py", line 250, in <module>
    LOOP.run_until_complete(main(LOOP))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 629, in run_until_complete
    self.run_forever()
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 596, in run_forever
    self._run_once()
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1882, in _run_once
    handle._run()
  File "/usr/local/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 754, in _add_reader
    self._loop._add_reader(fd, callback, *args)
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 259, in _add_reader
    handle = events.Handle(callback, args, self, None)
protocol: <pymodbus.client.asynchronous.async_io.ModbusClientProtocol object at 0x7fb5eeee2910>
transport: <_SelectorSocketTransport fd=8 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 870, in _read_ready__data_received
    self._protocol.data_received(data)
  File "/usr/local/lib/python3.9/site-packages/pymodbus/client/asynchronous/async_io/__init__.py", line 204, in data_received
    self._dataReceived(data)
  File "/usr/local/lib/python3.9/site-packages/pymodbus/client/asynchronous/async_io/__init__.py", line 151, in _dataReceived
    self.framer.processIncomingPacket(data, self._handleResponse, unit=unit)
  File "/usr/local/lib/python3.9/site-packages/pymodbus/framer/socket_framer.py", line 153, in processIncomingPacket
    self._process(callback)
  File "/usr/local/lib/python3.9/site-packages/pymodbus/framer/socket_framer.py", line 175, in _process
    raise ModbusIOException("Unable to decode request")
pymodbus.exceptions.ModbusIOException: Modbus Error: [Input/Output] Unable to decode request
2022-02-11 02:51:13,306 DEBUG   Client disconnected from modbus server: Modbus Error: [Input/Output] Unable to decode request
2022-02-11 02:51:13,306 DEBUG   Getting transaction 2
2022-02-11 02:51:13,307 INFO    Protocol lost connection.
2022-02-11 02:51:13,313 ERROR   Read Error: Modbus Error: [Connection] Connection lost during request
2022-02-11 02:51:13,313 INFO    Retrying individual sensors: ['Battery SOC:last', 'Grid load:step:mean', 'Grid CT load:step:mean', 'Inverter Output:step:mean', 'Load Power:step:mean', 'PV1 power:step:mean', 'PV2 power:step:mean', 'Battery power:step:mean', 'Battery current:step:mean', 'Battery voltage:step:mean']
2022-02-11 02:51:13,337 DEBUG   send: 0x0 0x3 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xb8 0x0 0x1
2022-02-11 02:51:13,342 ERROR   Read ErrorBattery SOC: 'NoneType' object has no attribute 'write'
2022-02-11 02:51:13,366 DEBUG   send: 0x0 0x4 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xa9 0x0 0x1
2022-02-11 02:51:13,366 ERROR   Read ErrorGrid load: 'NoneType' object has no attribute 'write'
2022-02-11 02:51:13,389 DEBUG   send: 0x0 0x5 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xac 0x0 0x1
2022-02-11 02:51:13,389 ERROR   Read ErrorGrid CT load: 'NoneType' object has no attribute 'write'
Traceback (most recent call last):
  File "/usr/src/app/./run.py", line 155, in read
    await SUNSYNK.read(sensors)
  File "/usr/local/lib/python3.9/site-packages/sunsynk/sunsynk.py", line 100, in read
    r_r = await self.client.read_holding_registers(
  File "/usr/local/lib/python3.9/site-packages/pymodbus/client/asynchronous/async_io/__init__.py", line 34, in execute
    req = self._execute(request)
  File "/usr/local/lib/python3.9/site-packages/pymodbus/client/asynchronous/async_io/__init__.py", line 141, in _execute
    self.write_transport(packet)
  File "/usr/local/lib/python3.9/site-packages/pymodbus/client/asynchronous/async_io/__init__.py", line 131, in write_transport
    return self.transport.write(packet)
AttributeError: 'NoneType' object has no attribute 'write'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/src/app/./run.py", line 250, in <module>
    LOOP.run_until_complete(main(LOOP))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/src/app/./run.py", line 216, in main
    await read([f.sensor for f in SENSORS], retry_single=True)
  File "/usr/src/app/./run.py", line 174, in read
    await read([sen], msg=sen.name, retry_single=False)
  File "/usr/src/app/./run.py", line 168, in read
    raise Exception(f"Multiple Modbus read errors: {err}") from err
Exception: Multiple Modbus read errors: 'NoneType' object has no attribute 'write'
[cmd] ./run.py exited 1
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[s6-finish] sending all processes the TERM signal.

Another error I seem to get after a restart is the following:

2022-02-11 02:57:35,187 INFO    Loading HASS OS configuration
2022-02-11 02:57:35,189 INFO    Filter *last used for ['battery_soc']
2022-02-11 02:57:35,189 INFO    Filter *step used for ['grid_load', 'grid_ct_load', 'inverter_output', 'load_power', 'pv1_power', 'pv2_power', 'battery_power', 'battery_current', 'battery_voltage']
2022-02-11 02:57:35,190 DEBUG   Using selector: EpollSelector
2022-02-11 02:57:35,191 DEBUG   Connecting.
2022-02-11 02:57:35,211 DEBUG   Client connected to modbus server
2022-02-11 02:57:35,212 INFO    Protocol made connection.
2022-02-11 02:57:35,213 DEBUG   <asyncio.TransportSocket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('172.30.33.1', 47742), raddr=('10.0.1.66', 8899)> connected to 10.0.1.66:8899: (<_SelectorSocketTransport fd=8 read=polling write=<idle, bufsize=0>>, <pymodbus.client.asynchronous.async_io.ModbusClientProtocol object at 0x7f9c7a91dc10>)
2022-02-11 02:57:35,213 INFO    Connected to 10.0.1.66:8899.
2022-02-11 02:57:35,214 DEBUG   send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0x3 0x0 0x5
2022-02-11 02:57:35,215 DEBUG   Adding transaction 1
2022-02-11 02:57:35,726 DEBUG   recv: 0x0 0x1 0x0 0x0 0x0 0xe 0xf0 0x1 0x69 0x0 0x0 0x1 0x69 0x6c 0xd6 0x1 0x3 0xa 0x32 0x31
2022-02-11 02:57:35,730 DEBUG   Processing: 0x0 0x1 0x0 0x0 0x0 0xe 0xf0 0x1 0x69 0x0 0x0 0x1 0x69 0x6c 0xd6 0x1 0x3 0xa 0x32 0x31
2022-02-11 02:57:35,730 DEBUG   Factory Response[ReadCoilsResponse: 1]
2022-02-11 02:57:35,732 DEBUG   Getting transaction 1
2022-02-11 02:57:35,739 ERROR   Read Error: 'ReadCoilsResponse' object has no attribute 'registers'
2022-02-11 02:57:35,739 INFO    ############################################################
2022-02-11 02:57:35,739 INFO    No response on the Modbus interface, try checking the wiring to the Inverter, the USB-to-RS485 converter, etc
2022-02-11 02:57:35,740 INFO    ############################################################
2022-02-11 02:57:35,742 CRITICAL This Add-On will terminate in 30 seconds, use the Supervisor Watchdog to restart automatically.
kellerza commented 2 years ago

Will have to check why you still get the stack trace…

but if it finally restarts (after 3 errors) and cannot connect there might be an issue with you tcp-rtu bridge. What do you use there? Is it still alive?

lispwarez commented 2 years ago

Looking at the port number being used by any chance are you connecting to the solarman wifi device that plugs into the WiFi/RS232 port?

Ivan-L commented 2 years ago

So yes, I should have disclosed this part at the beginning.

I waited more than one month for the USR-630 to arrive from China. In the meantime, I did manage to set up polling the solarman wifi dongle directly using some python scripts sourced from here which I tweaked to have a run loop, etc. This has been working well, except for causing a bit less data to be uploaded to solarman - it seems the dongle isn't very powerful and polling it frequently causes certain load to occur either on it or the inverter or both (I am not sure). See below screenshot for the effect of the frequent polling (some missing data in solarmanpv online): DA3F8FB9-3F02-4FDD-9001-F832864C1D59_1_201_a Note that I do not care about the solarmanpv data online - what matters to me is near-realtime data in Home Assistant. As an example, this is what the data looks like using customised python scripts polling the dongle, publishing to an MQTT broker which then Home Assistant uses - very nice and frequent, except for an occasional flat line where timeouts occur: 4A45430E-03C8-4614-A6F6-0730AC52A963_1_201_a

Now that my USR-630 arrived, I would like to switch over to using it, this was the intention all along. I would also like to use this addon, and stop polling the solarman wifi dongle using the customised python scripts.

I did have both running at the same time last night:

  1. Custom python scripts running and polling the solarman wifi dongle, publishing to a MQTT broker used by my main Home Assistant Core installation running in Docker - this gives me near-realtime data because my polling is fairly aggressive (with exponential backoff implemented when a timeout occurs).
  2. @kellerza's awesome add-on running within a separate Home Assistant OS installation in a virtual machine, connecting to the USR-630, which is connected to the BMS RS485 port on the inverter.

It did occur to me that potentially 1 might interfere with 2, so I turned 1 off by just switching off the Docker container. This I did after my last post. I let only point 2 run for 3 hours with the watchdog restarting it where necessary. This was the result, notice how little data I got: 765EF784-2557-46F5-B7A6-C50CA4C020F5_1_201_a

So obviously a lot of timeouts and errors occur with this addon connecting to the USR-630. I get much better data at this stage using the custom python scripts polling the solarman wifi dongle. I would however like to change to this addon and the USR-630 as mentioned above, but don't want to switch yet until I get more stable and reliable data.

I am open to suggestions on how to go proceed, I thought of even physically disconnecting the solarman dongle whilst testing this addon with the USR-630 to see if the results are better.

lispwarez commented 2 years ago

Im not sure what the use of the USR-630 is but why dont you just use one of these I use one connected to a spare RasPi with Docker installed and mbusd. This works for me barring the random read issues but i get near realtime data without interruptions

Ivan-L commented 2 years ago

The USR-630 is meant to perform the same function as:

  1. The Raspberry Pi
  2. with the USB to RS485 Converter plugged into it
  3. and expose it via TCP over WiFi

All in one.

I may as a last resort source a Raspberry Pi + converter as you have suggested, but I thought the USR-630 should be able to do this easily since that is what it was purpose built for.

Ivan-L commented 2 years ago

So I've just tried the following now to rule out a few things.

I stopped my custom python scripts from polling the solarman WiFi dongle and disconnected the dongle entirely. I then disconnected the USR-630 from the inverter BMS RS485 port and connected the USR-630 to the RS232 port where the solarman WiFi dongle was connected to. I used the serial cable that came with the USR-630.

This rules out a few things:

  1. The custom python script polling the solarman WiFi dongle interfering with the separate USR-630 monitoring
  2. The solarman WiFi playing any role in the timeouts experienced from the add-on to USR-630
  3. The cable that I created which connects the USR-630 from its RS485 port to the inverter BMS RS485 port being faulty

Things have improved in terms of timeouts, but they are still not where they should be. Screenshot 2022-02-11 at 17 09 15

The error mentioned in the title of this issue still occurs:

2022-02-11 17:03:39,725 DEBUG   PUBLISH 0 SUNSYNK/status/XXXX/battery_soc, 100
2022-02-11 17:03:39,731 DEBUG   PUBLISH 0 SUNSYNK/status/XXXX/grid_load, 20
2022-02-11 17:03:39,737 DEBUG   PUBLISH 0 SUNSYNK/status/XXXX/grid_ct_load, 20
2022-02-11 17:03:39,746 DEBUG   PUBLISH 0 SUNSYNK/status/XXXX/inverter_output, 293
2022-02-11 17:03:39,751 DEBUG   PUBLISH 0 SUNSYNK/status/XXXX/load_power, 313
2022-02-11 17:03:39,761 DEBUG   PUBLISH 0 SUNSYNK/status/XXXX/pv1_power, 1
2022-02-11 17:03:39,775 DEBUG   PUBLISH 0 SUNSYNK/status/XXXX/pv2_power, 387
2022-02-11 17:03:39,787 DEBUG   PUBLISH 0 SUNSYNK/status/XXXX/battery_power, -7
2022-02-11 17:03:39,795 DEBUG   PUBLISH 0 SUNSYNK/status/XXXX/battery_current, 655.2
2022-02-11 17:03:39,799 DEBUG   PUBLISH 0 SUNSYNK/status/XXXX/battery_voltage, 51.68
2022-02-11 17:03:39,805 DEBUG   send: 0x0 0x4 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xa9 0x0 0xa
2022-02-11 17:03:39,808 DEBUG   Adding transaction 4
2022-02-11 17:03:39,888 DEBUG   recv: 0x0 0x4 0x0 0x0 0x0 0x17 0x1 0x3 0x14 0x0 0x14 0x0 0x14 0x0 0x0 0x0 0x14 0x1 0x25 0x0 0x0 0x1 0x25 0x1 0x39 0x0 0x0 0x1 0x39
2022-02-11 17:03:39,889 DEBUG   Processing: 0x0 0x4 0x0 0x0 0x0 0x17 0x1 0x3 0x14 0x0 0x14 0x0 0x14 0x0 0x0 0x0 0x14 0x1 0x25 0x0 0x0 0x1 0x25 0x1 0x39 0x0 0x0 0x1 0x39
2022-02-11 17:03:39,890 DEBUG   Factory Response[ReadHoldingRegistersResponse: 3]
2022-02-11 17:03:39,891 DEBUG   Getting transaction 4
2022-02-11 17:03:39,893 DEBUG   Request registers: [169, 172, 175, 178] glen=10. Response [20, 20, 0, 20, 293, 0, 293, 313, 0, 313] len=10. regs={169: 20, 170: 20, 171: 0, 172: 20, 173: 293, 174: 0, 175: 293, 176: 313, 177: 0, 178: 313}
2022-02-11 17:03:39,894 DEBUG   Grid load low=20 high=0 value=20W
2022-02-11 17:03:39,894 DEBUG   Grid CT load low=20 high=0 value=20W
2022-02-11 17:03:39,894 DEBUG   Inverter Output low=293 high=0 value=293W
2022-02-11 17:03:39,894 DEBUG   Load Power low=313 high=0 value=313W
2022-02-11 17:03:39,895 DEBUG   send: 0x0 0x5 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xb7 0x0 0x9
2022-02-11 17:03:39,897 DEBUG   Adding transaction 5
2022-02-11 17:03:39,959 DEBUG   recv: 0x0 0x5 0x0 0x0 0x0 0xe 0x1 0x3 0x12 0x14 0x30 0x0 0x64 0x1 0x41 0x0 0x1 0x1 0x83 0x0
2022-02-11 17:03:39,960 DEBUG   Processing: 0x0 0x5 0x0 0x0 0x0 0xe 0x1 0x3 0x12 0x14 0x30 0x0 0x64 0x1 0x41 0x0 0x1 0x1 0x83 0x0
2022-02-11 17:03:39,962 DEBUG   Factory Response[ReadHoldingRegistersResponse: 3]
2022-02-11 17:03:39,962 ERROR   unpack requires a buffer of 2 bytes
2022-02-11 17:03:39,963 ERROR   Fatal error: protocol.data_received() call failed.
handle_traceback: Handle created at (most recent call last):
  File "/usr/src/app/./run.py", line 250, in <module>
    LOOP.run_until_complete(main(LOOP))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 629, in run_until_complete
    self.run_forever()
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 596, in run_forever
    self._run_once()
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1882, in _run_once
    handle._run()
  File "/usr/local/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 754, in _add_reader
    self._loop._add_reader(fd, callback, *args)
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 259, in _add_reader
    handle = events.Handle(callback, args, self, None)
protocol: <pymodbus.client.asynchronous.async_io.ModbusClientProtocol object at 0x7fc5fb818c10>
transport: <_SelectorSocketTransport fd=8 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 870, in _read_ready__data_received
    self._protocol.data_received(data)
  File "/usr/local/lib/python3.9/site-packages/pymodbus/client/asynchronous/async_io/__init__.py", line 204, in data_received
    self._dataReceived(data)
  File "/usr/local/lib/python3.9/site-packages/pymodbus/client/asynchronous/async_io/__init__.py", line 151, in _dataReceived
    self.framer.processIncomingPacket(data, self._handleResponse, unit=unit)
  File "/usr/local/lib/python3.9/site-packages/pymodbus/framer/socket_framer.py", line 153, in processIncomingPacket
    self._process(callback)
  File "/usr/local/lib/python3.9/site-packages/pymodbus/framer/socket_framer.py", line 175, in _process
    raise ModbusIOException("Unable to decode request")
pymodbus.exceptions.ModbusIOException: Modbus Error: [Input/Output] Unable to decode request
2022-02-11 17:03:39,966 DEBUG   Client disconnected from modbus server: Modbus Error: [Input/Output] Unable to decode request
2022-02-11 17:03:39,966 DEBUG   Getting transaction 5
2022-02-11 17:03:39,968 INFO    Protocol lost connection.
2022-02-11 17:03:39,970 ERROR   Read Error: Modbus Error: [Connection] Connection lost during request
2022-02-11 17:03:40,808 DEBUG   send: 0x0 0x6 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xa9 0x0 0xa
2022-02-11 17:03:40,816 ERROR   Read Error: 'NoneType' object has no attribute 'write'
2022-02-11 17:03:41,812 DEBUG   send: 0x0 0x7 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xa9 0x0 0xa
2022-02-11 17:03:41,814 ERROR   Read Error: 'NoneType' object has no attribute 'write'
2022-02-11 17:03:42,815 DEBUG   send: 0x0 0x8 0x0 0x0 0x0 0x6 0x1 0x3 0x0 0xa9 0x0 0xa
2022-02-11 17:03:42,815 ERROR   Read Error: 'NoneType' object has no attribute 'write'
Traceback (most recent call last):
  File "/usr/src/app/./run.py", line 155, in read
    await SUNSYNK.read(sensors)
  File "/usr/local/lib/python3.9/site-packages/sunsynk/sunsynk.py", line 100, in read
    r_r = await self.client.read_holding_registers(
  File "/usr/local/lib/python3.9/site-packages/pymodbus/client/asynchronous/async_io/__init__.py", line 34, in execute
    req = self._execute(request)
  File "/usr/local/lib/python3.9/site-packages/pymodbus/client/asynchronous/async_io/__init__.py", line 141, in _execute
    self.write_transport(packet)
  File "/usr/local/lib/python3.9/site-packages/pymodbus/client/asynchronous/async_io/__init__.py", line 131, in write_transport
    return self.transport.write(packet)
AttributeError: 'NoneType' object has no attribute 'write'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/src/app/./run.py", line 250, in <module>
    LOOP.run_until_complete(main(LOOP))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/src/app/./run.py", line 236, in main
    await polltask
  File "/usr/src/app/./run.py", line 228, in poll_sensors
    if await read([f.sensor for f in fsensors]):
  File "/usr/src/app/./run.py", line 168, in read
    raise Exception(f"Multiple Modbus read errors: {err}") from err
Exception: Multiple Modbus read errors: 'NoneType' object has no attribute 'write'
[cmd] ./run.py exited 1
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[s6-finish] sending all processes the TERM signal.
Ivan-L commented 2 years ago

Further tests - the timeouts seem to have been entirely eliminated after stopping of polling the solarman dongle, removing the dongle from the inverter and connecting the USR-W630 in its place via RS232.

I am running a modbus cli which allows me to read one or more registers via the command line. This tool used to get timeouts, but after executing it as fast as I could many times over and over via the command line, I haven't been able to get a single timeout when reading registers. It succeeds every time.

The add-on however keeps throwing the Exception: Multiple Modbus read errors: 'NoneType' object has no attribute 'write' error after a successful read.

I have tried many permutations of the USR-W630's configuration (Adaptive baud rate, UART AutoFrame, Registered Packet Type, Modbus Polling enable/disable) and they don't seem to make a difference.

kellerza commented 2 years ago

The biggest differences are that you use a tool with umodbus as library and the addon uses pymodbus. The one might retry a bit different…

However I’ve never seen this with mbusd as gateway, so something is probably unique with the W630. Mbusb claims to do advanced rety

the exception you see is simply the client that disconnected.

the addon does also request data much quicker than what you can execute a command line tool repeatedly.

For now all I can think of is trying to reconnect more aggressively (but the fact that the addon cant read the serial number after the watchdog restart indicates that the bridge is not available, so not sure if it will help)

kellerza commented 2 years ago

@Ivan-L you can try the (dev) version of the addon... should be visible after you "Check for updates" in the addon store

I've used the following two PORT_URLs:

For now it only reads registers, no writing yet

Ivan-L commented 2 years ago

Thank you @kellerza ! I really didn't expect you to swap out the entire modbus library used!

I managed to install the (dev) version of the addon, but it fails to start, maybe I am doing something silly?

[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] done.
[services.d] starting services
[services.d] done.
ifelse: fatal: unable to exec ./run.py: Permission denied
[cmd] ./run.py exited 126
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[s6-finish] sending all processes the TERM signal.
kellerza commented 2 years ago

run.py was not executable, please try again

Ivan-L commented 2 years ago

Thanks, that works! I will test this thoroughly. I actually re-did my HA installs last night because my main HA install was HA Core in Docker on my NAS, whereas I was testing this add-on in a completely separate virtual machine with HassOS. I now got HA Supervised to work on my NAS in Docker because the VM was very sluggish. I just have some final MQTT issues to resolve and will get to testing this straight after!

Ivan-L commented 2 years ago

Ok after testing many times with the Dev add-on, I can definitely say that the RS232 cable that came with the W630 yields much more stable results. I will need to evaluate the RS485 cable I made very carefully. I do get timeouts with the RS485 cable, but not with the RS232 cable. I do still get read errors with the latter though. The below info is all from testing the Dev add-on with the RS232 cable.

I have noticed that when i have less sensors configured, say 4, then I get successful reads almost all the time if not all the time. It appears that the more sensors I add, the more likely it is for me to get errors. See below for some samples.

2022-02-17 22:18:21,638 DEBUG   Request registers: [169, 172] glen=4. Response [459, 459, 0, 459] len=4. regs={169: 459, 170: 459, 171: 0, 172: 459}
2022-02-17 22:18:21,639 DEBUG   Grid load low=459 high=0 value=459W
2022-02-17 22:18:21,639 DEBUG   Grid CT load low=459 high=0 value=459W
2022-02-17 22:18:21,708 DEBUG   Request registers: [178] glen=1. Response [449] len=1. regs={178: 449}
2022-02-17 22:18:21,709 DEBUG   Load Power low=449 high=0 value=449W
2022-02-17 22:18:21,788 DEBUG   Request registers: [186, 187, 190, 191] glen=6. Response [0, 1, 0, 0, 7, 16] len=6. regs={186: 0, 187: 1, 188: 0, 189: 0, 190: 7, 191: 16}
2022-02-17 22:18:21,788 DEBUG   PV1 power low=0 high=0 value=0W
2022-02-17 22:18:21,789 DEBUG   PV2 power low=1 high=0 value=1W
2022-02-17 22:18:21,789 DEBUG   Battery power low=7 high=0 value=7W
2022-02-17 22:18:21,789 DEBUG   Battery current low=16 high=0 value=0.16A
2022-02-17 22:18:22,649 DEBUG   Request registers: [169, 172] glen=4. Response [459, 459, 0, 459] len=4. regs={169: 459, 170: 459, 171: 0, 172: 459}
2022-02-17 22:18:22,650 DEBUG   Grid load low=459 high=0 value=459W
2022-02-17 22:18:22,650 DEBUG   Grid CT load low=459 high=0 value=459W
2022-02-17 22:18:22,718 DEBUG   Request registers: [178] glen=1. Response [436] len=1. regs={178: 436}
2022-02-17 22:18:22,719 DEBUG   Load Power low=436 high=0 value=449W
2022-02-17 22:18:22,798 DEBUG   Request registers: [183, 186, 187, 190, 191] glen=9. Response [4993, 86, 437, 0, 1, 0, 0, 0, 1280] len=9. regs={183: 4993, 184: 86, 185: 437, 186: 0, 187: 1, 188: 0, 189: 0, 190: 0, 191: 1280}
2022-02-17 22:18:22,798 DEBUG   PV1 power low=0 high=0 value=0W
2022-02-17 22:18:22,799 DEBUG   PV2 power low=1 high=0 value=1W
2022-02-17 22:18:22,799 DEBUG   Battery power low=0 high=0 value=7W
2022-02-17 22:18:22,799 DEBUG   Battery current low=1280 high=0 value=0.16A
2022-02-17 22:18:22,799 DEBUG   Battery voltage low=4993 high=0 value=49.94V
2022-02-17 22:18:23,648 ERROR   Read Error: 0
2022-02-17 22:18:24,578 ERROR   Read Error: 0
2022-02-17 22:18:25,580 ERROR   Read Error: 
2022-02-17 22:18:26,582 ERROR   Read Error: 0
Traceback (most recent call last):
  File "/usr/src/app/./run.py", line 155, in read
    await SUNSYNK.read(sensors)
  File "/usr/src/app/usunsynk.py", line 32, in read
    r_r = await self.client.read_holding_registers(
  File "/usr/src/app/async_modbus.py", line 180, in read_holding_registers
    return await self._send_message(request)
  File "/usr/src/app/async_modbus.py", line 141, in _send_message
    return await self.protocol._async_send_message(
  File "/usr/src/app/async_modbus.py", line 42, in send_message_tcp
    tcp.raise_for_exception_adu(response_error_adu)
  File "/usr/local/lib/python3.9/site-packages/umodbus/client/tcp.py", line 247, in raise_for_exception_adu
    pdu_to_function_code_or_raise_error(resp_pdu)
  File "/usr/local/lib/python3.9/site-packages/umodbus/functions.py", line 118, in pdu_to_function_code_or_raise_error
    raise error_code_to_exception_map[error_code]
KeyError: 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/app/./run.py", line 250, in <module>
    LOOP.run_until_complete(main(LOOP))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/src/app/./run.py", line 236, in main
    await polltask
  File "/usr/src/app/./run.py", line 228, in poll_sensors
    if await read([f.sensor for f in fsensors]):
  File "/usr/src/app/./run.py", line 168, in read
    raise Exception(f"Multiple Modbus read errors: {err}") from err
Exception: Multiple Modbus read errors: 0
[cmd] ./run.py exited 1
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[s6-finish] sending all processes the TERM signal.

More random errors

2022-02-17 22:23:36,585 DEBUG   Request registers: [169, 172] glen=4. Response [435, 435, 0, 435] len=4. regs={169: 435, 170: 435, 171: 0, 172: 435}
2022-02-17 22:23:36,585 DEBUG   Grid load low=435 high=0 value=435W
2022-02-17 22:23:36,586 DEBUG   Grid CT load low=435 high=0 value=435W
2022-02-17 22:23:36,645 DEBUG   Request registers: [178] glen=1. Response [425] len=1. regs={178: 425}
2022-02-17 22:23:36,645 DEBUG   Load Power low=425 high=0 value=425W
2022-02-17 22:23:36,724 DEBUG   Request registers: [183, 186, 187, 190, 191] glen=9. Response [4993, 86, 425, 0, 1, 0, 0, 0, 1280] len=9. regs={183: 4993, 184: 86, 185: 425, 186: 0, 187: 1, 188: 0, 189: 0, 190: 0, 191: 1280}
2022-02-17 22:23:36,725 DEBUG   PV1 power low=0 high=0 value=0W
2022-02-17 22:23:36,725 DEBUG   PV2 power low=1 high=0 value=1W
2022-02-17 22:23:36,725 DEBUG   Battery power low=0 high=0 value=7W
2022-02-17 22:23:36,725 DEBUG   Battery current low=1280 high=0 value=0.16A
2022-02-17 22:23:36,725 DEBUG   Battery voltage low=4993 high=0 value=49.94V
2022-02-17 22:23:36,726 DEBUG   Battery voltage:mean over 6 samples = 49.93333333333333
2022-02-17 22:23:36,726 DEBUG   PUBLISH 0 SUNSYNK/status/XXXX/battery_voltage, 49.93
2022-02-17 22:23:37,586 ERROR   Read Error: 0
2022-02-17 22:23:38,518 ERROR   Read Error: 0
2022-02-17 22:23:39,521 ERROR   Read Error: 
2022-02-17 22:23:40,523 ERROR   Read Error: 0
Traceback (most recent call last):
  File "/usr/src/app/./run.py", line 155, in read
    await SUNSYNK.read(sensors)
  File "/usr/src/app/usunsynk.py", line 32, in read
    r_r = await self.client.read_holding_registers(
  File "/usr/src/app/async_modbus.py", line 180, in read_holding_registers
    return await self._send_message(request)
  File "/usr/src/app/async_modbus.py", line 141, in _send_message
    return await self.protocol._async_send_message(
  File "/usr/src/app/async_modbus.py", line 42, in send_message_tcp
    tcp.raise_for_exception_adu(response_error_adu)
  File "/usr/local/lib/python3.9/site-packages/umodbus/client/tcp.py", line 247, in raise_for_exception_adu
    pdu_to_function_code_or_raise_error(resp_pdu)
  File "/usr/local/lib/python3.9/site-packages/umodbus/functions.py", line 118, in pdu_to_function_code_or_raise_error
    raise error_code_to_exception_map[error_code]
KeyError: 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/app/./run.py", line 250, in <module>
    LOOP.run_until_complete(main(LOOP))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/src/app/./run.py", line 236, in main
    await polltask
  File "/usr/src/app/./run.py", line 228, in poll_sensors
    if await read([f.sensor for f in fsensors]):
  File "/usr/src/app/./run.py", line 168, in read
    raise Exception(f"Multiple Modbus read errors: {err}") from err
Exception: Multiple Modbus read errors: 0
[cmd] ./run.py exited 1
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[s6-finish] sending all processes the TERM signal.

The what is common is the following:

ERROR   Read Error: 0
ERROR   Read Error: 0
ERROR   Read Error: 
ERROR   Read Error: 0

When this happens, then all sensors become unavailable until the add-on eventually recovers and the sensors are back. The above samples are with the following sensors configured:

  - battery_soc
  - grid_load
  - grid_ct_load
  - load_power
  - pv1_power
  - pv2_power
  - battery_power
  - battery_current
  - battery_voltage

If however I add another sensor, - inverter_output after - grid_ct_load, I get the following random errors after a few successful reads:

<snip>
2022-02-17 22:36:12,914 ERROR   Read Error: 0
2022-02-17 22:36:13,857 ERROR   Read Error: 0
2022-02-17 22:36:14,859 ERROR   Read Error: 0
2022-02-17 22:36:15,862 ERROR   Read Error: Function code is not an allowable action for the server.
<snip>
2022-02-17 22:37:26,909 ERROR   Read Error: 255
2022-02-17 22:37:27,832 ERROR   Read Error: 94
2022-02-17 22:37:28,835 ERROR   Read Error: 0
2022-02-17 22:38:34,531 DEBUG   Request registers: [169, 172, 175, 178] glen=10. Response [457, 457, 0, 457, 65523, 0, 0, 0, 2035, 444] len=10. regs={169: 457, 170: 457, 171: 0, 172: 457, 173: 65523, 174: 0, 175: 0, 176: 0, 177: 2035, 178: 444}
2022-02-17 22:38:34,531 DEBUG   Grid load low=457 high=0 value=NoneW
2022-02-17 22:38:34,531 DEBUG   Grid CT load low=457 high=0 value=NoneW
2022-02-17 22:38:34,532 DEBUG   Inverter Output low=0 high=0 value=NoneW
2022-02-17 22:38:34,532 DEBUG   Load Power low=444 high=0 value=NoneW
2022-02-17 22:38:34,611 ERROR   Read Error: 0
2022-02-17 22:38:34,611 INFO    Retrying individual sensors: ['Battery SOC:last', 'Grid load:step:mean', 'Grid CT load:step:mean', 'Inverter Output:step:mean', 'Load Power:step:mean', 'PV1 power:step:mean', 'PV2 power:step:mean', 'Battery power:step:mean', 'Battery current:step:mean', 'Battery voltage:mean']
2022-02-17 22:38:34,633 ERROR   Read ErrorBattery SOC: Function code is not an allowable action for the server.
2022-02-17 22:38:34,655 ERROR   Read ErrorGrid load: 0
2022-02-17 22:38:34,742 ERROR   Read ErrorGrid CT load: 0

If I take away - inverter_output, the errors are less frequent and consistently:

<snip>
2022-02-17 22:41:23,309 ERROR   Read Error: 0
2022-02-17 22:41:24,246 ERROR   Read Error: 0
2022-02-17 22:41:25,248 ERROR   Read Error: 
2022-02-17 22:41:26,250 ERROR   Read Error: 0

I am starting to think that reading certain registers makes things fail more randomly with random errors and conversely not reading certain registers makes the failures less frequent and more consistent. The less registers are read, the less errors there are.

More to follow. Looks like I will be trying out many permutations until I can spot a pattern.

Ivan-L commented 2 years ago

So just to confirm, the new modbus library being used appears to work much better despite the read errors that occur.

I have switched over totally to the Dev version of the add-on. Below you can see that the sensors become unavailable far less. Unavailability due to read errors and the restarting of the add-on are indicated by blank gaps in the sensor lines. The geyser sensor is not from the inverter. image

So things are definitely looking better and I will use this version for my day to day use, though I still need to figure out what causes the occasional read errors. I have ruled out the cable since I am still using the pre-moulded RS232 cable that came with the W630. I have also ruled out the solarman dongle interfering because it is unplugged and the W630 is plugged in its place. The firmware of the Deye inverter has recently been upgraded by Deye themselves. So either it's the W630 causing the occasional read errors, or it's the inverter. I guess the only way to tell is to procure a Raspberry Pi and RS485/RS232 converter, use that instead of the W630, and observe the results.

For what it's worth, here are the final sensors that I have configured:

SENSORS:
  - battery_soc
  - grid_load
  - grid_voltage
  - load_power
  - pv1_power
  - pv2_power
  - battery_power
  - battery_current
  - battery_voltage
  - day_grid_import
  - day_grid_export
  - day_pv_energy
  - day_battery_charge
  - day_battery_discharge

The occasional error which occurs:

2022-02-18 06:54:45,334 ERROR   Read Error: 0
2022-02-18 06:54:46,324 ERROR   Read Error: 0
2022-02-18 06:54:47,334 ERROR   Read Error: 139
2022-02-18 06:54:48,344 ERROR   Read Error: unpack requires a buffer of 17 bytes
Traceback (most recent call last):
  File "/usr/src/app/./run.py", line 155, in read
    await SUNSYNK.read(sensors)
  File "/usr/src/app/usunsynk.py", line 32, in read
    r_r = await self.client.read_holding_registers(
  File "/usr/src/app/async_modbus.py", line 172, in read_holding_registers
    return await self._send_message(request)
  File "/usr/src/app/async_modbus.py", line 133, in _send_message
    return await self.protocol._async_send_message(
  File "/usr/src/app/async_modbus.py", line 43, in send_message_tcp
    return tcp.parse_response_adu(response_error_adu + response_remainder, adu)
  File "/usr/local/lib/python3.9/site-packages/umodbus/client/tcp.py", line 235, in parse_response_adu
    function = create_function_from_response_pdu(resp_pdu, req_adu)
  File "/usr/local/lib/python3.9/site-packages/umodbus/functions.py", line 138, in create_function_from_response_pdu
    return function.create_from_response_pdu(resp_pdu, req_pdu)
  File "/usr/local/lib/python3.9/site-packages/umodbus/functions.py", line 550, in create_from_response_pdu
    bytes_ = struct.unpack(fmt, resp_pdu[2:])
struct.error: unpack requires a buffer of 17 bytes

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/app/./run.py", line 250, in <module>
    LOOP.run_until_complete(main(LOOP))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/src/app/./run.py", line 236, in main
    await polltask
  File "/usr/src/app/./run.py", line 228, in poll_sensors
    if await read([f.sensor for f in fsensors]):
  File "/usr/src/app/./run.py", line 168, in read
    raise Exception(f"Multiple Modbus read errors: {err}") from err
Exception: Multiple Modbus read errors: unpack requires a buffer of 17 bytes
[cmd] ./run.py exited 1
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[s6-finish] sending all processes the TERM signal.

The 17 bytes referred to in the log above vary, sometimes it's 1 byte, sometimes it's more.

@kellerza I just wanted to thank you for the awesome add-on as well as your responsiveness and willingness to help. I'll keep pursuing this issue, trying to get it resolved, but if you want to close the Github issue so that it does not remain open for an extended period of time, it's fine with me.

Ivan-L commented 2 years ago

@kellerza I saw you put up a PR with some extra error handling and in the meantime I have set up a local version of this plugin so I can diagnose things better. I cherry picked your error handling and added my own verbose output.

I am learning to speak Modbus over TCP now, but it's daunting 😄 I am documenting this here for my own purposes as well.

Modbus ADU and MBAP (source).

The Application Data Unit (ADU) for Modbus messages carried over a TCP/IP are build out of two components: a MBAP header and a PDU. The MBAP header is 7 bytes long

The MBAP header contains the following fields: Field Length (bytes) Description
Transaction identifier 2 Identification of a Modbus request/response transaction.
Protocol identifier 2 Protocol ID, is 0 for Modbus.
Length 2 Number of following bytes
Unit identifier 1 Identification of a remote slave

Read Holding Registers (source)

The payload for the ReadHoldingRegisters function is as follows.

Implement Modbus function code 03. The request PDU with function code 03 must be 5 bytes:

Field Length (bytes)
Function code 1
Starting address 2
Quantity 2

The reponse PDU varies in length, depending on the request.

Field Length (bytes)
Function code 1
Byte count 1
Register values Quantity * 2

Analysis

With the above in mind, here is an extract from my logs with the extra verbose output when no error occurs:

2022-02-20 20:55:55,899 DEBUG   send_message_tcp adu: b'u+\x00\x00\x00\x06\x01\x03\x00\xba\x00\x06'
2022-02-20 20:55:55,968 DEBUG     response_error_adu: b'u+\x00\x00\x00\x0f\x01\x03\x0c'
2022-02-20 20:55:55,968 DEBUG     response_error_adu[7:]: b'\x03\x0c'
2022-02-20 20:55:55,968 DEBUG     response_remainder: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x07\x00\x0f'
2022-02-20 20:55:55,969 DEBUG   Request registers: [186, 187, 190, 191] glen=6. Response [0, 0, 0, 0, 7, 15] len=6. regs={186: 0, 187: 0, 188: 0, 189: 0, 190: 7, 191: 15}
2022-02-20 20:55:55,969 DEBUG   PV1 power low=0 high=0 value=0W
2022-02-20 20:55:55,969 DEBUG   PV2 power low=0 high=0 value=0W
2022-02-20 20:55:55,969 DEBUG   Battery power low=7 high=0 value=7W
2022-02-20 20:55:55,969 DEBUG   Battery current low=15 high=0 value=0.15A

Here is my interpretation of the above.

Addon says: b'u+\x00\x00\x00\x06\x01\x03\x00\xba\x00\x06', which is:

Transaction Id: 29995 Protocol: 0 Length: 6 Unit id: 1 Function code: 3 Start address: 186 Quantity: 6 (note here that only 4 registers were requested by the add-on, but they were not contiguous, so 6 satisfies the query)

The inverter responds: b'u+\x00\x00\x00\x0f\x01\x03\x0c' (first part ADU) and b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x07\x00\x0f' (remainder), which is:

Transaction Id: 29995 Protocol: 0 Length: 15 Unit id: 1 Function code: 3 Byte count: 12 Values: (0, 0, 0, 0, 7, 15)

The above all looks great for successful reads.

By the way, I used the following script to decode the above:

import struct

request = b'u+\x00\x00\x00\x06\x01\x03\x00\xba\x00\x06'
response = b'u+\x00\x00\x00\x0f\x01\x03\x0c'
remainder = b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x07\x00\x0f'

print('request:')
request_mbap = request[:7]
request_pdu = request[7:]
print(struct.unpack('>HHHB', request_mbap))
print(struct.unpack('>BHH', request_pdu))

print('response:')
response_mbap = response[:7]
response_pdu = response[7:]
print(struct.unpack('>HHHB', response_mbap))
print(struct.unpack('>BB', response_pdu[:2]))
response_func_code, response_byte_count = struct.unpack('>BB', response_pdu[:2])

print('remainder:')
unpack_str = '>' + 'H' * int(response_byte_count / 2)
values = struct.unpack(unpack_str, remainder)
print(values)

Now for the unsuccessful reads.

Ivan-L commented 2 years ago

Now here is an extract from the logs where things failed.

2022-02-20 21:59:43,239 DEBUG   send_message_tcp adu (transaction ID b'\x92-'): b'\x92-\x00\x00\x00\x06\x01\x03\x00\xb7\x00\t'
2022-02-20 21:59:43,298 DEBUG     response_error_adu: b'\x92-\x00\x00\x00\x0e\x01\x03\x12'
2022-02-20 21:59:43,299 DEBUG     response_error_adu[7:]: b'\x03\x12'
2022-02-20 21:59:43,318 DEBUG     response_remainder: b'\x13P\x00U\x02&\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05\x00'
2022-02-20 21:59:43,320 DEBUG   Request registers: [183, 186, 187, 190, 191] glen=9. Response [4944, 85, 550, 0, 0, 0, 0, 0, 1280] len=9. regs={183: 4944, 184: 85, 185: 550, 186: 0, 187: 0, 188: 0, 189: 0, 190: 0, 191: 1280}
2022-02-20 21:59:43,321 DEBUG   PV1 power low=0 high=0 value=0W
2022-02-20 21:59:43,321 DEBUG   PV2 power low=0 high=0 value=0W
2022-02-20 21:59:43,321 DEBUG   Battery power low=0 high=0 value=8W
2022-02-20 21:59:43,321 DEBUG   Battery current low=1280 high=0 value=0.17A
2022-02-20 21:59:43,321 DEBUG   Battery voltage low=4944 high=0 value=49.44V
2022-02-20 21:59:44,056 DEBUG   send_message_tcp adu (transaction ID b'I\xe7'): b'I\xe7\x00\x00\x00\x06\x01\x03\x00\xa9\x00\x01'
2022-02-20 21:59:44,118 DEBUG     response_error_adu: b'\x00\x08\x00\x11I\xe7\x00\x00\x00'
2022-02-20 21:59:44,119 DEBUG     response_error_adu[7:]: b'\x00\x00'
2022-02-20 21:59:44,119 ERROR   Read Error: Exception('(1,169,1) 0')
2022-02-20 21:59:45,058 DEBUG   send_message_tcp adu (transaction ID b'\xed\xba'): b'\xed\xba\x00\x00\x00\x06\x01\x03\x00\xa9\x00\x01'
2022-02-20 21:59:45,119 DEBUG     response_error_adu: b'\x05\x01\x03\x02\x026\xed\xba\x00'
2022-02-20 21:59:45,120 DEBUG     response_error_adu[7:]: b'\xba\x00'
2022-02-20 21:59:45,120 ERROR   Read Error: Exception('(1,169,1) 0')
2022-02-20 21:59:46,061 DEBUG   send_message_tcp adu (transaction ID b'\xb4\x91'): b'\xb4\x91\x00\x00\x00\x06\x01\x03\x00\xa9\x00\x01'
2022-02-20 21:59:46,119 DEBUG     response_error_adu: b'\x00\x00\x05\x01\x03\x02\x026\xb4'
2022-02-20 21:59:46,119 DEBUG     response_error_adu[7:]: b'6\xb4'
2022-02-20 21:59:46,119 ERROR   Read Error: Exception('(1,169,1) 180')
2022-02-20 21:59:47,064 DEBUG   send_message_tcp adu (transaction ID b'h\xf5'): b'h\xf5\x00\x00\x00\x06\x01\x03\x00\xa9\x00\x01'
2022-02-20 21:59:47,064 DEBUG     response_error_adu: b'\x91\x00\x00\x00\x05\x01\x03\x02\x02'
2022-02-20 21:59:47,064 DEBUG     response_error_adu[7:]: b'\x02\x02'
2022-02-20 21:59:47,128 DEBUG     response_remainder: b'6h'
2022-02-20 21:59:47,129 WARNING Did not complete read, only read 13/1
2022-02-20 21:59:47,129 DEBUG   Request registers: [169] glen=1. Response [0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1] len=13. regs={169: 0, 170: 1, 171: 1, 172: 0, 173: 1, 174: 1, 175: 0, 176: 0, 177: 0, 178: 1, 179: 0, 180: 1, 181: 1}
2022-02-20 21:59:47,129 DEBUG   Grid load low=0 high=0 value=566W
2022-02-20 21:59:47,129 DEBUG   Load Power low=1 high=0 value=548W
2022-02-20 21:59:47,130 DEBUG   send_message_tcp adu (transaction ID b'\x84Y'): b'\x84Y\x00\x00\x00\x06\x01\x03\x00\xb2\x00\x01'
2022-02-20 21:59:47,130 DEBUG     response_error_adu: b'\xf5\x00\x00\x00\x05\x01\x03\x02\x02'
2022-02-20 21:59:47,130 DEBUG     response_error_adu[7:]: b'\x02\x02'
2022-02-20 21:59:47,198 DEBUG     response_remainder: b';\x84'
2022-02-20 21:59:47,199 WARNING Did not complete read, only read 14/1
2022-02-20 21:59:47,199 DEBUG   Request registers: [178] glen=1. Response [1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1] len=14. regs={178: 1, 179: 1, 180: 0, 181: 1, 182: 1, 183: 1, 184: 0, 185: 0, 186: 1, 187: 0, 188: 0, 189: 0, 190: 0, 191: 1}
2022-02-20 21:59:47,199 DEBUG   Load Power low=1 high=0 value=1W
2022-02-20 21:59:47,199 DEBUG   PV1 power low=1 high=0 value=0W
2022-02-20 21:59:47,200 DEBUG   PV2 power low=0 high=0 value=0W
2022-02-20 21:59:47,200 DEBUG   Battery power low=0 high=0 value=0W
2022-02-20 21:59:47,200 DEBUG   Battery current low=1 high=0 value=12.8A
2022-02-20 21:59:47,200 DEBUG   send_message_tcp adu (transaction ID b'\x00Z'): b'\x00Z\x00\x00\x00\x06\x01\x03\x00\xba\x00\x06'
2022-02-20 21:59:47,200 DEBUG     response_error_adu: b'Y\x00\x00\x00\x05\x01\x03\x02\x02'
2022-02-20 21:59:47,201 DEBUG     response_error_adu[7:]: b'\x02\x02'
2022-02-20 21:59:47,279 DEBUG     response_remainder: b'(\x00Z\x00\x00\x00\x0f\x01\x03\x0c\x00\x00'
2022-02-20 21:59:47,279 ERROR   Read Error: Exception('(1,186,6) unpack requires a buffer of 2 bytes')
Traceback (most recent call last):
  File "/usr/src/app/usunsynk.py", line 34, in read
    r_r = await self.client.read_holding_registers(
  File "/usr/src/app/async_modbus.py", line 182, in read_holding_registers
    return await self._send_message(request)
  File "/usr/src/app/async_modbus.py", line 143, in _send_message
    return await self.protocol._async_send_message(
  File "/usr/src/app/async_modbus.py", line 53, in send_message_tcp
    return tcp.parse_response_adu(response_error_adu + response_remainder, adu)
  File "/usr/local/lib/python3.9/site-packages/umodbus/client/tcp.py", line 235, in parse_response_adu
    function = create_function_from_response_pdu(resp_pdu, req_adu)
  File "/usr/local/lib/python3.9/site-packages/umodbus/functions.py", line 138, in create_function_from_response_pdu
    return function.create_from_response_pdu(resp_pdu, req_pdu)
  File "/usr/local/lib/python3.9/site-packages/umodbus/functions.py", line 550, in create_from_response_pdu
    bytes_ = struct.unpack(fmt, resp_pdu[2:])
struct.error: unpack requires a buffer of 2 bytes

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/app/./run.py", line 155, in read
    await SUNSYNK.read(sensors)
  File "/usr/src/app/usunsynk.py", line 38, in read
    raise Exception(f"({self.unit_id},{grp[0]},{glen}) {err}")
Exception: (1,186,6) unpack requires a buffer of 2 bytes

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/app/./run.py", line 250, in <module>
    LOOP.run_until_complete(main(LOOP))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/src/app/./run.py", line 236, in main
    await polltask
  File "/usr/src/app/./run.py", line 228, in poll_sensors
    if await read([f.sensor for f in fsensors]):
  File "/usr/src/app/./run.py", line 168, in read
    raise Exception(f"Multiple Modbus read errors: {err}") from err
Exception: Multiple Modbus read errors: (1,186,6) unpack requires a buffer of 2 bytes
[cmd] ./run.py exited 1
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[s6-finish] sending all processes the TERM signal.

Note the first read is successful, I included it for comparison purposes.

request:
(37421, 0, 6, 1)
(3, 183, 9)
response:
(37421, 0, 14, 1)
(3, 18)
remainder:
(4944, 85, 550, 0, 0, 0, 0, 0, 1280)

The second read failed.

request:
(18919, 0, 6, 1)
(3, 169, 1)
response:
(8, 17, 18919, 0)

Whoa, the response transaction ID does not match the request transaction ID. Furthermore, the response unit ID is 0. Finally, the protocol id is 17, not 0! This is the cause of the issues. Now I need to find out how we end up in this state. The other possibility is that there are an extra 4 bytes which shouldn't be there in the response.

Ivan-L commented 2 years ago

I got it!

When the add-on reads registers [183, 186, 187, 190, 191], the subsequent read consistently fails and causes the above error. When only reading registers [186, 187, 190, 191], subsequent reads succeed. The add-on sometimes picks the one query over the other, I am not certain what the criteria is.

Register 185 is undefined in the Sunsynk Modbus docs. The add-on reads all registers in a non-contiguous range provided that the gap is not greater than 3. My strong submission is that reading register 185 causes an extra leading 4 bytes to be present with the next read, which causes that read to be invalid. The add-on does read that register in the case of the first query above.

The solution? I forced the add-on to only read contiguous registers by setting allow_gap = 1 when uSunsynk calls groupSensors here.

async def read(self, sensors: Sequence[Sensor]) -> None:
        """Read a list of sensors."""
        for grp in group_sensors(sensors, allow_gap = 1):
            glen = grp[-1] - grp[0] + 1

Therefore if the add-on needs to read registers [183, 186, 187, 190, 191] it will read them in 3 batches; [183], [186, 187] and [190, 191]. This does not appear to introduce any noticeable performance penalty, it in fact appears to have solved all of my problems!

Look at these beautiful solid lines without any gaps - the add-on has not restarted at all because the error has not occurred even once in the last 30 minutes!!

image

@kellerza I am not sure whether you would like to make the above change (allow_gap = 1) permanent for the add-on - my suggestion would be yes because that would avoid unnecessarily reading registers which users have not configured and more importantly it will avoid reading registers which cause side effects as described above.

kellerza commented 2 years ago

Seems like you nailed it @Ivan-L !!

The intention of allow_gap was to get a balance between the protocol overheads and the data read. If there are unintended consequences when reading undefined registers, like 185 above, we really should not be doing it!

Since there are filters involved we do not need to read everything every second (an ever increasing kwh measurement is a good example, its safe to skip a few) and the amount of registers that needs to be read/output of group_sensors might change...

kellerza commented 2 years ago

You can try the latest dev version

biggest changes (apart from allow_gap=1)

SENSORS:
  - essential_power
  - non_essential_power
PROFILES:
  - system_mode
MODBUS_SERVER_ID: 1
Ivan-L commented 2 years ago

Thanks @kellerza !

Not sure if I did something wrong here though, but the latest version crashes:

2022-02-22 06:50:45,579 INFO    MQTT: Connecting to sunsynk@core-mosquitto:1883
2022-02-22 06:50:45,617 INFO    MQTT: Connection successful
2022-02-22 06:50:46,156 WARNING Removing HASS MQTT discovery info homeassistant/select/XXXXXX/system_mode_voltages/config
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 3452, in _thread_main
    self.loop_forever(retry_first_connection=True)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 1779, in loop_forever
    rc = self.loop(timeout, max_packets)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 1181, in loop
    rc = self.loop_read(max_packets)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 1572, in loop_read
    rc = self._packet_read()
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 2310, in _packet_read
    rc = self._packet_handle()
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 2936, in _packet_handle
    return self._handle_publish()
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 3216, in _handle_publish
    self._handle_on_message(message)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 3444, in _handle_on_message
    self.on_message(self, self._userdata, message)
  File "/usr/src/app/mqtt.py", line 247, in __on_message
    _loop.create_task(self.publish(topic, None, retain=True))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 433, in create_task
    task = tasks.Task(coro, loop=self, name=name)
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 748, in call_soon
    self._check_thread()
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 785, in _check_thread
    raise RuntimeError(
RuntimeError: Non-thread-safe operation invoked on an event loop other than the current one
2022-02-22 06:50:46,201 ERROR   Task was destroyed but it is pending!
source_traceback: Object created at (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 912, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 3452, in _thread_main
    self.loop_forever(retry_first_connection=True)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 1779, in loop_forever
    rc = self.loop(timeout, max_packets)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 1181, in loop
    rc = self.loop_read(max_packets)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 1572, in loop_read
    rc = self._packet_read()
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 2310, in _packet_read
    rc = self._packet_handle()
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 2936, in _packet_handle
    return self._handle_publish()
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 3216, in _handle_publish
    self._handle_on_message(message)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 3444, in _handle_on_message
    self.on_message(self, self._userdata, message)
  File "/usr/src/app/mqtt.py", line 247, in __on_message
    _loop.create_task(self.publish(topic, None, retain=True))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 433, in create_task
    task = tasks.Task(coro, loop=self, name=name)
task: <Task pending name='Task-3' coro=<MQTTClient.publish() running at /usr/src/app/mqtt.py:187> created at /usr/local/lib/python3.9/asyncio/base_events.py:433>
/usr/local/lib/python3.9/threading.py:956: RuntimeWarning: coroutine 'MQTTClient.publish' was never awaited
  self._invoke_excepthook(self)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Traceback (most recent call last):
  File "/usr/src/app/filter.py", line 113, in tick
    self.active = [self.list[self.idx]]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/app/./run.py", line 252, in <module>
    LOOP.run_until_complete(main(LOOP))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/src/app/./run.py", line 238, in main
    await polltask
  File "/usr/src/app/./run.py", line 224, in poll_sensors
    RROBIN.tick()
  File "/usr/src/app/filter.py", line 116, in tick
    self.active = [self.list[0]]
IndexError: list index out of range
[cmd] ./run.py exited 1
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[s6-finish] sending all processes the TERM signal.
kellerza commented 2 years ago

For now add at least one sensor with round_robin filter

try battery_soc or prog1_time

Ivan-L commented 2 years ago

I already had battery_soc. It looks like there is a failure removing old (removed) sensor discovery info. I worked around this by deleting the device created by the add-on and having it recreated. Now I'm only left with this error:

2022-02-22 08:59:13,370 INFO    MQTT: Connecting to sunsynk@core-mosquitto:1883
2022-02-22 08:59:13,397 INFO    MQTT: Connection successful
Traceback (most recent call last):
  File "/usr/src/app/filter.py", line 113, in tick
    self.active = [self.list[self.idx]]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/app/./run.py", line 252, in <module>
    LOOP.run_until_complete(main(LOOP))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/src/app/./run.py", line 238, in main
    await polltask
  File "/usr/src/app/./run.py", line 224, in poll_sensors
    RROBIN.tick()
  File "/usr/src/app/filter.py", line 116, in tick
    self.active = [self.list[0]]
IndexError: list index out of range
[cmd] ./run.py exited 1
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[s6-finish] sending all processes the TERM signal.

Config:

SENSORS:
  - battery_soc
  - grid_power
  - grid_voltage
  - load_power
  - essential_power
  - non_essential_power
  - pv1_power
  - pv2_power
  - battery_power
  - battery_current
  - battery_voltage
  - day_grid_import
  - day_grid_export
  - day_pv_energy
  - day_battery_charge
  - day_battery_discharge
PROFILES:
  - system_mode
  - system_mode_voltages
kellerza commented 2 years ago

Sensor adding/removing should be working much better in this release as well

I updated it, but serial / fault or prog* are round_robin filters by default

Ivan-L commented 2 years ago

Thanks, I get further but now I get the following error. In hindsight I probably should have created a separate issue for these new errors.

2022-02-22 09:40:36,034 INFO    Profile System Mode action UPDATE
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/sunsynk/usunsynk.py", line 49, in read
    r_r = await self.client.read_holding_registers(
  File "/usr/local/lib/python3.9/site-packages/async_modbus/core.py", line 174, in read_holding_registers
    return await self._send_message(request)
  File "/usr/local/lib/python3.9/site-packages/async_modbus/core.py", line 135, in _send_message
    return await self.protocol._async_send_message(
  File "/usr/local/lib/python3.9/site-packages/async_modbus/core.py", line 36, in send_message_tcp
    tcp.raise_for_exception_adu(response_error_adu)
  File "/usr/local/lib/python3.9/site-packages/umodbus/client/tcp.py", line 247, in raise_for_exception_adu
    pdu_to_function_code_or_raise_error(resp_pdu)
  File "/usr/local/lib/python3.9/site-packages/umodbus/functions.py", line 118, in pdu_to_function_code_or_raise_error
    raise error_code_to_exception_map[error_code]
KeyError: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/app/./run.py", line 252, in <module>
    LOOP.run_until_complete(main(LOOP))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/src/app/./run.py", line 246, in main
    await profile_poll(SUNSYNK)
  File "/usr/src/app/profiles.py", line 179, in profile_poll
    await ss.read(profile.sensors)
  File "/usr/local/lib/python3.9/site-packages/sunsynk/usunsynk.py", line 53, in read
    raise Exception(  # pylint: disable=raise-missing-from
[cmd] ./run.py exited 1
[cont-finish.d] executing container finish scripts...
Exception: (1,268,12) 0
[cont-finish.d] done.
[s6-finish] waiting for services.
[s6-finish] sending all processes the TERM signal.

I then removed - system_mode_voltages to avoid that error, which caused a new error:

2022-02-22 09:42:27,169 WARNING Removing HASS MQTT discovery info homeassistant/select/2107307119/system_mode_voltages/config
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 3452, in _thread_main
    self.loop_forever(retry_first_connection=True)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 1779, in loop_forever
    rc = self.loop(timeout, max_packets)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 1181, in loop
    rc = self.loop_read(max_packets)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 1572, in loop_read
    rc = self._packet_read()
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 2310, in _packet_read
    rc = self._packet_handle()
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 2936, in _packet_handle
    return self._handle_publish()
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 3216, in _handle_publish
    self._handle_on_message(message)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 3444, in _handle_on_message
    self.on_message(self, self._userdata, message)
  File "/usr/src/app/mqtt.py", line 247, in __on_message
    _loop.create_task(self.publish(topic, None, retain=True))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 433, in create_task
    task = tasks.Task(coro, loop=self, name=name)
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 748, in call_soon
    self._check_thread()
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 785, in _check_thread
    raise RuntimeError(
RuntimeError: Non-thread-safe operation invoked on an event loop other than the current one
2022-02-22 09:42:27,221 ERROR   Task was destroyed but it is pending!
source_traceback: Object created at (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 912, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 3452, in _thread_main
    self.loop_forever(retry_first_connection=True)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 1779, in loop_forever
    rc = self.loop(timeout, max_packets)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 1181, in loop
    rc = self.loop_read(max_packets)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 1572, in loop_read
    rc = self._packet_read()
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 2310, in _packet_read
    rc = self._packet_handle()
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 2936, in _packet_handle
    return self._handle_publish()
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 3216, in _handle_publish
    self._handle_on_message(message)
  File "/usr/local/lib/python3.9/site-packages/paho/mqtt/client.py", line 3444, in _handle_on_message
    self.on_message(self, self._userdata, message)
  File "/usr/src/app/mqtt.py", line 247, in __on_message
    _loop.create_task(self.publish(topic, None, retain=True))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 433, in create_task
    task = tasks.Task(coro, loop=self, name=name)
task: <Task pending name='Task-3' coro=<MQTTClient.publish() running at /usr/src/app/mqtt.py:187> created at /usr/local/lib/python3.9/asyncio/base_events.py:433>
/usr/local/lib/python3.9/threading.py:956: RuntimeWarning: coroutine 'MQTTClient.publish' was never awaited
  self._invoke_excepthook(self)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
2022-02-22 09:42:29,451 INFO    Profile System Mode action UPDATE
2022-02-22 09:42:29,623 INFO    Updating entity options: ['UPDATE']
2022-02-22 09:42:29,626 INFO    publish SUNSYNK/status/XXXXX/system_mode new_839
2022-02-22 09:42:29,743 ERROR   Read Error: (1,166,2) 0

Deleted the device, restarted the add-on.

Now things seem back to normal, just migrating from grid_load to grid_power at the moment.

I post the above because other users may get these issues as well.

kellerza commented 2 years ago

It becomes really interesting once you add your inverter to home assistant's Energy management (then you dont need the day_ sensors, as you get day/month year and much more)

see here for sensors and then configure in your HASS energy management

kellerza commented 2 years ago

The KeyError 0 is something unique returned from the USB-W630 (it is not a valid Modbus error!) but will try catch that and fail more gracefully.

For now try removing system_mode as well, just an empty PROFILES: [] - you could try adding the individual sensos that is part of profiles to see if they read properly

Ivan-L commented 2 years ago

So the KeyError 0 issue was previously caused by 4 leading bytes being present in a next subsequent read after reading a bad register (185). I wouldn't specifically catch that error. I actually gave that some thought. It might be better to compare modbus tcp request and response transaction IDs. If they do not match, then thrown an error. The KeyError 0 is just a side effect of this. I'll investigate in more detail to see why this happened for Profile System Mode action UPDATE

kellerza commented 2 years ago

So the KeyError 0 issue was previously caused by 4 leading bytes being present in a next subsequent read after reading a bad register (185). I wouldn't specifically catch that error. I actually gave that some thought. It might be better to compare modbus tcp request and response transaction IDs

I'll leave this one as a PR for you, since I'm not really able to generate these faults (at least not from what I've seen with direct serial access or via mbusb)

Ivan-L commented 2 years ago

Happy with that, will look at doing a PR some time this week 👍 Also happy to tackle the new error if I am the only one getting it since I now know how to work around it now. Though the allow_gap = 1 change did fix all of my previous errors.

Thanks for all of your help @kellerza !

P.S. I love the new essential & non-essential load sensors, you can see them in action here: image

kellerza commented 2 years ago

In my setup I have a SMA energy meter wired to the essentials and non-essentials and they give slightly different readings

The value on the house&pool is from Sunsynk and the main value to the icon is from the SMA energy meter (but both of these are filtered... so very likely that what I'm seeing is a result of that), but would appreciate feedback

image
RoganDawes commented 1 year ago

So I've just tried the following now to rule out a few things.

I stopped my custom python scripts from polling the solarman WiFi dongle and disconnected the dongle entirely. I then disconnected the USR-630 from the inverter BMS RS485 port and connected the USR-630 to the RS232 port where the solarman WiFi dongle was connected to. I used the serial cable that came with the USR-630.

Hi @Ivan-L

I just wanted to confirm whether you were able to communicate with the inverter using the RS232 port, or only using the RS485 port? The above sounds a lot like you were using the RS232 port? If so, can you confirm the protocol that was being used?

Thanks so much!

Ivan-L commented 1 year ago

Hi @RoganDawes,

With the USR-W630 you are able to communicate either through the RS232 port or the RS485 (BMS485) port on the inverter, i.e. the ports are interchangeable. The only extra gotcha with the RS485/BMS485 port is that you should be using Lithium Mode 0 on the inverter. If you are using anything other than 0, then the behaviour of the RS485/BMS485 port may change and your only option may be using the RS232 for monitoring, which would mean that the data logger/dongle would need to be disconnected. I used the RS232 for many months because I found it more stable.

I have since stopped using the USR-W630 and have switched to a Pi Zero W 2 instead and now use the RS485/BMS485 port for monitoring.

RoganDawes commented 1 year ago

That's great, thank you for responding. That suggests that the dongle is indeed using modbus as a protocol, and it might be possible/easy to either convince the dongle to connect to your own server by providing a hosts entry on the router, or via iptables rules. Or alternatively, just flashing the dongle with your own firmware.

Ivan-L commented 1 year ago

it might be possible/easy to either convince the dongle to connect to your own server by providing a hosts entry on the router, or via iptables rules. Or alternatively, just flashing the dongle with your own firmware.

Why would you want to do that though?

RoganDawes commented 1 year ago

it might be possible/easy to either convince the dongle to connect to your own server by providing a hosts entry on the router, or via iptables rules. Or alternatively, just flashing the dongle with your own firmware.

Why would you want to do that though?

To reuse existing hardware that is known working, rather than buying an RS485 dongle and making your own cable, then troubleshooting it? To eliminate the always active WiFi access point that the Sunsynk dongle leaves broadcasting with a psk of 12345678, that might compromise your actual WiFi credentials?

Ivan-L commented 1 year ago

To reuse existing hardware that is known working, rather than buying an RS485 dongle and making your own cable, then troubleshooting it?

I'm not sure whether you'd be able to use the dongle as a pure Modbus TCP to Modbus RTU gateway, like say mbusd on a PI or a USR-W630, but I stand to be corrected. I initially started by polling the data logger with some custom Python scripts but quickly found that the current add-on was much more reliable in terms of getting data every second. The data logger hardware felt flimsy and by polling it, it actually started uploading less data to the cloud, i.e. there would be gaps in the data. Also the dongle would become unresponsive more often the more aggressive I made the custom Python scripts at polling. In the end I left the data logger to only upload to Solarman and use it for the odd firmware update.

There have been some projects which have continued to try to poll the dongle, like the following, though I have not tested them since I'm so happy with the current add-on (especially with writable sensors): https://github.com/StephanJoubert/home_assistant_solarman

If you do craft some custom firmware for the data logger or manage to put it to good use, then do let us know, that would be very interesting 😄 I see that there have been some hacking attempts here: https://powerforum.co.za/topic/12604-sunsynk-wifi-dongle-hacking/ But creating a cable and using a PI (or other Modbus gateway) seems simpler. For me at least, I can use the PI for monitoring the BMS in addition to monitoring the inverter.

To eliminate the always active WiFi access point that the Sunsynk dongle leaves broadcasting with a psk of 12345678

I wasn't aware of this. I'll see if I can repro this on my side as that would be very problematic if that is the case.