pymodbus-dev / pymodbus

A full modbus protocol written in python
Other
2.13k stars 883 forks source link

Regression of #1924 #2151

Closed AlexanderLanin closed 2 weeks ago

AlexanderLanin commented 1 month ago

The message from #1924 leads to Modbus Error: [Input/Output] ERROR: No response received after 0 retries in pymodbus 3.6.6

It seems all the decoding problems like Modbus Error: [Input/Output] Unable to decode request are "swallowed" internally, and never reported to the caller. Therefore this ticket probably addresses any kind of invalid message.

Any chance of introducing an improved error handling? As far as I've seen the test suite is much better now, so it might be possible to do that without unintended side effects?

Details

On the lowest level within decode() at https://github.com/pymodbus-dev/pymodbus/blob/v3.6.6/pymodbus/factory.py#L315

Note: None raises an exception at https://github.com/pymodbus-dev/pymodbus/blob/v3.6.6/pymodbus/framer/socket_framer.py#L106, but it's not reported to the user. It's catched somewhere else.

Half baked fix:

Wrapping response.decode(data[1:]) within if len(data) > 1: leads to Exception Response(function_code: 132, original_code: 4, message: IllegalFunction) which is already significantly better. Anything is better than No response received after 0 retries :wink:

Although that's not as ideal as IllegalAddress or a simple "decoding error", which would both be much more accurate.

Problem 1: that if should probably contain if function_code <= 0x80: raise ModbusException(...). Problem 2: it should probably lead to calling resetFrame() in some way.

Special handling fix:

Just before https://github.com/pymodbus-dev/pymodbus/blob/v3.6.6/pymodbus/framer/socket_framer.py#L104 introduce following code:

            # Some modbus servers miss a byte at the end in case of errors
            if data[0] & 0x80 and len(data) == 1 and len(self._buffer) > length:
                Log.debug("Malformed error response detected, trying to recover")
                data = self._buffer[self._hsize : length + 1]
                result = self.decoder.decode(data)
                self.resetFrame()
                callback(result)
                return

Safe fix

It seems to me, the safe approach is to somehow forward the "Unable to decode request" exception to the user?! It's already there. The problem is simply, that the user doesn't receive it. See https://github.com/pymodbus-dev/pymodbus/blob/v3.6.6/pymodbus/framer/socket_framer.py#L106

Note: in contrast to the fixes above, this currently leads to (somewhat needless) reconnects of the connection 😢

Disclaimer

I don't have the slightest idea about modbus. I'm just looking at what happens in my very special case, without any knowledge of the protocol or pymodbus.

janiversen commented 1 month ago

You "safe fix' works counter to the automatic retry mechanism.

Furthermore it seems as if you are not testing with dev. We do test for

Anyhow pull requests are as usual welcome.

janiversen commented 1 month ago

your code análisis might be correct, but do not help us at all. We need to see a debug log, that shows what is wrong. Apart from that you need to tell -version you are using

I know we are specifically testing exception responses extensively, so I am confident that there are no length errors at that level.

AlexanderLanin commented 1 month ago

Sorry, I didn't reinclude the information. I'm using AsyncModbusTcpClient on Ubuntu 22 (on WSL2 on Windows 11).

Here are some logs captured with latest dev. As far as I can tell it's exactly the same as 3.6.6. The user is never notified that anything was received, which makes it really hard for me to detect the actual error.

DEBUG:custom_components.sungrow.core.modbus_py:_read_range(read, 5071, 1)
DEBUG:pymodbus.logging:Adding transaction 4
DEBUG:pymodbus.logging:send: 0x0 0x4 0x0 0x0 0x0 0x6 0x1 0x4 0x13 0xce 0x0 0x1
DEBUG:pymodbus.logging:recv: 0x0 0x4 0x0 0x0 0x0 0x2 0x1 0x84 0x2 old_data:  addr=None
DEBUG:pymodbus.logging:Processing: 0x0 0x4 0x0 0x0 0x0 0x2 0x1 0x84 0x2
DEBUG:pymodbus.logging:Factory Response[132]
ERROR:pymodbus.logging:General exception: index out of range
DEBUG:pymodbus.logging:Resetting frame - Current Frame in buffer - 0x0 0x4 0x0 0x0 0x0 0x2 0x1 0x84 0x2
ERROR:asyncio:Fatal error: protocol.data_received() call failed.
protocol: <pymodbus.client.modbusclientprotocol.ModbusClientProtocol object at 0x7fa3887571d0>
transport: <_SelectorSocketTransport fd=7 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 1013, in _read_ready__data_received
    self._protocol.data_received(data)
  File "/workspaces/homeassistant-sungrow/.venv/lib/python3.11/site-packages/pymodbus/transport/transport.py", line 303, in data_received
    self.datagram_received(data, None)
  File "/workspaces/homeassistant-sungrow/.venv/lib/python3.11/site-packages/pymodbus/transport/transport.py", line 337, in datagram_received
    cut = self.callback_data(self.recv_buffer, addr=addr)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/homeassistant-sungrow/.venv/lib/python3.11/site-packages/pymodbus/client/modbusclientprotocol.py", line 81, in callback_data
    self.framer.processIncomingPacket(data, self._handle_response, slave=0)
  File "/workspaces/homeassistant-sungrow/.venv/lib/python3.11/site-packages/pymodbus/framer/old_framer_base.py", line 146, in processIncomingPacket
    self.frameProcessIncomingPacket(single, callback, slave, **kwargs)
  File "/workspaces/homeassistant-sungrow/.venv/lib/python3.11/site-packages/pymodbus/framer/old_framer_socket.py", line 89, in frameProcessIncomingPacket
    raise ModbusIOException("Unable to decode request")
pymodbus.exceptions.ModbusIOException: Modbus Error: [Input/Output] Unable to decode request
DEBUG:pymodbus.logging:Connection lost comm due to Modbus Error: [Input/Output] Unable to decode request
DEBUG:pymodbus.logging:callback_disconnected called: Modbus Error: [Input/Output] Unable to decode request
DEBUG:pymodbus.logging:Wait comm 100.0 ms before reconnecting.
DEBUG:pymodbus.logging:Connecting comm
DEBUG:pymodbus.logging:Connected to comm
DEBUG:pymodbus.logging:Connection lost comm due to Server not responding
DEBUG:pymodbus.logging:callback_disconnected called: Server not responding
ModbusIOException !!! Modbus Error: [Input/Output] ERROR: No response received after 0 retries
DEBUG:custom_components.sungrow.core.modbus_py:Details
Traceback (most recent call last):
  File "/workspaces/homeassistant-sungrow/custom_components/sungrow/core/modbus_py.py", line 134, in _read_range
    rr = await func(address_start - 1, count=address_count, slave=self._slave)  # type: ignore
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/homeassistant-sungrow/.venv/lib/python3.11/site-packages/pymodbus/client/base.py", line 181, in async_execute
    raise ModbusIOException(
pymodbus.exceptions.ModbusIOException: Modbus Error: [Input/Output] ERROR: No response received after 0 retries
janiversen commented 1 month ago

there are something wrong in the library:

ERROR:pymodbus.logging:General exception: index out of range

Should not happen.

The app should actually get ModbusIOException("Unable to decode request"), but let me check that as well.

janiversen commented 1 month ago

Actually your device responds wrongly:

DEBUG:pymodbus.logging:send: 0x0 0x4 0x0 0x0 0x0 0x6 0x1 0x4 0x13 0xce 0x0 0x1
DEBUG:pymodbus.logging:recv: 0x0 0x4 0x0 0x0 0x0 0x2 0x1 0x84 0x2 old_data:  addr=None
DEBUG:pymodbus.logging:Processing: 0x0 0x4 0x0 0x0 0x0 0x2 0x1 0x84 0x2

The MBAP header from your device is incorrect. Byte 4-5 is length. I quote from the standard: """Length - The length field is a byte count of the following fields, including the Unit Identifier and data fields""", so the length should be 0x0 0x3

In the send you see 0x0 0x6, followed by 6 bytes of data (including the unit identifier) But in the recv you see 0x0 0x2, followed by 3 bytes of data

This causes correctly the last byte 0x2 to be cut, which means the Exception (0x84) misses the data byte.

I still wonder why you get "index out of range".

janiversen commented 1 month ago

I found the "index out of range", that is because the data byte is missing.

AlexanderLanin commented 1 month ago

Do you see a way to provide better feedback than "No response received after 0 retries"? There is a response, it's just not parseable.

I would love "Unable to decode request" or IllegalFunction or IllegalAddress to differentiate it in any way from a simple connection-lost-scenario. Maybe even an "Unable to decode request in all retries". e.g. by counting the non-parseable responses.

janiversen commented 1 month ago

No, we try to make the API as simple as possible, people who wants details activate the debug logging.

And the library do not see a response, just garble data.

But as usual, if you want the library do behave differently, pull requests are welcome.

Closing as this is not a bug, nor a feature request we currently will implement except is e.g. you submit a pull request.

AlexanderLanin commented 1 month ago

I'm currently not able to submit a PR as I don't sufficiently understand how pymodbus works at the moment.

I can however share my AsyncModbusTcpClient error handling solution with others. I'm sure it's specific to my situation, but it may give others a starting point:

       try:
           rr = await client.read_input_registers(address_start - 1, count=address_count, slave=slave)
        except pymodbus.exceptions.ModbusIOException as e:
            if e.string != "[Input/Output] ERROR: No response received after 0 retries":
                raise ModbusError("Unknown IO error") from e

            # Wait for internal reconnect to finish, so we can give a better error msg.
            if reconnect_task := client.ctx.reconnect_task:
                await reconnect_task

            if not client.connected:
                raise ModbusError("Connection lost") from None

            if self._ever_succeeded:
                raise ModbusError("Cannot decode response") from None
            else:
                raise ModbusError("Invalid Slave") from None

This way I can react to different situations within my application.

janiversen commented 1 month ago

hmmm your example is not correct, but works, sort of.

If you use Async clients, you should do "rr = await client.read_input_registers" (see our simple_async_client.py example).

You will in many (most) cases get a future back and not the real result.

AlexanderLanin commented 1 month ago

Updated; this was partially pseudo code to make it simple and readable

janiversen commented 1 month ago

I have seen that a number of devices have this error, so I will try to implement it, can you assist in testing the pull request ?

AlexanderLanin commented 1 month ago

Sure, and here is another hint regarding a fix: https://github.com/AlexanderLanin/homeassistant-sungrow/blob/4494ee539261af424ba0372140f3198328fe5c72/custom_components/sungrow/core/modbus_py.py#L29

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 2 weeks ago

This issue was closed because it has been stalled for 5 days with no activity.