pymodbus-dev / pymodbus

A full modbus protocol written in python
Other
2.28k stars 934 forks source link

Remove a forced close() and reconnect() on communication error #2269

Closed nrodday closed 3 months ago

nrodday commented 3 months ago

Versions

Python: 3.12.3
OS: Debian 12 - 6.6.28+rpt-rpi-v8
Pymodbus: 3.6.9
Modbus Hardware (if used): RelayBoard R4M9B12, PH Sensor, Temp Sensor

Pymodbus Specific

I opened an issue earlier, so sorry to bug you again but I believe there is a problem with the pymodbus library.

https://github.com/pymodbus-dev/pymodbus/issues/2265

There was also a very much related discussion here: https://github.com/pymodbus-dev/pymodbus/issues/1654

Description

I am connecting multiple modbus slaves on a single bus. When a single sensor becomes unavailable, pymodbus resends packets according to a retries parameter. Meanwhile, no other communication is possible on the channel.

When the modbus slave cannot be reached after the predefined amount of retries the connection is closed and a reconnect initiated. The problem here is that communication on the channel will be very unreliable of pymodbus keeps reconnecting all the time.

Desired behavior: Simply issue a log message that a slave is not answering, but do NOT close and reconnect the connection. At least, leave the user the choice by adding a parameter whether closing & reconnecting should be done.

In the issue #1654 you state that "All 3.x. versions close connection on an error [...]". A single slave not responding is a very normal setting in a production environment (the slave is a sensor and was simply disconnected). Therefore, my point is that a close() and reconnect() should not happen, only a simple log entry to keep the communication robust.

Code and Logs

2024-07-27 20:40:56.275 DEBUG (MainThread) [pymodbus.logging] Adding transaction 0 2024-07-27 20:40:56.275 DEBUG (MainThread) [pymodbus.logging] Resetting frame - Current Frame in buffer - 2024-07-27 20:40:56.275 DEBUG (MainThread) [pymodbus.logging] send: 0x15 0x3 0x0 0x0 0x0 0x1 0x87 0x1e 2024-07-27 20:40:56.326 DEBUG (MainThread) [pymodbus.logging] Adding transaction 0 2024-07-27 20:40:56.327 DEBUG (MainThread) [pymodbus.logging] Resetting frame - Current Frame in buffer - 2024-07-27 20:40:56.327 DEBUG (MainThread) [pymodbus.logging] send: 0x15 0x3 0x0 0x0 0x0 0x1 0x87 0x1e 2024-07-27 20:40:56.378 DEBUG (MainThread) [pymodbus.logging] Adding transaction 0 2024-07-27 20:40:56.378 DEBUG (MainThread) [pymodbus.logging] Resetting frame - Current Frame in buffer - 2024-07-27 20:40:56.378 DEBUG (MainThread) [pymodbus.logging] send: 0x15 0x3 0x0 0x0 0x0 0x1 0x87 0x1e 2024-07-27 20:40:56.430 DEBUG (MainThread) [pymodbus.logging] Adding transaction 0 2024-07-27 20:40:56.430 DEBUG (MainThread) [pymodbus.logging] Resetting frame - Current Frame in buffer - 2024-07-27 20:40:56.431 DEBUG (MainThread) [pymodbus.logging] send: 0x15 0x3 0x0 0x0 0x0 0x1 0x87 0x1e 2024-07-27 20:40:56.482 DEBUG (MainThread) [pymodbus.logging] Connection lost comm due to Server not responding 2024-07-27 20:40:56.483 DEBUG (MainThread) [pymodbus.logging] callback_disconnected called: Server not responding 2024-07-27 20:40:56.483 ERROR (MainThread) [homeassistant.components.modbus.modbus] Pymodbus: modbus_hub: Error: device: 21 address: 0 -> Modbus Error: [Input/Output] ERROR: No response received after 3 retries 2024-07-27 20:40:56.484 DEBUG (MainThread) [pymodbus.logging] Wait comm 100.0 ms before reconnecting. 2024-07-27 20:40:56.515 DEBUG (MainThread) [homeassistant.components.modbus.modbus] Pymodbus: modbus_hub: Error: device: 21 address: 1 -> Modbus Error: [Connection] Not connected[AsyncModbusSerialClient /dev/ttyAMA5:0] 2024-07-27 20:40:56.547 DEBUG (MainThread) [homeassistant.components.modbus.modbus] Pymodbus: modbus_hub: Error: device: 12 address: 0 -> Modbus Error: [Connection] Not connected[AsyncModbusSerialClient /dev/ttyAMA5:0] 2024-07-27 20:40:56.586 DEBUG (MainThread) [pymodbus.logging] Connecting to /dev/ttyAMA5. 2024-07-27 20:40:56.586 DEBUG (MainThread) [pymodbus.logging] Connecting comm 2024-07-27 20:40:56.587 DEBUG (MainThread) [pymodbus.logging] Connected to comm

janiversen commented 3 months ago

Pymodbus is not resending the request, unless it is told to do so (default is no), so it is the calling application that does that.

Pymodbus will retry the message if configured to do so, which seems to be case here...and if there are no response after the retries the line is closed and reopened.

This is all a wanted behavior, but it seems your app is not doing as you want, so please change how it uses pymodbus.

If you want the library to have a different behavior then feel free to submit a pull request.

janiversen commented 3 months ago

Closing issue, as it seems to be a matter of how the app uses the library, at leas5 the log show no wrong behavior.

janiversen commented 3 months ago

You talk about 1 sensor, but it is actually a whole device that is unresponsive, which is a very rare situation in real life. If it was just the sensor the device would still respond.

However the app is informed about this with the first close/reconnect and should the take steps to avoid repeating the problems.

nrodday commented 3 months ago

I am aware that I can change the retries to 0, which immediately closes the line after a single frame is not answered and reconnects. But it is still performing a reconnect and that is the issue here.

The problem is the close and reconnect. An unresponsive device will cause a whole serial bus with many modbus slaves to become unreliable. When mentioning a sensor I was referring to a PH sensor, which is a device. If someone pulls the plug on one slave (which is a realistic scenario), the rest should keep working without problems. Modbus is an industrial protocol, it should be reliable in most anticipated circumstances.

Also, the app cannot avoid repeating the problem as the device was physically disconnected. The pymodbus library keeps closing & reconnecting since it cannot reach the device, a behavior that is unexpected.

I suggest to only create a log entry or raise an exception that can be taken care of by the application. Another solution could be a switch such that the app is able to define what should happen in case of an unresponsive device (only log entry or close/reconnect).

With the current implementation the pymodbus library is simply not usable in case a slave is unplugged.

janiversen commented 3 months ago

If you want pymodbus to work differently, then feel free to submit a pull request, do not expect others to do your work.

The app is notified about the close/connect and it's up to the app to stop using that slave.

janiversen commented 3 months ago

Before being rude and stating "With the current implementation the pymodbus library is simply not usable in case a slave is unplugged." you should really read the documentation.

The library is transaction oriented, the app sends a request and gets a response, end of story, so no eternal loop! The app is causing the loop by sending the same request again. So please correct your app to handle the situation of an unavailable slave.

The library is widely used, so if this was a problem, it would have been noted and solved a long time ago.

nrodday commented 3 months ago

I did not want to come across as rude, I stated the problem and its consequences. I am also surprised that no one else had such a problem yet.

I understand that the app is causing the loop by resending the packets in certain intervals, which is desired behavior as the slaves should be polled in certain time intervals. The app can not know whether a slave was physically disconnected and can also not know when it is being physically reconnected. Therefore, the app needs to continue polling for a slave that was once configured.

But it is the library that forces a close/reconnect of the serial bus if a packet is not answered. And that is undesired behavior. If the app could just continue polling without the serial bus being closed and reopened, it would not interfere with other modbus slaves on the same bus, which are rendered unavailable during the reconnect phase.

This is also how other modbus implementations handle such situations, which is why am was surprised to find a different behavior here. E.g. I was working with a RevPI and their Modbus implementation simply adds a log entry that a device is not answering and moves on with the next task in the queue. No reconnects at all.

janiversen commented 3 months ago

The app is notified when the connection is closed and when it's connected, so it's simply not true what you say ! The app knows exactly when a request could not be sent, it looks like you never bothered to read how the library actually works.

I know a couple of other libraries, and they all do the same, inform the app....you seems to be mixing libraries with apps, I would expect the app to make a log entry and then continue with the other slaves, but it seems your app is in need of an update, please stop harassing us, and talk with the maintainers of your app.

But of course if the other apps/libraries works as you want, then why not use them !

Again, pymodbus works as most communication libraries by being transactional and not hold a lifetime history, if you want to have that changed, You either make a pull request or convince someone to do it....please do not ask us maintainers change something, that have worked well for many years.

janiversen commented 3 months ago

And just a friendly advice, since I assume your app is still homeassistant. The next version of pymodbus contains API changes as announced.

This means that even if what you want got implemented, you would not be able to use that version until somebody make relatively interesting changes to the modbus integration in homeassistant.

nrodday commented 2 months ago

Thanks for the advice on the future API change, I will keep it in mind.

I went over the documentation again and looked at the code, particularly transport.py with its 'connection_lost' method. You are right when you say that the app gets notified when the connection is closed and reconnected (as long as a callback was specified). But it is not possible to avoid the closing of the serial line after a missing reply. It is only possible to stop the reconnect after the connection was closed. This is an issue for me as it makes the setup unstable, as explained earlier.

I found that when I simply comment the following lines containing the close() call, the library does not close the serial line anymore and together with retries = 0 works as expected and simply runs the next modbus command.

    def connection_lost(self, reason: Exception | None) -> None:
        """Call from asyncio, when the connection is lost or closed.

        :param reason: None or an exception object
        """
        if not self.transport or self.is_closing:
            return
        Log.debug("Connection lost {} due to {}", self.comm_params.comm_name, reason)
        Log.debug('This is it')
        #self.__close()
        #if (
        #    not self.is_server
        #    and not self.listener
        #    and self.comm_params.reconnect_delay
        #):
        #    self.reconnect_task = asyncio.create_task(self.do_reconnect())
        #    self.reconnect_task.set_name("transport reconnect")
        #self.callback_disconnected(reason)

I am, however, unsure about other problems this might introduce. Do you see potential problems here for other errors that require this method? Otherwise one could add a switch here to default to your existing behavior but resort to not closing the line when the switch was specified.

janiversen commented 2 months ago

We close the com port to reset the connection, as different usb rs485 devices are known to block and cause problems.

It does not affect the performance to close/open a tty port nor does it make the communication unstable (since there are no communication when it happens) but it solves a world of problems.

so not closing the port in case of problems is not an option.

If you want a pull request accepted, that do not close the connection, you have to convince that

a) close/open makes the communication unstable (meaning proving professional servers are wrong) b) add features to handle when the comm port needs to be closed.

Btw, your suggestion is only a small part of what would be needed.

nrodday commented 2 months ago

The close/open does not make the communication unstable by itself, but commands that need to be executed during that time slot cannot be sent over the channel and are delayed. With recurring close/reconnects every 5sec or even less (as this is the polling interval of an unresponsive slave) it happens quite frequently that an automated action such as an ON on a relay is falling into such an interval. I need commands to be executed instantly instead of hoping that they might be executed at a later point in time.

Therefore, isn´t my use-case already proving your point a?

As for your point b, I agree - the code has to be altered to accommodate such a change.

janiversen commented 2 months ago

The open/close is not every 5 seconds, but only when the app sends a request to the failing device! remember the lib only does what the app ask it to do....and in this case it seems the app keeps retrying, which clearly is wrong and should be corrected.

You are still mixing what the app does and what the lib does...there are NO loops in the lib (apart from retry), every request from the app is handled seperate.

The library works as it should, but your app have a problem that lead to data not being read....please stop blaming the library for a fault in your app!

I will block this thread, since it's clear you do not really distinguish between the app and the library, but want to force a bug in the app to be solved in the library.