pymodbus-dev / pymodbus

A full modbus protocol written in python

Attach expect_response variable to ModbusRequest #2016

Closed jameshilliard closed 2 months ago

jameshilliard commented 4 months ago

This should be a bit cleaner and allows one-shot/broadcast control on a per-request basis.
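For context, a minimal sketch of the kind of per-request control this PR is aiming at, assuming a pymodbus 3.x async client; the expect_response keyword shown here is illustrative of the idea, not the actual diff:

```python
# Hypothetical usage sketch of per-request response control; the
# expect_response keyword is illustrative, not the actual PR diff.
from pymodbus.client import AsyncModbusSerialClient


async def demo(client: AsyncModbusSerialClient) -> None:
    # Normal request: the call awaits the device's response as usual.
    await client.write_register(address=0x10, value=1, slave=1)

    # Hypothetical one-shot request: no response is expected, so the
    # timeout/reconnect machinery would be skipped for this call only.
    await client.write_register(address=0x10, value=1, slave=1, expect_response=False)
```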

jameshilliard commented 4 months ago

> You do not mix broadcast and non-broadcast on a single client.

What's the issue with that?

janiversen commented 4 months ago

At call level, broadcast is already implemented as per the modbus standard: slave=0.
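A minimal sketch of that standard broadcast path, assuming a pymodbus 3.x async serial client; port, baudrate and register values are placeholders, and depending on client configuration the call may still wait for a response, which is the crux of this thread:

```python
# Sketch: standard modbus broadcast via slave id 0 (no device answers a
# broadcast). Port, baudrate and register address are placeholders.
import asyncio

from pymodbus.client import AsyncModbusSerialClient


async def main() -> None:
    client = AsyncModbusSerialClient(port="/dev/ttyUSB0", baudrate=9600)
    await client.connect()
    # slave=0 is the broadcast address defined by the modbus standard.
    await client.write_register(address=0x10, value=1, slave=0)
    client.close()


asyncio.run(main())
```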

jameshilliard commented 4 months ago

> At call level, broadcast is already implemented as per the modbus standard: slave=0.

What is broadcast_enable used for then?

janiversen commented 4 months ago

The modbus protocol is a request/response protocol, so a response is always expected, meaning the parameter is implementing a non-standard feature.

The "async" creates a future, so what is the problem for the app to wait on that future .... that is actually making the API consistent.

mixin.py (the calls) depends heavily on this fact.

If broadcast is activated at client level, the app is expected to call execute directly and deal with any responses if they arrive. Actually the broadcast implementation in pymodbus is very old and has not been updated as e.g. mixin was added.
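A sketch of what "call execute directly" might look like under pymodbus 3.x; the request class import and its signature vary between versions, so treat the wiring below as an assumption:

```python
# Sketch of calling execute directly with a broadcast request; the request
# class import reflects pymodbus 3.x and is an assumption, not a recipe.
from pymodbus.client import AsyncModbusSerialClient
from pymodbus.register_write_message import WriteSingleRegisterRequest


async def broadcast_write(client: AsyncModbusSerialClient) -> None:
    request = WriteSingleRegisterRequest(address=0x10, value=1, slave=0)
    # Hand the raw request to the client, bypassing the high-level mixin
    # calls; any replies that do arrive are the app's to deal with.
    await client.async_execute(request)
```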

jameshilliard commented 4 months ago

> The modbus protocol is a request/response protocol, so a response is always expected, meaning the parameter is implementing a non-standard feature.

AFAIU the spec doesn't really mandate that we do anything specific with responses; it seems to only require that slaves send responses (which buggy devices may not actually do). From my understanding some requests may have extremely long execution times (the spec doesn't appear to place an upper limit on that), and one may not wish to listen for the response and instead continue to fire requests at other devices.

> so what is the problem with the app waiting on that future?

If the future never completes due to not getting a response, then the library causes a spurious disconnect in cases where it's known ahead of time that a response will never be received, so I don't think that would be all that feasible.

> mixin.py (the calls) depends heavily on this fact.

It looks like the calls would just return None if expect_response = False; I'm not sure how that would cause an issue.

> If broadcast is activated at client level, the app is expected to call execute directly and deal with any responses if they arrive.

client.execute calls client.async_execute, which then causes a connection failure if no response is received, from what I can tell. I'm not sure how one could handle that on the application side without some sort of escape hatch here in pymodbus to bypass the response handling codepaths.

janiversen commented 4 months ago

> AFAIU the spec doesn't really mandate that we do anything specific with responses; it seems to only require that slaves send responses (which buggy devices may not actually do). From my understanding some requests may have extremely long execution times (the spec doesn't appear to place an upper limit on that), and one may not wish to listen for the response and instead continue to fire requests at other devices.

That is the beauty of async: you can have as many calls as you want, and simply have an await somewhere on each, singly or grouped.
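A minimal sketch of that pattern, assuming an already-connected pymodbus 3.x async TCP client: fire several calls and await them as a group.

```python
# Sketch: several requests in flight at once, awaited as a group.
import asyncio

from pymodbus.client import AsyncModbusTcpClient


async def poll_many(client: AsyncModbusTcpClient) -> None:
    units = (1, 2, 3)
    # Each high-level call is a coroutine backed by a future; nothing
    # forces the app to await them one at a time.
    tasks = [
        client.read_holding_registers(address=0, count=4, slave=unit)
        for unit in units
    ]
    # Await all of them together; a slow device only delays its own result.
    results = await asyncio.gather(*tasks)
    for unit, result in zip(units, results):
        print(unit, result.registers)
```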

> If the future never completes due to not getting a response, then the library causes a spurious disconnect in cases where it's known ahead of time that a response will never be received, so I don't think that would be all that feasible.

The future will always get replied to! And yes, sometimes that causes a reconnect, which is actually what we want... think of a multidrop line with corrupted data; most times a reconnect is the fastest and only solution.

> client.execute calls client.async_execute, which then causes a connection failure if no response is received, from what I can tell. I'm not sure how one could handle that on the application side without some sort of escape hatch here in pymodbus to bypass the response handling codepaths.

As I wrote, the reconnect is a feature, not a bug.

jameshilliard commented 4 months ago

> That is the beauty of async: you can have as many calls as you want, and simply have an await somewhere on each, singly or grouped.

This ties into the queue implementation somewhat: if we need to fire a no-response request and don't have something like this, then it will muck up the queue processing by triggering timeout codepaths and other undesirable/inapplicable error recovery code.

> The future will always get replied to!

We wait_for the future, and if no reply is received we throw an exception, from my understanding.
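That pattern, reduced to a stripped-down sketch; this is an illustration of the described behavior, not pymodbus's actual transaction code:

```python
# Stripped-down sketch of the wait_for pattern described above; this is
# an illustration, not pymodbus's actual internals.
import asyncio


async def send_and_wait(response: asyncio.Future, timeout: float):
    try:
        return await asyncio.wait_for(response, timeout=timeout)
    except asyncio.TimeoutError:
        # Roughly where the real client would trigger close(reconnect=True)
        # because the device never answered.
        raise ConnectionError("no response received before timeout")
```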

> And yes, sometimes that causes a reconnect, which is actually what we want... think of a multidrop line with corrupted data; most times a reconnect is the fastest and only solution.

I mean, it's clearly not what we want when we know, when making a specific request, that there will never be a response, so triggering a reset at least in that case is clearly inappropriate for that type of request.

That's one nice thing about having an expect_response flag as you then have an escape hatch which lets you avoid triggering inappropriate error handling.

> As I wrote, the reconnect is a feature, not a bug.

Maybe for fully compliant devices, but it's really making a lot of assumptions about how failure recovery needs to be handled. For our use case, the assumption that a reconnect is required when a no-response-expected request times out is unambiguously wrong.

janiversen commented 4 months ago

I have told you before: do not try to combine all your ideas in one PR; that will be very difficult to review. I have politely suggested you make the queue implementation, because it seems that we nearly agree on that.

Regarding this PR: not expecting a response is against the request/response definition in the modbus standard... please show me one request that does not have a response defined. If the app does not get a response, it is a clear protocol violation, and it does a timeout, again according to the standard request/response definition.

I think your definition of broadcast is wrong... broadcast means the client can receive 0, 1 or many responses, which is the reason why this does not work with mixin, which is strictly request/response (or timeout).

For broadcast the app calls execute, which returns immediately with None, and then it needs to poll until all possible replies have arrived.

janiversen commented 4 months ago

> Maybe for fully compliant devices, but it's really making a lot of assumptions about how failure recovery needs to be handled. For our use case, the assumption that a reconnect is required when a no-response-expected request times out is unambiguously wrong.

Then your use case is non-modbus-standard... there are several flow diagrams in the standard and surrounding discussions describing how to handle failures, and one of the options is to reconnect to achieve sync again.

janiversen commented 4 months ago

Your PR does not handle the following situation (on a serial line):

- a call without awaiting the response
- a call awaiting the response

There are 2 problems here:

- the 2 requests are sent one after the other without a delay. rs485 prescribes a delay to turn the electric bus and allow a client to respond. Remember there are no RTS/CTS as in traditional serial.

And again: if you know a device will not respond, then there is no reason to send a request... simply because you have no idea whether the device actually executes the request, which in case of a write (e.g. to a stepper motor) can be fatal.

jameshilliard commented 4 months ago

> I have told you before: do not try to combine all your ideas in one PR; that will be very difficult to review. I have politely suggested you make the queue implementation, because it seems that we nearly agree on that.

This isn't exactly combining much in one PR. I've been working on the queue implementation, but it's a bit tricky since I'm having trouble figuring out how to implement queue processing back-pressure (when a device goes busy) for servers with TID support.

> surrounding discussions describing how to handle failures, and one of the options is to reconnect to achieve sync again.

But for said buggy device, a lack of a response is not a normal communication failure but rather a firmware bug, and it should not induce a reconnect at all, since reconnecting will not fix a known firmware bug.

Having expect_response be configurable per call lets us skip response handling only on operations where we know the firmware bug exists, and it should be compatible with a queue, with the awaited request completing after the request is sent.

IMO it also makes sense to have timeouts/retries be configurable per request, as different requests may have known maximum execution times.

Maybe instead of using expect_response to suppress the reconnect call/exception, we could instead attach an on_timeout callback or similar to the request, which runs an application-defined function whenever a request times out, instead of the default self.close(reconnect=True) call? This would be useful in general, I think, as it would allow the application to do any communication-failure-related cleanup in the callback.
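A hypothetical shape for that escape hatch; none of these names exist in pymodbus today, they are invented here purely to illustrate the proposal:

```python
# Hypothetical on_timeout escape hatch; PendingRequest and its fields are
# invented for illustration and do not exist in pymodbus.
import asyncio
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PendingRequest:
    response: asyncio.Future
    on_timeout: Optional[Callable[[], None]] = None  # app-defined hook


async def wait_for_response(pending: PendingRequest, timeout: float):
    try:
        return await asyncio.wait_for(pending.response, timeout=timeout)
    except asyncio.TimeoutError:
        if pending.on_timeout is not None:
            # App-defined cleanup replaces the default close(reconnect=True).
            pending.on_timeout()
            return None
        raise
```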

Right now the only escape hatch I have is to override internal class methods to remove the close calls, which is not great to say the least, as it makes it difficult to differentiate between expected and unexpected timeouts. I'm trying to ensure the queue design doesn't cause further issues if a device has a firmware bug like this.

> the 2 requests are sent one after the other without a delay. rs485 prescribes a delay to turn the electric bus and allow a client to respond. Remember there are no RTS/CTS as in traditional serial.

Hmm, I suppose we could just set the asyncio.wait_for to the minimum required delay, along with a way to suppress the exception handling on a per-call basis (i.e. a per-request on_timeout callback or similar)?

> And again: if you know a device will not respond, then there is no reason to send a request... simply because you have no idea whether the device actually executes the request

That doesn't sound accurate; I mean, if a device has buggy firmware which doesn't respond to a write register command, that doesn't mean that, say, a read register request would necessarily have the same bug. So I think to confirm execution we just need to issue a read after the write.
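A sketch of that read-after-write verification, assuming a pymodbus 3.x async client and assuming the write-side timeout can be swallowed without a reconnect underneath (which is exactly the escape hatch under discussion):

```python
# Sketch: verify a write on a device whose write handler never answers,
# by reading the register back. Assumes the write timeout can be caught
# without the client reconnecting underneath us.
async def write_and_verify(client, address: int, value: int, slave: int) -> bool:
    try:
        await client.write_register(address=address, value=value, slave=slave)
    except Exception:
        # Buggy firmware: the write may be executed but never acknowledged.
        pass
    result = await client.read_holding_registers(address=address, count=1, slave=slave)
    return not result.isError() and result.registers[0] == value
```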

janiversen commented 4 months ago

> This isn't exactly combining much in one PR. I've been working on the queue implementation, but it's a bit tricky since I'm having trouble figuring out how to implement queue processing back-pressure (when a device goes busy) for servers with TID support.

Well, you used it as an argument for this PR. The tricky part I see in the back-pressure processing is how to detect that the response was "server busy" without breaking the architecture (transaction.py does not know what a response is)!

> But for said buggy device, a lack of a response is not a normal communication failure but rather a firmware bug, and it should not induce a reconnect at all, since reconnecting will not fix a known firmware bug.

AKA a non-standard modbus device, which we per definition do not support.

> IMO it also makes sense to have timeouts/retries be configurable per request, as different requests may have known maximum execution times.

No, it does not; practice shows it adds a lot of unneeded overhead and just complicates the API. Flexibility is not always a good thing; pymodbus tries to strike a balance between having an easy-to-use API and allowing users to set needed parameters.

If you want a timeout per request, then you should not use the high-level API, but e.g. async_execute or even lower.
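In other words, something along these lines, assuming async_execute is reachable from application code as the thread suggests:

```python
# Sketch: a per-request timeout without touching the high-level API, by
# wrapping the lower-level call in the app's own asyncio.wait_for.
import asyncio


async def execute_with_timeout(client, request, timeout: float):
    return await asyncio.wait_for(client.async_execute(request), timeout=timeout)
```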

> Maybe instead of using expect_response to suppress the reconnect call/exception, we could instead attach an on_timeout callback or similar to the request, which runs an application-defined function whenever a request times out, instead of the default self.close(reconnect=True) call? This would be useful in general, I think, as it would allow the application to do any communication-failure-related cleanup in the callback.

A callback could be done, but we traditionally keep the callbacks as information only.

> Right now the only escape hatch I have is to override internal class methods to remove the close calls, which is not great to say the least, as it makes it difficult to differentiate between expected and unexpected timeouts. I'm trying to ensure the queue design doesn't cause further issues if a device has a firmware bug like this.

A much easier solution is to inject a response by calling callback_data(); that is something we do very frequently for testing.
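Roughly like this; exactly which object exposes callback_data() differs between pymodbus versions, so treat the wiring below as an assumption rather than a recipe:

```python
# Sketch: inject raw bytes as if they had arrived on the wire, so the
# framer decodes them and completes the pending response future. The exact
# object exposing callback_data() varies between pymodbus versions.
def inject_response(client, raw_frame: bytes) -> None:
    client.callback_data(raw_frame)
```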

> Hmm, I suppose we could just set the asyncio.wait_for to the minimum required delay, along with a way to suppress the exception handling on a per-call basis (i.e. a per-request on_timeout callback or similar)?

Only for no_response, and only for serial! But it complicates things quite a lot.

A parameter that suppresses the exception handling seems like you are redefining the whole of pymodbus.

> That doesn't sound accurate; I mean, if a device has buggy firmware which doesn't respond to a write register command, that doesn't mean that, say, a read register request would necessarily have the same bug. So I think to confirm execution we just need to issue a read after the write.

With normal devices you would just check the return code.

jameshilliard commented 4 months ago

> The tricky part I see in the back-pressure processing is how to detect that the response was "server busy" without breaking the architecture (transaction.py does not know what a response is)!

Yeah...still trying to figure out how to deal with that.

> A callback could be done, but we traditionally keep the callbacks as information only.

I went ahead and made a prototype of this in #2020.

> A much easier solution is to inject a response by calling callback_data(); that is something we do very frequently for testing.

Looking at this, it doesn't actually seem possible to fake a response when using a request queue, since the application-side code would have no real way of knowing when the request was actually sent (the request may have only been queued and not yet sent).

janiversen commented 4 months ago

> Looking at this, it doesn't actually seem possible to fake a response when using a request queue, since the application-side code would have no real way of knowing when the request was actually sent (the request may have only been queued and not yet sent).

This shows one of the severe weaknesses of the queue concept!!

A queue concept basically calls for a "flush" method, not only in your case but in real life, because the app needs to ensure everything is sent before shutting down.
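A minimal sketch of such a flush, assuming a plain asyncio.Queue drained by a background sender task; all names here are illustrative:

```python
# Sketch: a request queue with a flush() that blocks until everything
# submitted so far has actually been sent. Names are illustrative.
import asyncio


class RequestQueue:
    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    def submit(self, request) -> None:
        self._queue.put_nowait(request)

    async def sender(self, send) -> None:
        # Background task: drain the queue and put frames on the wire.
        while True:
            request = await self._queue.get()
            await send(request)
            self._queue.task_done()

    async def flush(self) -> None:
        # Returns only once every submitted request has been sent.
        await self._queue.join()
```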

With the current implementation it is very simple: call the API without the await, then inject the response, and await the API call.

This is something I do quite often, whenever I want to test special cases.
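That trick, written out as a sketch; the canned frame bytes and the callback_data() wiring are assumptions, as above:

```python
# Sketch of the test trick described above: start the call without
# awaiting it, inject a canned response frame, then await the result.
import asyncio


async def test_special_case(client, canned_frame: bytes):
    # Start the high-level call but do not await it yet.
    task = asyncio.create_task(
        client.read_holding_registers(address=0, count=1, slave=1)
    )
    await asyncio.sleep(0)  # give the task a chance to send the request
    client.callback_data(canned_frame)  # inject the fake response
    return await task  # the awaited call now completes with the injection
```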

github-actions[bot] commented 3 months ago

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] commented 2 months ago

This PR was closed because it has been stalled for 10 days with no activity.