pymodbus-dev / pymodbus

A full modbus protocol written in python

Process async_execute operations using bounded worker tasks. #1982

Closed jameshilliard closed 4 months ago

jameshilliard commented 5 months ago

Some prototyping to try and eliminate locking requirements.

jameshilliard commented 5 months ago

This only makes sense for serial lines, but you implement it for all types of transport.

So, I'm at the moment trying to come up with a general approach that would help ensure concurrency safety across all transports since it doesn't look like those offer particularly strong concurrency safety.

For tls/tcp/udp this is already handled in transport.py

It doesn't really look like there's anything to ensure concurrency safety and request ordering at the transaction/frame processing layers (not sure I'm using the right terminology) in transport.py along the lines of what I'm trying to do here. From what I can tell transport.py is too low level of a layer for a request queue like this.

janiversen commented 5 months ago

This only makes sense for serial lines, but you implement it for all types of transport.

So, I'm at the moment trying to come up with a general approach that would help ensure concurrency safety across all transports since it doesn't look like those offer particularly strong concurrency safety.

Have a look at transaction.py, that is where concurrency is handled (apart from serial).

For tls/tcp/udp this is already handled in transport.py

It doesn't really look like there's anything to ensure concurrency safety and request ordering at the transaction/frame processing layers (not sure I'm using the right terminology) in transport.py along the lines of what I'm trying to do here. From what I can tell transport.py is too low level of a layer for a request queue like this.

I am not sure how it could be safer, transaction.py allocates a tid to every request, matches the response with the future using the tid and thus the correct call returns.
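
A minimal sketch of that tid/future matching idea (hypothetical names, not the actual transaction.py code):

import asyncio

class TidMatcher:
    """Illustrative sketch only: match responses to requests by transaction id."""

    def __init__(self):
        self._tid = 0
        self._pending: dict[int, asyncio.Future] = {}

    def next_tid(self) -> int:
        self._tid = (self._tid + 1) & 0xFFFF  # 2-byte transaction id wraps around
        return self._tid

    async def execute(self, send, request):
        tid = self.next_tid()
        fut = asyncio.get_running_loop().create_future()
        self._pending[tid] = fut
        send(tid, request)   # send() is assumed non-blocking; asyncio buffers the write
        return await fut     # resolved when the matching response arrives

    def on_response(self, tid: int, response) -> None:
        fut = self._pending.pop(tid, None)
        if fut is not None and not fut.done():
            fut.set_result(response)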

Transport has concurrency hidden in asyncio.Protocol, so it's not something to bother about at the transport level.

Framer is currently just adding prefix/postfix to the request/response.

transaction.py is for tcp, handling concurrency as defined in the protocol.

The missing part is serial, which does not have a tid in the packet. A simple way to implement it would be to have a lock in transaction.py when using the rtu_framer.

I wrote serial and tcp... but a more correct wording is:

when using the rtu_framer there is no transaction id, and thus a lock is needed;

the socket/tls framers have a tid, and thus no locking is needed.
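
A minimal sketch of what such a lock could look like (a hypothetical wrapper, not the actual pymodbus code), assuming only framers without a tid need it:

import asyncio

class SerializedExecutor:
    """Illustrative sketch only: serialize requests when the framer has no tid."""

    def __init__(self, needs_lock: bool):
        # rtu framer -> needs_lock=True, socket/tls framers -> needs_lock=False
        self._lock = asyncio.Lock() if needs_lock else None

    async def execute(self, do_request):
        if self._lock is None:
            return await do_request()   # tid-based framer: concurrent requests are fine
        async with self._lock:          # no tid: one request on the wire at a time
            return await do_request()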

Why do you think transaction.py is too low? It sits just below the request/response encoding and on top of the framer... this is THE central point between multiple request types and multiple framer types.

While I remember it, most of this code is shared between server and client, so please remember to think about the server as well.

janiversen commented 5 months ago

I hope my comments do not come across as negative; I am sometimes a bit blunt. My goal is to help you understand the code in order for you to make a solution that helps everyone!

Just keep asking, and I will give my advice and/or code pointers.

jameshilliard commented 5 months ago

Have a look at transaction.py, that is where concurrency is handled (apart from serial).

Yeah, I think that's probably around the right place for this; there's just so much going on there that I'm having a hard time figuring out a good spot to implement a queue mechanism.

I am not sure how it could be safer, transaction.py allocates a tid to every request, matches the response with the future using the tid and thus the correct call returns.

I think you're mixing up transaction.py with transport.py, my comment was in reference to transport.py not implementing concurrency safety, not transaction.py.

Transport has concurrency hidden in asyncio.Protocol, so it's not something to bother about at the transport level.

I don't think the transports here implement concurrency safety directly.

The missing part is serial, which does not have a tid in the packet. A simple way to implement it would be to have a lock in transaction.py when using the rtu_framer.

So one reason I'm thinking a queue might be better is that it should provide better request ordering guarantees (i.e. requests are processed in the order the async functions are called by the app).
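
A minimal sketch of that ordering idea (hypothetical names, not this PR's actual code): a single worker task drains an asyncio.Queue, so requests go on the wire in the order the application awaited them:

import asyncio

class RequestQueue:
    """Illustrative sketch only: FIFO request ordering via a single worker task."""

    def __init__(self):
        # assumes construction inside a running event loop
        self._queue: asyncio.Queue = asyncio.Queue()
        self._worker = asyncio.create_task(self._run())

    async def execute(self, do_request):
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((do_request, fut))
        return await fut                      # caller waits until its turn completes

    async def _run(self):
        while True:
            do_request, fut = await self._queue.get()
            try:
                fut.set_result(await do_request())
            except Exception as exc:          # propagate failures to the waiting caller
                fut.set_exception(exc)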

Why do you think transaction.py is too low? It sits just below the request/response encoding and on top of the framer... this is THE central point between multiple request types and multiple framer types.

I don't, it's probably roughly the right level, I think transport.py is too low.

While I remember it, most of this code is shared between server and client, so please remember to think about the server as well.

Yeah, transaction.py being so complex was why I prototyped this at the client layer initially.

Just keep asking, and I will give my advice and/or code pointers.

Yeah, thanks for going through things; tracing the codepaths is a bit complex due to there being, say, sync/async variants, which makes it unclear how everything is getting used.

Is there an obvious place to add a queue mechanism like this to transaction.py that you see?

janiversen commented 5 months ago

I don't think the transports here implement concurrency safety directly.

transport.py do not implement concurrency, because asyncio.protocol which transport use handle it.

Send() is the best example: it does not write directly, asyncio puts the data in a queue and it is written when permitted. That, I assume, is the real reason why send() is sync, because it's just a little more than adding to a queue.
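
In asyncio terms, a minimal sketch of why a sync send() can be safe (simplified, using the standard asyncio.Protocol API):

import asyncio

class SketchProtocol(asyncio.Protocol):
    """Illustrative sketch only: transport.write() is non-blocking."""

    def connection_made(self, transport):
        self.transport = transport

    def send(self, data: bytes) -> None:
        # write() just appends to asyncio's internal send buffer;
        # the event loop flushes it to the socket when it is writable.
        self.transport.write(data)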

So one reason I'm thinking a queue might be better is that it should provide better request ordering guarantees (i.e. requests are processed in the order the async functions are called by the app).

That’s a good argument, a queue it is, but only for framers without a tid! Because for framers with a tid the queuing and ordering is handled in the device.

Yeah, transaction.py being so complex was why I prototyped this at the client layer initially.

To be fair I consider it relatively simple compared to execute/async_execute, where the sync/async mix makes every change problematic.

There are some test cases for transaction.py that might help your understanding… if you want to experiment with how it’s called, feel free to add it as new tests, but please in a separate PR so it can get merged quickly.

Is there an obvious place to add a queue mechanism like this to transaction.py that you see?

I am not at my computer right now, but there is a method called something like get_tid(), that would be a good starting point for the queue.

jameshilliard commented 5 months ago

Because for framers with a tid the queuing and ordering is handled in the device.

Hmm, so I'm assuming these devices have an internal FIFO queue or something? I assume they have a limit on queued requests as well? Even if the device has an internal queue, I'd expect it to largely execute commands in the order in which they are sent, so it may be useful to have a FIFO queue in pymodbus as well, along with a way of limiting in-flight requests on the pymodbus side to a value low enough that it won't fill up the device-side buffer.
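
A minimal sketch of that in-flight limit (a hypothetical max_in_flight knob, not an existing pymodbus parameter):

import asyncio

class InFlightLimiter:
    """Illustrative sketch only: cap concurrent requests so the device buffer never fills."""

    def __init__(self, max_in_flight: int = 4):
        self._sem = asyncio.Semaphore(max_in_flight)

    async def execute(self, do_request):
        async with self._sem:      # callers beyond the limit wait until a slot frees up
            return await do_request()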

janiversen commented 5 months ago

Hmm, so I'm assuming these devices have an internal FIFO queue or something? I assume they have a limit on queued requests as well? Even if the device has an internal queue, I'd expect it to largely execute commands in the order in which they are sent, so it may be useful to have a FIFO queue in pymodbus as well, along with a way of limiting in-flight requests on the pymodbus side to a value low enough that it won't fill up the device-side buffer.

Some devices might have a FIFO queue, but most simply start to execute the commands as they arrive async....Some commands (like e.g. write) take longer, which is why the responses might be mixed up.

Right now we have a list of transactions (futures) and responses are linked to these. No need for a FIFO queue for this !

As I just wrote in another comment, we do not try to solve server problems, we implement the standard. Solving server problems, would lead to a bigger number of deviated requests/responses, as well as a lot of "if" in the internal code.

A limit on the number of transactions open could be a good enhancement as a new parameter to the client and server classes. But when the limit is reached pymodbus should not buffer, but reject the call, so again this would not call for a FIFO buffer.
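
For contrast, a minimal sketch of the reject-instead-of-buffer behaviour described here (hypothetical names and exception, not existing pymodbus code):

class TooManyTransactions(Exception):
    """Illustrative sketch only: raised instead of queuing when the limit is reached."""

class RejectingLimiter:
    def __init__(self, max_transactions: int = 16):
        self._open = 0
        self._max = max_transactions

    async def execute(self, do_request):
        if self._open >= self._max:
            raise TooManyTransactions("transaction limit reached, rejecting instead of buffering")
        self._open += 1
        try:
            return await do_request()
        finally:
            self._open -= 1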

For the framers not supporting transaction ID, the FIFO buffer is a good idea.

jameshilliard commented 5 months ago

most simply start to execute the commands as they arrive async

I'm not sure what you mean by "arrive async" exactly but it sounds like a FIFO queue to me, messages will arrive to the end device in some order over the network.

Right now we have a list of transactions (futures) and responses are linked to these. No need for a FIFO queue for this !

Well... whether it's needed may somewhat depend on the queue size; technically there's also the kernel TCP send buffer, which is effectively a FIFO queue itself.

As I just wrote in another comment, we do not try to solve server problems, we implement the standard. Solving server problems, would lead to a bigger number of deviated requests/responses, as well as a lot of "if" in the internal code.

Well, what I'm thinking of is a parameter that controls the amount of in-flight requests; for serial it would have to always be 1, but with tids it could be as large as the device buffer (I think the protocol upper limit would be 65535 in-flight messages due to the 2-byte tid). Regarding following the standard... from what I can tell there isn't exactly a standard regarding transaction buffer sizes.

But when the limit is reached pymodbus should not buffer, but reject the call, so again this would not call for a FIFO buffer.

But why would it make sense to reject the call as opposed to queuing it? I mean this would be a behavior divergence between transports that would be rather unexpected. This would mean that the application API would have a significant behavior divergence between serial devices and small buffer modbus TCP devices for example which would not be desirable IMO.

I think it probably makes more sense to try and unify queuing across all the transports in some way that can ensure they all have the same application side API semantics.

For the framers not supporting transaction ID, the FIFO buffer is a good idea.

From what I'm reading some hardware may have very small internal FIFO buffers so I think pymodbus probably should support queuing for all transports.

janiversen commented 5 months ago

I'm not sure what you mean by "arrive async" exactly but it sounds like a FIFO queue to me, messages will arrive to the end device in some order over the network.

What I meant is that the requests normally do not queue up in the device; they are executed immediately but in separate tasks, and thus end up sending responses in a possibly mixed order.

Well... whether it's needed may somewhat depend on the queue size; technically there's also the kernel TCP send buffer, which is effectively a FIFO queue itself.

Yes, there are a lot of different queues, but since we register the transactions we do not need an extra queue, it just complicates things.

Well, what I'm thinking of is a parameter that controls the amount of in-flight requests; for serial it would have to always be 1, but with tids it could be as large as the device buffer (I think the protocol upper limit would be 65535 in-flight messages due to the 2-byte tid). Regarding following the standard... from what I can tell there isn't exactly a standard regarding transaction buffer sizes.

If you start reading device manuals, you will not find a value for that parameter...and many devices support multiple connections, so the value does not really say anything.

I would not like to complicate the API with a parameter that is impossible to find and very dynamic.

We should simply send transactions until the server reports busy, that is according to the standard. If the server does not implement the busy response, well then it's not conforming to the standard.

But why would it make sense to reject the call as opposed to queuing it?

That is a simple way to ensure the app does not start retransmitting the request, something that is natural if the app does not receive a response.

I see the idea of queuing on the non TID framers, that alone will make the API consistent. If you add queuing on top of the transaction scheme we complicate the code a lot for nothing.

From what I'm reading some hardware may have very small internal FIFO buffers so I think pymodbus probably should support queuing for all transports.

No, that is not a pymodbus problem!! If the server has a small FIFO buffer (something I seriously doubt) then it's the app that must do something... because the developers writing the app know which server they use, we have to be general.

The API should behave as close to the modbus standard as possible, and not invent extra layers, unless we do it for conformity of the API (like a transaction simulation for serial).

jameshilliard commented 5 months ago

What I meant is that the requests normally do not queue up in the device; they are executed immediately but in separate tasks, and thus end up sending responses in a possibly mixed order.

So it sounds like they are still generally sent in the order received over the network, just the response order may be different.

Yes, there are a lot of different queues, but since we register the transactions we do not need an extra queue, it just complicates things.

That's not entirely clear; I mean, if we have to implement a queue anyway for non-tid requests, then having it support all requests, including requests with tids, may not add any meaningful complexity depending on how it's implemented. In fact it may reduce complexity overall by providing an abstraction layer which provides a cleaner interface between the high level application exposed API and lower level transport specific parts of pymodbus.

We should simply send transactions until the server reports busy, that is according to the standard. If the server do not implement the busy response, well then its not conforming to the standard.

Based on this I think we would still want a queue so that when the server reports busy we can then start queuing requests inside pymodbus the same as for non-tid requests and then continue processing the queue when the device is no longer busy, this way we get a consistent high level application exposed API behavior.
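
A minimal sketch of that retry-on-busy behaviour (the is_server_busy() check and retry delay are hypothetical; the real busy handling lives in the response classes, as discussed later in the thread):

import asyncio

async def execute_with_busy_retry(do_request, is_server_busy, retry_delay: float = 0.1):
    """Illustrative sketch only: keep retrying while the server reports busy."""
    while True:
        response = await do_request()
        if not is_server_busy(response):   # e.g. a modbus exception response with code 6 (busy)
            return response
        await asyncio.sleep(retry_delay)   # back off, then resend the same request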

That is a simple way to ensure the app does not start retransmitting the request, something that is natural if the app does not receive a response.

But application code will generally await the function that generates the request; if pymodbus queues the request then the application (unless written poorly) should not re-transmit, since the awaited function will not complete until pymodbus actually processes the queued request.

In an application if a request fails then the application will likely retry the request, on the other hand if the request is queued the application will not retry since it is waiting on the request to complete still.

Here's a simplified example of how one might want to use the API from the application:

import asyncio

async def monitor_fan(client):
    while True:
        data = await client.read_holding_registers(200, 1, slave=0x12)
        process_fan_data(data)

async def monitor_engine(client):
    while True:
        data = await client.read_holding_registers(200, 1, slave=0x13)
        process_engine_data(data)

async def monitor_pump(client):
    while True:
        data = await client.read_holding_registers(200, 1, slave=0x14)
        process_pump_data(data)

async def main(client):
    await asyncio.gather(
        monitor_fan(client),
        monitor_engine(client),
        monitor_pump(client),
    )

Regardless of the low level transport if we have a queue then there will at most be 3 in-flight requests at any time from this example.

On the other hand if we don't have a queue and instead start rejecting requests when say a device is busy then things may start to break.

I see the idea of queuing on the non TID framers, that alone will make the API consistent. If you add queuing on top of the transaction scheme we complicate the code a lot for nothing.

Let's say we have the same application that can be configured to use either a serial port or a modbus tcp server with TID support. If the TID-based modbus device side queue fills up and causes the pymodbus API to start rejecting requests, then the behavior of the pymodbus API will be inconsistent when one uses a TID modbus device vs. say a serial port that will queue requests in pymodbus instead of rejecting, which is exactly what we don't want. The high level API behavior should IMO really not change at all between the low level transports.

No, that is not a pymodbus problem!! If the server has a small FIFO buffer (something I seriously doubt) then it's the app that must do something... because the developers writing the app know which server they use, we have to be general.

This seems odd to me; it feels like an implementation detail that the application should not care about. If we have a queue for requests regardless of the framer/transport then the API should be the same between transports.

IMO an application developer should be able to change modbus servers without having to modify application logic to account for stuff like server differences unless there really is no way for the protocol library to abstract out the low level differences.

The API should behave as close to the modbus standard as possible

It's not clear what you mean by this, a protocol library like pymodbus is designed to provide a high level abstraction for the modbus standard.

I would argue that in this particular case it should also try and expose a consistent high level API to applications by exposing an API that follows common asyncio API semantics.

not invent extra layers, unless we do it for conformity of the API (like a transaction simulation for serial).

Having a queue layer for all requests I think would help with the conformity of the API overall while still providing a modbus standard compliant implementation at the low level.

janiversen commented 5 months ago

So it sounds like they are still generally sent in the order received over the network, just the response order may be different.

Yes, it is very important, for security, that pymodbus does NOT change the sending sequence. Think of e.g. a read and a write: if we break the sequence, the read will return the result of the write.

That's not entirely clear; I mean, if we have to implement a queue anyway for non-tid requests, then having it support all requests, including requests with tids, may not add any meaningful complexity depending on how it's implemented. In fact it may reduce complexity overall by providing an abstraction layer which provides a cleaner interface between the high level application exposed API and lower level transport specific parts of pymodbus.

You have a valid argument here, and I would rather see how the PR will finally look before making big decisions.

Based on this I think we would still want a queue so that when the server reports busy we can then start queuing requests inside pymodbus the same as for non-tid requests and then continue processing the queue when the device is no longer busy, this way we get a consistent high level application exposed API behavior.

That would mean the transaction layer would need to decode the responses, which are later decoded one level higher... that sounds a bit inefficient, but as I said earlier I am ready not to judge before I see an implementation.

But application code will generally await the function that generates the request; if pymodbus queues the request then the application (unless written poorly) should not re-transmit, since the awaited function will not complete until pymodbus actually processes the queued request.

Poorly written app code tends to end up as issues in the pymodbus repo, so my preference is not to open pathways for more issues :-)

Regardless of the low level transport if we have a queue then there will at most be 3 in-flight requests at any time from this example.

Your example is convincing, however an implementation still needs to check for "rogue" application code, like the test where we use asyncio.gather to send requests. The original app code (which I have from an example) was even worse, it did not bother to test the return codes.

Let's say we have the same application that can be configured to use either a serial port or a modbus tcp server with TID support. If the TID-based modbus device side queue fills up and causes the pymodbus API to start rejecting requests, then the behavior of the pymodbus API will be inconsistent when one uses a TID modbus device vs. say a serial port that will queue requests in pymodbus instead of rejecting, which is exactly what we don't want. The high level API behavior should IMO really not change at all between the low level transports.

It will always be a bit exposed to the low level transport, because in serial the response sequence is guaranteed; with a tid it is not guaranteed.

IMO an application developer should be able to change modbus servers without having to modify application logic to account for stuff like server differences unless there really is no way for the protocol library to abstract out the low level differences.

With that argument, pymodbus should try to solve the problem of every badly implemented server, and that is not going to fly!!!

There will always be servers where the application needs to do special handling. An example is Huawei's solar inverters, a really good stable implementation, but write operations are only permitted if the app started the connection by making a login using user defined function codes. Pymodbus could add that logic, but that is very unlikely we will do that.

The API should behave as close to the modbus standard as possible

It's not clear what you mean by this, a protocol library like pymodbus is designed to provide a high level abstraction for the modbus standard.

That really depends on your viewpoint... pymodbus is designed to make modbus requests/responses available to Python applications!! A real high level interface would get rid of all the registers and return int/float etc.

I think the best way forward is to look at an actual implementation and discuss the weak spots, if you agree.

jameshilliard commented 5 months ago

Yes, it is very important, for security, that pymodbus does NOT change the sending sequence. Think of e.g. a read and a write: if we break the sequence, the read will return the result of the write.

Yeah, so a queue even for devices that support TID's I think makes sense since that will ensure the sending sequence is ordered properly.

That would mean the transaction layer would need to decode the responses, which are later decoded one level higher... that sounds a bit inefficient, but as I said earlier I am ready not to judge before I see an implementation.

Where exactly are busy responses decoded/handled currently? Are the busy responses associated with the TID of the request being rejected by the server?

Your example is convincing, however an implementation still needs to check for "rogue" application code, like the test where we use asyncio.gather to send requests. The original app code (which I have from an example) was even worse, it did not bother to test the return codes.

I'm not sure what you mean by "rogue" application code here, I don't see anything that looks wrong with that asyncio.gather call.

It will always be a bit exposed to the low level transport, because in serial the response sequence is guaranteed; with a tid it is not guaranteed.

So it may make sense to have an optional argument that restricts the amount of in-flight requests to say one in-flight request even if the server supports more than that due to having tid support. This may be handy for applications that want stronger guarantees that they will get the same behavior across transports.

With that argument, pymodbus should try to solve the problem of every badly implemented server, and that is not going to fly!!!

I'm mostly just arguing that for supported servers/transports pymodbus should generally do its best to provide a high level API that is consistent and which doesn't cause unnecessary API behavior divergences.

There will always be servers where the application needs to do special handling. An example is Huawei's solar inverters, a really good stable implementation, but write operations are only permitted if the app started the connection by making a login using user defined function codes. Pymodbus could add that logic, but that is very unlikely we will do that.

Yeah, this example sounds like something that would make sense to implement in the application logic. Like the application could catch a write failure exception and trigger the login when that happens.

That really depends on your viewpoint... pymodbus is designed to make modbus requests/responses available to Python applications!! A real high level interface would get rid of all the registers and return int/float etc.

By high level interface I'm mostly just referring to the client interface which is arguably high level abstraction compared to say a low level interface where you just read/write raw bytes to a socket.

janiversen commented 5 months ago

Where exactly are busy responses decoded/handled currently? Are the busy responses associated with the TID of the request being rejected by the server?

In the corresponding response class on top of transaction.py, and no, of course the TID is no longer present since it is not part of the modbus response, only of the header.

I'm not sure what you mean by "rogue" application code here, I don't see anything that looks wrong with that asyncio.gather call.

Of course that code is fine, because I added a local function; the original called pymodbus directly in the gather...leading to X requests being sent and the gather waiting on the futures.

So it may make sense to have an optional argument that restricts the amount of in-flight requests to say one in-flight request even if the server supports more than that due to having tid support. This may be handy for applications that want stronger guarantees that they will get the same behavior across transports.

I do not think such a parameter makes sense if it is a general parameter when creating the client. It could make sense on specific calls, but that would explode some code. Please keep a first implementation suggestion simple; once we agree on that, then we can talk about bells and whistles.

I'm mostly just arguing that for supported servers/transports pymodbus should generally do its best to provide a high level API that is consistent and which doesn't cause unnecessary API behavior divergences.

A noble goal, but in practice it looks different....in many cases pymodbus is not able to (like the user function codes) or will/should not do it (like the 32bit register) because it is non-standard.

At the moment we actually do have a consistent API, because we do not support concurrent calls... I know, now you will say "but we do not prevent it either"... that is very true, just like we do not prevent users from mixing sync and async calls.

jameshilliard commented 5 months ago

In the corresponding response class on top of transaction.py, and no, of course the TID is no longer present since it is not part of the modbus response, only of the header.

Is the busy response handling in the ModbusTransactionManager class?

the original called pymodbus directly in the gather...leading to X requests being sent and the gather waiting on the futures.

I'm not really sure why this would be something that would be expected to cause issues.

janiversen commented 5 months ago

Is the busy response handling in the ModbusTransactionManager class?

No, please read what I wrote earlier.

I'm not really sure why this would be something that would be expected to cause issues.

ok, but it does.

jameshilliard commented 5 months ago

No, please read what I wrote earlier.

You wrote "In the corresponding response class on top of transaction.py" I see ModbusTransactionManager is what is at the top of transaction.py, not sure where I should be looking...

janiversen commented 5 months ago

No, please read what I wrote earlier.

You wrote "In the corresponding response class on top of transaction.py" I see ModbusTransactionManager is what is at the top of transaction.py, not sure where I should be looking...

Architecturally on top of transaction.py! Not at the top of transaction.py. The request/response classes sit on top of the transaction level. I thought you had looked at mixin and execute; then you would have seen it.

jameshilliard commented 5 months ago

An example is Huawei's solar inverters, a really good stable implementation, but write operations are only permitted if the app started the connection by making a login using user defined function codes. Pymodbus could add that logic, but that is very unlikely we will do that.

If write operations time out until a login is made, one could potentially use a timeout handler like in #2020 to trigger the login on timeout.

janiversen commented 5 months ago

If write operations time out until a login is made, one could potentially use a timeout handler like in #2020 to trigger the login on timeout.

No, because the solar inverter responds, so there are never timeouts (unless network problems).

jameshilliard commented 5 months ago

unless network problems

Ok, so it would be useful for handling those sort of issues then.

jameshilliard commented 4 months ago

closing in favor of #2030