subsys:modbus: Atomic read / write for multiple registers access FCs: 0x03, 0x04, 0x010

Introduction

Reading and writing multiple IRs/HRs is currently done by iterating over the register to be read/written. The respective user callback only pass the address and the value resp. value pointer and is called multiple times. In order to be able to access e.g. a date / time, a 32 or 64 bit values or even struct like data atomically, iteratively accessing multiple 16 bit values is complicated/elaborated.

Problem description

Accessing multiple 16 bit registers to read 32, 64 bit or even struct-, record-like data in an atomic manner requires unnecessary puzzling together of 16 bit fragments due the iterative multiple calls of the user/application callbacks.

First call: prepare buffer (malloc or static?) and read full data from firmware or write first16 bits from bus to buffer
Second to n-th call: consecutively pass 16 bit fragments from buffer to bus or fill 16 bit from bus to the buffer
Last call: write buffer to firmware
Each address would have to have an expected number of callback calls to know which is the last
What to do in the callback if called to few?

If the the callback would be called once passing the address, a value pointer and a number of regs to be read or written, the callback can check for the number of expected regs, read fetch the data from the firmware (read) or write the data to the firmware (write) and return.

Proposed change

1. Solution

Replace the existing

    .holding_reg_rd = holding_reg_read,
    .holding_reg_wr = holding_reg_write,
    .input_reg_rd = input_reg_read,

in struct modbus_user_callbacks with an new signature

static int holding_reg_rd(uint16_t address, uint16_t *value, uint16_t num_regs);
static int holding_reg_wr(uint16_t address, uint16_t *value, uint16_t num_regs);
static int input_reg_rd(uint16_t address, uint16_t *value, uint16_t num_regs);

Concern: This change will create an backward incompatible interface

2. Solution

Add new callbacks

    .holding_regs_rd = holding_regs_read,
    .holding_regs_wr = holding_regs_write,
    .input_regs_rd = input_regs_read,

in struct modbus_user_callbacks with an new signature

static int holding_regs_rd(uint16_t address, uint16_t *value, uint16_t num_regs);
static int holding_regs_wr(uint16_t address, uint16_t *value, uint16_t num_regs);
static int input_regs_rd(uint16_t address, uint16_t *value, uint16_t num_regs);

Use KConfig to enable either or, the existing or the new interface or
leave both in struct modbus_user_callbacks and check for NULL pointers in the code.

Concerns: 1. One can have only either or 2. More flexible, but more code

3. Solution

Move

send_reply = mbs_try_user_fc(ctx, fc);

before the switch case checking the standard function codes. This allows to 'overload' the standard implementation by a user/custom implementation.

mbs_try_user_fc(ctx, fc) then needs to return MODBUS_EXC_ILLEGAL_FC if the FC was not implemented as user/custom solution. Only then the switch with the standard implementation will be executed. In case the FC was found by mbs_try_user_fc(ctx, fc) (FC was implemented by the user), it returns MODBUS_EXC_NONE plus a MODBUS_SEND_REPLY_FLAG At least the return value of mbs_try_user_fc(ctx, fc) will become rather an int then a bool.

Concern: Multiple IR/HR read / write with single callback call will always have to be impl. as user/custom solution and is not provided by Zephyr

4. Solution

solution could be combined with 1. or 2.

Concern: Mitigates concern of 3. solution but also opening up for a 3. implementation besides 1. and 2.

Preferred solution

Our preferred solution is 2.1 (add new interface switchable with KConfig) or 3. (Overwrite standard impl. by user). If we impl. 2.1 we will very like no longer impl. 3. too, since we do not need both.

Detailed RFC

Pls see chapter 'Proposed change'

Proposed change (Detailed)

Pls see chapter 'Proposed change'

Dependencies

solution will create interface incompatibility.

Concerns and Unresolved Questions

Pls see concerns in proposed solutions

Alternatives

Pls see solution proposals.

In order to be able to access e.g. a date / time, a 32 or 64 bit values or even struct like data atomically, iteratively accessing multiple 16 bit values is complicated/elaborated.

Well, you have to deal with a "partial" or "unaligned" access anyway, so I really do not see much benefit in any of the proposed changes.

In order to be able to access e.g. a date / time, a 32 or 64 bit values or even struct like data atomically, iteratively accessing multiple 16 bit values is complicated/elaborated.

Well, you have to deal with a "partial" or "unaligned" access anyway, so I really do not see much benefit in any of the proposed changes.

How do you mean it - "either partial or unaligned" or "partial and unaligned".

I feel the benefits are quite obvious:

1 callback call instead of N
no counting/managing of calls inside the callback (first, in between, last)
no intermediate storing of the data block (what ever it may be 32, 64 bit values or strings or a struct) in side the callback over several calls (= a static or malloc-ed/free-ed buffer in each callback or a common for the IR / HR etc, vs quickly on the stack)
just the starting reg number, a pointer to the buffer, the number of regs expected to be read/written allowing to use memcpy or similar and doing the right alignment in the callback (which needs to be done anyway)
less elaborated callbacks -> smaller code / footprint (which to some is essential)
with the 3. proposal (overload) give the user the freedom to implement its own solution for every FC, if the Zephyr implementation does not suit the requirements. A standardized mechanism, where the project does not have to care about the various different implementations for the same FC.

I have dealt with with both approaches using a commercial Modbus stack (from a well known debugger vendor) which also basically allowed the proposed 3. solution (doing a direct custom/user solution for every FC). From experience I can say, it is by far easier than fiddling with multiple callback calls.

I feel at least the freedom to do a custom implementation for all FCs should be given (3. solution proposal). Further more it would be very beneficial that each unused FC default implementation and the user FC mechanism each can be disabled (KConfig) just returning an error msg to the client. This would save code size on the server side.

How do you mean it - "either partial or unaligned" or "partial and unaligned".

Logical OR.

I feel the benefits are quite obvious:
* 1 callback call instead of N

I do not see the cost of calling a callback to be that high, especially if it is optimized away.

* no counting/managing of calls inside the callback (first, in between, last)

No idea what you mean.

* no intermediate storing of the data block (what ever it may be 32, 64 bit values or strings or a struct) in side the callback **over several calls** (= a static or malloc-ed/free-ed buffer in each callback or a common for the IR / HR etc, vs quickly on the stack)

It sounds like you should rethink your approach to validating and overwriting the data. Perhaps it would help if the Modbus server implementation had lock/unlock callbacks that are called at the beginning/end of the command processing?

* just the starting reg number, a pointer to the buffer, the number of regs expected to be read/written allowing to use memcpy or similar and doing the right alignment in the callback (which needs to be done anyway)

* less elaborated callbacks -> smaller code / footprint (which to some is essential)

Well, it is intentional that the callback arguments are kept to a minimum to minimize possible application errors, so the user would not even get the idea of doing memcpy with buffer of Modbus subsystem. This will not change.

* with the 3. proposal (overload) give the user the freedom to implement its own solution for every FC, if the Zephyr implementation does not suit the requirements. A standardized mechanism, where the project does not have to care about the various different implementations for the same FC.

No, mandatory FCs should comply with the spec and should not be overloadable.

I have dealt with with both approaches using a commercial Modbus stack (from a well known debugger vendor) which also basically allowed the proposed 3. solution (doing a direct custom/user solution for every FC). From experience I can say, it is by far easier than fiddling with multiple callback calls.

That is okay, but it does not convince me.

I feel at least the freedom to do a custom implementation for all FCs should be given (3. solution proposal). Further more it would be very beneficial that each unused FC default implementation and the user FC mechanism each can be disabled (KConfig) just returning an error msg to the client. This would save code size on the server side.

No, what you think you can save on the code is not worth the additional Kconfig options to maintain.

How do you mean it - "either partial or unaligned" or "partial and unaligned".

Logical OR.
I feel the benefits are quite obvious:
* 1 callback call instead of N
I do not see the cost of calling a callback to be that high, especially if it is optimized away.

It is not about the exec time of calling a callback and that it has to be called several time. It is about slicing what ever data block I want to transmit (e.g. a time/date format which does not fit in 16 bits) into 16 bit chunks and iterate over those N times instead of just getting the pointer to the buffer an copy the time/date format aligned as required.
* no counting/managing of calls inside the callback (first, in between, last)
No idea what you mean.

Example: again writing time / date to the device:

Existing iterative approach (within the callback):
- first call: triggered by the register number, prepare a buffer of at least the size of time/date (either malloc or a static buffer and reset an index to the beginning of the buffer, write the first 16 bit chunk into, increment the index
- in between: for each call to the callback check that address within range, write the next 16 bit chunk and increment index, check for buffer overflow, if so return error code
- last call: check that time/date was puzzled together completely, if, so: set the new time/date of your device in atomic manner how ever that is done (internal API) in the FW application else: wait for the last call to arrive, possibly forever or until the triggering register number is passed again to start over
Block approach (within the callback):
- check register number and number of regs (size)
- set time / date of your device by reading from rx buffer pointer directly or if really required copy payload from rx buffer to stack based function local variable and use that to set the time / date in the device (internal API).

Is simply does not make sense, that the stack slices the data block into 16 bit chunk just for the callback to puzzle it together again.

* no intermediate storing of the data block (what ever it may be 32, 64 bit values or strings or a struct) in side the callback **over several calls** (= a static or malloc-ed/free-ed buffer in each callback or a common for the IR / HR etc, vs quickly on the stack)
It sounds like you should rethink your approach to validating and overwriting the data. Perhaps it would help if the Modbus server implementation had lock/unlock callbacks that are called at the beginning/end of the command processing?

I do not think it is a com stacks task to lock or unlock something in the FW. And it will not help if the last call to the callback has to call an FW internal API setting the date but the com stack keeps that locked until returning from the last call to the callback. Introducing lock and unlock just because the callback is call iteratively - makes it even more elaborated.

* just the starting reg number, a pointer to the buffer, the number of regs expected to be read/written allowing to use memcpy or similar and doing the right alignment in the callback (which needs to be done anyway)

* less elaborated callbacks -> smaller code / footprint (which to some is essential)
Well, it is intentional that the callback arguments are kept to a minimum to minimize possible application errors, so the user would not even get the idea of doing memcpy with buffer of Modbus subsystem. This will not change.

I cannot see why the user should not be allowed to copy the payload. Just because of bad callback implementations would copy too much or overwrite something in the tx buffer? Then one has to pass everything by value and never by ref / pointer.

* with the 3. proposal (overload) give the user the freedom to implement its own solution for every FC, if the Zephyr implementation does not suit the requirements. A standardized mechanism, where the project does not have to care about the various different implementations for the same FC.

No, mandatory FCs should comply with the spec and should not be overloadable.

Without have read the spec in any detail, I doubt that the spec specifies the impl. to loop over the number of registers. This is impl. detail. A user specific implementation is not necessarily in-compliant with the spec. And if in doubt or satified with it, just don't do a user specific impl. of the standard FC and get the Zephyr impl. Nothing the project needs to care about / maintain but also should not block, IMO.

I have dealt with with both approaches using a commercial Modbus stack (from a well known debugger vendor) which also basically allowed the proposed 3. solution (doing a direct custom/user solution for every FC). From experience I can say, it is by far easier than fiddling with multiple callback calls.

That is okay, but it does not convince me.

I feel at least the freedom to do a custom implementation for all FCs should be given (3. solution proposal). Further more it would be very beneficial that each unused FC default implementation and the user FC mechanism each can be disabled (KConfig) just returning an error msg to the client. This would save code size on the server side.

No, what you think you can save on the code is not worth the additional Kconfig options to maintain.

If you only have 64kB Flash, 100 Byte here and there can make the difference.

I am facing this issue right now.

IMO FC16 is the only critical issue right now, block writes should be able to be handled in application code as blocks. Multi-reg read requests are easier to handle.

With the existing implementation, receiving any FC16 means that I need some intermediate data storage between the modbus and the final location, and to have to use indeterminate timeouts/assumptions in order to trigger write done.

I support approach 2 which would be a simple change in implementation

    if(ctx->mbs_user_cb->holding_regs_wr)
    {
        ctx->mbs_user_cb->holding_regs_wr(reg_addr, prx_data, reg_qty);
    }
    else {
    for (uint16_t reg_cntr = 0; reg_cntr < reg_qty; reg_cntr++) {
        uint16_t addr = reg_addr + reg_cntr;

        ... existing loop approach
    }
   }

I am facing this issue right now.

IMO FC16 is the only critical issue right now, block writes should be able to be handled in application code as blocks. Multi-reg read requests are easier to handle.

With the existing implementation, receiving any FC16 means that I need some intermediate data storage between the modbus and the final location, and to have to use indeterminate timeouts/assumptions in order to trigger write done.

I support approach 2 which would be a simple change in implementation
    if(ctx->mbs_user_cb->holding_regs_wr)
    {
        ctx->mbs_user_cb->holding_regs_wr(reg_addr, prx_data, reg_qty);
    }
    else {
  for (uint16_t reg_cntr = 0; reg_cntr < reg_qty; reg_cntr++) {
      uint16_t addr = reg_addr + reg_cntr;

      ... existing loop approach
  }
   }

Why should reading be simpler? Also with reading one needs to fetch the data block at once for consistency of the data in the block, store it somewhere in an intermediate storage and then iterate over it as 16bit registers to be transferred and BE endianessed to the tx buffer.

There even is a, IMO, non-standard float implementation starting at register number 5000 and above, if enabled. That implementation changes endianess over all 4 float bytes to BE. The 16 bit register impl. will change the float byte wise in the two words. On an little endian machine you get "float little-endian byte swap according to

https://www.modbustools.com/poll_display_formats.html

In the Modbus spec. nothing else but bit and word (16 bit) data types are defined.

I still think it needs all three and the word byte swap as option for the implementation (presumably the appl.) that fills the tx buffer to have the freedom to decide which endianess shall be used on the bus for the payload (LE, BE, LE-byte swap, BE-byte swap) of multi word regs read and write are required.

@jfischer-no :

Any good reason why the standard FCs do not use the same registration mechanism as the user fcs and provide the FC implementations in a collection of FCxx files? That way a stack user can register the FCs she/he wants to support, choose from existing impl. e.g. as the ones that are already there in the switch/case or any new custom FC impl. handling multiple read / write without iteration. No need for elaborated #if / #else / #endif in the code, just in the cmake files. What is finally registered becomes user responsibility.
Any good reason the custom/user FCs do not get the ctx passed instead of tx buffer struct, rx buffer struct and the interface id as single params (where the id can be used to retrieve the ctx again)? I feel it would make sense to pass the ctx. Make things simpler and and more uniform.
Some other optimizations I would suggest
- mbs_exception_rsp() is currently called some 43 time in the modbus_server.c, that could be centralized by returning the EXC code + a SEND_REPLY bit from all those functions calling mbs_exception_rsp() instead of a bool "send_reply".
- modbus_raw.c impl. nearly the same function; different function signature, not updating the exc counter. Updating exc counter not allowed in raw? Otherwise move to core and use in server and raw?

Why should reading be simpler? Also with reading one needs to fetch the data block at once for consistency of the data in the block, store it somewhere in an intermediate storage and then iterate over it as 16bit registers to be transferred and BE endianessed to the tx buffer.

I meant more that at least reading is more doable with the existing implementation. Its easier to async read a slice of data then async write a slice, but agree that block writes for both should be supported

There even is a, IMO, non-standard float implementation starting at register number 5000 and above, if enabled. That implementation changes endianess over all 4 float bytes to BE. The 16 bit register impl. will change the float byte wise in the two words. On an little endian machine you get "float little-endian byte swap according to

https://www.modbustools.com/poll_display_formats.html

In the Modbus spec. nothing else but bit and word (16 bit) data types are defined.

I still think it needs all three and the word byte swap as option for the implementation (presumably the appl.) that fills the tx buffer to have the freedom to decide which endianess shall be used on the bus for the payload (LE, BE, LE-byte swap, BE-byte swap) of multi word regs read and write are required.

Agree 100%. I'm in favour of the additional user callbacks, as opposed to FC overloads. With standard FC, I just want to deal with the incoming/outgoing data, not worry about the response values etc.

zephyrproject-rtos / zephyr