[Modbus-binding] type conversion based on byte and word order

sebastiankb commented 1 year ago

Since the PR #161 is looking for a generic solution to express the endianness via content type, and it seems that currently there is no practical solution in sight, I would like to put the focus here back on Modbus itself.

In my view, the most practical way would be to introduce Modbus-specific term(s) that refer to a proper type conversion from or to a byte stream. That means we need information about the data type (e.g., signed or unsigned) and the byte and word order. Here are some proposals (please note the term names and values are just proposals):

Proposal I (One term fits all)

After studying some Modbus libraries from different programming languages, it seems not unusual to combine all related type conversion information into one term or function (e.g., libmodbus or Buffer from Node.js).

That means for a TD we can introduce a term modbus:type that takes values such as uint32BB, uint32LB, uint32BL, uint32LL, floatBB, floatLB, floatBL, etc.

The first 'B' or 'L' character in the type value stands for bytes in big endian (=B) or little endian (=L) order. The second 'B' or 'L' character in the type value stands for word in big endian (=B) or little endian (=L) order.

This results in a huge list of combinations and would cause a big switch statement in the programming language.
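A minimal Python sketch of how a consumer might decode a Proposal I value. It assumes that the first letter describes the byte order inside each 16-bit register and the second letter the order of the registers, which is one reading of the description above. The function name `decode_combined` and the small type table are hypothetical. Parsing the two-letter suffix is one way to avoid the big switch statement.

```python
import struct

# Hypothetical mapping from the base name in a combined value to a struct code.
_STRUCT_CODES = {"uint32": "I", "int32": "i", "float": "f"}


def decode_combined(raw: bytes, modbus_type: str):
    """Decode wire bytes described by a Proposal I value such as 'uint32LB'."""
    if len(modbus_type) < 3:
        raise ValueError(f"unsupported modbus:type value: {modbus_type}")
    base, byte_order, word_order = modbus_type[:-2], modbus_type[-2], modbus_type[-1]
    if base not in _STRUCT_CODES or byte_order not in ("B", "L") or word_order not in ("B", "L"):
        raise ValueError(f"unsupported modbus:type value: {modbus_type}")
    code = _STRUCT_CODES[base]
    if len(raw) != struct.calcsize(">" + code):
        raise ValueError("byte length does not match the declared type")

    # Split the payload into 16-bit words (one Modbus register each).
    words = [raw[i:i + 2] for i in range(0, len(raw), 2)]
    if byte_order == "L":
        # Assumption: 'L' as first letter means low byte first inside each word.
        words = [w[::-1] for w in words]
    if word_order == "L":
        # Assumption: 'L' as second letter means low word first.
        words.reverse()
    # After normalisation the payload is plain big-endian.
    return struct.unpack(">" + code, b"".join(words))[0]
```

With these assumptions, the four uint32 variants all decode differently ordered wire bytes to the same number, for example `decode_combined(bytes.fromhex("56781234"), "uint32BL")` yields `0x12345678`.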

Proposal II (Decoupling type and order)

The modbus:type only carries the data type with the sign (uint8, int8, uint16,...) and we have another term modbus:endian that provides the byte and word ordering by B, L, BB, BL, etc. (please note some types like uint8 will have no word ordering or it does not make sense).

There will be fewer switch statements; however, some of them will be nested. This proposal allows working with default assumptions such as B or BB when modbus:endian is not present.

Proposal III (fine-grained)

Besides modbus:type there are terms for modbus:byteOrder and modbus:wordOrder. Both will take B and L for big and little endianness.

This will require more nested switch statements. But we can also work with default assumptions when modbus:byteOrder and/or modbus:wordOrder are not present.

What do you think of the proposals? Are there any more?

a-hennig commented 1 year ago

I am in favour of a variant of #1 (to avoid confusion, make the second letter an "S" for swap). Reason:

although it makes the list longer for {uint,int,float}{32}{B,L} + {uint,int,float}{64}{B,L,BS,LS} = 15
in real Modbus devices I see a lot of "almost common practice" types, like dates in 2, 3 or 4 registers or BCD-encoded integers. With modbus:type, we can add them into the same list as we see them in the wild.

relu91 commented 1 year ago

The problem that I'm seeing in introducing the new Modbus term is how it plays along with the others. For example, in my understanding, combining modbus:type with content types that are not application/octet-stream does not make sense. Therefore, we impose a new validation constraint on consumers. Moreover, we have to keep in mind that in Proposal I we are also describing the data type (int, uint, float, etc.); this implies that modbus:type: "float" should not be used with type: "integer" (unless you want to support downcasting).

As always, I think we can have mid-term solutions, but the shortcomings above should be addressed in the specification. Plus I would also add a note saying that the term is experimental and we are looking for alternatives.

sebastiankb commented 1 year ago

For example, in my understanding, combining modbus:type with content types that are not application/octet-stream does not make sense.

In my view, the byte and word order metadata only makes sense when octet-stream is used as content type. Maybe we can define this content type as default assumption for modbus binding?

Moreover, we have to keep in mind that in Proposal I we are also describing the data type (int, uint, float, etc.); this implies that modbus:type: "float" should not be used with type: "integer" (unless you want to support downcasting).

Good point! The Modbus type should not be in conflict with the interaction type. We have a similar situation with readOnly, writeOnly, and observable, which may not be aligned with the op values in forms. In my view, this can be handled by a simple consistency check.
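Such a check could look roughly like the Python sketch below. The compatibility table is an assumption made for illustration (integer Modbus types are allowed for both "integer" and "number" schemas, floating-point types only for "number"), and the helper names are hypothetical. It only handles plain type names such as uint16 or float, not the combined values of Proposal I.

```python
import re


def schema_types_for(modbus_type: str) -> set:
    """Return the data schema types assumed to be compatible with a Modbus type."""
    if re.fullmatch(r"u?int(8|16|32|64)", modbus_type):
        # Assumption: an integer register value is also a valid "number".
        return {"integer", "number"}
    if modbus_type in ("float", "double"):
        # Assumption: no implicit downcast from float to integer.
        return {"number"}
    if modbus_type == "string":
        return {"string"}
    # Unknown Modbus types are treated as compatible with nothing.
    return set()


def is_consistent(schema_type: str, modbus_type: str) -> bool:
    """Check a data schema `type` against the Modbus type used in a form."""
    return schema_type in schema_types_for(modbus_type)
```

A validator could report a warning rather than an error when `is_consistent` returns False, since a consumer may deliberately support downcasting.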

sebastiankb commented 1 year ago

I just found this nice overview here which we may simply follow as value options for modbus:type:

egekorkan commented 1 year ago

Here is what this looks like in the Siemens sayWoT implementation:

modbus:type (string): integer/int, uint, boolean, number/float, string
modbus:wordOrder (string): HighWordFirst, LowWordFirst
modbus:endian (string): "bigendian", "littleendian"

sebastiankb commented 1 year ago

Here is a regular expression of the table above:

^((u?int(8|16|32|64)|float|double)(be|le|sw|sb)|string(le)?)$
where:
u=unsigned (absence means signed)
be=Big Endian
le=Little Endian
sw=Big Endian Word Swap
sb=Big Endian Byte Swap
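As a quick way to try the pattern, here is a small Python sketch that validates candidate values against it. The helper name `is_valid_modbus_type` is hypothetical; the pattern itself is copied unchanged from above.

```python
import re

# Pattern copied verbatim from the comment above.
MODBUS_TYPE_PATTERN = re.compile(
    r"^((u?int(8|16|32|64)|float|double)(be|le|sw|sb)|string(le)?)$"
)


def is_valid_modbus_type(value: str) -> bool:
    """Return True if the value is accepted by the proposed pattern."""
    return MODBUS_TYPE_PATTERN.fullmatch(value) is not None
```

Note that the pattern as written requires an order suffix on every numeric type (so `uint32` alone is rejected) and also accepts combinations such as `int8sw`, where a word swap has no effect on a single byte.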

egekorkan commented 1 year ago

Call of 22.11:

We need to decide whether to choose 3-word combination or a long list like in the comment from @sebastiankb
@relu91: no preference but going from 3 words to a specific string direction is simpler since it is string concatenation.
@danielpeintner: with the 3-word approach, we need to specify which combinations are not allowed.
@egekorkan : 3 words approach is more human-readable.
@ashimura : We should look at how other WoT implementations handle Modbus data as well. Maybe there are other variables for some use case scenarios.

egekorkan commented 1 year ago

Known mechanisms (to be extended):

current node-wot: contentType driven approach
modbus go library: 3-word approach
io.broker: 1 word approach
rmodbus: No approach (I only see address and count but I also see parse_u16)
tokio-modbus: Only BigEndian seems to be used. See https://github.com/slowtec/tokio-modbus/blob/e6e9802f3ef5bbec2d83656be4e55be2235ee8b1/src/codec/tcp.rs#L6.
pyModbus: Endianness with a utility function. See https://github.com/sourceperl/pyModbusTCP/blob/38ec9e6e2282a51799bdc4e08644fdfb1abc55be/pyModbusTCP/utils.py#L146 . Value length via usage of words like "long_long"

lu-zero commented 1 year ago

Other two libraries

egekorkan commented 1 year ago

@lu-zero I have gone into the source code of the two libraries above to see how they do it. It seems that neither of them, nor a Python Modbus library, does the word swapping. I mostly see endianness, and that is handled with a specific function argument. This means that it is the user of the library who needs to handle the byte manipulation. After this small research, my opinion is to go with the 3-word approach for the following reasons:

Most libraries seem to use different words for each parameter/setting.
If a library does not have one of the options, e.g. word swapping, the end programmer of the WoT Consumer application will need to handle the value. Thus, a human-readable approach is more favorable.
If a library does not have one of the options, e.g. word swapping, the Modbus driver can simply ignore a keyword. In the one-word approach, it would be more annoying to remove what is not usable.
Similar to @relu91 opinion from the meeting, I think it is more complicated to split a string to find the parameters needed for the modbus library used by a driver rather than assigning each of the 3 keywords to a parameter.
Adding on top of @ashimura opinion, having 3 separate words is more flexible for the future. In case there is a 4th parameter used in some obscure devices, we can simply add that. If we go with 1-word approach, we would have to change all the types to allocate another parameter in them. E.g. we find that some devices require an empty byte at the end with a parameter referred to as "emptyByte" or "eb". Going with a 1-word approach means adding to int16le another version like int16leeb. This might be a breaking change for some implementations as they may not understand that. If there was a specific word for emptyByte, they would ignore it anyway and let the WoT application developer handle it until the library is updated.
We can split the defaults better.
- If the endianness term is not present, it is big-endian.
- If word swapping term is not present, there is no word swapping
- (possibly messy?) type can be inferred from the data schema level
We can split common parameters. In TD 2.0, we want to have common terms in one container, like base URI, and common content type. Endianness and word swapping can go there whereas the type can go in each form. If there are devices that use a different endianness for just one register, they can override it in a specific form.
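The defaulting and override behaviour described in the list above can be sketched in Python as follows. The key names (`endian`, `wordSwap`, `type`) and the function `resolve_settings` are placeholders chosen for illustration, since the actual term names were still open at this point in the discussion.

```python
# Assumed defaults from the list above: big-endian, no word swapping.
SPEC_DEFAULTS = {"endian": "big", "wordSwap": False}


def resolve_settings(form, thing_common=None):
    """Merge defaults, Thing-level common terms, and form-level terms.

    Later sources win: defaults < Thing-level common terms < the form itself.
    Keys that a driver does not know are kept, so it can ignore them or
    pass them on to the application.
    """
    settings = dict(SPEC_DEFAULTS)
    settings.update(thing_common or {})
    settings.update(form)
    return settings
```

For example, a Thing that declares `{"endian": "little"}` once can still have a single form override it with `{"endian": "big", "type": "uint32"}`.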

As @danielpeintner said, we need to specify which combinations are not allowed. Thinking again, I am not sure why a double type with a word swap is not mentioned above nor why a longer string has no word swap. In another direction, we can just allow all combinations since maybe that combination is really what they have in their device. We also do not prohibit the usage of the maximum keyword together with type:object at the data schema level.

Reasons to go with 1 word:

Fewer terms per form (in case of no default)
As in the first comment, lower-level libraries seem to use such functions

lu-zero commented 1 year ago

I like the 3 word approach and probably I'll open an issue on rmodbus about it later to see if they can support it in their API.

a-hennig commented 12 months ago

Update: while I still think that the application/logical data type needs to be a list (int16, float32, date3, string32, etc.), I would separate this list/type from the encoding/swapping. Also, while the logical type MUST be noted on every property to make sense, the encoding/swapping tends to be the same for an entire device/Thing and would hopefully, at some point, be defined once on Thing level. (I know several protocol stack implementations where you even have to choose this once per driver.)

ie. I'm fine with 3-word (or 1+1)

egekorkan commented 12 months ago

So a list of keywords as an initial proposal:

modv:endian (or endianness) : bigendian,littleendian (or capitalization) (_ seems what is used in C)
modv:swap: word, byte, wordbyte (latter is where we have both swaps)
modv:type: int16, uint16, int32, uint32, int64, uint64, double, float, string, bitmap (another option is to have integer, unsigned-integer or similar)
Regarding types, some manufacturers define their own (see this), but we should not do this at this stage. My idea would be that once we solve the data schema mapping issue, we can address this specifically later on.
Edit: We do not need wordbyte nor do we need byte since those are covered by endian

sebastiankb commented 12 months ago

Another alternative to the orders can be binary flags:

modv:wordOrderChange: true/false (true="most significant word" is sent first; false="least significant word" is sent first)
modv:bitOrderChange: true/false (true="most significant bit" is sent first; false="least significant bit" is sent first)

egekorkan commented 12 months ago

I quite like the proposal from @sebastiankb since in the meantime I have learned that endianness and byte swapping are the same concept. Since different communities can use one term or the other, it is better to use a more neutral/verbose word that is more explicit. I would maybe just say modv:byteOrderChange since we are ordering the bytes (thus the bits too) in the first place.

I also had a talk with @relu91 and he mentioned the mixed or middle endian concept. It seems that it is the same concept as word swap, but for Modbus users, word swap is the more commonly used term.

Some resources:

Wikipedia: Byte-swapping consists of rearranging bytes to change endianness.
A library with mid endian: https://apollo3zehn.github.io/FluentModbus/#input-and-holding-registers . Here it is visible that it is mid and little endian so it is indeed a word swap on top of little endian. In other words, mid endian alone may not exist. Here is another example: https://discuss.python.org/t/add-middle-endian-format-to-struct-module/21978
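To illustrate the "word swap on top of little endian" reading, here is a short Python sketch. The helper `swap_words` is hypothetical, and the byte layout called mid-little-endian here follows the PDP-11 style middle-endian layout described on Wikipedia; it is not taken from any particular library.

```python
import struct


def swap_words(raw: bytes) -> bytes:
    """Reverse the order of the 16-bit words, keeping bytes inside each word."""
    words = [raw[i:i + 2] for i in range(0, len(raw), 2)]
    return b"".join(reversed(words))


value = 0x12345678
big = struct.pack(">I", value)      # bytes 12 34 56 78
little = struct.pack("<I", value)   # bytes 78 56 34 12
mid_little = swap_words(little)     # bytes 34 12 78 56
mid_big = swap_words(big)           # bytes 56 78 12 34
```

Under this reading, the two middle-endian layouts are fully described by an endianness plus a word swap flag, which is why no separate middle-endian value is needed.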

So in the end, I would separate the two concepts (as already proposed) and we do not need a value for middle endian.

relu91 commented 12 months ago

Those are definitely good findings! Plus one for @sebastiankb's proposal.

sebastiankb commented 12 months ago

@egekorkan

I would maybe just say modv:byteOrderChange since we are ordering the bytes (thus the bits too) in the first place.

That would also be OK. Btw: this was also suggested in Proposal III above.

egekorkan commented 12 months ago

Call of 29.11:

We decided:

mostSignificantWord: boolean with default true
mostSignificantByte: boolean with default true

Additionally:

We need to make sure to add text that explains the meaning of word/byte swapping. Also, examples like the ones in the comment above should be added.
We should note that word swapping is not in the modbus spec but happens in off-the-shelf devices. So here is how you describe that in a TD.
We should explain the relationship between data length and words.
For TD 2.0, we may move such stuff to contentType since it is not part of the Modbus specification. We will collect more input/use cases on this.
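A minimal Python sketch of how the two decided booleans could be applied when writing a value. It assumes that mostSignificantByte governs the byte order inside each 16-bit register and mostSignificantWord governs the order of the registers; this is one possible reading and would need to be confirmed by the specification text. The function names and the use of `struct` format codes are illustrative only. `register_count` shows the relationship between data length and words: one register per two bytes.

```python
import struct


def register_count(fmt: str) -> int:
    """Number of 16-bit registers needed for a struct format code such as 'I'."""
    return struct.calcsize(">" + fmt) // 2


def encode_value(value, fmt: str,
                 most_significant_byte: bool = True,
                 most_significant_word: bool = True) -> bytes:
    """Encode a value into wire bytes; both flags default to true as decided."""
    raw = struct.pack(">" + fmt, value)  # start from plain big-endian
    if len(raw) % 2:
        raise ValueError("value does not fill a whole number of registers")
    words = [raw[i:i + 2] for i in range(0, len(raw), 2)]
    if not most_significant_word:
        # Assumption: false means the least significant word is sent first.
        words.reverse()
    if not most_significant_byte:
        # Assumption: false means the low byte of each register is sent first.
        words = [w[::-1] for w in words]
    return b"".join(words)
```

With both flags left at their defaults the output is plain big-endian, so a TD that omits both terms describes a device that follows the byte order of the Modbus specification.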

w3c / wot-binding-templates

[Modbus-binding] type conversion based on byte and word order #293