grid-x / modbus

BSD 3-Clause "New" or "Revised" License
78 stars 26 forks source link

RTU over TCP error: response data size '5' does not match count '4' #52

Closed branimirborisov closed 4 months ago

branimirborisov commented 2 years ago

I'm trying to read a RTU device routed over TCP using both the library (via InfluxDB Telegraf) and the modbus-cli tool. In both cases trying to read the device results in the following error:

modbus: response data size '5' does not match count '4'

Cli command is:

./modbus-cli.amd64 -address tcp://{HOST}:{PORT} -register 40807 -quantity 2 -slaveID 1

When switching to RTUoverTCP I get the following timeout error:

read tcp 172.24.0.3:52840->{HOST}:{PORT}: i/o timeout

Device is SIEMENS SICAM P with Modbus RTU translated over TCP.

Note: When using another tool, for example modpoll register value is parsed correctly, e.g.:

modpoll -p ${MODBUS_PORT} -a 1 -f -t 4:float -r 40807 ${MODBUS_HOST} 

results in:

[40807]: 213209248.000000
...

Is there some additional parameter I'm missing that needs to be specified to parse correctly the value or is it some issue with the library implementation and the device compatibility?

Thanks!

branimirborisov commented 2 years ago

Hi,

Which query actually do you refer to? What do you mean by using a different port?

Thanks!

На нд, 22.05.2022 г. в 11:57 ч. andig @.***> написа:

You RTU query uses a different port. Why?

— Reply to this email directly, view it on GitHub https://github.com/grid-x/modbus/issues/52#issuecomment-1133850011, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABUSLFM5KVPVSQXX6ASJJOTVLHZF5ANCNFSM5WS7C5QA . You are receiving this because you authored the thread.Message ID: @.***>

srebhan commented 2 years ago

The origin of the error is here: https://github.com/grid-x/modbus/blob/c4a3d042e99b1c444701d185c14d78afe1fad8b5/client.go#L125

It seems like the device is sending more data then expected. You might want to soften the constrains in L124 to

    if count <= length {

and do

    if count <= length {
        err = fmt.Errorf("modbus: response data size '%v' does not match count '%v'", length, count)
        return
    }
    results = response.Data[1:count+1]
andig commented 2 years ago

That sounds like bad advise to me- you'd never know which parts of the response are the correct ones.

From the description it is not entirely clear if the device needs be addressed using Modbus RTU or TCP. Let's assume RTU. The first error is clear then- addressed as TCP the device might respond with anything, having misunderstood the request. As it doesn't match the expected TCP structure that will error.

It's unclear though, why the second request returns i/o timeout which is before Modbus (any kind of it) is even spoken.

branimirborisov commented 2 years ago

@andig The device should be address via TCP. Its actually RTU device using a TCP converter. So at the end we should address it as TCP device.

Also as I have described when using another tool - modpoll and addressing the device as TCP I'm able to read the data. So I would also suggest to look into the TCP implementation and see if we can also cover this case.

What if we follow the suggested approach from @srebhan and skip the check or if its actually needed for some cases to introduce a option/flag that will allow us to toggle it on/off? This way it can be enabled by default, but I would be able to turn it off and use it with the particular device?

andig commented 2 years ago

Just to be sure because this can be tricky. Are you really sure that you have an RTU to TCP protocol converter? Not just an RS485 to Ethernet adapter (which would expose RTU over TCP)?

It still doesn‘t figure that in the RTU case you‘ve tried the port couldn‘t be connected to either.

In any case, if application wants x bytes, y bytes response should be an error (opened https://github.com/grid-x/modbus/pull/53 based on such an experience).

frzifus commented 2 years ago

Regarding the precise response length, I agree with @andig. But before we start tweaking the software, lets collect some more details.

@branimirborisov could you compile and test a modbus-cli based on the latest master commit (https://github.com/grid-x/modbus/commit/0daecbb3900f79b94cee3af02c1cd2f0ab86f008)? Just to make sure that this problem still exists.

if its still an issue, could you run the modbus-cli command again and append -log-frame to collect more details about the transmitted data?

Usage of ./modbus-cli:
  -log-frame
        prints received and send modbus frame to stdout

Next we should try the same with modpoll to compare sent and received data. Since I have never used modpoll, I do not know how to record them. But maybe -v is worth a try.

usage : mbpoll [ options ] device|host [ writevalues... ] [ options ]

ModBus Master Simulator. It allows to read and write in ModBus slave registers
                         connected by serial (RTU only) or TCP.
  -v            Verbose mode.  Causes mbpoll to print debugging messages about
                its progress.  This is helpful in debugging connection...

Update: If modpoll -v does not provide any details, we could record[0] them via tcpdump.


0: https://www.plcnext-community.net/makersblog/network-record-with-tcpdump/

branimirborisov commented 2 years ago

@frzifus i'm trying to build the modbus-cli form the latest version in master branch, but I'm getting following error:

cmd/modbus-cli/main.go:285:11: cannot use "github.com/grid-x/serial".RS485Config literal (type "github.com/grid-x/serial".RS485Config) as type "github.com/grid-x/modbus/vendor/github.com/grid-x/serial".RS485Config in assignment

Maybe I'm doing something wrong? Can you provide some instructions on how to build the cli from master? I'm on Ubuntu 18.04 with go 1.10.4?

andig commented 2 years ago

Your go is ancient. Upgrade to a version from this century ;)

branimirborisov commented 2 years ago

@andig @frzifus Running the modbus-cli from master with the -log-frame option results in:

./main -address tcp://{HOST}:{PORT} -register 40807 -quantity 2 -slaveID 1 -log-frame
modbus: send 00 01 00 00 00 06 01 03 9f 67 00 02
modbus: recv 00 01 00 00 00 08 01 03 04 4d 6b ca 16 4b
modbus: response data size '5' does not match count '4'

Any suggestions? Thank you!

andig commented 2 years ago

So what's the mbpoll output in comparison?

branimirborisov commented 2 years ago

So what's the mbpoll output in comparison?

-- Polling slave... (Ctrl-C to stop)
[40807]: 247596832.000000
-- Polling slave... (Ctrl-C to stop)
[40807]: 247597152.000000
-- Polling slave... (Ctrl-C to stop)
[40807]: 247597152.000000
-- Polling slave... (Ctrl-C to stop)
[40807]: 247597152.000000
-- Polling slave... (Ctrl-C to stop)
[40807]: 247597472.000000
-- Polling slave... (Ctrl-C to stop)
srebhan commented 2 years ago

Which corresponds to the first four bytes in the response as IEEE float.

branimirborisov commented 2 years ago

@andig @frzifus Any feedback regarding the issue, do you plan on adopting some changes in the library at this point or not? I need some info to decide weather we proceed with the Telegraf plugin on our side or try to find another solution to read the device?

frzifus commented 2 years ago

Would be nice to get a more detailed view of your mbpoll request first. have you tried to capture the TCP packets which include modbus send and recv payload?

branimirborisov commented 2 years ago

@frzifus Unfortunately I'm not able to find a way to capture the packets with the tool. Currently we are testing another python based library to read the data. I'll will try to post the results here once we have verified its working.

frzifus commented 2 years ago

Could you drop your capture file here?

branimirborisov commented 2 years ago

@frzifus This is a capture from modpoll with tcpdump:

17:32:13.121100 IP 127.0.0.1.53212 > {MODBUS_HOST}.3351: Flags [P.], seq 320494278:320494290, ack 184513, win 63456, length 12
    0x0000:  4500 0034 d24e 4000 4006 7bc3 c0a8 6408  E..4.N@.@.{...d.
    0x0010:  596a 6e97 cfdc 0d17 131a 5ac6 0002 d0c1  Yjn.......Z.....
    0x0020:  5018 f7e0 0eeb 0000 0039 0000 0006 0103  P........9......
    0x0030:  9f66 0002                                .f..
17:32:13.157926 IP {MODBUS_HOST}.3351 > 127.0.0.1.53212: Flags [P.], seq 184513:184527, ack 320494290, win 2236, length 14
    0x0000:  4500 0036 1f75 0000 f806 b69a 596a 6e97  E..6.u......Yjn.
    0x0010:  c0a8 6408 0d17 cfdc 0002 d0c1 131a 5ad2  ..d...........Z.
    0x0020:  5018 08bc ff35 0000 0039 0000 0008 0103  P....5...9......
    0x0030:  044d 8986 105f                           .M..._
17:32:13.157966 IP 127.0.0.1.53212 > {MODBUS_HOST}.3351: Flags [.], ack 184527, win 63442, length 0
    0x0000:  4500 0028 d24f 4000 4006 7bce c0a8 6408  E..(.O@.@.{...d.
    0x0010:  596a 6e97 cfdc 0d17 131a 5ad2 0002 d0cf  Yjn.......Z.....
    0x0020:  5010 f7d2 af9d 0000                      P.......
17:32:14.121299 IP 127.0.0.1.53212 > {MODBUS_HOST}.3351: Flags [P.], seq 320494290:320494302, ack 184527, win 63442, length 12
    0x0000:  4500 0034 d250 4000 4006 7bc1 c0a8 6408  E..4.P@.@.{...d.
    0x0010:  596a 6e97 cfdc 0d17 131a 5ad2 0002 d0cf  Yjn.......Z.....
    0x0020:  5018 f7d2 0ede 0000 003a 0000 0006 0103  P........:......
    0x0030:  9f66 0002                                .f..
17:32:14.161862 IP {MODBUS_HOST}.3351 > 127.0.0.1.53212: Flags [P.], seq 184527:184541, ack 320494302, win 2224, length 14
    0x0000:  4500 0036 1f76 0000 f806 b699 596a 6e97  E..6.v......Yjn.
    0x0010:  c0a8 6408 0d17 cfdc 0002 d0cf 131a 5ade  ..d...........Z.
    0x0020:  5018 08b0 fc66 0000 003a 0000 0008 0103  P....f...:......
    0x0030:  044d 8986 131f                           .M....
17:32:14.161902 IP 127.0.0.1.53212 > {MODBUS_HOST}.3351: Flags [.], ack 184541, win 63428, length 0
    0x0000:  4500 0028 d251 4000 4006 7bcc c0a8 6408  E..(.Q@.@.{...d.
    0x0010:  596a 6e97 cfdc 0d17 131a 5ade 0002 d0dd  Yjn.......Z.....
frzifus commented 2 years ago

perfect, I think that's the response => ...0008 0103 044d 8986 105f... I'll try to have a more detailed look by the end of the week.

frzifus commented 2 years ago

@branimirborisov ~looking at the raw bytes. Both requests and responses look fine. The only thing that is noticeable, you requested a different register. maybe its caused by the modbus-cli wrapper?~

Maybe the other device is wrong? We expect 4 bytes in both responses. But we received in both cases 5. See response -> unknown.

Any idea @Carelo @andig ?


Raw

Response:

our-lib => 03 04 4d 6b ca 16 4b
modpoll => 03 04 4d 89 86 10 5f

Request

FunctionCode

got 0x03, thats fine details.

The Data Address of the first register requested.

This seems to be different (our-lib) 9f67 (modpoll) 9f66.

The total number of registers requested.

In our case both is 2. => 0002

Response

The number of data bytes to follow (2 registers x 2 bytes each = 4 bytes)

In our-lib and modpoll we receive 0x04. Thats correct.

payload

Seems to be fine too. In both cases, its the same length as expected. our-lib => 4d6b ca16 modpoll => 4d89 8610

Unknown

our-lib => 4b modpoll => 5f

srebhan commented 2 years ago

@frzifus right, the last "unknown" byte is the key. While modpoll seems to just ignore this byte, this-lib errors-out on the extra byte.

branimirborisov commented 2 years ago

Thanks for looking into the details @frzifus . But if the response is OK why would the library throw an exception?

frzifus commented 2 years ago

But if the response is OK why would the library throw an exception?

No, the response is invalid. The given byte length doesnt match.

branimirborisov commented 2 years ago

How about providing an option (not by default, but by additional configuration for example) to ignore it instead of throwing the exception just to cover the case?

andig commented 2 years ago

One could return the data plus error. Making the lib accept invalid stuff without error for sake of a broken device doesn‘t sound desirable.

branimirborisov commented 2 years ago

@andig @frzifus The device is behind an RTU to TCP converter - the problem might be due to the converter not to the device itself. If I would like to read/parse the first 8 bytes, no matter of the response size , which might be a side effect of the converter, why wouldn't this be allowed if at the end the data is valid?

Finally its your call how to tackle this issue, just let me know, so we can focus our efforts elsewhere.

Thank you both for the support so far!

srebhan commented 2 years ago

@andig your proposal would work from Telegraf side if we can identify that too many bytes were received (e.g. by having an error class to check against).

guelfey commented 2 years ago

Thanks for investigating @frzifus @branimirborisov. So what I can tell from the trace, actually the "outer" frame length seems to be correct:

0039 // transaction id
0000 // "protocol identifier" (constant 0)
0008 // remaining bytes in the frame
0103 044d 8986 105f // indeed 8 bytes remaining

But the inner length which is sent again as part of the function response is too short:

01 // unit ID
03 // function code
04 // remaining bytes in response
4d 8986 105f // 5 bytes!

So I'd argue that if the outer length matches and there's no underflow, we should just discard the rest and only return the 4 bytes. After all, the response header tells us that only 4 bytes are relevant. Maybe the peer has some implementation reason why it does this (e.g. all buffers must be even-sized, or it needs to reuse fixed size buffers. I'll try to find out what the spec actually says about this.

frzifus commented 2 years ago

Just to mentioned it, @Carelo was part of the late night research too. :dancers: :dancers: :dancers:

srebhan commented 1 year ago

Is there any progress on this issue?

andig commented 1 year ago

So I'd argue that if the outer length matches and there's no underflow, we should just discard the rest and only return the 4 bytes. After all, the response header tells us that only 4 bytes are relevant. Maybe the peer has some implementation reason why it does this (e.g. all buffers must be even-sized, or it needs to reuse fixed size buffers. I'll try to find out what the spec actually says about this.

Imho the response is just invalid and the error expected. Fine as-is.

srebhan commented 1 year ago

So the user is left with a inaccessible device? I thought the conclusion by @guelfey

So I'd argue that if the outer length matches and there's no underflow, we should just discard the rest and only return the 4 bytes. After all, the response header tells us that only 4 bytes are relevant. Maybe the peer has some implementation reason why it does this (e.g. all buffers must be even-sized, or it needs to reuse fixed size buffers. I'll try to find out what the spec actually says about this.

suggests that the response can be used nonetheless!?!

andig commented 1 year ago

Which 4 of the 5 bytes would you even consider „the response“?

andig commented 1 year ago

If you decide for the first 4 which is probably a little less arbitrary a choice, https://github.com/grid-x/modbus/issues/52#issuecomment-1154392587 still sounds like a good idea to me.

srebhan commented 1 year ago

IMO you should use the first 4 bytes as this is the most natural interpretation. The other option would be to let the user decide, e.g. you add an option varadic to specify what to do in the discussed case!? This would be the best option from Telegraf side as we can just pass through this option to the config....

andig commented 1 year ago

We have the same problem in https://github.com/evcc-io/evcc/discussions/6051#discussioncomment-4907747. In our case, the server returns less data than expected. This leads to client using the result as part of slice access which panics. Instead of checking this at every return site it would be nice to check as part of the reading operation.

Current code does already check if response in itself is consistent, e.g. for holding:

count := int(response.Data[0])
length := len(response.Data) - 1
if count != length {
    err = fmt.Errorf("modbus: response data size '%v' does not match count '%v'", length, count)
    return
}

In addition, it should check if response matches request:

if 2*int(quantity) != length {
    err = fmt.Errorf("modbus: response data size '%v' does not match request '%v'", length, 2*quantity)
}

It should still return the result:

results = response.Data[1:]
return

Improvement would be to define a custom error type so it can be checked with errors.Is().

srebhan commented 1 year ago

That would work for me.

srebhan commented 7 months ago

Upstream PR: grid-x/modbus#81