Closed: l-castelli closed this issue 8 years ago
Can you elaborate a bit? Possibly provide some example traffic which shows this issue?
test.pcapng is the byte-by-byte version; testok.pcapng is the working version.
Thanks.
I'll see if I can take a look at this soon-ish.
I've just looked into this, and there is something strange going on with the byte-for-byte capture (the one not working correctly). This is what I see when I concatenate the data bytes manually (first 16 bytes):
0000001F 1AA53E95 38013F61 57FEBD1D ..
If this were regular Simple Message traffic, the 0x1F would be the pkt_len field, with the following three int32s being the header fields. The strange thing is, there is no (official) message with a total length of 0x1F == 31 bytes. In addition, in this capture, 0x1AA53E95 (447037077) would need to be the msg_type, but 447037077 is not a registered type.
Could it be that 0x1F is really ASCII 31 which is A, which would be 0x0A? Because then the first message would be a JOINT_POSITION, which would make sense. There also appear to be no pkt_len bytes.
The issue with the dissector not working when messages are so fragmented that each byte is in a separate frame remains valid though. So if you could provide me with a capture that is essentially the second one, but then with each message transmitted byte-for-byte, I could verify the fix I have implemented.
Most messages in the capture should indeed be JOINT_POSITION simple messages. However, I'm pretty sure the msg_type part of the header was written as 0x0A, so I have no idea where the 0x1F is coming from. PS: the ASCII character for the binary code 0x1F is not 'A' (0x41 = 'A'). I can make another capture for you tomorrow. I'm not 100% sure, but I thought testok.pcapng was essentially the first one (test.pcapng).
Best Regards
You're right that 0x1F is not A.
Looking at it again, this could be a desync issue: the capture starts in the middle of a message. A bit later in the capture, we see the following:
38000000 0A000000 01000000
This does correspond to a JOINT_POSITION header (length: 56, type: 10, comm: topic). The bytes after that also make sense (0 for seq, then 10 little endian floats):
00000000 1F1AA53E 9538013F 6157FEBD
1DE7A4BF 360A72BF 05C0A0BF 00000000
00000000 00000000 00000000
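For readers who want to check the reading above, here is a minimal Python sketch that decodes exactly the bytes quoted in this comment. It follows the field layout as described here (three little-endian int32s, then seq, then 10 floats); the function name `decode_joint_position` is made up for illustration, and the excerpt may not contain every byte that was on the wire.

```python
import struct

# Bytes copied from the two excerpts quoted in this comment.
HEADER_HEX = "38000000 0A000000 01000000"
BODY_HEX = (
    "00000000 1F1AA53E 9538013F 6157FEBD "
    "1DE7A4BF 360A72BF 05C0A0BF 00000000 "
    "00000000 00000000 00000000"
)


def decode_joint_position(raw: bytes) -> dict:
    """Decode the quoted bytes as little-endian fields (hypothetical helper)."""
    # Three int32s: length, message type, communication type.
    pkt_len, msg_type, comm_type = struct.unpack_from("<3i", raw, 0)
    # Next int32 is read as the sequence number, as in the comment.
    (seq,) = struct.unpack_from("<i", raw, 12)
    # Remaining 40 bytes are read as 10 single-precision floats.
    joints = struct.unpack_from("<10f", raw, 16)
    return {
        "pkt_len": pkt_len,
        "msg_type": msg_type,
        "comm_type": comm_type,
        "seq": seq,
        "joints": joints,
    }


decoded = decode_joint_position(bytes.fromhex(HEADER_HEX + BODY_HEX))
for name, value in decoded.items():
    print(name, value)
```

Read this way, the first three fields come out as 56, 10 and 1, matching "length: 56, type: 10, comm: topic".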
I'm not sure this is something I can fix: Simple Message does not have sync bytes at the moment (ros-industrial/industrial_core#133), making it virtually impossible to detect start-of-msg in order to resync to a byte stream. Message start (or pkt_len) is assumed to coincide with the start of a TCP frame (or the data part of it at least). If that is not true, then the dissector cannot recover, and will interpret whatever the first four bytes are as the pkt_len field (which is essentially what was happening in my earlier comment).
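The desync can be checked against the bytes quoted in this thread. The short Python sketch below assumes that the message seen later in the capture carries the same joint values as the one cut off at the start of the capture; the variable names are illustrative only.

```python
# First 16 bytes of the byte-for-byte capture, as quoted earlier in the thread.
first16 = bytes.fromhex("0000001F 1AA53E95 38013F61 57FEBD1D")

# seq plus 10 floats of the JOINT_POSITION message seen later in the capture.
body = bytes.fromhex(
    "00000000 1F1AA53E 9538013F 6157FEBD "
    "1DE7A4BF 360A72BF 05C0A0BF 00000000 "
    "00000000 00000000 00000000"
)

# Where do the capture's first bytes occur inside the later message body?
offset_in_body = body.find(first16)

# What a parser gets when it treats those first four bytes as pkt_len.
misread_big_endian = int.from_bytes(first16[:4], "big")
misread_little_endian = int.from_bytes(first16[:4], "little")

print("offset of capture start inside the body:", offset_in_body)
print("pkt_len if read big-endian:", misread_big_endian)
print("pkt_len if read little-endian:", misread_little_endian)
```

The capture's first bytes match the body starting one byte into the seq field, so the 0x1F at the front is really the first byte of the first float rather than a length.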
I can make another capture for you tomorrow. I'm not 100% sure, but I thought testok.pcapng was essentially the first one (test.pcapng).
That is probably true, however due to the desync it's hard to tell.
I'm inclined to close this as a cant-fix: the only thing I can think of is to scan for known msg sizes (ie: values for pkt_len that correspond to sizes of msgs), but I don't like it very much.
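To make the scanning idea concrete, here is a rough Python sketch. It is not part of the dissector; `find_resync_candidates` and `KNOWN_PKT_LENS` are hypothetical names, and the set contains only the one length (56) that appears in this thread.

```python
import struct

# Hypothetical table of plausible pkt_len values; 56 is the only one
# taken from this thread (the JOINT_POSITION header shown above).
KNOWN_PKT_LENS = {56}


def find_resync_candidates(stream: bytes, known=KNOWN_PKT_LENS) -> list:
    """Return every offset whose little-endian uint32 is a known pkt_len."""
    offsets = []
    for off in range(len(stream) - 3):
        (value,) = struct.unpack_from("<I", stream, off)
        if value in known:
            offsets.append(off)
    return offsets


# Tail of a cut-off message (two floats), then the start of a new message.
tail = bytes.fromhex("1F1AA53E 9538013F")
next_start = bytes.fromhex("38000000 0A000000 01000000")
candidates = find_resync_candidates(tail + next_start)
print(candidates)
```

These are only candidates: payload data can happen to contain the same four bytes, so a match does not prove that a message starts there.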
I have a fix here that basically collects 4 bytes before trying to read the pkt_len field, which should make the dissector able to dissect streams even when a message is written out byte-for-byte. It only works if the capture starts at the start of a message though (or: if the first bytes on any configured simple message port are actually part of the pkt_len field).
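The buffering idea can be sketched outside Wireshark in a few lines of Python. This is not the code from the fix, only an illustration: `ByteStreamFramer` is a made-up class, and it assumes a little-endian pkt_len that counts the bytes following the length field.

```python
import struct


class ByteStreamFramer:
    """Collect stream bytes and emit complete length-prefixed messages."""

    LEN_FIELD_SIZE = 4

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add bytes to the buffer and return any messages now complete."""
        self._buf.extend(chunk)
        messages = []
        while True:
            # Wait until the whole pkt_len field has arrived.
            if len(self._buf) < self.LEN_FIELD_SIZE:
                break
            (pkt_len,) = struct.unpack_from("<I", self._buf, 0)
            total = self.LEN_FIELD_SIZE + pkt_len
            # Wait until the whole message has arrived.
            if len(self._buf) < total:
                break
            messages.append(bytes(self._buf[:total]))
            del self._buf[:total]
        return messages


# Two synthetic 8-byte-payload messages, delivered one byte at a time.
msg = struct.pack("<I", 8) + b"ABCDEFGH"
framer = ByteStreamFramer()
out = []
for byte in msg * 2:
    out.extend(framer.feed(bytes([byte])))
print(len(out), "messages reassembled")
```

As with the fix described above, this only works when the first byte fed in really is the first byte of a pkt_len field; a misaligned start yields a wrong length and the framer never recovers.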
If you can provide me with a capture that has that kind of traffic (ie: byte-for-byte ánd includes start of simple msg stream) that would be very much appreciated.
I'll provide you with a capture ASAP (probably Monday morning).
Oh wait, in this capture, I forgot to write the joint_position messages byte by byte. I'll upload another capture ASAP.
Thanks for the captures. As I noted in the commit message, 38e9ec9 only partially fixes the issue you reported, as it will still not be able to recover from a desync (ie: capture starts in the middle of a message). I'll create an issue for that, but I'm unsure about whether that is actually something that can be fixed without changes to the protocol itself (ros-industrial/industrial_core#133).
I've opened #22 to track the remaining desync issue in the dissector.
Thanks for providing me with the captures and for reporting this :+1:.
The issue is solved when we read/write a whole simple message at once.