dissector doesn't work properly when reading/writing data byte by byte

l-castelli commented 8 years ago

issue is solved when we're reading/writing whole simple message at once.

gavanderhoorn commented 8 years ago

Can you elaborate a bit? Possibly provide some example traffic which shows this issue?

l-castelli commented 8 years ago

testSMdissector.tar.gz

l-castelli commented 8 years ago

test.pcapng is the byte by byte version, tesok.pcapng is the working version.

gavanderhoorn commented 8 years ago

Thanks.

I'll see if I can take a look at this soon-ish.

gavanderhoorn commented 8 years ago

I've just looked into this, and there is something strange going on with the byte-for-byte capture (the one not working correctly). This is what I see when I concatenate the data bytes manually (first 16 bytes):

0000001F 1AA53E95 38013F61 57FEBD1D ..

If this was regular traffic for Simple Message, the 0x1F would be the pkt_len field, with following 3 int32s the header fields. The strange thing is, there is no (official) message with a total length of 0x1F == 31 bytes. In addition, in this capture, 0x1AA53E95 (447037077) would need to be the msg_type, but 447037077 is not a registered type.

Could it be that 0x1F is really ASCII 31 which is A, which would be 0x0A? Because then the first message would be a JOINT_POSITION, which would make sense. There also appear to be no pkt_len bytes.

gavanderhoorn commented 8 years ago

The issue with the dissector not working when messages are so fragmented that each byte is in a separate frame remains valid though. So if you could provide me with a capture that is essentially the second one, but then with each message transmitted byte-for-byte, I could verify the fix I have implemented.

l-castelli commented 8 years ago

Most messages of the capture should indeed be JOINT_POSITION simple messages. However, I'm pretty sure the msg_type part of the header was written like this: 0x0A ,so i have no idea where the 0x1F is coming from. ps: The ASCII character for the binary code 0x1F is not 'A'. (0x41 = 'A') I can make another capture for you tommorow. I'm not 100% sure, but i thought testok.pcapng was essentially the first one (test.pcapng).

Best Regards

gavanderhoorn commented 8 years ago

You're right that 0x1F is not A.

Looking at it again, this could be a desync issue: the capture starts in the middle of a message. A bit later in the capture, we see the following:

38000000 0A000000 01000000

This does correspond to a JOINT_POSITION header (length: 56, type: 10, comm: topic).

The bytes after that also make sense (0 for seq, then 10 little endian floats):

00000000 1F1AA53E 9538013F 6157FEBD
1DE7A4BF 360A72BF 05C0A0BF 00000000
00000000 00000000 00000000

I'm not sure this is something I can fix: Simple Message does not have sync bytes at the moment (ros-industrial/industrial_core#133), making it virtually impossible to detect start-of-msg to resync to a byte stream. Message start (or pkt_len) is assumed to coincide with the start of a TCP frame (or the data part of it at least). If that is not true, then the dissector cannot recover, and will interpret whatever the first four bytes are as the pkt_len field (which is essentially what was happening in my earlier comment.

gavanderhoorn commented 8 years ago

I can make another capture for you tommorow. I'm not 100% sure, but i thought testok.pcapng was essentially the first one (test.pcapng).

That is probably true, however due to the desync it's hard to tell.

gavanderhoorn commented 8 years ago

I'm inclined to close this as a cant-fix: the only thing I can think of is to scan for known msg sizes (ie: values for pkt_len that correspond to sizes of msgs), but I don't like it very much.

I have a fix here that basically collects 4 bytes before trying to read the pkt_len field, which should make the dissector able to dissect streams even when a message is written out byte-for-byte. It only works if the capture starts the start of a message though (or: if the first bytes on any configured simple message port are actually part of the pkt_len field).

If you can provide me with a capture that has that kind of traffic (ie: byte-for-byte ánd includes start of simple msg stream) that would be very much appreciated.

l-castelli commented 8 years ago

I'll provide you a capture asap. (probably monday morning)

l-castelli commented 8 years ago

capture18-04-2016.pcapng.tar.gz

l-castelli commented 8 years ago

Oh wait, in this capture, i forgot to write the joint_position messages byte by byte. I'll upload another capture asap.

l-castelli commented 8 years ago

capturev2.pcapng.tar.gz

gavanderhoorn commented 8 years ago

Thanks for the captures. As I noted in the commit message, 38e9ec9 only partially fixes the issue you reported, as it will still not be able to recover from a desync (ie: capture starts in the middle of a message). I'll create an issue for that, but I'm unsure about whether that is actually something that can be fixed without changes to the protocol itself (ros-industrial/industrial_core#133).

gavanderhoorn commented 8 years ago

I've opened #22 to track the remaining desync issue in the dissector.

Thanks for providing me with the captures and for reporting this :+1:.

ros-industrial / packet-simplemessage

dissector doesn't work properly when reading/writing data byte by byte #21