Protobuf dll errors when parsing large packets

phorn-NI commented 2 years ago

I have found an issue when trying to parse large data packets of UDP data through the protobuf dll.

Issue Description

I have used the customers proto file to generate the type cluster contained in the generated mdc lvlib - using the grpc-labview scripting tools.

For some initial tests, I built a PoC which generates randomised data, then serialises and deserialises the data to check it is valid (I shared this vi in a previous bug #119 ).

For this test I have copied the data from the Buffer indicator on the front panel of the previously shared vi (screenshot above). That data has been pasted onto the block diagram of the "bug demo.vi" (attached in "bug report.zip" under the bug report folder) as the simulated data (screenshot below)

I have pcap files generated through the NI XNET driver (with an NI-8623 module) with protobuf data from a customers ADAS ECU. I can view the data in Wireshark correctly (information on how to configure Wireshark to do this is below). For this test I have copied the data field of one of the protobuf frames, and pasted it as my non-simulated data in "bug demo.vi" (screenshot below)

The original pcapng file is also included in the same folder as "bug demo.vi"

When I run the vi and select Sim, I get no errors and the Unpacked data is valid.

When I turn Sim off, I get an error from the dll and the Unpacked data is just default values.

Configure Wireshark for Protobuf

In proto.zip there is a set of proto files from the customers ECU. Below is a description I wrote for the customer to show how to configure Wireshark to show the protobuf frames. This will be the easiest way to see the data that should be shown in the vi.

Go to Edit >> Preferences, and in the left panel there is a drop down list for Protocols. Find Protobuf in there.

Click on Edit Protobuf Search Paths, and add the directory on your PC as a source directory. For reference, I took the proto files used in my PoC work, and used them, with one of the additional Google proto files downloaded from Github. I’ve attached these as a zip to avoid confusion.

Then click on Edit Protobuf UDP message types, and add port 8204 with the message type MDC_Payload. Exit the dialogs.

You should then be able to load up a pcapng file from XNET. In the display filters box at the top type protobuf – this will limit the displayed results to only packets that have protobuf data.

For each packet, you should now be able to drill down into the Protocol Buffer section of the middle pane, and view your signal data.

shivaprasad-basavaraj commented 2 years ago

@phorn-NI I debugged this issue and found that it is an invalid entries in the protobuf data that is tripping up grpc-labview. Specifically, the data after the tail as shown below.

tail is the last entry in MDC_payload and I am not sure why we are getting these additional data from the ADAS ECU after all the data for the MDC_Payload itself. These data values are not recognized as part of the serialized data for MDC_Payload and hence we wouldn't know how to process them.

We could probably ignore these bits of data and not cause the errors you are seeing. I will take a look at how situations like these are handled by protobuf in other languages and see what we could do in grpc-labview.

P.S. We need to ignore the first four bytes the UDP data since they are not part of the serialized data for MDC_Payload.

phorn-NI commented 2 years ago

Thanks @shivaprasad-basavaraj , I will ask the customer about the extra data at the end.

I remember now that the 4 bytes at the start is a header field. Currently it is mostly blank, but the very first byte (02 in the packet shown below) is the payload type. 02 means it is MDC_Payload so we can deserialise it correctly - there are 4 other potential types which will have a different serialisation structure (but for our purposes we can just ignore the frame if the first byte is not 02)

shivaprasad-basavaraj commented 2 years ago

@phorn-NI I tried writing these bytes to a file and then using generated code for the proto file in C# to try to deserialize it as an MDC_Payload object. I ran into an exception with this data in C# as well. The problem is that gRPC cannot make any sense of these bits after the tail entry. We cannot even treat them as unknown fields since they don't prescribe to the protobuf format and we cannot get information out such as data type.

I think for this set of data the correct behavior for grpc-labview would be to error out since gRPC cannot deserialize this data. We could probably give out a better error description though letting the user know that the data is corrupt.

phorn-NI commented 2 years ago

@shivaprasad-basavaraj thanks

The rest of the data should parse correctly though shouldn't it (like it does in Wireshark)? Could the function still pass out the data but raise an error/warning saying there was additional field that don't fit the type?

shivaprasad-basavaraj commented 2 years ago

@phorn-NI I am guessing WireShark is just doing a best-effort guess for what those data fields should be based on the proto file provided and just ignores what it doesn't understand.

My gut feel is grpc-labview should more closely resemble the behavior of other programming languages which support gRPC which seems to be to error out if the required data doesn't prescribe to the protobuf wire format at all (not just it cannot find the protobuf index and such but it cannot make any sense of the data at all).

@ccifra What do you think the behavior of grpc-labview should be when we run into situations like these?

phorn-NI commented 2 years ago

thanks @shivaprasad-basavaraj that makes sense.

I've explained to the customer that we can continue to use tshark to parse the data but it is slow, and there is a risk of incorrectly parsed data (don't know how large the risk is), or they need to remove these bytes from the packet data from the ECU

It is also possible (and I've asked for confirmation) that I just have an out of date version of the proto file.

phorn-NI commented 2 years ago

@shivaprasad-basavaraj @ccifra I checked with the customer and my proto file was slightly incorrect - the second line was commented out on mine, it shouldn't be.

nanopb.proto is another file they have which imports the google descriptor proto file, and also has lots of optional parameters. I remember from a previous conversation with Chris that optional parameters are not supported - is that still true?

Anyway, I re-enabled the import statement and viewed the data in Wireshark, the same bytes are still labelled as unknown.

The customer isn't willing to accept a risk that the parsing is incorrect (this is the main stream of output data they need to use to validate their tests - if it is incorrect the test is junk) - which means using tshark or our API with those values in the UDP data are not valid options. They have an internal application which displays the data, so it must be somehow handling the data - I've asked them to explain how that application handles it.

phorn-NI commented 2 years ago

@shivaprasad-basavaraj I know what is going on now.

The UDP packet is used to serialise lots of data including mdc_payload. So the invalid packets (and everything after that - such as the additional faceboundingbox field) is actually part of something else and can be ignored.

I manually changed the data being written to the parsing functions in my test, but removing the first 4 bytes and everything after the start index of the invalid packets.

I worked out from the wireshark data that this was index 450 for this packet. From my previous serialisation/deserialisation experiements, the packet length was more like 360-370 bytes. So I have 2 questions.

Could we modify the function so that it parses the data and then has a separate output with the rest of the packet?
Alternatively, is there a way we can pre-analyse the packet to see how long we expect the mdc_payload data to be, so we can then trim it down before passing to the main parsing function?

shivaprasad-basavaraj commented 2 years ago

@phorn-NI I think it would be difficult to handle this in grpc-labview in a generic way. If we start seeing bits that we don't recognize as part of the protobuf data then we cannot be sure that we have parsed any of the data accurately. The data may genuinely be corrupted and not data that is safe for us to ignore and separate out as in this case.

I think we should handle this in G code for this specific scenario. I think ideally the packet from the ECU should include information such as the number of bytes for each data that they are serializing. Then we could use this information to split up the bytes in G before it is passed on to the grpc-labview layer to unpack.

Also, it is dangerous to assume that bytes for MDC_Payload would end at byte 450 as in the example above. Protocol buffers use variable length encoding and different data would mean different number of bytes used to encode this data.

shivaprasad-basavaraj commented 1 year ago

The issue was that in the attached example the test data was being generated by a special implementation of protobuf called nanopb. nanopb has a special mode for encoding data which signifies end of data by adding an entry with tag id of 0. This is not standard protobuf behavior and we expect it to work only when both ends of the data communication are built on top of nanopb unlike grpc-labview.

ni / grpc-labview

Protobuf dll errors when parsing large packets #151