tweaselORG / TrackHAR

Library for detecting tracking data transmissions from traffic in HAR format.
Creative Commons Zero v1.0 Universal

Protobuf string decoding in adapter `google/googledatatransport-firelog-batchlog-protobuf` #71

Open baltpeter opened 1 month ago

baltpeter commented 1 month ago

In `google/googledatatransport-firelog-batchlog-protobuf` (which I'm writing as part of #52), I'm seeing some odd values:

image

That is not us misinterpreting a field or otherwise specifying a wrong `dataPath`; it's a problem with our Protobuf decoder (which we took from CyberChef).

Here's an example of a request where that happens: https://data.tweasel.org/data/requests/informed-consent,101672

If we load that in CyberChef, we get the same problem:

image

The fields I marked with an arrow should be `iPhone9,3` and `en-GB`, respectively, but they are instead interpreted as buffers.

As far as I understand it, there is no definitive way to tell from the Protobuf wire format (without a schema) whether a value is a string or a buffer, so libraries apply heuristics, which can of course fail.
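To illustrate why this is ambiguous: on the wire, strings, bytes, embedded messages and packed repeated fields all share wire type 2 (length-delimited), so a schema-less decoder only sees a field number, a length and raw bytes. The following is a minimal sketch of one possible heuristic, not the decoder we actually use and not CyberChef's implementation. The names `readVarint`, `readLenField` and `guessString` are hypothetical, and the rule "valid UTF-8 without control characters means string" is an assumption chosen for illustration.

```typescript
// Wire type 2 (LEN) covers strings, bytes, embedded messages and packed repeated fields.
const WIRE_TYPE_LEN = 2;

/** Reads a base-128 varint at `offset`. Sketch only: assumes the value fits in 31 bits. */
const readVarint = (buf: Uint8Array, offset: number): { value: number; next: number } => {
    let value = 0;
    let shift = 0;
    let pos = offset;
    while (pos < buf.length) {
        const byte = buf[pos++];
        value |= (byte & 0x7f) << shift;
        // A cleared high bit marks the last byte of the varint.
        if ((byte & 0x80) === 0) return { value, next: pos };
        shift += 7;
    }
    throw new Error('Truncated varint');
};

/** Reads one length-delimited field: tag varint, length varint, then `length` raw bytes. */
const readLenField = (buf: Uint8Array, offset = 0) => {
    const tag = readVarint(buf, offset);
    const fieldNumber = tag.value >>> 3;
    const wireType = tag.value & 0x07;
    if (wireType !== WIRE_TYPE_LEN) throw new Error('Expected wire type 2, got ' + wireType);
    const len = readVarint(buf, tag.next);
    const end = len.next + len.value;
    if (end > buf.length) throw new Error('Truncated payload');
    return { fieldNumber, payload: buf.subarray(len.next, end), next: end };
};

/**
 * Hypothetical heuristic: return a string if the payload is valid UTF-8 and contains
 * no control characters (other than tab, LF, CR); otherwise return the raw bytes.
 */
const guessString = (payload: Uint8Array): string | Uint8Array => {
    try {
        // `fatal: true` makes the decoder throw on invalid UTF-8 instead of substituting U+FFFD.
        const text = new TextDecoder('utf-8', { fatal: true }).decode(payload);
        if (!/[\u0000-\u0008\u000b\u000c\u000e-\u001f\u007f]/.test(text)) return text;
    } catch {
        // Not valid UTF-8: fall through and keep the bytes.
    }
    return payload;
};
```

For example, the bytes `0a 05 65 6e 2d 47 42` are field 1 with a five-byte payload that this heuristic would read as `en-GB`. A heuristic like this can still misfire in both directions, e.g. on short binary blobs that happen to be printable UTF-8.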

https://protogen.marcgravell.com/decode handles this case correctly, or rather offers both interpretations:

image
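One way to sidestep the guessing would be to return both interpretations and let the adapter pick. The sketch below is purely illustrative; `describePayload` and its return shape are hypothetical and say nothing about how protogen is implemented.

```typescript
// Hypothetical result shape: the raw bytes as hex, plus the UTF-8 reading if one exists.
type BothInterpretations = { hex: string; utf8: string | null };

const describePayload = (payload: Uint8Array): BothInterpretations => {
    // Render each byte as two lowercase hex digits, separated by spaces.
    const parts: string[] = [];
    for (let i = 0; i < payload.length; i++) {
        const b = payload[i];
        parts.push((b < 16 ? '0' : '') + b.toString(16));
    }

    let utf8: string | null = null;
    try {
        // `fatal: true` makes the decoder throw on invalid UTF-8.
        utf8 = new TextDecoder('utf-8', { fatal: true }).decode(payload);
    } catch {
        // Payload is not valid UTF-8, so only the hex view is available.
    }
    return { hex: parts.join(' '), utf8 };
};
```

With something like this, an adapter that knows a field is a device model or locale could read `utf8` directly instead of relying on the decoder's guess.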

I also quickly tried rawprotoparse (which is used by HTTPToolkit), but even with `rawprotoparse(buffer, { stringMode: 'string' })`, that still interprets both fields as buffers:

image