tweaselORG / TrackHAR

Library for detecting tracking data transmissions from traffic in HAR format.
Creative Commons Zero v1.0 Universal

Implement indicator matching as a fallback #6

Closed: baltpeter closed this issue 1 year ago

baltpeter commented 1 year ago

Currently, TrackHAR only implements the adapter-based approach for detecting tracking data. This necessarily means that a significant portion of requests will go unprocessed (as we can't write an adapter for every possible endpoint, especially developer- or app-specific ones).

To alleviate that somewhat, we should (optionally) fall back to indicator matching on a user-provided list of honey data for requests that are not matched by an adapter.
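A minimal sketch of the proposed fallback order, assuming a much simplified request and adapter model: adapters get the first chance at a request, and indicator matching only runs if no adapter claims it and the user actually supplied honey data. HarRequest, Adapter, Indicators, Outcome and processRequest are hypothetical names for illustration, not TrackHAR's real API, and the indicator check here is a plain substring search only.

```typescript
// Hypothetical, simplified shapes; TrackHAR's real types differ.
type HarRequest = { url: string; body: string };
type Adapter = { name: string; matches: (r: HarRequest) => boolean };
type Indicators = Record<string, string[]>; // data type -> honey values

type Outcome =
  | { kind: "adapter"; adapter: string }
  | { kind: "indicators"; properties: string[] }
  | { kind: "unprocessed" };

function processRequest(request: HarRequest, adapters: Adapter[], indicators?: Indicators): Outcome {
  // 1. Adapters take precedence: the first one that claims the request handles it.
  const adapter = adapters.find((a) => a.matches(request));
  if (adapter) return { kind: "adapter", adapter: adapter.name };

  // 2. Optional fallback: only runs if the user supplied honey data.
  if (indicators !== undefined) {
    const haystack = request.url + "\n" + request.body;
    const properties: string[] = [];
    for (const property of Object.keys(indicators)) {
      // Plain substring search for any honey value of this data type.
      if (indicators[property].some((value) => haystack.includes(value))) properties.push(property);
    }
    if (properties.length > 0) return { kind: "indicators", properties };
  }

  // 3. Neither approach found anything.
  return { kind: "unprocessed" };
}
```

Keeping the indicators argument optional mirrors the "(optionally)" in the proposal: without honey data, unmatched requests stay unprocessed exactly as before.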

Relevant: https://github.com/baltpeter/base64-search
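Judging by its name, the linked project is about finding strings inside base64-encoded data. A common technique for this (assumed here, not checked against that project's code) is to encode the honey value at each of the three possible byte alignments and trim the characters that also depend on neighbouring bytes. The helpers toBase64, base64SearchPatterns and base64Contains below are hypothetical and belong to neither TrackHAR nor base64-search; the sketch assumes standard (not URL-safe) base64.

```typescript
// Standard base64 alphabet (RFC 4648).
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Small self-contained encoder, so the sketch needs nothing beyond TextEncoder.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const hasB1 = i + 1 < bytes.length;
    const hasB2 = i + 2 < bytes.length;
    // Pack up to three bytes into a 24-bit group; missing bytes count as zero.
    const group = (bytes[i] << 16) | ((hasB1 ? bytes[i + 1] : 0) << 8) | (hasB2 ? bytes[i + 2] : 0);
    out += ALPHABET[(group >> 18) & 63] + ALPHABET[(group >> 12) & 63];
    out += hasB1 ? ALPHABET[(group >> 6) & 63] : "=";
    out += hasB2 ? ALPHABET[group & 63] : "=";
  }
  return out;
}

// The needle can start at byte offset 0, 1 or 2 within a 3-byte group, and each
// alignment encodes differently. For each one, keep only the base64 characters
// whose 6 bits come entirely from the needle.
function base64SearchPatterns(needle: string): string[] {
  const bytes = new TextEncoder().encode(needle);
  const patterns: string[] = [];
  for (let offset = 0; offset < 3; offset++) {
    // Prefix `offset` placeholder bytes to simulate this alignment.
    const shifted = new Uint8Array(offset + bytes.length);
    shifted.set(bytes, offset);
    const encoded = toBase64(shifted);
    const start = Math.ceil((offset * 8) / 6); // drop chars touched by the placeholder bytes
    const end = Math.floor(((offset + bytes.length) * 8) / 6); // drop chars touched by whatever follows
    patterns.push(encoded.slice(start, end));
  }
  return patterns;
}

// True if the base64 text appears to contain the needle at any alignment.
function base64Contains(base64Text: string, needle: string): boolean {
  return base64SearchPatterns(needle).some((p) => p.length > 0 && base64Text.includes(p));
}
```

Because trailing characters are trimmed, up to four low bits of the needle's last byte go uncompared for some alignments, so two values differing only there can collide. For long honey values such as an IDFA this should rarely matter.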

baltpeter commented 1 year ago

I think I'll add an additional indicators parameter to the options argument in process().

The user can set that to an object that maps data types to honey data like this:

{
    localIp: ['10.0.0.2', 'fd31:4159::a2a1'],
    idfa: '6a1c1487-a0af-4223-b142-a0f4621d0311'
}
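Since each data type may map to either a single string or a list, a matcher would probably want to flatten the option first. A minimal sketch, assuming the shape from the example above; Indicators, Indicator and normalizeIndicators are hypothetical names, not part of TrackHAR.

```typescript
// Hypothetical type for the proposed option: data type -> one honey value or several.
type Indicators = Record<string, string | string[]>;
type Indicator = { property: string; value: string };

function normalizeIndicators(indicators: Indicators): Indicator[] {
  const result: Indicator[] = [];
  for (const property of Object.keys(indicators)) {
    const values = indicators[property];
    // A single string is treated like a one-element list.
    for (const value of Array.isArray(values) ? values : [values]) {
      result.push({ property, value });
    }
  }
  return result;
}
```

Applied to the example object, this yields three (property, value) pairs: two for localIp and one for idfa.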

The result will then include an additional "indicators" pseudo-adapter containing the results of the indicator matching.
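One possible shape for those pseudo-adapter results, sketched under the assumption that each found honey value becomes its own entry tagged with the adapter name "indicators" and the part of the request it was found in. HarRequest, IndicatorResult, indicatorResults and the path field are hypothetical, not TrackHAR's actual result format.

```typescript
// Hypothetical shapes for illustration only.
type HarRequest = { url: string; body: string };
type Indicator = { property: string; value: string };
type IndicatorResult = {
  adapter: "indicators"; // the pseudo-adapter name
  property: string; // data type, e.g. "localIp"
  value: string; // the honey value that was found
  path: "url" | "body"; // which part of the request contained it
};

function indicatorResults(request: HarRequest, indicators: Indicator[]): IndicatorResult[] {
  const results: IndicatorResult[] = [];
  const parts: Array<["url" | "body", string]> = [
    ["url", request.url],
    ["body", request.body],
  ];
  for (const { property, value } of indicators) {
    for (const [path, text] of parts) {
      // Plain substring match; one entry per value and request part.
      if (text.includes(value)) results.push({ adapter: "indicators", property, value, path });
    }
  }
  return results;
}
```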

baltpeter commented 1 year ago

Actually, better yet: Let's have a pseudo-adapter per matching type (e.g. indicators-plain, indicators-base64, indicators-url-encoded).

baltpeter commented 1 year ago

Actually^2: We can't really do that, since we said that a request can only be matched by one adapter… But I guess we can put that information (which matching type was used) into the reasoning instead.
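A minimal sketch of that compromise, assuming a single "indicators" pseudo-adapter whose reasoning string records which matching type succeeded, reusing the proposed indicators-* names. Only plain and URL-encoded matching are shown; a base64 matcher would slot in as a further entry. The encodings list, IndicatorMatch and matchIndicators are hypothetical, and the reasoning format is an assumption, not TrackHAR's.

```typescript
// Hypothetical: each entry turns a honey value into the form it would take in a request.
const encodings: Array<[string, (value: string) => string]> = [
  ["plain", (v) => v],
  ["url-encoded", (v) => encodeURIComponent(v)],
];

type IndicatorMatch = {
  adapter: "indicators"; // one pseudo-adapter, since a request maps to a single adapter
  property: string;
  value: string;
  reasoning: string; // which matching type found the value
};

function matchIndicators(text: string, indicators: Record<string, string[]>): IndicatorMatch[] {
  const matches: IndicatorMatch[] = [];
  for (const property of Object.keys(indicators)) {
    for (const value of indicators[property]) {
      for (const [name, encode] of encodings) {
        if (text.includes(encode(value))) {
          matches.push({ adapter: "indicators", property, value, reasoning: `indicators-${name}` });
          break; // report each value once, via the first matching type that finds it
        }
      }
    }
  }
  return matches;
}
```

Ordering the list from cheapest to most involved means a value present verbatim is always attributed to plain matching, even if another encoding would also have found it.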

baltpeter commented 1 year ago

I really enjoy using GPT-4 (through Bing Chat creative mode) for writing documentation. Thanks to the large context limit, you can just paste in huge code and/or README snippets and have it work with that. Usually, the results do still require some editing, but for me at least, it makes the process a whole lot less painful. And impressively often, it even mentions things that I would have forgotten about.

Transcripts: docstrings, README