Open baltpeter opened 6 months ago
@zner0L What do you think?
I just thought of creating a kind of `Endpoint` object that would let us decompose the endpoint URL, similar to the JS `URL` object. For regexes, we could combine a host regex with path and protocol components (like this: https://stackoverflow.com/questions/9213237/combining-regular-expressions-in-javascript).
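A rough sketch of what such a decomposed endpoint could look like, assuming each component is written as an unanchored regex and combined through its `.source`. The `Endpoint` type and the `group`, `toUrlRegex` and `matchesHost` helpers are hypothetical names for illustration, nothing like them exists in TrackHAR yet, and the `tracker.tld` endpoint is a made-up example:

```typescript
// Hypothetical type, not part of TrackHAR: an endpoint split into components,
// loosely modelled on the fields of the JS URL object.
// Assumption: components are written WITHOUT ^/$ anchors and without flags.
type Endpoint = {
    protocol: RegExp; // e.g. /https/
    host: RegExp; // e.g. /api\d\.tracker\.tld/
    path: RegExp; // e.g. /\/ingest/
};

// Wrap a component in a non-capturing group so that an alternation inside it
// cannot leak into the neighbouring components after concatenation.
const group = (r: RegExp): string => `(?:${r.source})`;

// Combine the components into one anchored regex for matching full URLs.
const toUrlRegex = (e: Endpoint): RegExp =>
    new RegExp(`^${group(e.protocol)}://${group(e.host)}${group(e.path)}$`);

// Match a bare hostname (all we get from a TC/APR export) against the host
// component only.
const matchesHost = (e: Endpoint, hostname: string): boolean =>
    new RegExp(`^${group(e.host)}$`).test(hostname);

// Made-up example endpoint.
const ingest: Endpoint = {
    protocol: /https/,
    host: /api\d\.tracker\.tld/,
    path: /\/ingest/,
};
```

With this, `toUrlRegex(ingest)` would still be used for matching HAR requests, while `matchesHost(ingest, hostname)` could answer the hostname-only question. Note that combining via `.source` drops any flags on the component regexes.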
In e9e52478e3572445357488ff81964b2f56a52c1f, I implemented a filter that only includes requests to servers that the user's device also provably (through Tracker Control/the App Privacy Report) contacted.
I am currently doing that by checking the request's hostname from the HAR against the hostnames in the TC/APR export.
Instead of the HAR hostname, I think we should be checking against all endpoint URLs that the corresponding adapter accepts.
Imagine a tracking endpoint `https://api\d.tracker.tld/ingest`. If, during our analysis, we happened to find requests to `https://api2.tracker.tld/ingest` but the user's device happened to use `https://api5.tracker.tld/ingest` instead, we would currently exclude those requests.

However, implementing it this way is surprisingly hard. We only get a hostname from the TC/APR export. Meanwhile, our adapters' endpoint URLs can be strings or regexes of full URLs.
How would we check whether `android2-ads.adcolony.com` matches `/^https:\/\/(android|ios)?ads\d-?\d\.adcolony\.com\/configure$/`? Maybe I'm missing something, but I really can't see an automated way that isn't hacky and error-prone.

I feel like the only (proper) way to implement this change would be to also manually add a `hosts` array to each adapter in TrackHAR.
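To make the `hosts` idea concrete, here is a minimal sketch of how the filter could use it. Everything here is an assumption for illustration: the `AdapterSketch` type is a stripped-down stand-in for a real TrackHAR adapter, `hostMatches` and `deviceContactedAdapter` are hypothetical helpers, and the `tracker.tld` adapter is the made-up example from above.

```typescript
// Hypothetical, stripped-down adapter shape. Real TrackHAR adapters have more
// fields and do not currently have `hosts`.
type AdapterSketch = {
    slug: string;
    endpointUrls: (string | RegExp)[];
    // Manually maintained: hostnames (or hostname regexes) this adapter accepts.
    hosts: (string | RegExp)[];
};

// Strings are compared exactly; regexes are expected to be anchored and to
// match the hostname only.
const hostMatches = (pattern: string | RegExp, hostname: string): boolean =>
    typeof pattern === 'string' ? pattern === hostname : pattern.test(hostname);

// True if at least one hostname from the TC/APR export is accepted by the
// adapter, i.e. the device provably contacted one of its hosts.
const deviceContactedAdapter = (adapter: AdapterSketch, contactedHostnames: string[]): boolean =>
    contactedHostnames.some((hostname) => adapter.hosts.some((pattern) => hostMatches(pattern, hostname)));

// Made-up adapter for the example endpoint.
const tracker: AdapterSketch = {
    slug: 'tracker-ingest',
    endpointUrls: [/^https:\/\/api\d\.tracker\.tld\/ingest$/],
    hosts: [/^api\d\.tracker\.tld$/],
};
```

In the example scenario, requests to `api2.tracker.tld` found during analysis would then be kept as long as the export contains any host the adapter accepts, such as `api5.tracker.tld`. The cost is that `hosts` has to be kept in sync with `endpointUrls` by hand.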