privacy-tech-lab / privacy-pioneer

Privacy browser extension for analyzing web traffic of visited websites
https://www.privacytechlab.org/

Update Web Request filter to include Beacon API #582

Closed: dadak-dom closed this issue 1 week ago

dadak-dom commented 2 weeks ago

While testing the crawler, I found that Privacy Pioneer was missing certain requests. For example, one visit to a site might generate 9 monetization entries; after restarting the VM, clearing the cache, and visiting the site again, I might get only 6. The misses followed a clear pattern: the same requests were left out consistently. The HTTP archives from the site loads confirmed that Privacy Pioneer was not finding requests that it should have found. I eventually traced the issue to the HTTP request listener, since these requests were never even being handed to the extension for analysis. After many hours of analyzing internet traffic, I found a subtle technicality that was overlooked in the previous paper.
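The comparison against the HTTP archives described above can be sketched roughly as follows. This is a minimal illustration, not the actual tooling used here: `findMissedRequests`, `sampleHar`, and the `analytics.example` URLs are hypothetical, and it assumes the extension's detected request URLs are available as a plain list. Only `log.entries[].request` comes from the HAR format itself; `_resourceType` is an extra field that Chrome DevTools adds on export and may be absent elsewhere.

```typescript
// Minimal slice of the HAR format: only the fields this sketch reads.
interface HarEntry {
  request: { method: string; url: string };
  // Underscore-prefixed extras such as `_resourceType` are exporter-specific,
  // so the field is optional here.
  _resourceType?: string;
}

interface Har {
  log: { entries: HarEntry[] };
}

// Return every HAR entry whose URL the extension never reported.
function findMissedRequests(har: Har, urlsSeenByExtension: string[]): HarEntry[] {
  return har.log.entries.filter(
    (entry) => urlsSeenByExtension.indexOf(entry.request.url) === -1
  );
}

// Hypothetical data: two requests in the HAR, only one seen by the extension.
const sampleHar: Har = {
  log: {
    entries: [
      {
        request: { method: "POST", url: "https://analytics.example/collect?id=1" },
        _resourceType: "fetch",
      },
      {
        request: { method: "POST", url: "https://analytics.example/collect?id=2" },
        _resourceType: "ping",
      },
    ],
  },
};
const seenUrls: string[] = ["https://analytics.example/collect?id=1"];

const missedRequests = findMissedRequests(sampleHar, seenUrls);
for (const entry of missedRequests) {
  console.log(entry.request.method, entry.request.url, entry._resourceType ?? "unknown");
}
```

Grouping the missed entries by resource type or initiator is one way to spot a pattern like "the same requests are left out consistently".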

As an example, here's an analytics request sent when visiting isotoner.com:

[image]

As we can see in the "Initiator" column, this request was initiated via the Fetch API. Now let's visit the same site again:

[image]

The request URL and the request itself are exactly the same, except this time it was initiated with the Beacon API. Because of the way the request filter was set up, this second request would go undetected by Privacy Pioneer. As @SebastianZimmeck pointed out, this would also indicate that the version of the extension used for the previous paper filtered out too many categories of requests.
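To illustrate why a type-based request filter can drop the Beacon variant of an otherwise identical request, here is a minimal sketch. It is not Privacy Pioneer's actual filter: `narrowFilter`, `withBeaconTypes`, and `passesFilter` are hypothetical names, and the "before" type list is an assumption. What the sketch relies on is that the webRequest API reports `fetch()` traffic as `xmlhttprequest`, while Firefox's resource types include a separate `beacon` entry and Chromium documents `ping` for this kind of traffic.

```typescript
// Subset of the resource types used by the WebExtensions webRequest API.
// Firefox lists both "beacon" and "ping"; Chromium documents "ping" only.
type ResourceType =
  | "main_frame"
  | "sub_frame"
  | "script"
  | "image"
  | "xmlhttprequest"
  | "beacon"
  | "ping"
  | "other";

interface RequestFilter {
  urls: string[];
  types?: ResourceType[]; // omitted means "do not filter by type"
}

// Hypothetical "before" filter: fetch()/XHR requests pass, beacons do not.
const narrowFilter: RequestFilter = {
  urls: ["<all_urls>"],
  types: ["script", "xmlhttprequest", "sub_frame"],
};

// Return a copy of the filter that also admits Beacon API traffic.
function withBeaconTypes(filter: RequestFilter): RequestFilter {
  if (filter.types === undefined) return filter; // already admits every type
  const types: ResourceType[] = filter.types.slice();
  for (const extra of ["beacon", "ping"] as ResourceType[]) {
    if (types.indexOf(extra) === -1) types.push(extra);
  }
  return { urls: filter.urls.slice(), types };
}

// Mimic the type check a browser applies before calling the listener.
function passesFilter(filter: RequestFilter, type: ResourceType): boolean {
  return filter.types === undefined || filter.types.indexOf(type) !== -1;
}

const widenedFilter = withBeaconTypes(narrowFilter);
console.log("fetch, narrow filter:", passesFilter(narrowFilter, "xmlhttprequest"));
console.log("beacon, narrow filter:", passesFilter(narrowFilter, "beacon"));
console.log("beacon, widened filter:", passesFilter(widenedFilter, "beacon"));
```

In an extension, a filter object like this would be passed as the second argument to a webRequest listener registration such as `browser.webRequest.onBeforeRequest.addListener(listener, filter, extraInfoSpec)`. A browser may reject type names it does not recognize, so the list may need to differ per browser.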

SebastianZimmeck commented 2 weeks ago

Corresponding PR and also discussed in the broader context of testing the crawler (particularly, in this comment).