While testing the crawler, I stumbled upon an issue where Privacy Pioneer would miss certain requests. More specifically, if I visited a site once, I might generate, say, 9 monetization entries. If I then restarted the VM, cleared the cache, and visited the site again, I might generate only 6 entries. Interestingly, the misses followed a clear pattern: the same requests were left out consistently. The HTTP archives from the site loads made it clear that Privacy Pioneer was not finding requests that it should have found. I eventually traced the issue to the HTTP request listener, since these requests apparently never even reached the extension for analysis. After many hours of analyzing network traffic, I found a subtle technicality that was overlooked in the previous paper.
As an example, here's an analytics request sent when visiting isotoner.com:
![image](https://github.com/privacy-tech-lab/privacy-pioneer/assets/144378508/5797635b-4a88-40d2-9ef9-a1c640d9951c)
As we can see in the "Initiator" column, this request was sent via the Fetch API. Now let's visit the same site again:
It is the exact same request to the same URL, except that this time it was initiated with the Beacon API. Because of how the listener's request filter was set up, requests like this second one would go undetected by Privacy Pioneer. As @SebastianZimmeck pointed out, this also suggests that the version of the extension used for the previous paper filtered out too many categories of requests.
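One plausible way the Fetch vs. Beacon difference can cause a miss, sketched below as a simulation rather than Privacy Pioneer's actual code: in Firefox's `webRequest` API, a listener's filter can restrict delivery to specific resource types through its `types` array. Firefox reports `fetch()` requests with the type `"xmlhttprequest"` and `navigator.sendBeacon()` requests with the type `"beacon"`, so a filter that enumerates types but leaves out `"beacon"` would see the first isotoner.com request and silently drop the second. The `matchesFilter` helper, the type list in `restrictiveFilter`, and the analytics URL are hypothetical, and URL matching is reduced to `"<all_urls>"` for brevity.

```typescript
// Hypothetical simulation of how a webRequest RequestFilter's `types` list
// decides whether a listener receives a request (not the real browser code).
type ResourceType =
  | "main_frame"
  | "sub_frame"
  | "script"
  | "image"
  | "xmlhttprequest" // XMLHttpRequest and fetch() requests
  | "beacon"; // navigator.sendBeacon() requests (Firefox)

interface RequestDetails {
  url: string;
  type: ResourceType;
}

interface RequestFilter {
  urls: string[];
  types?: ResourceType[]; // omitted => every resource type is delivered
}

// Returns true if a listener registered with `filter` would receive `details`.
// Simplification: only the "<all_urls>" pattern is supported here.
function matchesFilter(details: RequestDetails, filter: RequestFilter): boolean {
  const urlOk = filter.urls.indexOf("<all_urls>") !== -1;
  const typeOk =
    filter.types === undefined || filter.types.indexOf(details.type) !== -1;
  return urlOk && typeOk;
}

// The same (hypothetical) analytics URL, sent two different ways.
const analyticsUrl = "https://analytics.example.com/collect";
const viaFetch: RequestDetails = { url: analyticsUrl, type: "xmlhttprequest" };
const viaBeacon: RequestDetails = { url: analyticsUrl, type: "beacon" };

// Hypothetical filter that lists resource types explicitly and omits "beacon".
const restrictiveFilter: RequestFilter = {
  urls: ["<all_urls>"],
  types: ["main_frame", "sub_frame", "script", "xmlhttprequest"],
};

// Leaving out `types` lets every category of request through.
const permissiveFilter: RequestFilter = { urls: ["<all_urls>"] };

console.log(matchesFilter(viaFetch, restrictiveFilter)); // true
console.log(matchesFilter(viaBeacon, restrictiveFilter)); // false
console.log(matchesFilter(viaBeacon, permissiveFilter)); // true
```

In a real extension, the counterpart of `permissiveFilter` would be registering the listener with `browser.webRequest.onBeforeRequest.addListener(listener, { urls: ["<all_urls>"] })`, i.e. leaving `types` out of the filter so that Beacon API requests are delivered alongside fetch requests.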