privacy-tech-lab / gpc-web-crawler

Web crawler for detecting websites' compliance with GPC privacy preference signals at scale
https://privacytechlab.org/
MIT License

Issue 59-3 #74

Closed by katehausladen 10 months ago

katehausladen commented 10 months ago

1) The extension now logs web requests using the urlClassification object. Only requests flagged with at least one of the following categories are kept: fingerprinting, tracking_ad, tracking_social, any_basic_tracking, any_social_tracking. These correspond to the Social, Advertising, and Fingerprinting Gen categories.

2) The extension no longer looks for Do Not Sell links (DNSLs).
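
The filtering in (1) can be sketched roughly as below. This is a minimal illustration, not the extension's actual code: the helper names (`keptFlags`, `logIfTracked`), the in-memory `loggedRequests` array, and the choice of the `onHeadersReceived` event are all assumptions. Only the five category names come from the description above. Firefox exposes `urlClassification` on webRequest event details as an object with `firstParty` and `thirdParty` arrays of flag strings.

```javascript
// Categories the issue says are kept.
const KEPT_FLAGS = new Set([
  "fingerprinting",
  "tracking_ad",
  "tracking_social",
  "any_basic_tracking",
  "any_social_tracking",
]);

// Hypothetical helper: collect the kept flags from a urlClassification object.
// Flags from firstParty and thirdParty are merged and de-duplicated.
function keptFlags(urlClassification) {
  if (!urlClassification) return [];
  const all = [
    ...(urlClassification.firstParty || []),
    ...(urlClassification.thirdParty || []),
  ];
  return [...new Set(all)].filter((flag) => KEPT_FLAGS.has(flag));
}

// Hypothetical in-memory log; the real extension's storage is not shown here.
const loggedRequests = [];

// Record the request only if it carries at least one kept flag.
function logIfTracked(details) {
  const flags = keptFlags(details.urlClassification);
  if (flags.length > 0) {
    loggedRequests.push({ url: details.url, flags });
  }
  return flags.length > 0;
}

// Register the listener only when running inside a Firefox extension.
// Which webRequest event the extension actually uses is an assumption.
if (typeof browser !== "undefined" && browser.webRequest) {
  browser.webRequest.onHeadersReceived.addListener(
    (details) => { logIfTracked(details); },
    { urls: ["<all_urls>"] }
  );
}
```

A request flagged only with, say, `tracking_analytics` would be dropped under this sketch, since that flag is not in the kept list.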

katehausladen commented 10 months ago

@Jocelyn0830, could you test the branch to verify that adding urlClassification and removing dns_link both work? Once that's confirmed, I'll fix the merge conflicts. The conflicts exist because this branch was opened so long ago (i.e., before all the GPP stuff was added).

Just let me know when you've tested it. The command to create the db for this branch is:

CREATE TABLE entries (id INTEGER PRIMARY KEY AUTO_INCREMENT, site_id INTEGER, domain varchar(255), sent_gpc BOOLEAN, uspapi_before_gpc varchar(255), uspapi_after_gpc varchar(255), usp_cookies_before_gpc varchar(255), usp_cookies_after_gpc varchar(255), OptanonConsent_before_gpc varchar(800), OptanonConsent_after_gpc varchar(800), urlClassification varchar(5000));
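
For anyone testing against this schema, here is a rough sketch of how a crawl result might be turned into a parameterized INSERT for the `entries` table. The column names and the 5000-character limit come from the CREATE TABLE command above. The `buildInsert` helper, the JSON serialization of `urlClassification`, and the `?` placeholder style are assumptions for illustration, not the project's actual code.

```javascript
// Columns from the CREATE TABLE command above (id is auto-generated, so it is omitted).
const ENTRY_COLUMNS = [
  "site_id",
  "domain",
  "sent_gpc",
  "uspapi_before_gpc",
  "uspapi_after_gpc",
  "usp_cookies_before_gpc",
  "usp_cookies_after_gpc",
  "OptanonConsent_before_gpc",
  "OptanonConsent_after_gpc",
  "urlClassification",
];

// Matches varchar(5000) in the schema.
const URL_CLASSIFICATION_MAX = 5000;

// Hypothetical helper: build a parameterized INSERT and its value list.
// Assumes urlClassification is stored as a JSON string; missing fields become null.
function buildInsert(entry) {
  const serialized = JSON.stringify(entry.urlClassification ?? {});
  if (serialized.length > URL_CLASSIFICATION_MAX) {
    throw new RangeError(
      `urlClassification is ${serialized.length} chars; column holds ${URL_CLASSIFICATION_MAX}`
    );
  }
  const values = ENTRY_COLUMNS.map((col) =>
    col === "urlClassification" ? serialized : (entry[col] ?? null)
  );
  const placeholders = ENTRY_COLUMNS.map(() => "?").join(", ");
  const sql = `INSERT INTO entries (${ENTRY_COLUMNS.join(", ")}) VALUES (${placeholders})`;
  return { sql, values };
}
```

Checking the serialized length up front surfaces an oversized record as an explicit error; what the database itself would do with an overlong value depends on its configuration.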

Jocelyn0830 commented 10 months ago

@katehausladen just tested and it worked fine on my end :)