privacy-tech-lab / gpc-web-crawler

Web crawler for detecting websites' compliance with GPC privacy preference signals at scale
https://privacytechlab.org/
MIT License

Using the [April Crawl Data](https://docs.google.com/spreadsheets/d/1xDz4RS5tlWBmAS33xVEOk2rqtc21lSkdInRv9J02ZFs/edit#gid=484688826), I re-crawled sites that output GPP strings in April to check the gpp_version column. All 20 sites I picked from the data used v1.1, and the gpp_version column reflects that accurately. I also tested sites that do not output GPP strings either before or after the GPC signal is sent. As expected, gpp_version is 'null' for those sites, since their gpp_before_gpc and gpp_after_gpc values are also 'null'. #113

Closed Sokvy77 closed 1 month ago

Sokvy77 commented 1 month ago

In testing and debugging these 20 sites, I have yet to encounter a site that was identified as having a GPP string in the April crawl and still uses v1.0. I'm not sure whether this confirms that most sites have switched to v1.1, since 20 sites is a small sample.

I plan to continue manually testing other sites from our site list that had GPP strings in the April crawl to verify this switch. Is there a way to find sites that are still using v1.0 right now, so I can test those directly instead of going through the whole site list?
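
One way to narrow this down without opening each site by hand might be to scan the crawl data itself. Below is a minimal Python sketch, assuming the data is available as a list of dicts with the keys `gpp_version`, `gpp_before_gpc`, and `gpp_after_gpc` (named after the columns discussed above; the exact spelling may differ in the real sheet) plus a hypothetical `site_url` key. It counts the version values, lists sites reporting a given version, and flags rows that break the rule observed in the manual tests: `gpp_version` should be null when both GPP string columns are null, and non-null otherwise.

```python
from collections import Counter


def is_null(value):
    """Treat None, empty strings, and the literal 'null' as missing."""
    return value is None or str(value).strip().lower() in ("", "null")


def summarize_gpp_versions(rows):
    """Count gpp_version values and collect rows that look inconsistent.

    Assumed rule (taken from the manual testing described above):
    gpp_version is null exactly when both GPP string columns are null.
    """
    counts = Counter()
    inconsistent = []
    for row in rows:
        version = row.get("gpp_version")
        has_gpp_string = not (
            is_null(row.get("gpp_before_gpc"))
            and is_null(row.get("gpp_after_gpc"))
        )
        counts["null" if is_null(version) else str(version).strip()] += 1
        # A GPP string without a version, or a version without any
        # GPP string, is worth a manual look.
        if has_gpp_string == is_null(version):
            inconsistent.append(row.get("site_url"))
    return counts, inconsistent


def sites_with_version(rows, version):
    """Return the site_url of every row whose gpp_version equals `version`."""
    return [
        row.get("site_url")
        for row in rows
        if not is_null(row.get("gpp_version"))
        and str(row["gpp_version"]).strip() == version
    ]
```

Calling `sites_with_version(rows, "1.0")` on the April rows would give a shortlist of v1.0 candidates to re-crawl, provided the version is stored as `"1.0"` rather than, say, `"v1.0"`; that format is also an assumption. This only finds v1.0 sites that are already in our data, not sites outside the site list.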

Originally posted by @franciscawijaya in https://github.com/privacy-tech-lab/gpc-web-crawler/issues/110#issuecomment-2146547934