privacy-tech-lab / gpc-optmeowt

Privacy browser extension for opting out from web tracking via GPC
https://www.privacytechlab.org
MIT License
149 stars 14 forks

Implement simple batch analysis feature #220

SebastianZimmeck closed 2 years ago

SebastianZimmeck commented 3 years ago

Today we discussed additional things that we can do, in particular a manual analysis of the Do Not Sell metrics provided by larger websites. Another idea is to implement a batch analysis feature. At the moment, we are building our extension such that when a user is in analysis mode, they navigate to a website, the analysis is provided, they navigate to another website, the analysis is provided, and so on. We could implement batch functionality that takes a set of domains as input and outputs a CSV with compliance analysis results (or the results are written to the analysis page and can be imported from there; that may make it easier to reuse our current functionality). That way we could scale our analysis. I imagine something on the order of 1K to 100K sites. This would be useful as both (1) an additional research aspect and (2) an additional artifact.
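To make the CSV output idea concrete, here is a minimal sketch of how per-site results could be serialized. The result fields (`domain`, `doNotSellLink`, `usPrivacyString`) and the helper names `escapeCsv` and `toCsv` are assumptions for illustration only; they are not taken from the OptMeowt codebase.

```javascript
// Hypothetical result shape: { domain, doNotSellLink, usPrivacyString }.
// These field names are illustrative, not OptMeowt's actual data model.
const COLUMNS = ["domain", "doNotSellLink", "usPrivacyString"];

// Quote a value if it contains a comma, double quote, or line break.
function escapeCsv(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Turn an array of per-site results into CSV text with a header row.
function toCsv(results) {
  const rows = results.map((r) => COLUMNS.map((c) => escapeCsv(r[c])).join(","));
  return [COLUMNS.join(","), ...rows].join("\n");
}

// Example input with made-up values.
const csv = toCsv([
  { domain: "example.com", doNotSellLink: true, usPrivacyString: "1YNN" },
  { domain: "example.org", doNotSellLink: false, usPrivacyString: null },
]);
console.log(csv);
```

Missing values are written as empty cells, so a site without a US Privacy String still gets a row and shows up in the "how many do not" count.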

For (1), we would get a survey of how many sites implement Do Not Sell links and US Privacy Strings (and how many do not). We would also know how many sites are compliant. We would probably still be able to contact only a small fraction of sites because we are not automatically crawling for email addresses to contact the sites (and that also seems tricky).

I can see two ways of going about this:

1. Build the functionality into OptMeowt itself

This could be built directly into OptMeowt. Essentially, read in a domain list from an external file, use JS APIs for opening and closing tabs and, as always, record the results.

2. Use an external script to drive a Selenium instance with OptMeowt installed (in our script and data repo)

The crawling functionality could also be external to OptMeowt. I played around with this option using a basic setup of Selenium for Firefox and installing Firefox extensions in Selenium. There are also some setup steps, such as changing the path variable. Essentially, OptMeowt would stay as is and we provide an external script and setup instructions to do a crawl.
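As a rough illustration of option 1, the in-extension loop could look something like the sketch below. Everything here is an assumption: `parseDomainList` and `runBatch` are hypothetical names, and `openTab`, `analyze`, and `closeTab` are placeholder callbacks that, inside the extension, might wrap `browser.tabs.create`, the existing analysis logic, and `browser.tabs.remove`. They are stubbed here so the sketch runs outside a browser.

```javascript
// Parse a newline-separated domain list, skipping blank lines,
// "#" comment lines, and duplicate entries.
function parseDomainList(text) {
  const seen = new Set();
  for (const line of text.split(/\r?\n/)) {
    const domain = line.trim().toLowerCase();
    if (domain && !domain.startsWith("#")) seen.add(domain);
  }
  return [...seen];
}

// Visit each domain in turn and collect one result object per domain.
// openTab, analyze, and closeTab are hypothetical async callbacks.
async function runBatch(domains, { openTab, analyze, closeTab }) {
  const results = [];
  for (const domain of domains) {
    let tabId;
    try {
      tabId = await openTab(`https://${domain}/`);
      results.push({ domain, ...(await analyze(tabId)) });
    } catch (err) {
      // Record the failure and continue so one bad site does not stop the crawl.
      results.push({ domain, error: String(err) });
    } finally {
      // Close the tab only if it was actually opened.
      if (tabId !== undefined) await closeTab(tabId);
    }
  }
  return results;
}

// Stub callbacks standing in for the real tab and analysis functions.
const stubs = {
  openTab: async (url) => url.length,
  analyze: async () => ({ doNotSellLink: false }),
  closeTab: async () => {},
};

runBatch(parseDomainList("example.com\n# comment\nExample.com\nexample.org\n"), stubs)
  .then((results) => console.log(results));
```

Visiting sites one at a time keeps the sketch simple; at 1K to 100K sites, a per-site timeout and some limited parallelism would probably be needed, which this sketch does not attempt.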

So, should we do that? If so, how?

SebastianZimmeck commented 3 years ago

We decided to give it a go with a simple in-extension batch analysis mode. We will try that out first and see how it goes. @kalicki1, you are probably in the best position to take the lead on this one, possibly with some help from @OliverWang13.

SebastianZimmeck commented 3 years ago

This could be a good Pro feature if we decide to go a startup route with an open core business model, for example. We should have clarity on this before implementation. This could be done via a code checked at the backend or, if batch analysis is not part of the extension codebase, by not open sourcing that code.

SebastianZimmeck commented 3 years ago

As @kalicki1 is currently exploring how pages can be reloaded as part of the analysis mode, that reload functionality will also inform our further dealings here.

SebastianZimmeck commented 2 years ago

Maybe we pick this up later.