SebastianZimmeck closed this issue 2 years ago
We decided to give it a go with a simple in-extension batch analysis mode. We will try that out first and see how it goes. @kalicki1, you are probably in the best position to take the lead on this one, possibly with some help from @OliverWang13.
This could be a good Pro feature if we decide to go the startup route with an open core business model, for example. We should have clarity on this before implementation. It could be done via a code check at the backend, or, if the batch analysis is not part of the extension codebase, by not open sourcing that code.
As @kalicki1 is currently exploring how pages can be reloaded as part of the analysis mode, that reload functionality will also inform how we proceed here.
Maybe we pick this up later.
Today we discussed additional things that we can do, especially a manual analysis of Do Not Sell metrics provided by larger websites.

Another point I can think of is to implement a batch analysis feature. At the moment, we are building our extension such that when a user is in analysis mode, they navigate to a website, the analysis is provided, they navigate to another website, the analysis is provided, and so on. We could implement some kind of batch functionality that takes a set of domains as input and outputs a CSV with compliance analysis results (or the analysis results are written to the analysis page and can be exported from there; that may make it easier to reuse our current functionality). That way we could scale our analysis. I imagine something on the order of 1K to 100K sites. This would be useful as both (1) an additional research aspect and (2) an additional artifact.
For (1) we would get a survey of how many sites implement Do Not Sell links and US Privacy Strings (and how many do not). We would also know how many sites are compliant. We would probably still only be able to contact a smaller fraction of sites because we are not automatically crawling for email addresses to contact them (and that also seems tricky).
I can see two ways of going about this:
1. Build the functionality into OptMeowt itself
This could be built directly into OptMeowt. Essentially, read in a domain list from an external file, use JS APIs for opening and closing tabs, and, as always, record the results (see the first sketch below).
2. Use an external script to drive a Selenium instance with OptMeowt installed (in our scripts and data repo)
The crawling functionality could also be external to OptMeowt. I played around with this option using a basic setup of Selenium for Firefox and installing Firefox extensions in Selenium. There are also some setup steps, such as changing the PATH variable (e.g., so that geckodriver can be found). Essentially, OptMeowt would stay as is, and we would provide an external script and setup instructions to do a crawl (see the second sketch below).
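For option 1, here is a minimal sketch of what the in-extension batch runner could look like. It is hypothetical throughout: `analyze(tabId)` stands in for our existing analysis entry point, the result fields are placeholders, and it assumes a Manifest V2 background page with the `tabs` and `downloads` permissions.

```js
// Hypothetical batch runner inside OptMeowt's background script.
// Assumes: MV2 background page, "tabs" and "downloads" permissions, and an
// existing analyze(tabId) that resolves with the usual analysis results.

async function runBatch(domains) {
  const results = [];
  for (const domain of domains) {
    const tab = await openTab(`https://${domain}`);
    await waitForLoad(tab.id);            // let the page finish loading
    results.push(await analyze(tab.id));  // reuse the analysis-mode logic
    await closeTab(tab.id);               // close before moving on
  }
  downloadCsv(results);
}

// Resolve once the tab reports status "complete".
function waitForLoad(tabId) {
  return new Promise((resolve) => {
    chrome.tabs.onUpdated.addListener(function listener(id, info) {
      if (id === tabId && info.status === "complete") {
        chrome.tabs.onUpdated.removeListener(listener);
        resolve();
      }
    });
  });
}

function openTab(url) {
  return new Promise((resolve) => chrome.tabs.create({ url, active: false }, resolve));
}

function closeTab(tabId) {
  return new Promise((resolve) => chrome.tabs.remove(tabId, resolve));
}

// Serialize results and hand them to the downloads API as a CSV.
// The result fields below are assumptions about what analyze() returns.
function downloadCsv(results) {
  const rows = results.map((r) => `${r.domain},${r.usPrivacyString},${r.doNotSellLink}`);
  const csv = "domain,us_privacy,do_not_sell_link\n" + rows.join("\n");
  const url = URL.createObjectURL(new Blob([csv], { type: "text/csv" }));
  chrome.downloads.download({ url, filename: "batch-analysis.csv" });
}
```

A real run at 1K to 100K sites would also need a timeout for pages that never reach `complete` and some throttling, but the overall shape stays the same.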
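For option 2, a sketch of the external driver using the Node `selenium-webdriver` bindings (the Python bindings would work just as well). File names and the result-reading step are assumptions; in particular, the `executeScript` line is only a stand-in for however we end up reading the analysis results back out of the extension.

```js
// Hypothetical external crawl driver: Firefox + OptMeowt via selenium-webdriver.
// Assumes geckodriver is on the PATH and optmeowt.xpi is a packaged build.
const { Builder } = require("selenium-webdriver");
const firefox = require("selenium-webdriver/firefox");
const fs = require("fs");

async function crawl(domains) {
  const options = new firefox.Options();
  options.addExtensions("optmeowt.xpi"); // assumption: a packaged .xpi of OptMeowt
  const driver = await new Builder()
    .forBrowser("firefox")
    .setFirefoxOptions(options)
    .build();
  const rows = ["domain,result"];
  try {
    for (const domain of domains) {
      await driver.get(`https://${domain}`);
      await driver.sleep(5000); // crude wait for the extension to finish analyzing
      // Stand-in: assumes the extension exposes its results somewhere the
      // page (or its analysis page) can read them from.
      const result = await driver.executeScript("return document.title;");
      rows.push(`${domain},${JSON.stringify(result)}`);
    }
  } finally {
    await driver.quit();
  }
  fs.writeFileSync("batch-analysis.csv", rows.join("\n"));
}

crawl(fs.readFileSync("domains.txt", "utf8").split("\n").filter(Boolean))
  .catch(console.error);
```

One caveat: installing an unsigned .xpi may require a Firefox build that allows it, which would be part of the setup instructions mentioned above.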
So, should we do that? If so, how?