SebastianZimmeck closed this issue 2 years ago.
Below is a spot check of 20 US sites on the builtwith TCF list (https://pro.builtwith.com/report/list/e813bdf2-dcc3-466d-88ec-25b20b3c2450) to find out how many implement the IAB US Privacy String/USPAPI. Here are the sites:
Out of 20 sites tested, 15 sites have the IAB US Privacy String/USPAPI.
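For anyone repeating the spot check: as we understand the IAB USP API specification, a site that implements it exposes a `__uspapi` function, and calling `__uspapi('getUSPData', 1, callback)` in the browser console hands the callback an object whose `uspString` field holds the US Privacy String, a four-character value such as `1YNN`. Below is a minimal Python sketch that decodes such a string once it has been copied out of the page, assuming the version-1 layout: version, explicit notice, opt-out of sale, and LSPA coverage, where each flag is `Y`, `N`, or `-` (not applicable). `parse_usp_string` and `USPrivacyString` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Flag characters used in positions 2-4: Y = yes, N = no, - = not applicable.
_FLAG_VALUES = {"Y": True, "N": False, "-": None}


@dataclass(frozen=True)
class USPrivacyString:
    version: int
    explicit_notice: Optional[bool]  # notice / opportunity to opt out given?
    opt_out_sale: Optional[bool]     # has the user opted out of the sale?
    lspa_covered: Optional[bool]     # transaction covered by the LSPA?


def parse_usp_string(value: str) -> USPrivacyString:
    """Decode a version-1 US Privacy String such as '1YNN' (hypothetical helper)."""
    if len(value) != 4:
        raise ValueError(f"expected 4 characters, got {len(value)}: {value!r}")
    if value[0] != "1":
        raise ValueError(f"unsupported version: {value[0]!r}")
    flags = value[1:]
    if any(c not in _FLAG_VALUES for c in flags):
        raise ValueError(f"invalid flag characters in {value!r}")
    notice, opt_out, lspa = (_FLAG_VALUES[c] for c in flags)
    return USPrivacyString(1, notice, opt_out, lspa)
```

For example, `parse_usp_string("1YNN")` yields a notice flag of `True` and opt-out and LSPA flags of `False`, while `"1---"` (string present, nothing applicable) decodes to three `None` flags. A site counts as having the US Privacy String/USPAPI as long as a valid string comes back, whatever its flag values.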
OK, thanks, @Jocelyn0830. That looks promising!
I looked more into this. @Jocelyn0830, can you do the following?
I may also contact the builtwith.com team and ask whether they would be willing to give us the whole list, or a bigger part of it, for research purposes. Otherwise, their basic plan costs $300, and we do not have a budget for that.
I just realized it is even possible to zoom in on California. So, let's first transcribe all the different California sites and see how many we get.
Also, the sites with very high traffic volume are important to get because they are most likely subject to the California Consumer Privacy Act. So, please include those as well, @Jocelyn0830.
I transcribed all the California sites and the US sites with very high traffic volume into the Google sheet. In total, I got 224 sites: 65 California sites and 189 sites with very high traffic volume (some sites are in both groups).
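The counts above fit together by inclusion-exclusion: if the 224 sites are the union of the 65 California sites and the 189 very-high-traffic sites, then 65 + 189 - 224 = 30 sites appear in both groups. A small Python sketch of merging transcribed lists without double counting, assuming each list is a collection of site strings; `normalize_domain`, `merge_site_lists`, and `overlap_count` are hypothetical helpers, and the normalization rule (lower-case, drop scheme, `www.`, and trailing slash) is an assumption, not how the Google sheet was actually built.

```python
from typing import Iterable, List


def normalize_domain(site: str) -> str:
    """Lower-case a site and strip scheme, 'www.' prefix, and trailing slash (assumed rule)."""
    site = site.strip().lower()
    for prefix in ("https://", "http://"):
        if site.startswith(prefix):
            site = site[len(prefix):]
    if site.startswith("www."):
        site = site[len("www."):]
    return site.rstrip("/")


def merge_site_lists(*lists: Iterable[str]) -> List[str]:
    """Merge several site lists into one sorted list of unique, normalized domains."""
    merged = {normalize_domain(site) for sites in lists for site in sites}
    return sorted(merged)


def overlap_count(a_size: int, b_size: int, union_size: int) -> int:
    """Inclusion-exclusion: size(A and B) = size(A) + size(B) - size(A or B)."""
    return a_size + b_size - union_size
```

With the numbers reported here, `overlap_count(65, 189, 224)` gives 30. Normalizing before merging matters because the same site can be transcribed as `https://www.example.com/` in one list and `example.com` in another.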
Below is a spot check of 25 sites in the Google sheet to find out how many implement the IAB US Privacy String/USPAPI:
Out of 25 sites tested, 19 sites have the IAB US Privacy String/USPAPI.
Nice!
The Google sheet is updated with 730 unique sites transcribed from the TCF US list.
Excellent work, @Jocelyn0830!
This issue is superseded by #16.
Once we have finalized the current implementation tasks, it would be a nice paper contribution to actually crawl some sites. I am imagining running OptMeowt in analysis mode on 1,000 sites or so: "small big data" as a proof of concept of our data analysis. We already know the performance of the analysis mode, so this would not require any ground truth analysis. We would just run OptMeowt in analysis mode and record the results.
Maybe we could use the IAB members directory to find sites that implement the US Privacy String/USPAPI. I am not sure whether the Tranco list would be any help.
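If the Tranco list does turn out to be useful, one option is to rank candidate sites (for example, the transcribed TCF sites) by popularity and crawl the most popular ones first. A sketch, assuming the Tranco CSV layout of one `rank,domain` pair per line and assuming candidates are already bare domains; `load_tranco_ranks` and `rank_candidates` are hypothetical helpers. Candidates that are not on the Tranco list are dropped, which is a design choice rather than a requirement.

```python
import csv
from typing import Dict, Iterable, List


def load_tranco_ranks(lines: Iterable[str]) -> Dict[str, int]:
    """Read 'rank,domain' rows into a domain -> rank mapping (rank 1 = most popular)."""
    ranks: Dict[str, int] = {}
    for row in csv.reader(lines):
        # Skip blank lines and anything that does not start with a numeric rank.
        if len(row) >= 2 and row[0].strip().isdigit():
            ranks[row[1].strip().lower()] = int(row[0])
    return ranks


def rank_candidates(candidates: Iterable[str], ranks: Dict[str, int], limit: int) -> List[str]:
    """Return at most `limit` unique candidate domains found in `ranks`, most popular first."""
    ranked = {d.strip().lower() for d in candidates if d.strip().lower() in ranks}
    return sorted(ranked, key=lambda d: ranks[d])[:limit]
```

A file handle works as the `lines` argument, e.g. `with open("tranco.csv", newline="") as f: ranks = load_tranco_ranks(f)`, followed by `rank_candidates(tcf_sites, ranks, limit=1000)` to pick roughly the 1,000 sites mentioned above (`tranco.csv` and `tcf_sites` are placeholders).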
I do not think it is necessary to build any batch analysis functionality into OptMeowt, as that would add another layer of complexity. However, two other options are (1) manually visiting different sites or (2) implementing an external driver. On the latter, here is what I wrote in a previous issue that we had not pursued at the time.
Not sure exactly how to tackle it. Let's discuss at the next meeting ...
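For option (2), a minimal sketch of the bookkeeping an external driver might do, in Python. It assumes a `visit(url)` callable that loads a page with OptMeowt in analysis mode and returns that page's result as a dict of strings; how `visit` actually drives the browser (for example, through a browser-automation tool that loads the extension) is deliberately left open. `crawl`, `to_csv`, and `Visitor` are hypothetical names, not part of OptMeowt. The driver keeps going when a single site fails, so one timeout does not sink a 1,000-site run, and it records one CSV row per site.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List

# Hypothetical: takes a URL, returns OptMeowt's analysis-mode result for that page.
Visitor = Callable[[str], Dict[str, str]]


def crawl(sites: Iterable[str], visit: Visitor) -> List[Dict[str, str]]:
    """Visit each site once and record either its result or the error raised."""
    rows: List[Dict[str, str]] = []
    for site in sites:
        try:
            result = visit(site)
            rows.append({"site": site, "status": "ok", **result})
        except Exception as exc:  # keep crawling if one site fails
            rows.append({"site": site, "status": "error", "error": repr(exc)})
    return rows


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize crawl rows to CSV text; columns are all keys in first-seen order."""
    columns: List[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Passing a fake `visit` function that returns canned results (or raises, say, `TimeoutError`) exercises the whole loop without a browser, which keeps the driver logic separate from whatever automation tool ends up loading the extension.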