privacy-tech-lab / gpc-web-crawler

Web crawler for detecting websites' compliance with GPC privacy preference signals at scale
https://privacytechlab.org/
MIT License

New OneTrust cookies? #94

Closed: katehausladen closed this issue 3 months ago

katehausladen commented 4 months ago

I was reading about the OptanonConsent cookie here and noticed that the page lists two other cookies related to the CCPA and GPP:

Screenshot 2024-02-21 at 3 39 40 PM

I saw the OneTrustWPCCPAGoogleOptOut cookie "in the wild" a few times when doing the validation/test set back in November and December, but to me, it's not completely clear what it's doing (I'm not sure what "cookie category associated with IAB CCPA" is). In particular, it looked like this cookie was on all PMC sites listed here.

I don't remember seeing the OTGPPConsent cookie in November/December. However, now it looks like the PMC sites, runnersworld.com, and a few news sites (huffpost, cbs, usnews, and probably more that I haven't checked) all have it. Here's what it looks like:

Screenshot 2024-02-21 at 3 48 16 PM

Should we add the OTGPPConsent cookie (and/or the OneTrustWPCCPAGoogleOptOut one) to the cookies the crawler looks for? From manually checking a few sites, it appears on only some of the sites that have the OptanonConsent cookie. Both are first-party cookies, so they're not going away. I'm not sure whether a site can have this cookie without implementing GPP via the API; so far, every site I have seen with this cookie also has GPP implemented via the API.

SebastianZimmeck commented 4 months ago

Great find, @katehausladen!

My basic understanding (which could be wrong since I had not heard of these two cookies before) is that OneTrust is aiming to achieve opt out consistency between their opt out system and the IAB's system.

For example,

OTGPPConsent

This cookie is dropped when the GPP (Global Privacy Platform) feature is configured for a template within its geolocation rule group.

seems to say that when a user has opted out and the GPP String is set accordingly, then that opt out is also reflected in the OTGPPConsent cookie for that user.

If you have some examples of that cookie value and the GPP String that the IAB set, it may be possible to find whether that is actually how that works.
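One way to run that comparison is sketched below. It is only a rough illustration: `getCookieValue` and `compareCookieToGppString` are hypothetical helpers, the sample strings are made-up placeholders rather than real values, and the check assumes the cookie would hold the GPP String verbatim, which nothing in this thread confirms.

```javascript
// Hypothetical helper: pull one cookie's value out of a
// "name=value; other=value" string (the format of document.cookie).
function getCookieValue(cookieString, name) {
  for (const pair of cookieString.split(";")) {
    const trimmed = pair.trim();
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue; // skip fragments without a value
    if (trimmed.slice(0, eq) === name) {
      const raw = trimmed.slice(eq + 1);
      try {
        return decodeURIComponent(raw);
      } catch {
        return raw; // leave malformed percent-encoding untouched
      }
    }
  }
  return null; // cookie not present
}

// Hypothetical check: ASSUMES the OTGPPConsent cookie stores the GPP String
// as-is. If the values differ, that assumption is simply wrong for that site.
function compareCookieToGppString(cookieString, gppString) {
  const cookieValue = getCookieValue(cookieString, "OTGPPConsent");
  return {
    cookieValue,
    gppString,
    matches: cookieValue !== null && cookieValue === gppString,
  };
}
```

Collecting the two values side by side for a handful of sites, before and after opting out, would show whether they actually move together.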

By the way, when they talk about a cookie being "dropped" they mean a cookie being "stored" on the user's computer.

SebastianZimmeck commented 4 months ago

I think it would be a nice add-on if it is relatively easy to do.

katehausladen commented 4 months ago

We're going to add OTGPPConsent and OneTrustWPCCPAGoogleOptOut cookie collection.

SebastianZimmeck commented 4 months ago

@franciscawijaya will implement this with the help of @katehausladen.

katehausladen commented 4 months ago

@franciscawijaya you'll have to add the new cookie names to regex.js. Then, you'll have to add new columns for the cookies in the rest-api (namely, in the app.post handler in index.js, starting at line 59). Analysis data for each domain is stored in analysis_userend[domain]. Add the cookie data to this object in logData, under the if (command === "COOKIES") {... section. Just follow how the other cookie data is stored.
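A rough sketch of what those steps could look like is below. It is not the crawler's actual code: the pattern `newOneTrustCookieNames`, the function `logCookies`, and the `{ name, value }` cookie shape are assumptions made for illustration, and `analysis_userend` here is a local stand-in for the object mentioned above.

```javascript
// Hypothetical pattern for the two new cookie names; the real regex.js
// may organize its patterns differently.
const newOneTrustCookieNames = /^(OTGPPConsent|OneTrustWPCCPAGoogleOptOut)$/;

// Local stand-in for the crawler's per-domain analysis object.
const analysis_userend = {};

// Hypothetical version of the "COOKIES" branch: store each matching
// cookie's value under its name for the given domain.
function logCookies(domain, cookies) {
  if (!analysis_userend[domain]) {
    analysis_userend[domain] = {};
  }
  for (const cookie of cookies) {
    if (newOneTrustCookieNames.test(cookie.name)) {
      analysis_userend[domain][cookie.name] = cookie.value;
    }
  }
  return analysis_userend[domain];
}
```

For example, calling `logCookies("example.com", [{ name: "OTGPPConsent", value: "placeholder" }])` records only that cookie for `example.com`; cookies with other names are ignored. The matching columns in the rest-api would still need to be added separately.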

katehausladen commented 3 months ago

I merged the pull request and updated the README and wiki accordingly. Here's the newest architecture diagram PowerPoint.

web-crawler-architecture.pptx

SebastianZimmeck commented 3 months ago

Excellent! Thank you, @katehausladen!