Closed SebastianZimmeck closed 4 years ago
I think we should do this actually. Per today's final CCPA Regs, it is clear that technological solutions for opting out from the sale of personal information will be developed. Members of the IAB will probably implement the IAB approach. So, if our extension could handle this as well, that would be great.
Probably, the first order of business should be to find some example sites that are participating in the IAB approach. The Privacy String is technology-agnostic. So, how are people implementing this in practice? Ideally, it would be a header-based solution as well.
This also relates to the enhancement of the IAB CCPA framework.
I thought about this some more. We should actually implement the Do Not Sell opt-out functionality per the IAB CCPA framework. This will be the first real functioning opt-out for all sites that make use of the IAB CCPA framework. Here is the plan:
The IAB Tech Lab U.S. Privacy String consists of four characters and is very similar to what we are actually doing. Here is an example:
Example 2 meets the following conditions:
Version 1 of the US Privacy string is being used. (1)
The digital property has NOT provided explicit user notice. (N)
The user has made a choice to opt out of sale. (Y)
The digital property is not operating under the Limited Service Provider Agreement. (N)
1NYN
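To make the four positions concrete, here is a minimal sketch of how the string could be split into its fields. The function name `parseUsPrivacy` and the returned field names are made up for illustration. Accepting `-` ("not applicable") in positions two to four is my reading of the IAB spec and is not covered by the example above.

```javascript
// Hypothetical helper; the name and the returned field names are illustrative only.
function parseUsPrivacy(str) {
  // One version digit followed by three characters, each Y, N, or "-".
  // Accepting "-" is an assumption based on the IAB spec, not on this thread.
  const match = /^(\d)([YN-])([YN-])([YN-])$/i.exec(str);
  if (match === null) {
    return null; // Not a well-formed four-character privacy string.
  }
  const [, version, notice, optOutSale, lspa] = match;
  return {
    version: Number(version),
    explicitNotice: notice.toUpperCase(),
    optOutSale: optOutSale.toUpperCase(),
    lspaCovered: lspa.toUpperCase(),
  };
}

console.log(parseUsPrivacy("1NYN"));
```

For `1NYN`, the `optOutSale` field is the one that carries the user's choice to opt out of sale.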
How the four character string is transmitted to the server is principally technology-agnostic. However, the IAB recommends storage in a first party cookie:
The recommendation is to store the String in a first-party cookie named "usprivacy" where the API library can read it and write to it. In case storing on a 1st party cookie is not possible or practical (such as on mobile native or if cookies are disabled), a different storage method should be adopted.
What I am hoping is that our extension can identify the usprivacy cookie and rewrite its value, most importantly, with a Y as the third character of the four-character string. Here is the EditThisCookie browser extension, which manages to get access to the cookies of the currently visited website. The user can then edit their values. (As an aside, let's not copy any code directly from the extension. It comes under the GNU GPL license and would "infect" our code. However, we can use the same APIs to identify the cookie and rewrite its value, just without directly copying the EditThisCookie code.)
Here is a screenshot of using the EditThisCookie extension on https://psychologist.onl/
It would be great if we could automatically rewrite cookie values like this to, say, 1NYN.
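The rewrite itself could be as small as the sketch below, which only touches the third character and leaves the rest of the site's string alone. The function name `withOptOut` is hypothetical, and falling back to `1NYN` for a missing or malformed value is an assumption on my part.

```javascript
// Hypothetical helper: returns the given privacy string with the
// opt-out-of-sale position (third character) set to "Y".
function withOptOut(value) {
  const fallback = "1NYN"; // Assumed default when no usable value exists.
  if (typeof value !== "string" || !/^\d[YN-]{3}$/i.test(value)) {
    return fallback;
  }
  const chars = value.toUpperCase().split("");
  chars[2] = "Y"; // Third of the four characters: user opted out of sale.
  return chars.join("");
}

console.log(withOptOut("1YNN"));
```

The returned value would then be written back with the cookies API; that part is left out here so the sketch stays independent of the browser.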
Now, this is probably a somewhat tricky task, especially syncing this with the whitelist (let's not worry about that for the time being). We may want to split this into multiple issues. For the time being, I am leaving this together, though.
I started working on implementing the IAB CCPA guidelines for opting-out in our extension and managed to get a rudimentary cookie-modifier functioning. I haven't uploaded the code yet just because it is heavily based on how AccuWeather has implemented the framework (they have an 'opt-out' link in the site footer that links to a us_privacy cookie for site visitors to use), in order to rule out as many variables in the implementation process as possible. The rough JS is pasted below. I originally had this function run every time after our original extension modified a given site's request headers, so this function definitely runs too many times for its intended purpose in my current implementation.
chrome.cookies.get({
    "name": 'us_privacy', // Make this not case-sensitive
    "url": 'https://www.accuweather.com/'
  },
  function (cookie) {
    if (cookie !== null) {
      // Reuse the retrieved cookie as the basis for the updated one.
      let new_cookie = cookie
      new_cookie.value = '1YYN'
      // Retrieved cookies carry no url, but chrome.cookies.set() needs one.
      new_cookie.url = 'https://www.accuweather.com/'
      // Leave the domain unset so the cookie stays host-only.
      new_cookie.domain = null
      // hostOnly and session are read-only fields that set() does not accept.
      delete new_cookie.hostOnly
      delete new_cookie.session
      chrome.cookies.set(new_cookie, function (details) {
        console.log("Found and updated cookie value.")
      })
    } else {
      console.log("COOKIE NULL")
    }
  })
The code above does not handle the storeId parameter explicitly, but in the future this will be important to record and maintain because I believe it matters more once you have multiple Chrome windows open.
Retrieved cookies do not have the url parameter set, so it always needs to be set before adding new cookies via chrome.cookies.set().
Retrieved cookies have hostOnly and session parameters that cannot be present in a cookie that you would want to set, so I have them removed in the code above.
When I set the url parameter to the current site url while preserving the domain parameter the retrieved cookie had, I would find a second cookie created with the domain set to .www.accuweather.com instead of replacing the original one with a domain of www.accuweather.com after the function ran. I found this StackExchange post that referenced this point and explained how subdomains are handled in cookies:
So the leading dot in chrome don't reflect whether or not a leading dot was used from the server, but whether or not that cookie had a "Domain=something" in its definition from the server. (And if it had, the cookie will also be sent to sub-domains).
Basically, it says that a domain value of null leads the browser to set the cookie's hostOnly value to true. According to the Chrome API documentation, this means that hostOnly is:
True if the cookie is a host-only cookie (i.e. a request's host must exactly match the domain of the cookie).
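Putting the notes above together, here is a rough sketch of how a retrieved cookie could be turned into something `chrome.cookies.set()` accepts. The helper name `toSetDetails` is hypothetical. Dropping `domain` for host-only cookies and `expirationDate` for session cookies reflects my understanding of the API rather than anything tested beyond AccuWeather.

```javascript
// Hypothetical helper: converts a cookie object as returned by
// chrome.cookies.get() into a details object for chrome.cookies.set().
function toSetDetails(cookie, url, value) {
  // hostOnly and session are stripped because set() does not accept them;
  // the rest is copied so the retrieved cookie object is left untouched.
  const { hostOnly, session, ...details } = cookie;
  details.url = url;     // Retrieved cookies have no url; set() requires one.
  details.value = value;
  if (hostOnly) {
    // Host-only cookies had no Domain attribute. Omitting the domain here
    // avoids creating a second cookie with a leading-dot domain.
    delete details.domain;
  }
  if (session) {
    // Assumption: a session cookie should stay a session cookie.
    delete details.expirationDate;
  }
  return details;
}

const retrieved = {
  name: "us_privacy", value: "1YNN", domain: "www.accuweather.com",
  hostOnly: true, session: true, path: "/", secure: false,
  httpOnly: false, storeId: "0",
};
console.log(toSetDetails(retrieved, "https://www.accuweather.com/", "1YYN"));
```

For a cookie that was set with a Domain attribute (`hostOnly` false), the stored domain with its leading dot is kept, so the update should land on the same cookie instead of next to it.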
To me it looks like this is an arbitrary choice on AccuWeather's part, though there could be some other reasons they set the domain this way that we don't know about. When I implement this I can just make sure to make a note of how a given site handles its cookie domain somewhere so that we only ever have one copy of the us_privacy cookie. However, this could get confusing if we are the ones setting a cookie and not reading one already added to the site's storage, so I think we will have to add a function that checks for multiple signals and handles them somehow.
In general, the notes above are for future reference so my thought process on how to implement the IAB proposal is documented somewhere.
@SebastianZimmeck, I will start to generalize this implementation and break it up into more fleshed out and manageable chunks over the next few days. Let me know what you think about what I have found.
heavily based on how AccuWeather has implemented the framework
If it is not straight copied, it is OK in terms of copyright. You can certainly use the same APIs and the code can look similar.
I originally had this function run every time after our original extension modified a given site's request headers, so this function definitely runs too many times for its intended purpose in my current implementation.
If it is not a drag on performance, that would be OK. Maybe it is possible to identify requests with a cookie. The function would only need to be run for those.
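One way to narrow this down might be to look at the Cookie header of each outgoing request and only run the update when an IAB cookie is present. The sketch below is a plain function over the `requestHeaders` array that a `webRequest` listener receives; the function name and the list of cookie names are assumptions. As far as I know, recent Chrome versions only expose the Cookie header to listeners registered with `extraHeaders`, so that would need checking.

```javascript
// Assumed list of cookie names; not taken from any spec verbatim.
const WATCHED_COOKIE_NAMES = ["us_privacy", "usprivacy", "us-privacy"];

// Hypothetical helper: true if the request's Cookie header contains
// one of the watched IAB cookie names.
function requestCarriesPrivacyCookie(requestHeaders) {
  const cookieHeader = requestHeaders.find(
    (header) => header.name.toLowerCase() === "cookie"
  );
  if (cookieHeader === undefined || !cookieHeader.value) {
    return false;
  }
  // A Cookie header looks like "a=1; b=2"; compare only the names.
  return cookieHeader.value.split(";").some((pair) => {
    const name = pair.split("=")[0].trim().toLowerCase();
    return WATCHED_COOKIE_NAMES.includes(name);
  });
}

console.log(requestCarriesPrivacyCookie([
  { name: "Cookie", value: "session=abc; usprivacy=1YNN" },
]));
```

This would not cover the case where no cookie exists yet and we want to place one, so it can only replace part of the current "run on every header modification" behavior.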
The points you describe with the hostOnly and other settings seem tricky. Perhaps we can discuss this more tomorrow.
I will start to generalize this implementation and break it up into more fleshed out and manageable chunks over the next few days. Let me know what you think about what I have found.
Sounds good. Generally, this looks promising to me.
heavily based on how AccuWeather has implemented the framework
Maybe I misspoke when I said this. What I meant is that since AccuWeather does set and use the us_privacy first-party cookie, I tested out my code on the site to see if I could set up some sort of "communication," even if they didn't respond. In other words, could I see that they created a cookie on my browser and then could I modify that specific cookie.
The rest does look promising. Hopefully this is the right way forward!
A few more sites that can be used for testing as they are using the IAB CCPA Compliance Framework.
Here is a site that lists various other domains having the usprivacy cookie. It does not seem to be always correct, though.
The recent commit I pushed has functionality for sending a chosen cookie to every single site visited by a user. So far, there is no functionality to check whether a cookie is already on the site and, if there is one, to parse it and respond accordingly. I will continue to work on this alongside my suggestion below as well as issue #42.
@SebastianZimmeck, after reading your updates to issue #42, I was thinking that we could use the idea of storing known ad networks' cookie profiles to a JSON and extend it slightly to allow us to store known variations of the us_privacy signal, such as us_privacy, usprivacy, etc. This way, when we check if a site follows the IAB protocol, we can respond to the variation of the cookie the site has already implemented. I am mentioning this because the sites you listed above seem to be setting their own usprivacy cookie, while I have been developing according to AccuWeather's preference for us_privacy.
we could use the idea of storing known ad networks' cookie profiles to a JSON and extend it slightly to allow us to store known variations of the us_privacy signal, such as us_privacy, usprivacy, etc.
Good idea. Per the IAB specification, it should be us_privacy (bottom of the page). However, it certainly may be the case that some implementers use a slightly different format. Some possibilities are usprivacy, us-privacy, and us_privacy.
At the moment, I am thinking that it may be best if you create two different JSON specs: one with the variations of the us_privacy string and one for the concrete ad network server URLs to visit (per issue #42). These are slightly different things (though both could use the JSON spec idea, indeed).
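As a rough illustration of the first of the two specs, the name variations could live in a small JSON document that the extension matches against a site's cookies. The JSON layout, the `cookieNames` key, and the function name are all assumptions; the cookie objects are shaped like what `chrome.cookies.getAll()` returns.

```javascript
// Assumed JSON layout; in the extension this would be loaded from a file.
const nameVariationsSpec = JSON.parse(`{
  "cookieNames": ["us_privacy", "usprivacy", "us-privacy"]
}`);

// Hypothetical helper: returns every cookie whose name matches a known
// variation of the IAB signal, ignoring case.
function findIabCookies(cookies, spec) {
  const known = new Set(spec.cookieNames.map((name) => name.toLowerCase()));
  return cookies.filter((cookie) => known.has(cookie.name.toLowerCase()));
}

const siteCookies = [
  { name: "session", value: "abc" },
  { name: "usprivacy", value: "1YNN" },
];
console.log(findIabCookies(siteCookies, nameVariationsSpec));
```

Keeping the names in data rather than in code means a newly observed variation only needs a JSON change.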
In the newest code update, I added:
Support for us_privacy name variations. I began to work on adding JSON functionality as discussed above, though I decided to put it off until I work on issue #42, since that can almost be its own issue. Local variables serve their purpose well for now, and some partially implemented code pertaining to the JSON issue exists and is commented out at the bottom of the JS file. We can come back to this when issue #42 is implemented.
Default settings: the default cookie name is us_privacy, the default signal is 1NYN, and no specific domain is set.
Handling for the case where a cookie's domain has a leading dot (a . is in front of the domain). I added a check for this in the new update that resolves this issue for now.
What we need to work on next is handling the case where multiple cookies exist in the browser for some reason. Our extension avoids creating new cookies on AccuWeather; however, this is not true for system.jobboard.io. When there is no cookie set and the page is loaded, the extension seems to get the opportunity to set its own before the site does, and the site apparently does not recognize the one we set by then. This can be seen here:
In this case, two different IAB cookies are set, one by us and one by system.jobboard.io. Since our cookie update process runs multiple times per site load as of right now (it runs every time a header is modified), we can resolve this with a check for multiple cookies and then delete the one containing our default settings in a subsequent script call. I feel that this kind of a check for multiple cookies can get quite complex however. Maybe we can find a way to avoid this in the first place altogether.
Despite this needed fix, it seems like the core functionality is in place! I will continue to test this while I work on other parts of the extension, since it seems there will be a few minor bugs that will need to get worked out.
In this case, two different IAB cookies are set, one by us and one by system.jobboard.io.
So, our cookie is set on the domain, jobboard.io, and the site sets the cookie on the subdomain, system.jobboard.io? Are the two different domains the problem? In other words, would there be only one cookie written if we also wrote the cookie to the subdomain?
Since our cookie update process runs multiple times per site load as of right now (it runs every time a header is modified), we can resolve this with a check for multiple cookies and then delete the one containing our default settings in a subsequent script call. I feel that this kind of a check for multiple cookies can get quite complex however. Maybe we can find a way to avoid this in the first place altogether.
As I see it at the moment, I do not think that it is a big problem that there are cookies in the domain and subdomain(s). Especially, if the site relies on setting and reading the cookie from multiple (sub)domains (not sure, is that the case?), it may be even necessary to have multiple cookies. If we can figure out exactly where the cookies are set and read, we can delete it. If we are not quite sure, I think it is OK to have multiple cookies. What would be important, though, is to have consistent values for these cookies.
So, our cookie is set on the domain, jobboard.io, and the site sets the cookie on the subdomain, system.jobboard.io?
No, our cookie is the one set to the subdomain system.jobboard.io. When a new cookie needs to be made from scratch, our extension abstains from setting a specific domain. Chrome fills out this information itself based on the current URL, which gives us system.jobboard.io, in this case a specific subdomain. However, it looks like the site sets its own IAB cookie to the domain .jobboard.io and not the subdomain Chrome assigned our cookie.
Fundamentally, it looks like this is the problem, yes. You need to keep the name and the domain the same to overwrite a cookie. This is how our extension overwrites the cookie if jobboard.io places its own cookie first, something we do not have an issue with at the moment. The extension recognizes the cookie is there and then saves the site-assigned domain from that cookie at this point in the code.
It is important to note that in this case, the site-assigned IAB cookie's name is different than the one we assigned. This fact alone necessitates some kind of check for multiple cookies aside from the cookie domain issue.
Especially, if the site relies on setting and reading the cookie from multiple (sub)domains (not sure, is that the case?)
This doesn't seem to be the case. Multiple cookies are set because whoever runs the given website doesn't check to see if another variation of the cookie exists (ours if we set our cookie first), albeit with slightly different parameters than the ones they chose to give it. Since the site won't handle it, our extension needs to be vigilant in such cases and make sure to use the same settings of the site-assigned cookie. This is at least my thinking at the moment.
it may be even necessary to have multiple cookies
My only concern with this is that I believe the IAB protocol mentions only one cookie should be used by a site and a user to mutually exchange the opt-out information. This leads me to think if a user doesn't modify a site-provided IAB cookie, most site owners will not check for other variations of the cookie in the same way we do. Though this could be open to interpretation, I think I would prefer to keep only one IAB cookie per site for this reason.
Here are some thoughts I have on a few rough ideas we could implement.
Making a function to 'guess' what domain to use
If it comes down to it, we could also create some sort of function to 'guess' the best domain to use when setting a us_privacy cookie if one does not exist. We could collect information on all the other cookies set by a site, count the number of times each domain shows up, and select the most frequently occurring one as the domain for our us_privacy cookie. This could possibly increase the chances a site will recognize our cookie, though we have no way of knowing for sure.
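A minimal sketch of this guessing idea, assuming we already have the list of cookies the site has set (for example from `chrome.cookies.getAll()`). The function name is hypothetical, and breaking ties in favor of the domain seen first is an arbitrary choice.

```javascript
// Hypothetical helper: returns the domain used by the most cookies,
// or null if there are no cookies to learn from.
function guessCookieDomain(cookies) {
  const counts = new Map();
  for (const cookie of cookies) {
    counts.set(cookie.domain, (counts.get(cookie.domain) || 0) + 1);
  }
  let bestDomain = null;
  let bestCount = 0;
  // Map iteration follows insertion order, so ties go to the first domain seen.
  for (const [domain, count] of counts) {
    if (count > bestCount) {
      bestDomain = domain;
      bestCount = count;
    }
  }
  return bestDomain;
}

console.log(guessCookieDomain([
  { name: "a", domain: ".jobboard.io" },
  { name: "b", domain: "system.jobboard.io" },
  { name: "c", domain: ".jobboard.io" },
]));
```

A site could still set its IAB cookie on a different domain than the rest of its cookies, so this stays a guess.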
Creating cookies for each version of the us_privacy signal
Though I would really prefer to not do this, I think we do have the option to set a cookie for each variation of the IAB signal that exists (us_privacy, us-privacy, usprivacy). This way, we have three identical copies of the signal, each with a different name, in case a given site only checks for one. Until the IAB spec is clarified or many sites clearly adopt one or another, we will not know which default name to use.
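If we did go this route, the cookies to set could be generated from the list of names, roughly like this. The function name is hypothetical; each returned object only carries the `url`, `name`, and `value` fields, which is the minimum I would expect to pass to `chrome.cookies.set()`.

```javascript
// Hypothetical helper: builds one set-details object per name variation,
// all carrying the same signal value.
function buildVariantCookies(url, value, names) {
  return names.map((name) => ({ url, name, value }));
}

const variants = buildVariantCookies(
  "https://system.jobboard.io/",
  "1NYN",
  ["us_privacy", "us-privacy", "usprivacy"]
);
console.log(variants);
```

Each entry would then be handed to the cookies API one by one. This triples the number of cookies we write per site, which is part of why I would rather avoid it.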
Recent commit regarding multiple IAB cookies
The recent commit here removes the default cookie placed by the extension if, when the extension is called again, it recognizes multiple IAB cookies on a given URL. It does not guarantee that there will not be multiple cookies at all, but does solve the specific issue with loading our own cookie before a site gets to load their own as discussed above with jobboard.io. Since this solution only deletes one cookie, if there are three or more IAB cookies for some reason, the current URL will still have more than one cookie after this patch runs.
The big picture is that, if no cookie exists, we will place one. In the case of jobboard.io, they always end up placing their own cookie immediately after we place ours, though with different enough settings that it doesn't override the one we placed. Hence we end up with multiple cookies. This patch doesn't prevent this from occurring in the first place, but rather resolves it in a subsequent pass of the extension.
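The second-pass cleanup described here can be sketched as a pure selection step. This is not the committed code; the function name and the way "our" cookie is recognized (default name, default value, host-only) are assumptions for illustration.

```javascript
// Assumed defaults, matching the default name and signal mentioned earlier.
const OUR_DEFAULT_NAME = "us_privacy";
const OUR_DEFAULT_VALUE = "1NYN";

// Hypothetical helper: given all IAB cookies found for a URL, returns the
// one that looks like our default cookie, or null if nothing should go.
function selectDefaultCookieToRemove(iabCookies) {
  if (iabCookies.length < 2) {
    return null; // A single cookie is fine; there is no duplicate to resolve.
  }
  const ours = iabCookies.find(
    (cookie) =>
      cookie.name === OUR_DEFAULT_NAME &&
      cookie.value === OUR_DEFAULT_VALUE &&
      cookie.hostOnly === true // We never set a specific domain.
  );
  return ours === undefined ? null : ours;
}

console.log(selectDefaultCookieToRemove([
  { name: "us_privacy", value: "1NYN", domain: "system.jobboard.io", hostOnly: true },
  { name: "usprivacy", value: "1YNN", domain: ".jobboard.io", hostOnly: false },
]));
```

The selected cookie's name, together with the page URL, would then go to `chrome.cookies.remove()`. Like the patch, this removes at most one cookie per pass, and the site's remaining cookie still needs its value updated afterwards.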
As discussed, @kalicki1 will continue with his testing (and possibly open new issues and close this one as the concrete work becomes more clear). In principle, there are two approaches:
Since I want to move development along in other areas of the extension and not spend too much time focused only on this issue, I will open a pull request to bring the changes made so far on this IAB CCPA implementation into the master branch. I will do the same with issue #42 to test the cookie-based code side by side and find ways to simplify the code base if possible.
If major issues surface or revisions need to take place regarding this IAB spec implementation, I will open new, focused issues that address them specifically. Seeing as the major goal of this issue is now complete, we can close this issue as well. We can continue to use this issue as a reference to problems we resolved in the past if new issues in the IAB framework come up.
Depending a bit on how things are developing on the policy end (i.e., whether we find some support for our signal, ideally in terms of standardization), we should consider also implementing the IAB CCPA Compliance Framework (should it turn out that there is not a whole lot of support for our signal). Their US Privacy String follows a similar idea as our signal, is technology-agnostic, and can be sent via a browser extension. It would be binding for companies participating in the IAB.
Here are some further references: