privacycg / nav-tracking-mitigations

Navigation-based Tracking Mitigations
https://privacycg.github.io/nav-tracking-mitigations/

Communicating people’s choices to all parties #13

Open jwrosewell opened 2 years ago

jwrosewell commented 2 years ago

We discussed giving people choice over the sharing of state information during the Privacy CG meeting of 23/24 September 2021. I considered the following key contributions particularly relevant.

If I’ve not captured the points above correctly, please comment further.

Whilst the SWAN.community prototype and proposal text do not yet explain how these issues are intended to be addressed, they were considered in the design so that the SWAN.community method could evolve into a modern state sharing mechanism based on common contracts rather than solely on registerable domain names. In this issue I’m sharing how that group envisages this evolution so that we can debate it at a future Privacy CG meeting.

SWAN Concepts

There are two relevant components to be familiar with.

  1. Model Terms – all parties that use SWAN data are bound by the model terms. They are conceptually identical to the EC Standard Contractual Clauses which many of your businesses use, and very similar to the take-it-or-leave-it open-source licences used on this very platform: if you don’t accept the MPL 2 under which some source code is licensed, you don’t use that source code. We use these standard contracts every day without realising it. Model terms are explained here.

  2. Open Web Identifiers (OWID) – all personal data, or indeed any data, is wrapped in an “envelope” that provides additional information concerning the entity that collected or processed the data, when they did so, and a method of obtaining the legal basis under which the data collection or processing occurred. An OWID contains four fields:

    1. Registerable domain name. Well-known endpoints operated by the domain identify the legal entity (aka “data controller”) and the contract used to collect or process the data. Typically, this will be an organisation providing a user interface to collect data (for example, collecting preferences, a random identifier, or an optional email address) or processing data (for example, hashing two values to form a pseudonymous identifier).

    2. The date and time, to the minute, at which the OWID was created.

    3. Data payload that the OWID contains, encoded as a byte array.

    4. The Elliptic Curve (EC) signature applied to the OWID by the data controller. The signed data consists of the other three fields: domain, date and time, and payload. The signature confirms that the OWID was generated by the entity at the registerable domain name that claims to have generated it.
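To make the four fields concrete, here is a minimal TypeScript sketch of creating and verifying an OWID with Node’s `node:crypto` module. It assumes ECDSA over NIST P-256 with SHA-256 and a byte layout for the signed fields that is invented for illustration; neither is taken from the published OWID implementations, and the names `Owid`, `createOwid`, `verifyOwid` and `generateCreatorKeys` are hypothetical.

```typescript
import { createPublicKey, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical in-memory shape of an OWID with the four fields listed above.
// This is NOT the wire format of the published OWID libraries.
interface Owid {
  domain: string;           // registerable domain name of the Creator
  timestampMinutes: number; // creation time in whole minutes since the Unix epoch (assumed epoch)
  payload: Uint8Array;      // data payload as a byte array
  signature: Uint8Array;    // ECDSA signature over the three fields above
}

// Assumed serialization of the signed fields: domain (UTF-8), a 0x00 separator,
// the timestamp as a 4-byte big-endian unsigned integer, then the raw payload.
function signedBytes(domain: string, timestampMinutes: number, payload: Uint8Array): Buffer {
  const ts = Buffer.alloc(4);
  ts.writeUInt32BE(timestampMinutes);
  return Buffer.concat([Buffer.from(domain, "utf8"), Buffer.from([0]), ts, Buffer.from(payload)]);
}

// Creator side: stamp the payload with domain and time, then sign with the Creator's EC private key.
function createOwid(domain: string, created: Date, payload: Uint8Array, privateKey: KeyObject): Owid {
  const timestampMinutes = Math.floor(created.getTime() / 60_000);
  const signature = sign("sha256", signedBytes(domain, timestampMinutes, payload), privateKey);
  return { domain, timestampMinutes, payload, signature };
}

// Recipient side: check the signature against the Creator's published SPKI PEM public key.
function verifyOwid(owid: Owid, publicKeySpkiPem: string): boolean {
  const key = createPublicKey(publicKeySpkiPem);
  return verify("sha256", signedBytes(owid.domain, owid.timestampMinutes, owid.payload), key, owid.signature);
}

// Helper for experimenting: a fresh NIST P-256 key pair, public key encoded as SPKI PEM.
function generateCreatorKeys(): { privateKey: KeyObject; publicKeyPem: string } {
  const { privateKey, publicKey } = generateKeyPairSync("ec", { namedCurve: "prime256v1" });
  return { privateKey, publicKeyPem: publicKey.export({ type: "spki", format: "pem" }).toString() };
}
```

Changing any of the domain, timestamp or payload after signing makes `verifyOwid` return `false`. The 0x00 separator keeps the domain boundary unambiguous, since a domain name cannot contain that byte.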

Domain names in the OWID scheme identify the legal entity that generated the data (“Creator”) and the standard contractual clauses under which the data was collected or processed. It is the domain that is embedded in the OWID. Every Creator MUST provide a well-known endpoint that returns the contract under which the data was collected or processed (typically the common elements of a privacy policy and data sharing agreement), the Creator’s legal name, and a public key that enables the signature in the OWID to be verified using EC cryptographic signatures.
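A recipient could consume a Creator’s well-known endpoint roughly as follows. This TypeScript sketch takes its field names from the current-bun.uk example response later in this issue; the endpoint path `/.well-known/owid-creator` is a hypothetical placeholder because the issue does not specify one, and requiring an `https:` contract URL is an added assumption.

```typescript
// Creator description, using the field names from the current-bun.uk example in this issue.
interface CreatorInfo {
  domain: string;
  name: string;          // legal name of the Creator
  publicKeySPKI: string; // PEM-encoded public key for verifying OWID signatures
  contractURL: string;   // contract under which data was collected or processed
}

// HYPOTHETICAL path: the issue only says "well-known end point" without naming it.
const HYPOTHETICAL_WELL_KNOWN_PATH = "/.well-known/owid-creator";

function creatorInfoUrl(domain: string): string {
  return new URL(HYPOTHETICAL_WELL_KNOWN_PATH, `https://${domain}`).toString();
}

// Validates an untrusted JSON body and checks that it describes the domain named in the OWID.
function parseCreatorInfo(body: string, expectedDomain: string): CreatorInfo {
  const raw: unknown = JSON.parse(body);
  if (typeof raw !== "object" || raw === null) throw new Error("response is not a JSON object");
  const obj = raw as Record<string, unknown>;
  for (const field of ["domain", "name", "publicKeySPKI", "contractURL"]) {
    if (typeof obj[field] !== "string") throw new Error(`missing or non-string field: ${field}`);
  }
  const info = obj as unknown as CreatorInfo;
  if (info.domain !== expectedDomain) throw new Error("Creator domain does not match the OWID");
  // Assumption: only https contract URLs are accepted (new URL throws if malformed).
  if (new URL(info.contractURL).protocol !== "https:") throw new Error("contract URL must use https");
  return info;
}
```

Checking `domain` against the domain embedded in the OWID stops one Creator’s endpoint from vouching for another Creator’s data.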

OWIDs and Model Terms are separate concepts and can stand on their own. In the SWAN proposal they are combined, as can be seen by inspecting an advertising request involving many parties.

The following is a snippet from the SWAN.community demo showing the parties to a fictional OpenRTB transaction in which data is exchanged outside the web browser. All parties must have visibility of each other, and the user must be able to inspect the supply chain in the web browser should they wish.

[Image: SWAN.community demo showing the parties to a fictional OpenRTB transaction]

OWIDs are explained more fully here. There are also three concrete implementations of OWIDs for Go, .NET and JavaScript.

Supporting people’s choices

OWIDs provide the following benefits.

  1. Data can be linked to a common privacy policy or any other common legal contract. For example, pub-a.com, pub-b.com and pub-c.com can all reference the same contract under which the data was collected.
  2. The common contract can be inspected by any party who receives the OWID. In the case of SWAN, the Model Terms are the common contract.
  3. The Creator provides cryptographic proof that they collected or processed the data, and of the contract under which they did so.

If we consider these features in relation to state data being shared between registerable domains within a user agent, whether via URL, cookies, shared storage, or any other mechanism, we can achieve the following.

  1. The user agent can determine, from the Creator’s domain, the contract under which the Creator collects and processes data. The following is an example response from the Creator domain current-bun.uk.
{
  "domain": "current-bun.uk",
  "name": "Current Bun",
  "publicKeySPKI": "-----BEGIN PUBLIC KEY-----\nMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE8DB9A0fY+/bRbdXBRz+AtDLS4Sf2\nIu1k0g3WbmloqfsLgi9R/oAbPdzUKgj16I9BAP0TvV3GkV+y2lgUXWaTDQ==\n-----END PUBLIC KEY-----\n",
  "contractURL": "https://github.com/SWAN-community/swan/blob/main/model-terms.md"
}
  2. Data transmitted by the user agent also carries the contract that governs its subsequent processing.
  3. The user agent can show people the contracts covering their data, either for a single operation or over a prolonged period, improving transparency.
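The transparency point could be supported by a simple per-profile log of OWID-governed transfers. This TypeScript sketch is an assumption about how a user agent might record them; `SharingEvent` and `SharingLedger` are hypothetical names, and the log deliberately ignores the transfer mechanism (URL, cookie, storage) so that people only see contracts.

```typescript
// Hypothetical record a user agent might keep for each OWID-governed transfer.
interface SharingEvent {
  from: string;        // sending registerable domain
  to: string;          // receiving registerable domain
  contractURL: string; // contract discovered via the OWID's Creator
  at: Date;
}

class SharingLedger {
  private readonly events: SharingEvent[] = [];

  record(event: SharingEvent): void {
    this.events.push(event);
  }

  // Distinct contracts seen since `since`, with how many transfers each one governed.
  // A single operation is just a very recent `since`; a long period is an old one.
  contractsSince(since: Date): Map<string, number> {
    const counts = new Map<string, number>();
    for (const e of this.events) {
      if (e.at.getTime() >= since.getTime()) {
        counts.set(e.contractURL, (counts.get(e.contractURL) ?? 0) + 1);
      }
    }
    return counts;
  }
}
```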

Therefore, we can support the following flow.

  1. User agent receives a state sharing request between organization X and organization Y.
  2. User agent checks to determine if the state data is wrapped in an OWID.
  3. If it is an OWID then the user agent unpacks the OWID and discovers the contract that applies to the data.
  4. The user agent verifies that the OWID was generated by the claimed Creator by checking the signature.
  5. If the OWID is valid, the user agent checks that sending party X and receiving party Y are also bound by the same contract.
  6. If parties X, Y and the OWID are not bound by the same contract, the user agent warns the user. If they are bound by the same contract then the state sharing is allowed.
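The six steps can be condensed into a single decision function. This TypeScript sketch assumes the user agent has already unpacked and verified any OWID and looked up the contracts X and Y are bound by; `evaluateStateSharing` and its input shape are hypothetical. Treating an invalid signature as a warning is also an assumption, since the flow only describes the valid case.

```typescript
// "fallback" means no OWID was present, so the user agent's existing behaviour applies.
type Decision = "allow" | "warn" | "fallback";

interface StateSharingRequest {
  senderContract?: string;   // contract party X is bound by (hypothetical lookup)
  receiverContract?: string; // contract party Y is bound by (hypothetical lookup)
  owid?: { contractURL: string; signatureValid: boolean }; // result of steps 3 and 4
}

function evaluateStateSharing(req: StateSharingRequest): Decision {
  if (!req.owid) return "fallback";            // step 2: state data is not wrapped in an OWID
  if (!req.owid.signatureValid) return "warn"; // step 4 failed (assumed handling)
  const c = req.owid.contractURL;              // step 3: contract that applies to the data
  const sameContract = req.senderContract === c && req.receiverContract === c; // step 5
  return sameContract ? "allow" : "warn";      // step 6
}
```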

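Expressed in Boolean logic, where $C_X$, $C_Y$ and $C_{\mathrm{OWID}}$ are the contracts of party X, party Y and the OWID respectively:

$$\mathrm{Good} = (C_X = C_Y) \land (C_Y = C_{\mathrm{OWID}})$$

$$\mathrm{Bad} = (C_X \neq C_Y) \lor (C_X \neq C_{\mathrm{OWID}}) \lor (C_Y \neq C_{\mathrm{OWID}})$$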
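A short derivation confirms that the two expressions are complementary, so every OWID-wrapped request is either Good or Bad and never both. It reads the chained equality in Good as $(C_X = C_Y) \land (C_Y = C_{\mathrm{OWID}})$, with $C_X$, $C_Y$ and $C_{\mathrm{OWID}}$ denoting the contracts of party X, party Y and the OWID. By De Morgan’s law:

$$\neg\mathrm{Good} = \neg\big[(C_X = C_Y) \land (C_Y = C_{\mathrm{OWID}})\big] = (C_X \neq C_Y) \lor (C_Y \neq C_{\mathrm{OWID}})$$

The third term of Bad adds nothing new: if $C_X = C_Y$ and $C_Y = C_{\mathrm{OWID}}$ then, by transitivity, $C_X = C_{\mathrm{OWID}}$, so contrapositively

$$(C_X \neq C_{\mathrm{OWID}}) \Rightarrow (C_X \neq C_Y) \lor (C_Y \neq C_{\mathrm{OWID}})$$

and the term is absorbed:

$$\mathrm{Bad} = (C_X \neq C_Y) \lor (C_X \neq C_{\mathrm{OWID}}) \lor (C_Y \neq C_{\mathrm{OWID}}) = \neg\mathrm{Good}$$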

  7. The user agent could prompt for contract acceptance at a frequency of its choosing, based on past knowledge of the contract. For example: “Contract [X] is being used to share data. Would you like to review this contract before continuing?” A follow-up question after review could ask whether the user accepts the contract and wishes the data sharing to take place. The results of this double check can then be used to let people revoke their prior consent, exercise their right to be forgotten, and prevent future data sharing under that contract via the user agent.
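The prompt-and-revoke behaviour could rest on a per-user store keyed by contract URL rather than by site, so one decision covers every site using that contract. In this TypeScript sketch the class `ContractPreferences` and the fixed re-prompt interval are hypothetical; the proposal leaves the prompting frequency to each user agent.

```typescript
type ContractChoice = "accepted" | "rejected";

interface ContractRecord {
  choice: ContractChoice;
  decidedAt: Date;
}

// Hypothetical store keyed by contract URL (not by site).
class ContractPreferences {
  private readonly records = new Map<string, ContractRecord>();
  private readonly repromptAfterMs: number; // user-agent policy (assumed: a fixed interval)

  constructor(repromptAfterMs: number) {
    this.repromptAfterMs = repromptAfterMs;
  }

  // Whether to ask "Contract [X] is being used to share data. Would you like to review...?"
  shouldPrompt(contractURL: string, now: Date): boolean {
    const r = this.records.get(contractURL);
    if (!r) return true; // never seen this contract before
    return now.getTime() - r.decidedAt.getTime() >= this.repromptAfterMs;
  }

  // Records the answer to the follow-up question after review.
  decide(contractURL: string, choice: ContractChoice, now: Date): void {
    this.records.set(contractURL, { choice, decidedAt: now });
  }

  // Revoking prior consent blocks future sharing under this contract.
  revoke(contractURL: string, now: Date): void {
    this.decide(contractURL, "rejected", now);
  }

  isSharingAllowed(contractURL: string): boolean {
    return this.records.get(contractURL)?.choice === "accepted";
  }
}
```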

All state operations will repeat this basic process.

The process can be applied to all state sharing operations in a web browser and is backwards compatible. The following would be needed.

  1. A method of communicating to the user agent that the data being shared is an OWID. This could be a naming convention or some other indicator: for example, adding the suffix “owid” to the key (such as “pref_owid”), or an additional field in the data schema (such as “owid_cookie”).
  2. If the user agent does not implement this proposal, then the current state sharing configuration of the user agent will apply as any additional fields or suffixes will be meaningless to that user agent.
  3. If the user agent does implement the proposal, then the above process can apply.
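The backwards-compatibility argument can be illustrated with the key-suffix convention. In this TypeScript sketch the suffix `_owid` follows the “pref_owid” example, while case-insensitive matching and the names `isOwidKey` and `handlingFor` are assumptions for illustration.

```typescript
// Convention from item 1: a key ending in "_owid" carries an OWID-wrapped value.
const OWID_KEY_SUFFIX = "_owid";

function isOwidKey(key: string): boolean {
  return key.toLowerCase().endsWith(OWID_KEY_SUFFIX); // case-insensitive (assumption)
}

type Handling = "current-behaviour" | "owid-process";

// Items 2 and 3: a user agent that has not implemented the proposal sees the
// suffix as an ordinary part of the key name, so its behaviour is unchanged.
function handlingFor(key: string, userAgentImplementsProposal: boolean): Handling {
  return userAgentImplementsProposal && isOwidKey(key) ? "owid-process" : "current-behaviour";
}
```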

This would enable browser vendor A to adopt this state sharing permission enhancement, but vendor B to ignore it. People will then decide which model they prefer and modify their browser choices accordingly as they have always done.

The benefits of the proposal include:

  1. No central authority defines “good” or “bad”, “sanctioned” or “unsanctioned”. Many of our debates get stuck on this problem. People make that choice using a mechanism familiar to them. This was a key point from Martin. Accepting T&Cs is part of installing an app, signing up to a service, accepting an OS upgrade, or using a search engine. The legitimacy of this model is not in doubt.
  2. All parties are in possession of people’s preferences and can respect them. This was a key point Jeffery made during the meeting.
  3. The user agent can keep a record of the state sharing that is happening and provide it to the user for inspection. The mechanism for state sharing is not relevant, so such inspection does not require the user to understand the difference between cookies, local storage, or anything else. They just see a list of standard contracts that they have accepted in one place.
  4. The user agent can take an unobtrusive audit role, implemented however its vendor wishes. For example, a contract that has been verified by a Data Protection Authority could receive an A grade rating, and the user could be informed via an icon in the address bar, such as “all data used on this web site followed DPA X’s standard contract”.
  5. This audit trail provides people with the indisputable evidence they need should they feel a bad actor has harmed them.
  6. People will be informed what they should do if they believe a breach has occurred. The user agent could facilitate a complaint to the relevant Data Protection Authority.
  7. The Authorized Agent (AA) model in CCPA could be used to enable the AA to interface with the user agent and handle settings for them. For example, should an identifier need to be removed, the AA could instruct the user agent to prevent it being shared in future. This would be in addition to any other AA options.
  8. Most importantly it allows participants on the web to share data without the only option being to ask people multiple times for the same permission, or having beneficial features of the web removed for some participants but not others.
jyasskin commented 2 years ago

I think there are both policy and technical questions here, and the policy questions are probably out of scope for this Work Item — I don't think we should get side-tracked by discussing how a browser decides whether to trust that its user consented to transferring a particular OWID. That could plausibly go in another Work Item if the CG is interested, and we might want to wait on that Work Item showing results before spending a lot of time on the technical questions here. It does make sense to spend a little time to see if this works at all.

One technical question is how a trusted OWID should interact with the mitigations for unwanted navigational tracking. Looking at the documented mitigations, they fall in 2 buckets: clearing storage after a "bad" navigation is detected, and removing "bad" query parameters.

Is the proposal here that if a URL has a trusted OWID embedded in it then a navigation to it is no longer "bad"? How should that embedding work?

Or is the proposal that if there were explicit state-transfer mechanisms outside of the URL, then people would stop embedding extra IDs into URLs, and so we could be more aggressive at blocking things that look like IDs in URLs?

jwrosewell commented 2 years ago

@jyasskin Where are the policies under which this Work Item is working defined?

Thanks for the questions.

Is the proposal here that if a URL has a trusted OWID embedded in it then a navigation to it is no longer "bad"?

Yes, with emphasis on the words "trusted" and "bad". If the user "trusted" the contract referenced by the OWID, then the user agent must not consider the navigation "bad".

Also, the question contains an incorrect assumption. The URL would not be "bad" without the OWID embedded in it; it would be in an "unknown" state of "goodness" or "badness".

How should that embedding work?

Via well-known prefixes to indicate the presence of an OWID which browsers could optionally inspect. The string "owid" could be used as a key prefix to indicate an OWID value. e.g. owid-user-id could be a user identifier passed in an OWID wrapper.
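A navigation check using the "owid" parameter prefix might look like the following TypeScript sketch. It returns "trusted" or "unknown" rather than "bad", matching the reply above. Decoding and verifying each OWID value is left to a hypothetical `contractOf` callback because the issue does not define how an OWID is encoded in a query string; all names here are illustrative.

```typescript
// Convention from the reply: query parameters whose names start with "owid"
// (e.g. owid-user-id) carry OWID-wrapped values.
const OWID_PARAM_PREFIX = "owid";

function owidParams(url: string): Array<[string, string]> {
  const found: Array<[string, string]> = [];
  new URL(url).searchParams.forEach((value, name) => {
    if (name.startsWith(OWID_PARAM_PREFIX)) found.push([name, value]);
  });
  return found;
}

type NavigationStatus = "trusted" | "unknown";

// "trusted" only if the URL carries at least one OWID and every OWID's contract
// has been accepted by the user; otherwise "unknown", and the browser's usual
// navigation-tracking mitigations still apply.
function classifyNavigation(
  url: string,
  contractOf: (owidValue: string) => string | undefined, // hypothetical: decode + verify, return contract URL
  acceptedContracts: ReadonlySet<string>,
): NavigationStatus {
  const params = owidParams(url);
  if (params.length === 0) return "unknown";
  const allTrusted = params.every(([, value]) => {
    const contract = contractOf(value);
    return contract !== undefined && acceptedContracts.has(contract);
  });
  return allTrusted ? "trusted" : "unknown";
}
```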

Or is the proposal that if there were explicit state-transfer mechanisms outside of the URL, then people would stop embedding extra IDs into URLs, and so we could be more aggressive at blocking things that look like IDs in URLs?

The proposal would apply to any data transfer within the web browser, irrespective of the transfer mechanism. However, let's start with URLs and see how it stands up to analysis.

It is not within the scope of the proposal to require anyone to stop doing anything. I'm very passionate about choice and classic liberal values like that.

Logically if there are more efficient or less intrusive mechanisms available it seems reasonable people would use them, particularly if they were implemented consistently and were simple to use.

jwrosewell commented 2 years ago

Thank you @jyasskin for adding this to the agenda yesterday and contributing to the debate.

There were a number of misunderstandings that I attempted to address but, due to time constraints or long-held misconceptions, may not have communicated clearly enough. Here is a clarification.

As I understand the charter for this group, we are interested in privacy holistically across the web, not just as it relates to one particular use case or economic model. This issue/proposal takes components of work developed to address privacy problems with RTB advertising (see the ICO report from June 2019), and therefore privacy beyond the perimeter of the web browser, and applies them to the specific concern of sharing information via URLs accessed in a web browser.

This proposal considers both contract law and technical solutions. I do not believe that any enduring solution for privacy can be created that does not consider both. The debate identified this as a key area of difference between me and @johnwilander. For this proposal to be viable, @johnwilander does not need to change his position favouring exclusively technical solutions. If Apple do not trust people to consent to contracts other than Apple’s to improve their access to the web, then that is their choice and we must respect that.

This proposal will work with any contract. However, in practice I suspect a narrow set of Standard Contractual Clauses (SCCs) will be used. Many of your businesses incorporate an SCC today for international transfers of personal data from the EEA.

In practice, people will be presented with a limited number of SCC contracts specifically for the purpose of sharing personal data. These SCCs will be embedded in hundreds of thousands of bespoke contracts that cover aspects other than the sharing of personal data. We therefore have a many-to-one relationship between websites’ privacy policies and an SCC contract.

By applying the SCC contract to this proposal, we address many of the practical issues raised. Specifically, in answer to @AramZS's question, there could be hundreds of thousands of websites using a single SCC contract. Once the user has signalled their preference for that SCC contract, they do not need to be asked again for the hundreds of thousands of websites they might then go on to visit. To express this in cartoon form…

[Image: cartoon]

I believe that this is a similar concept to the Global Privacy Control (GPC) that many are already familiar with. The improvement over GPC is that there can be many SCC contracts for different purposes and thus people and web participants have greater choice. With SWAN people can be assured that all recipients of their personal data for advertising use cases must be bound by the same contract. They can be reassured that they have access to law enforcement should a breach occur.

I’m concerned that @ekr sought to dismiss the role of contracts in facilitating people’s safe access to the web. All people consent to a contract when installing a web browser. That contract gives the web browser the legitimacy to perform its function as the user’s agent. To argue that people cannot understand these contracts and thus consent to them undermines the legal basis for the provision of all digital services. Perhaps this is not what @ekr meant? If @ekr meant to state that such contracts are not confusing when presented by a web browser at the point of setup, but are confusing when presented by a website then I’d be interested to understand why he believes this so that I can consider remedies.

Group participants should consider the merits associated with gaining more information to inform decisions of “good” or “bad”, or “sanctioned” and “unsanctioned”, utilising an approach that places people in control and respects their choices. If not via the solution in this proposal, then perhaps others. Should the chairs wish to schedule a session to learn more about SCCs and their role in privacy I’d be happy to support that.