Site Groups / First Party Sets v2

pbannist commented 3 years ago

Site Groups

This document proposes a new web platform mechanism to declare a collection of related domains as being in a Site Group. This is an evolution of the First-Party Sets proposal to accommodate several changes:

Removal of much language around “First-Party” as it has many historical connotations/denotations that may be less relevant or confusing in the future.
Renaming of the standard to “Site Groups”, which not only removes the “First-Party” confusion but is also a more straightforward name that may even be usable in communication with users.
Proposes modifications to existing standard browser UA policies to remove “single organization control” as a requirement of Site Groups due to:

Lack of public information that documents corporate/organizational ownership, and any clear way of defining a policy that can be fairly applied
Inability of browsers to police organizational ownership
Bias of this requirement towards large companies over small
Organizational ownership not being discernible to the user, nor offering the user any comfort that their data would be used in a specific way
- Specifically empowers the “owner” site with the incremental cross-site functionality, and disempowers “secondary” sites from having cross-site capabilities. This should allow for all required functionality across sites while minimally increasing the availability of user data beyond the origin.
- Adds a requirement of a shared privacy policy, and a human-readable site group name across all domains in a Site Group.

Most of the language in this proposal is taken directly from the First-Party Sets proposal. A significant amount of privacy-specific language was removed because this proposal does not change it. That is, I tried to remove any parts of the FPS proposal that were not modified in any way and/or were not relevant to expressing the changes in this version. If Site Groups is deemed to be a useful extension of FPS then those elements can be reintegrated.

Thanks to the editors/writers of the First-Party Sets proposal: Kaustubha Govind (Google) and David Benjamin (Google).

Introduction

Browsers have proposed a variety of tracking policies and privacy models which scope access to user identity to some notion of “first-party”. From the user’s perspective, first-party has typically meant a singular domain, but this limits how sites can provide services to the user. Site Groups aims to increase the ability of sites to provide valuable services to their users by widening the privacy boundary to include affiliated sites, while minimally impacting the user’s privacy. In redefining this scope, we must balance two goals: the scope should be small enough to meet the user's privacy expectations, yet large enough to provide the user's desired functionality on the site they are interacting with.

One natural scope is the domain name in the top-level origin. However, the website the user is interacting with may be deployed across multiple domain names. For example, https://google.com, https://google.co.uk, and https://youtube.com are owned by the same entity, as are https://apple.com and https://icloud.com, or https://amazon.com and https://amazon.de. We may wish to allow user identity to span related origins, where consistent with privacy requirements. For example, Firefox ships an entity list that defines lists of domains belonging to the same organization. This explainer discusses a mechanism to allow organizations to each declare their own list of domains, which is then accepted by a browser if the set conforms to its policy.

Goals

Allow related domain names to declare themselves as within a Site Group.
Define a framework for browser policy on which declared names will be treated as the same site in privacy mechanisms.
Minimally increase the availability of user data to reduce potential privacy issues

Non-goals

Third-party sign-in between unrelated sites.
Information exchange between unrelated sites for ad targeting or conversion measurement.
Other use cases which involve unrelated sites.

Declaring a Site Group

A site group is identified by one owner registered domain and a list of secondary registered domains. (See alternative designs for a discussion of origins vs registered domains.)

An origin is in the site group if:

Its scheme is https; and
Its registered domain is either the owner or is one of the secondary domains.
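
The membership rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and `registered_domain` is a naive stand-in that keeps the last two labels of the host, whereas a browser would presumably consult the Public Suffix List.

```python
from urllib.parse import urlsplit


def registered_domain(host: str) -> str:
    """Naive stand-in (assumption): keep the last two labels of the host.

    A real implementation would use the Public Suffix List instead.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def origin_in_site_group(origin: str, owner: str, secondaries: set) -> bool:
    """Return True if the origin is https and its registered domain is the
    owner or one of the secondary domains."""
    parts = urlsplit(origin)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    return registered_domain(parts.hostname) in ({owner} | set(secondaries))
```

For example, `https://www.b.example` would be in a group owned by a.example with secondaries b.example and c.example, while `http://b.example` would not, because of its scheme.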

The browser will consider domains to be members of a set if the domains opt in and the set meets UA policy, to incorporate both user and site needs. Domains opt in by hosting a JSON manifest at the path /.well-known/site-group on their own https origin. The secondary domains point to the owning domain while the owning domain lists the members of the set, a version number to trigger updates, and a set of signed assertions to inform UA policy (details below).

Suppose a.example, b.example, and c.example wish to form a site group owned by a.example. The sites would then serve the following resources:

https://a.example/.well-known/site-group

    {
      "owner": "a.example",
      "version": 1,
      "privacy-policy": "a.example/privacy-policy.html",
      "sg-name": "Human readable name of this site group",
      "members": ["b.example", "c.example"],
      "assertions": {
        "chrome-sg-v1" : "",
        "firefox-sg-v1" : "",
        "safari-sg-v1": ""
      }
    }

https://b.example/.well-known/site-group

    { "owner": "a.example" }

https://c.example/.well-known/site-group

    { "owner": "a.example" }

The browser then imposes additional constraints on the owner's manifest:

Entries in members that are not registrable domains are ignored.
Only entries in members that meet UA policy will be accepted. The others will be ignored.
If the owner is not covered by UA policy, the entire set is rejected.
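
As a rough sketch of how these constraints might be applied to the owner's manifest, assuming the browser has some way to test whether a name is a registrable domain and whether it meets UA policy (both are passed in here as hypothetical predicates; neither is defined by this proposal):

```python
import json


def accepted_members(manifest_text, is_registrable_domain, meets_ua_policy):
    """Apply the constraints above to an owner's manifest.

    Returns None if the whole set is rejected, otherwise the list of
    accepted member domains. Both predicates are hypothetical callables
    supplied by the browser.
    """
    manifest = json.loads(manifest_text)
    owner = manifest["owner"]
    # If the owner is not covered by UA policy, the entire set is rejected.
    if not meets_ua_policy(owner):
        return None
    # Non-registrable entries and entries failing UA policy are ignored.
    return [
        member
        for member in manifest.get("members", [])
        if is_registrable_domain(member) and meets_ua_policy(member)
    ]
```

Returning None for a rejected set, rather than an empty list, keeps "rejected" distinct from "accepted with no members"; that distinction is a design choice of this sketch, not something the proposal specifies.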

Owner Privileges

The owner domain of a given site group has special privileges within that site group. It can read and write "first party" data stores (first party cookies, LocalStorage, Storage Access API, etc.) within the browser when the browser origin is set to a domain within its site group. More plainly:

The owner domain has access to read/write all data from across the site group
Each secondary domain only has access to read/write data from its own domain (same as currently implemented)

This would generally require sites in a site group to include resources (iframes, scripts, etc.) from the owner domain in order to create applications that use the site group, but this seems like a minimal issue. It also lets sites in a site group avoid calling the owner domain when they want to lower any privacy/security risks for a given request.
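
One possible reading of this rule, consistent with the clarification given later in this thread, is a check on whether an embedded frame gets its own unpartitioned storage. The sketch below is hypothetical: `SiteGroup` and `has_unpartitioned_storage` are illustrative names, and sites are compared as registered domains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteGroup:
    owner: str
    members: frozenset

    def contains(self, site: str) -> bool:
        return site == self.owner or site in self.members


def has_unpartitioned_storage(frame_site: str, top_level_site: str,
                              group: SiteGroup) -> bool:
    """Decide whether a frame may use its own first-party storage."""
    # A site always has access to its own data when it is the top-level site.
    if frame_site == top_level_site:
        return True
    # Only the owner keeps access when embedded under another group member.
    return frame_site == group.owner and group.contains(top_level_site)
```

Under this reading, an a.example iframe on b.example can reach a.example's storage, while a b.example iframe on c.example cannot reach b.example's storage.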

Discovering Site Groups

By default, every registrable domain is implicitly owned by itself. The browser discovers site groups as it makes network requests and stores the site group owner for each domain. On a top-level navigation, a website may send a Sec-Site-Group response header to inform the browser of its site group owner. For example, https://b.example/some/page may send the following header:

Sec-Site-Group: owner="a.example", minVersion=1
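
A minimal sketch of reading such a header value follows. The exact header grammar is not specified in this proposal, so the simple comma-separated `name=value` parsing, the function name, and the default of 0 when minVersion is absent are all assumptions.

```python
import re

# One parameter: a name, "=", then a quoted string or a bare token.
_PARAM = re.compile(r'\s*([A-Za-z]+)\s*=\s*(?:"([^"]*)"|([^,\s]+))\s*')


def parse_sec_site_group(value: str):
    """Return (owner, min_version), or None if the value cannot be parsed."""
    params = {}
    for part in value.split(","):
        match = _PARAM.fullmatch(part)
        if match is None:
            return None
        quoted, bare = match.group(2), match.group(3)
        params[match.group(1)] = quoted if quoted is not None else bare
    if "owner" not in params:
        return None
    try:
        # Assumption: a missing minVersion is treated as 0.
        min_version = int(params.get("minVersion", "0"))
    except ValueError:
        return None
    return params["owner"], min_version
```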

If this header does not match the browser's current information for b.example (either the owner does not match, or its saved site group manifest is too old), the browser pauses navigation to fetch the two manifest resources. Here, it would fetch https://a.example/.well-known/site-group and https://b.example/.well-known/site-group. These requests must be uncredentialed and with suitably partitioned network caches to not leak cross-site information. In particular, the fetch must not share caches with browsing activity under a.example. See also discussion on cross-site tracking vectors.

If the manifests show the domain is in the set, the browser records a.example as the owner of b.example (but not c.example) in its site-group storage. It evicts all domains currently recorded as owned by a.example that no longer match the new manifest. Then it clears all state for domains whose owners changed, including reloading all active documents. This should behave like Clear-Site-Data: *. This is needed to unlink any site identities that should no longer be linked. Note this also means that execution contexts (documents, workers, etc.) are scoped to a particular site group throughout their lifetime. If the group owner changes, existing ones are destroyed.
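
The bookkeeping described in the last two paragraphs might look roughly like the sketch below. The storage shape (a dict from site to an `(owner, version)` pair), the function names, and the idea of storing the owner manifest's version next to each site are assumptions made for illustration; clearing state and reloading documents is left to the caller.

```python
def needs_manifest_fetch(records, site, header_owner, min_version):
    """True if the header disagrees with what the browser has stored."""
    # By default, every registrable domain is implicitly owned by itself.
    owner, version = records.get(site, (site, 0))
    return owner != header_owner or version < min_version


def apply_manifests(records, site, owner, owner_version, owner_members,
                    site_claimed_owner):
    """Update stored owners after fetching both manifests.

    Returns the set of sites whose owner changed; the caller would clear
    all state for them, similar to Clear-Site-Data: *.
    """
    changed = set()
    # Evict sites recorded under this owner that the manifest no longer lists.
    for other, (recorded_owner, _) in list(records.items()):
        if recorded_owner == owner and other != owner and other not in owner_members:
            del records[other]
            changed.add(other)
    # Record only the navigated site, and only if both manifests agree.
    if site in owner_members and site_claimed_owner == owner:
        previous_owner, _ = records.get(site, (site, 0))
        records[site] = (owner, owner_version)
        if previous_owner != owner:
            changed.add(site)
    return changed
```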

The browser then retries the request (state has since been cleared) and completes navigation. As retrying POSTs is undesirable, we should ignore the Sec-Site-Group header directives on POST navigations. Sites that require a site group to be picked up on POST navigations should perform a redirect (as is already common), and have the Sec-Site-Group directive apply on the redirect.

Subresource requests and subframe navigations are simpler as they cannot introduce a new first-party/site group context. If the request matches the origin URL's owner's manifest but is not currently recorded as being in that site group, the browser validates membership as above before making the request. Any Sec-Site-Group headers are ignored and, in particular, the browser should never read or write state for a site group other than the current one. This simpler process also avoids questions of retrying requests.

The minVersion parameter in the header ensures that the browser's view of the owner's manifest is up-to-date enough for this logic.

Design details

UA Policy

Defining acceptable sets

We should have some notion of what sets are acceptable or unacceptable. For instance, a set containing the entire web seems clearly unacceptable. Conversely, a set containing https://acme-corp-landing-page.example and https://acme-corp-online-store.example seems reasonable. There is a wide spectrum between these two scenarios. We should define where to draw the line.

Browsers implementing Site Groups will specify UA policy for which domains may be in the same set. While not required, it is desirable to have some consistency across UA policies. For a set of guiding principles in defining UA policy, we can look to how the various browser proposals describe first parties (emphasis added):

A Potential Privacy Model for the Web (Chromium Privacy Sandbox): "The notion of "First Party" may expand beyond eTLD+1, e.g. as proposed in First Party Sets. It is reasonable for the browser to relax its identity-sharing controls within that expanded notion, provided that the resulting identity scope is not too large and can be understood by the user."
Edge Tracking Protection Preview: "Not all organizations do business on the internet using just one domain name. In order to help keep sites working smoothly, we group domains owned and operated by the same organization together."
Mozilla Anti-Tracking Policy: "A first party is a resource or a set of resources on the web operated by the same organization, which is both easily discoverable by the user and with which the user intends to interact."
WebKit Tracking Prevention Policy: "A first party is a website that a user is intentionally and knowingly visiting, as displayed by the URL field of the browser, and the set of resources on the web operated by the same organization." and, under "Unintended Impact", "Single sign-on to multiple websites controlled by the same organization."

UA policies are at the discretion of each browser, but since this proposal does require UA policies to be in alignment, making any required adjustments to those policies is important. Specifically, the requirement of ownership by a single organization has a variety of issues. These are laid out in the following issues:

https://github.com/privacycg/first-party-sets/issues/18
https://github.com/privacycg/first-party-sets/issues/17
https://github.com/privacycg/first-party-sets/issues/14

While it is obviously up to each browser vendor to decide on its own UA policy, removing this requirement does not seem to have a negative impact on overall privacy considerations. Two ways to mitigate the removal of this requirement are:

Shared and declared privacy policy across all domains
Removal of cross-site privileges from all domains except the owner domain

Additionally, the robust, enforceable requirements of the First-Party Sets proposal remain:

Signed assertions by a trusted verification entity
Sites being able to join only a single site group

Given the UA policy, policy decisions must be delivered to the user’s browser. This can use either static lists or signed assertions. Note that site group membership requires being listed in the manifest in addition to meeting UA policy. This allows sites to quickly remove domains from their site group set.

Shared Privacy Policy

More so than organizational ownership, a shared privacy policy across all domains in a site group can give a user comfort that their data is being used in a consistent and understandable way within the site group. Additionally, including the shared privacy policy within the JSON declaration would allow browser UI elements to link directly to the policy for users to inspect.

Parties that were interested in verifying that a site group was "well-behaved" could easily validate that all member sites did indeed adhere to the shared privacy policy, and report violations to browsers directly or to any entities managing signed assertions for site groups (see below).

A given domain within a group could implement a stricter privacy policy than the site group-shared policy, but could not relax any of the policies from a sharing/transfer/usage perspective. This could also be validated by interested parties and external assertion-management entities.

This idea could be extended to give users direct links to "forget me" from the site group or similar functionality within the browser.

Static lists

The browser vendor could maintain a list of domains which meet its UA policy, and ship it in the browser. This is analogous to the list of domains owned by the same entity used by Edge and Firefox to control cross-site tracking mitigations.

A browser using such a list would then intersect site group manifests with the list. It would ignore the assertions field in the manifest. Note fetching the manifest is still necessary to ensure the site opts into being a set. This avoids problems if, say, a domain was transferred to another entity and the static list is out of date.

Static lists are easy to reason about and easy for others to inspect. At the same time, they can develop deployment and scalability issues. Changes to the list must be pushed to each user's browser via some update mechanism. This complicates sites' ability to deploy new related domains, particularly in markets where network connectivity limits update frequency. They also scale poorly if the list gets too large.

Signed assertions

Alternatively, the browser vendor, or some entities it designates, can sign assertions for domains which meet UA policy, using some private key. A signed assertion has the same meaning as membership in a static list: these domains meet the signer’s policy. The browser would trust the signers’ public key and, as above, only accept domains covered by suitable assertions. Assertions are delivered in the assertions field, which contains a dictionary mapping from signer name to signed assertion. Browsers ignore unused assertions. This format allows sites to serve assertions from multiple signers, so they can handle policy variations more smoothly. In particular, we expect policies to evolve over time, so browser vendors may wish to run their own signers. Note these assertions solve a different problem from the Web PKI and are delivered differently. However, many of the lessons are analogous.

As with a static list, signers maintain a full list of currently checked domains. They should publish this list at a well-known location, such as https://sg-signer.example/site-groups.json. Although browsers will not consume the list directly, this allows others to audit the list. The signer may wish to incorporate a Certificate-Transparency-like mechanism for stronger guarantees. The signer then regularly produces fresh signed assertions for the current list state. For extensibility, the exact format and contents of this assertion are signer-specific (browsers completely ignore unknown signers, so there is no need for a common format). However, there should be a recommended format to avoid common mistakes. Each signed assertion must contain:

The domains that have been checked against the signer’s policy
An expiration time for the signature
A signature over the above, made by the signer’s private key
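
To make the three required fields concrete, here is a hedged sketch of how a browser might check whether an assertion covers a domain. The proposal leaves the assertion format signer-specific, so the `SignedAssertion` shape and the payload encoding below are hypothetical. Signature verification is delegated to a caller-supplied `verify` function rather than tied to any particular algorithm.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass(frozen=True)
class SignedAssertion:
    domains: frozenset      # domains checked against the signer's policy
    expires_at: datetime    # expiration time for the signature
    signature: bytes        # signature over the two fields above

    def signed_payload(self) -> bytes:
        # Hypothetical canonical encoding; a real signer defines its own.
        listing = ",".join(sorted(self.domains))
        return f"{listing}|{self.expires_at.isoformat()}".encode()


def assertion_covers(assertion: SignedAssertion, domain: str, now: datetime,
                     verify: Callable[[bytes, bytes], bool]) -> bool:
    """True if the assertion is unexpired, validly signed, and lists domain."""
    if now >= assertion.expires_at:
        return False
    if not verify(assertion.signed_payload(), assertion.signature):
        return False
    return domain in assertion.domains
```

Checking expiry before the signature is just an ordering choice in this sketch; both conditions must hold before the domain list is trusted.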

Assertion lifetimes should be kept short, say two weeks. This reduces the lifetime of any mistakes. The browser vendor may also maintain a blocklist of revoked assertions to react more quickly, but the reduced lifetime reduces the size of such a list. To avoid operational challenges for sites, the signer makes the latest assertions available at a well-known location, such as https://sg-signer.example/assertions/. We will provide automated tooling to refresh the manifest from these assertions, and sites with more specialized needs can build their own. To support such automation, the URL patterns must be standard across signers.

Note any duplicate domains in the assertions and members attribute should compress well with gzip.

UI Treatment

In order to provide transparency to users regarding the Site Group that a web page’s top-level domain belongs to, browsers may choose to present UI with information about the Site Group owner and the members list. One potential location in Chrome is the Origin/Page Info Bubble - this provides requisite information to discerning users, while avoiding the use of valuable screen real-estate or presenting confusing permission prompts. However, browsers are free to choose different presentation based on their UI patterns, or adjust as informed by user research.

Browser UI elements can also expose the shared privacy policy for the site group, as well as a human readable name of the site group that would desirably match any cross-site branding. This would hopefully give users even more context about the site group, how their information is used, and why the site is part of a given site group.

Note that Site Groups also gives browsers the opportunity to group per-site controls (such as those at chrome://settings/content/all) by the site group boundary instead of eTLD+1, which is not always the correct site boundary.

johnwilander commented 3 years ago

Hi Paul!

The owner domain has access to read/write all data from across the site group

My understanding of this is that it would violate the same-origin policy. Is that accurate? Back when we (Apple WebKit) originally proposed this kind of feature we called it "Affiliated Domains" but later we unfortunately referred to it as "same-origin policy v2" to entice people (link to the email). That got people in W3C WebAppSec worried that we were proposing a relaxation of the same-origin policy, which we were not. Maybe things have changed since, but I'd be surprised if there was any appetite for such cross-site access to website data.

But it may be that you're saying the owner domain would have access to its website data unconditionally when embedded on a site from the site group. That would not violate the same-origin policy.

pbannist commented 3 years ago

Hi John,

Thanks for the feedback! I think I am not articulating myself correctly, so let me try again :) In the existing First Party Sets proposal, a primary application case is described as:

That is, if a.example and b.example are in the same first-party set, the same-origin policy should still prevent https://a.example from accessing https://b.example's IndexedDB databases. However, it may be reasonable to allow a https://b.example iframe within https://a.example to access the https://b.example databases.

From my understanding, that would not violate the same-origin policy. I'm proposing strengthening that, such that:

If the owner of a given "site group" was a.example, with members b.example and c.example
An a.example iframe embedded into b.example or c.example could read its IndexedDB database
But a b.example or c.example iframe embedded into another domain in the site group could not read their IndexedDB data.

This is certainly where my knowledge of privacy/security concerns hits a limit. I think this is a more private/safer solution, since only one domain is given this special privilege. However, maybe there are reasons why this is a bad idea, and I can totally remove this from my draft proposal if it's unworkable.

So yes - what you wrote at the bottom is what I mean, just trying to rewrite it so it's more clear.

johnwilander commented 3 years ago

Agreed that would not violate the same-origin policy, only relax third-party storage restrictions. In fact, that makes this particular part of your proposal very close to what we’ve expressed as our main interest in FPS, namely the ability to relax some restrictions for a dedicated single sign-on domain within a set. That relaxation could take many forms, such as wording in permission prompts or long term persistence of a granted permission instead of it being ephemeral or very limited in time.

jackfrankland commented 3 years ago

I'm not sure you can remove the concept of organizational ownership here, as a key aspect of a privacy policy is surely the entity that controls the users' data (the company, business, organization, or whatever you call it).

Is the aim of this proposal to make things more strict and to say, as well as being owned by the same entity, the group of secondary domains must also follow the same conditions of the privacy policy?

jwrosewell commented 3 years ago

A number of people are interested in discussing matters related to first and third party at TPAC. If the current accepted W3C definitions do not reflect people's higher fidelity requirements in practice then it may be better to address those definitions and the implications prior to working on technical changes.

pbannist commented 3 years ago

@jackfrankland thanks for your feedback! I don't think organizational ownership actually matters, and just acts as a construct to bias the proposal towards larger organizations. Two main reasons:

As I wrote about in https://github.com/privacycg/first-party-sets/issues/14 and https://github.com/privacycg/first-party-sets/issues/18, an "organization" is a very nebulous thing that has many different definitions in many places. Geico and Dairy Queen are part of the same organization, but clearly there is no data privacy relationship between the two. So making a shared privacy policy a feature of the proposal seems to help here, but the organizational ownership part doesn't seem additive to me at all.
On the reverse side, my company (CafeMedia) acts as an agent for a large number of smaller publishers. We manage their GDPR/CCPA consent and data sharing policies, we manage and enforce contracts with downstream companies that access user data, and we mandate shared privacy policies across sites. So in this case, we can enforce across a large number of domains (that we don't actually own), how all of the data is used and what the user can expect.

One of the reasons I think the "owner domain" within the site group should be the only domain with privileges across the site group, is because it addresses those two points in a way that (in my opinion) makes it easier to drop the organizational requirement. It enforces a singular entity (via one domain) as the "owner" of the group, which is the only entity that gets the special "privileges" of site groups. This can be the same organization as the member domains, or it can be a different organization, but it ensures that only one organization gets the special privileges. Then the shared privacy policy enforces that any domains in the group are acting with the same policies and practices.

jackfrankland commented 3 years ago

@pbannist Thanks a lot for the reply.

I completely agree that the concept of any organizational ownership of a company that operates a site is not additive. Perhaps it's the ambiguity of the term, and it's likely that I have misunderstood how it's been used so far.

I believe what is relevant is the entity that acts as the data controller for the domain a user visits. For dairyqueen.com this seems to be American Dairy Queen Corporation, and for Geico it seems to be Geico Corporation, as listed in their separate privacy policies.

I'm not sure if it's exactly what you're suggesting, but I wanted to raise the point that I don't believe two domains can share the same privacy policy if the data controller for each is different. They may share the same behaviour in regards to how they treat the user's data, but the contract is not the same. Therefore, if two domains are part of the same set/group, they must also be owned/controlled by the same entity.

gffletch commented 3 years ago

From an identity perspective, what is the expectation for an identity provider that operates as the IDP for both first-party relying parties and third-party relying parties? Can the IDP only be part of a single "group", while still operating as a federated identity provider for other sites without being a member of their groups?

pbannist commented 3 years ago

@gffletch That's a good question and maybe better asked on the main FPS proposal, as your issue would be the same there, and that has more eyes watching it. I do not know the answer.

TanviHacks commented 2 years ago

Hi @pbannist! We see that this proposal hasn't been touched in a while - is there anything more to discuss here or should we close this out?

privacycg / proposals