Proposed Work Item: First-Party Sets

krgovind commented 4 years ago

First-Party Sets is a web platform mechanism that allows a set of registrable domains (or origins) to be defined as "first-party" to each other. Our primary motivation for this proposal is to define a privacy boundary that allows browsers to eliminate cross-site tracking that currently relies on mechanisms such as third-party cookies and fingerprinting. Tracking policies and privacy models from various browser vendors - Chromium, Edge, Mozilla, WebKit - scope access to user identity to some notion of first-party, which we refer to as a privacy boundary.

Although the top-level document’s registrable domain can act as a natural privacy boundary, it is clear that multi-domain sites are a reality, which compels us to define a better alternative. For example, Firefox ships an entity list to group together domains belonging to the same organization.

Organizations generally prefer maintaining distinct domain names to manage branding, or to allow for future business sales/acquisitions. Additionally, choosing the registrable domain as the privacy boundary may compel organizations to move all their web properties to a single parent domain. The parent domain that a property is hosted on might then change with business ownership, which would train users to make security decisions based on the subdomain component of URLs. This could make them more susceptible to phishing attacks.

First-Party Sets allows site operators to assert a list of domains as being associated with the same entity. This then allows us to define a top-level document’s First-Party Set as the privacy boundary. Browsers may choose to not impose cross-domain communication restrictions across members of a given First-Party Set (such as is done in practice with disconnect.me’s extension, Firefox ETP’s use of the entity list, and Edge Tracking Protection’s similar exception for same-party domains). However, it is important to apply a set of countervailing pressures:

Preventing abuse by unrelated websites forming a First-Party Set - This is achieved by requiring every organization to submit their list for acceptance based on conformance with a published UA policy.
Making site associations visible to the user - This is achieved by making First-Party Sets discoverable via various browser UI surfaces.
Discouraging formation of arbitrarily large sets by imposing storage and entropy limits - Browser storage limits and entropy limits (such as the proposed Privacy Budget) that are currently applied per-domain are instead applied per First-Party Set.
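
To illustrate the last point, here is a minimal sketch of how limits could be charged per set rather than per domain. It is not taken from the proposal: the `ACCEPTED_SETS` table, the domain names, and the helpers `set_owner`, `same_party`, and `charge_storage` are all hypothetical, and inputs are assumed to already be registrable domains (deriving them requires the Public Suffix List).

```python
from typing import Dict, FrozenSet

# Hypothetical table of sets accepted under a UA policy:
# owner domain -> member domains. All names are placeholders.
ACCEPTED_SETS: Dict[str, FrozenSet[str]] = {
    "brand.example": frozenset({"brand-shop.example", "brand-news.example"}),
}


def set_owner(domain: str) -> str:
    """Return the owner of the set containing `domain`.

    A domain that belongs to no accepted set is treated as a
    singleton set owned by itself.
    """
    for owner, members in ACCEPTED_SETS.items():
        if domain == owner or domain in members:
            return owner
    return domain


def same_party(a: str, b: str) -> bool:
    """True when both registrable domains fall inside the same set."""
    return set_owner(a) == set_owner(b)


def charge_storage(usage: Dict[str, int], domain: str,
                   nbytes: int, limit: int) -> bool:
    """Charge `nbytes` against the budget of the whole set.

    Returns False (and charges nothing) if the set would exceed
    `limit`. Adding more domains to a set does not add more budget.
    """
    key = set_owner(domain)
    used = usage.get(key, 0)
    if used + nbytes > limit:
        return False
    usage[key] = used + nbytes
    return True
```

Because the budget is keyed by the set owner, growing a set spreads one fixed limit across more domains, which is the pressure against arbitrarily large sets.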

First-Party Sets has recently been the subject of discussion on various forums, including at PrivacyCG F2F, and WebAdvBG.

We have been working to incubate First-Party Sets in WICG, and it was recently transferred there: https://github.com/WICG/first-party-sets

We'd like to propose that the Privacy CG discuss it and see if the group would like to take it on as a Work Item.

othermaciej commented 4 years ago

Apple supports adopting this proposal as a Privacy CG Work Item. We have proposed similar mechanisms in the past and continue to be interested in this area.

In honesty, we would probably not implement the spec as-is because it leaves too many of the hard problems with such a mechanism unsolved or up to each individual browser, but we believe they are eminently solvable, and Privacy CG would be a great place to work through them.

melanierichards commented 4 years ago

Echoing Microsoft Edge sentiment from the WICG Discourse thread: we believe that First-Party Sets could be useful in helping unblock valid intra-organizational use cases while maintaining the right privacy promises. We’re supportive of exploring this idea further. Agreed that as a community we’ll need to continue workshopping mitigations against abuse while striking the right balance between organizational cohesion vs. sets that can be reasoned about by most users. We’re hopeful that we can collectively come up with solutions to these considerations, and are interested in continued discussion on First-Party Sets.

Privacy CG would be a great home for this.

pbannist commented 4 years ago

Echoing what I wrote on the Discourse thread, I think this proposal is better discussed in WICG. Privacy is a major consideration here, but it is not the overriding or exclusive consideration. The Privacy Group would seem to relegate all other considerations to second-class, which is not appropriate for a standard that has so many implications that go beyond privacy.

johnwilander commented 4 years ago

Echoing what I wrote on the Discourse thread, I think this proposal is better discussed in WICG. Privacy is a major consideration here, but it is not the overriding or exclusive consideration. The Privacy Group would seem to relegate all other considerations to second-class, which is not appropriate for a standard that has so many implications that go beyond privacy.

First Party Sets aim to relax privacy (and potentially security) protections on the web. Such protections are an overriding concern but not an exclusive concern. If we don't figure out how to uphold existing protections, browser vendors who prioritize user privacy are unlikely to implement First Party Sets and the end result would be a bifurcated web in terms of how domain names are handled. That's why I think First Party Sets should be discussed in the Privacy CG. This is a place where we have a reasonable chance of figuring out a version of this proposal that's acceptable to most browser vendors.

pbannist commented 4 years ago

If adopted by browsers other than Chrome (like Safari/Webkit) then, yes FPS does have the side effect (not aim) of reducing privacy, and perhaps security, protections. However, within Chrome, it is part of a set of proposals that aim to increase privacy and security, while limiting economic damage to publishers.

It is possible that a more desirable outcome for "the web" is:
1) More privacy, on the whole
2) Less economic damage to publishers
3) A bifurcated web around standards (which already exists in many cases)
4) Less power concentrated among a small number of multi-national conglomerates

It seems less likely that an honest conversation across all stakeholders can be had if privacy is the overriding concern.

johnwilander commented 4 years ago

If adopted by browsers other than Chrome (like Safari/Webkit) then, yes FPS does have the side effect (not aim) of reducing privacy, and perhaps security, protections. However, within Chrome, it is part of a set of proposals that aim to increase privacy and security, while limiting economic damage to publishers.

It is possible that a more desirable outcome for "the web" is:

More privacy, on the whole

Less economic damage to publishers

A bifurcated web around standards (which already exists in many cases)

Less power concentrated among a small number of multi-national conglomerates

It seems less likely that an honest conversation across all stakeholders can be had if privacy is the overriding concern.

I don't understand what "within Chrome" means. Do you mean this is a one-browser feature? If the aim is not to get browser interoperability, I don't see why it should be discussed anywhere within W3C. This is a place where we work together to enhance and develop a web platform that works regardless of which (modern) browser is being used. Given that the goal is interoperability, I think the Privacy CG is the right place to work on First Party Sets.

I'll let Google and @krgovind speak to whether they share your views since they are the ones proposing First Party Sets.

pbannist commented 4 years ago

I mean that if Webkit/Safari and other browsers could consider other perspectives (around economic benefits, increased competition, support for diverse voices, etc.) beyond privacy, perhaps an interoperable standard could be created that does decrease privacy in return for other end-user benefits. Or, the choice could be made that an interoperable standard is not possible.

However, without an honest conversation around all considerations, it seems that the only possible outcomes of an interoperable FPS standard are:
1) increased privacy on the whole: increased privacy in Chrome, slightly decreased privacy in other browsers
2) one additional user benefit: transparent cross-domain functionality within an "organization" - I am oversimplifying the benefit, to be fair
3) significant degradation of other considerations: reduced competition/innovation, reduced economic outcomes, reduced diversity of voices on the web

I'm also very interested in the Chrome team's point of view.

erik-anderson commented 4 years ago

The Privacy CG has very explicit goals around multi-implementer support and evaluating web compatibility impacts, so a characterization that it doesn't take a holistic view is, in my opinion, unfair. Current Work Items, including Storage Access API and Private Click Measurement, are designed to provide capabilities to help address some of the concerns outlined. Privacy considerations will be an important part of the conversation no matter where this is incubated.

The Privacy CG has a more regular cadence for discussion than WICG (which is designed to be lightweight), including twice-a-month teleconferences, breakout sessions, and face-to-faces. It's likely to get more focused time and attention from a diverse set of interests, including both the ads industry and browser developers. As a result, I believe it's likely to move forward more quickly in the Privacy CG.

bslassey commented 4 years ago

First Party Sets aim to relax privacy (and potentially security) protections on the web.

I'll disagree with this framing. First Party Sets are aiming to establish a well defined notion of first parties that can safely maintain existing capabilities granted to third parties in order to enable browsers to put greater restrictions on true third parties. It's also important to note that both Firefox and Edge have seen the need to use entities.json for a similar purpose. So this is hopefully standardizing that existing behavior. If we end up doing an origin-based FPS, this could allow a tightening of same-site security boundaries. I think this opportunity is really interesting, if only for the potential to deprecate or at least reduce the dependency on the PSL.

johnwilander commented 4 years ago

First Party Sets aim to relax privacy (and potentially security) protections on the web.

I'll disagree with this framing. First Party Sets are aiming to establish a well defined notion of first parties that can safely maintain existing capabilities granted to third parties in order to enable browsers to put greater restrictions on true third parties. It's also important to note that both Firefox and Edge have seen the need to use entities.json for a similar purpose. So this is hopefully standardizing that existing behavior. If we end up doing an origin-based FPS, this could allow a tightening of same-site security boundaries. I think this opportunity is really interesting, if only for the potential to deprecate or at least reduce the dependency on the PSL.

Sorry, I should have been more precise. Today, a third-party means differing registrable domain from the top frame. With FPS, the intention is to, for at least some engine decisions, treat some such differing registrable domains as first party. That to me is a relaxation.

But all of this should be discussed in issues, not the proposal. 🙂 I’ve been wanting to solve this for years, as shown by my two pitches of the idea to WebAppSec in 2017, and I really hope we can get to a definition that holds over time as new business decisions are made based on the existence of FPS and that meets user expectations. I even have some ideas for how to resolve some things. I’ll share once we have a repo.

annevk commented 4 years ago

existing capabilities

I think allowing for this at all was a design mistake.
Standards have always allowed for different policies to prevent tracking, e.g., https://html.spec.whatwg.org/#user-tracking. What Safari has done and others are doing as well is making that the default.

It's also important to note that both Firefox and Edge have seen the need to use entities.json for a similar purpose. So this is hopefully standardizing that existing behavior.

How would that work without it being centrally managed?

krgovind commented 4 years ago

It's also important to note that both Firefox and Edge have seen the need to use entities.json for a similar purpose. So this is hopefully standardizing that existing behavior.

How would that work without it being centrally managed?

@annevk I think what you're advocating for is a centralized/unified UA policy as defined in the current proposal, in order to enable standardization? Please feel free to open an issue on the repo with that suggestion. :)

jackfrankland commented 4 years ago

I'd like to make a quick argument for the proposal https://github.com/privacycg/proposals/issues/11, if I may :)

Instead of defining a relationship between domains, I believe a better solution is to define the relationship between a domain and the business that owns it. A business may own multiple domains, and therefore relationships between domains can be inferred, potentially serving the same goals as first party sets. In just this regard I believe it has the following advantages:

There is no need for a possibly arbitrary owner domain.
It's not necessary to make requests to two domains to confirm the relationship.
I believe there is value in knowing the business/entity that has access to the user's data for a domain, and that this relationship is a more easily defined thing that user agents can freely use to determine differing behaviour. This is in contrast to something that arguably has less meaning/value by itself when it depends on dynamic UA policy for its definition. In order for first party sets to be most successful, it may require consistency of behaviour between user agents, which could be difficult.

bslassey commented 4 years ago

Sorry, I should have been more precise. Today, a third-party means differing registrable domain from the top frame. With FPS, the intention is to, for at least some engine decisions, treat some such differing registrable domains as first party. That to me is a relaxation.

I just want to be really clear about this point, while FPS establishes a set of domains that are owned/controlled/run by the same party, it is not suggesting to treat them as first party to each other such that they would be equivalent to subdomains of the same registrable domain. Perhaps this was a mistake in naming (perhaps "Entity Sets" would be better to put it in the context of entities.json and happy to revisit that choice).

existing capabilities

I think allowing for this at all was a design mistake.

And that is why we are all looking to reduce the capabilities of third parties, which this helps to enable. Or are you suggesting that those capabilities are too powerful to allow for a set of domains that are owned/controlled/run by the same party?

Standards have always allowed for different policies to prevent tracking, e.g., https://html.spec.whatwg.org/#user-tracking. What Safari has done and others are doing as well is making that the default.

It's also important to note that both Firefox and Edge have seen the need to use entities.json for a similar purpose. So this is hopefully standardizing that existing behavior.

How would that work without it being centrally managed?

As @krgovind pointed out, central management is certainly a possibility, but defining what that means is important. I am very much of the opinion that the current central management isn't working very well for a number of reasons (no clear policy, sets that are clearly wrong, lack of awareness or opt-in from affected entities, etc.).

hober commented 4 years ago

@krgovind, would you like to talk about this during this week's telcon? If so, please add the 'agenda+' label to this issue. Thanks!

johnwilander commented 4 years ago

Sorry, I should have been more precise. Today, a third-party means differing registrable domain from the top frame. With FPS, the intention is to, for at least some engine decisions, treat some such differing registrable domains as first party. That to me is a relaxation.

I just want to be really clear about this point, while FPS establishes a set of domains that are owned/controlled/run by the same party, it is not suggesting to treat them as first party to each other such that they would be equivalent to subdomains of the same registrable domain. Perhaps this was a mistake in naming (perhaps "Entity Sets" would be better to put it in the context of entities.json and happy to revisit that choice).

This is very similar to the discussion back in spring of 2017 when I called this proposal Same-Origin Policy v2. People thought I proposed relaxing parts of the existing same-origin policy, similar to what you describe above with subdomains. That was never the case and it is not the case here where I say relaxation.

There are many more "engine decisions" made on first versus third party than same-origin policy ones. I went through some of them back in 2017 and would like to explore them anew as part of this work item. Some examples:

Partitioning. You could envision a joint FPS partition or even no partitioning within a FPS.
Cookie blocking.
CORS preflights. You could envision a cross-origin resource not having to do preflights if it's loaded by a website within its FPS. That mode could be opt-in. (You could argue that this is actually a case of relaxing the existing same-origin policy. 🙂)
Storage Access API decisions on prompting or wording in the prompt.
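
The "joint FPS partition" idea from the partitioning example could be sketched as follows. This is only an illustration under stated assumptions: the `SET_OWNER` table, the domain names, and the `partition_key` function are hypothetical, and real engines key partitioned state on more than two values.

```python
from typing import Dict, Tuple

# Hypothetical mapping from registrable domain to the owner of its set.
SET_OWNER: Dict[str, str] = {
    "brand.example": "brand.example",
    "brand-shop.example": "brand.example",
    "brand-news.example": "brand.example",
}


def partition_key(top_frame_site: str, embedded_site: str,
                  joint: bool) -> Tuple[str, str]:
    """Compute the key under which an embedded site's state is stored.

    With joint=False, state is partitioned by the top-frame site,
    so the same embed sees different state on every site.
    With joint=True, every top-frame site in one set maps to the
    set owner, so the embed sees one shared partition per set.
    """
    if joint:
        top = SET_OWNER.get(top_frame_site, top_frame_site)
    else:
        top = top_frame_site
    return (top, embedded_site)
```

For example, a `widget.example` embed on `brand-shop.example` and on `brand-news.example` gets two different keys with `joint=False` and the same key with `joint=True`.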

englehardt commented 4 years ago

It would be helpful to understand precisely the problems we’d like to solve with First Party Sets, and why those problems can’t be solved through other web platform features or proposals (e.g., the Storage Access API).

The definition of “first party” should be clear and understandable to users, web developers, and publishers. The simplest, most natural approach is to enforce a strict one-to-one mapping between first party and registrable domain (i.e., eTLD+1) or a narrower selector (e.g., origin). Using information from the top-level URL is the ideal way to indicate first party because this is already familiar to most users, it is based on a unique identifier for the website owner, it is consistent across web browsers, it is visible in the address bar, and is even visible in a URL to a page that has not yet been visited.

Unfortunately, a definition of first party based on top-level URL isn’t compatible with all sites on the web today. Some cross-site applications expect unrestricted access to third-party cookies. For this reason, Mozilla has deployed Disconnect’s entity list. This is a web compatibility intervention that we hope to deprecate as fewer browsers support third party cookies and fewer sites rely on them.

Standardizing such an intervention through First Party Sets solidifies new means of cross-site communication that are unintuitive, and that reduce the accountability a site has to a user. This is opposite of the direction we'd like to move the web.

Shared membership in a First Party Set is not easily discoverable. Why should a user expect that a visit to siteA-flowers.example would automatically be correlated to their siteB-roses.example account? We should not have to rely on their shared ownership being implicit knowledge. We don’t see an additional “UI treatment” that will fix the unwanted surprise.

Requiring the user agent to enforce a policy puts too much onus on the user agent in constructing a policy and rules for determining which First Party Sets are permitted. Inconsistent application of those rules, especially between different browsers, creates considerable uncertainty for sites. This creates compatibility problems for all browsers that are most felt by smaller actors, and may force browsers to adopt the most permissive of the policies (as pointed out by Maciej). This might be alleviated by agreeing to a common set of rules, but we don’t expect to reach agreement on those rules, leaving uncertainty where there is no agreement.

These issues seem fundamental to the design of the proposal, and hence Mozilla is not supportive of First Party Sets.

englehardt commented 4 years ago

First Party Sets aim to relax privacy (and potentially security) protections on the web.

[snip] It's also important to note that both Firefox and Edge have seen the need to use entities.json for a similar purpose. So this is hopefully standardizing that existing behavior.

To respond to this specifically: we found entities.json to be necessary for web compatibility, but that shouldn't be used as justification for standardizing such functionality. The need arises from shipping protections in the face of applications that rely on the legacy functionality of permissive third-party cookie policies at a time when blocking some (or all) third-party cookies was not a shared goal among browser vendors. It's something we can do without requiring websites to change, but first party sets requires change.

brodrigu commented 4 years ago

This creates compatibility problems for all browsers that are most felt by smaller actors

While use-cases of larger actors are clearer and these actors have the resources to be more vocal and represented, we should be cautious about prioritizing the larger actors' use-cases above those of smaller actors, particularly if we aim to promote a dynamic and open web.

jdwieland8282 commented 4 years ago

I just want to be really clear about this point, while FPS establishes a set of domains that are owned/controlled/run by the same party, it is not suggesting to treat them as first party to each other such that they would be equivalent to subdomains of the same registrable domain. Perhaps this was a mistake in naming (perhaps "Entity Sets" would be better to put it in the context of entities.json and happy to revisit that choice).

I think the definition of a FPS should expand to include domains acting in a cooperative fashion, otherwise FPS heavily favors big companies, Google.com & youtube.com for example.

othermaciej commented 4 years ago

I agree with Mozilla's concerns about this proposal. However, I think it's at least possible, if uncertain, that the user-understandability, bad-faith, and interop problems can be solved, and I think it's worth a try.

jwrosewell commented 4 years ago

A method of addressing the competing concerns the proposal highlights is needed. Two options available are:

First-Party Sets were reviewed by W3C Technical Architecture Group (TAG) during May. TAG have a set of adopted Ethical Privacy Principles which would have been used to assess this, and any other proposal. Is it possible to ask TAG reviewers for their assessment regarding the competing concerns raised here?
The problems described here are not new. The issues surfaced in these comments were documented in 2002 by MIT in their "Tussle in Cyberspace" document. Page 3 is particularly interesting. The “Tussle” should be settled before specific proposals dependent on the outcome of the “Tussle” are contemplated. W3C values among other documents provide some guidance.

Overall the proposal is based on a number of assumptions which do not sit comfortably with both TAG and W3C positions.

People are incapable of trusting a domain owner AND their supply chains. In no other industry is this the case.
People should not have the ability to make such trust choices.
The W3C should create standards to resolve matters related to commercial practice.

My broader comments to TAG review can be found here.

kdeqc commented 4 years ago

I support this proposal, but I think it might be helpful if we looked at it from how we would ideally classify domains. Then we could decide if a single mechanism could be used (eventually, since it would be fairly difficult to solve everything at once). With that in mind, here's how I would ideally classify domains. I would be interested in how others would classify them.

First-party domain: as defined today, the domain in the address bar
Associated domains: other domains owned by the same organization, where consumers should understand the relationship between them because they share branding
Cooperative/2nd-party domains: I think requiring the domains to be owned by the same entity does generally benefit larger companies. So a mechanism where smaller companies/publishers could join together into a cooperative could be beneficial too
Pure SaaS domains: this is for the sites which use third-party vendors for services like web analytics, shopping carts, file hosting, help desks, learning modules, chat services, etc. This category would be for vendors that either don't collect data, or only collect data on behalf of the first-party domain and never for their own use. The idea would be that the first-party domain would disclose who they use as vendors, so they could be seen as a trusted third-party.
Other third-party domains: either they don't have a direct relationship with the first-party domain, or they collect user data for their own use.

Thanks!

jwrosewell commented 4 years ago

the first-party domain would disclose who they use as vendors, so they could be seen as a trusted third-party

This seems like a simple to explain and implement option which addresses many of the "Tussles".

A domain owner would publish the other domains they trust. This "bundle" of domains would all be treated as one by the browser.

A privacy "check-list" icon could be used within the browser UI to enable people to quickly see who they're trusting if they trust the primary domain owner.

krgovind commented 4 years ago

Apologies for the delay in responding to all the great comments and feedback here. I'll try and address as many as possible.

@pbannist @brodrigu @jdwieland8282 @kdeqc - You all mentioned a desire to allow co-operating domains to form a First-Party Set (FPS). This explicitly conflicts with our goal for FPS to serve as a site's privacy boundary (and if we move to an origin-based FPS, perhaps it could even serve as a security boundary). Although not within current scope, we can also envision future extensions to FPS, such as credentials sharing between native apps and websites as described by Mike West in his original FPS proposal. As such, FPS could have security implications, and is ideally reserved for true first-parties. Having said that, I would encourage you to engage with other proposals such as FLoC and TURTLEDOVE, which aim to preserve a vibrant and competitive open web.

@jackfrankland I left a comment on the issue that should explain my preference for using a .well-known location over DNS TXT records.

There is no need for a possibly arbitrary owner domain.

Requiring a manifest hosted at a .well-known necessitates this.

It's not necessary to make requests to two domains to confirm the relationship.

I suppose we could make this possible with the FPS proposal as well by requiring that the signed manifest be hosted at .well-known locations of owner and member domains. However, having a single source of truth might make deployment easier and prevent ownership being out-of-sync for brief periods while the manifest is deployed over multiple domains? Or perhaps, we simply have each member host a signed assertion that simply specifies the owner domain, but having a central unified manifest might be more scalable and easier to verify?
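
The two-sided variant described above, where the owner lists its members and each member names its owner, could be checked roughly like this. It is a sketch under assumptions, not the proposal's format: the JSON field names `owner` and `members` and the function `claims_agree` are hypothetical, and both signature verification and the actual fetch from the `.well-known` locations are left out.

```python
import json


def claims_agree(owner: str, owner_manifest: str,
                 member: str, member_assertion: str) -> bool:
    """Check that the owner's manifest and the member's assertion match.

    `owner_manifest` is the JSON text served by the owner domain and
    `member_assertion` is the JSON text served by the member domain.
    Both sides must agree before the relationship is accepted.
    """
    try:
        manifest = json.loads(owner_manifest)
        assertion = json.loads(member_assertion)
    except json.JSONDecodeError:
        return False
    if not isinstance(manifest, dict) or not isinstance(assertion, dict):
        return False

    members = manifest.get("members", [])
    return (
        manifest.get("owner") == owner        # owner claims itself
        and isinstance(members, list)
        and member in members                 # owner lists the member
        and assertion.get("owner") == owner   # member points back at owner
    )
```

A one-sided claim fails this check, which is what prevents a domain from pulling an unrelated site into its set, but it also shows the out-of-sync risk: a member that has not yet deployed its assertion is rejected until it does.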

I believe there is value in knowing the business/entity that has access to the user's data for a domain, and that this relationship is a more easily defined thing that user agents can freely use to determine differing behaviour. This is in contrast to something that arguably has less meaning/value by itself when it depends on dynamic UA policy for its definition. In order for first party sets to be most successful, it may require consistency of behavior between user agents, which could be difficult.

Your proposal mentions an authority that signs domain-to-entity relationships, so there is clearly some policy at play, although you are proposing a unified policy across all user agents, which I think comprises two requirements: (a) common ownership, and (b) common privacy policy. Others have also expressed concerns about inconsistencies between user agent policies, and that is something we are willing to work to bring consistency to.

@englehardt You make a good point about the registrable domain in the URL being the accepted/familiar means of identifying a site's identity. However:

Users' susceptibility to phishing attacks suggests that this assumption (that security decisions on the web are made based on the registrable domain component of the URL) doesn't actually hold true for all users today. Regardless, we can perhaps overlook that, and consider the next point:
Businesses being acquired/sold is a reality. Just as an example, Flickr has changed ownership a few times in the past decade. So we should now expect users to think it's completely ordinary that flickr.com should redirect to either flickr.yahoo.com or flickr.verizon.com or flickr.smugmug.com; and train them to make security decisions based on the subdomain, which we know to be an anti-pattern.

@kdeqc Thank you for that neat classification of domains!

First-party domain: as defined today, the domain in the address bar

This is unaffected.

Associated domains: other domains owned by the same organization, where consumers should understand the relationship between them because they share branding

This is indeed the target use-case for FPS.

Cooperative/2nd-party domains: I think requiring the domains to be owned by the same entity does generally benefit larger companies. So a mechanism where smaller companies/publishers could join together into a cooperative could be beneficial too

See my response at the top.

Pure SaaS domains: this is for the sites which use third-party vendors for services like web analytics, shopping carts, file hosting, help desks, learning modules, chat services, etc. This category would be for vendors that either don't collect data, or only collect data on behalf of the first-party domain and never for their own use. The idea would be that the first-party domain would disclose who they use as vendors, so they could be seen as a trusted third-party.

Since these entities "only collect data on behalf of the first-party domain and never for their own use", I believe they should be satisfactorily served by partitioned storage. This is consistent with Chrome's Privacy Model, and I see a relevant discussion in issue #18.

Other third-party domains: either they don't have a direct relationship with the first-party domain, or they collect user data for their own use.

I think these should be N/A as far as web privacy work goes?

annevk commented 4 years ago

Flickr has changed ownership a few times in the past decade. So we should now expect users to think it's completely ordinary that flickr.com should redirect to either flickr.yahoo.com or flickr.verizon.com or flickr.smugmug.com; and train them to make security decisions based on the subdomain, which we know to be an anti-pattern.

If they indeed did that it seems like it would make it much more transparent to users who they have a relationship with. How is that not a win?

jwrosewell commented 4 years ago

@krgovind How do the goals of FPS change in light of the CMA's recommendations concerning User IDs and choice?

It has been noted that standards bodies dominated by large organisations disadvantage smaller participants who lack the numbers of people to pursue a myriad of overlapping proposals. If they do not follow all proposals they may be disadvantaged. As such it would seem sensible to settle this "tussle" between browser vendor engineering objectives and regulation before performing any further work in the interests of the many smaller participants of the W3C.

brodrigu commented 4 years ago

@krgovind thank you for your response. The goal of this proposal appears only to support large multi-property entities sharing data and resources in a way that does not allow for smaller entities to pool together for the same advantages. Additionally, it does not appear to provide any privacy guarantee; on the contrary, it seems to provide a vector for sharing data without user consent across potentially non-obvious sibling properties. It doesn't seem like the privacy cg is a good place to implement it. However, discussing it here (which I suppose we are doing) is likely a valuable exercise to tease out what the true implications to privacy are in this proposal.

pbannist commented 4 years ago

@pbannist @brodrigu @jdwieland8282 @kdeqc - You all mentioned a desire to allow co-operating domains to form a First-Party Set (FPS). This explicitly conflicts with our goal for FPS to serve as a site's privacy boundary (and if we move to an origin-based FPS, perhaps it could even serve as a security boundary). Although not within current scope, we can also envision future extensions to FPS, such as credentials sharing between native apps and websites as described by Mike West in his original FPS proposal. As such, FPS could have security implications, and is ideally reserved for true first-parties. Having said that, I would encourage you to engage with other proposals such as FLoC and TURTLEDOVE, which aim to preserve a vibrant and competitive open web.

Thanks for addressing that point, @krgovind. I am actually arguing the two sides of the same coin, so you've addressed one point (that co-operating domains can't form FPS), but the second point remains open.

An "organization" is opaque and arbitrary from an end user's perspective. They have no way of understanding that Geico and Dairy Queen are owned by the same company, and even if they can be given information to make that linkage, from their perspective they are still no different than example.com and unrelatedsite.com. The fact that an organization owns the two domains doesn't make that organization more or less likely to respect a user's privacy with respect to cross-domain capabilities.

The way I see it, there are two tiers of understanding that exist between a user and the websites they visit in the future.

1) The first-party domain - we all agree that users understand that things they share/do on this domain will be accessible within this domain.
2) Other domains - We all agree that users, currently, have no way of understanding that other domains may have access to information they share/do with the first-party domain - which is why all browsers, soon, will block this sharing.

FPS aims to create a middle tier in that framework for users to understand that the first-party domain is connected to another group of domains that have certain privileges regarding information they can share. UX signals in the browser could give users information that would help them make that connection. But this information would be true for organizationally-owned domains as well as co-operating domains. The fact that an organization owns those domains does not give the user any useful information about whether they can trust that organization to use their data responsibly and respect their privacy, and therefore is not a useful signal to an end-user.

It could be argued that 'organization' is not a perfect principle to manage FPS through, but it is the best available. I would argue that this is only true from the perspective of large organizations - like those that own/fund all of the major browsers. This gives large organizations capabilities that are restricted from small organizations. It could stifle innovation while, as stated earlier in this thread, relaxing privacy. If privacy truly is a fundamental human right, then it shouldn't be able to be infringed upon by companies of any size.

Additionally, all of this puts the browsers in the position of vetting "what is a valid organization?" across every country and organization type. How is partial ownership handled? How are private companies' ownership structures vetted? What % ownership is the minimum required for an FPS to be valid? What about government ownership of companies? Can China claim a first-party set across every domain owned by all Chinese state-owned companies? And in the case of the four major browsers, not only is a browser making that decision, but a browser owned or funded by the largest media, software, and hardware organizations in the world.

If FPS can't support co-operating domains, then it should not be able to support domains owned by the same company. The UX signal is the only signal that is useful to a user, and that is independent of the presence of a shared organization.

jdwieland8282 commented 4 years ago

I would encourage you to engage with other proposals such as FLoC and TURTLEDOVE, which aim to preserve a vibrant and competitive open web.

@krgovind thank you for addressing our support for expanding FPS. wrt FLoC, our concern is that the only entity with the ability to create FLoCs or cohorts is the browser; we feel that is anti-competitive. What if the FLoCs generated don't perform any better than contextual? What if the FLoCs are unpredictable? It will be very challenging to "preserve a vibrant and competitive open web" if we are made to design bidding strategies against a FLoC created by what is essentially a black box to us.

Our desire for expanding FPS tracks directly to our desire to have another trusted entity that can create FLoCs or cohorts. That trusted entity will need cross domain identity signals to build viable cohorts, similar to what the browser will use, except different in one important way. Whereas the browser could have access to all browsing habits, we are only asking for browsing habits within the FPS.

michaelkleber commented 4 years ago

Hello @jdwieland8282,

I do believe that some of the ideas on how to build TURTLEDOVE-style interest groups should support your desire here: a bunch of sites that band together and jointly create ad targeting audiences based on activity on any of those sites.

For prior discussion of more powerful ways to build audiences, check out TURTLEDOVE issue #26, Criteo's SPARROW version, and Facebook's approach. But there's room for a lot of flexibility here.

It sounds like you also want to limit these audiences so that they can only be targeted while someone is visiting that same collection of sites? That hasn't come up before, but it would be an easy feature to add.

Anyway, if your goal is building cohorts to target ads at, please work with us in making the TURTLEDOVE/SPARROW idea space support your needs.

jdwieland8282 commented 4 years ago

Hi @michaelkleber,

I do believe that some of the ideas on how to build TURTLEDOVE-style interest groups should support your desire here: a bunch of sites that band together and jointly create ad targeting audiences based on activity on any of those sites.

Not entirely. TD interest groups do an ok job at retargeting, but there is no mechanism for finding the "next 1000" customers interested in my product or service. Modeling, the idea that given a seed, one can predict what other users will be interested in, is essential to Ad Tech and is more or less what (based on my understanding) FLoC does. Criteo's Sparrow version is promising, but FB's proposal won't work for publishers with limited 1st party data; FB is unique in that they have many many users who generate lots of 1st party data which can be used for a seed and modeled audiences.

It sounds like you also want to limit these audiences so that they can only be targeted while someone is visiting that same collection of sites? That hasn't come up before, but it would be an easy feature to add.

This is not a conclusion I would draw based on my previous comments. I think we can set it aside for now. The core point I'm making is that we need cohort creation to be possible by more than just the browsers, and the only way for small publishers to generate enough data/signal for this cohort creation is for them to be able to share data horizontally among themselves (not necessarily w/ advertisers), e.g. an FPS.

Thanks for your comments, I plan to attend the Sparrow Tech workshop next week.

michaelkleber commented 4 years ago

We definitely do want interest groups to support the "next 1000 customers" use case. The SPARROW Lookalike Targeting section is explicitly about this, and I'm happy to work on how something like the FB proposal can be made available to someone who is a third party on many consenting sites, rather than one large first party.

But we (Chrome) are not interested in an approach that involves joining up individual users' browsing histories across many different sites. Our focus is on ways to build audiences that don't require giving out browsing history. First Party Sets is the wrong tool for this problem.

johnwilander commented 4 years ago

Given the comments on separate proposals above, I think it would be useful to have a separate discussion on them to see if there is any multi vendor interest. Chrome folks, do you intend to set something like that up for e.g. Turtledove?

michaelkleber commented 4 years ago

Yes! TURTLEDOVE & SPARROW have just moved into WICG (discourse thread), very much because we want to have multi-vendor conversations about it.

jackfrankland commented 3 years ago

Your proposal mentions an authority that signs domain-to-entity relationship - so there is clearly some policy at play, although you are proposing a unified policy - which I think comprises two requirements: (a) common ownership, and (b) common privacy policy - across all user agents. Others have also expressed concerns about inconsistencies between user agent policy, and that is something we are willing to work to bring consistency to.

@krgovind thanks a lot for the reply. You're right, an authority would have to follow a policy in order to sign off on the information given.

In my proposal, this would mean verifying that the correct business is being registered for the domain. The correct business should be the one that is named on the published privacy policy on the site. A published privacy policy is already commonplace, and is required by law in certain jurisdictions (e.g. https://gdpr.eu/privacy-notice/). The proposal's main aim is to have some of this information readable by the user agent programmatically, in an effort to reduce the over-prevalence of consent overlays, and to foster better transparency / control of the user's data.

My proposal does not go as far as defining UA behaviour/policy, how it should treat two domains owned by a common business, or requirements for their privacy policies to be the same. In that respect, the goals for these two proposals are quite different. However, my argument is that the publication of the domain-to-business relationship suits the goals for this proposal nicely, and may be more useful than the publication of a domains-to-domains relationship - especially if the policy for this proposal ends up being that the domains must have matching business ownership according to their privacy policies.

hober commented 3 years ago

We have consensus among the @privacycg/chairs and @krgovind (as required by our charter) to adopt this as a Work Item, with @krgovind and @davidben as Editors. I'll work with the @WICG/chairs to transfer the repository over soon.

krgovind commented 3 years ago

If they indeed did that it seems like it would make it much more transparent to users who they have a relationship with. How is that not a win?

@annevk I tried to explain this, but perhaps didn't do a good job. :) Essentially, forcing all sites to move to subdomains of their parent/owner domains would have, in my example, manifested as flickr.com moving to flickr.yahoo.com, then flickr.verizon.com and subsequently to flickr.smugmug.com. This would train users to stop paying attention to the registrable domain, and focus only on the subdomain. Thus, it would make them susceptible to entering their credentials on mybank.evil.com, because mybank is in the subdomain, and lead the user to think "perhaps mybank was recently acquired by evil.com"?
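The registrable-domain point can be made concrete with a minimal sketch of eTLD+1 extraction. `PUBLIC_SUFFIXES` here is a tiny hypothetical stand-in for the Public Suffix List, and `registrable_domain` is an illustrative helper rather than any browser's actual code.

```python
# Hypothetical, tiny stand-in for the Public Suffix List (the real list is far larger).
PUBLIC_SUFFIXES = {"com", "io", "co.uk"}


def registrable_domain(host):
    """Return the eTLD+1 of host, or None if no registrable domain can be found."""
    labels = host.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest,
    # so that "co.uk" is matched before a bare "uk" could be.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in PUBLIC_SUFFIXES:
            if i == 0:
                return None  # the host is itself a public suffix
            # One label to the left of the suffix is the registrable domain.
            return ".".join(labels[i - 1:])
    return None


# Each ownership change would move Flickr to a different registrable domain.
flickr_hosts = ["flickr.com", "flickr.yahoo.com", "flickr.verizon.com", "flickr.smugmug.com"]
flickr_domains = [registrable_domain(h) for h in flickr_hosts]

# The phishing host shares only a subdomain label with the real bank.
phishing_domain = registrable_domain("mybank.evil.com")
```

Under these assumptions the only stable part of the Flickr hostnames is the leftmost label, which is exactly the part an attacker on evil.com is free to choose.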

I will also mention a couple of other use-cases that we learned about:

Sites that serve user-uploaded content may want to serve such untrusted content on a separate domain. For example, googleusercontent.com exists for this reason. Similarly, we recently ran into the example of codepen.io and cdpn.io, where cdpn.io appears to depend on 3p cookies when embedded on codepen.io.
We've also heard from others in PrivacyCG that organizations prefer to maintain top-level domain names as an indication of branding/identity. This is motivated by business reasons.

@pbannist - The example that you mentioned, Geico and Dairy Queen, would actually not be a valid set given our current thinking around the FPS policy. Berkshire Hathaway is a holding company, with Geico and DQ being subsidiaries. Regardless, you do bring up challenges around defining the policy in a way that stays true to first principles, but I'm confident that we can work together towards that goal.

Regarding the question of whether ownership/organization is the right principle to design FPS around, there is user research around users' expectations/comfort with being tracked within a first-party. For example, see this paper. Of particular interest are Section 4.2.3, and "Trust" under Section 4.3.2

The UX signal is the only signal that is useful to a user, and that is independent of the presence of a shared organization.

I agree that it's important to surface FPS affiliation information to users, and we are proposing that it be surfaced in the browser UX. Are you suggesting that this is not sufficient?

@jackfrankland

The proposal's main aim is to have some of this information readable by the user agent programmatically, in an effort to reduce the over-prevalence of consent overlays, and to foster better transparency / control of the user's data.

Got it. Would this be similar to the P3P project? If so, it may be instructive to study the criticisms, and address how we can overcome those issues with your proposal.

However, my argument is that the publication of the domain-to-business relationship suits the goals for this proposal nicely, and may be more useful than the publication of a domains-to-domains relationship - especially if the policy for this proposal ends up being that the domains must have matching business ownership according to their privacy policies.

Our specification of FPS as a domain-to-domains relationship is mostly an artifact of needing to find a domain to host the central/unified manifest file on. :) As I mentioned in my previous response, having a single source of truth makes verification and deployment easier. Do you envision a way that we can maintain a central manifest file using a domain-to-business relationship?
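As a rough sketch of how a single source of truth could simplify verification, the following assumes a hypothetical manifest shape with one owner domain and a list of member domains. The field names, the example domains, and the helpers `build_index` and `same_party` are illustrative assumptions, not the format defined by the proposal.

```python
# Hypothetical manifests: each names one owner domain and its member domains.
MANIFESTS = [
    {"owner": "brand.example", "members": ["brand-usercontent.example", "brand.co.example"]},
]


def build_index(manifests):
    """Map every domain to the owner of the set it belongs to."""
    index = {}
    for manifest in manifests:
        owner = manifest["owner"]
        for domain in [owner, *manifest["members"]]:
            # In this sketch a domain may belong to at most one set.
            if domain in index and index[domain] != owner:
                raise ValueError(f"{domain} is claimed by more than one set")
            index[domain] = owner
    return index


def same_party(index, a, b):
    """Two domains are same-party if they are identical or share a set owner."""
    if a == b:
        return True
    owner_a = index.get(a)
    return owner_a is not None and owner_a == index.get(b)


INDEX = build_index(MANIFESTS)
```

With one manifest per set, a membership check reduces to a lookup against the owner; a domain-to-business mapping would need some equivalent way to resolve two domains to the same key.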

annevk commented 3 years ago

@krgovind that very much depends on the browser UI, no? If sites all moved in that direction, browsers could respond by highlighting the registrable domain even more prominently (or only showing that).

krgovind commented 3 years ago

@krgovind that very much depends on the browser UI, no? If sites all moved in that direction, browsers could respond by highlighting the registrable domain even more prominently (or only showing that).

@annevk : I'm not seeing how browsers highlighting the registrable domain would help this situation, because in the Flickr case, the URL bar would have changed from yahoo.com, to verizon.com, to smugmug.com; with the content page itself being the only indication that it is Flickr. Would users notice if it went to somethingelse.com with the phishing page showing a content page identical to Flickr's?

brodrigu commented 3 years ago

Essentially, forcing all sites to move to subdomains of their parent/owner domains would have, in my example, manifested as flickr.com moving to flickr.yahoo.com, then flickr.verizon.com and subsequently to flickr.smugmug.com. This would train users to stop paying attention to the registrable domain, and focus only on the subdomain. Thus, it would make them susceptible to entering their credentials on mybank.evil.com, because mybank is in the subdomain, and lead the user to think "perhaps mybank was recently acquired by evil.com"?

@krgovind The deprecation of 3rd party cookies would be the forcing function which would push sites to consolidate domains to retain some functional benefits they see in a 3rd party cookie world and set up the situation you are solving for above. This drive to consolidate to as few eTLD+1s as possible for functional benefit would not be limited to only 1st parties that are owned by the same organization. You could imagine sites forming a co-op or joining together in a publisher network where you might see two not-co-owned sites nyherald.co-op.com and laregister.co-op.com sharing a registrable domain. As business needs and incentives change, nyherald.co-op.com might move to nyherald.pub-network.com or nyherald.amp.com causing the same user-apathy towards the registrable domain.

Is this something you have considered?

annevk commented 3 years ago

@krgovind I think that's a good illustration as to why they might not want to do that (those domain transitions would also not be cheap I suspect).

krgovind commented 3 years ago

@brodrigu - I think you are arguing for a solution for publisher consortiums/networks. As discussed earlier on this thread, we think that those should be better served by other APIs such as TURTLEDOVE, and are ideally not compelled to join under a single domain. Note that moving registrable domains like you described also has the cost of losing access to your previous state/cookies, so that would need to be weighed against other incentives.

@annevk It sounds like you're taking the position that if a multi-domain site wanted to share data across its domains, the only way it should be allowed to do that is by taking the significant step of consolidating on a single domain? Would that recommendation stand for ccTLD domain variants, as well as for content separated for security reasons (e.g. googleusercontent.com)?

brodrigu commented 3 years ago

@krgovind It's important to note that the publisher consortium use case is more incentivized than co-owned domains to migrate to a shared eTLD+1 and that if the problem first party sets is trying to avoid is user domain apathy, FPS will likely not be successful if the use case isn't addressed.

we think that those should be better served by other APIs such as TURTLEDOVE, and are ideally not compelled to join under a single domain

Certainly there are tradeoffs, but the upside for sharing an eTLD+1 amongst a trusted consortium is higher than currently available alternatives.

First Party Sets is a great proposal, but the rigidity of co-ownership as a requirement for set membership hinders its potential to meet a developing security concern.

Update: moved to issue: https://github.com/privacycg/first-party-sets/issues/17

annevk commented 3 years ago

@krgovind on a set of domains that have a common registrable domain as defined by the URL Standard, yes. It's hard enough to get users to grasp that; conveying through UI that two unrelated domains are in a set would go far beyond that and frankly does not really seem feasible.

johnwilander commented 3 years ago

Since FPS is now a work item, can we continue the conversation in separate issues? Maybe the editors can find the cycles to migrate the subdomains vs registrable domain set discussion into an issue. 🙏🏼

krgovind commented 3 years ago

Since FPS is now a work item, can we continue the conversation in separate issues? Maybe the editors can find the cycles to migrate the subdomains vs registrable domain set discussion into an issue. 🙏🏼

Thanks for the advice, John. I've created privacycg/first-party-sets/issues/19 to capture this discussion.

hober commented 3 years ago

Closing, as this is now a Work Item.
