w3ctag / design-reviews

W3C specs and API reviews
Creative Commons Zero v1.0 Universal

Related Website Sets (formerly First-Party Sets) #342

Closed mikewest closed 7 months ago

mikewest commented 5 years ago

Guten TAG!

I'm requesting a TAG review of:

Further details (optional):

You should also know that y'all are marvelous.

We'd prefer the TAG provide feedback as:

dbaron commented 5 years ago

A few notes from reading through the explainer (which I haven't fully digested yet):

mikewest commented 5 years ago

Thanks for the feedback, @dbaron! It might also be useful to get feedback from @englehardt and @ehsan, with whom I briefly discussed this proposal. I'd love to figure out if we can make this more robust together. :)

the rules that the browser not cache the set if there's any mismatch seems a little problematic if a site wants to evolve the set over time (e.g., add a host to it

First, as is probably clear, the rules are somewhat up in the air. This first pass seems like a reasonable balance between deployment difficulty and stability, but we might well want to revisit some of the restrictions. The incremental verification suggestion you mention is one route that occurred to me, but I'm sure there are others. For example, the X-Bikeshed-This-Origin-Asserts-A-First-Party-Set could have a version number rather than a boolean, bypassing the cache expiration.
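To illustrate the version-number idea above, here is a minimal sketch, assuming the browser keeps one cached record per asserting origin and that the assertion header carries an integer version. The `CachedSet` type and `needs_refetch` helper are hypothetical names for illustration, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CachedSet:
    """Hypothetical cache record for one verified first-party set."""
    version: int
    members: frozenset


def needs_refetch(cached: Optional[CachedSet], asserted_version: int) -> bool:
    """Return True when the set should be fetched and verified again.

    Assumption: a changed version number invalidates the cached set
    immediately, instead of waiting for the cache entry to expire.
    """
    return cached is None or cached.version != asserted_version
```

With a boolean header the browser could only learn about a change once the cache entry expired; with a version, a site that adds a host would bump the number and trigger re-verification on the next response.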

Second, I don't know how much we need this sort of thing to be trivial to change. The history of Mozilla's disconnect-entitylist.json shows that entities do indeed shift over time, but each individual entity seems relatively stable.

Why is a host that is a registerable domain disallowed?

That's a typo. :) Should have read "is itself a public suffix". Fixing it in https://github.com/mikewest/first-party-sets/commit/34adb31b50d0ae8f9bfa77863b5e4a1792c09ce5.

This does appear to pose some risk to users ("How will malicious actors abuse this mechanism?"), but it's not clear to me what the user benefit is over what happens today. It seems like the explainer should be clearer about that.

I'm hopeful that we can create non-proprietary and publicly auditable alternatives to the lists Apple, Google, and Mozilla are independently maintaining for various features. In the best case, something like this feature would give us the ability to offer entity-related features like credential sharing, and reduce the risk of rolling out tighter controls on cross-entity sharing.

The abuse point of hopping between sets seems to be mentioned only in passing, but it seems pretty concerning.

I think the current design fairly substantially mitigates this risk by making deployment bidirectional and atomic (e.g. the pain point you noted at the top). It seems to me that we can mitigate it more by locking sites into a given set for some period of time if we decide that the inherent difficulty is either unacceptable or not enough.

Thanks again!

dbaron commented 5 years ago

I'm hopeful that we can create non-proprietary and publicly auditable alternatives to the lists Apple, Google, and Mozilla are independently maintaining for various features. In the best case, something like this feature would give us the ability to offer entity-related features like credential sharing, and reduce the risk of rolling out tighter controls on cross-entity sharing.

I think it would be useful to say something like that in the explainer.

mikewest commented 5 years ago

@plinss: Skimming the minutes, I don't think y'all got to this in the 05.03 meeting, and it looks like it fell off the radar for the 12.03 meeting. Perhaps there's an upcoming slot it could fit into?

@dbaron: Yes. I need to restructure the explainer a bit to improve the explanation of the problem I'm aiming to solve, as it grew out of a different document with a distinct purpose. I'll certainly take some time to do that (though I don't think it'll create substantive changes, and hopefully won't block y'all taking a closer look).

Thank you both!

lknik commented 5 years ago

Hi Mike!

Hope you missed me. Lovely explainer. May I ask about a few bits below.

Still, it seems likely that folks will want to stretch the bounds of what first-party sets enables over time

Can you please elaborate why it's likely, and which folks specifically do you mean here? Not asking for all their names and addresses, of course.

Tying those two domains together in the same first-party set could increase the risk of credential leakage, if browsers aren't careful about how they expose the credential sharing behavior discussed above

Any other risks that you can imagine (apart from the stuff listed later in the explainer)? Aside from the Ordinary User not knowing about the existence of first/third party stuff, would it make sense to require browser UI changes to indicate that some site is linked with another?

It would be fatal to the design if https://subdomain1.advertiser.example/ could live in one first-party set while https://subdomain2.advertiser.example/ could live in another

That looks unfortunate indeed. Good the explainer is listing plenty of concerns.

Given this reality, we need to add a registrable domain constraint to the design above such that each registrable domain may live in one and only one first-party set.

Would there be a way to deregister from the set, and e.g. change sets in quick time intervals, or something like that? I'm simply wondering if site1 can easily change its membership (rather than: being a member of two separate sets at the same time, which is already marked as a concern). Apart from the natural expiration of 7 days you speak of, unless it could be the same.

We can mitigate this risk to some extent by limiting the maximum number of registrable domains that can live together in a first-party set, rejecting sets that exceed this number

How would the risk after such mitigation compare with today's risk of making the same? Would you imagine it conceivable that advertisers will start serving their stuff from XXXYYYZZZ.ccTLD, and smartly game the number-limited system? (but: "Forget the entity" looks good).

As the declaration is public by nature, the style of abuse noted here will be trivially obvious to observers, which creates exciting opportunities for out-of-band intervention

Sounds like an opportunity for a new batch of research papers? I'm sure many will be happy ;-)

torgo commented 5 years ago

@mikewest we discussed at today's call having a focused discussion on this one at our next f2f - week of 20th of May. Is that going to be too late to be useful for you? If not, would you like to dial in for that (we will be in ~UTC).

mikewest commented 5 years ago

Thanks, @torgo!

we discussed at today's call having a focused discussion on this one at our next f2f - week of 20th of May. Is that going to be too late to be useful for you?

Sure! Chrome will likely have begun implementation by then, but y'all's feedback would be quite welcome as we work through the initial stages.

If not, would you like to dial in for that (we will be in ~UTC).

I can probably make time to chat with y'all; that week looks pretty open. Let me know when you're closer to scheduling something?

mikewest commented 5 years ago

Thanks, @lknik! Sorry I missed your feedback when you first provided it.

Still, it seems likely that folks will want to stretch the bounds of what first-party sets enables over time

Can you please elaborate why it's likely, and which folks specifically do you mean here? Not asking for all their names and addresses, of course.

The example I linked in the document (https://lists.w3.org/Archives/Public/public-webappsec/2017Mar/0034.html) came to mind as an existence proof of folks with interesting ideas about loosening the same-origin policy based on affiliation.

Any other risks that you can imagine (apart from the stuff listed later in the explainer)? Aside from the Ordinary User not knowing about the existence of first/third party stuff, would it make sense to require browser UI changes to indicate that some site is linked with another?

The document lists the risks I've thought about. If I come up with more, I'll add them. :)

I don't personally think there's any value in exposing the relationship between A and B to users directly via browser UI, but I'm not at all a UI guy. I'd expect folks like @estark37 to have strong, well-informed opinions on these topics, and I'd defer to them completely. That said, however Chrome comes down on that question, I don't think it makes sense to specify UI in this kind of document.

Would there be a way to deregister from the set, and e.g. change sets in quick time intervals, or something like that? I'm simply wondering if site1 can easily change its membership (rather than: being a member of two separate sets at the same time, which is already marked as a concern). Apart from the natural expiration of 7 days you speak of, unless it could be the same.

I don't think it would be helpful to create a way to deregister oneself in an accelerated fashion without also taking some catastrophic action against the data that's been built up given the existing first-party relationships. I could imagine the Clear-Site-Data: * mechanism being draconian enough to enable this, for example.

How would the risk after such mitigation compare with today's risk of making the same? Would you imagine it conceivable that advertisers will start serving their stuff from XXXYYYZZZ.ccTLD, and smartly game the number-limited system? (but: "Forget the entity" looks good).

How would this "game the system"? The risk mitigated by limiting the size of a set is the incentive that would otherwise exist to create a single global set of all of an advertiser's otherwise unrelated publishers (e.g. doubleclick.net + cnn.com + sz.de + vox.net + ∞). Allowing an advertiser (or anyone else) to bind their matching ccTLDs together seems different in kind from that scenario.
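The two constraints discussed here (a cap on set size, and each registrable domain living in one and only one set) can be sketched as a validation pass. This is a minimal sketch under stated assumptions: `MAX_SET_SIZE` is a made-up number, the input is assumed to already be keyed by registrable domain, and "first declaration wins" is only one possible way to resolve a conflict.

```python
MAX_SET_SIZE = 20  # hypothetical cap; the thread does not give a number


def validate_sets(declared: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return the declared sets a browser might accept, keyed by owner.

    Assumptions for illustration only:
    - keys and members are already registrable domains;
    - a set larger than MAX_SET_SIZE is rejected outright;
    - a set that reuses a domain from an earlier accepted set is rejected.
    """
    accepted: dict[str, set[str]] = {}
    claimed: set[str] = set()  # domains already placed in an accepted set
    for owner, members in declared.items():
        domains = {owner} | members
        if len(domains) > MAX_SET_SIZE:
            continue  # oversized set: rejected as a whole
        if domains & claimed:
            continue  # some domain would live in two sets
        claimed |= domains
        accepted[owner] = domains
    return accepted
```

Under these rules a "single global set" of unrelated publishers fails the size check, while a handful of matching ccTLD domains passes.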

Sounds like an opportunity for a new batch of research papers? I'm sure many will be happy ;-)

I agree! Mechanisms that encourage transparency are good.

mikewest commented 5 years ago

Regarding use cases, I'd like to draw your attention to https://mikewest.github.io/cookie-samesite-firstparty/draft-west-cookie-samesite-firstparty.html (http://tools.ietf.org/html/draft-west-cookie-samesite-firstparty if you prefer "paginated" text), which builds upon the primitive described here in a way that might allow us to avoid some developer pain points while tightening cookie controls over time.

lknik commented 5 years ago

@mikewest Thanks for the answer (We're discussing at f2f Reykjavik).

dbaron commented 5 years ago

Given that the repo is now at https://github.com/krgovind/first-party-sets, feels like ccing @krgovind might be useful.

torgo commented 5 years ago

@mikewest just picking this up again, I think we are stalled and this topic has gone into our "abyss." Can you let us know the status and (most usefully) if there are specific questions where the TAG might weigh in. Does it make sense to discuss this at TPAC?

annevk commented 5 years ago

One worry I have after hearing folks talk about this at TPAC is that this becomes as attractive as the PSL and will be used for all the wrong things. In particular there were a number of suggestions this would allow us to ease certain origin restrictions. I know that's not the goal, but once there's architecture in place it'd be annoying to have to have that discussion again and again and again.

(No great ideas other than adding a bright red warning section early on in the document.)

mikewest commented 5 years ago

I would like for it to replace the PSL (through a hand-wavey mechanism in which we fix https://github.com/sleevi/psl-problems/ by murdering cookies, locking everything to origins, and relaxing the PSL-related bits via FPS rather than PSL).

mikewest commented 5 years ago

(Some TPAC discussion in https://github.com/w3c/webappsec/blob/master/meetings/2019/2019-09-TPAC-minutes.md#origins-and-sites-and-entities).

lknik commented 4 years ago

@mikewest so you'll enable FPS at the same time when phasing out cookies and PSL? Otherwise, what would be the estimated duration of all these mechanisms working at the same time, and potentially a risk that we'll end up with FPS, PSL and cookies forever?

hober commented 4 years ago

Hi,

@dbaron, @plinss, and I took another look at this at our Cupertino F2F.

I seem to recall @johnwilander chose not to pursue Affiliated Domains, his earlier, First Party Set-like proposal, because after working on it for a while he concluded that it was a bad idea for the web. I'll ask him to distill those thoughts into digestible feedback that is relevant to FPS.

johnwilander commented 4 years ago

Hi,

@dbaron, @plinss, and I took another look at this at our Cupertino F2F.

I seem to recall @johnwilander chose not to pursue Affiliated Domains, his earlier, First Party Set-like proposal, because after working on it for a while he concluded that it was a bad idea for the web. I'll ask him to distill those thoughts into digestible feedback that is relevant to FPS.

I already have: https://github.com/krgovind/first-party-sets/issues/6

Still waiting for a response to my latest questions and concerns.

krgovind commented 4 years ago

I already have: krgovind/first-party-sets#6

Still waiting for a response to my latest questions and concerns.

Sorry for the delay in responding to your comments on the repo, @johnwilander! November was busy with conferences and a vacation. I'll respond within the next few days.

hober commented 4 years ago

Hi,

@plinss and I took a look at this today in our Wellington F2F. The explainer appears to identify only one use case:

[W]eb platform features can use first-party sets to determine whether embedded content may or may not access its own state[…] It may be reasonable to allow a https://b.example iframe within https://a.example to access the https://b.example databases.

Isn't this solved by the Storage Access API? If it is, we can solve this use case with that API while avoiding the attractive nuisance concern with FPS that @annevk raised. Shouldn't we do that instead?

davidben commented 4 years ago

Thanks for the comments, everyone!

@annevk We certainly need to be careful when introducing a non-origin boundary, but, FPS, eTLD+1, or something else, I think it’s sadly necessary.

The same-origin policy roughly ensures two origins will not interact unless they want to. https://a.example's data is safe from https://evil.example. But if https://a.example and https://b.example want to share information, there are many opt-in cross-origin channels.

Those channels allow cross-site tracking. Mitigating this means limiting communications between two "sites" (however we define them) even when they want to communicate. This stricter isolation needs a coarser boundary. Consider subdomains like https://accounts.google.com and https://calendar.google.com. Separate origins isolate bugs, but the origins still interact, just as browsers use multiple processes but have IPC. If tracking mitigations worked on the origin boundary, those pages would need to be https://google.com/accounts and https://google.com/calendar to still work. User activity is as linked as before, and we've lost privilege separation.

Thus we need some larger boundary: a collection of origins treated as one "site" for anti-tracking purposes, whether eTLD+1 or first-party sets. Like you say, this comes with needing clear guidance on which to use, probably based on the above distinction.
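The three boundaries discussed above (origin, eTLD+1 site, first-party set) can be compared side by side. This is only a sketch: `naive_site` takes the last two host labels instead of consulting the Public Suffix List, so it is wrong for suffixes such as `co.uk`, and the `set_owner` mapping is a hypothetical stand-in for whatever the browser would actually store.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """(scheme, host, port) tuple used for same-origin comparison."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme))


def naive_site(url: str) -> str:
    """Last two host labels. NOT a real eTLD+1: a real implementation
    needs the Public Suffix List, and arguably the scheme as well."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def same_party(url_a: str, url_b: str, set_owner: dict[str, str]) -> bool:
    """True when both sites map to the same set owner.

    set_owner maps a member site to its owner; a site that is absent
    from the mapping is treated as a singleton set owned by itself.
    """
    site_a, site_b = naive_site(url_a), naive_site(url_b)
    return set_owner.get(site_a, site_a) == set_owner.get(site_b, site_b)
```

For example, `https://accounts.google.com` and `https://calendar.google.com` are different origins but the same site, while `https://a.example` and `https://b.example` are different sites that become the same party only if a set links them.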

@hober and @plinss We probably need to do a better job describing the use cases. Let me try clarifying things here and we'll see about updating the explainer. I don't think they'd all be covered by the Storage Access API.

One could imagine recasting the browser's knowledge of related domains into, say, a Storage Access API prompt suppression, but we don't think that would meet the compatibility or privacy needs here. Additionally, if this relatedness is to involve any site opt-in (in addition to the UA policy, of course), we need a standard way to manage that opt-in.

First, the Storage Access API assumes a particular flow (gesture into a 3p iframe), which makes sense for truly 3p scenarios. Multi-domain sites may need to interact more tightly. For instance, seamless single-sign-on across multiple first-party domains kicks in once the page loads.

Second, the platform should be aware of these sets. A pair of sites that always get storage access (prompt suppression or users just granting access on name recognition) are really one site w.r.t. linkability. Privacy-related platform limits must then cover the entire set, or each domain in the set will inflate the limit. Examples where this may make sense: Privacy Budget, Trust Tokens limits, Conversion Measurement limits, or limits on 3p isLoggedIn queries to avoid fingerprinting.
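A small sketch of why limits need to be scoped to the set: if the counter is keyed per domain, each extra domain in a set multiplies the allowance. The `SetScopedLimit` class and its `owner_of` mapping are hypothetical and do not model any of the specific proposals named above.

```python
from collections import Counter


class SetScopedLimit:
    """Per-set allowance: every member draws from its owner's counter."""

    def __init__(self, limit: int, owner_of: dict[str, str]):
        self.limit = limit
        self.owner_of = owner_of  # member site -> set owner (assumed input)
        self.used = Counter()

    def try_consume(self, site: str) -> bool:
        """Spend one unit for `site`; False once its whole set is exhausted."""
        key = self.owner_of.get(site, site)  # unlisted sites stand alone
        if self.used[key] >= self.limit:
            return False
        self.used[key] += 1
        return True
```

With a limit of 2 and `b.example` owned by `a.example`, one call for each of the two domains uses up the shared allowance, so a third call for either is refused.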

We're also envisioning this feeding into other tracking mitigations that wouldn't fit the Storage Access API. Navigations within a first-party set could be exempt from potential mitigations for navigational tracking (link decoration, referrers, POSTs, etc.), which would reduce unnecessary compatibility impact.

This browser awareness can also translate into UI: clearing site data could offer to clear state across the entire set if the user wants to reset first-party linkability, or the browser could display the owning origin to the user somewhere.

Finally, some of these uses (limits, UI integration) are not just exception grants, so sites won’t want to be associated with unrelated domains. Even if the UA's list approves, it may be incorrect or out-of-date. We can fix this by requiring site opt-in in addition to UA policy, but that needs a standard mechanism. First-party sets provides that mechanism, as well as a story for handling changes (key state on owner domain and clear when it changes).
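The "key state on owner domain and clear when it changes" story could look roughly like the following. This is a sketch under assumptions: `PartyState` is a hypothetical structure, and clearing only the moving site's state is one interpretation of the sentence above, not a description of any implementation.

```python
class PartyState:
    """Tracks each site's set owner and drops state when the owner changes."""

    def __init__(self):
        self.owner_of: dict[str, str] = {}  # site -> owner last observed
        self.storage: dict[str, dict] = {}  # site -> stored state

    def update_owner(self, site: str, new_owner: str) -> bool:
        """Record the site's owner; return True if state was cleared."""
        old_owner = self.owner_of.get(site, site)  # default: owns itself
        self.owner_of[site] = new_owner
        if old_owner != new_owner:
            # Membership changed, so state built under the old set is dropped.
            self.storage.pop(site, None)
            return True
        return False
```

The point of the sketch is that hopping between sets is not free: a site that changes owner starts over with empty state.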

annevk commented 4 years ago

@davidben well, we already have registrable domains. And at least with those it's somewhat clear to the user they all belong to the same entity.

davidben commented 4 years ago

We do, though we really ought to fix all those to be scheme + registrable domain. That one has the same issues to resolve around when to use it over origins.

There, the problem we're addressing is that the web grew up without these restrictions and sites are often spread across multiple domains. Two browsers have already found they need this: Firefox uses a hardcoded list of related domains to extend first-party-ness. Edge's blog mentions doing something similar. Moreover, these entity lists are paired with a blocklist anti-tracking strategy, rather than platform-wide changes. That means they only need to cover the subset of multi-domain sites also on the blocklist. The true set is likely much larger. (Anecdotally, I've seen bank sites bounce between domains like bank.example and bank2.example, likely because each component is hosted by a different provider.)

You're right that all this ultimately should tie back to who the user thinks they're interacting with. First-party sets, unlike the lists above or some kind of Storage Access API policy tweak, tries to identify each set with an owning origin, so there's room to explore surfacing that information. But folks like @estark37 have done far more research into this sort of thing than me, so I'll defer to her expertise there.

annevk commented 4 years ago

There's a big difference between curated first-party sets and self-declared first-party sets though, especially for Firefox's use case.

(As for scheme + registrable domain or opaque origin aka site, we're getting there standards-wise. https://github.com/whatwg/html/pull/5354 might be of interest.)

davidben commented 4 years ago

FPS isn't self-declared either. It's an intersection of the self-declared set with UA policy, i.e. curation. (Have you seen the revised explainer? We've recently reworked it to make that combination a bit clearer and give the browser better hooks for this.)
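The "intersection" described here can be written down directly. This is a minimal sketch, assuming three inputs with hypothetical shapes: the owner's declared member list, each member's own opt-in naming its owner, and the domains the UA policy has approved for that owner.

```python
def effective_members(
    owner: str,
    declared: set[str],
    opted_in: dict[str, str],
    ua_approved: set[str],
) -> set[str]:
    """Members a browser would honor for `owner`.

    A domain counts only if all three hold:
    - the owner declares it,
    - the domain itself opts in by naming the same owner,
    - UA policy (curation) approves it for this set.
    """
    return {
        domain
        for domain in declared
        if opted_in.get(domain) == owner and domain in ua_approved
    }
```

A domain that the owner lists but that has not opted in, or that the UA policy has not approved, simply drops out of the effective set.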

(As for scheme + registrable domain or opaque origin aka site, we're getting there standards-wise. whatwg/html#5354 might be of interest.)

(Yup. I've been pushing on getting corresponding changes elsewhere in the stack like fixing SameSite and finally fixing the scope.)

torgo commented 4 years ago

There have been some issues raised about this in this thread and elsewhere. There are good ideas in this proposal. As the TAG we don't feel we have something to add to the debate at this point. If there is a new major development, we would be happy to review at that time. We find it slightly concerning that this review has been on our plate for over a year and this proposal still isn't in a community group or other standards body.

jwrosewell commented 4 years ago

The discussion in the W3C Improving Web Advertising Business Group this week, specifically in relation to First Party sets, once again raises the issue of governance of the various proposals that have been put forth. Like Turtledove/Sparrow, the proposals around first party sets imply (in fact, they require) a governance structure. Specifically, the group discussed that in some cases independent domains should be allowed to federate browser data, while in other cases this would not be allowed. This means a decisioning structure needs to be put in place to provide basic rules for what federation(s) would be allowed, and to potentially adjudicate requests and violations.

This same requirement is central to the debate over Turtledove and Sparrow, where the main discussion is around what entities have access to end user content consumption data and are responsible for creating the cohorts and populating the reporting structures.

In both cases, it seems implied that the only “governance” is the browsers themselves, and that this governance will be opaque (not necessarily published, without clearly visible procedures).

This proposal needs an explicit understanding of what governance structures are being proposed. There need to be success criteria for the application of these policies. These criteria should benefit all stakeholders including browser vendors who would avoid any appearance of collusion that could otherwise be viewed as stifling competition. The W3C Improving Web Advertising Business Group has developed a draft of such success criteria.

We recognize there are important questions to address in finalizing these success criteria to evaluate first-party sets and other similar proposals aimed at improving web advertising. A non-exhaustive list below highlights some of these issues that deserve greater attention:

Also posted on explainer

Also posted on discourse.

pbannist commented 4 years ago

Connected to governance, there are also issues of bias that need to be looked into around this proposal, as documented by my issue on the explainer and follow-up on discourse. These biases could be addressed by being more permissive and finding another solution to governance of FPS, or could be more restrictive (proposed by another commenter on discourse) and really have very limited usage so users are not concerned.

Per @jwrosewell's comment above, it's important to note that privacy, while important, is not the singular factor that should be used for making all decisions. The draft success criteria lay out the other stakeholders and scenarios that need to be taken into account around decisions that affect the viability of a thriving and diverse open web.

hadleybeeman commented 3 years ago

This issue came up in the context of our review of the SameParty cookie attribute proposal. (Discussion from our TAG breakout session.)

We are finding that this proposal for first-party sets prompts more discussion in the context of cookies than it did on its own.

So we are reopening this issue to continue that discussion.

torgo commented 3 years ago

Hi! One concern I have is the potential for first party sets to expand the definition of what consists of a “first party” and a “third party” while at the same time web users are becoming ever more aware of their privacy and web browsers are responding by adding privacy features (which in some cases depend on that definition).

In the PrivacyCG call last week (https://github.com/privacycg/meetings/blob/main/2021/telcons/03-11-minutes.md), Kaustubha stated that one mitigation against misuse of FPS would be to “require all the domains in the set are owned by the same organization.” I'd like to drill down on that. First of all, who is requiring that? Would it be up to the browser maker to do so? In which case, does this mean there would be specific allow-lists of first party sets (the “UA policy”)? It's asserted that FPS is better than browsers that ship with “an entity list that defines lists of domains belonging to the same organization” because it allows these organisations to declare their own list of domains. However, isn't a UA policy just another list of allowable domains? Secondly, what counts as an "organization" in this instance? Amazon.co.uk and Amazon.com, for example, are two distinct organisations in two different privacy-regulatory regions. So in that sense treating them both in the same first party may be counter to relevant data protection laws?

dmarti commented 3 years ago

In many cases, two domains may be owned by the same corporate entity, but branded in a sufficiently different way that the web user is not aware that they are part of the same "set." Some high-profile examples are

Common domain ownership as a standard is likely to produce surprising results in the handling of individuals' sensitive data. (The same user might shop on one LVMH domain for gifts for their spouse, and from another domain for gifts for a co-worker.) Existing browser entity sets are inconsistent in their treatment of commonly owned domains, and there is no recognized standard for when the user-visible terms and UX are adequate for considering domains as part of the same set.

It would be more appropriate to look at common privacy policy and user-visible site design and branding to determine if domains could be treated as part of a set by the browser: Some possible criteria: https://github.com/privacycg/first-party-sets/issues/14#issuecomment-797191058

krgovind commented 3 years ago

Hi! One concern I have is the potential for first party sets to expand the definition of what consists of a “first party” and a “third party” while at the same time web users are becoming ever more aware of their privacy and web browsers are responding by adding privacy features (which in some cases depend on that definition).

@torgo When you say "expand the definition", I think you are referring to the fact that today "first-party" is essentially defined as "same-domain". Unfortunately, domain names are an artifact of the DNS, the primary purpose of which is to map human-readable names to IP addresses. The premise of First-Party Sets (FPS) is that in today's highly composable web - (a) sites are deployed over multiple domain names (sometimes for reasons such as security, or localization), and (b) domain names typically serve as brand indicators for the web - therefore, treating "same-domain" as "first-party" is too limiting and antiquated. We are indeed seeking to define a privacy boundary for the web that is more realistic than the domain name; but we want to make sure that we are drawing the boundary correctly, and surfacing the information to users appropriately via UA Policy and UI affordances.

First of all, who is requiring that? Would it be up to the browser maker to do so?

Yes, browser makers should require that any first-party sets accepted by the browser have been previously approved per the "UA policy". Ideally, this verification process is conducted by an independent entity. The WebPKI / TLS certificate issuance process serves as precedent here.

In which case, does this mean there would be specific allow-lists of first party sets (the “UA policy”)? It's asserted that FPS is better than browsers that ship with “an entity list that defines lists of domains belonging to the same organization” because it allows these organisations to declare their own list of domains. However, isn't a UA policy just another list of allowable domains?

Yes, the UA policy essentially will result in an allowlist of FPS assertions. The reasons I think this is better than the entities lists that some browsers currently ship:

Secondly, what counts as an "organization" in this instance? Amazon.co.uk and Amazon.com, for example, are two distinct organisations in two different privacy-regulatory regions. So in that sense treating them both in the same first party may be counter to relevant data protection laws?

Sorry, I'm not sure if there are two questions here. IIUC, the question is primarily about FPS' application to privacy regulations. Note that being part of the same First-Party Set does not preclude organizations from conforming to privacy regulations. FPS only defines "first-party" from the browser's perspective; but organizations still have to do their due diligence and decide whether the domains really should be part of the same FPS, and conform to regulations. (Just as they have to do today on browsers where third-party cookies are available and cross-domain sharing is possible).

torgo commented 3 years ago

Thanks for the reply, @krgovind. We again discussed in today's TAG call and one thing that came up is the transparency of registration of these sets when it comes to these allow-lists... how would an org register these? Is there scope for a standardised approach to the registration / vetting / approval process? Secondly, we discussed permissions prompts - would allowing camera access for one site in a first party set (like instagram) then allow it for other sites in the set (such as whatsapp)?

krgovind commented 3 years ago

how would an org register these? Is there scope for a standardised approach to the registration / vetting / approval process?

The explainer currently only speaks to the technical aspect of the proposal; but yes, we are absolutely interested in working with the ecosystem on a standardized approach to the policy enforcement.

FYI: Chrome is currently running an Origin Trial and has a temporary informal process described here (note: the experiment does not have any privacy implications at this point, because SameParty cookies don't bypass the "Block third-party cookies" user control at this point).

Secondly, we discussed permissions prompts - would allowing camera access for one site in a first party set (like instagram) then allow it for other sites in the set (such as whatsapp)?

No, we are not proposing to change the scope for permissions. The current scope for FPS is only to be treated as a privacy boundary where browsers impose cross-site tracking limitations (such as third-party cookie blocking).

chrisn commented 3 years ago

@torgo said: how would an org register these? Is there scope for a standardised approach to the registration / vetting / approval process?

@krgovind said: The explainer currently only speaks to the technical aspect of the proposal; but yes, we are absolutely interested in working with the ecosystem on a standardized approach to the policy enforcement.

A common approach and governance model for FPS registration is something that we also would want to see, with clear definition of membership rules.

One concern I have is that FPS seems to allow flexibility for browsers to implement their own UA policy. From the explainer:

The browser will consider domains to be members of a set if the domains opt in and the set meets UA policy

Browsers implementing First-Party Sets will specify UA policy for which domains may be in the same set. While not required, it is desirable to have some consistency across UA policies.

Inconsistency is arguably already present on the web today, given each browser’s third-party cookie blocking policy, but the lack of a requirement for consistency across browsers leads to uncertainty around whether a site's declared membership set would be honoured by all browsers. If not, it will be hard to build site functionality that reliably depends on FPS.

krgovind commented 3 years ago

@chrisn : At the time that we wrote the explainer, we didn't want to presume that browsers would be willing to converge on UA policy. We can adjust the language in the explainer as we start to take meaningful steps towards a more standard process.

johnwilander commented 3 years ago

John from Apple WebKit here. I don't think there's any consensus on how FPS should be used by browsers. We have not expressed any interest in relaxing our default cookie blocking based on FPS, regardless of SameParty attributes. I think it's important to decouple FPS as a piece of knowledge browsers can base decisions and policies on and what those decisions and policies are.

krgovind commented 3 years ago

John from Apple WebKit here. I don't think there's any consensus on how FPS should be used by browsers. We have not expressed any interest in relaxing our default cookie blocking based on FPS, regardless of SameParty attributes. I think it's important to decouple FPS as a piece of knowledge browsers can base decisions and policies on and what those decisions and policies are.

@johnwilander Thanks for chiming in! I think some aspects of FPS, especially questions around standard UA policy, recommendations around usage by developers for platform predictability, etc. may be well-served if we can lay out specifics/principles on how FPS may be used in WebKit. Would you be willing to share your thinking on what kinds of decisions/policies you would base on FPS?

wseltzer commented 3 years ago

Linking discussion from the TAG minutes of 29 March

torgo commented 3 years ago

Hi @krgovind - We have just finalized a feedback doc on the proposal which can serve as the basis for our special session tomorrow. Thanks for bearing with us.

erik-anderson commented 3 years ago

I want to explicitly comment that the Microsoft Edge team is generally supportive of this proposal. I'm concerned with some of the framing and conclusions of the feedback doc that was shared. At the same time, some of the concerns in the document are shared by us and highlight that there's more work to be done.

My highest-level concern is that the feedback doc talks about the fundamental importance of origins but either doesn't consider or dismisses the likely alternate outcome that will happen without such a proposal: the consolidation of sites onto shared origins, which will carry both security and privacy implications.

The security implication of entities merging sites currently on different origins to be hosted on a shared one is reduced isolation between the sites, with a broader set of developers deploying code on the shared origin. What may have been a security issue on one site might now impact all of them. This could lead to a dramatic increase in attackable surface area. As the news has shown time and time again, security issues often have privacy impacts.

From a pure privacy angle, when running code in iframes from other entities (e.g. an analytics service), in the "put all of my sites on one origin" approach, the iframe would not have partitioned storage across those sites. With the current First-Party Sets proposal, those unrelated origins being iframed would still have partitioned storage (albeit with the ability for the hosting page to choose to provide a shared identifier) which is still a stronger default protection than today's 3p cookies.

The framing around the "redefines 3rd-party cookies" section potentially hints that 3p cookies are substantially blocked today; I don't know if that was the intent. Safari has blocked 3p cookies for some time, but also offers the Storage Access API to allow access that's shared across all sites. Firefox and Edge have a list-based approach to identify trackers and block 3p cookies for those alone, while also providing the Storage Access API to address instances where user-perceivable brokenness is present; Firefox is also working on fuller "State Partitioning" but currently has compat affordances for SSO. Chrome doesn't currently block 3p cookies by default except in Incognito mode which brings compat issues. This proposal introduces a middle point where 3p cookies can be broadly blocked by default while allowing, in a much more limited way, some current scenarios to function as they do today. The proposal also doesn't preclude a browser from continuing to offer controls for users that want a more aggressive blocking configuration, even for First-Party Sets-related cookies.

The question of whether this proposal puts users first also reaches a different conclusion than I do (I think it does prioritize users first). While this isn't the forum for feedback on the Storage Access API, it's important to consider the current solution space to evaluate the likely net impact on users. The Storage Access API has challenges in terms of how to adequately inform the user of both the reason for a prompt and the impact of approving it and also will potentially lead to prompt fatigue. If we hope to bring storage partitioning to the broader web, having more scoped primitives like First-Party Sets increases the odds that more browser vendors will be willing to ship with 3p cookies partitioned by default since the general user experience would meet their goals for user acceptance.

An area of the feedback that I do agree with is the concern around governance. This could generate significant interop challenges if entities had to choose to register with different browser vendors separately; many might choose to only focus on the largest market share browsers. At the same time, we already have multiple browsers shipping list-based approaches because it's been the only viable path thus far for an acceptable user experience. The TAG feedback argues that we shouldn't attempt to uplift anything that looks like a hand-approved list into a standard, which is a great goal but which current implementation choices show is hard to avoid. My understanding is that the approval process for creating a set was added to the proposal in part because of concerns from other implementers that, even if the list size is kept small and sites are allowed to join one-and-only-one set that it would be insufficient due to abuses where sites owned by different entities collude to join the same set.

A non-comprehensive list of areas I'd like to explore to mitigate the potential impact (which are not mutually exclusive) of the governance concern: make the max size of lists small enough to not need any approval (may not be practical due to the past concern about a lack of objective, user-intuitive criteria for when sites can join the same set); an independent entity to approve and/or revoke the ability to use a set, using a common set of criteria that multiple implementers agree to (a bit like CAs and web PKI, which carries its own set of challenges, though perhaps smaller in scope here); or "GREASE"ing of when First-Party Sets are used (e.g. disabling them some small percentage of the time and/or revoking the right to use them at all if the site doesn't function without them) to help sites prove/validate that they will function adequately for browsers and/or users who configure their browsers to limit or disallow the use of First-Party Sets.

Thanks for the healthy discussion!

torgo commented 3 years ago

Thanks @erik-anderson for this really constructive comment! Regarding governance, one of the topics we discussed in a special TAG session on Monday (raw minutes here) is what the governance is for. I think we still need to drill down on this. If the governance is to make sure that FPS members are part of the same organization then what is the definition of organization and how does that fit together with legal and regulatory? For example, we discussed how under some definitions Facebook and WhatsApp might be the same organization - and just yesterday there was some timely press coverage demonstrating how that assumption breaks down when you consider regulatory and legal requirements. So I think the proposal needs to be very clear about the requirements when it comes to governance - what is governance of first party sets trying to achieve? I would like to hear more about the existing allow lists that have been discussed - e.g. Disconnect, Firefox, Safari. How big are they? How are they managed?

pes10k commented 3 years ago

Thanks for the comments and discussion in the thread! I just wanted to share my point of view (not necessarily Brave in general at this point) on first-party sets, and why we don't think they'd be a good addition to the web platform.

  1. Having related groups move their properties and applications to a single origin / eTLD+1 is a feature, not a bug. The origin is one of the few security and privacy boundaries we hope (even if imperfectly) users understand, and it's emphasized in many browsers' UIs accordingly. If nothing else, the platform emphasizes that different origins deserve different levels of trust and caution. First-party sets, and expecting users to understand that different origins now have different amounts of "differentness" will make even these kinds of determinations even more difficult, and likely impossible to all but expert users.

  2. If the concern is that clustering multiple "properties" onto a single origin will cause security problems (a problem I totally understand and buy and appreciate), let's try to fix that problem (possibly by allowing a single site to declare sub-site/origin security / isolation boundaries), instead of trading privacy for security. For example, I know https://w3c.github.io/webappsec-suborigins/ didn't get broad support last time it was floated, but that seems like a place to build from.

  3. A concern I have about first-party sets that I haven't seen discussed much yet (though apologies if i've missed it in minutes) is how, in a first-party-sets-world, to inform users about the privacy implications of first-party sets before they visit the site. This seems extremely important (despite the prevalence of dark patterns on the Web, of folks changing URLs out from under users, etc). For example, right now in at least Safari, Brave and Firefox (and more generally browsers that block or partition 3p cookies / storage), I can have an Instagram account, and a Facebook account, and I can log into each as 1p, and know that there is no "web platform supported" (i.e. handwaving away things like fingerprinting, and other known-but-not-solved leaks on the web) way that Facebook and Instagram should be able to link those accounts. That may no longer be the case in a FPS world (if I understand most of the intended use cases for FPS so far).

TL;DR; for point 3, i'd be very interested to know more about how FPS-supporters imagine informing users of FPS, and their implications for tracking before the user commits to visiting a site.

Thanks again all for the discussion!

krgovind commented 3 years ago

Appreciate the discussion and feedback, all!


Responding to @erik-anderson

My understanding is that the approval process for creating a set was added to the proposal in part because of concerns from other implementers that, even if the list size is kept small and sites are allowed to join one-and-only-one set that it would be insufficient due to abuses where sites owned by different entities collude to join the same set.

This is accurate. We updated the proposal to require an approval process in response to feedback from Safari and Firefox engineers. The original proposal allowed sites to assert domain relationships with some technically enforced limits and abuse prevention; accompanied by a blocklist when abuse is detected.
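To make "technically enforced limits" concrete, here is a minimal sketch of the kind of check that could run over submitted sets: a size cap, plus the one-and-only-one-set rule mentioned above. The function name, the data shape (owner domain mapped to member domains), and the cap of 5 are all assumptions for illustration; the proposal discussed in this thread does not specify them.

```python
from collections import Counter

# Hypothetical cap; the thread does not name a specific number.
MAX_SET_SIZE = 5


def validate_sets(sets: dict[str, set[str]], max_size: int = MAX_SET_SIZE) -> list[str]:
    """Return a list of problems found; an empty list means the sets pass.

    `sets` maps an owner domain to the other domains in its set (assumed shape).
    """
    problems = []
    membership = Counter()
    for owner, members in sets.items():
        all_domains = {owner} | members
        # Limit 1: a set may not exceed the size cap.
        if len(all_domains) > max_size:
            problems.append(
                f"set owned by {owner} has {len(all_domains)} domains (max {max_size})"
            )
        membership.update(all_domains)
    # Limit 2: a domain may belong to one and only one set.
    for domain, count in sorted(membership.items()):
        if count > 1:
            problems.append(f"{domain} appears in {count} sets")
    return problems
```

As the comment above notes, checks like these cannot tell whether the domains in a set are actually owned by the same organization, which is why the approval process was added.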

A non-comprehensive list of areas I'd like to explore to mitigate the potential impact (which are not mutually exclusive) of the governance concern: make the max size of lists small enough to not need any approval (may not be practical due to the past concern about a lack of objective, user-intuitive criteria for when sites can join the same set); an independent entity to approve and/or revoke the ability to use a set, using a common set of criteria that multiple implementers agree to (a bit like CAs and web PKI, which carries its own set of challenges, though perhaps smaller in scope here); or "GREASE"ing of when First-Party Sets are used (e.g. disabling them some small percentage of the time and/or revoking the right to use them at all if the site doesn't function without them) to help sites prove/validate that they will function adequately for browsers and/or users who configure their browsers to limit or disallow the use of First-Party Sets.

These are some great specific ideas! Since relying on technical mechanisms alone was previously recommended against, our current preference is to have an independent entity approve/revoke sets. GREASE'ing is an interesting idea, although it sounds like we need to come up with (a) strong alternative solutions to help site authors support the clients with the feature disabled; and (b) build robust detection mechanisms to aid revocation.

The other mechanism I had in mind, also inspired from the Web PKI, is transparency logs for any creation/updation/dissolution of sets for increased accountability and auditability.


Responding to @torgo

Thanks to the TAG for meeting with us! We appreciated the opportunity to provide additional context on the problem space, current handling of tracking protection in other major browsers, clarify the confusion around why our proposal will not interfere with SOP and other security mechanisms, and address other points in your feedback document. We will also address these in the explainer, and look forward to reviewing the edits with you.

If the governance is to make sure that FPS members are part of the same organization then what is the definition of organization and how does that fit together with legal and regulatory? For example, we discussed how under some definitions Facebook and WhatsApp might be the same organization - and just yesterday there was some timely press coverage demonstrating how that assumption breaks down when you consider regulatory and legal requirements. So I think the proposal needs to be very clear about the requirements when it comes to governance - what is governance of first party sets trying to achieve?

I am not a policy/legal expert; but I don't think FPS policy verification precludes site authors from conforming to regulatory and legal requirements. Since the assertion needs to be submitted by site authors, they will still have the agency to not form a FPS at all, or form multiple distinct and disjoint sets.

The primary goal of the FPS policy is to prevent abuse that may be possible by formation of sets with unrelated domains. We chose to use "same organization" because that appears to be the common language in tracking prevention policies published by multiple major browsers (see this section for excerpts). The DNT specification, which was developed within the W3C uses the language "share the same data controller as the referring site". If there is more precise or appropriate language to capture the essence of these existing policies, I'd be grateful for any advice.

I would like to hear more about the existing allow lists that have been discussed - e.g. Disconnect, Firefox, Safari. How big are they? How are they managed?

My understanding is that in their default tracking protection modes, both Firefox and Edge (not Safari) use the Disconnect-dot-me trackers blocklist to selectively block third-party cookies on domains classified as trackers, and then apply an exception to those tracker domains when they appear as subresources on sites owned by the same organization as the tracker domain. The list of these collections of commonly owned domains is called the entities list, which is maintained on Github. I was unable to locate a documented policy for how domains get accepted to the entities list; but it appears to be done on a pretty ad-hoc basis when a compatibility bug is discovered by a browser engineer. I do not believe site authors are involved (unless by happenstance); which I think has the unfortunate consequence of mistakes such as com.com being listed as a CBS property, and yahoo.co.jp being listed as a VerizonMedia property.
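As a rough illustration of the logic described above (a sketch of my understanding, not Firefox's or Edge's actual code), the decision could look like this. The function name, parameter names, and the `.example` domains in the usage note are hypothetical; sites are assumed to already be reduced to registrable domains.

```python
def should_block_third_party_cookies(
    top_level_site: str,
    resource_site: str,
    tracker_blocklist: set[str],
    entities: dict[str, set[str]],
) -> bool:
    """Decide whether cookies for `resource_site` are blocked on `top_level_site`.

    `entities` maps an organization name to the set of sites it owns (assumed shape).
    """
    if resource_site == top_level_site:
        return False  # first-party context, nothing to block
    if resource_site not in tracker_blocklist:
        return False  # list-based approach: only classified trackers are blocked
    for sites in entities.values():
        # Exception: tracker and top-level site belong to the same organization.
        if top_level_site in sites and resource_site in sites:
            return False
    return True
```

For example, with `tracker_blocklist = {"shop-cdn.example"}` and `entities = {"Shop Inc": {"shop.example", "shop-cdn.example"}}`, the CDN's cookies are allowed on `shop.example` but blocked on an unrelated `news.example`.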

First-Party Sets proposes to maintain a single list of related domain sets (similar to the entities list) in place of Disconnect's two lists (currently one blocklist, and one allowlist). Since this proposal would require site authors to submit sets of their domains, and have a published policy in concert with other enforcement mechanisms; we hope that it will be a much more rigorous approach to the issue of supporting multi-domain sites (which are exceedingly common on the modern internet) while also bringing meaningful privacy improvements to the web.

krgovind commented 3 years ago

Thanks for sharing your feedback, @pes10k!

First-party sets, and expecting users to understand that different origins now have different amounts of "differentness" will make even these kinds of determinations even more difficult, and likely impossible to all but expert users.

Our intention is not to introduce multiple new levels of "different-ness"; but we hope that the users only have to understand two types of boundaries - a security boundary (origin), and a privacy boundary (First-Party Sets, where a singleton FPS is equivalent to registrable domain).
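A minimal sketch of the "privacy boundary" idea, assuming sets are represented as an owner domain mapped to its member domains and that inputs are already registrable domains. The function names and data shape are hypothetical and not taken from the explainer. The security boundary (origin) is untouched by this lookup.

```python
def privacy_boundary(site: str, sets: dict[str, set[str]]) -> str:
    """Return a key identifying the privacy boundary that `site` falls in."""
    for owner, members in sets.items():
        if site == owner or site in members:
            return owner  # every member of a set shares the owner's key
    # No declared set: a singleton set, equivalent to the registrable domain.
    return site


def same_privacy_boundary(a: str, b: str, sets: dict[str, set[str]]) -> bool:
    """True when two registrable domains fall inside the same privacy boundary."""
    return privacy_boundary(a, sets) == privacy_boundary(b, sets)
```

With `sets = {"brand.example": {"brand-shop.example"}}`, the two brand domains share a boundary, while any domain not listed in a set is its own boundary.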

I don't think it is correct to say that the origin serves as a privacy boundary on the web today, because AFAIK even where third-party cookies are currently blocked, third-party is equivalent to cross-domain, not cross-origin.

Regarding user understanding, I would venture to say that FPS offers the opportunity to highlight the notion of this privacy boundary in the browser UI, in addition to continuing to highlight the origin/domain as the security boundary.

I wonder if forcing multiple distinct sites onto a common registrable domain could also potentially have the opposite effect of what we want, by training users to pay more heed to the subdomain (because it directly indicates the brand/app name) than the domain, and unwittingly teach them to make security decisions solely based on subdomains.

  1. If the concern is that clustering multiple "properties" onto a single origin will cause security problems (a problem I totally understand and buy and appreciate), let's try to fix that problem (possibly by allowing a single site to declare sub-site/origin security / isolation boundaries), instead of trading privacy for security. For example, I know https://w3c.github.io/webappsec-suborigins/ didn't get broad support last time it was floated, but that seems like a place to build from.

Note that even if we do manage to corral together every platform feature that treats the site/domain as a security boundary, build new features, and resolve the issues over multiple years:

For example, right now in at least Safari, Brave and Firefox (and more generally browsers that block or partition 3p cookies / storage), I can have an Instagram account, and a Facebook account, and I can log into each as 1p, and know that there is no "web platform supported" (i.e. handwaving away things like fingerprinting, and other known-but-not-solved leaks on the web) way that Facebook and Instagram should be able to link those accounts.

This is certainly not true for Firefox's Default ETP mode, because the Disconnect entities list groups facebook.com and instagram.com as one entity (reference).

TL;DR; for point 3, i'd be very interested to know more about how FPS-supporters imagine informing users of FPS, and their implications for tracking before the user commits to visiting a site.

It seems like this problem would exist even in a world without FPS, no? Since your recommendation was that Facebook should have to redirect instagram.com to instagram.facebook.com; it seems like browsers would then have to inform users of the redirect before committing that navigation?

pes10k commented 3 years ago

I don't think it is correct to say that the origin serves as a privacy boundary on the web today, because AFAIK even where third-party cookies are currently blocked, third-party is equivalent to cross-domain, not cross-origin.

and

This is certainly not true for Firefox's Default ETP mode, because the Disconnect entities list groups facebook.com and instagram.com as one entity (reference).

Good point and thank you for the correction. I only mean that the "site" is usually the privacy boundary.

The Moz folks should correct me if I'm wrong, but current Firefox partitions all cookies by default, not just sites labeled by disconnect (with some exceptions for compat / existing SSO flows). But, in general, the site is a privacy boundary in at least Firefox, Safari and Brave (though with differences between them on how to handle cases that expect privacy harming, cross site data flows currently).

But, I don't think we should take those exceptions as signifying that cross eTLD+1 flows are some necessary property of the web. We should take such exceptions (either centrally curated a la disconnect, or user curated a la Storage Access API) as what they are, an artifact of the Web having serious privacy problems today, and issues we should be sure to solve going forward.

We still need to develop an interim compat mechanism like the Disconnect list or First-Party Sets.

This is interesting! Is the intent for FPS to be a temporary, for-a-year-or-two, bridge to something else? If so, that is very interesting and encouraging, but it's surprising given the FPS conversations I've listened in on (which is << all of them, so grain of salt). But, many of the proponents of FPS in the calls I've participated in seem like "we want cross site information flows forever and always" and not "just for a year until X ships". Could you clarify here?

Or, in other words, if FPS is meant to be an interim solution, an interim between the status quo and what?

It seems like this problem would exist even in a world without FPS, no? Since your recommendation was that Facebook should have to redirect instagram.com to instagram.facebook.com; it seems like browsers would then have to inform users of the redirect before committing that navigation?

Apologies but I don't quite follow the concern above. My point is that either 1) there should be a firm privacy boundary between instagram.com and facebook.com which the browser should enforce, or 2) Facebook should make it clear to users that there is no such boundary with www.facebook.com and instagram.facebook.com.

But what would be really, really bad is if it appeared to users like there was a privacy boundary between IG and FB, but there really wasn't one (or a significantly weakened one) bc of FPS.

But there wouldn't be any need for notification in the example you gave. Users would still know that things they did on instagram.com were different and isolated from facebook.com and instagram.facebook.com, regardless of any bouncing (unless you're describing pushing identifiers across storage areas, a la bounce tracking, in which case, that's another problem to be solved and is being tackled in PrivacyCG).

englehardt commented 3 years ago

This is certainly not true for Firefox's Default ETP mode, because the Disconnect entities list groups facebook.com and instagram.com as one entity (reference).

Good point and thank you for the correction. I only mean that the "site" is usually the privacy boundary.

The Moz folks should correct me if I'm wrong, but current Firefox partitions all cookies by default, not just sites labeled by disconnect (with some exceptions for compat / existing SSO flows). But, in general, the site is a privacy boundary in at least Firefox, Safari and Brave (though with differences between them on how to handle cases that expect privacy harming, cross site data flows currently).

That's right, noting that our partitioning feature is not yet on by default. Let me add a bit more context to what we've shared before when this was brought up in a past First Party Sets discussion:

We started using Disconnect's entity list as part of our original "Tracking Protection" feature, which is a content blocking feature. When blocking content, it's absolutely necessary to have an entity list for web compatibility reasons. For example, Facebook's fbcdn.net domain is on Disconnect's blocklist and would thus be blocked when visiting facebook.com. This resource blocking would make that page entirely unusable. When we developed ETP (i.e., cookie blocking based on the same blocklists) we continued to have it respect the entity list, in part for consistency across our blocklist-based features and in part out of an abundance of caution to avoid web compat issues. I'm sure there are some instances where the entity list prevents cookie blocking breakage, but I suspect that many of the rules on that list aren't "required" for a cookie blocking feature.

As Pete said, we believe site is the right privacy boundary for passive cookie access. Our state partitioning feature applies to all third parties and does not use an entity list. We have no plans to add one. It does automatically relax partitioning under a variety of circumstances documented here for webcompat reasons. But we're working to figure out how we can narrow these over time with the goal of removing them entirely at some point.

krgovind commented 3 years ago

Our state partitioning feature applies to all third parties and does not use an entity list. We have no plans to add one. It does automatically relax partitioning under a variety of circumstances documented here for webcompat reasons. But we're working to figure out how we can narrow these over time with the goal of removing them entirely at some point.

@englehardt - Is my understanding correct that this mechanism is only in place for opt-in ETP Strict mode? Are you on track to make it the default?

Are you investing in APIs other than Storage Access API to help remove the current heuristics?

englehardt commented 3 years ago

Our state partitioning feature applies to all third parties and does not use an entity list. We have no plans to add one. It does automatically relax partitioning under a variety of circumstances documented here for webcompat reasons. But we're working to figure out how we can narrow these over time with the goal of removing them entirely at some point.

@englehardt - Is my understanding correct that this mechanism is only in place for opt-in ETP Strict mode? Are you on track to make it the default?

We intend to continue to ship it to a broader set of Firefox users with the goal of enabling it by default. I don't have a timeline to share, but you can follow along with our next step (private browsing windows) here: https://bugzilla.mozilla.org/show_bug.cgi?id=1698810.

EDIT: One additional note for clarity. By default, Firefox already partitions many APIs by site. It's the storage APIs that have webcompat implications that aren't yet partitioned by default.

Are you investing in APIs other than Storage Access API to help remove the current heuristics?

Nothing else to share at the moment.

krgovind commented 3 years ago

The Moz folks should correct me if I'm wrong, but current Firefox partitions all cookies by default, not just sites labeled by disconnect (with some exceptions for compat / existing SSO flows).

Per this announcement the "partitioning by default" mode is only in the opt-in Strict mode, with additional heuristics in place. As @englehardt mentions above, there is additional work to be done to completely remove reliance on lists, heuristics, and consent prompts that are hard to understand.

Or, in other words, if FPS is meant to be an interim solution, an interim between the status quo and what?

Sorry, perhaps an Oxford comma or two would have helped with my statement. 😅

I meant to refer to the Disconnect-me lists as an interim solution that Firefox/Edge are using. I did not intend to refer to FPS as an interim solution. In my personal opinion, we haven't yet seen viable solutions to remove this mechanism in the future, so it may be premature to say that we can. (I would be thrilled to be proven wrong in the future)

However, if other browsers would prefer to engage with FPS as a medium-term solution, I think that would still be a vast improvement over reliance on lists and heuristics. Blocklists are never exhaustive, they fail open, and are prone to errors such as those evidenced in the Disconnect entities list. Heuristics cause platform predictability issues.

Apologies but I don't quite follow the concern above. My point is that either 1) there should be a firm privacy boundary between instagram.com and facebook.com which the browser should enforce, or 2) Facebook should make it clear to users that there is no such boundary with www.facebook.com and instagram.facebook.com.

The point I was anchoring on in your statement was about prompting the user before committing the navigation. If a user navigated to instagram.com, they may not anticipate that it will redirect to instagram.facebook.com. So it seems like if the site author configured such a redirect, it would automatically allow joining of identity across facebook.com and instagram.facebook.com without (or before) the user noticing. This would suggest that the browser should prompt the user before that redirect happens, in order to confirm that it meets the user expectations.

But there wouldn't be any need for notification in the example you gave. Users would still know that things they did on instagram.com were different and isolated from facebook.com and instagram.facebook.com

This goes back to your previous assertion that the domain name in the URL is the only way to communicate the privacy boundary to the user. However, FPS also offers the opportunity to communicate that boundary as a collection of domains. I think this question is orthogonal to the point you were making about prompting the user before committing a navigation.