michaelkleber / privacy-model

A Potential Privacy Model for the Web: Sharding Web Identity

Request for definition: first party/third party #8

Open ehsan opened 5 years ago

ehsan commented 5 years ago

The text uses these terms loosely, and it is unclear what is meant by them. It refers to the Mozilla and WebKit anti-tracking policies, which use slightly different definitions of "first party", so it is unclear which definition this text is using.

michaelkleber commented 5 years ago

I was hoping this was addressed by the bullet

The notion of "First Party" may expand beyond eTLD+1, e.g. as proposed in First Party Sets.

Was the Mozilla policy's bit about "set of resources on the web operated by the same organization" contemplating the same sort of forward-looking possibility? Oh, but WebKit's also says "the set of resources on the web operated by the same organization" — what is the slight difference?

ehsan commented 5 years ago

I was hoping this was addressed by the bullet

The notion of "First Party" may expand beyond eTLD+1, e.g. as proposed in First Party Sets.

The First Party Sets mention should probably be removed until that proposal has cross-vendor agreement, but that wasn't really what I was referring to...

Was the Mozilla policy's bit about "set of resources on the web operated by the same organization" contemplating the same sort of forward-looking possibility? Oh, but WebKit's also says "the set of resources on the web operated by the same organization" — what is the slight difference?

Mozilla's Anti-Tracking Policy borrows its definition of "first party" from the DNT spec:

A first party is a resource or a set of resources on the web operated by the same organization, which is both easily discoverable by the user and with which the user intends to interact. An intention to interact is characterized by a deliberate action, such as clicking a link, submitting a form, or reloading a page. Merely hovering over, muting, pausing, or closing a given piece of content does not constitute an intention to interact. Interactions with other parties are considered third-party, even if the user is transiently informed in context (for example, in the form of a redirect).

This definition is carefully worded. Under it, when the user is on blog.example and logs in through a login widget from social.example that is presented to the user in a clear fashion, social.example is considered a first party. However, if on the same website the user clicks a Play button to play a video from a video.example embed, video.example would be a third party. The definition also covers interactions such as redirections, which are very commonly involved in tracking, as well as collusion between tracking networks when they exchange data on the client side.

The WebKit Tracking Prevention Policy defines first-party as follows:

A first party is a website that a user is intentionally and knowingly visiting, as displayed by the URL field of the browser, and the set of resources on the web operated by the same organization. In practice, we consider resources to belong to the same party if they are part of the same registrable domain: a public suffix plus one additional label.

The big difference between their definition and ours is that theirs does not let user interaction change how first parties are recognized, and it limits the definition to what is visible in the address bar.

It was unclear to me what kind of first-party definition you were aiming for here.
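
As an illustration of the registrable-domain rule in the WebKit definition quoted above ("a public suffix plus one additional label"), here is a minimal sketch. The suffix set is a hypothetical stand-in for the real Public Suffix List, and the function names `registrable_domain` and `same_party` are made up for this example; no browser is claimed to implement it this way.

```python
from typing import Optional

# Hypothetical, tiny stand-in for the Public Suffix List. A real
# implementation would load the full list instead of hard-coding entries.
PUBLIC_SUFFIXES = {"com", "org", "example", "co.uk"}


def registrable_domain(host: str) -> Optional[str]:
    """Return the public suffix plus one additional label, or None."""
    labels = host.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first so "co.uk" wins over "uk".
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in PUBLIC_SUFFIXES:
            if i == 0:
                return None  # the host is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # no known public suffix matched


def same_party(host_a: str, host_b: str) -> bool:
    """True if both hosts share a registrable domain (the eTLD+1 rule)."""
    a = registrable_domain(host_a)
    b = registrable_domain(host_b)
    return a is not None and a == b
```

Under this rule `www.blog.example` and `blog.example` are the same party, while `blog.example` and `social.example` are not, regardless of any user interaction with a login widget. That is the contrast with the interaction-based definition above.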

michaelkleber commented 5 years ago

Thank you very much for pointing this out! Apologies that I wasn't aware of this very important subtlety.

Do you feel that enforcing that distinction is technically feasible? The embedded login widget "presented to the user in a clear fashion" seems like an excellent use case to support, but it seems like a whole new browser-mediated UI paradigm might be needed to be sure that it's clear to users what is going on.

ehsan commented 5 years ago

Do you feel that enforcing that distinction is technically feasible?

Firefox is shipping code based on it, so yes.

If you're asking the question of "can the browser infer e.g. whether something is a login widget and is displayed clearly etc." then the answer is clearly no but I don't think that's what you're asking? That being said ML these days makes most of my statements of this class wrong! :)

it seems like a whole new browser-mediated UI paradigm might be needed to be sure that it's clear to users what is going on.

Why do you think this implies browser UI? Firefox's ETP makes login widgets work silently for the most part. The user can dig into the outcome of the automatic decisions the browser makes after the fact but we suspect the vast majority of users aren't interested in doing that...

jyasskin commented 5 years ago

It seems like we're going to need a Talmud to explain the detailed reasoning behind word choices here. 🙂 (That's not a bad thing, just a non-obvious requirement.)

On the detailed definitions, the DNT definition defines "first party" "With respect to a given user action", which seems useful. Mozilla and WebKit try to define first party without any scope, leading to, I think, some unwanted results when the user has multiple tabs open. So, let's include a scope in whatever definition gets added here. I think "for a given user action" and "for a given top-level browsing context" are among the scopes that would work for me.

@michaelkleber I think we should try to start with definitions and goals that make sense intuitively to users who aren't browser engineers. If that results in something that restricts things like login flows too much because, for example, the browser couldn't tell that the login provider was actually first-party, then we can look at new APIs to explain things to the browser.

englehardt commented 5 years ago

Mozilla and WebKit try to define first party without any scope, leading to, I think, some unwanted results when the user has multiple tabs open.

@jyasskin Would you mind sharing an example of such an unwanted result? I'd like to better understand your concern.

jyasskin commented 5 years ago

@englehardt Keeping in mind that this is a hostile reading and not what I think any of the authors intend, Mozilla says,

A first party is a resource or a set of resources on the web operated by the same organization, which is both easily discoverable by the user and with which the user intends to interact. An intention to interact is characterized by a deliberate action, such as clicking a link, submitting a form, or reloading a page. Merely hovering over, muting, pausing, or closing a given piece of content does not constitute an intention to interact. Interactions with other parties are considered third-party, even if the user is transiently informed in context (for example, in the form of a redirect).

All of the web resources created by Google, Facebook, and the New York Times are first parties when this definition is applied to me as a user. That is, I can easily discover all three organizations, and I intend to interact with those organizations (in their own tabs) using the above definition of "intention to interact". Given that they're all first parties, the text doesn't wind up saying that "Firefox will block or remove access to stateful identifiers" when Google or Facebook are tracking a user within a NYTimes page.

I don't think anyone's actually confused about what Mozilla meant to write, but we should try to be more precise as we try to get wider consensus around a single policy.
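
To make the scoping concern concrete, here is a minimal sketch contrasting the unscoped (hostile) reading with a reading scoped to a top-level browsing context. The `Tab` and `Browser` classes, their method names, and the `.example` site names are all hypothetical; this is not a description of how Firefox or WebKit actually classify parties.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tab:
    """A top-level browsing context, identified by the site in its URL bar."""
    top_level_site: str


@dataclass
class Browser:
    tabs: List[Tab] = field(default_factory=list)

    def is_first_party_unscoped(self, resource_site: str) -> bool:
        # Hostile reading: any site the user deliberately visits in some
        # tab counts as a first party everywhere.
        return any(tab.top_level_site == resource_site for tab in self.tabs)

    def is_first_party_scoped(self, tab: Tab, resource_site: str) -> bool:
        # Scoped reading: only the site of this tab's top-level context
        # is a first party for resources loaded in this tab.
        return tab.top_level_site == resource_site


news_tab = Tab("news.example")
browser = Browser(tabs=[news_tab, Tab("search.example"), Tab("social.example")])

# A search.example resource embedded in the news.example page:
unscoped = browser.is_first_party_unscoped("search.example")
scoped = browser.is_first_party_scoped(news_tab, "search.example")
```

With three tabs open, the unscoped check treats the embedded `search.example` resource as first party, while the scoped check treats it as third party within the `news.example` tab.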

ehsan commented 5 years ago

All of the web resources created by Google, Facebook, and the New York Times are first parties when this definition is applied to me as a user. That is, I can easily discover all three organizations, and I intend to interact with those organizations (in their own tabs) using the above definition of "intention to interact". Given that they're all first parties, the text doesn't wind up saying that "Firefox will block or remove access to stateful identifiers" when Google or Facebook are tracking a user within a NYTimes page.

I believe this is a flawed reading of the text of the policy, with all due respect. :-)

Our definition of tracking is: "Tracking is the collection of data regarding a particular user's activity across multiple websites or applications (i.e., first parties) that aren’t owned by the data collector, and the retention, use, or sharing of data derived from that activity with parties other than the first party on which it was collected." So when we speak of, for example, "Tracking We Will Block", this is what is being discussed. It should hopefully be obvious how this results in Firefox blocking tracking attempts when, for example, Google is tracking a user within a NYTimes page.

ehsan commented 5 years ago

It seems like we're going to need a Talmud to explain the detailed reasoning behind word choices here. 🙂 (That's not a bad thing, just a non-obvious requirement.)

To be quite frank, this proposal is making very bold suggestions around a privacy model for the web without really describing the goals that it is trying to address very well beyond "let publishers support themselves with effective advertising".

I'd like to engage more on the technical aspects, but that is very difficult right now without being able to read the minds of the authors on many of the questions I've brought up in the issues filed so far. I actually view them as a blocker towards further engagement in the technical details, because I think that we probably can't agree on the technical solutions as long as we can't agree on which problem we are solving or what our basic definitions and assumptions are.

I find it an undue burden towards discussion to expect the readers of a text like this to do all of the work necessary to read the tea leaves and go from the proposed solutions to the initial assumptions, definitions and goals.

On the detailed definitions, the DNT definition defines "first party" "With respect to a given user action", which seems useful. Mozilla and WebKit try to define first party without any scope, leading to, I think, some unwanted results when the user has multiple tabs open. So, let's include a scope in whatever definition gets added here. I think "for a given user action" and "for a given top-level browsing context" are among the scopes that would work for me.

While this is true, I think your observation is probably testament to the fact that our Policy isn't written in HTML spec language, but rather it is meant to be a document that is understandable by people who aren't familiar with the details of the Web Platform technology. (Perhaps there is a wording change we could make to further clarify this.)

@michaelkleber I think we should try to start with definitions and goals that make sense intuitively to users who aren't browser engineers. If that results in something that restricts things like login flows too much because, for example, the browser couldn't tell that the login provider was actually first-party, then we can look at new APIs to explain things to the browser.

FWIW I wasn't necessarily suggesting that you should make this proposal more palatable towards non-browser engineers (though that is definitely your choice) but rather this issue was an honest attempt at discovering what kind of first party definition you had in mind. So for example simply defining a set of basic terms like this one in the beginning would fully address my question. :-)

Thanks!