privacycg / storage-access

The Storage Access API
https://privacycg.github.io/storage-access/

Per-Frame or Per-Page Storage Access #3

Closed johnwilander closed 4 years ago

johnwilander commented 4 years ago

Mozilla's documentation on differences between Safari's and Firefox's implementation covers the difference explained below.

Safari: Storage access is granted only to the iframe that requested it, not to other iframes on the webpage and not to other subresources such as scripts and images.

Firefox: Storage access is granted to all matching subresources on the webpage such as iframes, script loads, and image loads.
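To make the two scopes concrete, here is a minimal toy model, not browser code. The `Page` class, the resource objects, and the `third-party.example` origin are all hypothetical names chosen for illustration; the only thing taken from the descriptions above is which loads end up with cookie access after one iframe is granted it.

```javascript
// Toy model only: frames and subresources are plain objects, and a Page
// records which of them may use the third party's cookies.
class Page {
  constructor(scope) {
    if (scope !== "per-frame" && scope !== "per-page") {
      throw new Error(`unknown scope: ${scope}`);
    }
    this.scope = scope;
    this.grantedFrames = new Set();  // used by the per-frame model
    this.grantedOrigins = new Set(); // used by the per-page model
  }

  // Called when the user grants access from inside `frame`.
  grant(frame) {
    if (this.scope === "per-frame") {
      this.grantedFrames.add(frame.id);
    } else {
      this.grantedOrigins.add(frame.origin);
    }
  }

  // Would a load for `resource` (iframe, script, image) carry cookies?
  hasAccess(resource) {
    return this.scope === "per-frame"
      ? this.grantedFrames.has(resource.id)
      : this.grantedOrigins.has(resource.origin);
  }
}

// Hypothetical third-party content embedded three times on one page.
const frame1 = { id: "iframe-1", origin: "https://third-party.example" };
const frame2 = { id: "iframe-2", origin: "https://third-party.example" };
const image = { id: "img-1", origin: "https://third-party.example" };

const perFrame = new Page("per-frame");
perFrame.grant(frame1);
const perPage = new Page("per-page");
perPage.grant(frame1);

const perFrameResult = [frame1, frame2, image].map((r) => perFrame.hasAccess(r));
const perPageResult = [frame1, frame2, image].map((r) => perPage.hasAccess(r));
console.log(perFrameResult); // [ true, false, false ]
console.log(perPageResult);  // [ true, true, true ]
```

In the per-frame model only the requesting iframe changes state; in the per-page model every matching load on the page does.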

We (WebKit) have received a few bug reports in which developers ask for full storage access under the current webpage. Even so, our original intent for the API is to grant access to the specific embedded piece of content that needs it. Such a limited scope also guarantees that no other context, such as another iframe or a subresource load, suddenly has a change in its cookie access.

Mozillans, have you seen any issues arise from your full page scope? Did you have specific reasoning behind it?

annevk commented 4 years ago

If A embeds documents B1 and B2, granting storage access to B1 only is not meaningful as B2 has synchronous script access to B1.

If A embeds document B1 and subresource B2 I suppose you could only send cookies for B1 (and its same-origin subresources presumably?), but I don't see there's a meaningful difference with also sending cookies to B2 (and it seems more useful for the developer).

cc @ehsan @bakulf

jackfrankland commented 4 years ago

In most real-world scenarios I can think of, where there is an embedded third-party iframe on a site, the third-party already has first-party script access. This is due to the integration usually being via a JS snippet, which injects the iframe dynamically onto the page.

In these cases, once the user has granted storage access within the iframe, it would be trivial to store a new identifier for the user in first-party storage (via postMessage) that can then be used for tracking purposes (including the identifier as a query parameter on subresource requests).

Carrying on from this, and perhaps relating more to the issue https://github.com/privacycg/storage-access/issues/2, this identifier will remain in first party storage, allowing the user to be tracked across pages, or at least for pages where the JS snippet is being loaded.

Even if the third party didn't have first-party script access, it could use its own storage (local storage, for example) to track the user across pages that have an embedded iframe.

My overall point is that depending on the mechanism of the third party integration, it may be futile to try and prevent further tracking once the user has granted storage access. Perhaps it would be better to make it clear to the user that this is what they are accepting when granting access.

hober commented 4 years ago

> it may be futile to try and prevent further tracking once the user has granted storage access. Perhaps it would be better to make it clear to the user that this is what they are accepting when granting access.

It's definitely important that UAs inform the user of the risks associated with granting storage access. Consider for example the screenshot of Safari's prompt here:

https://webkit.org/blog/8311/intelligent-tracking-prevention-2-0/

othermaciej commented 4 years ago

I believe it is possible to restrict further tracking once the user has granted storage access. If the non-granted form of the storage is blocked, or ephemeral, then it can't be used to persistently store an identifier.

Tracking via first-party storage is definitely a risk, and is already in widespread use. I believe there are strategies to deal with it. For example see the limitations introduced for "link decoration" tracking mentioned here:

https://webkit.org/blog/8828/intelligent-tracking-prevention-2-2/ https://webkit.org/blog/9521/intelligent-tracking-prevention-2-3/

(At some point we might want to put first-party storage limits into standards as well.)

jackfrankland commented 4 years ago

In relation to this issue specifically then, I do not see the benefit of limiting storage access to the single embedded iframe, considering:

As for other iframes or subresource loads suddenly having a change in their storage access, this is something that developers already have to deal with in Safari; cookie access is granted after the user visits the domain in a first-party context (and a cookie is set). I think it would be good to consider adding an observer to make it easier for developers to know when there has been a change in storage access, perhaps.

johnwilander commented 4 years ago

I will address the per-frame vs per-page scope at the CG conference call that’s coming up.

What it comes down to is that the Storage Access API is not the “Legacy Third-Party Cookie API” or the “Quirks Mode API,” i.e. the API is not intended to get things back to the old world. The API has a purpose in the new world where embedded web content needs to ask for permission to use its first-party storage for the purposes of authentication or service fulfillment.

johnwilander commented 4 years ago

(I’m not saying anyone claims it should be a quirks mode API. I’m just saying we should design something for a modern, useful purpose, not to fix legacy things that assumed full cookie access always. This API should serve the web platform going forward, for many years to come.)

Brandr0id commented 4 years ago

> This API should serve the web platform going forward, for many years to come

Agreed, we should ensure this meets the needs going forward and serves the platform in the best possible way.

After looking through options to scope the grants per-frame or per-page, Edge has been leaning towards the per-page access model.

Many of the points have already been discussed, but what led to this was looking at the amount of meaningful protection afforded to users once storage access has already been granted for the same third party, as well as the user and web developer experience with the API.

If a second frame from an already-allowed third-party origin is blocked, it doesn't add meaningful protection for the user. However, it is likely to cause the content to act in a non-intuitive or user-unfriendly manner. Take the social media authentication example. If there are multiple embedded social.example frames on content-aggregator.example, a user can interact with one to "sign in" by unblocking storage access. If the other frames remain signed out, or the user is unable to interact with the content that accesses storage, it may seem like the platform is malfunctioning. Similarly, a web developer may not intuitively expect that, having requested access for their site, social.example, on content-aggregator.example, they actually don't have access despite using requestStorageAccess(), because the call happened in a parent or sibling frame.
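As a rough sketch of what each embedded social.example frame would have to do under a per-frame model, the helper below checks for access and requests it if missing. `document.hasStorageAccess()` and `document.requestStorageAccess()` are the API's promise-returning methods; `ensureStorageAccess` and `makeStubDocument` are hypothetical names, and the stub exists only so the sketch runs outside a browser.

```javascript
// `doc` is any object exposing the two Storage Access API methods. Inside an
// embedded iframe you would pass the global `document`, and browsers
// generally expect the request to happen while handling a user gesture.
async function ensureStorageAccess(doc) {
  if (await doc.hasStorageAccess()) {
    return "already-granted";
  }
  try {
    await doc.requestStorageAccess(); // rejects if access is not granted
    return "granted";
  } catch {
    return "denied";
  }
}

// Hypothetical stand-in for a frame's document, for running outside a browser.
function makeStubDocument({ hasAccess, userAllows }) {
  let granted = hasAccess;
  return {
    hasStorageAccess: async () => granted,
    requestStorageAccess: async () => {
      if (!userAllows) throw new Error("NotAllowedError");
      granted = true;
    },
  };
}

ensureStorageAccess(makeStubDocument({ hasAccess: false, userAllows: true }))
  .then((outcome) => console.log(outcome)); // logs "granted"
```

Under a per-page model, a sibling frame running the same helper after the first grant would be expected to take the "already-granted" branch; under a per-frame model it would still need to make its own request.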

ehsan commented 4 years ago

I can comment on why we chose the per-page model. We considered going with the per-frame model that WebKit used since we were the second implementation at first, but the cross-frame script access work-around that @annevk mentioned was what caused us to set up the scope of storage grants to all of the frames from that origin on the page.

I think our experience has shown that developers who are trying to write tracking code would try to circumvent this restriction, and they won't rely on a user prompt for their circumvention, since other possible circumvention mechanisms exist. The risk to worry about here is over-prompting[*] for developers who are trying to make their applications work for their users, facing a prompt which has a very low click-through rate. Putting restrictions such as the per-frame model in place may force those developers to write code which looks in the parent for another iframe of their own origin which may have already obtained storage access, or, even worse, to write code that behaves like anti-tracking circumvention code and gets trapped in other anti-tracking mechanisms that browsers ship.

Of course with the above in mind, I think this may be something that we could reconsider if we were in another position in the future (e.g. such as other circumvention techniques being less in use in practice).

[*] I should also mention that Mozilla is doubtful about the usefulness of permissions for obtaining informed consent from users. This is a very broad topic, which this paper explores somewhat. Our current implementation of the storage access API uses prompting as a fallback, and we're not yet sure if users can understand all of the information presented in this prompt, respond free from coercion (e.g. from the site), and be held accountable for the consequences of their action (e.g. if they get tracked when they wanted to get a video to play), to borrow some concepts from that paper. So we've been averse to too much prompting.

johnwilander commented 4 years ago

> This API should serve the web platform going forward, for many years to come

> Agreed, we should ensure this meets the needs going forward and serves the platform in the best possible way.

> After looking through options to scope the grants per-frame or per-page, Edge has been leaning towards the per-page access model.

> Many of the points have already been discussed, but what led to this was looking at the amount of meaningful protection afforded to users once storage access has already been granted for the same third party, as well as the user and web developer experience with the API.

We do not argue that the per-frame model provides extra protection over per-page. Sorry if that has been the impression. We just think it's a more purposeful, well-scoped API that makes sense for the user who tapped/clicked the content and the developer who needs cookie access to deliver some embedded experience.

> If a second frame from an already-allowed third-party origin is blocked, it doesn't add meaningful protection for the user. However, it is likely to cause the content to act in a non-intuitive or user-unfriendly manner. Take the social media authentication example. If there are multiple embedded social.example frames on content-aggregator.example, a user can interact with one to "sign in" by unblocking storage access. If the other frames remain signed out, or the user is unable to interact with the content that accesses storage, it may seem like the platform is malfunctioning.

That is not the case in our implementation. Once the user has opted in through the permission prompt, subsequent calls to document.requestStorageAccess() for the same pair of first and third party will not prompt the user and instead just instantly open up cookie access.
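The prompt-once behavior described here can be sketched as a small model. This is not WebKit's implementation: `GrantStore`, its `askUser` callback, and the `news.example` site are hypothetical, and the sketch only illustrates remembering an opt-in per (first party, third party) pair.

```javascript
// Toy model: remembers which (first party, third party) pairs the user has
// approved, so later requests for the same pair resolve without a prompt.
class GrantStore {
  constructor(askUser) {
    this.askUser = askUser;      // stands in for the user agent's prompt
    this.remembered = new Set(); // serialized [firstParty, thirdParty] pairs
    this.promptCount = 0;
  }

  request(firstParty, thirdParty) {
    const key = JSON.stringify([firstParty, thirdParty]);
    if (this.remembered.has(key)) {
      return true; // previously approved: no prompt
    }
    this.promptCount += 1;
    const allowed = this.askUser(firstParty, thirdParty);
    if (allowed) {
      this.remembered.add(key);
    }
    return allowed;
  }
}

const store = new GrantStore(() => true); // this user always opts in
store.request("content-aggregator.example", "social.example"); // prompts
store.request("content-aggregator.example", "social.example"); // no prompt
store.request("news.example", "social.example"); // different pair: prompts
console.log(store.promptCount); // 2
```

In this model a second social.example frame on the same site still has to call the API itself, but the user is not asked again.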

> Similarly, a web developer may not intuitively expect that, having requested access for their site, social.example, on content-aggregator.example, they actually don't have access despite using requestStorageAccess(), because the call happened in a parent or sibling frame.

I'm not sure that I understand this part. All of that code is from social.example. They are in full control of it. Without messaging across frames, their different pieces of embedded content cannot even know that one of their iframes has called the Storage Access API.

johnwilander commented 4 years ago

> I can comment on why we chose the per-page model. We considered going with the per-frame model that WebKit used since we were the second implementation at first, but the cross-frame script access work-around that @annevk mentioned was what caused us to set up the scope of storage grants to all of the frames from that origin on the page.

> I think our experience has shown that developers who are trying to write tracking code would try to circumvent this restriction, and they won't rely on a user prompt for their circumvention, since other possible circumvention mechanisms exist. The risk to worry about here is over-prompting[*] for developers who are trying to make their applications work for their users, facing a prompt which has a very low click-through rate. Putting restrictions such as the per-frame model in place may force those developers to write code which looks in the parent for another iframe of their own origin which may have already obtained storage access, or, even worse, to write code that behaves like anti-tracking circumvention code and gets trapped in other anti-tracking mechanisms that browsers ship.

As mentioned above, WebKit's implementation does not re-prompt. Once the user has opted in, that preference is saved and subsequent calls to document.requestStorageAccess() for the same pair of first and third party provide instant cookie access without a prompt.

Does that change your perspective on this?

> Of course with the above in mind, I think this may be something that we could reconsider if we were in another position in the future (e.g. such as other circumvention techniques being less in use in practice).

> [*] I should also mention that Mozilla is doubtful about the usefulness of permissions for obtaining informed consent from users. This is a very broad topic, which this paper explores somewhat. Our current implementation of the storage access API uses prompting as a fallback, and we're not yet sure if users can understand all of the information presented in this prompt, respond free from coercion (e.g. from the site), and be held accountable for the consequences of their action (e.g. if they get tracked when they wanted to get a video to play), to borrow some concepts from that paper. So we've been averse to too much prompting.

johnwilander commented 4 years ago

To give some more context to "the Storage Access API is not the Legacy Third-Party Cookie Mode API" statement above, I'd like to mention that WebKit has a real legacy mode which we call a "temporary compatibility fix for popups." That compatibility fix does provide per-page cookie access to the third-party and it is the exact kind of thing that needs it – a temporary fix for legacy federated login behavior.

johnwilander commented 4 years ago

Based on discussions on the CG's last phone call, I added:

… in https://github.com/privacycg/storage-access/commit/32e1d3f46c8780f51966ed5880a981eab88d3136.

annevk commented 4 years ago

> Without messaging across frames

I wonder if there is some misunderstanding here. If A embeds B1 and B2 then B1 can access B2 by doing self.parent[1].localStorage["hello"] (i.e., synchronously, they're all part of the same agent/event loop). It's really weird if that returns something but self.localStorage["hello"] throws or some such.
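The inconsistency can be illustrated with a toy model that runs outside a browser. `makeFrame`, `originStorage`, and `parentFrames` are hypothetical stand-ins (for a same-origin frame's window, the origin's shared localStorage, and `self.parent` indexed by child frame), and the thrown error is only a placeholder for whatever a blocked frame would actually do.

```javascript
// Toy model of page A embedding two same-origin frames, B1 and B2.
// Only B2 has been granted storage access.
function makeFrame(storageGranted, sharedStorage) {
  return {
    get localStorage() {
      if (!storageGranted) {
        throw new Error("SecurityError: storage blocked");
      }
      return sharedStorage;
    },
  };
}

const originStorage = { hello: "world" }; // the origin's shared storage
const b1 = makeFrame(false, originStorage); // index 0, not granted
const b2 = makeFrame(true, originStorage);  // index 1, granted
const parentFrames = [b1, b2]; // stands in for self.parent as seen from B1

// From B1: reaching through the sibling works synchronously...
const viaSibling = parentFrames[1].localStorage["hello"];

// ...while B1's own storage accessor fails.
let direct;
try {
  direct = b1.localStorage["hello"];
} catch (e) {
  direct = e.message;
}

console.log(viaSibling); // world
console.log(direct);     // SecurityError: storage blocked
```

Because both frames share an origin and an event loop, a per-frame block on B1 does not stop B1's script from reading the same data through B2.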

hober commented 4 years ago

The spec will define this to be per-page when #27 lands (see the storage access map). At least, it attempts to. Please let me know if I got it right.