w3ctag / design-reviews

Shared Storage API #747

Closed (pythagoraskitty closed this 11 months ago)

pythagoraskitty commented 2 years ago

Braw mornin' TAG!

I'm requesting a TAG review of Shared Storage.

In order to prevent cross-site user tracking, browsers are partitioning all forms of storage (cookies, localStorage, caches, etc) by top-frame site. But, there are many legitimate use cases currently relying on unpartitioned storage that will not be fully met without the help of new web APIs. We’ve seen a number of APIs proposed to fill in these gaps (e.g., Attribution Reporting API, Private Click Measurement, Storage Access, Trust Tokens, FLEDGE, Topics) and some remain (including cross-origin A/B experiments and user measurement). We propose a general-purpose, low-level API that can serve a number of these use cases.

The idea is to provide a storage API (named Shared Storage) that is intended to be unpartitioned. Origins can write to it from their own contexts on any page. To prevent cross-site tracking of users, data in Shared Storage may only be read in a restricted environment that has carefully constructed output gates. Over time, we hope to design and add additional gates.

Further details:

We'd prefer the TAG provide feedback as:

☂️ open a single issue in our GitHub repo for the entire review

Security/Privacy Questionnaire

This section contains answers to the W3C TAG Security and Privacy Questionnaire. This can also be found here

  1. What information might this feature expose to Web sites or other parties, and for what purposes is that exposure necessary?

    Shared Storage is a JavaScript storage mechanism (much like localStorage, indexedDB, CacheStorage, etc.). Like those other storage mechanisms, it’s partitioned by origin. But unlike the other storage mechanisms, which may eventually be partitioned by top-frame site as well (depending on the browser), Shared Storage is designed to not need top-frame partitioning. The intention is to provide for a large number of cross-site use cases such as cross-site fraud and abuse detection, A/B testing (including lift measurement), reach measurement, and frequency capping of ads.

  2. Do features in your specification expose the minimum amount of information necessary to enable their intended uses?

    Yes. While cross-partition writing to Shared Storage is unbounded, reading is severely rate limited. Reading can only occur in isolated JavaScript worklet environments, called Shared Storage Worklets, and data can only leave these worklets via “output gates”. The two output gates in development are selectURL and Private Aggregation.

    The selectURL gate allows the Shared Storage Worklet to select one of n URLs supplied by the embedder, using information available in Shared Storage. The URL can therefore represent up to log2(n) bits of cross-site information from Shared Storage. The returned URL is opaque to the caller, and can only be read within a Fenced Frame, which is isolated from the embedding page. On Fenced Frame user activation, the Fenced Frame can navigate to a destination page, which ultimately could leak the log2(n) bits, along with the embedder’s URL, to the destination page.

    The Private Aggregation API will allow for data in Shared Storage to be aggregated into histograms. The resulting histograms are noised and are differentially private.

    Each output gate has a “budget”. Private Aggregation’s budget is defined by its differential privacy parameters (e.g., the epsilon value). However, this budget must be reset periodically, or else Shared Storage becomes useless over time. The selectURL operation has a cap (budget) on the number of leaked bits per period. The bits are removed from the budget once the user has clicked on the Fenced Frame. As with Private Aggregation, the budget is periodically reset.

  3. How do the features in your specification deal with personal information, personally-identifiable information (PII), or information derived from them?

    Shared Storage does not place any limits on the types of information origins can write to shared storage.

  4. How do the features in your specification deal with sensitive information?

    Same answer as # 3.

  5. Do the features in your specification introduce a new state for an origin that persists across browsing sessions?

    Yes, similar to other storage mechanisms, with tracking mitigations provided in answer to # 2.

  6. Do the features in your specification expose information about the underlying platform to origins?

    No.

  7. Does this specification allow an origin to send data to the underlying platform?

    Shared Storage’s selectURL operation produces a urn:uuid that only a Fenced Frame can interpret.

  8. Do features in this specification allow an origin access to sensors on a user’s device?

    No.

  9. What data do the features in this specification expose to an origin? Please also document what data is identical to data exposed by other features, in the same or different contexts.

    Limited access to the origin’s cross-site data as described in answer # 2.

  10. Do features in this specification enable new script execution/loading mechanisms?

    Yes. The Shared Storage worklets are loaded and executed in separate JavaScript contexts without access to any web page or network.

  11. Do features in this specification allow an origin to access other devices?

    No.

  12. Do features in this specification allow an origin some measure of control over a user agent’s native UI?

    No.

  13. What temporary identifiers do the features in this specification create or expose to the web?

    None.

  14. How does this specification distinguish between behavior in first-party and third-party contexts?

    Shared Storage is intentionally provided to both first and third-parties on a page.

  15. How do the features in this specification work in the context of a browser’s Private Browsing or Incognito mode?

    The behavior is like other JavaScript storage mechanisms in incognito mode. That is, the storage is kept in a separate partition from normal browsing mode, and is cleared when the incognito session ends.

  16. Does this specification have both "Security Considerations" and "Privacy Considerations" sections?

    Yes.

  17. Do features in your specification enable origins to downgrade default security protections?

    No.

  18. What should this questionnaire have asked?

    N/A.
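
To make the log2(n) bound from answer 2 concrete, here is a minimal sketch in Python. The function name `select_url_bits` is hypothetical and not part of the Shared Storage API; it only evaluates the arithmetic described in that answer.

```python
import math


def select_url_bits(n_urls: int) -> float:
    """Upper bound, in bits, on what picking one of n_urls can reveal."""
    if n_urls < 1:
        raise ValueError("selectURL needs at least one candidate URL")
    return math.log2(n_urls)


# A few illustrative candidate-list sizes (chosen for the example only).
for n in (1, 2, 8):
    print(f"{n} candidate URL(s) -> at most {select_url_bits(n)} bit(s)")
```

A single candidate URL reveals nothing (0 bits), while eight candidates can encode up to 3 bits of cross-site information per selection.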

torgo commented 2 years ago

Hi - just quickly: is this happening / going to happen in the Privacy CG? Thanks.

hadleybeeman commented 1 year ago

Discussed briefly in TAG breakout A today. This fits in with more general feedback that we are working on.

jkarlin commented 1 year ago

Sorry, didn't see the earlier question. The CG is WICG. I haven't heard support from the Privacy CG or PATCG.

nschonni commented 1 year ago

https://github.com/mozilla/standards-positions/issues/646

rhiaro commented 1 year ago

Hi there, sorry for the delay in getting back to you on this.

We are concerned about exposing such a complex low level API to authors (see Design Principles: high vs low level API tradeoffs). This proposal is about much more powerful functionality than only sharing access to storage. In particular, with regards to accommodating use cases that are no longer met once third-party cookies are deprecated, we strongly encourage addressing these in a focused, case-by-case way, rather than in the general sense. For example, you mention single sign-on as a use case, but we understand that FedCM is being worked on specifically to address this case.

Could you provide us with a summary of how Shared Storage fits in with the other Privacy Sandbox proposals (such as Fenced Frames, Topics, First Party Sets)? Are there any duplicated functionality / use cases among them? Where are the overlaps?

We appreciate that you see this proposal as providing a privacy improvement compared to the status quo of third-party cookies on the web, however would you be able to give us an analysis of the privacy implications in comparison with the web without third-party cookies as the baseline?

johannhof commented 1 year ago

Hi Amy, just wanted to quickly ask for clarification (without speaking for the proposal authors):

> For example, you mention single sign-on as a use case

This doesn't match my understanding of Shared Storage, do you have a link for where that is proposed as a possible use case for it? I might be missing something here.

jkarlin commented 1 year ago

Hi folks, thanks for the questions.

> We are concerned about exposing such a complex low level API to authors (see Design Principles: high vs low level API tradeoffs). This proposal is about much more powerful functionality than only sharing access to storage. In particular, with regards to accommodating use cases that are no longer met once third-party cookies are deprecated, we strongly encourage addressing these in a focused, case-by-case way, rather than in the general sense. For example, you mention single sign-on as a use case, but we understand that FedCM is being worked on specifically to address this case.

The design principle example shows a high level vs low level tradeoff where the low level reveals substantially more data than the high level. In the case of Shared Storage, the idea is to create output gates that are only capable of revealing small amounts of data in the long run. For measurement use cases, the differential privacy parameters of the private aggregation API control the rate. For dynamic document selection (selectURL), there is an entropy rate limit. These rates are defined by the user agent. I do agree that a low-level API would need to support a higher rate than an individual purpose-built API as it supports more use cases. But it’s not clear to me that it would need to provide a higher rate than all of the purpose-built APIs combined.

Note that even for the purpose-built APIs, what gets sent is left up to the sender. The privacy comes from the mechanism (e.g., differential privacy or entropy rates). So fundamentally, as long as the privacy mechanisms are in place, I think we should explore more expressive APIs that allow for technical innovation in parallel to the purpose-built ones. This can help to cover the use cases that we’ve missed before third-party cookie deprecation, and also serves as a proving ground for new purpose-built APIs in the future. The Storage Access API exists as a similar catch-all.

> Could you provide us with a summary of how Shared Storage fits in with the other Privacy Sandbox proposals (such as Fenced Frames, Topics, First Party Sets)? Are there any duplicated functionality / use cases among them? Where are the overlaps?

Shared Storage has three components. The first is the unpartitioned storage API (write data from anywhere, read it only in an isolated worklet). The other two components are the private ways in which data can leave the worklet. Private Aggregation API is a measurement API, allowing for measurement of things like ad reach, demographics, or cross-site debug reporting. This complements the other APIs (such as Attribution Reporting, FLEDGE, etc.) to cover measurement use cases that they don't explicitly support yet. SelectURL allows for choosing between documents to display in a fenced frame based on shared storage data. This could be useful for choosing between contextual ads that FLEDGE wasn’t involved with choosing, for running cross-site A/B experiments, for payment and login providers to show different buttons based on the user’s logged-in status, etc. It leverages fenced frames to limit the choice of selected document from being revealed to the embedding page.
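
As a rough illustration of the write-anywhere, read-only-in-a-worklet shape described above, here is a toy in-memory model in Python. It is a sketch only, not the browser API: the class `ToySharedStorage`, its methods, the `example` origins, and the fallback to the first URL are all hypothetical names and assumptions, and it models only the storage and the selectURL gate (not Private Aggregation).

```python
import uuid
from typing import Callable, Dict, List

# Hypothetical shape for worklet logic: it receives the origin's stored data
# plus the number of candidate URLs, and returns the index of the chosen URL.
SelectOperation = Callable[[Dict[str, str], int], int]


class ToySharedStorage:
    """In-memory model of unpartitioned storage with a selectURL-like gate."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}  # origin -> key -> value
        self._opaque: Dict[str, str] = {}  # opaque URN -> selected URL

    def set(self, origin: str, key: str, value: str) -> None:
        # Writing is allowed from any page the origin is embedded in.
        self._data.setdefault(origin, {})[key] = value

    def select_url(self, origin: str, urls: List[str],
                   operation: SelectOperation) -> str:
        # Only `operation` (standing in for the worklet) sees the stored data.
        snapshot = dict(self._data.get(origin, {}))
        index = operation(snapshot, len(urls))
        if not 0 <= index < len(urls):
            index = 0  # assumption for this sketch: fall back to the first URL
        urn = f"urn:uuid:{uuid.uuid4()}"
        self._opaque[urn] = urls[index]
        return urn  # the caller cannot tell which URL was chosen

    def resolve_in_fenced_frame(self, urn: str) -> str:
        # Stands in for the fenced frame, the only consumer of the URN.
        return self._opaque[urn]


def pick_by_group(data: Dict[str, str], n_urls: int) -> int:
    # Example worklet logic: choose a document based on an experiment group.
    return int(data.get("group", "0")) % n_urls


storage = ToySharedStorage()
storage.set("https://ads.example", "group", "1")  # written on some site
urn = storage.select_url(
    "https://ads.example",
    ["https://ads.example/a.html", "https://ads.example/b.html"],
    pick_by_group,
)
print(urn.startswith("urn:uuid:"))
```

The embedder only ever holds the opaque URN; in this model the chosen document is visible only through `resolve_in_fenced_frame`, which plays the role of the fenced frame.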

> We appreciate that you see this proposal as providing a privacy improvement compared to the status quo of third-party cookies on the web, however would you be able to give us an analysis of the privacy implications in comparison with the web without third-party cookies as the baseline?

Sure. Private aggregation can reveal cross-site data at a rate defined by its differential privacy parameters at a per-origin scope. SelectURL can reveal up to X bits of cross-site entropy (also origin scoped). These limits are reset at a defined rate. All of the parameters are set by the user agent. This is similar to the limits for other privacy-preserving APIs such as Attribution Reporting, FLEDGE, Topics, PCM, and IPA from Mozilla & Meta.
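
To illustrate how a differential privacy parameter controls what a noised histogram reveals, here is a minimal Python sketch. It assumes the textbook Laplace mechanism with noise scale sensitivity/epsilon; the thread only says the histograms are noised and differentially private, so the choice of mechanism, the function name `noisy_histogram`, and the example buckets are all assumptions made for illustration.

```python
import random
from typing import Dict, Optional


def noisy_histogram(counts: Dict[str, float], epsilon: float,
                    sensitivity: float = 1.0,
                    rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Add Laplace(0, sensitivity / epsilon) noise to every bucket."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    rate = epsilon / sensitivity  # 1 / scale of the Laplace distribution
    noisy = {}
    for bucket, count in counts.items():
        # The difference of two i.i.d. exponentials is Laplace-distributed.
        noise = rng.expovariate(rate) - rng.expovariate(rate)
        noisy[bucket] = count + noise
    return noisy


# Hypothetical reach-measurement buckets, for illustration only.
true_counts = {"saw_ad_once": 120.0, "saw_ad_twice": 45.0}
print(noisy_histogram(true_counts, epsilon=1.0, rng=random.Random(0)))
```

A smaller epsilon means a larger noise scale, so each report reveals less about any individual contribution; a larger epsilon gives more accurate histograms at a higher privacy cost.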

I think it’s important to point out that in this baseline you’re proposing, we’re seeing that other methods are emerging to personalize ads for users, including fingerprinting and increased usage of PII gates. This is why our work is so important. We’re developing privacy-safe ways to enable personalized advertising that don’t rely on user PII or fingerprinting.

jkarlin commented 1 year ago

Wanted to add that the specification is also available. If you'd like me to open up a review for it in a new thread let me know.

atanassov commented 1 year ago

Hi @jkarlin, thank you for the answers above. The following is hard to follow as we don't seem to have all the context you do. Can you try and answer the original question about how this proposal fits into Privacy Sandbox more directly?

> Shared Storage has three components. The first is the unpartitioned storage API (write data from anywhere, read it only in an isolated worklet). The other two components are the private ways in which data can leave the worklet. Private Aggregation API is a measurement API, allowing for measurement of things like ad reach, demographics, or cross-site debug reporting.

One of the main sections that's missing for me is about user needs. Can you elaborate and ideally add these to the explainer? See https://tag.w3.org/explainers/. The explainer talks about "including cross-origin A/B experiments and user measurement" – can you elaborate these in the form of user needs definition - from the user's perspective, how this benefits the end user.

The output gating of the API seems to be entirely based on budgeting. Can you confirm if this is the case or are we missing some other controls?

jkarlin commented 1 year ago

> Hi @jkarlin, thank you for the answers above. The following is hard to follow as we don't seem to have all the context you do. Can you try and answer the original question about how this proposal fits into Privacy Sandbox more directly?

> > Shared Storage has three components. The first is the unpartitioned storage API (write data from anywhere, read it only in an isolated worklet). The other two components are the private ways in which data can leave the worklet. Private Aggregation API is a measurement API, allowing for measurement of things like ad reach, demographics, or cross-site debug reporting.

I guess I don't quite understand what you're asking for here. It fits into the privacy sandbox in the sense that it enables sites to perform operations using cross-site data such that the cross-site data is leaked in a rate-controlled way (either differential privacy or entropy limits). The API is general purpose, as there are use cases we will have missed (or new use cases to discover) that aren't covered by the purpose-built APIs. If you're looking for specific use cases then I would say that I'm aware of experiments being designed to perform reach measurement (understanding how many users have seen your ad) and incrementality studies (understanding what the ROI on your advertising is) with the Private Aggregation API.

> One of the main sections that's missing for me is about user needs. Can you elaborate and ideally add these to the explainer? See https://tag.w3.org/explainers/. The explainer talks about "including cross-origin A/B experiments and user measurement" – can you elaborate these in the form of user needs definition - from the user's perspective, how this benefits the end user.

From the developer's perspective, shared storage and its output gates allow for better advertising performance measurement, spam and fraud defense, and content selection. The end user's benefit from a thriving digital advertising ecosystem is that the sites they visit can fund themselves without having to resort to tracking individual user movements across the web. This enables the sites that the user enjoys visiting to thrive and provide more content while respecting the user's privacy.

> The output gating of the API seems to be entirely based on budgeting. Can you confirm if this is the case or are we missing some other controls?

Yes. Like all of the proposed APIs in this space, shared storage gates each have budgets. The budgets are effectively rate limits: they need to reset over time, since otherwise the gates would only be useful for a limited time.
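
A budget that resets over time can be sketched as a simple rate limiter. The Python below is an illustration under stated assumptions, not the user agent's implementation: the class `EntropyBudget`, the 6-bit cap, and the one-day period are hypothetical values; the only parts taken from this thread are that a selection among n URLs costs up to log2(n) bits, that the bits are deducted on click, and that the budget resets periodically.

```python
import math


class EntropyBudget:
    """Per-origin cap on leaked bits that refills at the start of each period."""

    def __init__(self, bits_per_period: float, period_seconds: float) -> None:
        self.bits_per_period = bits_per_period
        self.period_seconds = period_seconds
        self._remaining = bits_per_period
        self._period_start = 0.0

    def _reset_if_due(self, now: float) -> None:
        elapsed = now - self._period_start
        if elapsed >= self.period_seconds:
            periods = int(elapsed // self.period_seconds)
            self._period_start += periods * self.period_seconds
            self._remaining = self.bits_per_period

    def charge_on_click(self, n_urls: int, now: float) -> bool:
        """Deduct log2(n_urls) bits if available; return whether it succeeded."""
        if n_urls < 1:
            raise ValueError("need at least one candidate URL")
        self._reset_if_due(now)
        cost = math.log2(n_urls)
        if cost > self._remaining:
            return False  # budget exhausted until the next reset
        self._remaining -= cost
        return True


DAY = 24 * 60 * 60.0
budget = EntropyBudget(bits_per_period=6.0, period_seconds=DAY)  # hypothetical
print(budget.charge_on_click(8, now=0.0))    # 3 bits charged
print(budget.charge_on_click(8, now=10.0))   # 3 more bits charged
print(budget.charge_on_click(8, now=20.0))   # refused: nothing left this period
print(budget.charge_on_click(8, now=DAY))    # allowed again after the reset
```

Without the reset step the third call's refusal would be permanent, which is the sense in which a non-resetting budget is only useful for a limited time.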

hober commented 1 year ago

see also https://github.com/WebKit/standards-positions/issues/10

torgo commented 1 year ago

From our TAG F2F today:

Having reviewed the Mozilla and WebKit position discussions, the TAG shares the privacy concerns Mozilla raised regarding this. We'd like to see these use cases worked on in PATCG, with broader participation from other implementors.

We are concerned about the privacy implications of any storage intended to be available across sites or origins without the user's explicit permission, and see that this could lead to capabilities used to create a drop-in replacement for third-party cookies as they work now. This goes against the Ethical Web Principle "The web must enhance individuals' control and power." The TAG is explicitly trying to encourage development of new web technologies to replace 3rd party cookies that do not replicate the privacy pitfalls of 3rd party cookies. See our draft finding "Improving the web without third-party cookies".

We are concerned that the user needs given aren't technical needs. For example, a comparison table between the way these use cases are currently serviced and the way they are envisioned to be serviced with this new technology in place, and what the user benefit would be, would be more like what we're looking for. We recognise the use cases (cross-origin A/B experiments, user measurement, etc — which are site owner or developer needs) can provide value, but are not convinced that the value is worth the compromise to users' privacy.

We'd be grateful if you would please clarify the user needs as outlined above.

One last more general question we'd like to get a clear answer on is on a scale of 1 to 100, what pieces of the proposals in privacy sandbox will need to be in place to have a clear deprecation plan for third-party cookies, and how much does Shared Storage get us there? With so many related proposals coming in, we are concerned that the collective amount of entropy might result in a supercookie that maintains the status quo. We would likely be able to provide more constructive (and likely pragmatic) feedback with some level of clarity on roughly how close we will be getting to deprecation (of third-party cookies) with the current set of proposals.

jkarlin commented 1 year ago

> Having reviewed the Mozilla and WebKit position discussions, the TAG shares the privacy concerns Mozilla raised regarding this. We'd like to see these use cases worked on in PATCG, with broader participation from other implementors.

We’re happy to discuss the measurement and targeting needs addressed by Shared Storage, Topics, FLEDGE, etc. within the PATCG. However, the priority of the group has been to focus on a single use case at a time, with the current focus being conversion measurement. When the PATCG broadens its focus to other use cases, we will happily engage and work together on solutions with multi-vendor support.

> We are concerned about the privacy implications of any storage intended to be available across sites or origins without the user's explicit permission, and see that this could lead to capabilities used to create a drop-in replacement for third-party cookies as they work now.

We certainly do not feel that differentially private reporting and limited (~3 bits on ad click) leakage are equivalent in capability to third-party cookies. The budgets (e.g., epsilon and delta, or entropy, per unit time) matter.

> This goes against the Ethical Web Principle "The web must enhance individuals' control and power." The TAG is explicitly trying to encourage development of new web technologies to replace 3rd party cookies that do not replicate the privacy pitfalls of 3rd party cookies. See our draft finding "Improving the web without third-party cookies".

We are quite aligned with the principles you point to.

I will reiterate that any proposal in this space involves reducing the near-infinite bit budget of today’s third-party cookies via rate limiting (e.g., with differential privacy or entropy rate limits). The user’s control and power is their ability to disable the API.

> We are concerned that the user needs given aren't technical needs. For example, a comparison table between the way these use cases are currently serviced and the way they are envisioned to be serviced with this new technology in place, and what the user benefit would be, would be more like what we're looking for. We recognise the use cases (cross-origin A/B experiments, user measurement, etc — which are site owner or developer needs) can provide value, but are not convinced that the value is worth the compromise to users' privacy. We'd be grateful if you would please clarify the user needs as outlined above.

The user need we are trying to support is the continued existence of a large amount of content freely available on the web. The collection of use cases that are proximate goals of Privacy Sandbox are, to the best of our understanding, the essential tools that enable the flow of money from advertisers to most of the web sites in the world.

We agree that it is not immediately obvious that a use case like "cross-origin A/B experiments" is a necessary contributor to that goal. But the Privacy Sandbox effort was rooted in years of discussions in the Improving Web Advertising Business Group to learn what capabilities were widely agreed to be required for the ads ecosystem to retain its ability to move money from advertisers to publishers.

> One last more general question we'd like to get a clear answer on is on a scale of 1 to 100, what pieces of the proposals in privacy sandbox will need to be in place to have a clear deprecation plan for third-party cookies, and how much does Shared Storage get us there? With so many related proposals coming in, we are concerned that the collective amount of entropy might result in a supercookie that maintains the status quo. We would likely be able to provide more constructive (and likely pragmatic) feedback with some level of clarity on roughly how close we will be getting to deprecation (of third-party cookies) with the current set of proposals.

Please see https://privacysandbox.com/timeline for Chrome's intended timeline for third-party cookie removal. This targets removal starting in Q3 of 2024, and is based on the same collection of proposals that we've been incubating for years.

This doesn't answer your "on a scale of 1 to 100" question, though, and it's hard to know how to address that. The Blink process pushes us to provide as much transparency as possible about future plans, and we think we have done so throughout, even in the face of considerable uncertainty. And if something arises that has implications for this timeline, we will update privacysandbox.com and our other communication channels as promptly as we can.

torgo commented 1 year ago

Hi Josh - just checking back on this. Has there been any movement on multi-stakeholder (specifically multi-browser) support for this API? Thanks, Dan

torgo commented 1 year ago

It was brought to my attention today that my comment above could be clearer. To clarify: we pay attention to multi-stakeholder support as a key indicator of consensus. As stated in the Ethical Web Principles, the Web is multi-browser, multi-OS, multi-device. Multiple implementations are also part of the W3C process and WHATWG process. We are also considering the complexity we're adding to the web platform, and disparity between implementations of web features adds to that complexity. This is why we ask for "Key pieces of existing multi-stakeholder review" in our review request template. Ideally we would like to see expressed support from at least 2 browser engines. It would also be interesting to know if there has been support from other Chromium-based browsers and/or from web developers.

jkarlin commented 1 year ago

I'm not aware of interest from other browsers for Shared Storage. There is certainly web developer interest (and ongoing testing) for advertising, A/B testing, and anti-fraud use cases.

shivanigithub commented 1 year ago

FYI, Chrome plans to start gating shared storage API invocation behind the enrollment and attestation mechanism. (enrollment explainer, spec PR)

rhiaro commented 11 months ago

Hi folks - We're planning to close this as the concerns raised by us and in the Mozilla standards position have not been addressed. We're also noting that there is a lack of multi-stakeholder feedback/support. We are sympathetic to the need to replace some of the functionality lost with the removal of third-party cookies; in this case we think it'd be better to address the use cases individually, with designed-for-purpose approaches, rather than adding a single underlying mechanism which has the potential to re-introduce the risks that the removal of third-party cookies was intended to mitigate in the first place.

We're a little concerned by the addition of the attestation mechanism mentioned. Registering in order to use an API is not something we see having a place in the web platform.

Please let us know if there are any significant changes in the design going forward that would mitigate the issues raised.

jkarlin commented 5 months ago

> Please let us know if there are any significant changes in the design going forward that would mitigate the issues raised.

Are you also interested in hearing about and providing feedback on other changes to the design going forward, given the unsatisfied status?

e.g., we're making it possible to directly create cross-site worklets so that third-party script doesn't have to create an iframe for its origin in order to create its worklet.

torgo commented 1 month ago

Hi @jkarlin, sorry we didn't get back to you on this question. If there are significant changes to Shared Storage that you think we should review, then by all means let us know and we can re-review. Depending on the scope of the update, it might be appropriate to open a new review?

jkarlin commented 1 month ago

Great! FYI https://github.com/WICG/shared-storage/pull/161 is a recent spec change we've made to make it possible to load cross-site script in shared storage worklets, and to align the behavior of addModule() and createWorklet() to both use the calling context's origin by default.