privacycg / proposals

New proposals in the Privacy Community Group
https://privacycg.github.io

Speculative Request Control #19

Open eligrey opened 3 years ago

eligrey commented 3 years ago

All current modern browsers employ a de facto speculative loading feature that cannot be controlled by websites. This feature was introduced to provide a moderate performance boost when loading typical webpages of the time. While it did benefit typical websites of that era, it does not always benefit modern sites that are properly marked up with async/defer scripts where appropriate.

Websites should be able to opt-out of eager speculative requests and be able to accept responsibility for their own site performance.

Giving site owners control over eager speculative requests also improves the security of generating dynamic <meta> CSPs at runtime from private, locally-stored tracking consent data. Currently, client-side-generated <meta> CSPs are effectively unenforced until DOMContentLoaded because of eager speculative requests. With eager speculative requests disabled, these CSPs can be applied and enforced immediately.

What is an eager speculative request?

An eager speculative request is a speculative request that is sent out before preceding synchronous scripts finish executing.
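
For example (a minimal sketch; the file names are hypothetical):

<script src="/blocking.js"></script>
<!-- While the synchronous script above is still executing, the preload
     scanner may already have speculatively requested this image: -->
<img src="/hero.png">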

Motivating Use Cases

The motivating use case for this feature is to increase the ease with which sites could adopt a CSP based on locally-stored consent provided by a third-party JS library. In this use case, we can assume that the library vendor and site owner have taken the time to explicitly preload resources asynchronously where appropriate, as they must knowingly disable eager speculative requests.

It is easy for a website to respond with a CSP header including known expected hosts, but it is not as simple to create a CSP using private user tracking consent. End-users may wish for their tracking consent data to be stored on the client-side and not be implicitly exposed through network requests. It is possible to create a client-side JavaScript library (e.g. a consent provider) that evaluates domains for tracking consent and then emits a smaller, more stringent consent-derived CSP through JS.
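
As a sketch of what such a library might expose (the function name matches the example later in this thread, but the logic here is purely illustrative):

// Hypothetical: derive a CSP source list from locally-stored consent data,
// e.g. localStorage.trackingConsent = '["cdn.example","videos.example"]'
function generateCSPFromConsent(consentJSON) {
  const consentedHosts = JSON.parse(consentJSON || '[]');
  return ["default-src", "'self'", ...consentedHosts].join(' ') + ';';
}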

Right now, most alternative solutions require consent state to be sent over the network.

More in my explainer draft.

annevk commented 3 years ago

Related: https://github.com/whatwg/html/issues/5624. cc @zcorpan @hsivonen

hsivonen commented 3 years ago

It seems to me that this hinges a lot on whether one considers CSP via runtime meta as a misfeature or not. To me, it seems questionable not to provide CSP via HTTP headers.

async/defer etc. are rather beside the point. Markup that's visible to the parser can be dealt with on a per-resource basis without having to turn the whole preloader off.

It's possible that there exist super-competent Web devs who can do better than the browser. However, it's all too likely that turning the preloader off becomes a cargo cult applied by less competent developers who end up sabotaging the overall perf.

zcorpan commented 3 years ago

First, from what I can tell, meta CSP is applied and enforced immediately already. It's specified in HTML like so:

When a meta element is inserted into the document, if its http-equiv attribute is present and represents one of the above states, then the user agent must run the algorithm appropriate for that state, as described in the following list: ...

Content security policy state (http-equiv="content-security-policy")

This pragma enforces a Content Security Policy on a Document. [CSP]

  1. If the meta element is not a child of a head element, return.

  2. If the meta element has no content attribute, or if that attribute's value is the empty string, then return.

  3. Let policy be the result of executing Content Security Policy's parse a serialized Content Security Policy algorithm on the meta element's content attribute's value, with a source of "meta", and a disposition of "enforce".

  4. Remove all occurrences of the report-uri, frame-ancestors, and sandbox directives from policy.

  5. Enforce the policy policy.

https://html.spec.whatwg.org/multipage/semantics.html#attr-meta-http-equiv-content-security-policy

This demo shows that it is enforced immediately in webkit, chromium, gecko: http://software.hixie.ch/utilities/js/live-dom-viewer/saved/8456
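
That demo is along these lines (a rough sketch, not the exact demo contents; the image URL is hypothetical):

<!DOCTYPE html>
<html>
  <head>
    <script>
      // Insert a meta CSP from script before later content is parsed
      const meta = document.createElement('meta');
      meta.httpEquiv = 'Content-Security-Policy';
      meta.content = "img-src 'none'";
      document.head.appendChild(meta);
    </script>
  </head>
  <body>
    <!-- If the meta CSP is applied immediately, this load is blocked
         and the error event fires: -->
    <img src="https://example.com/pixel.png"
         onerror="console.log('blocked by meta CSP')">
  </body>
</html>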

Second, I don't see how this relates to speculative parsing. In https://github.com/whatwg/html/issues/5624#issuecomment-689547534 I noted that chromium stops the speculative parser (or at least doesn't speculatively fetch anything) if the speculative parser finds a CSP meta, while webkit and gecko ignore it in the speculative parser. That is, webkit and gecko can speculatively fetch resources after a CSP meta without enforcing the CSP policy on those fetches. I hope those fetches are invalidated when the real parser finds the CSP meta and later the fetching element, but I haven't tested that. Still, this issue talks about script-generated meta, which is not something the speculative parser could find.

Also, I was under the impression that speculative parsing only starts if there's a non-defer non-async non-module script src element. Is this not the case?

hsivonen commented 3 years ago

Also, I was under the impression that speculative parsing only starts if there's a non-defer non-async non-module script src element. Is this not the case?

That applies for document.write-inserted content in Gecko. However, for content arriving from the network, the prefetches start before the corresponding DOM insertions even when the parsing is not speculative. The time difference between those fetches starting and the corresponding DOM insertions happening is just very short.

hsivonen commented 3 years ago

Notably, the DOM insertions look at the wall clock to decide whether to let the event loop spin before a script is seen. That in particular would allow non-speculative prefetches to start considerably earlier than the corresponding DOM insertions.

zcorpan commented 3 years ago

@hsivonen what you describe is then not relevant for speculative parsing, but for speculative fetches that happen from the normal HTML parser, between the time the tree builder processes a token and the time the parser decides to insert the element to the DOM. Yes?

So, it's possible for a script to insert a CSP meta in between?

hsivonen commented 3 years ago

@hsivonen what you describe is then not relevant for speculative parsing, but for speculative fetches that happen from the normal HTML parser, between the time the tree builder processes a token and the time the parser decides to insert the element to the DOM. Yes?

Yes.

So, it's possible for a script to insert a CSP meta in between?

Yes, e.g. from setTimeout if the DOM insertion batch is so long relative to CPU speed that the event loop gets to spin.
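
For example (a sketch; whether the task actually runs mid-batch depends on the parser yielding to the event loop):

<script>
// If the batch of network-delivered DOM insertions is long enough that the
// event loop spins, this task can insert a CSP meta between the tree
// builder processing a token and the corresponding DOM insertion:
setTimeout(() => {
  const meta = document.createElement('meta');
  meta.httpEquiv = 'Content-Security-Policy';
  meta.content = "img-src 'self'";
  document.head.appendChild(meta);
}, 0);
</script>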

In general, trying to use script to undo fetches that, absent the script, would be caused by what's in the HTML source from the network results in a bad time, and I think we should not change the Web Platform to facilitate such attempts.

zcorpan commented 3 years ago

@hsivonen thanks. So a control that disables speculative parsing, or avoiding script src, is not enough to avoid all speculative fetches, since normal parsing also does speculative fetches.

eligrey commented 3 years ago

I'm renaming this issue to "Speculative Request Control" to better fit with the use-case goals. What I want in my use case is a way to disable speculative fetches alone. Speculative parsing without making requests is safe and doesn't need a control in my opinion.

eligrey commented 3 years ago

I've tweaked the proposed API naming to reflect my focus on eager speculative requests:

API

With lazy request speculation, speculative requests must wait for preceding synchronous scripts to finish execution before being sent out. Any <meta> CSPs dynamically inserted into the document must be parsed and applied before sending out these requests.

Request-Speculation HTTP header

Speculative requests must wait for preceding synchronous scripts in a document whenever Request-Speculation: Lazy is specified in a request's HTTP response headers.
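
For example, a response opting into lazy request speculation would carry the header proposed above:

HTTP/1.1 200 OK
Content-Type: text/html;charset=utf-8
Request-Speculation: Lazy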

request-speculation attribute on document element

If there is a root document element with a request-speculation attribute and the attribute has a value that case-insensitively equals lazy, then speculative requests must wait for preceding synchronous scripts.

Read-only Document.prototype.requestSpeculation getter

document.requestSpeculation reflects the document's current request speculation setting as either eager or lazy. This is a read-only getter.
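
A sketch of reading the getter under the proposed API (feature-detected, since this is only a proposal):

if ('requestSpeculation' in document) {
  // 'lazy': speculative requests wait for preceding synchronous scripts;
  // 'eager': default behavior
  console.log(document.requestSpeculation);
}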

Example usage

<html request-speculation="lazy">
  <head>
    <script src="/consent-provider-utils.js"></script>
    <script>
    // Create meta CSP
    const meta = document.createElement('meta');
    meta.httpEquiv = 'Content-Security-Policy';

    // Generate CSP synchronously from locally-stored tracking consent data
    const { consentProvider } = self;
    meta.content = consentProvider
      ? consentProvider.generateCSPFromConsent(localStorage.trackingConsent)
      : 'default-src […];'; // default CSP

    // Enforce CSP on document; the policy stays enforced even after
    // the <meta> element is removed from the DOM
    document.head.appendChild(meta).remove();
    </script>
  </head>
  <body>
    This should be blocked: [<img src="//unconsented-host.example"/>]
  </body>
</html>
othermaciej commented 3 years ago

I don't think this feature is a good idea. Enabling websites to disable specific performance optimizations is likely to do more harm than good on net, even if there may be rare cases where there's a good reason to do so. I endorse @hsivonen's explanation of the dynamic in https://github.com/privacycg/proposals/issues/19#issuecomment-690143825

At the extreme, if a site dynamically inserts <meta>, it can also dynamically insert all of its content after the <meta> is added. This may be bad for perf, but disabling speculative loading is also likely to be bad for perf.

Also note that there could be forms of speculative loading besides preload scanning, such as predictive speculative loading (or predictive speculative revalidation), that could occur before a DOM call, an attribute on the document element, or even an HTTP response header would be seen by the browser. So the proposed feature cannot meet its promise of totally preventing speculative loading without potentially preventing some speculative loading for pages that don't even use the feature.

eligrey commented 3 years ago

I've reduced the scope of this proposal down to the control of 'eager speculative requests', which I am defining as speculative requests that are sent out before preceding synchronous scripts finish executing.

This feature can no longer be used to completely disable speculative requests, and describes the minimum features needed to empower the dynamic <meta> CSP use-case without causing as much potential harm to performance.

This update is now reflected in https://github.com/eligrey/speculative-request-control and in this issue.

hsivonen commented 3 years ago

The doc now says:

The motivating use case for this feature is to increase the ease with which sites could adopt a CSP based on locally-stored consent provided by a third-party JS library. In this use case, we can assume that the library vendor and site owner have taken the time to explicitly preload resources asynchronously where appropriate, as they must knowingly disable eager speculative requests.

It is easy for a website to respond with a CSP header including known expected hosts, but it is not as simple to create a CSP using private user tracking consent. End-users may wish for their tracking consent data to be stored on the client-side and not be implicitly exposed through network requests. It is possible to create a client-side JavaScript library (e.g. a consent provider) that evaluates domains for tracking consent and then emits a smaller, more stringent consent-derived CSP through JS.

Considering how pervasive third-party consent scripts are, I think it's completely unrealistic to assume that every site that includes them has "taken the time to explicitly preload resources asynchronously where appropriate". Also, considering how pervasive such scripts are, this feature could have a negative performance impact on a significant number of sites.

othermaciej commented 3 years ago

I still think the somewhat speculative motivating use case is not sufficient reason to introduce a major performance footgun. Does anyone actually do the thing described in the OP, or is CSP-as-consent-enforcer a hypothetical?

CSP is not really meant to be a privacy mechanism; it is designed for security. CSP only has the ability to block loads entirely; it cannot separately control whether loads receive cookies, whether frames can store state, etc. Thus, using it for privacy purposes would result in blocking "dual use" content that both shows visible content to the user (e.g. ads, hosted videos, social gadgets) and attempts to perform tracking. This is likely unacceptable for many sites. Further, consent regimes such as GDPR and CCPA often place restrictions not only on collection of new information by third parties, but also on first-party information gathering, and on subsequent use of information even after consent is withdrawn (e.g. CCPA's "Do Not Sell My Personal Information"). It's not possible to achieve that if consent is kept purely client-side and enforced only through selective third-party load blocking.

Overall, the proposed use case is a neat idea, but I don't think it is workable, for the reasons cited above.

eligrey commented 3 years ago

In response to @othermaciej's https://github.com/privacycg/proposals/issues/19#issuecomment-691856809

Does anyone actually do the thing described in the OP, or is CSP-as-consent-enforcer a hypothetical?

I work on a proprietary multi-layered adaptive consent management platform, where dynamic CSPs are a completely optional defense-in-depth security mechanism and not the core security boundary (which is the DOM itself). I aim to make it easier for site owners to comply with global privacy laws. In the event that our consent manager snippet fails to load, we can also suggest an onerror handler that instantiates a <meta> CSP based on consent data, but it will suffer from the same limitations as any dynamically-generated <meta> CSP.

CSP is not really meant to be a privacy mechanism, it is designed for security. […] Thus, using it for privacy purposes would result in blocking “dual use” content that both shows visible content to the user (e.g. ads, hosted videos, social gadgets) and attempts to perform tracking

Correct, CSPs do not serve as an ideal universal solution to the problem that is privacy protection/compliance. I would imagine that any third-party JS library using this feature wouldn't want to block consensual shared-state tracking through an imprecise and heavy-handed CSP. This feature is just a privacy & security nice-to-have that can increase the security posture of sites using third-party consent management tools without any backend changes.

Further, consent regimes such as GDPR and CCPA often place restrictions not only on collection of new information by third parties, but also on first party information gathering, and on subsequent use of information even after consent is withdrawn (e.g. CCPA’s “Do Not Sell My Personal Information”). It’s not possible to achieve that if consent is kept purely client-side and enforced only through selective third-party load blocking

While this is true for most existing companies that spread your data everywhere without much concern, it is not necessarily true for sites that are either designed from the ground up to preserve user privacy or have only ever shipped with 'actually working' (note: rare at this point) privacy protection JS enabled. If user-identifying information resides on your servers or your partners' servers, then you would also need to accomplish explicit consent mediation between all relevant parties, as you already have to do today. This feature does not attempt to solve that issue.

eligrey commented 3 years ago

In response to @hsivonen's https://github.com/privacycg/proposals/issues/19#issuecomment-691844774

Considering how pervasive third-party consent scripts are, I think it's completely unrealistic to assume that every site that includes them has "taken the time to explicitly preload resources asynchronously where appropriate". Also, considering how pervasive such scripts are, this feature could have a negative performance impact on a significant number of sites.

Consent management providers will want to reduce the performance impact by keeping such a config out of the defaults. I expect that most consent management providers would only suggest this content modification to a site owner if certain criteria are met.

This feature serves as a low-effort way for site owners to defer <meta> CSP responsibility to client-side logic provided by a consent management platform, and site owners may be okay with a temporary perf decrease to help with privacy compliance needs. Later on, the site owner can refactor to no longer need this feature. In an ideal SPA following every best practice, opting out of eager speculative requests should have no effect on performance.

jackfrankland commented 3 years ago

Deferring CSP responsibility to third party JS, computed dynamically, does not provide the site with the security that CSP is intended to give in my opinion.

hsivonen commented 3 years ago

site owners may be okay with a temporary perf decrease to help with privacy compliance needs

This assumes two things, neither of which is necessarily true:

  1. That the decrease is "temporary".
  2. That being OK with a perf decrease is up to site owners as opposed to users and browser vendors.

Considering that users want performance, in terms of adoption it's bad to suggest a feature that makes the first-mover browser appear slower. Also, it suggests a misplaced mechanism when a mechanism that is supposed to make the browser load fewer things could make the browser slower. (Generally, when the browser itself decides not to load stuff, things get faster.)

eligrey commented 3 years ago

I was referring to a temporary performance decrease in the sense that this decrease can eventually be mitigated or even fixed by sites being refactored to adopt best practices regarding async loading.

Poor performance is already penalized in search engine rankings. Key performance metrics like LCP, FID, and CLS are all negatively impacted by disabling eager speculative requests on sites that aren't properly optimized for async loading. An educated site owner with a properly optimized site should be allowed to turn off eager speculative requests if they do not want the feature enabled.

hsivonen commented 3 years ago

I was referring to a temporary performance decrease in the sense that this decrease can eventually be mitigated or even fixed by sites being refactored to adopt best practices regarding async loading.

I understood that but I don't believe it would consistently remain as a temporary state.

Also, if the speculative fetches are turned off, it's really hard or impossible to recover the performance by other means. The closest approximation would be listing resources as rel=preload right after the script responsible for the client-side CSP, but even that would make the fetches start later than in the present case. Of course, if the site has the capability of listing its resources that way from the server, it probably wouldn't need the client-side CSP in the first place.
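
A sketch of that preload pattern (file names hypothetical):

<script src="/consent-csp.js"></script>
<!-- Preloads listed after the CSP-generating script; these start later
     than speculative fetches would have: -->
<link rel="preload" as="script" href="/app.js">
<link rel="preload" as="image" href="/hero.jpg">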

eligrey commented 3 years ago

The closest approximation would be listing resources as rel=preload right after the script responsible for the client-side CSP, but even that would make the fetches start later than in the present case

I'll be measuring the performance impact of this change on a site with preloads in https://github.com/eligrey/speculative-request-control/issues/1. I will probably have some results to share later next week.

yoavweiss commented 3 years ago

I wanted to chime in and say that I agree with @hsivonen's and @othermaciej's comments here.

Overall, I think it might be better to further outline the use-case, and describe what kind of in-HTML tracking you're trying to defend against.

Maybe the solution here could be Service Worker based. Maybe you want the page's default request mode to be anonymous. Maybe there's some other solution to the tracking problem you're trying to tackle. Outlining the use case would enable us to try and see what that might be.

TanviHacks commented 2 years ago

Hi @eligrey! We see that this proposal hasn't been touched in a while - is there anything more to discuss here or should we close this out?

eligrey commented 10 months ago

@TanviHacks I discovered that there is a pre-existing speculative request control, so now I recommend this to customers that want our firewall library to regulate speculative requests.

With the following HTML snippet, we can use our library to regulate requests that would have otherwise been speculatively initiated.

<head>
<script></script>
<!-- Chromium's preload scanner stops speculative fetches once it
     encounters a CSP meta (see the earlier discussion in this thread) -->
<meta http-equiv="Content-Security-Policy"/>
</head>

The control I'm suggesting in this proposal could now be considered just a formality to standardize and give a developer-friendly interface for this mechanism.

eligrey commented 9 months ago

I'm considering rebooting this standard under https://github.com/w3c/webappsec as it's more related to security than privacy. I'll post an update here if I end up doing that.

zcorpan commented 9 months ago

@eligrey I would assume the CSP behavior in Chromium is to be considered a bug, which may be fixed in the future.

eligrey commented 9 months ago

I recall reading on Twitter that this behavior is intentional. I can look up the relevant source code later.