whatwg / fetch

Fetch Standard
https://fetch.spec.whatwg.org/

Cross-Origin Read Blocking (CORB) #681

Closed anforowicz closed 2 years ago

anforowicz commented 6 years ago

Historically, browsers had rather lax Content-Type checking. We’ve been able to introduce stricter checks in some cases (e.g. blocking mislabeled scripts and stylesheets in presence of the nosniff header [1]) and unfortunately failed in some other cases (e.g. Firefox’s attempt to block mislabeled images in presence of the nosniff header [2, 3]).

Given Spectre, lax handling of mislabeled cross-origin responses carries new, significant security risks. We've developed a proposal, which we're calling Cross-Origin Read Blocking (CORB), which increases the strictness of cross-origin fetching semantics while trying to still stay web-compatible. CORB reduces the risk of leaking sensitive data by keeping it further from cross-origin web pages. In most browsers, it keeps such data out of untrusted script execution contexts. In browsers with Site Isolation, it can keep such data out of untrusted renderer processes entirely, helping even against speculative side channel attacks.

We're looking to collaborate with everyone on an interoperable set of changes to the web platform, so that blocking of cross-origin responses can be done consistently across all the browsers. Please take a look at the proposal and its compatibility impact in the CORB explainer and provide feedback in this thread on the algorithm itself, as well as on the next steps for trying to encode CORB into the relevant specs for web standards.

We believe that CORB has a reasonably low risk of breaking existing websites (see the “CORB and web compatibility” section in the explainer). We’ve spent a considerable amount of time trying to tweak CORB to minimize compatibility risk (e.g. introducing confirmation sniffing and skipping sniffing for HTML comments since JS can have them too) and are continuing to consider additional tweaks to minimize the risk further (e.g. we are trying to gather data that might inform how to handle text/plain and range requests). The remaining risk is mostly for nosniff responses labeled with a wrong MIME type - as pointed out above, stricter handling of such responses has always been desirable, but the Spectre threat makes this more urgent.
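To make the blocking decision described above concrete, here is a minimal sketch, not the algorithm from the explainer. It assumes a simplified set of protected MIME types (HTML, XML, JSON) and a very rough form of confirmation sniffing. The names `PROTECTED_TYPES`, `sniff_kind`, and `should_block` are hypothetical, and the handling of a leading HTML comment is reduced to "confirms nothing", which is coarser than what the explainer describes.

```python
from typing import Optional

# Assumption: a simplified list of protected types; the explainer is authoritative.
PROTECTED_TYPES = {
    "text/html": "html",
    "text/xml": "xml",
    "application/xml": "xml",
    "application/json": "json",
    "text/json": "json",
}


def sniff_kind(body_prefix: bytes) -> Optional[str]:
    """Rough confirmation sniffing: return 'html', 'xml', 'json', or None."""
    text = body_prefix.lstrip()[:64].lower()
    if text.startswith(b"<!--"):
        # HTML comments are also accepted by JS parsers, so in this sketch
        # a leading comment does not confirm that the body is HTML.
        return None
    if text.startswith(b"<?xml"):
        return "xml"
    if text.startswith((b"<!doctype html", b"<html", b"<head", b"<body")):
        return "html"
    if text.startswith(b"{") and text[1:].lstrip().startswith(b'"'):
        return "json"
    return None


def should_block(content_type: str, nosniff: bool, body_prefix: bytes) -> bool:
    """Decide whether a cross-origin no-cors response would be blocked."""
    mime = content_type.split(";", 1)[0].strip().lower()
    kind = PROTECTED_TYPES.get(mime)
    if kind is None:
        return False  # not a protected type, so this sketch leaves it alone
    if nosniff:
        return True  # nosniff: trust the label, no confirmation sniffing
    # Without nosniff, block only if the body really looks like the labeled type.
    return sniff_kind(body_prefix) == kind
```

Under these assumptions, an image mislabeled as `text/html` passes through when nosniff is absent, because sniffing does not confirm HTML, but is blocked when nosniff is present. That second case is the remaining compatibility risk mentioned above.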

[1] https://fetch.spec.whatwg.org/#should-response-to-request-be-blocked-due-to-nosniff?
[2] https://github.com/whatwg/fetch/issues/395
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1302539

annevk commented 6 years ago

cc @cdumez @youennf @travisleithead @evilpie @ckerschb @whatwg/security (please tell me or whatwg/meta if you want to be added to this team; it's basically for notification purposes of issues that need security input)

wanderview commented 6 years ago

When service workers cache actual cross-origin responses (e.g. in ‘no-cors’ request mode), the responses are ‘opaque’ and therefore CORB can block such responses without changing the service worker's behavior (‘opaque’ responses have a non-accessible body even without CORB).

Opaque responses have a hidden body, yes, but the Cache entry still contains the filtered body. It must still be there for the service worker to use the Cache to service no-cors requests (for example, those made by HTML elements).

It seems CORB would require that Cache.match() not send the opaque body data to the renderer process until after the CORB checking is performed. The CORB checking would depend on what the service worker does with the opaque response. (If I understand correctly the CORB check requires knowing the destination of the response within the browser.)

FWIW, this part does seem doable to me from an implementation perspective. Gecko's cache waits to open the body file descriptor until body consumption begins. So as long as we can perform the CORB check in the renderer process as part of the respondWith() call, then it seems possible to achieve this.
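The ordering described here (check first, open the body only on consumption) can be sketched as follows. This is an illustration only and does not reflect Gecko's or Chrome's actual code. `CachedOpaqueEntry`, `open_body`, and `corb_allows` are hypothetical names, and the check is assumed to depend only on the request destination.

```python
from typing import Callable, Optional


class CachedOpaqueEntry:
    """Hypothetical cache entry whose body is opened lazily."""

    def __init__(self, open_body: Callable[[], bytes]) -> None:
        self._open_body = open_body  # stands in for opening the body file
        self.opened = False

    def consume(
        self, destination: str, corb_allows: Callable[[str], bool]
    ) -> Optional[bytes]:
        # Run the (assumed) CORB check before touching the body at all.
        if not corb_allows(destination):
            return None  # body is never opened, so nothing can leak
        self.opened = True
        return self._open_body()
```

For example, `entry.consume("script", check)` returns `None` without opening the body when `check("script")` is false, while an allowed destination receives the bytes.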

annevk commented 6 years ago

Having read through https://chromium.googlesource.com/chromium/src/+/master/services/network/cross_origin_read_blocking_explainer.md in more detail I wonder why it doesn't call out fetch() a bit more prominently. fetch() supports "no-cors" and therefore can get at cross-origin resources without CORS. Those resources are opaque to most callers, but it seems that special care needs to be taken to not leak them to the wrong process until explicitly called for, including when they get persisted to disk...

csreis commented 6 years ago

Maybe @anforowicz or Nick Carter can chime in more, but in terms of persisting no-cors to disk, Chrome's implementation is still able to write the response to disk without giving it to the renderer process that made the request. (I think we're using DetachableResourceHandler for that, FWIW.) That may be worth mentioning in the explainer, since I think it matters for preload and ServiceWorkers as well. Were there other cases you were concerned about?

annevk commented 6 years ago

@csreis storing the response from a fetch() in the Cache API can be done outside service workers too, but yeah, that's roughly what we want to have in the specification around "opaque filtered responses" I think. To make it very clear these need to remain out-of-process for as long as possible.

(I'm not entirely sure where we should put the canonical description of the class of attacks, standards-wise. Either here or in HTML I suppose.)

jakearchibald commented 6 years ago

https://github.com/whatwg/fetch/issues/144#issuecomment-368040980 - see "Attack 4".

It looks like CORB will handle this attack for particular mime types, but I think it still makes sense to apply the extra blocking I proposed, since it'll cover all mime types.

Let me know if that's wrong.

evilpie commented 6 years ago

I am a bit concerned about the "Mislabeled image (nosniff)" case. Do you have any data on how common text/html is for images, with nosniff? At least for JavaScript this number was quite high, and even higher in the HTTP Archive report. This number, however, doesn't take nosniff into account. Do you have that data, or should we ask the HTTP Archive people again?

anforowicz commented 6 years ago

@evilpie, we have some data in the "Quantifying CORB impact on existing websites" section of the explainer. After excluding responses that had an explicit "Content-Length: 0" response header, we see that 0.115% of all CORB-eligible responses might have been observably blocked due to a nosniff header or range request.

The real question here is how many of that 0.115% contained images (and were undesirably disrupted by CORB) versus non-images (and were non-decodable with or without CORB). At this point we only have anecdotal data: we were only able to reproduce one such case in the wild, and it turned out to be a tracking pixel that returned an HTML document as its response.

jakearchibald commented 6 years ago

I'm currently looking to enable range requests to pass through a service worker safely, and later I'll specify how various web APIs should make range requests and validate responses.

Although CORB is involved in the same area, the goals are different, but we should be aware of overlap 😄.

Here's a summary of the similarities and differences, as I understand them:

CORB's goal is to prevent bringing data into the content process, whereas I'm aiming to prevent exposing data to script. CORB is best-effort, with compatibility in mind, whereas I need to strictly avoid exposing opaque data to script.

CORB will filter opaque partial responses if they match particular content types. This prevents an audio/video element being used to bring data that's potentially sensitive into the content process.

https://github.com/whatwg/fetch/pull/560 prevents Attack 4, where a <script> is given a partial response that may contain private data. CORB will make this a lot harder for particular content types, but https://github.com/whatwg/fetch/pull/560 prevents this particular attack for all content types.

CORB recommends against multipart range requests. Currently range requests aren't specced from that API's point of view, but I'm trying to define it. I don't plan to use multiple ranges in a single response, and once specced, browsers shouldn't make kinds of range requests that aren't explicitly allowed.

I intend to make media elements reject responses that would result in a mix of opaque and visible data being treated as the same media resource. This prevents Attack 1.

I intend to make media elements reject responses that would result in opaque data from multiple URLs being treated as the same media resource. This prevents Attack 2.

I intend to make range supporting APIs fail if the partial response starts at an offset other than the requested range. This prevents Attack 3.

I intend to make downloads fail/restart if content-identifying headers change between requests, such as the total length in Content-Range, Content-Type, ETag, and Last-Modified.
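Two of the checks in this list, rejecting a partial response that starts at the wrong offset (Attack 3) and failing when identifying headers change between requests, can be sketched roughly as below. This is a hedged illustration, not spec text. The function names are hypothetical, header names are assumed to be lowercased already, and both responses are assumed to be single-range 206 responses that carry a Content-Range header.

```python
import re
from typing import Mapping, Optional, Tuple

# Matches single-range values such as "bytes 100-199/1000" or "bytes 0-9/*".
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value: str) -> Optional[Tuple[int, int, Optional[int]]]:
    """Return (start, end, total) or None if the value is malformed."""
    m = _CONTENT_RANGE.match(value.strip())
    if m is None:
        return None
    start, end = int(m.group(1)), int(m.group(2))
    total = None if m.group(3) == "*" else int(m.group(3))
    if end < start or (total is not None and end >= total):
        return None
    return start, end, total


def starts_at_requested_offset(requested_start: int, content_range: str) -> bool:
    """Attack 3 check: the response must begin where the request asked."""
    parsed = parse_content_range(content_range)
    return parsed is not None and parsed[0] == requested_start


IDENTIFYING_HEADERS = ("content-type", "etag", "last-modified")


def same_resource(first: Mapping[str, str], later: Mapping[str, str]) -> bool:
    """True if the identifying headers and the total length are unchanged."""
    if any(first.get(h) != later.get(h) for h in IDENTIFYING_HEADERS):
        return False
    a = parse_content_range(first.get("content-range", ""))
    b = parse_content_range(later.get("content-range", ""))
    if a is None or b is None:
        return False
    return a[2] == b[2]
```

A caller would treat a `False` result from either function as a reason to fail or restart the load rather than stitch the new bytes onto the existing resource.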

jakearchibald commented 6 years ago

Why does CORB blocking filter the response? Wouldn't it be more robust to replace the response with a generic empty response?

Although they're less sensitive, CORS safelisted headers and status codes also leak data.
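The difference between the two options can be sketched as below. This is a toy model, not how any browser represents responses. `Response`, `filter_blocked`, `replace_blocked`, and `GENERIC_STATUS` are hypothetical, and the sketch assumes that a filtered response keeps its status and its CORS-safelisted headers.

```python
from dataclasses import dataclass, field
from typing import Dict

# CORS-safelisted response header names, lowercased.
CORS_SAFELISTED = frozenset({
    "cache-control", "content-language", "content-length",
    "content-type", "expires", "last-modified", "pragma",
})

# Assumption: an arbitrary fixed status for the generic replacement.
GENERIC_STATUS = 200


@dataclass(frozen=True)
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def filter_blocked(resp: Response) -> Response:
    """Option 1: empty the body but keep status and safelisted headers."""
    kept = {k: v for k, v in resp.headers.items() if k.lower() in CORS_SAFELISTED}
    return Response(status=resp.status, headers=kept, body=b"")


def replace_blocked(resp: Response) -> Response:
    """Option 2: ignore the original entirely and return a generic empty response."""
    return Response(status=GENERIC_STATUS)
```

With the second option, every blocked response is indistinguishable from every other, so neither the status code nor the remaining headers can carry information about the original.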

jakearchibald commented 6 years ago

I don't think the cache API is part of this. If responses are filtered/blocked as part of fetch, then only correctly filtered/blocked responses will go into the cache.

wanderview commented 6 years ago

I don't think the cache API is part of this. If responses are filtered/blocked as part of fetch, then only correctly filtered/blocked responses will go into the cache.

Doesn't this depend on how cache.add() is implemented internally?

Edit: Oh, you mean at the spec level. Never mind.

csreis commented 6 years ago

Regarding range requests:

Thanks for covering the overlap here, @jakearchibald! I agree that CORB overlaps with the attack 4 defense, but only for certain content types, so your original plans still seem relevant.

CORB's goal is to prevent bringing data into the content process, whereas I'm aiming to prevent exposing data to script. CORB is best-effort, with compatibility in mind, whereas I need to strictly avoid exposing opaque data to script.

Correct.

CORB will filter opaque partial responses if they match particular content types. This prevents an audio/video element being used to bring data that's potentially sensitive into the content process.

Correct.

560 prevents Attack 4, where a