CORS, CORP, TAO & "public static resource" metadata

noamr commented 2 years ago

Preliminary reading: Notes on the threat model of cross-origin isolation, https://github.com/w3c/resource-timing/issues/220, https://github.com/w3c/resource-timing/issues/240

Public static resources are resources that can be fetched by any user, and don't contain any user-specific data or metadata. Images served by CDNs are a good example of such resources.

Today, to enable access to these resources by an embedder, the provider needs to:

- Enable CORS for full reads, which requires the embedders to also use CORS;
- and/or enable CORP for embedding it in pages that use SharedArrayBuffer et al.;
- in addition, enable TAO for reading timing information.
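For illustration, here is a minimal TypeScript sketch of the three independent response headers a public static resource has to send today. The header names and values are the real ones; the helper functions `publicStaticResponseHeaders` and `missingOptIns` are hypothetical and only serve to make the "patchwork" concrete.

```typescript
// Hypothetical helpers: the header names/values are real, the functions are not.
type HeaderMap = Record<string, string>;

// The three separate opt-ins a public static resource sends today.
function publicStaticResponseHeaders(): HeaderMap {
  return {
    // CORS: any origin may read the body, but only for non-credentialed
    // requests, so embedders must also opt in (e.g. crossorigin="anonymous").
    "Access-Control-Allow-Origin": "*",
    // CORP: the resource may be embedded by pages using COEP: require-corp,
    // i.e. cross-origin isolated pages that can use SharedArrayBuffer.
    "Cross-Origin-Resource-Policy": "cross-origin",
    // TAO: detailed Resource Timing data is exposed to any origin.
    "Timing-Allow-Origin": "*",
  };
}

// Reports which of the three opt-ins a response is missing.
function missingOptIns(headers: HeaderMap): string[] {
  const lower = new Map(
    Object.entries(headers).map(([k, v]) => [k.toLowerCase(), v] as [string, string]),
  );
  const missing: string[] = [];
  if (!lower.has("access-control-allow-origin")) missing.push("CORS");
  if (lower.get("cross-origin-resource-policy") !== "cross-origin") missing.push("CORP");
  if (!lower.has("timing-allow-origin")) missing.push("TAO");
  return missing;
}
```

Each header has to be configured and reasoned about separately, which is the complexity described below.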

This raises several issues:

The issue with CORS is that it requires sprinkling crossorigin attributes all across the document; apart from this ergonomic issue, it is currently not supported for CSS images.
CORP was originally meant as a feature for protecting against Spectre, not as a carte blanche for exposing metadata about a resource. One of the reasons for CORP in the first place was that the attack surface was limited (a Spectre attack is not trivial). Also, it's unclear where "embeddability" and legibility stop and data access starts. CORP has amorphous boundaries when taken outside the Spectre-protection area.
TAO also has somewhat amorphous boundaries. For example, when we add a new attribute to Resource Timing, how do we know that resources that previously opted in to TAO are OK with exposing this new attribute as well?

This is a complex situation, difficult to understand, and it creates conflicts and doubts every time we propose a new feature that requires some cross-origin opt-in. For public static resources, this complexity seems unwarranted. As the web platform evolves, this might lead to a situation of "an HTTP header per feature", where public static resources require a patchwork of HTTP headers to opt out of privacy protections for resources that are not private in the first place.

I see two approaches to this problem (though there are probably more).

1. Allow public static resources to declare themselves as such. Basically, extend CORP/ACAO or introduce a similar header to say: "I'm public. Always treat me as if I'm CORS." This is very ergonomic, and I believe it would make a lot of sense to web developers. However, it would become somewhat of a back door for replacing CORS, and there's a valid risk of misconfiguration.
2. Make "anonymous CORS by default" a viable and convenient option. If we want more sites to use CORS for everything, we need to make it possible and ergonomic. For example, a meta tag that declares that the document's default request mode is CORS (strawman: <meta name=crossorigin content=anonymous >), and potentially a way to override it in CSS.

Whichever direction we choose, I think reaching consensus on it would be a very valuable outcome.

noamr commented 2 years ago

/cc @annevk @mikewest @yoavweiss @domenic @eeeps Not sure what Artur Janc's GitHub handle is :)

domenic commented 2 years ago

I wonder about a version of (1) where we give Access-Control-Allow-Origin: * special treatment. Basically, it feels like it shouldn't be necessary to sprinkle crossorigin="" attributes everywhere, for simple cases.

Imagine something like:

If a request is sent with implicit mode "no-cors", but:
- Its method is CORS-safelisted
- It only contains CORS-safelisted request headers
- It contains no credentials
And the response comes back such that
- It contains Access-Control-Allow-Origin: *
- It contains no credentials
- Maybe also require it to have X-Content-Type-Options: nosniff?
then treat the response as "cors", at least for some purposes:
- Allow gathering timing info/metadata from it, at least
- Maybe just expose it fully? E.g. to service workers
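The conditions above can be sketched as a single predicate. This is a hypothetical TypeScript illustration of the proposal, not anything specified in Fetch; the type names are made up, and the safelisted-header check is simplified (Fetch also constrains the values of those headers).

```typescript
// Hypothetical sketch of the proposed conditions; nothing here is in Fetch.
interface NoCorsRequest {
  method: string;
  headerNames: string[];
  hasCredentials: boolean; // cookies, HTTP auth or TLS client certs attached
}

interface ResponseSummary {
  headers: Record<string, string | undefined>; // header names lowercased
  setsCredentials: boolean; // e.g. the response carries Set-Cookie
}

const SAFELISTED_METHODS = new Set(["GET", "HEAD", "POST"]);
// Simplified: the real CORS-safelisted check also restricts header *values*.
const SAFELISTED_HEADERS = new Set([
  "accept", "accept-language", "content-language", "content-type", "range",
]);

function treatNoCorsAsCors(
  req: NoCorsRequest,
  res: ResponseSummary,
  requireNosniff = false, // the "maybe" condition from the list above
): boolean {
  const requestOk =
    SAFELISTED_METHODS.has(req.method.toUpperCase()) &&
    req.headerNames.every((h) => SAFELISTED_HEADERS.has(h.toLowerCase())) &&
    !req.hasCredentials;
  const responseOk =
    res.headers["access-control-allow-origin"] === "*" &&
    !res.setsCredentials &&
    (!requireNosniff ||
      (res.headers["x-content-type-options"] ?? "").toLowerCase() === "nosniff");
  return requestOk && responseOk;
}
```

Note that nothing in this check looks at an Origin request header, which is the main delta discussed next.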

The main delta I can see here is that there's no Origin header on the request. So basically we're saying, "if your server responds with Access-Control-Allow-Origin: * without even consulting the Origin header, then your stuff is now treated as public".

I apologize if this is a silly idea that has been discussed in the past. I suspect it has, perhaps back when CORS was originally being invented. In that case I'd be happy to accept a pointer to those discussions, and then we can continue discussing something more realistic, like a document-wide CORS opt-in.

noamr commented 2 years ago

then treat the response as "cors", at least for some purposes:

Allow gathering timing info/metadata from it, at least

Maybe just expose it fully? E.g. to service workers

I lean towards exposing something fully once it's public; otherwise we'll keep having to rule on whether something is data or metadata, and that blurry distinction can lead to leaks.

yoavweiss commented 2 years ago

I really like @domenic's proposal, and think that would be the best outcome. If there are reasons against it, I'd appreciate pointers as well.

/cc @arturjanc @nicjansma

mikewest commented 2 years ago

We've talked about this kind of thing in the past, and got hung up on complexities around reissuing requests without credentials if responses asserted ACAO: * (see https://github.com/w3ctag/design-reviews/issues/76).

Carving out the subset of requests that don't have any credentials might be a reasonable approach (though ensuring that no cookies are sent is not enough, given client certificates, etc.). I worry a bit that the story for developers would be complicated, and their sites would begin catastrophically failing if they accidentally sent a Set-Cookie header along with a response from an origin that was supposed to be cookieless. That seems like it would be quite difficult to guarantee (especially in development environments) and would lead to really confusing experiences, where the site would fail iff you'd been unlucky enough to visit the one misconfigured endpoint.

We've already started defining credentialless modes (https://developer.chrome.com/blog/coep-credentialless-origin-trial/). Perhaps that would be a reasonable thing to extend beyond its current applicability to CORPless responses? (/cc @ArthurSonzogni )

(Relatedly: I've had a bad idea floating around for a while that's relevant here: it would be ideal if we could somehow assert that a given server serves truly public resources, and won't ever have cookies and asserts CORS + TAO + CORP for all its resources. Origin policy was my initial shot at making such a thing possible, but maybe a smaller version would be to come up with a meaningful subdomain prefix (similar to __Host- cookies) that would apply that understanding to everything it served. Like we could make a priori claims that storage was simply inaccessible from https://this-is-truly-public-i-swear.site.example/, and all the usual response headers would be implied. I think there are other things we could do with meaningful names as well, but this might be an interesting use-case to start with.)

yoavweiss commented 2 years ago

I worry a bit that the story for developers would be complicated, and their sites would begin catastrophically failing if they accidentally sent a Set-Cookie header along with a response from an origin that was supposed to be cookieless.

CORP would still be able to override the "no Set-Cookie" restriction, right? Or are you concerned with sites that don't currently opt-into CORP, but would pass COEP restrictions with these new set of rules (unless a Set-Cookie sneaks in)?

We've already started defining credentialless modes (https://developer.chrome.com/blog/coep-credentialless-origin-trial/).

Credentialless requests should definitely also be included in the set of rules we're discussing, and I can see a version of credentialless mode being the opt in discussed in (2). At the same time, (1) seems better from an adoption perspective, as it requires no extra work from developers.

mikewest commented 2 years ago

"Catastrophically failing" was hyperbolic, I grant you. :) My suggestion was simply that "anonymous CORS by default" would only work in those cases where no cookie had been set, and that setting a cookie would change requests' behavior in a way that I think would be confusing for developers (and difficult to recover from).

yoavweiss commented 2 years ago

I agree it would be confusing, but at the same time that may be offset by the adoption benefits and the fact that by default, no work would be needed on developers' behalf.

/cc @philipwalton for opinions on the above tradeoff

arturjanc commented 2 years ago

@mikewest I'm wondering if you may be trying to solve a more general problem than what is necessary for this particular use case. Specifically, given that -- at least in the context of TAO -- the benefits we're discussing apply only to loading cross-origin resources (most of the TAO-gated values are exposed same-origin by default), and that cross-site requests already don't carry cookies by default (and despite things like CHIPS and FPS it's likely that there will be fewer and fewer credentialed cross-site requests), the footgun potential might be fairly small?

In particular, if ACAO: * only exposed metadata (and not the response body) then it doesn't seem catastrophic to me if suddenly sending a credentialed request would remove this metadata. I'd expect this to be rare and recoverable for the application, assuming timing data is generally used in aggregate and resilient to a few missing data points. Of course if we expand this to expose response bodies then it's more likely that developers will start to rely on this and the breakage potential will increase.

Coming back to @domenic's proposal, I think one specific concern that came up in the past is that a response could be cached for a credentialed request and then returned from the cache when an uncredentialed request is made (see the note about Vary: Cookie in @jakearchibald's How to win at CORS). This would leak information for resources that set ACAO: *: an attacker could request them in no-cors mode to put the authenticated response in the cache, then load it from the cache in cors mode and read the metadata. This seems fixable, but it will require some thought.

But, overall, at a high level, COEP: credentialless and anonymous iframes are both largely based on the idea that it's safe-ish to reveal information about responses to uncredentialed requests. It seems fairly reasonable to me to similarly expose timing metadata for uncredentialed requests, especially if we're still gating this behind an explicit opt-in such as ACAO: *.

noamr commented 2 years ago

I like that there's lots of traffic on this conversation but I'd love to hear thoughts from outside Google... @annevk? Someone from Mozilla?

yoavweiss commented 2 years ago

This would leak information for resources that set ACAO: *: an attacker could request them in no-cors mode to put the authenticated response in cache and then load it from cache in cors mode and read the metadata. This seems fixable, but it will require some thought.

Thanks for bringing up this scenario! I vaguely remember @yutakahirano talking about making the cache CORS-mode aware. I think that would fix this particular issue.

smaug---- commented 2 years ago

Possible tweak to domenic's proposal: I think I'd prefer something very explicit, along the lines of Access-Control: public-resource, that doesn't change the behavior of existing stuff. But I don't feel too strongly.

yutakahirano commented 2 years ago

Our implementation has separate buckets for credentialized and non-credentialized requests, does this address your concern? (@ArthurSonzogni made this.)
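To make the cache-poisoning scenario and the "separate buckets" fix concrete, here is a minimal TypeScript sketch of a cache whose key includes whether the request carried credentials. It is an illustrative assumption of how such partitioning could look, not Chromium's actual implementation; `PartitionedCache` is a hypothetical name.

```typescript
// Hypothetical cache split into credentialed and uncredentialed buckets.
class PartitionedCache<V> {
  private readonly store = new Map<string, V>();

  // The bucket is part of the key, so the two kinds of entries never collide.
  private static key(url: string, credentialed: boolean): string {
    return `${credentialed ? "credentialed" : "uncredentialed"}|${url}`;
  }

  put(url: string, credentialed: boolean, value: V): void {
    this.store.set(PartitionedCache.key(url, credentialed), value);
  }

  get(url: string, credentialed: boolean): V | undefined {
    return this.store.get(PartitionedCache.key(url, credentialed));
  }
}
```

With this keying, an attacker who primes the cache with a credentialed no-cors load only populates the credentialed bucket; a later uncredentialed cors load misses and goes to the network. This only covers the local browser cache, not intermediary caches.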

Regarding the original proposal,

It contains no credentials

This includes both directions (e.g., Cookie and Set-Cookie), right? When a redirect happens, shouldn't any redirect hop include credentials to enable this?

pshaughn commented 2 years ago

Access-Control: public-resource

I agree with @smaug----'s idea here. A brand-new header or a brand-new value for an old header would make sure it's an explicit opt-in. A new use for a value on an old header might cause problems for someone who was being sloppy about headers in a way that was harmless before now.

noamr commented 2 years ago

Our implementation has separate buckets for credentialized and non-credentialized requests, does this address your concern? (@ArthurSonzogni made this.)

Regarding the original proposal,

It contains no credentials

This includes both directions (e.g., Cookie and Set-Cookie), right?

Right

When a redirect happens, shouldn't any redirect hop include credentials to enable this?

I don't think that's necessary. If "public static" behaves like "ACAO: * + TAO: *", then redirects should behave the same way they do today with respect to these headers, and if the redirect is also "public static" then it also implies having no credentials. In other words, "public static" means ACAO + TAO + no credentials for this response, whether it's the final response or a redirect. But maybe I'm skipping some detail.

annevk commented 1 year ago

(Some assorted thoughts and additional context.)

The main reason CORS requires opt-in is credentials (cookies, HTTP authentication, and TLS client certificates). Those are included by default without CORS, and with CORS they are an option the requestor can set. (And yeah, an unfortunate problem here is caches. Some folks would support at least locally segmenting the cache, whereas others do not deem that a great solution due to the existence of intermediary caches.)

And the main reason CORS has different requirements for * (i.e., it only works for non-credentialed requests) is that we were quite worried about cargo culting. At the time many websites had made themselves vulnerable through Adobe Flash's crossdomain.xml. (CORS is more limited in that it applies to a single resource, but it's quite easy to set a header for multiple resources at once.)

I think the requirements around redirects will remain valid. The redirector might not be aware that the eventual resource is "public" and could therefore inadvertently reveal information.

CORS enables read access. CORP enables read access by a Spectre attacker in current generation CPUs. TAO enables timing information.

I've previously suggested TAO could be a subset of CORS, but some people had concerns about that. Given that CORS itself also has more detailed opt-in for various header metadata (-Expose-Headers comes to mind) maybe that is fair and they should not be coupled. It depends a bit on the information involved I suppose.

Having read this thread it's not entirely clear to me what we want here.

Is the ask something CORS-like without requester opt-in?
Is the ask something CORP/TAO-like but broader?

For the first, something along the lines of what @smaug---- suggested seems reasonable, provided redirects also opt in. This has the danger of cargo culting, as the server would no longer be required to echo the Origin header value, making it very easy to deploy (incorrectly). (This is mitigated somewhat by what @arturjanc mentioned above, that cross-site credentials are on the way out. But we're talking cross-origin here, and those credentials won't go away anytime soon, I think.)

noamr commented 1 year ago

Thanks for the explanation, @annevk, I understand some of the context better.

Is the ask something CORS-like without requester opt-in?

Is the ask something CORP/TAO-like but broader?

I think CORS+TAO without requester opt-in would be one solution. The problem with the opt-in is that it requires a lot of markup, some of which is not currently available (e.g. CSS images). The alternative to providing such a header would be to make it possible and easy to use CORS for everything by default, including the CSS bits, making no-cors the opt-out.

For the first, something along the lines of what @smaug---- suggested seems reasonable, provided redirects also opt in.

Yes, IMO the entire chain of redirects would have to opt in to being public.
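A minimal TypeScript sketch of the "entire chain must opt in" rule, assuming each hop simply records whether it sent the (still unnamed) public-resource opt-in. `Hop` and `chainIsPublic` are hypothetical names; the point is that one non-public hop taints the whole chain, since that redirector may not know the final resource is public.

```typescript
// Hypothetical: each hop records whether it carried the public-resource opt-in.
interface Hop {
  url: string;
  optedInAsPublic: boolean;
}

// The final response is treated as public only if every hop, including each
// redirect, opted in. An empty chain is never public.
function chainIsPublic(hops: Hop[]): boolean {
  return hops.length > 0 && hops.every((hop) => hop.optedInAsPublic);
}
```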

This has the danger of cargo culting, as the server would no longer be required to echo the Origin header value, making it very easy to deploy (incorrectly). (This is mitigated somewhat by what @arturjanc mentioned above, that cross-site credentials are on the way out. But we're talking cross-origin here, and those credentials won't go away anytime soon, I think.)

Perhaps preventing cargo culting here is a matter of naming? Access-Control: public is pretty hard to misinterpret. Also, if a resource is indeed a public resource, what's the harm in sending those cookies? They would be sent anyway if this was a regular no-cors request. We might even want to disallow certain response headers to help with this (e.g. Set-Cookie, Vary: Cookie) and limit this to actual public and perhaps even immutable resources.

domenic commented 1 year ago

We discussed this at TPAC in the WebAppSec session, including a bit during the break. Here's my attempt to summarize where I think we landed:

There was a strong desire that the solution work (a) without requester opt-in; (b) on more than just cookieless destination origins.

The most promising path forward in that regard is a new response header. This header is a promise that the resource is public and non-customized. It means that even though the request was sent with mode = "no-cors" / credentials mode = "include", the server is OK with exposing its data as if it were mode = "cors" / credentials mode = "omit". I.e., any origin can fetch() the data, use it in a subresource request, etc.

This approach has a footgun, which is that the server might be wrong. If the server applies this header to a response but actually does vary the response based on credentials, then the server has created a security hole. CORS attempts to prevent this footgun by requiring the server to reflect the Origin header value in Access-Control-Allow-Origin, and to send Access-Control-Allow-Credentials: true, when the credentials mode is "include". Our new header would effectively remove this protection. During the break, people seemed maybe comfortable with this.
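For reference, the protection being removed is the Fetch "CORS check". Below is a simplified TypeScript transcription of its steps; the function shape is an assumption, and it skips details such as header-value parsing. It shows why `*` alone never unlocks a credentialed response, which is exactly the guarantee the new header would bypass.

```typescript
type CredentialsMode = "omit" | "same-origin" | "include";

// Simplified transcription of the Fetch "CORS check" steps.
function corsCheck(
  requestOrigin: string, // serialized origin, e.g. "https://a.example"
  credentialsMode: CredentialsMode,
  responseHeaders: Record<string, string | undefined>, // names lowercased
): boolean {
  const allowOrigin = responseHeaders["access-control-allow-origin"];
  if (allowOrigin === undefined) return false;
  // "*" is only honored when credentials are not included.
  if (credentialsMode !== "include" && allowOrigin === "*") return true;
  // Otherwise the server must reflect the exact requesting origin...
  if (allowOrigin !== requestOrigin) return false;
  if (credentialsMode !== "include") return true;
  // ...and, for credentialed requests, explicitly allow credentials.
  return responseHeaders["access-control-allow-credentials"] === "true";
}
```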

The effects of this response header would be:

Exposes the response body in full. (As if the response type were "cors".)
Exposes all timing metadata currently exposed by TAO. This would include exposing the metadata of the response (e.g. transfer size) on the main resource of an iframe load.
Allows the resource to pass the CORP check (as if CORP: cross-origin).

This header mainly affects the behavior of "no-cors" subresource fetches, and somewhat affects the behavior of "navigate" iframe fetches in terms of the metadata exposure. It barely affects "cors" requests (e.g. fetch()): for those, it can just serve as another spelling for Access-Control-Allow-Origin: *.

Things which were not discussed:

Does this expose all response headers, as if Access-Control-Expose-Headers: *? (This would affect fetch() as well.)
Does this allow fetch(request, { mode: "no-cors" }) to read back the body, or is it only "no-cors" subresource requests?
Exact spelling of the new header.
Any extra footgun mitigations, such as those @noamr's last message mentions.

jeremyroman commented 1 year ago

Another TBD from my perspective: whether this should also imply Cache-Control: public (which is essentially the inverse of Vary: Authorization) because that's also implied by the claim "this is a truly public resource, the same for everyone", or whether it's an assertion about security properties only.

yoavweiss commented 1 year ago

We continued the discussion at WebAppSec and in a breakout session.

Summary:

There are significant concerns around the footgun nature of this: it's easy to misconfigure a server and apply a header to a larger range of resources than intended.
Mitigations can include ignoring this header when "mixed signals" are present (e.g. Cache-Control: Private) or only taking it into account when Cache-Control: public is present.
We want the name to convey that this is potentially unsafe and bypasses security mechanisms.
We should restrict this to secure contexts.
To avoid accidental broad application, we could require that the header reflects the requested path (similar to how credentialed CORS has to reflect the origin), to ensure the application is intentional.
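A hypothetical TypeScript sketch combining the mitigations above: secure contexts only, path reflection, and ignoring the opt-in on "mixed signals" such as Cache-Control: private (optionally requiring Cache-Control: public). The opt-in header's name and value format are undecided, so `optInValue` here is an assumption: the value of some future header that must equal the requested path.

```typescript
// Hypothetical inputs; the opt-in header name and format are not decided.
interface OptInInput {
  isSecureContext: boolean;
  requestPath: string; // e.g. "/img/logo.png"
  optInValue: string | undefined; // value of the hypothetical opt-in header
  cacheControl: string | undefined;
  requireCacheControlPublic: boolean; // the stricter of the two variants
}

// Extracts directive names, e.g. "public, max-age=60" -> {"public", "max-age"}.
function cacheDirectives(value: string | undefined): Set<string> {
  return new Set(
    (value ?? "")
      .split(",")
      .map((d) => d.split("=")[0].trim().toLowerCase())
      .filter((d) => d.length > 0),
  );
}

function optInIsEffective(input: OptInInput): boolean {
  if (!input.isSecureContext) return false;
  // Path reflection: a blanket site-wide header value has no effect.
  if (input.optInValue !== input.requestPath) return false;
  const directives = cacheDirectives(input.cacheControl);
  // "Mixed signals": a response that says it is private is not public.
  if (directives.has("private")) return false;
  if (input.requireCacheControlPublic && !directives.has("public")) return false;
  return true;
}
```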

annevk commented 1 year ago

I would appreciate someone briefly describing some scenarios this would enable. Because typically the requestor needs to know about CORS upfront (e.g., when wanting to read from <canvas>).

And also what kind of changes to Fetch are envisioned as part of this.

noamr commented 1 year ago

I would appreciate someone briefly describing some scenarios this would enable. Because typically the requestor needs to know about CORS upfront (e.g., when wanting to read from <canvas>).

And also what kind of changes to Fetch are envisioned as part of this.

I will describe that when back from leave, alongside a draft patch to Fetch.

noamr commented 1 year ago

I would appreciate someone briefly describing some scenarios this would enable. Because typically the requestor needs to know about CORS upfront (e.g., when wanting to read from <canvas>).

A good example of a scenario this would enable at first is exposing meta-data that is traditionally CORS-protected without the need for CORS-negotiation - the server can opt out of this protection. Two examples:

decodedBodySize, once that becomes CORS-protected rather than TAO-protected (see PR https://github.com/whatwg/fetch/pull/1556).
image-orientation CSS property (@annevk remember the super-long discussions we had about this ~3 years ago)

And also what kind of changes to Fetch are envisioned as part of this.

I wanted to reach a consensus on the overall direction first, but my thoughts are that if the header is valid:

Response tainting will be "cors" instead of "opaque" for public resources that were fetched with no-cors.
TAO check will return success
CORP check will return allowed
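The three changes can be sketched as one hypothetical TypeScript function that adjusts what Fetch would otherwise compute. The value names mirror Fetch's response tainting ("basic"/"cors"/"opaque"), TAO check (success/failure) and CORP check (allowed/blocked); the function itself and the `headerIsValid` flag are illustrative assumptions, not proposed spec text.

```typescript
type ResponseTainting = "basic" | "cors" | "opaque";

interface CrossOriginOutcome {
  tainting: ResponseTainting;
  taoCheck: "success" | "failure";
  corpCheck: "allowed" | "blocked";
}

// Hypothetical: applied after Fetch has computed its usual outcome.
function applyPublicResourceHeader(
  requestMode: "no-cors" | "cors" | "same-origin" | "navigate",
  headerIsValid: boolean,
  current: CrossOriginOutcome,
): CrossOriginOutcome {
  if (!headerIsValid) return current;
  return {
    // Only opaque no-cors responses change; cors fetches are already "cors".
    tainting:
      requestMode === "no-cors" && current.tainting === "opaque" ? "cors" : current.tainting,
    taoCheck: "success",
    corpCheck: "allowed",
  };
}
```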

noamr commented 1 year ago

Alternate-ish proposal:

To mitigate concerns about people applying this too broadly, another approach came to mind: instead of a new response header, a new request header that we send with no-cors requests. Let's call it nonce-origin. The idea is that nonce-origin would be a randomly generated string/hash thingy that the browser would send with no-cors requests. The server would have to respond with Access-Control-Allow-Origin: [the-same-nonce-origin]. No support for '*'.
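A minimal TypeScript sketch of the nonce-origin handshake. Only the strawman name nonce-origin comes from the proposal; the 16-byte minimum and the hex encoding are assumptions. The caller supplies the random bytes (in a browser, e.g. from crypto.getRandomValues), which keeps the sketch free of platform-specific APIs.

```typescript
const MIN_NONCE_BYTES = 16; // assumption: at least 128 bits of randomness

// Encodes caller-supplied random bytes as the nonce-origin value (hex).
function makeNonceOrigin(randomBytes: Uint8Array): string {
  if (randomBytes.length < MIN_NONCE_BYTES) throw new Error("nonce too short");
  return Array.from(randomBytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// The response must echo the exact nonce in Access-Control-Allow-Origin;
// "*" is deliberately not accepted.
function nonceOriginMatches(sentNonce: string, allowOrigin: string | undefined): boolean {
  return allowOrigin !== undefined && allowOrigin !== "*" && allowOrigin === sentNonce;
}
```

Because every response is tied to a per-request nonce, a response stored in a shared cache could not be reused for another requester without the server's reply being regenerated.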

This means that instead of the server saying "This resource is public", it says "I'm serving this resource to this particular requester, even though I don't know who it is" (which is essentially the same thing).

Perhaps we can still add this to a new response header, as a way to include Timing-Allow-Origin and CORP under the same umbrella.

annevk commented 1 year ago

That sounds tricky/bad for caching. Not the kind of semantics you would want for something truly public.

noamr commented 1 year ago

I thought about this further. I think the main and perhaps only use case for this is images. That's because:

It's easier to apply CORS to fonts/scripts/style resources in the embedding document
CORS doesn't apply to iframes/objects anyway
All new features use CORS by default; no-cors is legacy

So the only place where "CORSing all the things" is not feasible is images - mainly because of CSS ergonomics and the fact that images are both widely used and a very old feature.

So instead of requiring developers to jump through hoops, I suggest we instead limit the scope (at least in the beginning) to images only.

The header could look like:

Access-Control: unsafe-public;image (where image has to match the request's destination, but only supports images at first). This would also mitigate some of the other concerns:

Access-Control-Allow-Headers is irrelevant
We use the destination instead of the path, so applying it widely to the whole site is less of an issue
Fenced frames and everything to do with iframes is irrelevant

We can also still include the following mitigations:

Can't use Vary: Cookie
Can't use Cache-Control: no-store
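The two extra mitigations are easy to check mechanically. Here is a hypothetical TypeScript sketch that lists which of them a response violates; under this proposal a non-empty result would mean the unsafe-public opt-in is ignored. The function name and the simple comma-splitting of header values are assumptions.

```typescript
// Hypothetical: returns the mitigations a response violates (names lowercased).
function mitigationViolations(headers: Record<string, string | undefined>): string[] {
  // Splits a comma-separated header into lowercased tokens, dropping "=value" parts.
  const tokens = (value: string | undefined): string[] =>
    (value ?? "")
      .split(",")
      .map((t) => t.split("=")[0].trim().toLowerCase())
      .filter((t) => t.length > 0);

  const violations: string[] = [];
  if (tokens(headers["vary"]).includes("cookie")) violations.push("Vary: Cookie");
  if (tokens(headers["cache-control"]).includes("no-store")) {
    violations.push("Cache-Control: no-store");
  }
  return violations;
}
```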

@arturjanc @camillelamy @yoavweiss @annevk ?

noamr commented 1 year ago

Strawman proposal for how this would work: https://github.com/whatwg/fetch/pull/1617

camillelamy commented 1 year ago

Sorry, I haven't looked at that in a while. Just so that I follow, this is a response header right?

noamr commented 1 year ago

Sorry, I haven't looked at that in a while. Just so that I follow, this is a response header right?

Right.

annevk commented 1 year ago

If we're so close to "just CORS" I'd rather figure out if we can get there somehow. Rather than introducing response-driven CORS just for images. E.g., by forcing CORS fetches to happen for a document or style sheet or some such.

noamr commented 1 year ago

If we're so close to "just CORS" I'd rather figure out if we can get there somehow. Rather than introducing response-driven CORS just for images. E.g., by forcing CORS fetches to happen for a document or style sheet or some such.

There is no viable "just CORS" proposal on the table at the moment. If there were, I'd be happy to discuss it... But I think a server-side solution that starts with images, and is perhaps later expanded to video etc., is the most realistic.

annevk commented 1 year ago

Well we have "require-corp". It wouldn't be out of the question to add a CORS-equivalent.

noamr commented 1 year ago

Well we have "require-corp". It wouldn't be out of the question to add a CORS-equivalent.

This assumes that all the resources in the page would have to use CORS, and needs special semantics for anonymous/use-credentials.

I think to do this properly we would need something like this (bikeshedding), but I'm not sure if it will get adopted:

<script type="fetchrules">
  [{
    "origins": ["*.somecdn.com"],
    "destinations": ["image", "video"],
    "crossorigin": "anonymous"
  }]
</script>
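To show what a user agent might do with such rules, here is a hypothetical TypeScript interpreter for the bikeshed fetchrules JSON above. The matching semantics are assumptions: "*.somecdn.com" matches subdomains of somecdn.com (not the bare domain), and the first matching rule wins.

```typescript
// Mirrors the shape of the bikeshed JSON; semantics are assumptions.
interface FetchRule {
  origins: string[]; // host patterns such as "*.somecdn.com"
  destinations: string[]; // Fetch destinations such as "image" or "video"
  crossorigin: "anonymous" | "use-credentials";
}

function hostMatches(pattern: string, host: string): boolean {
  if (pattern.startsWith("*.")) {
    const suffix = pattern.slice(1); // keeps the leading dot: ".somecdn.com"
    return host.endsWith(suffix) && host.length > suffix.length;
  }
  return pattern === host;
}

// Returns the CORS mode to use, or null to keep today's no-cors default.
function crossOriginFor(
  rules: FetchRule[],
  host: string,
  destination: string,
): FetchRule["crossorigin"] | null {
  for (const rule of rules) {
    if (
      rule.destinations.includes(destination) &&
      rule.origins.some((pattern) => hostMatches(pattern, host))
    ) {
      return rule.crossorigin;
    }
  }
  return null;
}
```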

noamr commented 1 year ago

@annevk how about something equivalent to referrer policies? <meta name="crossorigin" content="anonymous;origin=https://www.example.com">

annevk commented 1 year ago

I'd rather we avoid adding more mutable policies, especially for fetching.

noamr commented 1 year ago

I'd rather we avoid adding more mutable policies, especially for fetching.

Understood. So I don't see how "CORSing all the things" would work. I'm afraid an all-encompassing require-cors would not get adopted; it needs to be more granular than that, and it shouldn't require embedders to add a new header.

IMO it needs to be either a simple thing in the HTML, or something server-side on the embedded resource's side (as proposed in https://github.com/whatwg/html/issues/8143#issuecomment-1457636957).

noamr commented 1 year ago

After an internal conversation with @arturjanc and @yoavweiss, I would like to push forward a proposal that looks something like https://github.com/whatwg/html/issues/8143#issuecomment-1457636957.

Strawman header: Cross-Origin-Protections: Treat-As-Same-Origin;destination=image or Access-Control: unsafe-public;destination=image

When a resource has this header, and the destination matches both the request's destination and a list of allowed destinations (currently only 'image'), Fetch would treat the response as if it were fetched from the same origin, for the purposes of CORP, CORS and TAO.
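A hypothetical TypeScript sketch of the rule just described: parse either strawman header value, then require that the declared destination equals the request's destination and is on the allowlist. Hand-splitting on ";" is a simplification; a real header would more likely be defined as a Structured Field.

```typescript
// Currently only images, per the strawman.
const ALLOWED_DESTINATIONS = new Set(["image"]);
// Both strawman tokens: "Treat-As-Same-Origin" and "unsafe-public".
const OPT_IN_TOKENS = new Set(["treat-as-same-origin", "unsafe-public"]);

// Returns the declared destination, or null if the value isn't an opt-in.
function parseDeclaredDestination(value: string): string | null {
  const [token, ...params] = value.split(";").map((p) => p.trim());
  if (!OPT_IN_TOKENS.has(token.toLowerCase())) return null;
  for (const param of params) {
    const eq = param.indexOf("=");
    if (eq < 0) continue;
    if (param.slice(0, eq).trim().toLowerCase() === "destination") {
      return param.slice(eq + 1).trim().toLowerCase();
    }
  }
  return null;
}

function treatAsSameOrigin(headerValue: string | undefined, requestDestination: string): boolean {
  if (headerValue === undefined) return false;
  const declared = parseDeclaredDestination(headerValue);
  return (
    declared !== null &&
    declared === requestDestination &&
    ALLOWED_DESTINATIONS.has(declared)
  );
}
```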

annevk commented 1 year ago

Given that we recently introduced https://mimesniff.spec.whatwg.org/#minimize-a-supported-mime-type for Resource Timing to minimize passive information channels I'm even less convinced this is a good idea. At least forcing the requestor to set some kind of policy turns it into an active information channel.

noamr commented 1 year ago

Given that we recently introduced https://mimesniff.spec.whatwg.org/#minimize-a-supported-mime-type for Resource Timing to minimize passive information channels I'm even less convinced this is a good idea. At least forcing the requestor to set some kind of policy turns it into an active information channel.

I appreciate the general concern around 3p origins passing each other information via a passive info channel. This should be a consideration when we introduce new channels, and we should still be careful when, for example, adding new attributes to resource timing. When introducing these channels, vendors can usually further restrict them for same-origin like WebKit does for server timing and encodedBodySize.

In this case, though, we're not introducing new channels. If a 3pA-to-3pB side channel existed for CORS-same-origin images, it would already be exposed today by using <img crossorigin=anonymous>.

Can you elaborate on what kind of passive channel this adds?

annevk commented 1 year ago

When crossorigin is used I would no longer classify it as passive. (Both sides have to do something.) Which is why I argue for something in that general direction.

yoavweiss commented 1 year ago

Given that we recently introduced https://mimesniff.spec.whatwg.org/#minimize-a-supported-mime-type for Resource Timing to minimize passive information channels I'm even less convinced this is a good idea

I don't think we should use the fact we introduced that as an indication that there's consensus on the (undocumented) threat model that this is protecting against. I was supportive of that change to ResourceTiming as it didn't harm the actual use case, but that doesn't mean that I'd be similarly supportive in cases where there is a conflict between that threat model and useful functionality.

annevk commented 1 year ago

I'm not claiming there is consensus though? Let's document it here as it's quite simple (I thought we did this before, but maybe not):

Given a website A, embedding a cross-site image B and a cross-site (to both) script C, B shouldn't be able to broadcast all kinds of information for C to consume without the explicit cooperation from A.

yoavweiss commented 1 year ago

It's probably best to spin off this conversation to a separate issue, but it's not 100% clear to me how a crossorigin attribute on an image can be considered "explicit cooperation from A", when C can very easily inject an image tag with that attribute.

whatwg/html issue #8143