whatwg / fetch

Fetch Standard
https://fetch.spec.whatwg.org/

Vary HTTP cache on credentials mode #307

Open annevk opened 8 years ago

annevk commented 8 years ago

See https://bugzilla.mozilla.org/show_bug.cgi?id=1274555 by @ziyunfei for details. Best to make this explicit.

annevk commented 8 years ago

@mnot any ideas on how to best do this? Basically say that aside from the request we also use some credentials boolean as a cache key?
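A minimal sketch of what "some credentials boolean as a cache key" could look like. The class and field names are hypothetical and do not come from the Fetch Standard or RFC 7234; a real HTTP cache also handles freshness, validation, and Vary, none of which is modelled here.

```typescript
// Hypothetical shape of a stored response; not a type from any spec.
interface StoredResponse {
  body: string;
  allowOrigin: string | null; // value of Access-Control-Allow-Origin, if any
}

class CredentialAwareCache {
  private entries = new Map<string, StoredResponse>();

  // The key combines the URL with whether credentials were sent.
  private key(url: string, withCredentials: boolean): string {
    return (withCredentials ? "cred " : "anon ") + url;
  }

  store(url: string, withCredentials: boolean, response: StoredResponse): void {
    this.entries.set(this.key(url, withCredentials), response);
  }

  lookup(url: string, withCredentials: boolean): StoredResponse | undefined {
    return this.entries.get(this.key(url, withCredentials));
  }
}

const credCache = new CredentialAwareCache();

// A credentialed load stores its response.
credCache.store("https://example.com/data", true, { body: "private", allowOrigin: "*" });

// A later request without credentials misses instead of reusing that entry.
const anonLookup = credCache.lookup("https://example.com/data", false);
const credLookup = credCache.lookup("https://example.com/data", true);
```

With this keying the same URL can hold two independent entries, one per credentials state, at the cost of a lower hit rate.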

mnot commented 8 years ago

Yeah. I'll try to rough something in (kick me if it takes too long).

annevk commented 8 years ago

Note that #340 is also outstanding. I'll leave both for a bit to find out your approach. If you lack the time I guess I'll study the HTTP cache specification in a bit more detail and have another go at the whole thing.

mnot commented 7 years ago

Thinking out loud:

I'd really like to fix this for all caches, not just browser caches -- although it might be that the first step is spec'ing it in Fetch, it'd be great if the approach taken can be backported to 7234.

I don't think using credentials flag or even credentials mode is adequate. For example, with either, if I browse using two different client certs or session cookies, there is going to be intermingling between them in the cache.

Instead, the actual credential has to be part of the cache key.

That means:

The sticky part here is cookies; if we start making them part of the cache key for all responses, cache efficiency is going to bomb, because lots of cookies aren't credentials.

Flagging content with credentials (e.g., with a new Set-Cookie parameter, or using the Key header) probably won't be sufficient to meet the security requirements here.

Doing the opposite -- flagging responses whose cookies aren't credentials -- might work. Cache-Control: public sort of works this way already for the Authorization header -- WDYT?
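One possible reading of the idea above, as a rough sketch only: a stored response is reused when the actual credentials match, and cookies are ignored only if the response was flagged as not carrying credentials. The `cookiesAreNotCredentials` flag and all type names are hypothetical; no such flag exists in HTTP today.

```typescript
// Hypothetical description of the credentials attached to a request.
interface RequestCredentialsSketch {
  cookie: string | null;
  authorization: string | null;
  clientCertFingerprint: string | null;
}

// Hypothetical cache entry that remembers which credentials produced it.
interface CredentialedEntry {
  url: string;
  credentials: RequestCredentialsSketch;
  cookiesAreNotCredentials: boolean; // assumed opt-out signal from the server
  body: string;
}

function canReuse(
  entry: CredentialedEntry,
  url: string,
  incoming: RequestCredentialsSketch
): boolean {
  if (entry.url !== url) return false;
  if (entry.credentials.authorization !== incoming.authorization) return false;
  if (entry.credentials.clientCertFingerprint !== incoming.clientCertFingerprint) return false;
  // Cookies count as credentials unless the response opted out.
  return entry.cookiesAreNotCredentials || entry.credentials.cookie === incoming.cookie;
}

const sessionA: RequestCredentialsSketch = { cookie: "sid=a", authorization: null, clientCertFingerprint: null };
const sessionB: RequestCredentialsSketch = { cookie: "sid=b", authorization: null, clientCertFingerprint: null };

const strictEntry: CredentialedEntry = {
  url: "https://example.com/", credentials: sessionA, cookiesAreNotCredentials: false, body: "A's page",
};
const optedOutEntry: CredentialedEntry = {
  url: "https://example.com/logo", credentials: sessionA, cookiesAreNotCredentials: true, body: "logo",
};

const crossSessionReuse = canReuse(strictEntry, "https://example.com/", sessionB);
const sameSessionReuse = canReuse(strictEntry, "https://example.com/", sessionA);
const optedOutReuse = canReuse(optedOutEntry, "https://example.com/logo", sessionB);
```

The opt-out default is what keeps the two sessions from intermingling while still letting flagged responses be shared.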

annevk commented 7 years ago

The main invariant we are trying to protect here is that if you load something with credentials that has Access-Control-Allow-Origin: * due to navigation or some such, that response does not end up reused by a fetch() for the same URL without credentials (because then the CORS check would suddenly pass while it should not).
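To make the invariant concrete, here is a simplified sketch of the CORS check applied to a response taken from a cache keyed on URL alone. The check is a condensed version of the one in the Fetch Standard; the cache, the URLs, and the type names are assumptions for illustration.

```typescript
type CredMode = "omit" | "same-origin" | "include";

interface CachedCorsResponse {
  body: string;
  allowOrigin: string | null; // Access-Control-Allow-Origin
  allowCredentials: boolean;  // true if Access-Control-Allow-Credentials: true
}

// Condensed CORS check: `*` only passes when credentials are not included.
function corsCheck(requestOrigin: string, mode: CredMode, res: CachedCorsResponse): boolean {
  if (res.allowOrigin === null) return false;
  if (mode !== "include" && res.allowOrigin === "*") return true;
  if (res.allowOrigin !== requestOrigin) return false;
  if (mode !== "include") return true;
  return res.allowCredentials;
}

// Hypothetical cache keyed on URL only, with no credentials in the key.
const urlOnlyCache = new Map<string, CachedCorsResponse>();

// A navigation (credentials sent) stores a user-specific response with ACAO: *.
urlOnlyCache.set("https://example.com/profile", {
  body: "data for the signed-in user",
  allowOrigin: "*",
  allowCredentials: false,
});

const cachedHit = urlOnlyCache.get("https://example.com/profile");

// Cross-origin fetch without credentials: the stored response passes the check.
const leaked = cachedHit !== undefined && corsCheck("https://other.example", "omit", cachedHit);

// The same response would be rejected for a credentialed cross-origin fetch.
const rejectedWithCredentials =
  cachedHit !== undefined && !corsCheck("https://other.example", "include", cachedHit);
```

The credentialed request is blocked as intended, but the non-credentialed one receives a body that was generated with credentials, which is the reuse this issue wants to rule out.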

If the user has multiple identities they are better off using something like https://blog.mozilla.org/tanvi/2016/06/16/contextual-identities-on-the-web/ I think, although offering servers more tools to more cleanly separate identities seems like a useful endeavor as well, but I think that's somewhat separate from this distinction here, which mainly follows from CORS.

mnot commented 7 years ago

preview

mnot commented 7 years ago

OK, but I don't see how this still isn't a concern -- a response that was fetched with credentials can be cached upstream (whether that cache is on the same box, a network proxy, a reverse proxy/CDN or an origin server cache) and reused for a request without credentials, without any knowledge by Fetch.

Developers often assume that the only cache is the browser cache because TLS, but there are still CDN and origin caches, to say nothing of MitM caches.

annevk commented 7 years ago

Those would certainly need to take this into account as well and I don't object to trying to move recommendations up the stack. In general we should offer better advice for servers or, better, get server folks involved and let them write it themselves.

mnot commented 7 years ago

This is pretty embedded in the caching logic, I'd rather not reproduce that much of 7234 in Fetch. Will try to do it with a textual note; see my branch.

mnot commented 7 years ago

I've written #496 to use the credentials flag, and the corresponding test to change the credentials mode (but nothing else about the request). From that, it appears that Firefox implements, but Chrome and Safari TP do not.

I tried actually adding a cookie (using document.cookie); Chrome and Safari TP don't emit the cookie on the wire when credentials are supposed to be omitted, but they do reuse the stored response to a request that did include them. Gulp.

annevk commented 7 years ago

Yeah, I think that's a known security bug in Chrome and Safari. 😟

But we should probably be extra careful and file potential duplicates to make sure they know about it.

annevk commented 7 years ago

Thanks for testing that!

sleevi commented 7 years ago

@annevk That's a really strong claim in a public tracker, so if you believe there is a security bug, I'd appreciate you filing it as such.

I'm concerned, however, with this approach - it appears that only Firefox implements this. Shouldn't we then be discussing whether or not it's desirable and/or whether Firefox should change? I'm not sure about an argument from spec that Chrome/Safari should implement something that apparently only Firefox has implemented, and about which Chrome/Safari have not (at least, to my knowledge) given public signals.

annevk commented 7 years ago

@sleevi I'm pretty certain @jakearchibald already filed it and it was discussed in public previously. (In any event 90 days have long passed.)

annevk commented 7 years ago

Basically without this any site that uses Access-Control-Allow-Origin: * and doesn't use Vary for any credentials is vulnerable (assuming the browser in question supports Vary adequately for such scenarios to begin with). I don't think that's a responsible setup, especially as we require sites to go to great lengths to expose credentialed data through CORS precisely because such data is so sensitive.

sleevi commented 7 years ago

@annevk When you state "responsible setup" are you suggesting on the user agent or the server?

As @jakearchibald and @igrigorik know, there's a non-trivial amount of developer pain caused simply by the inability to reuse the same sockets due to the separation - not ideal, but perhaps a necessary evil relative to the risks to privacy or security. However, I'm having a difficult time understanding and articulating the risks here for this, especially compared to the developer pain it can/will no doubt induce, and that makes it difficult for me to find other folks on the Chrome team to engage on this issue, so I'm trying to make sure I understand.

I'm trying to understand if this is conceptually different in risk scenarios to a site having a lax CSP policy or XSSing itself. The knob to control things - A-C-A-O - is being turned up to 11, and that's on the site operator.

Personally, I'm fairly against the notion that we would need to differentiate on credentials, because I'm trying to see how/why using that logic we wouldn't end up with trying to vary the cache with every client certificate used - that is, if you use CC A, it can't use the cache for resources obtained with CC B. For some usages, that would 'effectively' disable meaningful caching if the client cert rotates on a frequent basis (e.g. 8 hours).

annevk commented 7 years ago

As I said above to @mnot, different client certificates don't matter, the site is protected in that case unless they opted into CORS with credentials, which is highly unlikely. The main problem is returning a cached resource that was the result of a credentialed request to a non-credentialed CORS request.

(There's also advocacy, including my own from before I knew about this issue in Chrome/WebKit and the standard still suggests it, that including ACAO: * is safe as long as it's not a private resource of the intranet kind. And personally I think that should be safe, since it's such a trivial thing to do and so many sites already do it.)

sleevi commented 7 years ago

@annevk I'm still struggling to understand why it's "unsafe", at least with respect to the fact that the cached entry has Access-Control-Allow-Origin: *

For the sake of my own stupidity, could you maybe illustrate a request/response pair (or set of) that you think would be affected, and highlight where you think the site operator doesn't already have a sufficient degree of control?

annevk commented 7 years ago

Would it help if I said that it should be safe to include Access-Control-Allow-Origin: * even if the resource's contents end up being different when passed credentials?

sleevi commented 7 years ago

@annevk Apologies for my own ignorance, but no, because I'm still struggling with working through the 'unsafe' aspect. I'm trying to help make sure this gets prioritized appropriately for Chrome - and that we either express positive or negative signals - but I'm struggling with understanding the 'negative' scenario.

That is, I'm imagining this scenario

http://example.com/ - without the credentials flag set, returns no ACAO header, sets no Vary header, and just says "Anonymous"

http://example.com/ - with the credentials flag set, returns Access-Control-Allow-Origin: * header, sets no Vary header, and says "Authenticated"

Now, we have same-origin and cross-origin accesses to these resources, with and without credentials.

What I'm trying to understand is why the bug isn't with the omission of Vary: Cookie, Authorization. I'm not sure your phrasing in https://github.com/whatwg/fetch/issues/307#issuecomment-285113449 is what I'd agree with, if only because isn't this what Vary is supposed to be for?
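For reference, a rough sketch of how Vary-based matching separates the two responses in that scenario. It follows the general rule in RFC 7234 (the header fields named by Vary must match between the original and the new request) but compares values exactly rather than normalizing them; all names are hypothetical.

```typescript
// Request headers keyed by lower-case field name.
type HeaderMap = Record<string, string | undefined>;

interface VaryEntry {
  vary: string | null;       // Vary value stored with the response, if any
  requestHeaders: HeaderMap; // headers of the request that produced the response
  body: string;
}

function varyMatches(entry: VaryEntry, incoming: HeaderMap): boolean {
  if (entry.vary === null) return true; // no Vary: any request for this URL matches
  const names = entry.vary
    .split(",")
    .map((n) => n.trim().toLowerCase())
    .filter((n) => n.length > 0);
  if (names.indexOf("*") !== -1) return false; // Vary: * never matches
  return names.every((n) => entry.requestHeaders[n] === incoming[n]);
}

const authenticatedWithVary: VaryEntry = {
  vary: "Cookie",
  requestHeaders: { cookie: "sid=abc" },
  body: "Authenticated",
};
const authenticatedWithoutVary: VaryEntry = {
  vary: null,
  requestHeaders: { cookie: "sid=abc" },
  body: "Authenticated",
};

// A request sent without credentials carries no Cookie header.
const anonymousRequest: HeaderMap = {};

const reusedWithVary = varyMatches(authenticatedWithVary, anonymousRequest);
const reusedWithoutVary = varyMatches(authenticatedWithoutVary, anonymousRequest);
```

With `Vary: Cookie` the anonymous request does not match the stored "Authenticated" response; without it, a cache that ignores the credentials mode treats the two requests as interchangeable.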

Your statement about TLS certificates leaves me similarly confused, because I would have understood if the argument is that because TLS is a transport-level authentication mechanism, you can't Vary on it in the response. Further, I could understand/appreciate if the argument was that the existence of connection-based authentication schemes that abuse(d) HTTP/1.1 (namely, NTLM, Kerberos, and Negotiate) means that some requests would omit the Authorization header, and thus could not be varied on it.

To me, I read that similar to what @mnot suggested originally (if I understood https://github.com/whatwg/fetch/issues/307#issuecomment-282898783 correctly), which is that the existence of intermediate proxies means that this is already a problem - but only for servers that vary the content on the same resource identifier based on what fetch() calls "credentials" but which are actually a complex set of relationships - both part of HTTP and of the underlying transport.

annevk commented 7 years ago

Intermediate proxies can be a problem, yes, but are not in use for most simple sites, which will increasingly use HTTPS (defeating other kinds of proxies).

I think the bug isn't the omission of Vary because 1) Vary isn't very reliable 2) it's extremely easy to add a header unconditionally and it's been somewhat widely broadcast that it's safe to do so 3) browsers typically offer defense-in-depth even when sites could have protected themselves better (case in point: intranets).

sleevi commented 7 years ago

but are not in use for most simple sites, which will increasingly use HTTPS (defeating other kinds of proxies).

I'm not sure I agree with this claim, if only because it's made without data. For example, most simple sites shouldn't run into this problem (therefore, should we only focus on complex sites, which may use this?). But also, even many simple sites do increasingly use intermediate proxies, hosted by the CDN performing the TLS termination on their behalf, which then communicates directly with their resources. Think Akamai or Cloudflare - which seem to be pretty well used at this point :)

browsers typically offer defense-in-depth even when sites could have protected themselves better (case in point: intranets)

I'm not sure I agree with this either. Browsers try to ensure the flexibility to afford controls for the site operator to match intent and behaviour, but I don't think they can or do strive to prevent 'misuse' of those controls. I think the scenario you've described, at least as I tried to summarize in https://github.com/whatwg/fetch/issues/307#issuecomment-285116968 , sounds like a misuse of the controls rather than an absence of controls, but I'm trying to make sure I understand the issue fully.

mnot commented 6 years ago

Just a reminder: the HTTP caching document is being revised, so if we can come to a resolution here, it's possible to get some text in there as well (subject to consensus in the HTTP WG etc.).