If you could explain the issue beforehand that would help determine whether it's worth discussing.
Explanation
The web architecture is that everything is identified by a URI. To look up the thing, you just need the URI and then "follow your nose" (i.e. the specs). But now, in the fetch() spec, the code also has to be provided with a withCredentials flag. If the flag is not set, credentials are not used and presumably protected resources cannot be accessed. If the flag is set, then the CORS spec requires that access is denied even when the server from which the thing is being fetched gives a wildcard Access-Control-Allow-Origin header. The assumption is that the developer who wrote the, say, XHR request knows which value to set. But in general most code is a series of libraries which have to be transparent to many things, like whether the user will use a password or a cert or a cookie, and whether the server will use compression, etc. The library calling XHR or fetch() can't guess what value to set. The web lookup function just takes the URI. Libraries which call XHR in turn have APIs which are just passed a URI.
The HTML spec has been modified to add extra metadata to indicate a cross-origin fetch, and XHR has xhr.withCredentials. Every other place where URIs are used and stored will have to have the withCredentials flag added. Basically, the web breaks as a function taking the URI.
The protocol should be redesigned so the flag can be calculated by fetch() with no caller input. For example, if the CORS preflight reveals `*` being used, then user credentials could be stripped on the following fetch.
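To make that concrete, here is a rough sketch of how a caller-input-free lookup could behave, approximated as a library today (the `get` helper is hypothetical; browsers don't expose the preflight result to scripts, so a wrapper can only probe by retrying, at the cost of an extra round trip):

```ts
// Sketch: try a credentialed fetch first; if CORS rejects it (for example
// because the server answers with Access-Control-Allow-Origin: *), strip
// the credentials and retry anonymously.
async function get(uri: string): Promise<Response> {
  try {
    return await fetch(uri, { mode: "cors", credentials: "include" });
  } catch {
    // CORS failures surface as opaque network errors, so we cannot tell
    // them apart from real network failures; we just retry without
    // ambient credentials.
    return fetch(uri, { mode: "cors", credentials: "omit" });
  }
}
```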
(Alternative designs include inventing two new protocol schemes, httpc: and httpsc:, so as to carry the bit ... but these are not optimal or serious.)
If people feel that CORS control is important, then making it work for developers is important, as otherwise the tendency is to leave a proxy open at a script's origin server to relay anything and everything without any caution or CORS, which obviously defeats the whole Same Origin Model.
How is this different from content negotiation, where you need to know what headers to supply to get a particular representation? And failing to supply headers might result in an error?
In any event, "follow your nose" hasn't really come up as a problem with CORS thus far. And redesigning CORS at this point does not seem feasible. We could do something additive, but that doesn't really solve the problem for the existing content, so it's unclear to me what can be done here, realistically.
... as otherwise the tendency is to leave a proxy open at a script's origin server to relay anything and everything without any caution or CORS, which obviously defeats the whole Same Origin Model.
That does not defeat the same-origin policy actually. A proxy server is perfectly acceptable. I recommend reading https://annevankesteren.nl/2015/02/same-origin-policy if you want to know what the same-origin policy protects against. The only problem with a proxy server is that it creates an additional request. That is why we have CORS.
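For concreteness, the proxy pattern under discussion looks roughly like this (a sketch in Node-flavoured TypeScript; the endpoint, port and absence of any allow-list illustrate the pattern, they are not a recommendation):

```ts
// An "open relay" at the script's origin. The page fetches
// /proxy?url=... same-origin, so CORS never applies client-side;
// the server then relays anything and everything.
import { createServer } from "http";

createServer(async (req, res) => {
  const target = new URL(req.url ?? "/", "http://localhost")
    .searchParams.get("url");
  if (!target) {
    res.statusCode = 400;
    res.end("missing url");
    return;
  }
  const upstream = await fetch(target); // server-side, no same-origin check
  res.statusCode = upstream.status;
  res.end(Buffer.from(await upstream.arrayBuffer()));
}).listen(8080);
```

As noted above, the cost is the additional request, and the user's own credentials for the target server are never involved.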
This was discussed a bit at the end of the afternoon at our face-to-face meeting in Melbourne (Wednesday 2016-01-13).
Found an interesting article (see section 5) that has a nice table showing where CORS works and doesn't work. I found this helpful (the article also puts forth a strong opinion on CORS which I don't completely agree with). The article doesn't describe the issue with redirects causing CORS failures, however.
That article equates JSONP (executing third-party code) with CORS (fetching third-party data). I have no words.
It's still not clear to me what the TAG is trying to do here.
I have put this on the agenda for this week's call. I hope we can bottom this out. I will note that @sicking posted a great Q&A on our mailing list in answer to some of the questions raised in this thread and on our mailing list. I'm sympathetic to Brian's comments regarding existing content.
… only adding to the confusion and to the impression that this technology isn't explained very well. @annevk maybe you and Jonas can get together, discuss, and come back with a correction?
Note that the problem is worse than the withCredentials flag. There are currently three modes that you can make a cross-origin network request in:

1. Without the user's credentials (cookies etc.), with the response readable by the caller if the server opts in ("cors" without credentials).
2. With the user's credentials, with the response readable only if the server explicitly opts in to credentialed sharing ("cors" with credentials).
3. With the user's credentials, but with the response opaque to the caller ("no-cors").
Most HTML features use the last. For example `<img>`, `<script>` and `<iframe>`. Those APIs ask the server to load the data, but promise not to expose (most of) the returned data to the loader.
XHR only uses the first two modes, whereas `<img>` and `<script>` currently support all three modes.
`<iframe>` actually uses a special mode called "navigate". There are some subtle differences with "no-cors".
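Spelled out as `fetch()` parameters, the three modes look like this (a sketch; the URL is hypothetical):

```ts
const url = "https://api.example.com/data";

// 1. CORS without credentials: no cookies are sent; the response is
//    readable if the server opts in with Access-Control-Allow-Origin.
fetch(url, { mode: "cors", credentials: "omit" });

// 2. CORS with credentials: cookies are sent; the response is readable
//    only if the server echoes the origin and also sends
//    Access-Control-Allow-Credentials: true.
fetch(url, { mode: "cors", credentials: "include" });

// 3. no-cors: credentials are sent, but the response is opaque to the
//    caller, like an <img> or classic <script> load.
fetch(url, { mode: "no-cors", credentials: "include" });
```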
Having thought about this some more, here are some further thoughts.
I agree with the TAG that the way things currently work is that most APIs which are used for loading a resource simply take a URL. This is true both in JS libraries and in the built-in web platform APIs.
However, in practice what happens after that is that these APIs then add a default set of headers, and use a default verb, when requesting the resource. The exact headers and verb vary with the API (for better or worse). So when the request actually hits the wire, it's not just the URL which is sent, but a significant number of other parameters as well.
Some of these APIs, especially APIs used specifically for loading a URL and returning the raw response unprocessed, additionally accept arguments which allow overriding the default headers and the default verb. So for example XHR and fetch() do this, but `<img>`, `<script>` and `background: url(X)` (in CSS) do not.
(An interesting nit here is that XHR doesn't actually have a default verb; it requires that a verb is always explicitly defined. But I don't think this has been particularly popular with authors, as demonstrated by the number of XHR-wrapping libraries which do use GET as the default verb.)
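A minimal sketch of that nit (the `get` wrapper is illustrative, not taken from any particular library):

```ts
// XHR itself: open() has no default verb, so one must always be supplied.
const xhr = new XMLHttpRequest();
xhr.open("GET", "https://example.com/resource");
xhr.send();

// A typical wrapper library restores the one-argument, GET-by-default shape:
function get(uri: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const x = new XMLHttpRequest();
    x.open("GET", uri); // the wrapper chooses the verb for you
    x.onload = () => resolve(x.responseText);
    x.onerror = () => reject(new Error("network error"));
    x.send();
  });
}
```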
This all seems pretty similar to the CORS-mode. I.e. all APIs have a default behavior with regards to when credentials are included and when they are not, and whether CORS is enforced or not. Some of the APIs have parameters which allow overriding the default mode, but some do not.
I agree that it's unfortunate that there are now three types of parameters: verb, headers and CORS-mode. Prior to CORS there were just two.
I also agree that maybe we've used the wrong default CORS-mode in some APIs. So far we've aimed for a combination of backwards compatibility and safety, which is why `<img>` defaults to "no-cors" and XHR defaults to CORS-without-credentials.
Over the years I have personally gotten more requests for making more APIs support overriding the default verb/headers/CORS-mode. For example, requests to allow setting headers on `<img>`, or using POST for `<iframe>`. The number of these requests that I have personally gotten outnumbers the number of requests I've gotten for removing the `.withCredentials` parameter from XHR.
But this is based on personal experience and might differ from other people's experiences.
In any case, I agree that CORS increases the ways that you can configure a URL load, but it doesn't seem to me that it fundamentally changes how it's done.
There's certainly cost to the increase in configurability. There's always a cost to having choice. But there's benefit too, in the form of increased security.
Maybe we are disagreeing about if the benefit outweighs the cost.
You make the point that it is useful to give more power to a JS app to operate at a lower level, and do explicit HTTP requests with the method and headers all added. Yes, this is useful. It would be useful also to understand and select which certs, passwords, or cookies were used.
But that is not the function which we are talking about when we talk about just taking one URI param. That function is a high-level function to look up a URI, doing the right thing at all points. It is very common. It only does GET. It follows redirects. It provides authentication where it needs to, asking the user without the dev having to be involved. Web app devs rely on the browser doing the right thing. This function no longer exists. It is needed not only by developers of apps, but also by developers of frameworks which look up URIs on the web as part of their work. The signature of the function is get(uri), not get(uri, credentialFlag). This is currently broken.
If this problem is from an "optimization", well sorry, we'll have to optimize another way.
timbl
I'm not trying to add difficulty to this conversation but I'm genuinely still having trouble understanding...
Specifically, I'm having difficulty discerning what @timbl is looking for or commenting on here, as I expressed in https://lists.w3.org/Archives/Public/www-tag/2016Jan/0014.html, and after talking with various people I don't think that I am alone in that.
In that post I asked some questions to try to clarify, and I didn't really get any reply from Tim, so despite having further talks with other TAG members I don't feel like I am closer to understanding.
So maybe it's helpful to more than just me if @timbl can tell me where I go off the rails...
1) With regard to fetch, it is intentionally low level, aimed at explaining existing stuff upon which we can build new higher level stuff.
2) Many, many deployed websites rely on XHR's exposure of withCredentials/CORS. Even if everyone agreed it was fundamentally bad, we can't just switch it off -- so in order to explain that, fetch has to expose it too.
3) Many existing high level things already in the system have enough information to make the right decisions with only a URL and do OK things. Once explained, I think several people have expressed that we could do more, and that is a separate and bigger concern from fetch exposing withCredentials.
@timbl - Did you disagree with any of these? If so, at which point?
@timbl said: "It would be useful also to understand and select which certs, passwords, or cookies were used."
Although I feel like I'm far from an expert here, I think this is an interesting comment that's close to the crux of the issue here. (Although I think today we're talking about passwords and cookies, but not certs.)
My understanding is that the flag (and the "credentials mode" of the request) is related to saying whether the user of the fetch API would like the browser to add to the request the passwords and cookies that the browser knows about. Doing this imposes additional security requirements: the site opting in to allowing the response to be read cross-origin needs to explicitly say that this is OK with credentials. This means that it's possible for a request to not "do the right thing" with the credentials flag set, because the site didn't send a CORS response saying that sharing data, based on credentials, across origins was allowed. It also means that it's possible for a request to not "do the right thing" with the credentials flag unset, because the passwords or cookies stored in the browser were needed for it to do the right thing.
But as long as we're looking at an API that's high-level enough that it involves the browser adding on cookies and passwords that it knows about, and given that (I think) the original version of CORS didn't have the semantics that it was OK to share credentials-based responses across origins, I don't see a way around this.
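A sketch of the server-side opt-in this implies (a hypothetical Node handler; the port is arbitrary): to let a credentialed response be read cross-origin, the server cannot answer with `*` and must name the origin explicitly.

```ts
import { createServer } from "http";

createServer((req, res) => {
  const origin = req.headers.origin;
  if (origin) {
    // "*" is not permitted together with credentials: the origin must be
    // echoed back, and the credentials opt-in must be explicit.
    res.setHeader("Access-Control-Allow-Origin", origin);
    res.setHeader("Access-Control-Allow-Credentials", "true");
    res.setHeader("Vary", "Origin"); // keep caches honest
  }
  res.end("hello");
}).listen(8080);
```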
I hope I'm not too far off base here -- please correct me if I'm wrong (on the important stuff, at least).
I think what's closest to what @timbl is looking for is mode "cors" and credentials mode "omit". Never include any kind of client state and just let the server answer or not. The moment you want to include ambient authority it gets more complicated and the function can no longer be seen as just a URL.
On 2016-02-11, at 19:31, Brian Kardell notifications@github.com wrote:
I'm not trying to add difficulty to this conversation but I'm genuinely still having trouble understanding...
Specifically, I'm having difficulty discerning what @timbl is looking for or commenting on here, as I expressed in https://lists.w3.org/Archives/Public/www-tag/2016Jan/0014.html, and after talking with various people I don't think that I am alone in that.
In that post I asked some questions to try to clarify, and I didn't really get any reply from Tim, so despite having further talks with other TAG members I don't feel like I am closer to understanding.
So maybe it's helpful to more than just me if @timbl can tell me where I go off the rails...
1) With regard to fetch, it is intentionally low level, aimed at explaining existing stuff upon which we can build new higher level stuff.
2) Many, many deployed websites rely on XHR's exposure of withCredentials/CORS. Even if everyone agreed it was fundamentally bad, we can't just switch it off -- so in order to explain that, fetch has to expose it too.
3) Many existing high level things already in the system have enough information to make the right decisions with only a URL and do OK things.
Many isn’t enough.
Yes, many apps can just leave it on or off, as they only work with public or only work with authenticated stuff.
I am writing libraries, not just apps.
Libraries use
content = get(uri).
Throughout. At many places, in many ways, at different levels.
Libraries call libraries. Library code has no idea about what application it is working in. Secure, public, distributed, centralized, localhost, file:// ftp:// http:// …. it just has to work transparently.
That is the job of fetch()
To add an extra flag is impossible. It breaks so many things.
If you still don’t understand and accept this, we should probably have a face-face meeting.
You want a “credentials” flag? It has to be in the URI or nowhere. You could add a new part to the URI, as we did with HTTPS.
You can redesign the way it works inside, and adapt the wire protocols it uses, but you can't change its signature. It is the architecture of the web, and the fact that one could design all sorts of cool apps with an architecture based on
content = get(uri, credsFlag)
but that isn’t the web.
It is systems where bookmarks and caches and RDF systems have pairs (uri, credsFlag) where currently they
No. Maybe you and I need to have a face to face conversation about this.
Once explained, I think several people have expressed that we could do more, and that is a separate and bigger concern from fetch exposing withCredentials.
@timbl - Did you disagree with any of these? If so, at which point?
@timbl I don't know if 'you and I' refers to the TAG or me or both, but I'm certainly willing to participate in that discussion either way. You should already have my contact info if it's me; if not, I believe Amy has it.
You make the point that it is useful to give more power to a JS app to operate at a lower level, and do explicit HTTP requests with the method and headers all added. Yes, this is useful. It would be useful also to understand and select which certs, passwords, or cookies were used.
Agreed. I think it'd be very useful to allow fetch() to specify things like which certs should be used. It might even be useful to allow fetch() to specify which client-side cert should be used, as well as which server-side certs should be trusted.
But that is not the function which we are talking about when we talk about just taking one URI param. That function is a high-level function to look up a URI, doing the right thing at all points. It is very common. It only does GET. It follows redirects. It provides authentication where it needs to, asking the user without the dev having to be involved.
`fetch(url)` almost does this. But as has been pointed out, it doesn't provide authentication for cross-site requests.
However, `fetch(url, { credentials: "include" })` does exactly what you describe above.
So it sounds like you are fine with CORS having different security modes. And that you are fine with the `fetch` and `XMLHttpRequest` APIs providing the ability to configure parameters such as HTTP method, headers and CORS security mode.
What you don't like is the default value that the `fetch` and `XMLHttpRequest` APIs use for the CORS security mode? Is that correct?
Web app devs rely on the browser doing the right thing. This function no longer exists.
I hope I made it clear above that the functionality you are requesting exists. It is just not the default behavior of the `fetch` and `XMLHttpRequest` APIs. But it certainly exists.
It is needed not only by developers of apps, but also by developers of frameworks which look up URIs on the web as part of their work. The signature of the function is get(uri), not get(uri, credentialFlag). This is currently broken.
You don't need to make the signature `get(uri, credentialsFlag)` any more than you need to make the API `get(uri, headers, method)`. All you need to do is to pass the correct parameters to the `fetch` or `XMLHttpRequest` APIs if you are calling those.
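In other words, a sketch of how a library keeps the one-argument shape (assuming the library settles on credentialed GETs; the helper is hypothetical):

```ts
async function get(uri: string): Promise<string> {
  const res = await fetch(uri, {
    method: "GET",          // explicit, just like headers would be
    credentials: "include", // the library's chosen CORS/credentials mode
    redirect: "follow",
  });
  if (!res.ok) throw new Error(`GET ${uri} failed with ${res.status}`);
  return res.text();
}
```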
Having read those minutes I'm somewhat reluctant to discuss this further I must say. They are very dismissive of the work I've been doing over the past decade. The suggestions over what can be changed also seem fairly naive and ignorant of, for lack of better words, "browser reality". That is to say, we cannot really change how most of this works.
Hi @annevk – don't read too much into those minutes, please. We all acknowledge the great work you've done on this and there was no call to change anything about how things currently work. In fact, there is a great deal of respect in the TAG for what you're doing, and we appreciate that you're doing it. It's unfortunate that this wasn't captured in the minutes. As you know, they're not a verbatim transcription of what was said during the meeting. We plan to continue these discussions and we hope you'll join us for a future session.
Picked up at Boston F2F. @ylafon to put forward a proposal to Anne, Mike et al.
FYI: https://github.com/whatwg/fetch/issues/517 doesn't make much sense to me. It's not even clear what problem you're trying to solve.
We discussed this in London in July; the minutes are currently at https://pad.w3ctag.org/p/2017-07-27-minutes.md
What TimBL drew on the whiteboard in London is: [photo of whiteboard sketch]
I will seek @mikewest's feedback.
And I will seek @torgo's explanation of that whiteboard sketch. :)
I don't think the whiteboard was actually very useful, but I figured I'd save it for the record. Hopefully the minutes are more useful than that?
Hi @mikewest it's been too long since we've talked and frankly we miss you. Can we get you on the phone (to discuss this issue) for some time over the next 2 days? We're in your tz.
I'm missed! Hooray!
I'm driving back to Munich from Zurich tomorrow, so my schedule's a little iffy. Thursday is wide open, though. Perhaps we could include @annevk as well, who's also coincidentally in my time zone?
(pasting in my sketch--it was helpful to me anyway...) [sketch image]
@dbaron will write the summary/details from today's meeting, but for my own understanding, two options were discussed:

- Public "de-auth"
- "Public auth"

There's something about how public de-auth can avoid the second round trip, but I didn't quite follow that part :-)
The former can be done by a library (except maybe for the undocumented variant that avoids the round trip). Is there a library that's so popular it warrants the added complexity to the platform?
The latter is something that was already considered and rejected, because it's too easy of a footgun.
Worth noting that the minutes from Nice (2017-09-28) record the discussion of this mentioned above.
I've attempted to write a summary of our discussion at https://dbaron.github.io/with-credentials/. The summary could probably use some more work, though.
(Since github pages doesn't appear to be picking it up, I guess it's also readable at https://github.com/dbaron/with-credentials/blob/master/index.md .)
WRT the question "why are there sites where ambient authority is required to access the data, but the data are actually public and the site is comfortable with them being accessed in mashups hosted on other domains?" --
This is common practice when the site wants to understand who is using its service, and/or wants to be able to contact them (e.g., when they're creating too much traffic).
Then the next question: are sites that do that OK with someone (Sid, in my example) being able to access the data through somebody else's (Alice's) credentials by getting them (Alice) to visit a site (written by Sid) that steals the data?
Good Q; not sure. I suspect answers would vary.
Discussed on telcon - 29 November 2017. We will come back next call.
@dbaron, in the mashup use-case, it is more about allowing access to individualised content than about stealing private data.
@ylafon is the individualized content something that's OK being public, or not? It seems risky to encourage such things, given there are likely cases where the individualized content is privacy-sensitive.
@dbaron well, people then use the "echo back Origin" trick to resolve the "*" issue; they allow sharing, so the possibility is already there.
@dbaron re: https://dbaron.github.io/with-credentials/ see this Stack Overflow example
This shows that figuring out the kind of error is indeed an issue for devs, and also that doing multiple requests in different modes is presented as a potential solution. But this solution adds latency and is not really satisfying. People tend to use the solution of echoing back the origin (your 2/ in use-cases), but IIRC there was an issue with some browser's handling of Vary in cache, and basically always echoing back the origin is as dangerous as the proposed 'public-deauth'.
The lack of libraries that do retry is a symptom that people are using the more dangerous option of echoing back the origin instead of relying on a library that would do retries for every network error.
Of course, opening up the error to include more details on why things failed would allow a library to do the right thing (like in plain HTTP, where credentials can be added on a subsequent request after a 401 response), but it would give more information than is needed to a potential malicious script.
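For comparison, a sketch of that plain-HTTP flow (the helper is hypothetical, and this only works same-origin, since CORS filters WWW-Authenticate from cross-origin responses unless the server exposes it):

```ts
// "Follow your nose": a 401 names exactly what was missing, so a library
// can retry with credentials, unlike an opaque CORS network error.
async function getFollowingYourNose(uri: string): Promise<Response> {
  let res = await fetch(uri, { credentials: "omit" });
  if (res.status === 401 && res.headers.get("WWW-Authenticate") !== null) {
    res = await fetch(uri, { credentials: "include" });
  }
  return res;
}
```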
IIRC there was an issue with some browser's handling of Vary in cache
Browser bugs are not a reason to introduce a new feature (and new bugs).
The lack of libraries that do retry is a symptom that people are using the more dangerous option of echoing back the origin instead of relying on a library that would do retries for every network error.
Given that there are plenty of servers using `*`, that seems false. (Note also that just echoing the origin is not insecure. It's only insecure if you also set the ACAC header.)
IIRC there was an issue with some browser's handling of Vary in cache
Browser bugs are not a reason to introduce a new feature (and new bugs).
Of course, this was just a side-comment on the use of Vary, not related to this issue.
Given that there are plenty of servers using `*`, that seems false. (Note also that just echoing the origin is not insecure. It's only insecure if you also set the ACAC header.)
It is a sign that `*` works in the vast majority of cases, which is a good thing. It's also a sign that libraries that fall back to trying something else when `*` is not used are not in as much demand as claimed.
The issue is that libraries can't retry for all network errors, as that would be really bad performance-wise (and in general).
As Tim stated, the main issue is that, unlike in HTTP, it is not possible to "follow your nose" for cases where ambient authority is needed (401) or a client accept list is required (406). (Note that there is no "don't use credentials" in HTTP, like if you want cookies dropped.) This leads to a situation where you need to know more about the URL than just... the URL, unless every URL uses the footgun option of always echoing the Origin, or a server that will do proxying and rewrite CORS headers to avoid this issue. (And that is also why getting numbers based on the existence of libraries that could not even fix the issue properly is not a good metric.)
The proposal to use another value than `*` for ACAO is just one way to minimise the impact on the existing behaviour, but there are other ways to address the issue:

- Return a different error than a plain network error when credentials are sent and ACAO is `*`. After all, if `*` is used, it means that it should be safe to know about this URL. Returning another error code would allow libraries to retry if needed. Just that change would be helpful and would not have a huge impact on the overall security.
- Change the behaviour when `*` is sent to do what was proposed in *public-deauth*, but that can have an impact if some code relies on the error sent back.
- Add a new ACAO value signalling *public-deauth* as proposed.

(Note that there is no "don't use credentials" in HTTP, like if you want cookies dropped.)
Isn't that entirely left up to the client implementation?
HTTP also allows for a design where you only return a 200 if a secret is given in a request header or even a GET request body (which you cannot emulate in browsers). All of these would fall flat with "follow your nose".
Discussed at the f2f in Boston: https://pad.w3ctag.org/p/09-15-2015-minutes. Suggestion: we can have a discussion about this with @annevk at TPAC.