Open mnot opened 9 years ago
Some discussion at TPAC. There are many use cases for range requests; an initial approach might be to identify the bare minimum and only support those. E.g.,
(0) and (1) are a straw-man target for now. Thoughts?
So from those:
Here are some things I'm wondering about:
Is something for the Cache API to decide.
I mean the HTTP cache, not the SW cache. Or do you mean it's effectively an implementation decision?
I mean the HTTP cache, not the SW cache.
Ah I see. Maybe we should talk about that too.
Serve range requests from a complete cache entry
Any hope that this will be implemented, in terms of being able to do it from a SW cache? It certainly would help with a problem I currently have with the Audio element and range headers
I need to do some research to see what browsers do today vs the spec & decide if the service worker spec needs changes, or browsers need to fix bugs.
In the meantime, check out https://samdutton.github.io/samples/service-worker/prefetch-video/ - it's a bit hacky, but it works. It constructs ranged responses on the fly.
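For readers who want to see the general shape of "constructing ranged responses on the fly", here is a minimal sketch. It is not the code from the linked sample. The function names `parseByteRange` and `buildPartialResponse` are made up for illustration, and the sketch assumes the full body is already available as a `Uint8Array` and that only a single `bytes=` range is requested.

```javascript
// Parse a single "bytes=start-end" range against a known total size.
// Returns null for anything this sketch does not handle
// (multiple ranges, other units, unsatisfiable ranges).
function parseByteRange(rangeHeader, size) {
  const match = /^bytes=(\d*)-(\d*)$/.exec(rangeHeader.trim());
  if (!match) return null;
  const [, startText, endText] = match;
  if (startText === "" && endText === "") return null;

  let start;
  let end;
  if (startText === "") {
    // Suffix range: "bytes=-N" means the last N bytes.
    const suffixLength = Number(endText);
    if (suffixLength === 0) return null;
    start = Math.max(size - suffixLength, 0);
    end = size - 1;
  } else {
    start = Number(startText);
    end = endText === "" ? size - 1 : Math.min(Number(endText), size - 1);
  }
  if (start > end || start >= size) return null;
  return { start, end };
}

// Build the status, headers and body of a partial response
// from a complete body held in memory (assumed to be a Uint8Array).
function buildPartialResponse(body, rangeHeader) {
  const range = parseByteRange(rangeHeader, body.length);
  if (!range) {
    return {
      status: 416,
      headers: { "Content-Range": `bytes */${body.length}` },
      body: new Uint8Array(0),
    };
  }
  const slice = body.slice(range.start, range.end + 1);
  return {
    status: 206,
    headers: {
      "Content-Range": `bytes ${range.start}-${range.end}/${body.length}`,
      "Content-Length": String(slice.length),
    },
    body: slice,
  };
}
```

In a service worker, the result could be handed to `new Response(result.body, { status: result.status, headers: result.headers })` inside a `fetch` event handler, with the full body read from a cached response via `arrayBuffer()`.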
Thanks for the link, the workaround worked for my audio problem too.
So, I did the research on range requests. It's a mess. https://docs.google.com/document/d/1SphP-WNxqzZrSv_6ApC9_FpM-m_tLzm57oL3SNGg-40/edit#heading=h.1k8r6xdc6vfo
I think fetch should say something about the minimum browsers are required to support here.
Also, unless I'm missing something, fetch should also create a network error if a 206 is returned when no range was requested.
@jakearchibald if you have time to work on this (or can find someone else) that would be great. I'm kinda swamped.
I'll talk to our media teams and see what their expectations are and get as far as I can.
FWIW, I've written some basic tests to see if the HTTP (!SW) cache that Fetch uses supports partial content (i.e., range requests).
AFAICT, the only browser that does anything is Safari TP, which will store a partial response (i.e., one whose request had a Range header) and serve that response from cache if a request with the same Range header is made.
It will not serve a subset of an already-cached response to a request containing Range; nor will it complete an existing stored partial response by creating its own Range headers when it goes forward.
Firefox and Chrome don't even handle the simple case (again, according to my test; I talked to @mcmanus about this this AM, and he was a bit surprised to hear that). Edge doesn't support the HTTP cache from Fetch, AFAICT.
I've been thinking about this again, here's a brain dump:
I don't think we do range requests for anything but media elements, though that might have changed recently for image elements now that I think of it?
There's downloads, which I guess are outside of spec land. We want to use them in background fetch.
Downloads are somewhat defined, as they interact with navigation and <a download> and such. Also, https://github.com/whatwg/html/issues/954.
I pitched the following to our security team:
My assumption is that range requests are only used by media elements and downloads (which I'll need to verify). This means you can't interpret a portion of a resource as script/css/etc.
Additionally, APIs consuming ranged responses should ensure all parts of a range have the same first entry in the response url list for a given resource.
However, there's still a worry that this new capability carries significant risk, and that we should look for another way forward.
The alternative solution is to find a way to mark a request as "allowed privileged headers", and allow the Range header in that case. Modifying the request in any way would remove the "allowed privileged headers" flag, meaning you couldn't take an internally-created Range request & change the URL & make the request, but you could do fetch(fetchEvent.request) if it had a Range header.
This means new Request(request) would copy request's "allowed privileged headers" flag, although modifying request.headers will unset it. new Request(request, init) would return a request with "allowed privileged headers" unset.
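To make the copy and reset rules concrete, here is a toy model of the proposed flag. Everything in it is hypothetical: `SketchRequest`, `createInternal`, `setHeader` and `allowedPrivilegedHeaders` are names invented for this sketch, and none of them exist on the real `Request` class. It only illustrates the three rules described above.

```javascript
// Toy model only. The "allowedPrivilegedHeaders" flag is the hypothetical
// flag from the proposal, not a property of the real Request class.
class SketchRequest {
  constructor(source, init) {
    this.url = source.url;
    this.headerList = new Map(source.headerList);
    // Rule 1: a plain copy keeps the flag.
    // Rule 3: passing any init object drops it.
    this.allowedPrivilegedHeaders =
      init === undefined && source.allowedPrivilegedHeaders === true;
    if (init !== undefined && init.headers !== undefined) {
      this.headerList = new Map(init.headers);
    }
  }

  // Stands in for a request the browser created itself,
  // for example a media element's ranged request.
  static createInternal(url, headers) {
    return new SketchRequest({
      url,
      headerList: headers,
      allowedPrivilegedHeaders: true,
    });
  }

  // Rule 2: modifying the headers unsets the flag.
  setHeader(name, value) {
    this.headerList.set(name.toLowerCase(), value);
    this.allowedPrivilegedHeaders = false;
  }
}
```

Under this model, a service worker could pass an internally created ranged request through unchanged, but any attempt to reshape it would lose the right to carry the Range header.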
In addition to this, we should still:
Additionally, APIs consuming ranged responses should ensure all parts of a range have the same first entry in the response url list for a given resource.
Just a note here on
Return a network error if any of the following is true:
- The response has status 206 and request does not have a range header.
We (Facebook) are actually using this currently on XHR as a feature to circumvent CORS preflight requests requirements for some cross origin requests. We add the range as query string parameters and have the cross origin server interpret them equivalent to range headers.
We are currently transitioning some of these things to fetch and breaking this would be a big problem for us. Adding Range to the header safe-list however, would probably meet our requirements and in fact would likely be the superior solution anyway, since it will allow us to fully remain within HTTP semantics and allow for better caching at various levels.
@DanielBaulig thanks! If we shipped what I proposed it'd have broken XHR too, since XHR uses fetch.
Is the server sending a 206 status in this case? I guess you're using this in a super-safe way that doesn't allow the original content to be interpreted as script?
@jakearchibald I thought it did, but I just double checked and it actually doesn't return a 206, but a 200, so there shouldn't be any acute breakage.
We are only applying this to media content fetched through XHR/fetch and currently always have well aligned requests to the same byte ranges, meaning the byte ranges of two requests will always either be equal or excluding each other, there will never be partial overlaps, so we shouldn't have any cacheability regressions.
That said, the reason I stumbled on this issue thread in the first place is that we are looking into changing this and using less fixed and well aligned byte ranges that could end up (partially) overlapping. Since that would break caching (the query string parameters are included in the browser's cache keys; if they are not identical, there won't be a cache hit), we were looking into implementing caching for these requests in SW by breaking the byte range query parameters out of the cache key for these requests, which then led me to this thread and also to my reply on this W3C SW thread regarding Cache API supporting Range requests and responses.
Since we are not actually returning 206 though, we should not be seeing any problems from this proposal being implemented. My bad.
Still, I think we should test what browsers actually do when the server returns a 206. If browsers just return it as-is I don't think we should change the behavior into returning a network error. It's not worth the risk.
@DanielBaulig would you mind showing an example of such an artificial range parameter? How could I test the behavior you described?
Seconded, it'd be interesting to know which Facebook URLs support this.
All browsers seem to allow a 206 partial response for script elements. As in, it will execute the body of the response, as if it was a 200.
Chrome security were a little worried about this, and were keen on making this an error.
But yeah, we'd need to do a lot of testing.
@sirdarckcat We use bytestart, byteend query string parameters for video playback on Facebook to deal with CORS restrictions. If we added normal Range headers we would have to do preflight requests to resources served from our CDN origin.
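As a rough illustration of the query-parameter approach, the sketch below builds a URL carrying the range as `bytestart` and `byteend`, and derives a cache key with those two parameters removed, as discussed earlier in the thread. The parameter names come from the comment above; the function names `withRangeParams` and `cacheKeyFor` and the example host are invented here, and this is not Facebook's actual code.

```javascript
// Express a byte range as query parameters instead of a Range header,
// so the cross-origin request needs no Range header at all.
function withRangeParams(resourceUrl, start, end) {
  const url = new URL(resourceUrl);
  url.searchParams.set("bytestart", String(start));
  url.searchParams.set("byteend", String(end));
  return url.href;
}

// Derive a cache key that ignores the range parameters, so that
// overlapping ranges of the same resource map to one cache entry.
function cacheKeyFor(requestUrl) {
  const url = new URL(requestUrl);
  url.searchParams.delete("bytestart");
  url.searchParams.delete("byteend");
  return url.href;
}
```

A service worker using a key like this would still have to work out which bytes of the stored entry satisfy each request; the sketch only covers the URL handling.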
I'm going to take another swing at this.
These cases should work:
Since the service worker can rewrite fetches, it opens up the following attacks:
A media element makes two requests:
Request: Resource A. No-cors. Cross-origin. Byte range 0-5000. Response: JS-constructed. Valid media container data, stopping before the bit that defines the duration of the media.
Request: Resource A. No-cors. Cross-origin. Byte range 200-5000.
Response: fetch(event.request). Response is opaque.
In this case, resource A isn't a valid media resource, but its 200th byte is now leaked as mediaElement.duration.
Solution: The media element must not allow a mixture of opaque and non-opaque responses for a given piece of media.
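A minimal sketch of that check, assuming the media element keeps a list of the responses it has received for one piece of media. The `type` values mirror the real `Response.type` property, but `hasConsistentOpacity` is an invented name and the plain objects are stand-ins for internal responses.

```javascript
// Returns true when the responses for one piece of media are either
// all opaque or all non-opaque. A mixture should be refused.
function hasConsistentOpacity(responses) {
  const opaqueCount = responses.filter((r) => r.type === "opaque").length;
  return opaqueCount === 0 || opaqueCount === responses.length;
}
```

In the attack above, the first response is JS-constructed (non-opaque) and the second is opaque, so the check fails and the second part would not be used.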
A media element makes two requests:
Request: Resource A. No-cors. Cross-origin. Byte range 0-5000. Response: A fetch to resource B, which responds with opaque valid media container data, stopping before the bit that defines the duration of the media.
Request: Resource A. No-cors. Cross-origin. Byte range 200-5000.
Response: fetch(event.request). Response is opaque.
Again, the 200th byte is leaked.
Solution: If the media element receives opaque data, the last URL in each response's URL list must be identical.
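Sketched as code, the check could look like the following. A response's URL list is a spec-level concept that is not exposed to script, so `urlList` here is a hypothetical field on a stand-in object, and `opaquePartsShareFinalUrl` is an invented name.

```javascript
// Returns true when every opaque response ended up at the same final URL,
// i.e. the last entry in each response's URL list is identical.
function opaquePartsShareFinalUrl(responses) {
  const finalUrls = responses
    .filter((r) => r.type === "opaque")
    .map((r) => r.urlList[r.urlList.length - 1]);
  return finalUrls.every((url) => url === finalUrls[0]);
}
```

In the attack above, the first part ends at resource B and the second at resource A, so the final URLs differ and the media would be rejected.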
Looking at https://github.com/whatwg/fetch/issues/145, it seems Chrome is fine as long as the responses are all the same origin. However, I'm worried about origin A redirecting/rewriting to lots of different places in origin B.
I need to spec where a media element goes for the second part of some media, if the first part results in a redirect. Browsers behave differently here.
A media element makes two requests:
Request: Resource A. No-cors. Cross-origin. Byte range 0-5000. Response: A previously cached response, part of resource A, byte range 8000-8200, which just happens to be opaque valid media container data, stopping before the bit that defines the duration of the media.
Request: Resource A. No-cors. Cross-origin. Byte range 200-5000.
Response: fetch(event.request). Response is opaque.
Again, the 200th byte is leaked.
Solution: In the first fetch, the start of the byte range returned does not match the start of the byte range requested. This should be rejected.
We could reject this in the fetch spec, but I don't think we should block it for manual fetch() calls.
This should be blocked by the media element, but we could have a helper in the fetch spec to make this easier.
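One possible shape for such a helper is sketched below. It compares the start of the requested `Range` with the start of the returned `Content-Range`. The function names are invented, only single `bytes` ranges with an explicit start are handled, and a real implementation would run inside the browser, since script cannot read the headers of an opaque response.

```javascript
// Start offset from a request header such as "bytes=200-5000" or "bytes=200-".
function requestedRangeStart(rangeHeader) {
  const match = /^bytes=(\d+)-\d*$/.exec(rangeHeader.trim());
  return match ? Number(match[1]) : null;
}

// Start offset from a response header such as "bytes 200-5000/9000".
function returnedRangeStart(contentRange) {
  const match = /^bytes (\d+)-\d+\/(?:\d+|\*)$/.exec(contentRange.trim());
  return match ? Number(match[1]) : null;
}

// True only when both headers parse and the starts agree.
function rangeStartMatches(rangeHeader, contentRange) {
  const requested = requestedRangeStart(rangeHeader);
  const returned = returnedRangeStart(contentRange);
  return requested !== null && requested === returned;
}
```

For the attack above, the request asks for a range starting at 0 while the cached response starts at 8000, so `rangeStartMatches("bytes=0-5000", "bytes 8000-8200/20000")` is false and the response would be rejected.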
A script element makes a request:
Request: Resource A. No-cors. Cross-origin.
Response: An opaque partial response of resource A that just happens to contain gender = 'female', which is private user data.
In this case resource A is an html resource like:
<p>Foo</p>
<script>const gender = 'female';</script>
<p>Bar</p>
…and the browser has been previously tricked (perhaps using a media element) into making a request for the range that contains gender = 'female'.
Given that the script element accepts partial responses, this is a tricky one. It seems that we want to continue to support the case where the server has, unprompted by the range header, returned a partial response. The difference in this case is the server was prompted.
Solution: The response needs to know if its associated request had a Range header. Fetch should reject if the original request did not have a range header, but the service worker provides a response that is opaque, partial, and was requested with a range header.
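The rule can be sketched as a single predicate. The objects below are stand-ins for the browser's internal request and response. `hasRangeHeader` and `rangeRequested` are hypothetical field names for "the request has a Range header" and "the response was requested with a Range header", and `status` is the internal status (script only ever sees 0 for an opaque response).

```javascript
// Returns true when fetch should reject the response a service worker
// provided: the original request had no Range header, yet the response
// is opaque, partial, and was itself obtained with a Range header.
function shouldRejectServiceWorkerResponse(originalRequest, response) {
  return (
    !originalRequest.hasRangeHeader &&
    response.type === "opaque" &&
    response.status === 206 &&
    response.rangeRequested === true
  );
}
```

Note that an opaque 206 the server sent without being prompted by a Range header still passes, which keeps the existing script element behaviour working.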
@annevk I'm interested in your thoughts on the solution for attack 4.
(I chatted about this with @annevk on IRC & he's happy with the attack 4 solution)
I'm a little concerned with
If the media element consumes opaque data, the first URL in each response's URL list must be identical.
from Attack 2. In particular if that's what we want to do in the face of redirects. If anything we probably want to compare the last URL. (Safer would be the whole list, but unfortunately no-cors cross-origin to same-origin is already non-opaque.)
My intent is to ensure that all the requests have gone to the same place, as in, the service worker hasn't tried to combine multiple sources.
If the server wants to redirect each request, that seems weird but fine. Although I'll test what browsers actually do in this case.
@annevk ahh yes, I get it now. I've updated the solution to Attack 2 to be last url in the list.
@jakearchibald Would it be possible to copy-and-paste your security analysis in https://github.com/whatwg/fetch/issues/144#issuecomment-368040980 to a more official-sounding URL? It would be nice if it was linked from the Fetch standard to make it discoverable.
It would be nice if it said something about the threat model, but that might be infeasible if it came down to "the threat model of the whole web platform".
@ricea I was going to leave notes in the spec next to the solutions. Does that work?
The solution for Attack 4 will be in the fetch spec, but the others will be in the HTML spec.
Is this what you meant? If not, I'm happy to put the text somewhere else, I'm just not sure where.
@jakearchibald SGTM. My concern is just to make sure it is discoverable and gets the widest possible review.
See new PR: https://github.com/whatwg/html/pull/7655. Needs some more work and lots of WPT but ready for initial feedback.
https://github.com/whatwg/html/pull/7655 was merged but this is still open. Should this be closed?
I think that did solve what browsers do with media, but overall range support is still bad so we should probably keep this open to keep us honest.
Discussed in #97 - breaking out into a separate issue.