whatwg / fetch

Fetch Standard
https://fetch.spec.whatwg.org/

Redirected range requests and preflights. #145

Open mikewest opened 9 years ago

mikewest commented 9 years ago

Chrome has some funky behavior around HTMLMediaElement + redirected range requests.

https://codereview.chromium.org/1220963004 denied responses to range requests if their origin is distinct from the origin response for the initial request.

https://codereview.chromium.org/1356353003 relaxes that restriction to accept responses to range requests if they're CORS-same-origin with the origin response from the initial request. It also treats "range" as a simple header for the purposes of preflights if the request is CORS enabled (e.g. <video crossorigin ...>).
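One possible reading of the two checks described above, as a rough Python sketch. The `MediaResponse` type, the `cors_approved` flag, and both function names are hypothetical and are not taken from the Chromium patches. "CORS-same-origin" is simplified here to "same origin, or both responses passed a CORS check".

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class MediaResponse:
    url: str             # final URL of the response, after redirects
    cors_approved: bool  # hypothetical flag: the response passed a CORS check


def origin(url):
    """Return a (scheme, host, port) tuple, filling in default ports."""
    parts = urlsplit(url)
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    return (parts.scheme, parts.hostname, parts.port or default_port)


def accept_strict(initial, later):
    """First patch (as read here): the later response must share the initial origin."""
    return origin(initial.url) == origin(later.url)


def accept_relaxed(initial, later):
    """Second patch (as read here): also accept when both responses passed CORS."""
    return accept_strict(initial, later) or (
        initial.cors_approved and later.cors_approved
    )
```

Under these assumptions, a range response from a different CDN host is rejected by `accept_strict` and is accepted by `accept_relaxed` only when both responses were CORS-approved.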

It would be nice to spec this out in a sane way. :)

mikewest commented 9 years ago

+@tyoshino

mcmanus commented 9 years ago

cc @mnot - I'm a bit confused on the context - is this saying that 2 different uris with 206 responses should be stitched together just because they both had the same original uri before redirection? (and if they pass cors). That seems odd - they're different resources.

tyoshino commented 9 years ago

This scheme is already in use widely by CDNs. Chrome's HTMLMediaElement is stitching fragments served for different URLs together (see the opening comment of https://code.google.com/p/chromium/issues/detail?id=532569 by strobe). Chrome's resource loader in general doesn't.

Given the situation, it seems we could document requirements for such an approach to make sure it's secure. It doesn't necessarily require all "fetching" on the web platform to do the stitching.

mnot commented 9 years ago

Doing it generically would indeed be very broken.

To do this for a specific application (e.g., HTMLMediaElement), you need a really explicit assertion that not only are the two resources equivalent, but also that the two specific representations are exactly the same -- e.g., ETag sharing. Even then, this is not something happening in HTTP -- it has to be built on top.

See: http://httpwg.github.io/specs/rfc7233.html#combining.byte.ranges http://httpwg.github.io/specs/rfc7234.html#combining.responses
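The ETag-sharing idea can be illustrated with a minimal Python sketch. It assumes the client keeps the `ETag` of the first partial response; the function names are hypothetical. It builds on two things HTTP does define: partial responses should only be combined when they share a strong validator, and `If-Range` lets a client ask for a range only if the representation is unchanged.

```python
def is_strong(etag):
    """A strong entity tag is present and does not carry the W/ weak prefix."""
    return etag is not None and not etag.startswith("W/")


def can_combine(first_etag, second_etag):
    """Stitch two partial responses only if both carry the same strong ETag."""
    return (
        is_strong(first_etag)
        and is_strong(second_etag)
        and first_etag == second_etag
    )


def range_request_headers(start, end=None, etag=None):
    """Build headers for a follow-up range request.

    If-Range is only added for a strong ETag, since weak validators
    cannot be used to compare byte ranges.
    """
    last = "" if end is None else str(end)
    headers = {"Range": f"bytes={start}-{last}"}
    if is_strong(etag):
        headers["If-Range"] = etag
    return headers
```

For example, `range_request_headers(100, None, '"abc"')` yields a `Range: bytes=100-` header plus `If-Range: "abc"`. As noted above, a check like this would have to be layered on top of HTTP by the media element; it is not something HTTP does across different URLs.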

annevk commented 9 years ago

Are we doing this @rocallahan?

rocallahan commented 9 years ago

When our media resource loader takes over an HTTP load, it uses the final post-all-redirects URI as its canonical URI for the resource. All subsequent range requests start with that URI; if further redirects occur, they are honoured. The principal(s) associated with the media data are gathered from all final-URIs. If these are different origins that's generally OK: we'll still play the media, though (since at least one of those origins must not be same-origin with the page) certain APIs will be affected (e.g. after drawing a video frame to a canvas, the canvas will be tainted).

I'm not familiar with the CDN setup described in https://code.google.com/p/chromium/issues/detail?id=532569, but I assume the CDN has a canonical URI which redirects quasi-randomly to one of many mirror URIs, and the mirror URIs never do any more redirects. If so, then by using the final URI from the first load for every subsequent range request we're avoiding any issues.
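A toy Python model of the behaviour described in this comment, for illustration only. The `redirects` table stands in for the network, and `MediaLoader`, `resolve`, and `taints_canvas` are hypothetical names, not Gecko APIs.

```python
from urllib.parse import urlsplit


def origin(url):
    """Return a (scheme, host, port) tuple, filling in default ports."""
    parts = urlsplit(url)
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    return (parts.scheme, parts.hostname, parts.port or default_port)


def resolve(url, redirects, limit=20):
    """Follow entries in the hypothetical `redirects` table to a final URL."""
    for _ in range(limit):
        if url not in redirects:
            return url
        url = redirects[url]
    raise RuntimeError("too many redirects")


class MediaLoader:
    def __init__(self, page_url, src, redirects):
        self.page_origin = origin(page_url)
        self.redirects = redirects
        # The final post-redirect URL of the first load becomes canonical.
        self.canonical = resolve(src, redirects)
        self.origins = {origin(self.canonical)}

    def range_request(self):
        """Start at the canonical URL; honour any further redirects."""
        final = resolve(self.canonical, self.redirects)
        self.origins.add(origin(final))
        return final

    @property
    def taints_canvas(self):
        """True if any gathered origin differs from the page's origin."""
        return any(o != self.page_origin for o in self.origins)
```

In this model, a CDN that redirects the canonical URL once to a mirror is only consulted on the first load. Later range requests go straight to the mirror, and a further redirect from the mirror is still followed and its origin recorded.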

annevk commented 9 years ago

Okay, so it sounds like the HTML standard would need to do this for media elements. @foolip, have you looked into doing this? It would perhaps also require some overrides then to make sure Fetch does not do anything bad upstream.

foolip commented 9 years ago

I haven't given this any thought in the spec, no. What I do know is that media elements integrate with the network layer in a rather unique way. That seems to be true of all implementations, and certainly was in Presto.

The problem of knowing that the resource is the same when requesting a second range isn't unique to redirects; even when the same server responds, you in principle need some sanity checks. I doubt that these are interoperable today, and I doubt even more that doing the strict checks that would actually make sense (ETag) would really be web compatible.

tyoshino commented 9 years ago

Just to make sure, the proposal by @rocallahan is that once the UA receives any body bytes back from the server, it stops following further redirects?

tyoshino commented 9 years ago

Seems the model doesn't work for some CDNs. See this post by strobe@ from YouTube https://code.google.com/p/chromium/issues/detail?id=532569#c33

rocallahan commented 9 years ago

> Just to make sure, the proposal by @rocallahan is that once the UA receives any body bytes back from the server, it stops following further redirects?

Sorry, I thought I was pretty clear and I'm not sure how to make it clearer:

> All subsequent range requests start with that URI; if further redirects occur, they are honoured.

...

> Seems the model doesn't work for some CDNs. See this post by strobe@ from YouTube https://code.google.com/p/chromium/issues/detail?id=532569#c33

That seems to be based on a misunderstanding of what I said.

tyoshino commented 9 years ago

I wanted to make sure I'm understanding what you said in the second paragraph correctly. It was my mistake that I referred to the paragraph by "proposal".

Thanks for replying to the crbug thread.

jakearchibald commented 6 years ago

https://jewel-chair.glitch.me/same-origin.html

Chrome: Observes the redirect. Subsequent requests go to /audio-normal.

Firefox: Observes the redirect. Subsequent requests go to /audio-redirect-second-part.

https://jewel-chair.glitch.me/same-origin-immediate-redirect.html

Chrome: Observes the redirect. Subsequent requests go to /audio-normal.

Firefox: Observes the redirect. Subsequent requests go to /audio-normal.

I'm looking to spec the correct behaviour here, and I'd like to do the same for other range requests like downloads.

Initially, the Firefox behaviour seems inconsistent. But if a browser were to request multiple ranges in parallel, Chrome's behaviour could be racy.

I'm not familiar with the CDN pattern @tyoshino mentioned. Are there any further details? Do these CDNs tend to redirect for the initial range, or do they perform multiple redirects for different parts of the media resource?

annevk commented 3 years ago

Range is already allowed to be set by media elements due to https://fetch.spec.whatwg.org/#unsafe-request-flag. Not necessarily great as it allows poking holes in the same-origin policy (see also #568), but that is how it is.

annevk commented 3 years ago

@horo-t @mikewest it seems Chrome has the strictest handling of media element range requests thanks to your efforts:

Given that rather weird behavior it seems we might be able to outlaw redirects for subsequent requests completely. This would also help https://github.com/annevk/orb, though it does not matter much. Is there a reason they are allowed? And if not, are you interested in simplifying that logic?

cc @padenot @anforowicz

mikewest commented 3 years ago

If we can get away with dropping redirects entirely, I'd be happy too. @jakearchibald might have more context on how we landed on the current behavior?

dalecurtis commented 3 years ago

IIRC, the subsequent redirects are sometimes used to reauthenticate the resource. That is, you watch a video for some time and then walk away for a couple of hours; upon clicking play again, the provider may need to reauthenticate your session (for content license requirements), which may redirect through some validation before going back to the final redirected URL.