Closed igrigorik closed 8 years ago
(See also #48. If we actually want <img> to use fetch() under the hood, we'd need to be able to set context too.)
@annevk I think there is an important difference between #48 and what we're trying to solve here. Specifically, for #48 we're talking about internal plumbing that allows us to explain <img> and friends via fetch.. as such, there is some internal API there that allows the context value to be set.
By contrast, what we're talking about here is a developer-facing API, which can't allow the context to be overridden, because that would effectively allow anyone to bypass connect-src. Further, for cases like prefetch and prerender, the actual policy is not knowable on the current page.. to know it we have to wait to navigate to the destination and only then can we reason about what policies should be applied against the response.
Well, <img> can already be explained in terms of Fetch (#concept-fetch, to be precise). Nothing extra is needed for that. #48 is about a developer-facing API. And bypassing connect-src is okay, as long as you can guarantee the response is bound to the context, which we can (at least theoretically).
(Agreed that for prefetch and prerender this might be different and that's why we should probably have both. Though I wish we could get @mikewest to spend some time on these use cases and evaluate them in light of CSP and SW. We still haven't really understood it all deeply enough I think.)
Got a chance to chat with @mikewest... Mike, please correct me if I'm misrepresenting anything.
For preload/prefetch/prerender in particular:
<link>).
With the above in mind... Rough sketch of what it could look like with type:
<link rel="prefetch" href="/next/page.html" type="text/html">
<script>
fetch("/next/page.html", {type: "text/html"}).then(...)
</script>
In "Fetching":
Accept, run these substeps:
image -> image/png,image/svg+xml,image/*;q=0.8,*/*;q=0.5
xhtml | html | xml -> text/html,application/xhtml+xml,application/xml;q=0.9,*
css -> text/css,*/*;q=0.1
.* -> */*
Thoughts?
The reasoning does not consider the influence of service workers. E.g. you can prefetch something and end up with a synthetic response in the "prefetch cache". That matters.
Yes.. but I'm not sure what you're trying to point out here?
You first suggest using type as MIME type but then the processing steps use simple strings such as "image", "xhtml", etc.
Poor wording on my part. In my head I was thinking of the matching algorithm as: supplied_mime_type.match(expression), where expression is, for example, "image".
It seems simple strings would be more friendly than MIME types still. Making them diverge from context might be okay, though e.g. for "next page" I'm not sure why you'd want "text/html" over "hyperlink". Having the next page change in MIME type over time (e.g. from HTML to an image) should not really impact the desire to prefetch it, I think.
Not sure I follow. If the prefetch target changes from HTML to image, then we definitely want to indicate that via type, such that the UA can set the right request headers.. which are different for HTML and images.
> Yes.. but I'm not sure what you're trying to point out here?
Well, is the processing model for that defined?
Is that matching algorithm defined somewhere?
> Well, is the processing model for that defined?
Ah, as in, the processing model for matching pre{fetch,load,render} responses with requests? Yes, this is an open issue: http://w3c.github.io/preload/#matching-responses-with-requests. That said, I don't think this is a blocking issue for this discussion.
> Is that matching algorithm defined somewhere?
I was thinking "regexp", but I guess that's unnecessary. A simpler version:
image/png,image/svg+xml,image/*;q=0.8,*/*;q=0.5
text/html,application/xhtml+xml,application/xml;q=0.9,*
text/css,*/*;q=0.1
*/*
The reason I think it's important is because it has implications for security. You claim there's nothing to worry about but that analysis does not include service workers.
Substring seems like a very strange processing model. We should not treat MIME types that way.
> The reason I think it's important is because it has implications for security. You claim there's nothing to worry about but that analysis does not include service workers.
Fair enough. My understanding is that the plan is to handle this case within CSP with a check at the beginning of the fetch, and a check before the response is consumed. That said, I'll defer to @mikewest on this one.
> Substring seems like a very strange processing model. We should not treat MIME types that way.
I guess we can rewrite the above in terms of "type" and "subtype". E.g. if the "type token" of the provided mime-type is "image" --> "image/png,image/svg+xml,image/*;q=0.8,*/*;q=0.5", and so on. Would that be a better approach?
Yeah, that would be more acceptable.
Another run at it:
type attribute whose value is a valid MIME type, let type-token and subtype-token be the parsed values of the valid MIME type, and let the return value be the first matching statement:
type-token is equal to "image": image/png,image/svg+xml,image/*;q=0.8,*/*;q=0.5
type-token is equal to "application", and subtype-token is equal to "xhtml+xml" or "xml". Or, if type-token is equal to "text", and subtype-token is equal to "html": text/html,application/xhtml+xml,application/xml;q=0.9,*
type-token is equal to "text", and subtype-token is equal to "css": text/css,*/*;q=0.1
*/*
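A minimal sketch of the type-token / subtype-token matching described above, for illustration only. The function name `acceptForType` and its parsing regex are assumptions, not part of any spec; the Accept strings are copied as they appear in this thread.

```javascript
// Hypothetical helper: maps a MIME type hint to an Accept header value,
// returning the first matching statement from the list above.
function acceptForType(mimeType) {
  // Split "type/subtype" and ignore any parameters such as "; charset=utf-8".
  const match = /^\s*([^\/\s;]+)\/([^\s;]+)/.exec(mimeType || "");
  if (!match) return "*/*"; // not a usable MIME type: fall through to the default

  const typeToken = match[1].toLowerCase();
  const subtypeToken = match[2].toLowerCase();

  if (typeToken === "image") {
    return "image/png,image/svg+xml,image/*;q=0.8,*/*;q=0.5";
  }
  const isMarkup =
    (typeToken === "application" &&
      (subtypeToken === "xhtml+xml" || subtypeToken === "xml")) ||
    (typeToken === "text" && subtypeToken === "html");
  if (isMarkup) {
    // Value copied verbatim from the thread, including its trailing "*".
    return "text/html,application/xhtml+xml,application/xml;q=0.9,*";
  }
  if (typeToken === "text" && subtypeToken === "css") {
    return "text/css,*/*;q=0.1";
  }
  return "*/*";
}
```

Because the comparison is on parsed tokens rather than substrings, a hint such as `image/svg+xml` lands in the image branch even though its subtype contains "xml".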
Are we defining the Accept values of the separate contexts? Or am I just misreading this?
We already have.
@annevk does my last attempt look reasonable? Should I open a pull request?
I'm not sure it makes sense to overload the type attribute for this.
Referer header for <iframe src='foo.html'> and <a href='foo.html'>. type=text/html in both cases. (IIRC, you can already configure Firefox in a way where that matters.) Overloading type as the solution for this issue would likely mean that we'd have to be extremely conservative with the Referer header in pre* requests--i.e. by having pre* requests always send the minimum value for the Referer. Would people be OK with that?
I think the problems with the "destination context" idea were overstated. If you want to differentiate pre* vs. non-pre* fetches then you can do that by adding an internal state flag that fetch() looks at. Similarly, you could have an internal state that indicates whether a fetch was initiated from fetch(), or from an <img> tag, or from a fetch() in a service worker processing an <img> fetch. Although that would be more complicated than what you're proposing here, it wouldn't discard useful information.
Also, I don't see how you can resolve this issue without also having at least a general agreement for how CSP controls it. Every new way to initiate a network request should define or reference how CSP controls it. In particular, if CSP would block <iframe src=foo.html type=text/html> then should CSP also block <link rel=prefetch href=foo.html>? The answer isn't obvious when we consider that the prefetch could be for an <a href=foo.html> that CSP wouldn't block, and when we consider that CSP is used for at least two quite different things (for XSS prevention in normal web pages, and as part of a poor person's confinement mechanism in Chrome's extensions and Firefox's package apps). FWIW, I don't think the "destination context" idea helps with solving any CSP interaction stuff either.
Referer: interesting. What would the UA send in the iframe case vs the click? The fact that these are different, by itself, seems rather odd to me.
Predicting type: you don't have to if you don't care about initializing appropriate request headers and such. E.g. if you want to prefetch an image, then you do want to advertise the fact that you support certain image types, and that Accept header is only added when we know we're requesting an image. As such, this is an opt-in hint to the UA.
Differentiating pre* vs non-pre: true, but as noted, it adds more complexity.
CSP: agree, but this is something that CSP itself needs to address. Today there is no CSP directive that controls existing prefetch/prerender deployments. I do think it's reasonable to add a directive to control these; the directive should be a new one, it can't be tied to anything that already exists. To me this is a non-blocking issue to what we're discussing here.
RE referrer: First, consider the mixed content case: Referrer should NEVER be sent for mixed-content subresources. But, with referrer control, we may opt into sending Referrer for HTTP->HTTPS navigations. More generally, pages should be able to control referrer separately for subresource and navigation loads; see https://briansmith.org/referrer-01 for what I'm shooting for.
RE predicting type: I think we should be able to initialize appropriate request headers without predicting the type. For example, preload for foo.png where foo.png may be WebP or PNG depending on the user-agent. I guess your thinking is that you can say type=image/png and it will work out OK even if it is actually WebP, however that seems quite hacky. The important thing is the intended context, in this case <img src>, not the type of the response, but <link type> is about the type of the response.
Re CSP: If people don't define CSP controls when they add new features then that effectively breaks CSP every time such a new feature is added. For example, it should be possible to use CSP to prevent any accidental/automatic HTTPS->HTTP leakage using default-src 'self' https wss. This only works if anything that can (automatically) load something over non-secure HTTP is controlled by CSP, and this is an important use case for CSP. Adding features and then figuring out later how CSP affects them just doesn't work for that use case.
In particular, here's a very reasonable use case: Using CSP on my HTTPS page, I want to ensure that I only prefetch/prerender linked HTTPS pages, to minimize the infoleak of the contents of the links on my HTTPS page. I understand that Google SERPs don't seem to want to do this, but I think a lot of HTTPS sites will want to do this.
I understand why it is tempting to overload type instead of defining a new attribute and new flags. But, it seems like whenever we do such overloading, we end up creating future problems for ourselves. The things I mentioned above where overloading type for (partially) orthogonal purposes causes problems are just the examples I could think of off the top of my head. That means there are probably more and that there will probably be more in the future. I understand the desire to keep things simple, but it seems like overloading type is at best just deferring complexity for later.
Anyway, I'm surprised by the "fetch could bypass connect-src" argument. The browser knows (and could keep track of) whether the fetch was triggered by a <link> or by a JS call to window.fetch() and/or whether window.fetch() was called in a service worker or outside of a service worker. In any use of service workers I can think of, I want to have connect-src work differently for fetch() within my service worker (e.g. connect-src 'self' https:) vs. fetch() outside my service worker (e.g. connect-src 'none'). It seems like the fact that we haven't figured out how that would work is starting to cause lots of things to go wrong. Further, I think a lot of sites would like to have a CSP directive to control whether a service worker can ever be registered, because service workers are very special (notably higher risk) as far as security is concerned. I don't mean to be creating stop energy, but it's hard to see how this issue can be properly resolved without first having at least a general plan for resolving the CSP-related stuff.
> RE referrer: First, consider the mixed content case: Referrer should NEVER be sent for mixed-content subresources. But, with referrer control, we may opt into sending Referrer for HTTP->HTTPS navigations. More generally, pages should be able to control referrer separately for subresource and navigation loads; see https://briansmith.org/referrer-01 for what I'm shooting for.
Sure, but I don't think type prevents anything here. If you have a document-wide policy then pre* fetches would inherit the policy; if you want to control referer on a per-request basis then we should expose something similar to the CORS setting attribute (e.g. referer=policy).
> RE predicting type: I think we should be able to initialize appropriate request headers without predicting the type. For example, preload for foo.png where foo.png may be WebP or PNG depending on the user-agent. I guess your thinking is that you can say type=image/png and it will work out OK even if it is actually WebP, however that seems quite hacky. The important thing is the intended context, in this case <img src>, not the type of the response, but <link type> is about the type of the response.
Neither solution is perfect.. I could be prefetching an image that I want to navigate to or load within an iframe (e.g. <iframe src=image.jpg>); the context is not always sufficient to determine the type. I agree that specifying the full mime type is not ideal, but I do believe that it's a better overall fit.
> Re CSP: If people don't define CSP controls when they add new features then that effectively breaks CSP every time such a new feature is added.
Sure, I'm 100% with you. My point is simply that prefetch and prerender are already shipped, and they went out without any formal specs.. hence the CSP gaps. The RH spec is, in effect, an attempt to rationalize what was shipped and smooth out the edge cases, which is the reason we're having this discussion. FWIW, we should fork this part of the discussion as a separate thread against CSP spec.
> I understand why it is tempting to overload type instead of defining a new attribute and new flags. But, it seems like whenever we do such overloading, we end up creating future problems for ourselves.
I disagree, I don't believe we're overloading the meaning. It's an advisory hint, and we're using it as such to help initialize fetch settings.
Re, connect-src: I think this is orthogonal to this discussion, and I'll defer to @mikewest on details. That said, related issue: https://github.com/whatwg/fetch/issues/77.
I no longer think MIME types are the correct abstraction. They are a) not the primitive implementations use (implementations use contexts) and b) somewhat useless for priority. (And because they are only useful for Accept I suspect MIME types will not help us much going forward either.)
Being able to influence or even set the Accept header makes sense, but soon you can do that for every fetch using fetchsettings="" or whatever we end up calling it.
Setting priority based on a MIME type is just not enough information. Navigating to an image is very different from embedding an image, or from loading an image as an icon. You really need context to determine the priority.
@annevk the ability to set a custom Accept header (amongst others) does not and should not preclude a mechanism that allows the UA to set it automatically based on a shorthand hint. If we were to force developers to manually specify the Accept header on each fetch and resource declaration then that'd be both very painful and verbose. That said, I think we agree on all this already.
As I noted earlier, "context" has its pitfalls too. For example, the target context may be an iframe but I can load any content-type within an iframe; knowing that context is "navigation" does not tell me anything about content-type -- i.e. context is not sufficient to determine what Accept headers should be set. Also, an optimization proxy can observe what resource types are being fetched and inject those as preload/prefetch hints (with type attribute) without any knowledge of context in which they're used -- this is an important use case for pre{load,fetch}.
Yes, content-type is a weaker prioritization signal. That said, it's still strictly better than today's complete absence of any prioritization information and something the UA (and developers, once we give them such control) can leverage as part of its prioritization algorithm.
FWIW, knowing what type of resource we're about to fetch is an important input both to the UA and developer-specified process algorithms. Source and destination contexts (where they're known and meaningful) should be available too, but context is not enough. As such, I think we still need and should pursue support for "type".
If you want a shorthand for Accept we should expose the primitive from the specification, not some new syntax. Anyway, I'd like to see your last paragraph quantified somehow, since it seems very important to distinguish some image the user might navigate to next from say an image the page uses.
@annevk the difference between "navigate to next" and "used by current page" is already captured via the source context (i.e. prefetch vs preload).
type is one common interface that can be communicated both by the developer and the server to the UA (see proxy/server initiated fetching use case): the developer can declare what type they're fetching, proxies can observe fetched resources and communicate the same information to the UA (e.g. Link: <https://example.com/logo-hires.jpg>; rel=preload; type=image/jpeg). The UA logic is as outlined in https://github.com/whatwg/fetch/issues/64#issuecomment-115985753.
That's not to say that we might not want to expose additional context to the UA to further improve prioritization logic, but I do think type is necessary to solve some of the existing prefetch/preload use cases.
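To make the proxy/server use case concrete, here is a rough sketch of how a consumer might read the type hint out of the Link header value quoted above. `parseLinkHint` is a hypothetical helper, not a UA or library API, and it deliberately ignores edge cases such as quoted parameter values containing ";" or multiple comma-separated links.

```javascript
// Hypothetical helper: parses a single Link header value into its target
// URL and a map of parameters (rel, type, ...).
function parseLinkHint(headerValue) {
  const match = /^\s*<([^>]*)>(.*)$/.exec(headerValue);
  if (!match) return null; // no <target> found

  const hint = { href: match[1], params: {} };
  for (const part of match[2].split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skips empty segments and valueless params
    const name = part.slice(0, eq).trim().toLowerCase();
    // Strip one pair of surrounding double quotes, if present.
    const value = part.slice(eq + 1).trim().replace(/^"(.*)"$/, "$1");
    hint.params[name] = value;
  }
  return hint;
}

const hint = parseLinkHint(
  "<https://example.com/logo-hires.jpg>; rel=preload; type=image/jpeg"
);
```

With `hint.params.type` in hand, the receiver has the same information a developer would have declared in markup, without knowing the context in which the resource will be used.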
To be clear, I wasn't arguing against passing the type in. I can see how that could be useful. My point is that it isn't a substitute for the destination context.
@briansmith yep, agreed. I'm focusing on type to address the Preload/Resource-Hints use cases.
@annevk any thoughts on https://github.com/whatwg/fetch/issues/64#issuecomment-123132921?
Preload implementation is blocked on this and I'd love to get that moving.
When I talked with @sicking about this he also immediately suggested context as the way to go. Using a MIME type as a way to set a longer Accept header seems very flawed and does not expose a primitive that UAs currently have or would likely have.
> [D]ifference between "navigate to next" and "used by current page" is already captured via the source context (i.e. prefetch vs preload)
Are "fetch" and "load" really the right words in that case? They don't really convey that difference. Either makes sense in the context of the current environment, especially now we have fetch().
I think what we want here is something that matches the first bullet in https://github.com/whatwg/fetch/issues/93#issuecomment-125289983. I.e. I think we should fix the current RequestContext enum to make it indicate the first bullet of that post. That should make that enum useful here too as far as I can tell.
@sicking @ehsan thanks for the reference. I agree, that's a direction worth exploring and it does overlap nicely with what we need for Preload + Resource Hints. I'll follow up on the other thread.
Closing, new proposal in https://github.com/whatwg/fetch/pull/211.
This is a continuation of https://github.com/whatwg/fetch/issues/43: we've added language to set Accept, Accept-Language, and priority when request context is not fetch. However, this is not sufficient as we need to account for cases where request context is fetch.
To address this, I'm proposing that we add some notion of "destination context", which is an optional attribute that allows the developer to indicate how/by whom the response is intended to be consumed. This value does not affect or override "context" value (which remains as "fetch"), but it can be used in the fetching algorithm to set the appropriate request headers and priority. Optionally, this allows the UA to also enforce how/where the response is consumed.
Why not just override context? Because that would allow fetch() to bypass connect-src, and we also lose the initiator context -- e.g. within ServiceWorker I'd like to distinguish between <img> initiated requests and prefetch initiated image requests.. both have same destination context, but may need different processing logic.
Request({url: '/whatever.png', destinationContext: "image"})
context is set to 'fetch': request is subject to connect-src.
destination-context is set to 'image', which is used to initialize Accept, Accept-Language, and request priority.
<link rel=prefetch href=/other.png as=font>
context is set to 'preload'.
destination context is set to 'font' (via as, or whatever we call it), which is used to initialize Accept, ...