igrigorik / resource-hints

Moved to...
https://github.com/w3c/resource-hints

Don't think the "type" attribute is quite fit for purpose #6

Closed jakearchibald closed 9 years ago

jakearchibald commented 10 years ago

As far as I can tell, "type" is being used for prioritisation and appropriate request headers (eg, Accept header for images), but elsewhere in the platform the origin of the request is used for this, not the type.

In ServiceWorker, we have the concept of a request context for this, maybe it's more appropriate here too? https://slightlyoff.github.io/ServiceWorker/spec/service_worker/#context-enum

igrigorik commented 10 years ago

By "origin" do you mean type of attribute (like img vs script, etc)? We don't have that context here, and I don't see how we can leverage the request context when all we have is a bunch of hints in the document head?

The problem I'm trying to address is what we ran into with the current rel=subresource implementation in Chrome: effectively it's preload but without content-type information, and because of that all requests get the same (low) priority and end up hurting some pages (e.g. the subresource load is initiated at a lower priority than the actual resource would get if it was discovered by the doc parser).

Practically speaking, all browsers implement content-type prioritization logic already: the parser sets relative priority based on context (img, stylesheet, js, etc), so I'm just trying to leverage what we already have in place.

yoavweiss commented 10 years ago

One concern I have with using type for prioritization is that there may be cases where it's not granular enough. E.g. a recent tweet by @scottjehl, where he asks about "non-critical CSS". So, we may want a way for the author to communicate priorities beyond type. I don't know what the best way to do that is, though.

igrigorik commented 10 years ago

@yoavweiss well, if it's not critical, then I'd argue it shouldn't be in the preload list? The point of preload is to help accelerate the initial load when all you have is a few KB of HTML markup (after first RTT) and/or (likely) blocking JS. The non-critical stuff will be discovered and downloaded later.

jakearchibald commented 10 years ago

> By "origin" do you mean type of attribute (like img vs script, etc)?

Ah, sorry, origin was the wrong word to use. I mean the thing that initiated the request. We call it context in serviceworker.

> We don't have that context here, and I don't see how we can leverage the request context when all we have is a bunch of hints in the document head?

You're using type for that at the moment, and I'm saying that's not the best thing to use.

> The problem I'm trying to address is what we ran into with the current rel=subresource implementation in Chrome: effectively it's preload but without content-type information, and because of that all requests get the same (low) priority and end up hurting some pages (e.g. the subresource load is initiated at a lower priority than the actual resource would get if it was discovered by the doc parser).
>
> Practically speaking, all browsers implement content-type prioritization logic already: the parser sets relative priority based on context (img, stylesheet, js, etc), so I'm just trying to leverage what we already have in place.

Yes, the priority is set on context, not content-type. Eg, <script src="whatever"> will have a higher priority than an XHR request to the same url, even though the return will have the same content type. It'd be better to do:

<link rel="preload" context="image" href="//origin/thing.img">
<link rel="preload" context="connect" href="//origin/thing.img" crossorigin>

Not only will this handle prioritisation better, it'll request the first with image/webp in the Accept header, because it knows this is an image context. It'll request the second as XHR does, using no credentials if the URL is cross-origin.

It's about ensuring that you get a cache hit when the actual request is made.
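
As a rough illustration of the point above (not part of the original comment), the following Python sketch models how a declared context, rather than a MIME type, could select the priority, Accept header, and credentials mode of a hint-initiated request. The table values, the `FetchDefaults` type, and the `defaults_for_hint` helper are all hypothetical and are not taken from any browser or specification.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FetchDefaults:
    priority: str       # relative priority label (hypothetical)
    accept: str         # value of the Accept request header
    credentials: bool   # whether credentials are sent


# Hypothetical lookup table keyed by context. The numbers and header
# strings are illustrative assumptions, not real browser defaults.
DEFAULTS_BY_CONTEXT = {
    "script": FetchDefaults("high", "*/*", True),
    "image": FetchDefaults("low", "image/webp,*/*;q=0.8", True),
    # "connect" is the name used in the markup above for an XHR-style fetch.
    "connect": FetchDefaults("medium", "*/*", True),
}


def defaults_for_hint(context: str, cross_origin: bool) -> FetchDefaults:
    """Pick request defaults from the declared context of a hint."""
    base = DEFAULTS_BY_CONTEXT[context]
    # Simplification: an XHR-style request to another origin is modelled
    # as sending no credentials, as described in the comment above.
    if context == "connect" and cross_origin:
        return replace(base, credentials=False)
    return base
```

With this table, the same URL hinted as `image` and as `connect` yields two different requests, which is why a hint that carries only a content type may fail to match the request the page makes later.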

igrigorik commented 10 years ago

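Ah, interesting, I like it! So, to confirm that I understand this correctly:

  • currently context is automatically set by the UA parser when it emits the request
  • you're proposing we introduce a new context attribute where we can manually set it

Correct? Also, as an aside, is there a definition for each of the enum values? What's connect? I'm assuming crossorigin is a whole other discussion...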

/cc @willchan relevant to our earlier prioritization discussion... Or, at least, a step in the right direction.

willchan commented 10 years ago

I'm not sure I understand. I know that @igrigorik is writing the type attribute thinking primarily about prioritization. And @jakearchibald seems to emphasize that this is about making sure you get a cache hit (which, as I've said in issue #5, is an implementation issue of which the specification should be agnostic. To be clear, I'm referring to the 'cache' part of it, not the actual request matching / "hit" part of it).

Are we primarily discussing the name of the attribute so we can have consistency? type vs context?

AFAICT, @igrigorik is misunderstanding @jakearchibald's point. Jake is primarily focused on making sure that we emit the correct headers so that the request matches. The context is important for this, since we'll emit different request headers (e.g. Accept) depending on the context. But this context in most cases is just the content type. In XHR, there is no context, and the content type only gets assigned in the very end when you take this opaque blob of data and insert into the DOM with a specific element type.

@igrigorik seems to infer more than what @jakearchibald is saying. That said, providing manual controls over the magic heuristic prioritization value that the UA calculates is indeed something that we've discussed (I with @slightlyoff and separately with @jakearchibald). It's definitely possible for us to let the UA parse the element and heuristically assign a priority value to it, but let that priority value be manually overridden by the developer via an attribute. This formalizes the UA's heuristic prioritization. Developers desperately need something like this, but it limits what the UA might want to do long-term, which is why I've been very hesitant about it. What if we want multiple dimensions of priority? For example, an idle priority. And if the UA's heuristic treats XHR as priority X today, but later on, we want to set it to X-1 or X+1 or whatever, we may be limited in doing so because of all the existing web content.

This is all complicated and therefore I haven't written it up since I'm still working out the details in my head. I'm fairly nervous about committing to any fine grained prioritization scheme since it's an area that people don't fully understand yet, and the implications of which are fairly large.

igrigorik commented 10 years ago

I think the discussion about exposing fine grained prioritization control (beyond content type) should happen at some point, but it does not and should not block Resource Hints.

@willchan to be honest, I'm not sure if I'm assuming more or less of what Jake is saying, but here's what I want to say... :)

Today browsers already implement prioritization logic based on context (image, script, etc), and given that context is also exposed in request object in ServiceWorker, chances are it will be used to impact routing, caching, etc. Putting the two together, it makes sense to ensure that correct context is set on hint-initiated requests when they reach SW, and (it seems to me that) the simplest way to do that is by providing an explicit context attribute as Jake suggested. Of course, we could use "type" instead and then map the type to a context, but that seems like an unnecessary layer of indirection?

willchan commented 10 years ago

Got it. That seems reasonable.

jakearchibald commented 10 years ago

@igrigorik

> • currently context is automatically set by the UA parser when it emits the request
> • you're proposing we introduce a new context attribute where we can manually set it

Yeah. In ServiceWorker you get .context as part of the fetch event (or request object, we haven't decided yet). The enums are at http://fetch.spec.whatwg.org/#concept-request-context. I'm hoping we'll be able to use these context types to set sensible defaults on Request objects too https://github.com/slightlyoff/ServiceWorker/issues/318#issuecomment-47773500.

> I'm assuming crossorigin is a whole other discussion...

Thankfully that attribute already exists on <link> for this purpose http://www.whatwg.org/specs/web-apps/current-work/multipage/semantics.html#the-link-element

@willchan

> Jake is primarily focused on making sure that we emit the correct headers so that the request matches. The context is important for this, since we'll emit different request headers (e.g. Accept) depending on the context. But this context in most cases is just the content type.

In most cases, yeah, but with whatever.svg the type is image/svg+xml, whereas context can be more expressive, it could be image, xmlhttprequest, frame.

> In XHR, there is no context, and the content type only gets assigned in the very end when you take this opaque blob of data and insert into the DOM with a specific element type.

We have a xmlhttprequest context (I called it connect earlier, but I was confusing it with the CSP name).

igrigorik commented 10 years ago

@jakearchibald how does that look?

Also, a question: what's the right context for a navigation? It could be due to a click on a link, window.location, etc. I'm using a generic "fetch" in the spec, but I'm not sure that's the best one?

<link rel="prefetch" href="//example.com/thankyou.html" context="fetch">

annevk commented 10 years ago

Navigation depends. window.location, <a>, ... have their own contexts. For those triggered directly by the user we do not have one yet. See https://www.w3.org/Bugs/Public/show_bug.cgi?id=26247#c15 We'll add one soonish I suspect.

igrigorik commented 10 years ago

@annevk @jakearchibald wait, I think there is a conflict here. The current context list [1] contains prefetch and subresource (but skips prerender, which is odd?), all of which map to their <link rel=...> equivalents. This, by itself, makes sense, but that's not what I'm after here...

The problem is that we need to indicate the type/priority of the resource communicated via the hint, such that the UA can set the appropriate priority on the request. This is why I originally went with type instead of context.

a) There is an argument that it makes sense to have both the context and type.. In which case, I'd have to unwind the change here and go back to type="{mime}". b) Alternatively, drop subresource and prefetch and instead say that the context should be set to the value in which that resource will be used later? I.e. if the prefetch is for an image asset, then context should be set to image, and so on?

Thoughts?

[1] http://fetch.spec.whatwg.org/#concept-request-context

annevk commented 9 years ago

Not sure anyone proposed prerender.

How can you tell what prefetch will be used for?

igrigorik commented 9 years ago

preload/prerender are a manual way for the developer (or server) to embed hints to resources that will be required later. Whoever is inserting that hint has a context in mind where it'll be used: a font, image, HTML doc, etc. This context is very important because it allows us to determine the priority of the resource. Without it, all resources have the same priority, and this causes lots of trouble (this is why rel=subresource is effectively broken in Chrome).

As such, I'd argue that "prefetch" and "subresource" contexts should be removed from the spec? They're not useful, or worse, harmful. Instead, resource hints should take on the context in which they will be used, and if none is specified, we can pick some default.

Also, as @willchan pointed out yesterday in a separate conversation.. the context is critical in some cases because it may affect how the preload/prerender is done. E.g. prerender for an iframe+seamless vs navigation, which could (in theory) be resolved via:

<link rel=prerender href="http://site.com/page2.html" context="navigation">
<link rel=prerender href="http://site.com/widget.html" context="iframe"> <!-- seamless? could we do "iframe;seamless" -->

annevk commented 9 years ago

@mikewest you want to read this.

So one problem I see is that contexts are used by CSP as well. If you fetch a for a certain context (but it's actually prerender that does the fetching) and then use the resource elsewhere, what happens?

I guess as long as the subsequent use of the resource still goes through Fetch it will be safe and okay, but it does seem a bit weird.

(navigation is not a context btw)

igrigorik commented 9 years ago

> So one problem I see is that contexts are used by CSP as well. If you fetch a for a certain context (but it's actually prerender that does the fetching) and then use the resource elsewhere, what happens?
>
> I guess as long as the subsequent use of the resource still goes through Fetch it will be safe and okay, but it does seem a bit weird.

Yes, it does. Don't have any better suggestion though... /me looks at @mikewest :)

> (navigation is not a context btw)

Yep, using it as a placeholder until its equivalent is defined.

annevk commented 9 years ago

@igrigorik there is no equivalent. There's hyperlink, there's location, there's form, there's iframe, there's frame. There's no catch-all.

willchan commented 9 years ago

@annevk I think @igrigorik's point is that we should make up a concept for it. He's handwaving over the details and hoping that you will come through with a good specification mechanism. The motivation for that concept is that it provides a key signal for the UA which the UA can use to optimize. More specifically, if asked to prerender an HTML document that will be navigated to, and the user actually navigates to a different page instead, the UA knows that it's kosher to immediately toss the prerendered document. But if the HTML is not known to be for some magical navigation context, it might be for an iframe, an import, etc, so it's not known yet if it's kosher to toss it out. Given the heavyweight cost of a full document render, this signal is quite useful.
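
The discard decision described above can be sketched in a few lines of Python. This is an illustration only, not part of the original comment: `can_discard_prerender` is a hypothetical helper, and `"navigation"` is the placeholder name used earlier in this thread, which is not a defined context.

```python
from typing import Optional


def can_discard_prerender(context: Optional[str],
                          user_navigated_elsewhere: bool) -> bool:
    """Return True if the UA may immediately drop a prerendered document.

    `context` is the context declared on the hint, or None if absent.
    "navigation" is a placeholder name, not a defined context.
    """
    if not user_navigated_elsewhere:
        # The prerendered document may still be used; keep it.
        return False
    # Only a document known to be a top-level navigation target is
    # certainly useless once the user has gone somewhere else. An iframe
    # or import target may still be requested by the new page.
    return context == "navigation"
```

Without the declared context (`None`), the function has to be conservative and keep the expensive prerendered document around, which is the cost the signal is meant to avoid.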

annevk commented 9 years ago

@willchan we need more granular contexts as otherwise CSP cannot be done. At the UA-level you could of course group related contexts, but we need more granular than "navigate".

willchan commented 9 years ago

@annevk Perhaps we're talking past each other. You're saying the proposed context mechanism won't work, due to existing CSP uses. I'm saying, I don't care what the specified mechanism is, I think the problem is worth solving. So redo the mechanisms in whatever ways. Maybe we can't use context and we need something separate, which would probably be a shame. There are clearly tradeoffs here. But I believe they're probably solvable. Am I misunderstanding the impedance mismatch? Are you actually asserting there's no good way to solve the use case @igrigorik and I are proposing? Or are you simply making factual statements about the existing CSP uses of the context attribute?

bizzbyster commented 9 years ago

I think this comment "If you fetch a for a certain context (but it's actually prerender that does the fetching) and then use the resource elsewhere, what happens?" exposes the fact that we're trying to re-use fetch context but that in fact we need something new, something closer to the type of object as that is what impacts the set of possible contexts within which the resource can be used.

What about going back to "type" and defining a new enum of resource hint types and these map to the superset of different ways the handling (prioritization and Accept headers) of fetched resources is impacted by the resource type? A good starting point for this list in WebKit is Resource::Type in https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/fetch/Resource.h.

Peter

willchan commented 9 years ago

I hinted at this when I said it "would probably be a shame." Because we're talking about duplicating a lot of similar stuff. Duplication of most of the same values for type and context would just confuse most authors. It's a tradeoff for sure. I'm hoping we don't have to make this tradeoff.

bizzbyster commented 9 years ago

Right. Maybe the LINK attribute can be "contexts" and it can support a list of allowed contexts for this resource? Feels ugly I know but it would allow us to avoid having to make a third resource typing scheme in addition to type and context.

igrigorik commented 9 years ago

@annevk what about internal? Perhaps a bit of a stretch, but at a high level I think triggering a swap of a prerendered page maps pretty well onto "other user agent usage" / forward navigation?

https://github.com/whatwg/fetch/commit/2cc41cf2d0872bc87a849e1faaa7cec1633aea39#diff-1feda49b40370635faef8b655f144f64R595

annevk commented 9 years ago

@bizzbyster Resource::Type is Fetch's resource context, so I don't think that helps.

@willchan what I'm saying is that resource context is more granular as it has several consumers. So you can't say navigate, you would list hyperlink or form.

Maybe we should have a distinction between contextSource and contextDestination. I'm somewhat curious how this works in the source code today. Feedback from @mikewest and maybe @bzbarsky would help.

DavidLerner commented 9 years ago

One thing we might want to enable developers to do is to provide higher priority hints for objects needed before start of render, followed by hints for objects needed for visual completeness, etc. Additionally, developers might want to prioritize "serializer" objects that block further fetches. The context and type paradigms don't seem to lend themselves to specifying this notion of a prefetch graph. Any ideas?

annevk commented 9 years ago

If we go in that direction, paging @Hixie as he is trying to figure out how to reconcile the various (JavaScript modules, HTML imports, other HTML features) dependency management systems.

igrigorik commented 9 years ago

Kicked off a thread on whatwg: http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Aug/0027.html

igrigorik commented 9 years ago

> Maybe we should have a distinction between contextSource and contextDestination. I'm somewhat curious how this works in the source code today. Feedback from @mikewest and maybe @bzbarsky would help.

@annevk how would Source vs Destination distinction help us in this case?

As I noted in the whatwg thread, I'm wondering if we can instead separate download from processing. As in, the context is applied when the fetched asset is "matched" with a request.

annevk commented 9 years ago

I'm not sure. I know that for CSP we need the form annotation. And we need the context frame type (as defined in Fetch). What you want seems like a more generic annotation. I have no idea how we would reconcile the need for both form and navigate without two distinct types.

annevk commented 9 years ago

Oh, and context we most definitely need when fetching as CSP wants to block that, the transmission of bytes.

igrigorik commented 9 years ago

@annevk perhaps there is a distinction to be made based on type of request? Hint requests should be idempotent (i.e. GET) and should not trigger state changes on the server. Whereas form submit is a POST and it would make sense that you'd want to run some policy checks before the request is sent.

igrigorik commented 9 years ago

@annevk thinking about this some more, I'm coming around to the contextDestination idea. Specifically, contextSource can be set as {preload, prerender}, but contextDestination is an optional parameter on the hint that indicates the likely target context -- this has nothing to do with CSP and everything with setting reasonable defaults for fetch priority, headers, etc. Later, the hinted response is consumed by a script, img, etc, and the CSP policies are applied at that time -- this is exactly how rel=subresource and rel=prefetch work today.

A couple of concrete examples:

<!-- contextSource = preload, (likely) contextDestination = image -->
<link rel="preload" href="/some/image.jpg" context="image" />

<!-- contextSource = prerender, (likely) contextDestination = unknown -->
<link rel="prerender" href="/some/thing.html" />

<!-- contextSource = prerender, (likely) contextDestination = script -->
<link rel="prerender" href="/other/script.js" context="script" />

In the first case I'm using context attribute to indicate the likely destination context, such that the UA can infer the content type, set the right headers, priorities, etc. In the second case, I'm prerendering a page, the navigation to which may be triggered through different destination contexts: clicking on a link, window.location, etc. Hence "unknown" and the UA just uses some reasonable defaults. Finally, in the last case, I'm prerendering a script, and I can once again communicate that to let the UA set the right fetch settings.

This approach has the benefit of preserving the source context, and the "context" attribute becomes an optional hint to help the UA determine the likely destination context.
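
A minimal Python sketch of this split, added for illustration and not part of the original comment: the source context records who issued the fetch, the optional destination only tunes fetch defaults, and the policy check runs when the response is consumed. The `HintedFetch` type, the priority table, and the use of a plain set of allowed contexts as a stand-in for a real CSP policy are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class HintedFetch:
    url: str
    context_source: str                 # "preload" or "prerender"
    context_destination: Optional[str]  # likely consumer; None = unknown


# Hypothetical priorities per destination context.
PRIORITY_BY_DESTINATION = {"script": "high", "image": "low"}


def fetch_priority(hint: HintedFetch) -> str:
    """Use the destination only to choose fetch defaults."""
    if hint.context_destination is None:
        return "default"
    return PRIORITY_BY_DESTINATION.get(hint.context_destination, "default")


def may_consume(hint: HintedFetch, actual_context: str,
                allowed_contexts: Set[str]) -> bool:
    """Policy check at match time.

    The check uses the context of the element that actually consumes the
    response, not the destination declared on the hint. `allowed_contexts`
    is a simplified stand-in for a real CSP policy.
    """
    return actual_context in allowed_contexts
```

In this model a hint that declares `context="image"` but is later consumed by a script is still checked as a script, so the declared destination never widens what the policy permits.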


Alternatively, we just push this "context" business into fetch params. Perhaps with something like this: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2014-August/297437.html

<!-- set default headers, priority based on (likely) destinationContext, but then customize if needed -->
<link rel="preload" href="/some/image.jpg" params="{destinationContext: 'image', priority: {...}}" />

In fact, assuming something like above is reasonable, it seems like a better and more flexible solution.

annevk commented 9 years ago

Inserting headers based on the destinationContext seems okay, but needs to be elaborated in detail. Is that Accept, Accept-Language? Where would you expect those to be set per http://fetch.spec.whatwg.org/#http-header-layer-division as it will become observable soon?

The tentative plan is that at least Accept & Accept-Language are set at the API-layer, e.g. by the implementation of <img>, so that they get exposed to service workers. I guess for this that would mean that the implementation of params has some kind of lookup table to set those headers as a convenience.

igrigorik commented 9 years ago

@annevk it seems like destinationContext would be initializing defaults at the API layer, but those defaults can be overridden by a custom header. For example:

<link rel="preload" href="/some/image.jpg" 
        params="{destinationContext: 'image', headers: {'Accept': 'image/jpeg'}}" />

In the above case destinationContext would initialize the default set of headers for an image fetch (as if the download was initiated via img), but the custom header in the same options hash can then override the default Accept value. Any protected headers (Host, etc) would be ignored by the UA.

Does that seem reasonable?
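
The override order proposed above can be sketched in Python as follows. This is an illustration only: `resolve_headers`, the default header table, and the protected-header set are hypothetical, and the thread names Host as just one example of a protected header.

```python
from typing import Dict

# Illustrative subset; the thread only names Host as an example.
PROTECTED_HEADERS = {"host"}

# Hypothetical per-context defaults, keyed by lowercase header name.
DEFAULT_HEADERS: Dict[str, Dict[str, str]] = {
    "image": {"accept": "image/webp,*/*;q=0.8"},
}


def resolve_headers(params: dict) -> Dict[str, str]:
    """Start from the destination context's defaults, then apply overrides.

    Header names are lowercased because they are case-insensitive.
    Protected headers supplied by the author are silently ignored.
    """
    context = params.get("destinationContext")
    headers = dict(DEFAULT_HEADERS.get(context, {}))
    for name, value in params.get("headers", {}).items():
        key = name.lower()
        if key in PROTECTED_HEADERS:
            continue
        headers[key] = value
    return headers
```

For the markup above, the image defaults are loaded first and the author's `Accept: image/jpeg` then replaces the default value, while an author-supplied `Host` would be dropped.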

igrigorik commented 9 years ago

Updated the spec to use fetch-settings in https://github.com/igrigorik/resource-hints/commit/955a9ffaef2e14fad3595b56196c2eb6d1944e41. See here: https://igrigorik.github.io/resource-hints/#fetch-settings

@annevk please take a look, let me know if that makes sense.

I'm closing this bug, let's continue the discussion in: https://github.com/igrigorik/resource-hints/issues/21