whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

Add a note why module map uses request URLs #3624

Open annevk opened 6 years ago

annevk commented 6 years ago

I tried to find the rationale in #443 for why we picked request URLs as the module map keys (rather than, say, waiting for the response to arrive) and couldn't find anything. It just states it as a fact.

Then later on there was #613, but that only talks about banning redirects, which was a non-starter.

It seems worth at least clarifying somewhere why we made this decision.

I guess the main reason is that deduplicating wouldn't work ahead-of-time so you end up with lots of fetches?

guybedford commented 6 years ago

This would be great to see, was also wondering recently why the response URL isn't used.

If it's to do with interpretation, note that loading WASM will rely on "content-type" mime, so that the response is needed before further work in the loading algorithm can be done.

annevk commented 6 years ago

That can't be it, since that's true for module JavaScript too. Only classic JavaScript ignores Content-Type (to the extent that Fetch allows that).

guybedford commented 6 years ago

As a further suggestion: if there aren’t deep technical limitations, I was wondering whether we could specify a redirect that uses the request response URL in the module map. This could possibly even be a new 3xx status code. Do you think this might be a possibility? It’s an important use case for automatic version resolution.

guybedford commented 6 years ago

Correction: I mean response URL of course.

annevk commented 6 years ago

I'm not sure what that means. That when you hit a redirect with a flag set you'd update the module map?

guybedford commented 6 years ago

I guess it would require the same architecture change in coalescing requests separately until their responses arrive, and then creating a module map entry.

I guess I’m asking (a) whether it is too late to change to this behavior, and (b) if it is, whether it would be possible to spec something like this separately, perhaps as a new redirect code.

I still don’t see a strong technical reason why not though.

On Sun, 15 Apr 2018 at 15:39, Anne van Kesteren wrote:

I'm not sure what that means. That when you hit a redirect with a flag set you'd update the module map?


annevk commented 6 years ago

I'm still not sure what you mean by (b). Per CanIUse, Chrome, Edge, and Safari have shipped, so I suspect revisiting it at this point would not be easy (and it would also be incompatible with the shared workers architecture).

cc @whatwg/modules

domenic commented 6 years ago

We're not going to change this behavior, if only because of the consistency with other request-URL-based maps (image cache, shared worker cache, the as-yet-unspecified preload cache). But the reasoning is deeper than that.

Consider the following module:

import './a.mjs';
import './a.mjs';

If the module map were keyed on response URL, this would necessarily create two requests, since we don't know ahead of time whether these two requests will end up at the same response URL. That's quite wasteful. Furthermore, the result becomes nondeterministic based on server behavior.

You could then imagine adding some hack on top, e.g. storing the request URL but then switching to the response URL once the response comes in. Now you have a 2-to-1 mapping of keys to values, also bad. Plus you get more nondeterminism. And you're basically just reinventing the existing system, but in a more complicated and confusing way.

There's also no value in using the response URL. Basically, having something different in the source specifier than actually used in the module map just creates a lot of confusion for no gain.
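The deduplication argument above can be sketched with a toy model. This is only an illustration, not the specification's algorithm: `fakeFetch`, `REDIRECTS`, the `example.com` URLs, and both loaders are hypothetical, and the fetch is modeled synchronously to keep the sketch short.

```javascript
// Toy model only. Hypothetical redirect table standing in for server behavior.
const REDIRECTS = { "https://example.com/a.mjs": "https://example.com/a-v2.mjs" };
let fetchCount = 0;

// Pretend network fetch: follows at most one redirect and reports the final URL.
function fakeFetch(requestUrl) {
  fetchCount += 1;
  const responseUrl = REDIRECTS[requestUrl] ?? requestUrl;
  return { responseUrl, source: `/* module from ${responseUrl} */` };
}

// Keyed on request URL: the key is known before any network activity,
// so a second import of the same specifier never reaches the network.
function makeRequestKeyedLoader() {
  const moduleMap = new Map();
  return (requestUrl) => {
    if (!moduleMap.has(requestUrl)) moduleMap.set(requestUrl, fakeFetch(requestUrl));
    return moduleMap.get(requestUrl);
  };
}

// Keyed on response URL: the key is only known once the response is in,
// so every import has to fetch first and can only deduplicate afterwards.
function makeResponseKeyedLoader() {
  const moduleMap = new Map();
  return (requestUrl) => {
    const response = fakeFetch(requestUrl);
    if (!moduleMap.has(response.responseUrl)) moduleMap.set(response.responseUrl, response);
    return moduleMap.get(response.responseUrl);
  };
}

// Import the same specifier twice, as in the two-line module above,
// and report how many fetches the loader needed.
function countFetches(load) {
  fetchCount = 0;
  load("https://example.com/a.mjs");
  load("https://example.com/a.mjs");
  return fetchCount;
}

console.log("request-keyed fetches:", countFetches(makeRequestKeyedLoader()));
console.log("response-keyed fetches:", countFetches(makeResponseKeyedLoader()));
```

In this model the request-keyed loader needs one fetch and the response-keyed loader needs two, which is the waste described above.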

guybedford commented 6 years ago

I was under the impression that the specification process was supposed to be an open and transparent process where feedback and discussion around the intricacies of implementations was welcomed. Since the opinion here is already made, and I don't work at Google, it seems there's nothing more I can bring to this discussion.

domenic commented 6 years ago

I see nothing in my comment that would lead to the kind of ad hominem message you have sent, @guybedford. Indeed, I tried to provide you with details on the intricacies of implementation and why and how we made the decision we did during the open and transparent process that took place during #443. I also have no idea why you are bringing Google into this.

Perhaps if you wish to continue this discussion, it's best placed on whatwg/meta, so that we can continue to focus this thread on technical aspects (like the ones I explained in my post) and not ad hominem meta-critiques.

guybedford commented 6 years ago

@domenic I only want to discuss the arguments here, feasibility and use cases, this is what I am here to share. The first line of your response already states the outcome, and it is clear to both of us that is what it will be, regardless of those arguments, and from my experience in these discussions. Only a fool would keep arguing in such a situation. An open discussion is one in which you do not assume the outcome from the start. So yes, I'm appealing to the process itself to try to open up discussion again, as spec discussions should be. If you want to discuss this further, feel free to open a meta issue.

As for the 2-1 key mapping, the response here is that browsers already coalesce requests based on URL and headers, and while I know this is a part of browsers that is perhaps under-specified, it seems like there's no reason the coalescing couldn't happen there using the module map after the response has been received. The definition of confusing in this context is somewhat subjective, and certainly something to be weighed up, but it doesn't seem infeasible.

The use case is a version resolving module CDN - I request https://cdn.com/react and get a response from https://cdn.com/react@1.2.3/index.js which then imports from ./dep.js. If ./dep.js happens to import from ./index itself, then we have an instancing issue making this CDN approach unsuitable.

domenic commented 6 years ago

If dep.js contains import './index', it imports https://cdn.com/react@1.2.3/index.

Specifier resolution is via the response URL. Module map keys are via the request URL. They are unrelated, as noted already in the specification:

It is intentional that the module map is keyed by the request URL, whereas the base URL for the module script is set to the response URL. The former is used to deduplicate fetches, while the latter is used for URL resolution.
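The split between the two URLs can be seen with plain URL resolution. A minimal sketch, using the CDN URLs from this thread and assuming that `dep.js` itself is served without a redirect:

```javascript
// URLs from the CDN example in this thread.
const requestUrl = "https://cdn.com/react";                 // module map key
const responseUrl = "https://cdn.com/react@1.2.3/index.js"; // base URL of the module script

// Relative specifiers resolve against the response URL (the base URL).
const depUrl = new URL("./dep.js", responseUrl).href;

// For contrast: resolving against the request URL would give a different result.
const depUrlIfRequestWereBase = new URL("./dep.js", requestUrl).href;

// A specifier inside dep.js resolves against dep.js's own base URL
// (assumed here to equal its request URL, i.e. no redirect).
const indexFromDep = new URL("./index", depUrl).href;

console.log(depUrl);                  // https://cdn.com/react@1.2.3/dep.js
console.log(depUrlIfRequestWereBase); // https://cdn.com/dep.js
console.log(indexFromDep);            // https://cdn.com/react@1.2.3/index
```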

guybedford commented 6 years ago

If dep.js contains import './index', it imports https://cdn.com/react@1.2.3/index.

https://cdn.com/react@1.2.3/index.js is then instantiated and executed in the module map twice, once as https://cdn.com/react (the one the user imported) and a second time as https://cdn.com/react@1.2.3/index.js (the one ./dep.js sees). This one edge case thus makes the entire approach fall over, as it will fail on some packages.

domenic commented 6 years ago

Indeed. If you refer to the same content by two different URLs in your program---whether that be one URL that redirects to the other, two different query strings, or simply two completely separate endpoints which your server happens to serve the same content for---the browser will treat those two URLs as separate. This holds for pretty much everything, including e.g. the request-keyed HTTP and service worker caches.
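The double instantiation discussed here can be reproduced in a small simulation. This is a rough sketch, not the real loading algorithm: the `SERVER` table, the `load` function, and the record shape are hypothetical, and it assumes `dep.js` imports `./index.js` with the extension.

```javascript
// Toy model: maps each request URL to the response URL it ends up at
// and the relative specifiers that module imports. All entries are hypothetical.
const INDEX_URL = "https://cdn.com/react@1.2.3/index.js";
const DEP_URL = "https://cdn.com/react@1.2.3/dep.js";
const SERVER = {
  "https://cdn.com/react": { responseUrl: INDEX_URL, imports: ["./dep.js"] }, // redirect
  [INDEX_URL]: { responseUrl: INDEX_URL, imports: ["./dep.js"] },
  [DEP_URL]: { responseUrl: DEP_URL, imports: ["./index.js"] },
};

// Module map keyed by request URL; specifiers resolved against the response URL.
const moduleMap = new Map();

function load(requestUrl) {
  if (moduleMap.has(requestUrl)) return moduleMap.get(requestUrl);
  const { responseUrl, imports } = SERVER[requestUrl];
  const record = { requestUrl, responseUrl, deps: [] };
  moduleMap.set(requestUrl, record); // registered before recursing so cycles terminate
  for (const specifier of imports) {
    record.deps.push(load(new URL(specifier, responseUrl).href));
  }
  return record;
}

const userImport = load("https://cdn.com/react");
const seenByDep = moduleMap.get(DEP_URL).deps[0];

// Both records come from the same response URL, yet they are distinct entries.
const instancesOfIndex = [...moduleMap.values()].filter((r) => r.responseUrl === INDEX_URL);
console.log(instancesOfIndex.length, userImport === seenByDep);
```

In this model the map ends up with two separate records for the same response, one under each request URL, which matches the behavior confirmed above.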

aapoalas commented 5 months ago

I was somewhat bitten by this today: I made the mistake of assuming that returning Response.redirect(url) from a ServiceWorker for a module import would indeed "rename" the module request as well. In hindsight it makes sense that this is not the case, but it was a slightly bitter pill to swallow. For some reason I had no luck trying to search for this issue online, though that's probably just lack of SEO-fu on my part. Still, getting this note included might be worthwhile.

In my case I was assuming that because I was mapping from ../foo--^1.0.0 to ../foo--1.0.0, which is clearly sort of "safe", every other case is safe as well. A great point that disproves this safety assumption (mentioned elsewhere) was that if the redirected URL points to a different origin, then "renaming" the module request would break the opaqueness of the redirect, as you could now observe that e.g. foo.com/file.js and bar.com/file.js, when imported, produce the same Module object, hence one must be a redirect to the other.
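The cross-origin point can be illustrated with another toy model. Everything here is hypothetical: the redirect direction (bar.com to foo.com), the `https` scheme, the `makeLoader` helper, and the plain object standing in for a Module object.

```javascript
// Hypothetical: bar.com/file.js redirects cross-origin to foo.com/file.js.
const CROSS_ORIGIN_REDIRECTS = { "https://bar.com/file.js": "https://foo.com/file.js" };

// Builds a loader whose map key is chosen by `keyFor(requestUrl, responseUrl)`.
function makeLoader(keyFor) {
  const map = new Map();
  return (requestUrl) => {
    const responseUrl = CROSS_ORIGIN_REDIRECTS[requestUrl] ?? requestUrl;
    const key = keyFor(requestUrl, responseUrl);
    if (!map.has(key)) map.set(key, { namespace: {} }); // stand-in for a Module object
    return map.get(key);
  };
}

// Keyed by request URL, as the module map is today.
const requestKeyed = makeLoader((requestUrl) => requestUrl);
// Hypothetical "renaming" behavior: keyed by the URL after redirects.
const renaming = makeLoader((_requestUrl, responseUrl) => responseUrl);

const sameWhenRequestKeyed =
  requestKeyed("https://foo.com/file.js") === requestKeyed("https://bar.com/file.js");
const sameWhenRenaming =
  renaming("https://foo.com/file.js") === renaming("https://bar.com/file.js");

console.log(sameWhenRequestKeyed, sameWhenRenaming);
```

With request-URL keys the two imports yield unrelated objects, so nothing about the redirect is revealed. With the "renaming" behavior they yield the same object, and a page could infer that one URL redirects to the other.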