Use moduleIds instead of urls

justinbmeyer commented 9 years ago

This responds to the rationale for using URLs instead of moduleIds, and to many of the arguments posted in support of this change in this traceur thread.

whatwg/loader changes relative module identifiers, like "../foo", from being name-normalized to address-normalized. I think this is an incorrect solution because:

Most of the use cases for address-normalization are uncommon, produce unexpected results, or come from users misunderstanding the module namespace.
The remaining use case would be better solved with other approaches.
It results in non-deterministic moduleNames, which create problems for many other use cases.
Background

There are 3 namespaces:

Module specifiers (the string in the import declaration): ../foo
Normalized names (the result of normalize()): bar/foo
Addresses (the result of locate()): http://cdn.com/bar/foo.js

Terms:

parent module - the module that is doing the importing.
name-normalized - the previous way of normalizing module specifiers based on the parent module's name.
address-normalized - the new way of normalizing module specifiers based on the parent module's address.

This change makes the default normalize hook resolve module specifiers that start with "./" or "../" against the parent module's address ("parentAddress"). As a result, those normalized names become URLs/paths.

Example

Compare the previous behavior with the current behavior using the following example:

System.paths = {
  "a/b": "path/B/b.js",
  "a/c": "path/C/c.js"
}

// in main.js
import b from "a/b"

// in a/b.js
import c from "./c"

Previously:

specifier -> normalized -> address
"a/b"     -> "a/b"      -> "path/B/b.js"
"./c"     -> "a/c"      -> "path/C/c.js"

Currently:

specifier -> normalized -> address
"a/b"     -> "a/b"      -> "path/B/b.js"
"./c"     -> "path/B/c" -> "path/B/c"
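The two tables differ only in which string a relative specifier is resolved against. A minimal sketch, assuming an exact-match paths lookup and that unmatched names are used as their own address; `normalize`, `resolveAgainst`, and `locate` are hypothetical helpers, not the whatwg/loader API:

```javascript
// Paths config from the example above.
const paths = { "a/b": "path/B/b.js", "a/c": "path/C/c.js" };

function isRelative(specifier) {
  return specifier.startsWith("./") || specifier.startsWith("../");
}

// Resolve "./x" or "../x" by dropping the base's last segment,
// the way relative URLs are resolved.
function resolveAgainst(specifier, base) {
  const segments = base.split("/").slice(0, -1);
  for (const part of specifier.split("/")) {
    if (part === "..") segments.pop();
    else if (part !== ".") segments.push(part);
  }
  return segments.join("/");
}

// Hypothetical exact-match locate(); unmatched names are their own address.
function locate(name) {
  return paths[name] || name;
}

// Name-normalization passes the parent's name as `base`;
// address-normalization passes the parent's address.
function normalize(specifier, base) {
  return isRelative(specifier) ? resolveAgainst(specifier, base) : specifier;
}

const parentName = "a/b";
const parentAddress = locate(parentName); // "path/B/b.js"

const previous = normalize("./c", parentName);
const current = normalize("./c", parentAddress);
console.log(previous, "->", locate(previous)); // a/c -> path/C/c.js
console.log(current, "->", locate(current));   // path/B/c -> path/B/c
```

With name-normalization the paths entry for "a/c" is consulted; with address-normalization it never is.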

Rationale for the change

The following are quotes I've found explaining the reasons for the change. The change appears intended to resolve confusion caused by moduleNames being a separate namespace from addresses. I think address-normalization will cause more problems than it solves. I discuss those problems in "Problems with address-normalization" below, and refer to them in the rationale and example sections that follow.

Rationale 1

When you write relative paths in your import you expect these to be relative to the file you are writing and not something else.

It is true that you will commonly expect the file loaded to be relative to the parent module's address. This will happen most of the time with name-normalization. However, the following example shows a case where this does not happen:

Example 1.A

A user expected the following to load "path/to/two.js" instead of "two.js":

System.paths['one'] = 'path/to/one.js';
System.import('one');

// in path/to/one.js
import * as two from "./two";

With name-normalization, this is a user mistake that comes from not understanding that moduleNames are not addresses. "./" swaps the last segment of the current module name, one, for two. The user could fix this by simply adding:

System.paths['two'] = 'path/to/two.js';

or

System.paths['*'] = 'path/to/*.js';
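How a "*" entry turns the name "two" into "path/to/two.js" can be sketched as below. This assumes SystemJS-era semantics (exact keys win, otherwise the longest wildcard prefix wins, and "*" is substituted into the target); `locate` is a hypothetical helper, not a real loader function:

```javascript
// Hypothetical locate() supporting a single "*" wildcard in paths keys.
function locate(name, paths) {
  // Exact keys take priority over wildcard keys.
  if (Object.prototype.hasOwnProperty.call(paths, name)) return paths[name];

  let best = null;
  for (const key of Object.keys(paths)) {
    const star = key.indexOf("*");
    if (star === -1) continue;
    const prefix = key.slice(0, star);
    const suffix = key.slice(star + 1);
    const fits =
      name.startsWith(prefix) &&
      name.endsWith(suffix) &&
      name.length >= prefix.length + suffix.length;
    // Among matching wildcard keys, prefer the longest prefix.
    if (fits && (!best || prefix.length > best.prefix.length)) {
      best = { key, prefix, suffix };
    }
  }
  if (!best) return name; // no match: the name is used as-is

  const matched = name.slice(best.prefix.length, name.length - best.suffix.length);
  return paths[best.key].replace("*", matched);
}

const paths = { one: "path/to/one.js", "*": "path/to/*.js" };
console.log(locate("one", paths)); // path/to/one.js
console.log(locate("two", paths)); // path/to/two.js
```

Without the "*" entry, "two" has no match and is used as-is, which is why the loader ended up requesting "two.js".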

Example 1.B

Remember that names and URLs are not the same. Assume someone maps a name/a to http://foo.com/a.js and name/a loads ./b but there is a mapping for name/b to http://bar.com/b.js. With the new scheme we would instead load http://foo.com/b.js since we resolve things relative to the URL and not the unresolved name. (ignoring the .js suffix changes)

In this example, the code might look like:

System.paths['name/a'] = "http://foo.com/a.js"
System.paths['name/b'] = "http://bar.com/b.js"

// in http://foo.com/a.js
import "./b"

The result is that name-normalization would load "http://bar.com/b.js" while address-normalization would load "http://foo.com/b.js". I think users would and should expect "http://bar.com/b.js" to load. The address-normalized result will also be more difficult to deal with in builds.
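The divergence can be reproduced with the standard WHATWG `URL` class for the address-based case and a small segment resolver for the name-based case. This is a sketch, assuming paths entries are exact matches; `resolveName` is hypothetical, and the ".js" suffix is ignored as in the quote:

```javascript
const paths = {
  "name/a": "http://foo.com/a.js",
  "name/b": "http://bar.com/b.js",
};

// Hypothetical: resolve "./x" or "../x" against a module *name*, not a URL.
function resolveName(specifier, parentName) {
  const out = parentName.split("/").slice(0, -1);
  for (const part of specifier.split("/")) {
    if (part === "..") out.pop();
    else if (part !== ".") out.push(part);
  }
  return out.join("/");
}

// Name-normalized: "./b" from "name/a" becomes "name/b", which paths then maps.
const byName = resolveName("./b", "name/a");
const nameAddress = paths[byName] || byName;

// Address-normalized: "./b" is resolved against the parent's URL,
// so the "name/b" entry is never consulted.
const addressAddress = new URL("./b", paths["name/a"]).href;

console.log(nameAddress);    // http://bar.com/b.js
console.log(addressAddress); // http://foo.com/b
```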

Example 2

URLs are unique too. I feel like I am missing something here... Let me think a bit more about it. If I'm not mistaken the issue we showed was related to polyfilling a builtin module that would reside at "js/reflect" for example and this would load "./util.js" but we definitely do not want this to be loaded from "js/reflect/util.js" but from a file relative to the file that implemented the polyfill.

I'm unsure exactly what this use case is, and I hope someone can provide the details. My guess is that polyfills/reflect/util should be loaded instead of js/reflect/util, and that polyfills/reflect/util has its own relative dependencies. The code setup might look like:

// in js/reflect.js
import util from "./reflect/util"

// in js/reflect/util.js
import helpers from "./helpers"
// ... rest of js/reflect/util's code ...

The idea is to swap js/reflect/util.js with something like the following polyfills/reflect/util.js:

// in polyfills/reflect/util.js
import helpers from "js/reflect/helpers"
import magic from "./magic"
// ... rest of the polyfill's code ...

The problem is that with a paths config like:

System.paths["js/reflect/util"] = "polyfills/reflect/util.js"

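... "./magic" inside the polyfill is normalized against the module name js/reflect/util, so it becomes js/reflect/magic instead of polyfills/reflect/magic.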

However, this is better solved with a map config, like the one RequireJS and SystemJS use:

System.map["js/reflect/util"] = "polyfills/reflect/util"
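A sketch of why map works where paths does not: map rewrites the *name*, so the polyfill's own relative imports resolve against "polyfills/reflect/util". The prefix-matching rule (exact key or "key/..." prefix, longest key wins) is an assumption loosely based on RequireJS/SystemJS behavior; `applyMap`, `resolveName`, and `normalize` are hypothetical:

```javascript
const map = { "js/reflect/util": "polyfills/reflect/util" };

// Hypothetical: resolve "./x" or "../x" against a module name.
function resolveName(specifier, parentName) {
  const out = parentName.split("/").slice(0, -1);
  for (const part of specifier.split("/")) {
    if (part === "..") out.pop();
    else if (part !== ".") out.push(part);
  }
  return out.join("/");
}

// Replace the longest map key that equals the name or is a "/"-delimited prefix of it.
function applyMap(name, map) {
  let bestKey = null;
  for (const key of Object.keys(map)) {
    const hit = name === key || name.startsWith(key + "/");
    if (hit && (bestKey === null || key.length > bestKey.length)) bestKey = key;
  }
  return bestKey === null ? name : map[bestKey] + name.slice(bestKey.length);
}

// Name-normalize first, then apply map to the resulting moduleId.
function normalize(specifier, parentName) {
  const isRelative = specifier.startsWith("./") || specifier.startsWith("../");
  const name = isRelative ? resolveName(specifier, parentName) : specifier;
  return applyMap(name, map);
}

const util = normalize("./reflect/util", "js/reflect");
console.log(util);                                 // polyfills/reflect/util
console.log(normalize("./magic", util));           // polyfills/reflect/magic
console.log(normalize("js/reflect/helpers", util)); // js/reflect/helpers
```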

Conclusions

In short, the problems address-normalization attempts to solve are:

not very common,
not expected, or
better solved in other ways.

The following section discusses some very real problems that address-normalization will create.

Problems with address-normalization

By incorporating addresses into moduleNames, address-normalization creates non-deterministic moduleNames in all sorts of situations. Deterministic moduleNames are critical for a lot of advanced but essential functionality.

System extensions that match moduleNames across different environments

It's pretty common to point a library like lodash to a CDN only in production. It will also be common for an extension to run only on certain modules, specified by a System config like:

if(ENV === "production") {
  System.paths["lodash"] = "http://cdn.com/lodash/*"
}
System.es6toES5 = ['lodash/*','ember/*'];

However, this will not work. My ES6-to-ES5 plugin cannot simply match moduleNames that start with lodash, because if any lodash module uses a relative path, its module name might be something like "http://cdn.com/lodash/array.js". Address-normalization forces all such config to be address-based, and therefore much more likely to change between environments. The config would instead have to look like:

if(ENV === "production") {
  System.paths["lodash"] = "http://cdn.com/lodash/*"
  System.es6toES5 = ["http://cdn.com/lodash/*", 'lodash/*', 'ember/*'];
} else {
  System.es6toES5 = ['lodash/*','ember/*'];
}
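The matching problem can be shown with a trivial glob matcher of the kind such an extension might use. This sketch assumes a trailing "*" means "any remainder"; `matchesPattern` and `shouldTranspile` are hypothetical, as is the es6toES5 option itself:

```javascript
// Hypothetical matcher: a trailing "*" matches any remainder, otherwise exact match.
function matchesPattern(moduleName, pattern) {
  if (pattern.endsWith("*")) return moduleName.startsWith(pattern.slice(0, -1));
  return moduleName === pattern;
}

function shouldTranspile(moduleName, patterns) {
  return patterns.some((p) => matchesPattern(moduleName, p));
}

const patterns = ["lodash/*", "ember/*"];

// Name-normalized: the moduleName is the same in every environment.
console.log(shouldTranspile("lodash/array", patterns)); // true
// Address-normalized in production: a relative import inside lodash yields a URL name.
console.log(shouldTranspile("http://cdn.com/lodash/array.js", patterns)); // false
```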

Build systems

Build systems typically have to write out a version of a module that is self defining. For instance:

System.define("lodash/array", function(){
  //... lodash/array's code ...
})
System.define("lodash", ["./array"],function(){
  //... lodash's code ...
})

It's very important that the build writes out moduleNames that the client understands and can repeatably locate. Address-normalization will make this very hard.

Build systems often load files from different paths than the client does, so they see different addresses. The client might find "lodash/array" at http://localhost/utils/lodash/array.js, while the server sees it at /user/jbm/dev/project/shared/utils/lodash/array.js. The bundle cannot be written out like:

System.define("/user/jbm/dev/project/shared/utils/lodash/array", function(){
  //... lodash/array's code ...
})
System.define("lodash", ["./array"],function(){
  //... lodash's code ...
})

Or even:

System.define("/user/jbm/dev/project/shared/utils/lodash/array", function(){
  //... lodash/array's code ...
})
System.define("lodash", ["/user/jbm/dev/project/shared/utils/lodash/array"],function(){
  //... lodash's code ...
})

The reason is that the client might want to dynamically import "lodash/superArray", which might look like:

import "./array"

The client would see "lodash/superArray" at http://localhost/utils/lodash/superArray.js and resolve "./array" to http://localhost/utils/lodash/array.js. Because that address does not match the key the bundle used, array.js would be loaded a second time.
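A small simulation of the double load, comparing a registry keyed by addresses with one keyed by moduleIds. The registry and its fetch counter are hypothetical stand-ins for System.define and import, and appending ".js" after resolution is an assumption about the loader:

```javascript
// Hypothetical registry: load() counts a fetch whenever the key is missing.
function createRegistry() {
  const modules = new Map();
  let fetches = 0;
  return {
    // Like a bundle's System.define: register a module under a key.
    define(key) {
      modules.set(key, { key });
    },
    // Like an import: reuse the registered module, or fetch and run it again.
    load(key) {
      if (!modules.has(key)) {
        fetches += 1;
        modules.set(key, { key });
      }
      return modules.get(key);
    },
    get fetches() {
      return fetches;
    },
  };
}

// Keyed by address: the bundle recorded the server-side path...
const byAddress = createRegistry();
byAddress.define("/user/jbm/dev/project/shared/utils/lodash/array");
// ...but the client resolves "./array" against its own URL (assuming ".js" is appended).
const clientUrl =
  new URL("./array", "http://localhost/utils/lodash/superArray.js").href + ".js";
byAddress.load(clientUrl);

// Keyed by moduleId: "./array" from "lodash/superArray" name-normalizes to "lodash/array".
const byId = createRegistry();
byId.define("lodash/array");
const clientId = "lodash/superArray".split("/").slice(0, -1).concat("array").join("/");
byId.load(clientId);

console.log(clientUrl);         // http://localhost/utils/lodash/array.js
console.log(byAddress.fetches); // 1 (array.js is fetched and run a second time)
console.log(byId.fetches);      // 0 (the bundled definition is reused)
```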

Recommendations

It's highly likely I am missing something. But if I'm not, this seems like a step in the wrong direction. I recommend reverting to name-normalization and adding a "map" specification.

johnjbarton commented 9 years ago

"../c" -> "a/c" -> "path/C/c.js"

Do you mean

"./c" -> "a/c" -> "path/C/c.js"

In two places in the first Example? These lines do not match the import statements.

justinbmeyer commented 9 years ago

@johnjbarton Yes. Updating.

Edit: removed the "Duplicate loading of Modules" section. I wrote this response a while ago, and now I can't remember why I was thinking that. Here's that section, but it's probably best to ignore it:

Duplicate loading of Modules

What happens if I do:

System.paths["lodash/*"] = "cdn.com/lodash/*";

import "lodash/main";
import "lodash/array";

And in "lodash/main.js", it does:

import "./array";

Will this load "lodash/array.js" twice?

johnjbarton commented 9 years ago

// in a/b.js
import c from "./c.js"

This has to mean a/c.js. We should not allow these kinds of imports to be remapped. Users of packages that want to re-do the semantics of the package can do so by changing/overwriting the files. It should be hard, not easy to do.

One side effect of allowing

import {c} from 'foo/c.js';

is that, absent a remapping, it would resolve to ./foo/c.js, but a mapping could point it to file c.js in package foo. Thus foo/c.js expresses "fall back to the local package foo unless configuration directs you to a different foo". The form ./foo/c.js (with the extra ./) expresses that configuration may not change the target.
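The rule proposed here can be sketched as a resolver in which "./" specifiers are never remapped, while bare specifiers consult the map first and otherwise fall back to a path next to the importer. The map shape (package name to base URL) and the example.com URLs are hypothetical; only the standard `URL` class is real:

```javascript
// Hypothetical resolver for the proposed rule.
function resolve(specifier, parentUrl, map) {
  const isRelative = specifier.startsWith("./") || specifier.startsWith("../");
  if (isRelative) {
    // "./" and "../" always mean "relative to this file" and are never remapped.
    return new URL(specifier, parentUrl).href;
  }
  // Bare specifier: configuration may redirect the package...
  const packageName = specifier.split("/")[0];
  if (Object.prototype.hasOwnProperty.call(map, packageName)) {
    const rest = specifier.slice(packageName.length + 1);
    return new URL(rest, map[packageName]).href;
  }
  // ...otherwise fall back to the local package next to the importer.
  return new URL("./" + specifier, parentUrl).href;
}

const parent = "http://example.com/app/main.js";
const map = { foo: "http://cdn.example.com/foo/" };

console.log(resolve("foo/c.js", parent, {}));    // http://example.com/app/foo/c.js
console.log(resolve("foo/c.js", parent, map));   // http://cdn.example.com/foo/c.js
console.log(resolve("./foo/c.js", parent, map)); // http://example.com/app/foo/c.js
```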

justinbmeyer commented 9 years ago

Edit: Updated title.

matthewp commented 9 years ago

This has to mean a/c.js. We should not allow these kinds of imports to be remapped. Users of packages that want to re-do the semantics of the package can do so by changing/overwriting the files. It should be hard, not easy to do.

That effectively means no map configuration, which is extremely useful. What is the justification for not allowing that?

justinbmeyer commented 9 years ago

It should be hard, not easy to do.

This is something we commonly want people to do in CanJS. CanJS runs on Dojo, MooTools, Zepto, jQuery, or YUI. We let people alias can/util/lib to jquery.

This type of thing should be possible.

johnjbarton commented 9 years ago

We let people alias a can/util/lib to jquery.

Yes, allow that. Just don't allow ./can/util/lib to be mapped.

justinbmeyer commented 9 years ago

Just don't allow ./can/util/lib to be mapped.

Why?

I'd argue that there are good reasons to patch a specific file. For instance, browserify's "browser" field lets developers do this for node modules so they work in the browser. Example:

{
  "browser": { "./foo": "./foo-browser" }
}
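As a rough illustration of the idea, a bundler might apply the "browser" field to a relative specifier as an exact-match replacement table. This is a simplified sketch; `applyBrowserField` is hypothetical, and real browserify resolution handles more cases, such as file extensions and bare module names:

```javascript
// package.json contents from the example above.
const pkg = { browser: { "./foo": "./foo-browser" } };

// Hypothetical: swap a specifier if the package's "browser" field lists it.
function applyBrowserField(specifier, pkg) {
  const replacements = pkg.browser || {};
  return Object.prototype.hasOwnProperty.call(replacements, specifier)
    ? replacements[specifier]
    : specifier;
}

console.log(applyBrowserField("./foo", pkg)); // ./foo-browser
console.log(applyBrowserField("./bar", pkg)); // ./bar
```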

justinbmeyer commented 9 years ago

Accident!

johnjbarton commented 9 years ago

In a module system, module foo can be replaced by module foo-browser in the map. But "./foo" does not express a module relationship; it expresses a directory relationship. Of course this is just a convention, but one we should adopt.

By distinguishing "./foo/x.js" and "foo/x.js", we allow package authors to express two different ideas. With "./foo/x.js" they are saying "I support the use of my implementation of x.js"; with "foo/x.js" they say "I support any implementation matching the interface of foo/x.js".

Package users can remap "foo/x.js" but if they want to change the "x.js" pulled in by "./foo/x.js" they have to hack. That sets the right kind of barrier to help package authors give appropriate levels of support to users.

Allowing every path to be remapped creates an overly complex solution with no role for the package author to create boundaries. It's similar to the difference between private_ and public members or other forms of software boundaries.

caridy commented 9 years ago

@justinbmeyer I see a lot of misleading parts in this issue. We need some concrete examples that illustrate the issue. Keep in mind that any discussion around this will have to cover these two cases as well:

<script src="./a/b.js" type="module"></script>
<script type="module">
import "./x/y.js";
</script>
<script>
System.import('./j/k.js');
</script>

You should also look at https://github.com/whatwg/loader/issues/20, where we have some details about site rules.

justinbmeyer commented 9 years ago

Thanks everyone for all the feedback. I'm trying my best to understand this choice and be as level-headed about the reasons for and against it as possible.

@caridy What is misleading?

Does the "Build systems" section make sense? Why is it not concrete enough?

As for those use cases, I don't believe a user should be able to write them. Those are not valid moduleIds.

Another take

After having a discussion with @guybedford about this I'm trying to understand why people support this change and what the tradeoffs are.

In my opinion, it comes down to:

Making it slightly easier for users to understand by using URLs instead of moduleIds
There's less to write for spec authors and implementors

VS

Making things easy for ecosystem / extensions developers

I favor the ecosystem / extension developer point. I believe that:

Any mature application development will be based around moduleIds.

This means that even if canonical URLs are used, advanced users will add loader extensions that use moduleIds.

My build example is a case where some non-url based representation of a module must be used to make bundled code portable from server to client.
Most configuration will be based around moduleIds so it is portable to other locations and URLs. I believe the site config is an example of this.

Extension development will be difficult.

Taking my es6toES5 example, the only way to make this work is to somehow know when all extensions have been added and then call the resolve() hook on each value to get the full URL, for example converting "lodash/*" to "http://cdn.com/lodash/*" so it can be matched during transpilation.

The form of and access to those moduleIds will be fragmented.

Many extensions will have their own form of moduleIds and moduleId side-tables, making interoperability more difficult.

If the goals (#27) of this group are only to create a minimal spec that enables as much as possible, then I understand the removal of moduleIds. However, I'm fairly confident that it will have missed an opportunity to provide an extremely important integration point for extension and ecosystem authors needed to build a wide variety of interoperable tools and libraries.

justinbmeyer commented 9 years ago

TL;DR: moduleIds are important because it's very common for a module to live at different URLs at different points in an application's lifecycle. moduleIds make dealing with this much easier and are worth a little effort to understand on the part of users.

jrburke commented 9 years ago

From the requirejs perspective, I tried supporting both URLs and module IDs in dependency references, and it just led to confusion, limited and even broke optimizations, and made things like "map" config impossible in a general sense. To use those features, the requirement is "use only IDs", and in a requirejs 3, I would remove the support for plain URLs as module references.

I believe the problem is assuming the forms that @caridy mentioned above should be supported. I started from the same place with requirejs, but it did not work out well. So I suggest the reasons for trying to support that should be revisited.

I would be happy to talk more about this in a more realtime talk, as these sorts of issues just get mired in people's subjective text parsing and time issues, and often the disconnect is a deeper philosophical issue that really only comes out in people being able to talk in realtime.

In the absence of that, some other things to consider:

AMD loaders allow for loader plugins, via IDs of the form "pluginID!resourceId". These may not have an actual URL/address associated with them. The needs met by those plugin IDs do not go away with the ES loader, and I expect to author an ES loader wrapper that provides them if they do not make it into native support.
People will still want to inline modules. Addresses make less sense in those cases.

I now view module IDs like the identifiers for functions, and the loader should treat their storage similarly. Instead of language identifiers, modules use string names that allow for some relative referencing via segments separated by "/", and they may be async-resolved. But they have a name that is incidental to what address they came from.

This becomes even clearer once there is nested inlining of modules. This happens today with browserified/amd-optimized modules having an internal module structure all defined inline, but also a public face for use by other module code outside. This is similar to how named functions can be nested inside other named functions.

Using module IDs makes it clear that these are just units of code referenced via a name. Making sure they are distinct entities from an address (which may or may not actually be used to load the module) will lead to fewer misunderstandings, and allows much more flexibility for loader configuration, flexibility that has proven useful in AMD loaders, specifically the package and map configs.

matthewp commented 9 years ago

I want to chime in that moduleIds work better for browsers. @jrburke pointed out some very good examples, and I want to expound on plugins. Consider a resource that is loaded by 2 different packages. One might load it with the json plugin and another with the text plugin. To make this work, the module loader has to construct distinct ids for each module (they are separate modules). If the key is a URL, you can't have multiple modules from the same URL. You can fudge the URL to make them distinct, but once you do this you have effectively created a moduleId that happens to start with http.
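The collision can be sketched by keying the same two imports both ways. The ids follow the AMD "pluginID!resourceId" form described earlier in the thread; the base URL and `addressOf` helper are hypothetical:

```javascript
// Hypothetical base URL the loader resolves resources against.
const base = "http://example.com/app/";

// Strip the "pluginID!" prefix and resolve the resource part to a URL.
function addressOf(id) {
  const bang = id.indexOf("!");
  const resource = bang === -1 ? id : id.slice(bang + 1);
  return new URL(resource, base).href;
}

// One resource, loaded through two different plugins.
const ids = ["json!data.json", "text!data.json"];

const byId = new Set(ids);
const byAddress = new Set(ids.map(addressOf));

console.log(byId.size);      // 2 (two distinct modules)
console.log(byAddress.size); // 1 (both collapse onto http://example.com/app/data.json)
```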

There are lots of scenarios where you want "pseudo" modules that don't actually map to a URL. Inlined modules are one example. But if we can System.install anything in memory as a module, using URLs as the keys stops making sense. For example, in StealJS we have a special module @loader which represents the loader that is loading your code; this way you can configure the loader in environments where multiple loaders exist.

Lastly I just want to stress how important this decision is, as it will influence how modules are written going forward.

guybedford commented 9 years ago

https://github.com/whatwg/loader/issues/52 could be considered a response to this.

caridy commented 8 years ago

With the latest refactors to allow sync operations on the registry, we have settled on using the result of the resolve hook as the keys for the registry.

justinbmeyer commented 8 years ago

Bah humbug. I realize you have to make a decision. I hope this is the right one.


On Dec 24, 2015, at 2:04 PM, Caridy Patiño notifications@github.com wrote:

with the latest refactors to allow sync operations on the registry, we have settle on using the result from resolve hook as the keys for the registry.


csnover commented 8 years ago

I don’t know what the WHATWG policy is regarding reopening “closed” issues, but I was just informed today of this issue via Twitter and I would strenuously encourage reconsideration of the use of URLs as registry keys in the final design of the loader specification.

Last month I wrote a fairly lengthy article about this topic that explains why I think you can't just use file paths or URLs to address various real-world use cases, and how using URLs makes a loader design brittle and harder to reason about for any non-trivial application.

(Alternatively I would be happy for someone from the WG to explain how I am wrong, and how the WHATWG loader using URLs is going to be able to cleanly solve the use cases described therein.)

Others in this thread have also been making many of the same arguments, and it’s disheartening to see that there isn’t even a single rebuttal to their arguments from the WG, just a “we’re doing it this other way”. Is there some other discussion forum where all of the people in this ticket were answered and I just don’t know about it?

Thanks!

caridy commented 8 years ago

@csnover feel free to ping me (caridy at gmail dot com), and we can chat about it. As for reconsidering any decision we make here, all it takes is a good reason to do so. We are very open to correcting any mistakes or problems to make this spec better.
