Closed YurySolovyov closed 7 years ago
I want to start off with the 99% fallacy here. We are talking about module graphs. If 1% of your graph is wrong it can affect 100% of your graph. Let us keep that in mind.
Let's also try to enumerate where things differ between CJS, Script, and ESM whenever we talk about the problematic situations.
Let's also try to enumerate use cases where those situations might occur whenever we accept or dismiss them as being valid or invalid.
Let's start with simple stuff.
I think one of the most common patterns in Node is to require() a bunch of core modules and export either one object with some API, or just a plain function:
// Full CJS
const fs = require('fs');
const path = require('path');
const api = function(...params) {
// ...
};
module.exports = {
api: api
};
Which in ESM I guess translates to something like:
// Full ESM
import fs from 'fs';
import path from 'path';
const api = function(...params) {
// ...
};
export {
api
};
Are there any problems so far in that particular example?
Source | Script | CJS | ESM |
---|---|---|---|
var arguments | global | local | parse error |
var eval | global | local | parse error |
import("x") | ok | ok | ok |
import "x"; | parse error | parse error | ok |
export {}; | parse error | parse error | ok |
with({}){} | ok | ok | parse error |
<!--\n | ok | ok | parse error |
-->\n | ok | ok | parse error |
0777 | ok | ok | parse error |
delete x | ok | ok | parse error |
try {} catch (eval) {} | ok | ok | parse error |
try {} catch (arguments) {} | ok | ok | parse error |
(function (_, _) {}) | ok | ok | parse error |
eval = eval | ok | ok | parse error |
arguments = [] | ok | ok | parse error |
implements | ok | ok | parse error |
interface | ok | ok | parse error |
let | ok | ok | parse error |
package | ok | ok | parse error |
private | ok | ok | parse error |
protected | ok | ok | parse error |
public | ok | ok | parse error |
static | ok | ok | parse error |
yield | ok | ok | parse error |
return | parse error | ok | parse error |
await | ok | ok | parse error |
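A few rows of the parse table can be checked from Node itself. The following is a minimal sketch, assuming Node's built-in `vm` module; the helper names `parsesAsScript` and `parsesAsCJS` are hypothetical, and the parameter list passed to `vm.compileFunction` only approximates the CJS wrapper. The ESM column is not covered, because parsing a Module from user code needs APIs that are not available by default.

```javascript
const vm = require('vm');

// Hypothetical helper: does `source` parse under the Script goal?
// `new vm.Script` only compiles; it never runs the code.
function parsesAsScript(source) {
  try {
    new vm.Script(source);
    return true;
  } catch (err) {
    if (err.name === 'SyntaxError') return false;
    throw err;
  }
}

// Hypothetical helper: does `source` parse as a CJS-style function body?
function parsesAsCJS(source) {
  try {
    vm.compileFunction(source, ['exports', 'require', 'module', '__filename', '__dirname']);
    return true;
  } catch (err) {
    if (err.name === 'SyntaxError') return false;
    throw err;
  }
}

const samples = ['return', 'with({}){}', 'import("x")', 'import "x";', 'export {};'];
for (const source of samples) {
  console.log(JSON.stringify(source), parsesAsScript(source), parsesAsCJS(source));
}
```

Only `return` separates Script from CJS here; the import and export declarations fail in both, which is why they are the only reliable syntactic signal in this sketch.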
Source | Script | CJS | ESM |
---|---|---|---|
this | global | module | undefined |
var x | global | local | local |
(function (){return this}()) | global | global | undefined |
(function (x) {x = 1; return arguments[0];})() | 1 | 1 | undefined |
(function () {return typeof this;}).call(1) | "object" | "object" | "number" |
var x = 0; eval('var x = 1'); x | 1 | 1 | 0 |
__filename | global | local | global |
__dirname | global | local | global |
require | global | local | global |
exports | global | local | global |
module | global | local | global |
arguments | global | local | global |
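Two of the evaluation rows can be reproduced without a module loader. This is a rough sketch that assumes a `"use strict"` directive is a fair stand-in for Module strictness; it does not model the other ESM differences (top-level `this`, missing wrapper variables). Functions built with `new Function` are sloppy unless their body carries the directive.

```javascript
// Sloppy body, as in Script or CJS.
const sloppyTypeofThis = new Function(
  'return (function () { return typeof this; }).call(1);'
);
// Same body with a directive, approximating Module semantics (assumption).
const strictTypeofThis = new Function(
  '"use strict"; return (function () { return typeof this; }).call(1);'
);

// Direct eval shares the function's var scope only in sloppy mode.
const sloppyEval = new Function("var x = 0; eval('var x = 1'); return x;");
const strictEval = new Function("'use strict'; var x = 0; eval('var x = 1'); return x;");

console.log(sloppyTypeofThis(), strictTypeofThis()); // object number
console.log(sloppyEval(), strictEval()); // 1 0
```

Both variants parse without error, so a parse-based guess cannot tell them apart; the difference only shows up when the code runs.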
Source | Timing | Hoisted | Blocking |
---|---|---|---|
require('foo'); | sync | no | yes |
import "foo"; | untimed (async generally) | yes | yes |
import('foo'); | async | no | no |
In all cases where only import declarations are used, import() may be used instead. This list really is about things that do not export values.
source-map-support
, some config scripts)import()
)vm.runInContext
)
Relevant to when import or export declarations may be added/removed.
package.json
npm
bin scripts have similar but not same)"-e"
, "-p"
, REPL)
node myapp.js
package.json
capabilitiesnvm
in particular"-e"
, "-p"
, REPL)
node myapp.mjs
None known yet.
Resolution in ESM is URL based and should be 100% compatible in non-error cases with the web specification. This is known to have minor differences with CJS.
A path for CJS to begin using ESM modules must exist.
import()
ESM.
A path for ESM to be created that uses legacy or CJS files must exist.
import
CJS files.
module.parent
(due to parallel loading, idempotency requirement, etc.)."lodash/chunk"
default
export
It should be possible for the ecosystem to move to be ESM only for newly written code.
@YurySolovyov can you clarify what you mean by "Are there any problems so far in that particular example?" I mean, they do act similar but aren't the same.
I mainly meant that they don't introduce ambiguity, right? Even if we'll have to parse them >1 times, we can with 100% certainty tell which is which, right?
I am not asking in this issue if ESM can fulfill a use case, I'm asking about use cases where ambiguity is possible / exists. Take for example a simple prefetching script:
import('./something-for-later-1')
import('./something-for-later-2')
import('./something-for-later-3')
import('./something-for-later-4')
Such text is either a Script, CJS, or a Module, but could rely on something pretty easily that causes it to cease functioning:
import(`${__dirname}/something-for-later-1`)
@YurySolovyov for
const fs = require('fs');
const path = require('path');
const api = function(...params) {
// ...
};
module.exports = {
api: api
};
We can make a good guess that it is CJS if we parse for require(), but not all CJS uses require(). We need to define what mechanism you are using to perform these guesses.
It would also need to ensure no local variable named require exists.
Also in theory that CJS can parse and eval just fine in Module.
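To make the two failure modes concrete, here is a deliberately naive sketch. `guessIsCJS` is a hypothetical heuristic written for this illustration, not something Node does or was proposed to do.

```javascript
// Hypothetical heuristic: treat any `require(` call as a CJS signal.
function guessIsCJS(source) {
  return /\brequire\s*\(/.test(source);
}

// `require` is a local function here, so the call says nothing about CJS.
const usesLocalRequire = 'function require(id) { return id; }\nrequire("x");';
// Valid CJS that never calls require().
const cjsWithoutRequire = 'module.exports = function api() {};';

console.log(guessIsCJS(usesLocalRequire)); // true
console.log(guessIsCJS(cjsWithoutRequire)); // false
```

The first result is a false positive and the second a false negative, which is the overlap problem described above.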
Will the absence of import/export be a better sign of CJS than trying to parse requires?
If import and export declarations do not exist, it could be either ESM, [Script], or CJS. It is better to say import and export declarations (not import()) signify something that currently is only able to parse in ESM.
Detecting CJS really won't work due to amount of overlap in that direction, detecting [something that is only] ESM is possible though.
So we can't just always start with ESM and then fall back to CJS if that fails?
Looking at all these parse errors in the table above, I don't think many of them are useful or just make a lot of sense in general.
I have a question though about let, yield, return and await: is this about top-level ones or what?
I can't imagine a module system where you can't use these in function's body.
No, default needs to be CJS since that is what existing backwards compat needs.
I have a question though about let, yield, return and await: is this about top-level ones or what? I can't imagine a module system where you can't use these in function's body.
return works in CJS at top level.
let, yield, await, etc. are reserved words in ESM but not in the other goals.
Looking at all these parse errors in the table above, I don't think many of them are useful or just make a lot of sense in general.
Indeed! That's why https://github.com/bmeck/UnambiguousJavaScriptGrammar went to TC39!
Eval errors are much more insidious. Too many chats going on right now for me to complete.
To note as well: the guess cannot change over time. Once you ship the guessing mechanism it will stay backwards compat (so if something guesses CJS, it will always guess CJS even 10 years from now)
OK, I remember it was proposed at some point that we might want to have import/export declarations as indicators for switching into ESM mode, and it didn't work out. Is that because when switching to ESM you are also switching some parsing rules?
A file need not have import or export to be parsed as a Module, and parsing a file as a module or a script can definitely break it if you guess wrong.
@ljharb yup, which is why it needed standardization to remove wrong guesses
@YurySolovyov Jan 2017 TC39 Notes
A file need not have import or export to be parsed as a Module, and parsing a file as a module or a script can definitely break it if you guess wrong.
What do you mean by "break" ? If the file is valid in some mode, you'll guess until you succeed, or you just report that file is invalid in all of them.
The modes have different behaviors at execution time; it is not just about guessing how to parse the file. If you guess that something is ESM, then this at the top level of the file, for instance, is undefined, whereas if you guess that it is CJS then this is the exports object, and if it's a standard script, this is window.
Similarly, guessing something is a module and successfully parsing it means that code will now run in strict mode, but it's just as possible that a given file is a script that will fail when executed in strict mode.
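A small sketch of that last point, assuming Node's built-in `vm` module and using a `"use strict"` directive as a stand-in for Module strictness (an approximation, since a real Module differs in other ways too). The `legacySource` string is invented for illustration.

```javascript
const vm = require('vm');

// Legacy script relying on sloppy-mode implicit global creation.
const legacySource = 'counter = 1; counter += 1; counter';

// Evaluated as a sloppy Script in a fresh context: works.
const asScript = vm.runInNewContext(legacySource);

// Same text under strict semantics: parses fine, fails when executed.
let strictError = null;
try {
  vm.runInNewContext('"use strict"; ' + legacySource);
} catch (err) {
  strictError = err.name;
}

console.log(asScript, strictError); // 2 ReferenceError
```

Nothing in the source text is a syntax error in either mode, so no amount of parsing would flag the wrong guess ahead of time.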
If we start with CJS, and that's indeed a module written with CJS in mind, we're ok.
If we start with CJS and it fails because of import/export, then the module just has to have valid ESM syntax, otherwise it would fail in ESM anyway.
I don't think we should try to "make the most sense" of invalid modules.
@YurySolovyov I think the point being made is that suddenly valid ESM is being treated as CJS. The implication being, ESM and CJS ambiguity shouldn't change how code evaluates. Hence attempt to pass a standard to remove the ambiguity.
Ah, now I get it, I think.
Since so far node only had CJS, I'd expect any ambiguous code to end up in CJS mode, since that's first thing that "succeeds".
So if one wants this code to be ESM, you need to switch it manually, which I totally agree won't look very pretty. (export {}?)
I'd like to know the use-case for such code though.
A polyfill that is imported for side effects and relies on implicit strict mode. import 'foo' and require('foo'); would have very different effects unless there's a way to ensure foo's "main" is parsed deterministically as a Script or a Module. A file extension (.mjs in this case) is the easiest and most appropriate way to describe how a file should be parsed - that's what extensions are for.
A polyfill, that is imported for side effects
Can you just import it with require then?
Given the purpose of polyfills, I don't expect them to be imported in most of the modern envs that have proper modules.
You can also just export and invoke a function that conditionally performs polyfilling
@YurySolovyov yes but then that means the consumer has to know what kind of module it is - and you shouldn't have to know that.
@YurySolovyov that was one approach that was discussed in great depth. It means CJS permanently exists in all ESM of Node though, so there is no path towards a --esm-only flag etc. It also causes upgrade problems; consider the following:
import "dep";
If the mode of "dep" must be known:
require
import
If the mode is not specified by how it is loaded:
default
exportimport
are unaffected assuming it keeps the same single default export compatible with CJS versions.
It is safe to move to ESM even if your dependencies are not known if we provide a safe facade like the single default export approach.
We should also list other things outside of polyfills like:
sourcemap-support
the prefetch use / anything that only uses import() is the most trouble to me
It gets into more trouble once source texts are combined when taking the parsing approach. That means in some 5000-line file, line 1337 might be an export that changes how the whole file works. That's a bit of a needle in a haystack to find.
You might even remove that line and accidentally change how the whole file works as well.
Or you might accidentally add an export that swaps a CJS file. [via file concat or somesuch]
[ /me thinks of stack overflow mode poisoning w/ import/export ]
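The concatenation hazard can be sketched as follows, assuming Node's built-in `vm` module. `guessMode` is a hypothetical stand-in for a parse-based guess ("CJS unless it fails to compile as a function body"), and both source strings are invented for illustration.

```javascript
const vm = require('vm');

// Hypothetical guess: CJS unless the text fails to compile as a function body.
function guessMode(source) {
  try {
    vm.compileFunction(source, ['exports', 'require', 'module']);
    return 'cjs';
  } catch (err) {
    if (err.name === 'SyntaxError') return 'maybe-esm';
    throw err;
  }
}

const cjsFile = 'module.exports = this === exports;';
const strayLine = 'export {};'; // e.g. pulled in by a concat step or a pasted snippet

console.log(guessMode(cjsFile)); // cjs
console.log(guessMode(cjsFile + '\n' + strayLine)); // maybe-esm
```

One appended line flips the guess for the whole file, and with it the meaning of `this`, `exports`, and strictness for every line above it.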
Problemspace comment I think is in a decent place now
Polyfills
My best option so far for these, is to export a function and call it to actually activate polyfill. You can also group them in separate place to reduce some noise
@YurySolovyov that's not the use case of filling in globals.
@bmeck is that also a problem even with .mjs?
Running in wrong mode? Only in the CLI subset listed in problemspace, where there are no files
Old mandates placed in problem space.
I think I will stop here as these discussions have happened for a couple years already, if you have specific questions I will answer them.
I'd like to clarify what we mean by "parsing": does that mean we'll try to get some source-code representation like an AST and validate it, or do we just "try" evaluating the source with different options? Because in the second case there is a problem: the source may contain side effects, and running them twice is unacceptable.
@YurySolovyov depends on context. In general it deals with what happens when deciding whether to call ParseModule or ParseScript. CJS uses ParseScript, but changes the top level of the file to be a FunctionBody.
To clarify, those do not evaluate code.
So, with ESM, there is no wrapper function for module like
(function(__dirname, require, module, exports /* etc. */) {
and all these are properties of the global object, right?
@YurySolovyov there are no wrapper/magic variables. Accessing those variables accesses whatever is at the global. You will generally see undefined.
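The effect of the wrapper can be imitated with Node's built-in `vm` module. This is only a sketch: `vm.compileFunction` with a parameter list approximates the CJS wrapper, a fresh context stands in for code that has no wrapper, and the paths passed in are made up.

```javascript
const vm = require('vm');

const source = 'return typeof __dirname;';

// CJS-like: the wrapper function supplies __dirname as a parameter.
const wrapped = vm.compileFunction(
  source,
  ['exports', 'require', 'module', '__filename', '__dirname']
);
const insideWrapper = wrapped({}, undefined, {}, '/app/index.js', '/app');

// No wrapper: the lookup falls through to the global of a fresh context.
const withoutWrapper = vm.runInNewContext('typeof __dirname');

console.log(insideWrapper, withoutWrapper); // string undefined
```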
We don't yet know how we will do __dirname and __filename iirc.
@Fishrock123 yup, but have momentum on https://github.com/whatwg/html/issues/1013
OK, given we try to parse in CJS mode first, what tools do we have to know if parsing has failed because the file is in ESM mode? Is it enough to just rely on "Unexpected import/export declaration" or something like that?
what tools do we have to know if parsing is failed due to file being in ESM mode
This is non-trivial / none. Just try the other one, as suggested in the parse guessing / disambiguation proposal
[edit] this is fine for now since all the current cases where CJS fails to parse ESM parses except on the import/export declaration.
[edit +1] benchmarks show doing this is not the bottleneck of loading
It does not have to be 100% accurate, if it works for common cases, the rest is just about teaching people about the rules.
From what I was able to understand, this is roughly like:
try {
  parseAsCJS(source);
} catch (e) {
  if (canRetryAsESM(e)) {
    try {
      parseAsESM(source);
    } catch (fatal) {
      // bail, both failed
    }
  }
}
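A runnable variant of that flow might look like the sketch below. It assumes Node's built-in `vm` module for the CJS attempt; `guessGoal` is a hypothetical name, and the ESM parser is passed in as a callback because parsing a Module from user code is not available by default. The `acceptAll` and `rejectAll` callbacks are stubs for demonstration, not real parsers.

```javascript
const vm = require('vm');

// Compile only; the source is never executed, so side effects do not run.
function parseAsCJS(source) {
  return vm.compileFunction(
    source,
    ['exports', 'require', 'module', '__filename', '__dirname']
  );
}

// Only a syntax error justifies retrying under the other goal.
function canRetryAsESM(err) {
  return err.name === 'SyntaxError';
}

// `parseAsESM` is injected by the caller (assumption: it throws on failure).
function guessGoal(source, parseAsESM) {
  try {
    parseAsCJS(source);
    return 'cjs';
  } catch (err) {
    if (!canRetryAsESM(err)) throw err;
  }
  try {
    parseAsESM(source);
    return 'esm';
  } catch (fatal) {
    return 'invalid'; // both goals failed
  }
}

// Stub ESM parsers, for demonstration only.
const acceptAll = () => {};
const rejectAll = () => { throw new SyntaxError('not a module'); };

console.log(guessGoal('module.exports = 1;', acceptAll)); // cjs
console.log(guessGoal('export const a = 1;', acceptAll)); // esm
console.log(guessGoal('export const a = ;', rejectAll)); // invalid
console.log(guessGoal("import('./later')", acceptAll)); // cjs
```

The last call shows the ambiguous case from earlier in the thread: text that is valid under every goal always lands on the first guess.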
It absolutely has to "not silently do the wrong thing" in 100% of cases though.
@YurySolovyov please refer to TC39 meeting notes on this exact proposal https://github.com/nodejs/node-eps/issues/57#issuecomment-300892499 in this issue
How about path with a query (as URL) in module locators for interoperability? For example:
import {a, b, c} from "./script.js?type=nomodule"
const {a, b, c} = require("./module.js?type=module");
$ node ./module.js?type=module
and {"main": "./module.js?type=module"})
Note that with no query, import statements are only parsed as module, and require() expressions are only parsed as script.
@bellbind that has actually never been brought up. I need to think on this a bit, but my first concern is that in ESM, different query strings produce different modules:
import "./script.js";
import "./script.js?type=module";
This would load the file two separate times.
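The reason is that module records are keyed by the fully resolved URL, query string included. A minimal sketch of such a cache, assuming the WHATWG `URL` global; the `load` function and the base URL are hypothetical and no file is actually read:

```javascript
const loaded = new Map();
let evaluations = 0;

// Hypothetical loader: one module record per resolved URL.
function load(specifier, base) {
  const href = new URL(specifier, base).href;
  if (!loaded.has(href)) {
    evaluations += 1; // a real loader would fetch and evaluate here
    loaded.set(href, { href });
  }
  return loaded.get(href);
}

const base = 'file:///app/main.js';
const a = load('./script.js', base);
const b = load('./script.js?type=module', base);
const c = load('./script.js', base);

console.log(a === c, a === b, evaluations); // true false 2
```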
I obviously can't speak for the whole community, but it seems like a lot of people are not happy with .mjs.
One of the main arguments for keeping .js is that if we can detect the 99% of cases where we CAN tell whether it is CJS or ESM (or where we just know what to do), we may just call the remaining 1% edge cases and deal with them.
We can even come up with some linter rules and/or workarounds to simply teach people to do the right thing.