nodejs / node-eps

Node.js Enhancement Proposals for discussion on future API additions/changes to Node core

.mjs extension trade-offs revision #57

Closed YurySolovyov closed 6 years ago

YurySolovyov commented 7 years ago

I obviously can't speak for the whole community, but it seems like a lot of people are not happy with .mjs.

One of the main arguments for keeping .js is that if we can correctly detect the 99% of cases where we CAN tell whether a file is CJS or ESM (or where we just know what to do), we can call the remaining 1% edge cases and deal with them.

We can even come up with some linter rules and/or workarounds to simply teach people to do the right thing.

bmeck commented 7 years ago

I want to start off with the 99% fallacy here. We are talking about module graphs. If 1% of your graph is wrong it can affect 100% of your graph. Let us keep that in mind.

Let's also try to enumerate where things differ between CJS, Script, and ESM whenever we talk about the problematic situations.

Let's also try to enumerate the use cases where those situations might occur whenever we accept or dismiss them as valid or invalid.

YurySolovyov commented 7 years ago

Let's start with simple stuff. I think one of the most common patterns in node is to require() a bunch of core modules and export either one object with some API, or just export a plain function:

// Full CJS
const fs = require('fs');
const path = require('path');

const api = function(...params) {
  // ... 
};

module.exports = {
  api: api
};

Which in ESM I guess translates to something like:

// Full ESM
import fs from 'fs';
import path from 'path';

const api = function(...params) {
  // ... 
};

export { api };

Are there any problems so far in that particular example?

bmeck commented 7 years ago

Parsing (focus on where ESM and the other goals disagree about whether a source parses):

| Source | Script | CJS | ESM |
|---|---|---|---|
| `var arguments` | global | local | parse error |
| `var eval` | global | local | parse error |
| `import("x")` | ok | ok | ok |
| `import "x";` | parse error | parse error | ok |
| `export {};` | parse error | parse error | ok |
| `with({}){}` | ok | ok | parse error |
| `<!--\n` | ok | ok | parse error |
| `-->\n` | ok | ok | parse error |
| `0777` | ok | ok | parse error |
| `delete x` | ok | ok | parse error |
| `try {} catch (eval) {}` | ok | ok | parse error |
| `try {} catch (arguments) {}` | ok | ok | parse error |
| `(function (_, _) {})` | ok | ok | parse error |
| `eval = eval` | ok | ok | parse error |
| `arguments = []` | ok | ok | parse error |
| `implements` | ok | ok | parse error |
| `interface` | ok | ok | parse error |
| `let` | ok | ok | parse error |
| `package` | ok | ok | parse error |
| `private` | ok | ok | parse error |
| `protected` | ok | ok | parse error |
| `public` | ok | ok | parse error |
| `static` | ok | ok | parse error |
| `yield` | ok | ok | parse error |
| `return` | parse error | ok | parse error |
| `await` | ok | ok | parse error |
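Several rows of the parsing table can be probed mechanically. A minimal sketch, assuming Node's built-in `vm` module: `new vm.Script()` compiles Script-goal code and `vm.compileFunction()` compiles a function body (approximating CJS, whose top level is a function body); neither evaluates anything. Stable Node exposes no Module-goal parser (`vm.SourceTextModule` requires `--experimental-vm-modules`), so a `'use strict'` Script stands in for ESM here. That proxy only reproduces the strict-mode rows: it still rejects `import "x";`, and it accepts `<!--` comments and `await` as an identifier, both of which real ESM rejects.

```javascript
const vm = require('vm');

// The parameter names of Node's CJS wrapper; compileFunction treats the
// source as the body of a function with these parameters.
const CJS_PARAMS = ['exports', 'require', 'module', '__filename', '__dirname'];

// Compile (never run) `source` under one goal; returns true if it parses.
function parses(goal, source) {
  try {
    if (goal === 'script') {
      new vm.Script(source);
    } else if (goal === 'cjs') {
      vm.compileFunction(source, CJS_PARAMS);
    } else if (goal === 'strict') {
      // Approximates ESM's implicit strict mode only, not the Module goal.
      new vm.Script("'use strict';\n" + source);
    } else {
      throw new Error(`unknown goal: ${goal}`);
    }
    return true;
  } catch (e) {
    if (e.name === 'SyntaxError') return false;
    throw e; // anything else is a bug in the probe itself
  }
}

const samples = ['return', 'with ({}) {}', '0777', 'delete x', 'var arguments', 'import "x";'];
for (const source of samples) {
  const row = ['script', 'cjs', 'strict'].map((goal) => (parses(goal, source) ? 'ok' : 'parse error'));
  console.log(`${source.padEnd(14)} ${row.join(' | ')}`);
}
```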

Eval differences (the source parses and runs in every goal, but the results differ)

| Source | Script | CJS | ESM |
|---|---|---|---|
| `this` | global | module.exports | undefined |
| `var x` | global | local | local |
| `(function (){return this}())` | global | global | undefined |
| `(function (x) {x = 1; return arguments[0];})(0)` | 1 | 1 | 0 |
| `(function () {return typeof this;}).call(1)` | "object" | "object" | "number" |
| `var x = 0; eval('var x = 1'); x` | 1 | 1 | 0 |
| `__filename` | global | local | global |
| `__dirname` | global | local | global |
| `require` | global | local | global |
| `exports` | global | local | global |
| `module` | global | local | global |
| `arguments` | global | local | global |
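The strict-mode rows of the eval table can be reproduced the same way. A sketch, again approximating ESM with a `'use strict'` Script evaluated in a fresh `vm` context; module-only rows such as top-level `this` or the missing CJS wrapper variables are out of its reach. The `arguments` sample passes `0` as an argument, because sloppy-mode aliasing between parameters and `arguments` only applies to arguments that were actually passed.

```javascript
const vm = require('vm');

// Sources whose result depends only on strict vs. sloppy mode.
const samples = [
  'typeof (function () { return this; }())',
  '(function () { return typeof this; }).call(1)',
  "var x = 0; eval('var x = 1'); x",
  '(function (x) { x = 1; return arguments[0]; })(0)',
];

// Evaluate each source in a fresh context, once as a sloppy Script
// (Script/CJS-like) and once as a strict Script (approximating ESM).
function evalBoth(source) {
  return {
    sloppy: vm.runInNewContext(source),
    strict: vm.runInNewContext("'use strict';\n" + source),
  };
}

for (const source of samples) {
  const { sloppy, strict } = evalBoth(source);
  console.log(`${source}\n  sloppy: ${sloppy}  strict: ${strict}`);
}
```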

Timing differences (not a problem, listed for posterity)

| Source | Timing | Hoisted | Blocking |
|---|---|---|---|
| `require('foo');` | sync | no | yes |
| `import "foo";` | untimed (async generally) | yes | yes |
| `import('foo');` | async | no | no |

Potential relevant use cases w/o ESM specific declarations:

In all cases where only import declarations are used, import() may be used instead. This list is really about things that do not export values.

Potential relevant use cases that may combine source texts

Relevant to when import or export declarations may be added/removed.

Potential relevant use cases w/o package.json

Potential tooling that lacks package.json capabilities

Potential relevant use cases w/o a file extension

Potential tooling that lacks file extension capabilities

None known yet.

Mandates

Resolution

Resolution in ESM is URL based and should be 100% compatible in non-error cases with the web specification. This is known to have minor differences with CJS.

Forward Path

A path for CJS to begin using ESM modules must exist.

  1. It must be possible for CJS files to import() ESM.
    • Since this is available in the Script goal, this is automatically fulfilled.

Backwards Path

A path for ESM to be created that uses legacy or CJS files must exist.

  1. It must be possible to import CJS files.
    • Some things such as Node core are unable to be upgraded to ESM due to design or usage differences. Other modules such as meow use things that are not present/do not translate into ESM like module.parent (due to parallel loading, idempotency requirement, etc.).
    • Some things, such as "deep" linking like "lodash/chunk", are in use and must be supported.
  2. There must be an upgrade path for ESM to be safe against CJS becoming ESM.
    • Generally achieved by ensuring CJS files are a facade with only a single default export
  3. Mixed mode situations (both ESM and CJS in same app/package) must be supported. Note: again, there may be cases where code can never be updated to ESM.

ESM only future

It should be possible for the ecosystem to move to ESM only for newly written code.

  1. Whatever path is taken, it should be considered debt if files are still easily or accidentally able to be CJS.
  2. Whatever path is taken, it should be ready for the so called "3rd goal" problem.
    • Relevant to WASM that is looking to use same stuff as JS
    • Could be relevant with other things that have been proposed like a "use pure" situation if syntax or semantic changes are required.
bmeck commented 7 years ago

@YurySolovyov can you clarify what you mean by "Are there any problems so far in that particular example?" I mean, they do act similar but aren't the same.

YurySolovyov commented 7 years ago

I mainly meant that they don't introduce ambiguity, right? Even if we have to parse them more than once, we can tell with 100% certainty which is which, right?

bmeck commented 7 years ago

I am not asking in this issue if ESM can fulfill a use case, I'm asking about use cases where ambiguity is possible / exists. Take for example a simple prefetching script:

import('./something-for-later-1')
import('./something-for-later-2')
import('./something-for-later-3')
import('./something-for-later-4')

Such text parses as a Script, as CJS, or as a Module, but it could quite easily come to rely on something that makes it stop working in some of them:

import(`${__dirname}/something-for-later-1`)
bmeck commented 7 years ago

@YurySolovyov for

const fs = require('fs');
const path = require('path');

const api = function(...params) {
  // ... 
};

module.exports = {
  api: api
};

We can make a good guess that it is CJS if we parse for require(), but not all CJS uses require(). We need to define what mechanism you are using to perform these guesses.

bmeck commented 7 years ago

It also would need to ensure no local variable named require exists.

bmeck commented 7 years ago

Also, in theory, that CJS can parse and evaluate just fine as a Module.

YurySolovyov commented 7 years ago

Would the absence of import/export be a better sign of CJS than trying to parse for require() calls?

bmeck commented 7 years ago

If import and export declarations do not exist, it could be either ESM, [Script], or CJS. It is better to say import and export declarations (not import()) signifies something that currently is only able to parse in ESM.

bmeck commented 7 years ago

Detecting CJS really won't work due to amount of overlap in that direction, detecting [something that is only] ESM is possible though.

YurySolovyov commented 7 years ago

So we can't just always start with ESM and then fallback to CJS if that failed?

Looking at all these parse errors in the table above, I don't think many of them are useful or just make a lot of sense in general.

I have a question though about let, yield, return and await: is this only about top-level ones? I can't imagine a module system where you can't use these in a function's body.

bmeck commented 7 years ago

No, default needs to be CJS since that is what existing backwards compat needs.

I have a question though about let, yield, return and await: is this only about top-level ones? I can't imagine a module system where you can't use these in a function's body.

return works in CJS at top level.

let, yield, await, etc. are reserved words in ESM but not in the other goals.

Looking at all these parse errors in the table above, I don't think many of them are useful or just make a lot of sense in general.

Indeed! That's why https://github.com/bmeck/UnambiguousJavaScriptGrammar went to TC39!

Eval errors are much more insidious. Too many chats going on right now for me to complete.

bmeck commented 7 years ago

To note as well: the guess cannot change over time. Once you ship the guessing mechanism it must stay backwards compatible (so if something guesses CJS, it will always guess CJS, even 10 years from now).

YurySolovyov commented 7 years ago

Ok, I remember it was proposed at some point that we might use import/export declarations as indicators for switching into ESM mode, and it didn't work out. Is that because switching to ESM also switches some parsing rules?

ljharb commented 7 years ago

A file need not have import or export to be parsed as a Module, and parsing a file as a module or a script can definitely break it if you guess wrong.

bmeck commented 7 years ago

@ljharb yup, which is why it needed standardization to remove wrong guesses

@YurySolovyov Jan 2017 TC39 Notes

YurySolovyov commented 7 years ago

A file need not have import or export to be parsed as a Module, and parsing a file as a module or a script can definitely break it if you guess wrong.

What do you mean by "break"? If the file is valid in some mode, you'll keep guessing until you succeed, or you just report that the file is invalid in all of them.

loganfsmyth commented 7 years ago

The modes have different behaviors at execution time; it is not just about guessing how to parse the file. If you guess that something is ESM, then this at the top level of the file, for instance, is undefined, whereas if you guess that it is CJS then this is the exports object, and if it's a standard script, this is the global object (window in browsers).

Similarly, guessing something is a module and successfully parsing it means that code will now run in strict mode, but it's just as possible that a given file is a script that will fail when executed in strict mode.

YurySolovyov commented 7 years ago

If we start with CJS, and it's indeed a module written with CJS in mind, we're ok. If we start with CJS and it fails because of import/export, then the module just has to have valid ESM syntax, otherwise it would fail in ESM anyway.

I don't think we should try to "make the most sense" of invalid modules.

bmeck commented 7 years ago

@YurySolovyov I think the point being made is that suddenly valid ESM is being treated as CJS. The implication being, ESM and CJS ambiguity shouldn't change how code evaluates. Hence the attempt to pass a standard to remove the ambiguity.

YurySolovyov commented 7 years ago

Ah, now I get it, I think. Since Node has so far only had CJS, I'd expect any ambiguous code to end up in CJS mode, since that's the first thing that "succeeds". So if you want this code to be ESM, you need to switch it manually, which I totally agree won't look very pretty (export {}?). I'd like to know the use case for such code though.

ljharb commented 7 years ago

A polyfill, that is imported for side effects, that relies on implicit strict mode. import 'foo' and require('foo'); would have very different effects unless there's a way to ensure foo's "main" is parsed deterministically as a Script or a Module. A file extension (.mjs in this case) is the easiest and most appropriate way to describe how a file should be parsed - that's what extensions are for.

YurySolovyov commented 7 years ago

A polyfill, that is imported for side effects

Can you just import it with require then? Given the purpose of polyfills, I don't expect them to be imported in most of the modern envs that have proper modules. You can also just export and invoke a function that conditionally performs polyfilling

ljharb commented 7 years ago

@YurySolovyov yes but then that means the consumer has to know what kind of module it is - and you shouldn't have to know that.

bmeck commented 7 years ago

@YurySolovyov that approach was discussed in great depth. It means CJS permanently exists in all of Node's ESM, so there is no path towards a --esm-only flag etc. It also causes upgrade problems; consider the following:

import "dep";

If the mode of "dep" must be known:

If the mode is not specified by how it is loaded:

It is safe to move to ESM even if your dependencies are not known if we provide a safe facade like the single default export approach.

bmeck commented 7 years ago

moved into above

We should also list other things outside of polyfills like:

the prefetch use / anything that only uses import() is the most trouble to me

bmeck commented 7 years ago

It gets into more trouble once source texts are combined when taking the parsing approach: in some 5000-line file, line 1337 might be an export that changes how the whole file works. That's a bit of a needle in a haystack to find.

You might even remove that line and accidentally change how the whole file works as well.

Or you might accidentally add an export that flips a CJS file into ESM. [via file concat or somesuch]

[ /me thinks of stack overflow mode poisoning w/ import/export ]

bmeck commented 7 years ago

Problemspace comment I think is in a decent place now

YurySolovyov commented 7 years ago

Polyfills

My best option so far for these is to export a function and call it to actually activate the polyfill. You can also group them in a separate place to reduce some noise.
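A minimal sketch of the export-a-function approach, using a hypothetical `installHasOwn` helper that backfills ES2022's `Object.hasOwn`. Defining the function has no side effects; nothing changes until the consumer calls it. The `target` parameter is an assumption added so the installer can be exercised without touching the real `Object`.

```javascript
// Hypothetical opt-in polyfill: loading this code changes nothing;
// the global is only patched when a consumer calls installHasOwn().
function installHasOwn(target = Object) {
  if (typeof target.hasOwn === 'function') return false; // native or already installed
  Object.defineProperty(target, 'hasOwn', {
    value: function hasOwn(obj, key) {
      // Like the native method, throws a TypeError for null/undefined.
      return Object.prototype.hasOwnProperty.call(obj, key);
    },
    writable: true,
    configurable: true,
    enumerable: false,
  });
  return true;
}

// Usage: activate explicitly, e.g. from an application's entry point.
installHasOwn();
```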

bmeck commented 7 years ago

@YurySolovyov that's not the use case of filling in globals.

YurySolovyov commented 7 years ago

@bmeck is that also a problem even with .mjs ?

bmeck commented 7 years ago

Running in wrong mode? Only in the CLI subset listed in problemspace, where there are no files


bmeck commented 7 years ago

Old mandates placed in problem space.

bmeck commented 7 years ago

I think I will stop here as these discussions have happened for a couple years already, if you have specific questions I will answer them.

YurySolovyov commented 7 years ago

I'd like to clarify what we mean by "parsing": does it mean we'll try to get some source-code representation like an AST and validate it, or that we just "try" evaluating the source with different options? Because in the second case, the source may contain side effects, and running them twice is unacceptable.

bmeck commented 7 years ago

@YurySolovyov it depends on context; in general it is about deciding which of ParseModule or ParseScript to call. CJS uses ParseScript, but changes the top level of the file to be a FunctionBody.

bmeck commented 7 years ago

To clarify, those do not evaluate code.

YurySolovyov commented 7 years ago

So, with ESM, there is no wrapper function for module like

(function (exports, require, module, __filename, __dirname) {
  // ...module body...
});

and all these are properties of the global object, right?

bmeck commented 7 years ago

@YurySolovyov there is no wrapper and there are no magic variables. Accessing those names looks them up on the global, where they generally don't exist: typeof reports "undefined", and a bare reference throws a ReferenceError.
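A simplified sketch of the difference, assuming Node's `vm` module and made-up paths. The CJS-style runner compiles the source as the body of a function whose parameters are the wrapper variables (the same five names Node's CommonJS wrapper uses) and calls it with per-module values; it omits resolution and caching. The wrapper-less runner approximates ESM with a strict Script in a fresh context, where the same names are plain global lookups.

```javascript
const vm = require('vm');

// CJS-style: compile the file as a function body whose parameters are the
// "magic" variables, then call it with per-module values.
function runLikeCJS(body, filename, dirname) {
  const module = { exports: {} };
  const fn = vm.compileFunction(body, ['exports', 'require', 'module', '__filename', '__dirname']);
  fn.call(module.exports, module.exports, require, module, filename, dirname);
  return module.exports;
}

// ESM-style (approximated): no wrapper, so the names resolve against the
// fresh context's global object, where they are not defined.
function runWithoutWrapper(expr) {
  return vm.runInNewContext(`'use strict';\n${expr}`);
}

const exported = runLikeCJS('exports.dir = __dirname;', '/app/lib/a.js', '/app/lib');
console.log(exported.dir); // prints /app/lib
console.log(runWithoutWrapper('typeof __dirname')); // prints undefined
try {
  runWithoutWrapper('__dirname');
} catch (e) {
  console.log(e.name); // prints ReferenceError
}
```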

Fishrock123 commented 7 years ago

We don't yet know how we will do __dirname and __filename iirc.

bmeck commented 7 years ago

@Fishrock123 yup, but have momentum on https://github.com/whatwg/html/issues/1013

YurySolovyov commented 7 years ago

Ok, given we try to parse in CJS mode first, what tools do we have to know whether parsing failed because the file is ESM? Is it enough to just rely on "Unexpected import/export declaration" or something like that?

bmeck commented 7 years ago

what tools do we have to know if parsing is failed due to file being in ESM mode

This is non-trivial / none. Just try the other one, as suggested in the parse guessing / disambiguation proposal

[edit] this is fine for now, since in all the current cases where CJS fails to parse but ESM parses, the failure is on an import/export declaration.

[edit +1] benchmarks show doing this is not the bottleneck of loading

YurySolovyov commented 7 years ago

It does not have to be 100% accurate, if it works for common cases, the rest is just about teaching people about the rules.

From what I was able to understand, this is roughly like:

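function guessGoal(source) {
  // parseAsCJS, parseAsESM and canRetryAsESM are placeholders for
  // parse-only helpers; nothing is evaluated here.
  try {
    parseAsCJS(source);
    return 'cjs'; // anything that parses as CJS stays CJS
  } catch (e) {
    if (!canRetryAsESM(e)) throw e; // unrelated syntax error
    parseAsESM(source); // throws if both goals fail
    return 'esm';
  }
}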
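A runnable variant of that sketch, assuming `vm.compileFunction` as the parse-only CJS check. Since stable Node has no Module-goal parser, a CJS failure only yields an `esm-candidate` (a plain typo would land there too), which a real implementation would still have to confirm. It also shows the concern raised earlier in the thread: bmeck's prefetch script never uses import or export declarations, so it is always guessed CJS, and a later `${__dirname}` keeps working under that guess even though it would fail if the file were meant as ESM.

```javascript
const vm = require('vm');

const CJS_PARAMS = ['exports', 'require', 'module', '__filename', '__dirname'];

// CJS-first guess: compile (never run) as a CJS function body. A SyntaxError
// only makes the file an ESM *candidate*; confirming it would need a real
// Module-goal parse, which this sketch does not perform.
function guessGoal(source) {
  try {
    vm.compileFunction(source, CJS_PARAMS);
    return 'cjs';
  } catch (e) {
    if (e.name !== 'SyntaxError') throw e;
    return 'esm-candidate';
  }
}

console.log(guessGoal("import('./something-for-later-1')")); // cjs
console.log(guessGoal("import(`${__dirname}/later.js`)")); // cjs
console.log(guessGoal('import "./polyfill.js";')); // esm-candidate
```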
ljharb commented 7 years ago

It absolutely has to "not silently do the wrong thing" in 100% of cases though.

bmeck commented 7 years ago

@YurySolovyov please refer to TC39 meeting notes on this exact proposal https://github.com/nodejs/node-eps/issues/57#issuecomment-300892499 in this issue

bellbind commented 7 years ago

How about path with a query (as URL) in module locators for interoperability? For example:

Note that with no query, import statements are only parsed as module, and require() expressions are only parsed as script.

bmeck commented 7 years ago

@bellbind that has actually never been brought up. I need to think on this a bit, but my first concern is that in ESM, different query strings produce different modules:

import "./script.js";
import "./script.js?type=module";

This would load the file two separate times, as two different modules.
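To illustrate the concern: ESM loaders cache modules by their fully resolved URL, and a query string is part of that URL. A toy sketch, with a plain `Map` standing in for the loader's module map; the `load` helper and the base URL are hypothetical.

```javascript
// Hypothetical stand-in for an ESM module map keyed by resolved URL.
const base = 'file:///app/main.mjs';
const moduleMap = new Map();

function load(specifier) {
  const url = new URL(specifier, base).href; // the query string stays in the key
  if (!moduleMap.has(url)) {
    // A real loader would fetch, parse, and evaluate here, once per URL.
    moduleMap.set(url, { url, instance: moduleMap.size + 1 });
  }
  return moduleMap.get(url);
}

const a = load('./script.js');
const b = load('./script.js?type=module');
const c = load('./script.js');
console.log(a === c, a === b, moduleMap.size); // true false 2
```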