nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
104.51k stars 28.19k forks source link

esm: Implement esm mode flag #18392

Closed guybedford closed 5 years ago

guybedford commented 6 years ago

This provides a mode: "esm" flag for package.json files which will treat ".js" files as ES modules within a given package boundary.

The package boundary of a package.json file is deemed to be all modules in that folder and subfolder until the next nested package.json file.

The plan is to add a top-level --mode flag to this, roughly matching the npm proposal, but restricted to the current package boundary. This work is currently at https://github.com/guybedford/node/tree/package-top-mode.

The mode effectively marks a directory as an "esm mode" directory, where all ".js" files in that directory are loaded as ES modules, unless they are located in a subfolder that contains a package.json without an "esm mode", which results in reverting to the transparent interop of treating ".js" as CommonJS.

In addition this PR enables caching of package.json lookups within the ES module resolver.

Presentation: https://docs.google.com/presentation/d/1xK1ana_TIxfAHX33CYVHFnJsV0If_YirLtRBBga9cv0/edit#slide=id.p

This work is based on the following prior proposal work:

Checklist
Affected core subsystem(s)

esmodules

bmeck commented 6 years ago

I have a writeup from conversations in form of a pseudo design document.

jdalton commented 6 years ago

The mode effectively marks a directory as an "esm mode" directory, where all ".js" files

What about extensionless bin files that CLI's often use?

Update

There's a mention in the writeup about --format related to extensionless but I'm not sure why a mode property couldn't cover it.

The writeup mentions a gotcha with require and extensionless files. Does this mean require can load ESM?

JacopKane commented 6 years ago

Thanks a lot for your work that might save us from a new javascript extension.

I know this is an entirely minor detail to consider right now, but I would only argue using shorthand, esm, instead of something more descriptive like esModule, ecmaModule or maybe something else but just more apparent.

bmeck commented 6 years ago

@jdalton --format is for entry points that don't have extensions at all / STDIN, it does not alter the default format of file extensions. If --mode affected those situations it would need to expand scope so that all extension-less files were treated in a specific way. This complication is why it was punted.

jdalton commented 6 years ago

@bmeck

This complication is why it was punted.

What about tackling it in the shebang of the bin file like

#!/usr/bin/env node --mode esm
bmeck commented 6 years ago

@jdalton punted and as describe in the design doc "bin" files can have extensions even if the command does not which is the common case for using command line wrappers, feel free to PR the --format flag as described if you feel strongly about needing that specific use case.

jdalton commented 6 years ago

punted and as describe in the design doc "bin" files can have extensions even if the

The bin file is opting in to esm with the --mode in the shebang. It's not executing the command. Node would parse it out. How you handle conflicting signals is a separate issue.

bmeck commented 6 years ago

@jdalton --mode is setting up file extension mapping similar to a MIME DB, --format is a very different idea which overrides something that doesn't have a well known intention already. We absolutely should not conflate these flags in my opinion.

jdalton commented 6 years ago

We absolutely should not conflate these flags in my opinion.

Having it specified in the shebang is a nice descriptive and natural way to tackle extensionless bins.

bmeck commented 6 years ago

@jdalton PR up the --format flag :)

jdalton commented 6 years ago

Naw. It doesn't have to be --format or bust.

jkrems commented 6 years ago

@jdalton One problem with putting arguments into the shebang is that it's not super portable. :( https://unix.stackexchange.com/questions/63979/shebang-line-with-usr-bin-env-command-argument-fails-on-linux

P.S.: Shipping something like a nodem or mnode shim with node might fix that though.

jdalton commented 6 years ago

Or we could try it in this PR, write tests, and see if issues are had.

jkrems commented 6 years ago

Or we could try it in a PR, write tests, and see if issues are had.

Not sure I follow - this isn't hard to verify. The following:

$ cat test.js
#!/usr/bin/env node --harmony
console.log(process.argv, process.execArgv);

Works on OSX but won't run on (all) Linux (the same without --harmony works on both):

./test.js
/usr/bin/env: node --harmony: No such file or directory
jdalton commented 6 years ago

Not sure I follow - this isn't hard to verify. The following

Many times ideas are shot down with a quick SO post without actually seeing if, in the given scenario, it is doable or an issue is able to be worked around.

Pinging @ceejbot and @chrisdickinson who had this in their npm proposal, though it was unimplemented (wondering if they ran into similar issues or if they've thought of anything to workaround the gotcha).

P.S.: Shipping something like a nodem or mnode shim with node might fix that though.

That may be it :yum:. I never considered something like that. Kinda neat!

giltayar commented 6 years ago

Question: with such a proposal, how would I write a "dual mode package", one which works whether we require the package or whether we import it. This is possible in the mjs proposal, but I don't see how you can do it in this proposal.

If we go with this proposal, all library developers that want to migrate to the esm model will need to fork their modules. So we will have moment and moment-esm. lodash and lodash-esm, uuid and uuid-esm, etc...

Given that we're going to live in dual-mode world for a long time, I believe this is not viable.

bmeck commented 6 years ago

@giltayar You would still use the .mjs approach for such a behavior. This PR is ontop of .mjs and seeks to alleviate common cases that do not need such a behavior since it is already available through .mjs. A variety of situations do not need to support dual mode packages, such as top level applications that are never meant to be treated as dependencies.

giltayar commented 6 years ago

Another question: if I am doing an import "./foo/bar/gar/qux.js", would Node need to check whethe there is a package.json in ./foo/bar/gar and then ./foo/bar and then ./foo and .? That feels like a performance hit. If it doesn't, how does it figure out what the "package boundary" of the file is?

bmeck commented 6 years ago

@giltayar yes it would stat every directory for a package.json once per process, and then it can cache it for later. It will need to do that also for other features like per package loader hooks. If startup performance is still a concern I recommend using a bundler, even for Node. This "mode" field is static, cached, and not determined by any runtime API so it should not be difficult to figure out if bundlers want to support it. It is not in a first iteration, but the design document does have a link to the fact that this "mode" might have more complex file format models as well in the future for things like Flow or JSX. Those designs have been discussed but are not in this iteration until we get more feedback on "mode".

giltayar commented 6 years ago

@bmeck: OK, so "package boundary" is a new concept in Node. Didn't exist until now, right? (this is not criticism, it's more trying to understand).

And it's the first time, IIRC, that package.json contains something that influences the running of files "underneath" it. And this concept of "package boundary" will be used in the near future by the loader mechanism for things like transpiling flow/json/ts. Interesting.

bmeck commented 6 years ago

@giltayar

OK, so "package boundary" is a new concept in Node. Didn't exist until now, right? (this is not criticism, it's more trying to understand).

Correct, it has existed in userland to some extent under differing definitions/mechanisms, but this is the first time something of its nature has been proposed to come to Node's core.

And it's the first time, IIRC, that package.json contains something that influences the running of files "underneath" it. And this concept of "package boundary" will be used in the near future by the loader mechanism for things like transpiling flow/json/ts. Interesting.

We are still hashing out those workflows details, but thats the plan. The recommendation will remain not to do heavy workflows like completely transpiling on every startup though. Loaders will want to design a cache mechanism across processes, and/or manage to ensure they can be run ahead of time if they want to avoid startup time problems.

guybedford commented 6 years ago

@MylesBorins the "mode" effectively sets how a module format is determined - using the extensionFormatEsm extension checks for "esm" mode and extensionFormatStd extension checks for the backwards compatible mode or "commonjs" or "legacy" or "interop" or whatever we want to call that.

The distinctions can evolve over time but the basic changes in "esm" mode are (1) ".js" files within that same mode boundary are treated as ES modules and (2) loading files without file extensions is not permitted.

MylesBorins commented 6 years ago

@guybedford that eventually hit me... commented as such above.

from above

I have mixed feelings about how this enables importing a .js to have different semantic meaning based on meta data in the folder. It seems ambiguous in a way that isn't statically analyzable. Does this mean that if you directly require a file by full path name we need to check the system for a package.json and parse that package.json for the mode every time?

One one hand I really like that it is configurable. We can make a default, and that default can change as the ecosystem transitions between module systems. On the other hand it feels like there may be too much magic and it is hiding the differences in a way that introduces a lot of complexity into the module resolution algorithm (it would to check and crawl up all folders until finding a package.json to know mode for this algorithm afaict).

I think I may prefer the pattern of having mode be global, more similar to the npm proposal, and forcing the use of import.meta.require for cjs modules when in the esm mode. The keeps the use of import consistent and would allow us to eventually moonlight the complexity of transparent interop at some point by changing defaults while still giving application developers a legacy mode if we don't remove the cjs mode.

With all that being said I'm not 100% how I feel about this just yet and will think on this quite a bit more and discuss with folks before landing on specific semantics. I'm not going to block anything, but I've marked this in-progress and would like chat more about the long term implications of this implementation before landing.

Thanks for all the hard work on this, it is really stellar stuff!

bmeck commented 6 years ago

@MylesBorins

One one hand I really like that it is configurable. We can make a default, and that default can change as the ecosystem transitions between module systems. On the other hand it feels like there may be too much magic and it is hiding the differences in a way that introduces a lot of complexity into the module resolution algorithm (it would to check and crawl up all folders until finding a package.json to know mode for this algorithm afaict).

It crawls up until it finds a package.json file. Usually it shouldn't be incredibly deep, but it might be deep. This behavior is also required for things like per-package hooks into the loader so perhaps isolating all of this magic into one place is a good thing?

I think I may prefer the pattern of having mode be global, more similar to the npm proposal, and forcing the use of import.meta.require for cjs modules when in the esm mode. The keeps the use of import consistent and would allow us to eventually moonlight the complexity of transparent interop at some point by changing defaults while still giving application developers a legacy mode if we don't remove the cjs mode.

Even without transparent interop and with import.meta.require we still need a flag if we want to disambiguate .js files, like the one in this proposal. We cannot let consumer be the definer of file format if we want any static analyzability without ambiguity.

BridgeAR commented 6 years ago

Ping @guybedford

bmeck commented 6 years ago

@BridgeAR I believe his is on vacation right now, what is up?

guybedford commented 6 years ago

Would it help for review and consideration if I removed the top-level --mode flag here for now? That would bring the diff down by about 150-200 lines I believe.

The main reason I bundled this in is because one of the major complaints with the previous proposal was for the scenario of node x.js where there is no package.json resulting in unnecessary statting, so the flag provides the mechanism for avoiding that, while also aligning with the npm proposal quite nicely. But it could be deferred to a separate PR if that would simplify.

bmeck commented 6 years ago

@guybedford +1 since we can do that separate PR and it seems to be a bigger talking point on this one.

targos commented 6 years ago

@guybedford If it's not too difficult to split, +1.

guybedford commented 6 years ago

Ok I've split out the features here and updated the description above (https://github.com/nodejs/node/pull/18392#issue-291868841) to reflect that this is just the package.json "mode": "esm" feature now. The separate --mode flag is at https://github.com/guybedford/node/tree/package-top-mode.

I hope that helps to review, and let me know if there's anything more I can do to help make this easier to consider.

targos commented 6 years ago

Thanks! I'm going to test it on a project and review.

targos commented 6 years ago

It looks like this is breaking CJS interop.

Given:

Current master branch prints 42. This PR throws:

Error [ERR_MODULE_RESOLUTION_LEGACY]: cjs not found by import in file:///test.mjs. Legacy behavior in require() would have found it at /node_modules/cjs/src/index.js
    at search (internal/loader/DefaultResolve.js:33:15)
    at Loader.resolve [as _resolve] (internal/loader/DefaultResolve.js:71:7)
    at Loader.resolve (internal/loader/Loader.js:49:18)
    at Loader.getModuleJob (internal/loader/Loader.js:81:40)
    at ModuleWrap.promises.module.link (internal/loader/ModuleJob.js:37:40)
    at link (internal/loader/ModuleJob.js:36:36)
    at <anonymous>
guybedford commented 6 years ago

@targos great catch, I've pushed the fix now with a test case. This was a typo missing out the boolean argument copying from https://github.com/nodejs/node/blob/master/src/module_wrap.cc#L575.

targos commented 6 years ago

After little testing, I think this is doing what I would expect!

@guybedford Could you take a look at https://github.com/image-js/tiff/commit/834c0adc6c20436d4b2a93971f779e59bae0cea3 ? Would it be the right way to update a library to support both ESM and CJS if this PR lands? If so, +1. I want that :)

bmeck commented 6 years ago

@targos that would work if you only provide a "main" entry point to your module. Yes.

guybedford commented 6 years ago

@targos glad it seems sensible to you. Nice dual-target approach - yes that works well!

guybedford commented 6 years ago

@MylesBorins I definitely agree there can be a deprecation workflow here, although I think it would be best-suited to the package manager to manage this:

  1. Packages could have "mode": "cjs" automatically inserted on install if they don't have "mode": "esm"
  2. The warning for the local package could be provided by the package manager, with an automatic fix option (which we can't do from Node really)
  3. When running init it can be standard to include "mode": "esm" this is a crucial part of making this workflow easy.

I'm adding these features to my own package manager work, and the hope is others could do the same. Then we can start to coordinate that deprecation path definitely.

guybedford commented 6 years ago

@targos much appreciated for review, and catching the inconsistencies in the test names. If we can expand that query to confirm the "mode" property is safe, would be keen to add that validation.

MylesBorins commented 6 years ago

@guybedford I guess the question for me is "Does Node intend to eventually change the default"? If the answer is yes, which I hope it is, I think the onus is on us to warn that this behavior is going to change.

guybedford commented 6 years ago

@MylesBorins certainly Node could have a warning, but if we added that today we'd get ~100~ 10s of warnings for every single npm package dependency loaded from ESM, with no easy action from the end-user apart from sending a PR. If the package manager could take the first step adding on install that will avoid this painful middle ground, but otherwise I suppose we can go for this bulldozer warning approach :P

MylesBorins commented 6 years ago

@guybedford I'm imagining this would only be a warning for the top level package. Seem reasonable?

bmeck commented 6 years ago

@MylesBorins or we could just state that there was at least 1 warning and tell them to use something like NODE_DEBUG if they want to see all of them.

MylesBorins commented 6 years ago

@bmeck not sure I'm following. Could you rephrase?

bmeck commented 6 years ago

@MylesBorins I imagine we could have:

$ node my-app
Warning: Encountered a package without a "mode" field in its package.json
use NODE_DEBUG=loader to get more details.

And only print the warning once per process, and when the flag is set:

$ NODE_DEBUG=loader node my-app
LOADER 12345: Encountered a package without a "mode" field in its package.json at file:///app/package.json
LOADER 12345: Encountered a package without a "mode" field in its package.json at file:///app/node_modules/foo/package.json
LOADER 12345: Encountered a package without a "mode" field in its package.json at file:///app/node_modules/bar/package.json
MylesBorins commented 6 years ago

@bmeck so I was imagining it being a bit more descriptive

$ node my-app
Warning: Encountered a package without a "mode" field in its package.json. This currently defaults to the `cjs` mode but will default to the `esm` mode in the future

That copy is bad, and needs work... but that is what I wanted to be explicit about.

With all this being said I'm still having mixed feelings about the mode being on a per package level for external imports. I don't think that I have totally wrapped my head around all of the edge cases though, so I'm going to keep most of that opinion to myself until I've worked it out

guybedford commented 6 years ago

Providing only the first warning could definitely make sense. I'd still be averse to implementing this if that warning could apply to a third party package without some type of greater upgrade path strategy (ideally with package manager buy-in though). Let me know if you disagree, but for now I'd say it falls out of the scope of this PR (until tangible upgrade path plans can be put in place, but they will definitely be possible).

bmeck commented 6 years ago

I'm fine punting the warning mechanics to afterwards.

MylesBorins commented 6 years ago

I'd personally like the single warning at a top level that defaults are subject to change in this PR. This is closer to a -0 than a -1, but I want to set expectations early

ceejbot commented 6 years ago

What is a package consumer supposed to do when they encounter this warning? What does the warning mean, in practical terms? What is it warning against?

What does the package author achieve by marking a directory as containing only module-parse-goal files or common-js-requireable files? What if the package author has, say, a bin dir with files intended to be used by older and newer nodes in a hybrid package? How would they silence the noise? Is an override per-file possible for hybrids?

I realize I'm new to this PR, but is there a design doc I could read outlining the overall vision & use cases we'd like to support? That would help me aim feedback usefully.

guybedford commented 6 years ago

@ceejbot thanks so much for taking a look, as mentioned in previous comments this is something that works much better together with the package manager, so it would be really great to hear your feedback further. The original proposal for this is located at https://github.com/nodejs/node-eps/pull/60/files. This mode does not affect the way CommonJS loading works and won't stop require from loading .js files marked as esm, it is only for the es module resolver.

The warnings as discussed here would simply be to ensure the mode is explicit - "mode": "cjs" is currently equivalent to not having a mode or even having an invalid mode property. Currently there are no warnings and I'm resistant to integrate any warnings until we have actionable steps for them. Directories marked as module files only affect the ES module resolver, CommonJS require doesn't change behaviour at all with this PR so dual packages work fine like in https://github.com/nodejs/node/pull/18392#issuecomment-363122412. In addition a package.json in a subdirectory can provide directory-level overrides.

The warnings are about getting to a place where the default mode can eventually (mid to long term) be switched to treating js files as ES modules. To prepare for this Node could warn for packages not containing a mode, with for example, "mode": "esm" added during npm init workflows with a prompt of "Use ES Modules for js files?" or similar and "mode": "cjs" automatically written into the package.json for installed packages when no mode is present. This is where package manager support comes in as I say. That might then allow us to eventually change the default as Myles suggests.

@MylesBorins I guess the top-level main is usually guaranteed to be a user package except in the case of running bin scripts, where the warning would then be completely unwanted. If there was a way we could at least distinguish those running cases it might be possible to add something sooner, but until all actionable steps are fully worked out I'd say we should wait.