weswigham opened this issue 5 years ago
Thanks for posting this. I can see this potentially working as a solution.
A few questions:
Early on we ruled out require of ESM, though I don’t remember why. Was it because we assumed it just wouldn’t be possible technically? Is there a spec concern with loading ESM synchronously?
If require('esm-package') becomes possible, shouldn’t require('./file.mjs') be possible too? How would that work?
To cover the case where a user wants to use ESM in .js but also provide a CommonJS package for older versions of Node, you have language like “in a package with a cjs fallback for an older node, the entrypoint cannot be a .js esm file - however other esm in the project can be (by placing a package.json with {"type": "module"} in whichever project subfolder has the esm source beyond the entrypoint)”. This seems awfully complicated, when the alternative is simply to use a new field to define the ESM entry point. Is there a reason you’re not just going ahead and defining a new field?
I have several issues with the approach of overloading "main" to support two modes based on extension searching (e.g. "main": "./index" where there are index.js and index.mjs side by side):

- It requires the two entry point files to sit side by side, which rules out builds that compile into a separate folder, e.g. src/index.js and dist/index.js.
- It requires users to know and understand Node’s extension precedence order, e.g. that .mjs resolves before .js in ESM mode. It’s much easier, in my opinion, to tell users “put the path to the ESM entry point in "module"” (or whatever field) than to explain “in ESM mode .mjs is resolved before .js...”.
- Users are already familiar with defining separate entry points in separate package.json fields such as "browser" and "module". I don’t see a reason to deviate from the UX they’re already familiar with, especially when the current model seems much more straightforward.
- "main": "./index" assumes we enable extension searching in ESM mode at all, which currently it’s not. So you wouldn’t be able to merge this design into the implementation as things stand today, unless/until our stance on extension searching changes; but we’re waiting on feedback from users in Node 12 before revisiting that, so the earliest it could change is months from now.

I don’t think there’s a need to couple this proposal to the fight for enabling extension searching. Just define a new field for the ESM entry point. If/when extension searching gets enabled in ESM mode, a new PR can propose removing the new ESM entry point field in favor of main with searching; but I would still be opposed to that because of the first three concerns I mentioned above, which remain even with extension searching enabled. But this proposal can stand on its own without needing extension searching.
packages which execute different code for cjs vs esm callers is distasteful
I don't agree; there's nothing inherently distasteful to me about this. One mitigation is making the ESM and CJS module caches shared - ie, once an extensionless specifier has a value in either module format, that's the value provided to the other.
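The shared-cache mitigation mentioned above can be sketched in a few lines. This is a hypothetical illustration of the idea, not Node's loader internals: whichever loader instantiates a specifier first, the other loader hands back the same object instead of executing the file a second time.

```js
// Sketch of a shared module cache across the CJS and ESM loaders:
// the first loader to instantiate a specifier populates the cache,
// and the other loader returns that same value. (Hypothetical API.)
const sharedCache = new Map();

function load(specifier, instantiate) {
  if (sharedCache.has(specifier)) return sharedCache.get(specifier);
  const mod = instantiate(specifier);
  sharedCache.set(specifier, mod);
  return mod;
}

// Both "loaders" get the same object for the same specifier:
const fromCjs = load('pkg', () => ({ count: 0 })); // what require('pkg') sees
const fromEsm = load('pkg', () => ({ count: 0 })); // what import 'pkg' sees
console.log(fromCjs === fromEsm); // true: one identity, no dual instantiation
```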
It kind of sounds like you're proposing using C++ to make otherwise-async JS code be sync. Is that accurate?
As far as your case 2 suggestion - a package shipping .js/.mjs pairs would already work great as long as, in an ESM-supporting node, require('pkg') and import 'pkg' didn't cause two files to be executed - which might indeed relay back to your case 1 suggestion.
Hopefully we can all agree that "a single package can be published that works the same on both ESM-supporting node and older node, whether it's require'd or import'ed, without causing duplication issues/complexity for stateful or singleton modules" is a good goal. There's a number of ways to solve it; this, as well as nodejs/modules#273, are certainly two possible directions. I'd love to see us attain consensus on the goal first so that our discussions boil down to implementation approach instead of debating priorities.
As far as having a separate directory from which deep imports are resolved (a "deep root", as it were), I think that's a problem that CJS has as well, and I'd greatly prefer that node solve it for CJS and let ESM just inherit that solution, rather than punishing CJS by depriving it of that kind of solution.
packages which execute different code for cjs vs esm callers is distasteful
I don’t agree; there’s nothing inherently distasteful to me about this.
I don’t have an opinion on this, other than to suggest that maybe “distasteful” might not have been the best word to choose 😄 But I think it’s worth reviewing the long threads in the import file specifier resolution proposal repo that concerned how to handle the same specifier (e.g. the package name) resolving to different files in CommonJS versus ESM.
Early on we ruled out require of ESM, though I don’t remember why. Was it because we assumed it just wouldn’t be possible technically? Is there a spec concern with loading ESM synchronously?
Yes and no, respectively. require is 100% the domain of the cjs loader, and the execution of the esm modules themselves is unaffected.
If require('esm-package') becomes possible, shouldn’t require('./file.mjs') be possible too? How would that work?
Yes - I didn't explicitly mention it, but that's implicit in the "cjs loader and mjs loader should resolve the same specifier to the same identity" assertion I added at the end. It works trivially - the .mjs extension handler in the cjs resolver that's currently defined to throw simply defers to the esm loader and syncifies the result (see the impl for details as to how "syncifying" actually works, but it's similar to child_process's execSync, as I said before, and can ofc be iterated on to change exactly how it's done). It also implies that require('esm-package/foo') will resolve and load foo.js as esm if esm-package has "type": "module" set in scope.
Is there a reason you’re not just going ahead and defining a new field?
Because I don't need to do so. (Less is more and so on) However were we to go with this and also finalize removing extension resolution on the main entrypoint, I would be forced to do so to still allow nodejs/modules#2 to happen, I believe.
I have several issues with the approach of overloading "main" to support two modes based on extension searching (e.g. "main": "./index" where there are index.js and index.mjs side by side):
This is purely a matter of organization and personal preference. I don't really need to or mean to pursue it in either direction here, since it's noncritical. Personally, TS has long supported both side-by-side configurations and separate entrypoints. Supporting both isn't bad, and a cjs/esm-wide exports map while retaining extension searching in esm would allow that, y'know.
The only thing I'm asserting here is that within the same node runtime, the same specifier should map to the same module (regardless of whether the source is a cjs or esm module). That's technically not required for a synchronous require to be useful; however, I think it'd be very confusing were it not the case. I'm really just trying to assert that the cjs loader shouldn't try to load things it knows the esm loader should handle - instead, it should defer to it. Honestly, that we keep considering the two loaders "separate" disturbs me slightly - they are deeply interlinked and ultimately make up one cohesive system. While a lot of the esm loader is just dedicated to managing building the esm graph, the bits about resolution (and where user-defined loaders fit in) are less independent, IMO (and there's no spec for resolution to speak of, so nobody can really claim resolution needs to follow any spec other than "but another run-time does it different").
For example, taking my statement that specifiers within cjs and esm should resolve the same to its conclusion, that does mean that require.extensions should, to one degree or another, be respected by the esm loader (ofc, where by default every non-.mjs mapping is a deferral to the cjs loader's behaviors). However, that point specifically I would like to pursue separately after this - from a migration stance, it's a given; from a "match browser resolution at all costs" stance, it's not, and so it's 100% a topic that will be an opinionated debate which I'd like to keep separate from this. It is not strictly required for a synchronous require to be useful, however I think it compounds its beneficial cjs ecosystem effects.
It requires users to know and understand Node’s extension precedence order
Just like how they already need to know where .node and .json rank among .js? This argument is a little weak - only people using both are exposed to it, and if you're using both you have a problem that needs both to solve already, and will need to know the priority order to work with them. Plus, .mjs is already configured to throw in cjs - the priority is already exposed.
Users are already familiar with defining separate entry points in separate package.json fields such as "browser" and "module". I don’t see a reason to deviate from the UX they’re already familiar with, especially when the current model seems much more straightforward.
And "types" as well. And having a secondary entrypoint may be desirable, as I said before, but I don't need it for this proposal, and I do think there's much value in supporting both separate-out-dir and side-by-side-output compilation settings. While every person here has a preference (or not), I for one am always for user choice in this area. Everyone's project has unique requirements~
"main": "./index" assumes we enable extension searching in ESM mode at all, which currently it’s not. So you wouldn’t be able to merge this design into the implementation as things stand today, unless/until our stance on extension searching changes; but we’re waiting on feedback from users in Node 12 before revisiting that, so the earliest it could change is months from now.
Not so. I could merge it now, and not upstream it until we're happy with feedback one way or the other. I don't see us up-streaming regularly. And again, while that's incredibly useful for supporting older versions of node, it's non-critical for compatibility within the same node runtime, which is the primary focus here. It's just stating how it's very much still possible to support older versions of node without having a package that ever exposes two identities at runtime.
I don’t think there’s a need to couple this proposal to the fight for enabling extension searching. Just define a new field for the ESM entry point. If/when extension searching gets enabled in ESM mode, a new PR can propose removing the new ESM entry point field in favor of main with searching; but I would still be opposed to that because of the first three concerns I mentioned above, which remain even with extension searching enabled. But this proposal can stand on its own without needing extension searching.
I don't need to do anything one way or the other here - it's just a matter of stating that even with this, no matter how the resolution argument pans out, there's a way forward for the back compat story. I don't need to work on either direction here (though my preferences are likely known, but since I missed the out-of-band meeting where the flag was decided on, maybe not - extension searching 4 lyfe).
I don't agree; there's nothing inherently distasteful to me about this.
Anyone who relies on prototype chains is going to have serious issues with packages that create them twice - once in cjs and once in esm. Similar issues exist for unique symbols. Yes, they can be worked around by making just one instance in cjs and importing it into the esm side via a created require function or a default import - but that becomes code you can never migrate. (And you can't go the other way, since it'd force all your exports into Promises.) It's all in all not a great situation.
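The dual-instantiation hazard described above is easy to demonstrate. In this sketch, two evaluations of the same factory stand in for the CJS and ESM loaders each executing the same package source; the loaders themselves are simulated, but the instanceof and symbol-identity breakage is exactly what two module instances would produce:

```js
// If the same package source is executed once by the CJS loader and once
// by the ESM loader, every class and unique symbol it defines exists twice.
// (Two factory calls simulate the two loaders evaluating the same file.)
function loadPackage() {
  class Thing {}
  const kBrand = Symbol('thing'); // unique per evaluation
  return { Thing, kBrand };
}

const cjsCopy = loadPackage(); // what require('pkg') would see
const esmCopy = loadPackage(); // what import 'pkg' would see

const value = new cjsCopy.Thing();
console.log(value instanceof cjsCopy.Thing);    // true
console.log(value instanceof esmCopy.Thing);    // false: different prototype chain
console.log(cjsCopy.kBrand === esmCopy.kBrand); // false: symbols don't match
```

Any cross-boundary code that brand-checks with instanceof or a symbol key silently fails once both copies are alive in the same process.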
It kind of sounds like you're proposing using C++ to make otherwise-async JS code be sync. Is that accurate?
Yes, but that JS code is just JS code running within the loader itself; no user code is included, except potentially user-defined loaders, which, ofc, we'd very much like to isolate anyway (I mentioned those caveats in the OP and won't go into them again here). The sync vs async distinction is ultimately artificial - it's all continuations being triggered by something; the only difference is whether we wait on the result before executing more code or not. And again: this is very similar to how child_process's execSync works in how it manipulates a secondary event loop within the same process. We talked about creating a general abstraction and unifying the two to make it nicer to work with - if this is appealing, I can see putting in the work to do that.
As far as your case 2 suggestion - a package shipping .js/.mjs pairs would already work great as long as, in an ESM-supporting node, require('pkg') and import 'pkg' didn't cause two files to be executed - which might indeed relay back to your case 1 suggestion.
Yep. That's why I said it's "effectively free" - almost all designs support it like that, except for some designs which remove extension resolution wholesale from the resolver (and those which do oughta have a separate entrypoint field for esm for exactly this reason).
As far as having a separate directory from which deep imports are resolve (a "deep root", as it were)
I don't think I've explicitly called anything similar to that out here, except maybe how supporting older nodes comes effectively for free with extension priorities on the main entrypoint. In any case, I don't think it's terribly important here.
Explain dual-instantiation - this thread has the conclusion
And at the time I agreed - except I've actually done research and implementation work in the intervening time, and found that a better solution is quite palatable without the downsides of having the same specifier resolve to two separate identities. We've been saying that everything's subject to change for a reason ❤️
In addition, with a synchronous require of esm and dynamic modules, we open the door to supporting the __esModule marker already emitted in TS- and babel-compiled CJS to suppress the module-as-default behavior (and have the real default named member as the default instead, if present), and allow real drop-in upgrades for people using those toolchains (y'know, provided we don't outright break the resolver in other ways, but I'm being optimistic there). I think that's a laudable goal.
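For readers unfamiliar with the __esModule convention referenced above: TS- and Babel-compiled CJS tags its exports object with that flag so interop layers know exports.default holds the "real" default export. A minimal sketch of the convention and the interop decision it enables (a plain object stands in for a module's exports here):

```js
// Sketch of the __esModule interop convention. A compiled-from-ESM CJS
// module marks its exports object; an interop-aware `import x from 'pkg'`
// then binds exports.default instead of the whole exports namespace.
const exportsObj = {};
// This mirrors the marker TS/Babel emit (non-enumerable, value true):
Object.defineProperty(exportsObj, '__esModule', { value: true });
exportsObj.default = () => 'hello'; // the "real" default export
exportsObj.named = () => 'aux';     // an ordinary named export

// What an interop-aware default import would pick:
const defaultBinding = exportsObj.__esModule
  ? exportsObj.default // compiled-from-ESM: unwrap the real default
  : exportsObj;        // plain CJS: the whole exports object is the default

console.log(defaultBinding()); // 'hello'
```

Without the marker being respected, the same import would bind the entire exports object, which is the module-as-default behavior the comment wants to suppress.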
I don’t have an opinion on this, other than to suggest that maybe “distasteful” might not have been the best word to choose
I'd call it synonymous with "problematic" with a hint of opinion on the matter, which is IMO more accurate than just saying "problematic", no? Is it really the lack of expounding on why it's distasteful in the OP the real issue, or is the word "distasteful" itself distasteful?
Unrelated to all those responses: I'm pretty sure that if require'ing esm had been deemed technically feasible 2-3 years ago, we'd be doing it already, no questions asked (and if I'm honest, I was only able to look into this by looking at @devsnek's experiments in the area and limiting them a bit to specifically just syncify the resolution process - syncifying async code is not, in general, safe, but in this case it can be considered safe to do so). I don't think this is actually contentious, other than concerns over how it might impact some other stories which assume and make use of the opposite (which I've attempted to assuage here).
So besides splitting this apart from the extension searching debate, there’s another way we can split this up: into require of ESM, period, separate from the issue of dual packages. You could make a PR that simply enables require of specifiers that resolve as ESM, whether they’re relative files or package bare specifiers, and that could be the first PR. That might actually be a great first PR, as it would let @guybedford and @bmeck and others see the guts of how require of ESM would work without needing to get into the details of extending it into handling dual packages.
Then once that’s merged in, you could add the code for how to handle bare specifiers that could resolve to either ESM or CommonJS. (In the current implementation they’re always either one or the other; that’s why I’m thinking that the dual part could be a second PR.) I don’t want to wait months to get a solution upstreamed for dual packages, so I would encourage you to use a second field to define the ESM entry point. We can always support "main" instead (or in addition) if we enable extension searching by default at some point; but let’s not let this PR be blocked on that, if you don’t mind? I don’t see why it needs to be. The separate-folders case is a good enough reason on its own to support a second field, in my opinion, even if we later on also extend this to "main".
Re my comment on “distasteful,” I was being tongue-in-cheek (sorry for the pun). I just meant that I was assuming @ljharb responded as he did because you chose such a forceful word. I mean, he surely would’ve responded anyway, but you made it even more likely 😄
Anyway what do you think of this phased approach?
Just thought of one more thing you’ll need a good answer for: how does this work with dependencies? Both direct and dependencies of dependencies.
For example, I just pushed https://github.com/jashkenas/coffeescript/pull/5177, a PR for creating an ESM version of the CoffeeScript compiler for use in ESM-capable browsers. (Yes, I’m very proud of myself.) It’s an ESM version of only the existing CoffeeScript browser compiler, which itself has a more limited API than the Node API; the Node API has an option to enable transpilation via Babel, for example, while the browser API does not. Or looked at another way, my package’s CommonJS exports and ESM exports are different.
Maybe this isn’t all that common of a case, as most people will transpile to produce the secondary output format, but it’s not inconceivable; it might very well be exactly what I do to add ESM support to the Node CoffeeScript package, when Node’s ESM implementation is stable.
So say I did just that, and published coffeescript to NPM with differing exports for CommonJS and ESM, and it’s October and ESM support in Node is unflagged. A user whose project depends on my package might run fine in Node 12 but crash in Node 10, because some ESM export they’re relying on in Node 12 isn’t exported by my package’s CommonJS version in Node 10. Or vice versa: they rely on a CommonJS export in Node 10 that’s not present in Node 12. If the user wanted their project to be able to run in either Node 10 or 12, how would they resolve this issue? An if block detecting the Node version? At least in our current implementation, they could use createRequireFromPath to explicitly include the CommonJS version of the package if they wanted to; in your version, I think aside from version-detecting Node the only other option would be to do a deep import into the package to its CommonJS entry point, and hope that that entry point doesn’t change in future versions of the package.
So that can at least be worked around without too much pain, though it would be annoying. But what about dependencies of dependencies? Like, say, Express, which has 30 dependencies. What if one of its dependencies differs in its CommonJS and ESM exports? Would the ESM version even be loaded at all in Node 12, if the top-level Express is still a CommonJS package? What if Express itself gets updated to be dual; would then the ESM versions of Express’ dependencies be loaded as ESM if possible? What if any of them have different CommonJS/ESM APIs? In short, how do we keep Express from behaving differently in Node 12 than in Node 10, if any number of its dependencies might change under its feet because they suddenly are exporting different ESM files than the CommonJS files that Express is expecting? This might require that Express’s authors updated its dependencies to their latest versions without testing that Express still worked in Node 12, which is unlikely for a project as high-profile as Express but surely will happen for at least some commonly-used packages.
Edit: I know I ask others to always try to provide answers for problems they pose, and I just realized I didn’t do so for the Express case, because I can’t think of any 😄 Though one could argue that it’s unavoidable and still preferable to the other PR, which has some of the same issues.
Or looked at another way, my package’s CommonJS exports and ESM exports are different
You have two different API surfaces. Ship two different packages.
A user whose project depends on my package might run fine in Node 12 but crash in Node 10, because some ESM export they’re relying on in Node 12 isn’t exported by my package’s CommonJS version in Node 10.
If you have two different APIs, ship two different packages. If your dependent is reexposing you, it would also be reexposing your bad practice. Two packages, one of which with an appropriate advisory "engines" field set, would have indicated this issue in advance, and also been easy to analyse as having differing and incompatible APIs.
If the user wanted their project to be able to run in either Node 10 or 12, how would they resolve this issue?
A conditional require based on process.version of each of two differing packages (handling differing APIs explicitly), since they have different APIs: they're not semver compatible at the same version number of the underlying package and shouldn't be in the same package.
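The conditional-require workaround described above can be sketched as follows. The package names here ('my-pkg', 'my-pkg-legacy') are hypothetical, as is the choice of 12 as the cutoff; the only real API used is process.version:

```js
// Hedged sketch of a process.version-based conditional require.
// 'my-pkg' and 'my-pkg-legacy' are hypothetical package names; 12 is
// the Node major where unflagged ESM support is assumed in this thread.
const major = Number(process.version.slice(1).split('.')[0]);

const wantsEsmEraApi = major >= 12;
// The actual requires are left commented out because the packages are
// hypothetical; each branch would handle its package's API explicitly:
// const pkg = wantsEsmEraApi ? require('my-pkg') : require('my-pkg-legacy');

console.log(Number.isInteger(major)); // true on any real Node version string
```

Note this lines up with ljharb's point: since the two branches expose different APIs, they belong in two differently named packages rather than two faces of one version.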
The gist is this: if you have a cjs package and you follow semver, your upgrade path needs to provide the same API to all consumers at a given version - expose the same names from both entrypoints (including a real default). This can be done gradually in a less-breaky way by adding them in a minor version, then deprecating the namespace assignment form in a major version and removing it in another. The node versions supported ofc don't actually change during this process.
IMO, there is no good reason a package should present two different APIs based on node version or caller module kind. Doing so feels like a violation of semver - if not explicitly, at least in spirit. If different APIs are for some reason needed, the solution is always trivial - just publish an additional package with the differing APIs.
What if one of its dependencies differs in its CommonJS and ESM exports?
When they published an esm version with an incompatible API, that was a semver break and the version number should indicate that. It should not match the dependency version until its dependents have updated to handle the differing API (which may include conditional behaviors based on the shape presented, if compat with multiple APIs is needed).
Would the ESM version even be loaded at all in Node 12, if the top-level Express is still a CommonJS package?
Anything that ships esm should be loaded as esm in runtimes that support esm.
What if Express itself gets updated to be dual; would then the ESM versions of Express’ dependencies be loaded as ESM if possible?
As much of the dependency tree as is loadable as esm is loaded as esm. Only things which only ship cjs would be loaded as cjs. On older node, only cjs would be loaded. If a dependency no longer shipped cjs, it would, ofc, be a semver major break that prevents it and its dependents from running on older versions of node.
This might require that Express’s authors updated its dependencies to their latest versions without testing that Express still worked in Node 12, which is unlikely for a project as high-profile as Express but surely will happen for at least some commonly-used packages.
A semver compatible dependency should provide a semver compatible API when a compatible version is selected - that includes consistent APIs across node versions. If it does not, it has violated the expectations of semver and needs to be handled as a misbehaving dependency (ie, by locking down its versions more tightly and scrutinizing changes in it), same as if it had suddenly deleted or incompatibly modified any other important API of the library.
You have two different API surfaces. Ship two different packages.
So this is easy to say in the abstract, but reality is more complicated. Let me use the coffeescript package as an example.
I want to make my package easy to use in Node 12 in an ESM environment. It’s already usable via import CoffeeScript from 'coffeescript', but users expect to be able to get named exports too, e.g. import { compile } from 'coffeescript'. So I want to provide those named exports, but like most developers I want to achieve that goal with as little effort as possible. So I’ll make a new ESM entry point for my package:
```js
import CoffeeScript from './lib/coffeescript/index.js';
const { compile } = CoffeeScript;
export { compile };
```
I tried this out by creating a new folder, running npm install coffeescript, saving the above as ./node_modules/coffeescript/index.mjs and setting "main" in ./node_modules/coffeescript/package.json to point to the new index.mjs file. Now in my project folder, I can create the following:
```js
import { compile } from 'coffeescript';
console.log(compile('alert "hello!"', {bare: true}));
```
And it works as expected. Done, right? That one compile method is the entirety of my Node.js API as per the docs, aside from my loader, which is called via require('coffeescript/register'). Let’s just ignore that for the purposes of this example, as few packages include loaders.
Anyway, so as a developer I think I’m done; I’ve exposed my public API in ESM. Except there’s no concept of a public API in CommonJS, at least not programmatically. If you run console.log(Object.keys(require('coffeescript'))) in RunKit you’ll see that the CommonJS 'coffeescript' has a lot more things attached to it than just compile; for example, helpers, which was never intended to be public. Am I supposed to expose those in ESM, too? It’s not obvious that I need to.
In other words, it might not be obvious to developers that the CommonJS and ESM versions of their packages’ APIs need to match. Some developers might well just decide to use the default export and stop there. There’s no telling what developers will do, and you’ll need to engage in a campaign to educate them on doing dual packages right to avoid these issues.
There’s also the problem that even if the APIs match, by definition they’re not running the same code. For packages where the ESM is transpiled into CommonJS, you’re basically trusting that Babel (or whatever transpiler) is doing a faithful job, and that nothing in the user’s package falls into one of the edge cases where transpilation isn’t quite the same as native execution (like the little gotchas of subtle differences between Object.assign and object rest/spread). So it’s always the case under this approach that require('coffeescript') in Node 10 will be different than the same in Node 12, even if I’m not a sneaky or uninformed developer changing up my API. By definition ESM code is different.
Now of course, most of the time there won’t be an issue. (To be really cynical, I could argue that that makes this even more of an issue, as then it’s more likely to be a surprise; but I won’t go there. 😄) So the question is whether this “different code in different versions of Node” issue is worse than the “different code whether the package is used in CommonJS versus in ESM” of the other PR. I’m not sure how to answer that question; one factor surely should be how common problems are likely to occur in one versus the other, and what those problems are likely to be.
Except there’s no concept of a public API in CommonJS...
what? how is this any different from esm? i can export const _privateThing and not document it, just as i can module.exports._privateThing and not document it.
...which was never intended to be public
this is a problem with your package, not this proposal.
you’ll see that the CommonJS 'coffeescript' has a lot more things attached to it than just compile;
Y'all should fix that. People will come to depend on those privates if you expose them, and it'll then be a semver break to remove them. Not including them in your esm API is likewise semver breaking, since something providing the same "version" of your API isn't actually API compatible.
And I'm not talking in the abstract here - we deal with these discrepancies every day over at DefinitelyTyped. People moving from, eg, cjs to esm do notice the small changes to an API based on environment, and do classify them as breaks, even if the library maintainer didn't. Having a consistent API across all access patterns is expected.
So the question is whether this “different code in different versions of Node” issue is worse than the “different code whether the package is used in CommonJS versus in ESM” of the other PR
Different code within the same runtime is worse. Significantly. That's a forever cost - a legacy you can't be rid of. Support for that behavior is there forever. People don't transition node versions often, and once you've crossed the boundary, that's it, you're done; you can almost never have a similar issue again (and the issue is unlikely enough to begin with for well-behaved packages), short of repeated downgrade/upgrade cycles across the esm boundary (which would imply we've done something very wrong that necessitated repeated node downgrades).
By definition ESM code is different.
Usually, and ideally, in unobservable minutiae that very few people reasonably rely on. Pulling a name out of a namespace and calling it is pretty universal. Reflective patterns are rarer and by nature require inspection of the object being reflected upon; if you have consumers whom you value, and you value reflective stability in the minute details of your module's construction (why?), then you may need to include that in your API contract and version appropriately. Generally, it's something pretty much no one needs to care about.
The coffeescript package predates my involvement by several years. I didn’t design its architecture. I use it as an example just because I’m familiar with it.
I don’t need the value of a consistent API explained to me. I understand the issues. The point I’m trying to make is that in the examples I describe above, I can imagine a reasonable developer making the same mistakes that I’m tempted to make here. My point is that these mistakes might be common, and that’s an important thing to consider. The harder it is to successfully make workable dual packages, the slower the migration to ESM will be.
One way we can limit the risk of developers screwing up their migrations is to have a CommonJS dependency only receive CommonJS versions of its dependencies. In current Node, I think, if my project uses lodash@4 and request@2, and then Request uses lodash@1, both my project and Request each get the separate versions of Lodash that each expects. The same can happen with CommonJS/ESM, where my project would get ESM Lodash and the CommonJS Request dependency would get the CommonJS Lodash. Then at least I don’t need to worry about poorly migrated packages that are dependencies of dependencies; my scope of concern is limited to my project’s direct dependencies, which I have more control over.
Different code within the same runtime is worse. Significantly. That’s a forever cost - a legacy you can’t be rid of.
It’s only a cost for as long as a package author wants to maintain both CommonJS and ESM versions of their package, as there’s no “different code within the same runtime” if a package contains only ESM sources (or only CommonJS sources, for that matter). If we backport ESM to Node 10 and 8, then a package author could stop publishing CommonJS versions of their packages by next month, when Node 6 is end-of-life. If we backport ESM to Node 10, a package author could drop CommonJS by the end of this year. The transition period could be very short. (If it’s only 12+, then packages will need to dual-publish until 12 is end-of-life in April 2021.)
I don’t think it’s reasonable to expect that node’s EOL has any impact on the ecosystem dropping support; most of my packages support down to 0.6 and will do so for the foreseeable future, and plenty of people still use node 4 in production. We need a solution that works for a long transition period; some group of people will need/want to dual-publish for a very long time regardless of node’s support period.
One way we can limit the risk of developers screwing up their migrations is to have a CommonJS dependency only receive CommonJS versions of its dependencies. In current Node, I think, if my project uses lodash@4 and request@2, and then Request uses lodash@1, both my project and Request each get the separate versions of Lodash that each expects. The same can happen with CommonJS/ESM, where my project would get ESM Lodash and the CommonJS Request dependency would get the CommonJS Lodash. Then at least I don’t need to worry about poorly migrated packages that are dependencies of dependencies; my scope of concern is limited to my project’s direct dependencies, which I have more control over.
One way we can limit the risk of developers screwing up their migrations is to have an old-version dependency only receive old versions of its dependencies. In current Node, I think, if my project uses lodash@4 and request@2, and then Request uses lodash@1, both my project and Request each get the separate versions of Lodash that each expects. The same can happen with old/new versions, where my project would get the new Lodash and the old Request dependency would get the old Lodash. Then at least I don’t need to worry about poorly migrated packages that are dependencies of dependencies; my scope of concern is limited to my project’s direct dependencies, which I have more control over.
Nothing about that situation cared about module kinds, and it's why semver is used. The solution is already in wide use. Yes, people can mess it up, but the workarounds already exist too (version pinning). Nothing about that is a new problem. It's identical to major breaking changes in packages today.
My recommendation based on meeting discussions today: can we zero in on the sync/async flattening concept and make it into a meeting?
Edit: @weswigham I think there would be interest if you feel an offline meeting first would be useful toward having the PR.
Per the 2019-04-10 meeting, next steps: require of ESM by itself, leaving aside the package parts for now.
If I understood this proposal correctly, I think making require able to deal with ESM in any way is not a long-term migration-friendly decision; it will instead keep around both require and a third way of doing things, which might stick around forever and will need to be updated in parallel when module source/map happens, or whenever any future change to the standard module system happens.
Having require in the wild to disambiguate something that already has both syntax and extensions to disambiguate it, when necessary, also seems like unnecessary overhead for require itself.
We should move toward a place where require won't exist anymore in any ESM form, if following standardization is what all this new-modules effort is about.
Live bindings might also confuse require users ... so, in the end, I hope this proposal won't move forward as it stands.
Very few modules utilize live bindings (in cjs or ESM — they can be simulated in cjs too), so I don’t think that’s really something that needs considering.
if we return the namespace object (which is what is proposed) then live bindings just work normally, so i don't think its a huge issue.
So, are you saying that
const {a, b, c} = require('./live-bindings.mjs');
will have consts that change at a distance? If that's the case, it seems unexpected/undesired.
Anyway, live bindings were the very last point of my comment, if these work anyway, other thoughts still apply.
@WebReflection the a, b, and c bindings you created would never change; only repeatedly accessing requiredOutput.a could yield a changed value.
@devsnek then live bindings don't really work, because importing those instead would carry new values once changed, right?
I still don't see why require should be any more complicated, and I don't see any clear goal/win of this proposal in the long term.
Moving to ESM should implicitly mean migrating from CJS, in the long term, unless impossible.
Importing .cjs, however, would already address that.
@WebReflection it's the same as const { a } = await import('b') vs resultOfImport.a. The only time declarative bindings are live is with static import (import { a } from 'b').
Moving to ESM should implicitly mean migrating from CJS, in the long term, unless impossible.
right... this allows modules to migrate to esm regardless of how they are consumed.
the only time declarative bindings are live is with static import (import { a } from 'b')
that's what I meant, yes.
this allows modules to migrate to esm regardless of how they are consumed.
by keeping require around? how does that help? 🤔
@WebReflection I think live bindings, which are often glossed over, are a much bigger headache in the equation — that is, with me staying neutral in this discussion. Migration of pseudo-esm code is a very layered problem that maybe needs to be worked out in a separate issue (#314).
@SMotaal cjs consumers don't expect to use "live bindings" - they're a form of value-as-a-mutable-reference that actually doesn't make much sense. Take this code:
const {readFileSync} = require("fs");
callIntoOtherCode();
readFileSync("/path/to/file");
"live bindings" of readFileSync would mean that callIntoOtherCode could change the value of readFileSync without the consumer knowing. TBH, even in ESM they're a hack to allow for circular references without zebra-striped execution; their behavior is incredibly out of place in JS, and we get "bug reports" about it over at TS every once in a while.
Given the above code, literally no cjs consumer would expect readFileSync to change, irrespective of how fs is implemented (since, well, barring runtime shenanigans, it can't). Live bindings are a language hack IMO, but beyond that they're unexpected for a cjs user. The namespace members updating as execution goes on (which they do) is the limit of what a cjs consumer expects.
FWIW I agree that live bindings are a hack; all I was saying is that they are in the spec, so they might surprise CJS consumers of ESM modules that use such a feature here and there.
it's the same as const { a } = await import('b')
Especially because static imports are synchronous, and so are CJS requires, it's easy to fail devs' expectations.
However, since live bindings are indeed a footgun that could produce different results if imported statically or dynamically, the issue might not be too relevant. 👋
@weswigham I appreciate the rundown. I think my concern is more about how existing pseudo-esm transpilers handle going from import {readFileSync} from 'fs'; and how a seasoned CJS consumer would actually do it without magic dependencies prebuilt into their create-app.
If many (or most) use a create-app, and the create-app relies on magic bindings for third-party deps which are literally more like import * as x from 'x' » const X = require('x'), then we have crazy-bindings scenarios like X.……… which are technically not statically defined live bindings, but they are not const either!
Isn't that still a practice — maybe I am outdated here :)
@devsnek can you please verify an additional concern I have.
@WebReflection a, b, and c bindings you created would never change. only repeatedly accessing requiredOutput.a could change.
Is it the case that CJS modules also allow requiredOutputInModule1.b = x and then const y = requiredOutputInModule1.b where x === y?
If so, does that also allow const y = requiredOutputInModule2.b where y (in module 1) === y (in module 2)? — please consider this in absolute technical terms here (not better practices, which are better by default 😄)
Can we say that the two concepts of live bindings are fundamentally different, and that the closest transpilation (at least from a named-member standpoint) is import * as requiredOutput, where requiredOutput.a is literally module.exports.a?
Side note about emoting — many of us avoid them in discussions where they can distract away from actually discussing things so please don't get discouraged from any such tally.
@WebReflection live bindings per the ESM spec are simply different at this point — and the terms static or dynamic are misleading ways to characterize them for contrast effect, imho.
Please note that in a spec where statically linked modules are not the only type of module record allowed, you could technically have live bindings per the namespacing behaviour of module.exports.… prefixing, and they would exhibit the dynamic traits you are referring to, in the sense that if module.exports were reassigned (i.e. module.exports = ns2) your bound variables would reflect accordingly against their counterparts on ns2 — but all this is not what the spec is today, nor what anyone knows it will or should become tomorrow.
In the real world, default breaks this ability to substitute your module.exports after the fact, because default is about the only thing that is ~statically~ (oops, did that myself — constantly) defined next to what names the namespace actually exposes.
@SMotaal
X.……… which are technically not statically defined live bindings
X.whatever is never a static binding... By definition it's a lookup into the X namespace every time. That's why it's what every transpiler transpiles references to destructured es6 imports into, to emulate "live bindings". As I said before, their behavior is unintuitive to many (a local reference can be externally mutated, while otherwise only internal mutations are usually observable) and is one of the earliest footguns someone migrating to es6 might hit. ;)
Anyways, all of this about live bindings is neither here nor there - with a synchronous require that returns the namespace object, cjs users always get a namespace (whose members may change during execution) irrespective of whether they require an esm or a cjs package; nothing changes there from how cjs behaves today. In fact, so long as an esm namespace has no runtime-observable exoticness (not needed or anything, but as a hypothetical), it's impossible for a cjs consumer to know whether the require resolved to a cjs or esm namespace without introspecting the resolution itself, since the objects are fairly substitutable. This is why I keep arguing for named-export support for CJS libraries: a CJS library can then still be cjs and have named exports, and later migrate to esm without breaking any consumers, whether they're already using esm or still using cjs. This is a critical feature, IMO, since it's quite impossible for all of the community to upgrade to esm simultaneously. This feature (plus named exports and potential runtime recognition of the __esModule marker field) is the core expectation of current transpiler forward-compatibility. With them, every transpiler user, today, can become a node esm user tomorrow, with usually no code changes (cough, barring global-ish API changes, cough), without breaking any existing consumers of the transpiled output. That's big game, and that so few people in this group respect it astounds me, considering the majority (69%!) of node devs use a transpiler of some kind, according to recent polls.[1]
@weswigham I think this is a classic case of duplicity of intent when referring to live bindings. So, a bound export, irrespective of it having a bound import somewhere is still a live binding nonetheless.
The optimizable semantics for exporting top-level declarations onto a namespace object in ESM, irrespective of implementation optimizations, purely take the form of () => x in the top-level scope where export { x } occurs. That is a live binding.
The optimizable semantics for importing an exported declaration into a scope as a variable take the form of bind x to "x" of ‹exporter› (since there is no first-class syntax to represent this yet) where import { x } from ‹exporter› occurs. That too is a live binding.
There is a natural bias when looking at code that is exporting versus code that is importing, but in both cases, if either or both sides are ESM, there is a live binding taking place.
So, with that in mind, can we think about the execution order taking place to make an ESM namespace object available to CJS, if that namespace object circularly refers to the namespace object of the CJS module?
This goes to the core of why live bindings occur prior to execution, and why they take the form of a top-level scoped declaration, imho.
can we think about the execution order taking place to make an ESM namespace object available to CJS, if that namespace object circularly refers to the namespace object of the CJS module?
this is that "zebra striping" thing that's been mentioned a few times, and because the namespace is returned before the execution stage, it's not a problem.
@devsnek, absolutely — the historical use of zebra-striping has proven reliable even before the native implementation of ESM in browsers. I may not have realized that layering is applicable here, or at least I inferred that it was not, based on the TDZ circularity concerns that are delaying the Dynamic Modules proposal effort.
Am I correct to infer from your input that the following will not hit the TDZ issue:
In that case, the suspense is killing me to actually play with — any updates on timeline for the PR?
@weswigham and I had a brief call to discuss require(esm).
The main issue we focused on was understanding the impact of require(esm) on CommonJS modules at the point in time when a (transitive) dependency becomes irreversibly async, e.g. once a dependency is upgraded/published that first introduces a top-level await, or a module is migrated from CJS to WebAssembly. We agreed that ESM consumers would not be impacted by such changes in their dependencies, because both the static and dynamic forms of import tolerate async loading. So this is only a compatibility risk for modules using require().
In practice, the issue normally arises when a CJS consumer immediately uses a require()'d value without writing error handling for the case of the value being temporarily unpopulated. Right now, most users write code that depends on the require()'d value being synchronously populated - check-before-use patterns are effectively non-existent.
The conclusion was that package authors will be responsible for judging whether they are likely to have transitive CJS consumers that would be broken when they make a release that first introduces irreversible async behaviour into one of their modules. If the package author wants to avoid breaking any transitive CJS consumers using their package, they can bump their package major version. Conversely, if the package author believes all their transitive users are ESM, this change is deemed non-breaking and no special action is required.
My main question to the group is: do we think this will become an ecosystem compatibility footgun?
I will offer my personal experience here, of operating a reasonably large AMD module ecosystem (~10k packages) that permits async module evaluation (equivalent to top-level await) with both synchronous require() and promise-returning import(). Despite our usage of require() being fractionally tiny, we have had a steady stream of bugs over the years where require() fails after refactoring - when a transitive async dependency is added or moved. Package authors have not been able to successfully predict the subtle changes in timing, and rarely bump major versions. Testing does not always expose the problem because there are many cases where the dependency graph is satisfied by luck, i.e. an async module gets loaded early via import/import(), so by the time a require() tries to load the same module, it can immediately and successfully return the value from the module registry. The only mitigation we found effective was to convert those require() sites to import/import().
As a result, I am concerned that if we introduce this hazard on an ecosystem the size of npm, we'll see a large number of breaks over a long period of time.
As far as I understand, synchronous require() necessarily places file access, parsing and bytecode generation/compilation all on the main thread. I have no concerns about whether this is possible for WebAssembly, though top-level await does break things. By contrast, import statements or dynamic import() can all be put off the main thread, and multiple imports can be processed in parallel. The addition of ES modules could be a good opportunity to really push the ecosystem to adopt more performance-friendly patterns.
As far as I understand, synchronous require() necessarily places file access, parsing and bytecode generation/compilation, all on the main thread
False. All it does is make the main thread wait on the results, which it would be doing async anyway. Multiple threads can still be compiling simultaneously while the main thread waits for all of them.
It is quite literally only the main thread that is prevented from executing random stuff while the syncified promise chain is executing - the syncified chain can run whatever parallel/concurrent operations it wants, which the main thread will wait for completion on at the sync point.
To respond to @robpalme:
The "hazard" as described already exists in cjs today, especially given async functions already exist in node (it is quite easy for someone to write an async main and set exports within it), so the chance that a similar hazard is introduced by TLA or wasm modules (although wasm modules as-is are likely going to be fine) doesn't seem too great to me. The fear is overblown, IMO. The fear is more of a reason, IMO, that not having a synchronous require of esm is a hazard: people will then fall back to interoping via import, with a top-level async IIFE to await it... and create the same problem for their consumers, but be forced to do so for every esm module they consume, rather than only those with actual async-dependent execution.
Multiple threads can still be compiling simultaneously while the main thread waits for all of them.
How do you queue up these multiple pieces of work to be run in parallel, if all the require() calls return before another one is made?
How do you queue up these multiple pieces of work to be run in parallel, if all the require() calls return before another one is made?
Why would one assume that work can be queued up? There's potentially arbitrary code executing between those requires that could change what's actually required - code that maybe needs that module and its features to have already executed. Batching things up just isn't the cjs model. It's nice that esm semantics allow it for an esm caller, but from cjs it's not going to happen (barring cache-warming shenanigans) - and there's no reason it should. As I said, this just isn't how cjs works - thinking otherwise is trying to force the esm model onto cjs, which is the opposite of what I'm trying to do here, which is coerce esm into something cjs-compatible.
Besides, it's more that if your require pulls in an esm module that pulls in multiple wasm modules, there won't be any issue parallelising that work off-thread (there's no reason to even go off-thread until you have multiple things to compile simultaneously anyway, which, as stated, can only happen from a cjs caller if they import an esm module that pulls in multiple wasm modules).
I share your impression that CJS does not lend itself to batching things up. This makes it hard for me to see how you could make use of parallelism/off-main-thread work.
@littledan he's saying dependencies of things you require can be batched, not the multiple things you require.
Responding to @weswigham
It's true that in CJS you can write an async IIFE that will assign exports in a later tick. I suspect most people who consider that pattern understand they are creating a dangling-promise hazard that may unleash Zalgo.
For top-level await in ESM, it's more reasonable to use it care-free because it's an endorsed pattern with no dangling promise. So I do not think the two cases are equivalent.
For top-level await in ESM, it's more reasonable to use it care-free because it's an endorsed pattern with no dangling promise
While I think TLA has important use cases in the REPL context and at application entrypoints, even a TLA that blocks module graph execution introduces sequencing hazards and potential deadlocks in the context of dynamic import - one of the TLA proposals tries to hide the issues by making the "common" case of a single closed module graph execute without issue, but it still introduces hazards. I'm just arguing that the difference between those hazards and existing hazards isn't that much, and that the benefit of allowing the vast majority of existing modules to migrate without any kind of penalty on the ecosystem (as they do not rely on anything async during execution time - doing so today, as you said, would be regarded with great suspicion) is totally worth it. Especially since it's only, at that time, "legacy" cjs callers who'd have to "deal with" such a hazard anyway (and workarounds like an await-able namespace symbol can be provided). It's a much smaller hazard than, say, your entire package ecosystem crumbling around you as it outright drops support for cjs, IMO.
your entire package ecosystem crumbling around you as they outright drop support for cjs, IMO.
Things are not so black and white, and there are lots of ways to continue to have legacy support; this, to me, feels like a non sequitur.
Take a look at this repo, where I've attempted to explore one method for moving a module from CJS -> ESM in a Semver-Major without requiring interop or overloaded specifiers:
CJS: https://github.com/mylesborins/node-osc ESM: https://github.com/MylesBorins/node-osc/tree/next
In the CJS implementation there is a deep import: import 'node-osc/esm.mjs'. In the ESM implementation there is a deep import: require('node-osc/cjs').
This works, does not leave the "ecosystem crumbling". While it may not have the same ease of use and automagic loading, it does not introduce the hazards of either dual mode or require('esm')
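A sketch of the kind of layout that two-publish approach implies (file names are illustrative; see the linked repos for the real thing):

```
node-osc@next (ESM by default)
├── package.json     { "main": "./lib/index.mjs", "type": "module" }
├── lib/…            ESM sources
└── cjs/             CJS build, reachable as require('node-osc/cjs')

node-osc@latest (CJS by default)
├── package.json     { "main": "./lib/index.js" }
├── lib/…            CJS sources
└── esm.mjs          ESM wrapper, reachable as import 'node-osc/esm.mjs'
```

Each major version has exactly one primary format, with the other format exposed only through an explicit deep specifier.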
Needing to know the format of the thing you're importing/requiring in order to consume it, however, is a pretty big hazard to some of us (altho not, obviously, to all of us).
For every single (extensionless, to be fair) path, i think I should be able to import or require it without having to know what format it's authored in - whether it's CJS, ESM, wasm, etc.
moving a module from CJS -> ESM in a Semver-Major
There are no constraints there. You can publish an entirely different package with a semver major bump. There's nothing you can't do. It's about making a backwards compatible (aka semver minor) migration path.
This is a counter-proposal to nodejs/modules#273.
A bunch of us have brought this up during informal dual-mode discussions before, but the very concept of "dual mode" packages which execute different code for cjs vs esm callers is distasteful. It pollutes a single identifier (the package name or specifier) with two distinct and potentially irreconcilable package identities (which themselves potentially contain duplicated identities and dependencies). From anything other than the perspective of the registry, it's effectively shipping two separate packages for a single runtime - one for esm callers and one for cjs callers.
The original desire for so-called "dual mode" packages derives from two uses:
- Supporting cjs consumers in the newest versions of node.
- Supporting older versions of node, which lack esm support.

In this proposal, I will tackle the two issues separately.

First, case 1: In the current implementation, an esm-authored package cannot be require'd. This means that you cut support for all cjs consumers when you migrate to esm. This kind of hard cut is, IMO, obviously undesirable, and both the "dual-mode" proposal and this proposal seek to remedy it. In the "dual-mode" proposal, it is solved by shipping separate cjs code alongside the esm code, which is loaded instead of the esm code. In this proposal, the esm code itself is loaded. This means that when a package specifies that it has an entrypoint that is esm, it will only ever be loaded as esm. The astute in the crowd would note that while yes, that's all well and good, the cjs resolver is synchronous, while we've specified that the esm resolver is asynchronous - a seemingly irreconcilable difference. I arrive to tell you something: this is not so. An esm-based require may be executed async, yet appear to be synchronous to the cjs caller - similarly to how child_process.execSync works today (and, in fact, using similar machinery). This synchronization only affects instantiation and resolution - the execution phase remains untouched - so if at some point in the future top-level await becomes a thing, depending on the variant, either the require can conditionally return a promise (if TLA is supposed to be blocking) or happily return the exports object while the module is still asynchronously executing.

The only other concern would be the observable effects on user-defined loaders. If we follow through on the design with out-of-process (or at least out-of-context) loaders (which are very desirable from an isolation perspective), the solution there, likewise, is simply an apparently synchronous execution of the async loader, just as with the builtin loader. In-process loaders can also be syncified (and in fact are in my current implementation), but care must be taken that a user loader does not depend on a task which is meant to resolve in the main "thread" of execution (since it will not receive an opportunity to execute, as only the child event loop will be turned) - this means relying on a promise made before the loader was called is a no-go. This shouldn't be a problem (the builtin loader, despite being written to allow async actions, is actually fully synchronous in nature), and, again, is fully mitigated by having loaders in a new context (where they cannot possibly directly depend on such a task).

By allowing esm to be require'd in cjs, we can close the difference between the esm and cjs resolvers. If the cjs resolver can resolve a file, IMO, the esm resolver should resolve it to the same thing or issue an error - it should not be possible to get two different identities for the same specifier based on the resolver used (this implies that the esm resolver we use should be a subset of, or identical to, the cjs resolver). This means implementing knowledge of "type": "module" and the like into the cjs resolver, since afaik it's not already there (though it does already reserve .mjs).

And case 2: With the above, a package can only have one runtime execution associated with it, but that only requires shipping esm. To support older versions of node, a cjs version of a package must still be shipped. An answer falls out naturally from extension priorities: older versions of node do not search for the .mjs extension. Making your entrypoint a .mjs/.js pair (with the .js as cjs), the .mjs will be found by versions of node which support esm, while the .js will be found by older versions of node. This is ofc constrained by what older versions of node already support - there's no "work to be done" to improve this, just work to ensure that it continues to work and does not become blocked by some other modification. Naturally, this means that in a package with a cjs fallback for an older node, the entrypoint cannot be a .js esm file - however, other esm in the project can be (by placing a package.json with {"type": "module"} in whichever project subfolder has the esm source beyond the entrypoint). This, too, has been alluded to by many people in many other issues, yet has not been written down.

TL;DR:

- require of esm can be done. Doing so allows an esm package to actually replace a cjs one, and alleviates the need for a "dual mode" resolver system. This also greatly aids the migration story (especially when paired with the dynamic modules spec, which allows more than just the default member to overlap during migration).
- The esm resolver should be a proper subset of (or identical to) the cjs resolver, resolving to the same cache entry or an error for each specifier which can be represented in both (cjs obviously does not respect URLs in any way, so any URL resolution remains solely the domain of the esm loader).
- cjs fallbacks for older versions of node come free with the extension searching that older versions of node already do (so long as we allow a higher-priority entrypoint to be found in newer versions of node, this method of backcompat comes "free"). Without that, an alternate esm entrypoint field is useful. Neither is tied to this proposal, but it's worth mentioning that it's still easily doable.
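Concretely, the extension-priority entrypoint arrangement described above might look like this (paths hypothetical):

```
my-pkg/
├── package.json        { "main": "./index" }  (extensionless entrypoint)
├── index.mjs           picked up by esm-supporting node (.mjs resolves first)
├── index.js            cjs fallback, the only match on older node
└── lib/
    ├── package.json    { "type": "module" }   (rest of the esm sources)
    └── impl.js         esm despite the .js extension
```

Newer node resolves `./index` to index.mjs and loads the package as esm; older node never searches for .mjs and falls through to the cjs index.js.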