
Proposal for single-mode packages with optional fallbacks for older versions of node #49450

Open weswigham opened 5 years ago

weswigham commented 5 years ago

This is a counter-proposal to nodejs/modules#273.

A bunch of us have brought this up during informal dual-mode discussions before, but the very concept of "dual mode" packages which execute different code for cjs vs esm callers is distasteful. It pollutes a single identifier (the package name or specifier) with two distinct and potentially irreconcilable package identities (which themselves potentially contain duplicated identities and dependencies). From anything other than the perspective of the registry, it's effectively shipping two separate packages for a single runtime - one for esm callers and one for cjs callers.

The original desire for so-called "dual mode" packages derives from two uses:

  1. The ability to support cjs consumers in the newest versions of node.
  2. The ability to retain support for older versions of node, while still supporting esm where possible.

In this proposal, I will tackle the two issues separately.

First, case 1: In the current implementation, an esm-authored package cannot be require'd. This means that you cut support for all cjs consumers when you migrate to esm. This kind of hard cut is, IMO, obviously undesirable, and both the "dual-mode" proposal and this proposal seek to remedy this. In the "dual-mode" proposal, this is solved by shipping separate cjs code alongside the esm code, which is loaded instead of the esm code. In this proposal, the esm code itself is loaded. This means that when a package specifies that it has an entrypoint that is esm, it will only ever be loaded as esm.

The astute in the crowd would note that while yes, that's all well and good, the cjs resolver is synchronous, while we've specified that the esm resolver is asynchronous - a seemingly irreconcilable difference. I arrive to tell you something: this is not so. An esm-based require may be executed async, but appear to be synchronous to the cjs caller - similarly to how child_process.execSync works today (and, in fact, using similar machinery). This synchronization only affects instantiation and resolution - the execution phase remains untouched, so if at some point in the future top-level await becomes a thing, depending on the variant, either the require can conditionally return a promise (if TLA is supposed to be blocking) or happily return the exports object while the module is still asynchronously executing.

The only other concern would be the observable effects on user-defined loaders. If we follow through on that design with out-of-process (or at least out-of-context) loaders (which are very desirable from an isolation perspective), the solution there, likewise, is simply an apparently synchronous execution of the async loader, just as with the builtin loader. In-process loaders can also be syncified (and in fact are in my current implementation), but care must be taken to not have a user loader depend on a task which is meant to resolve in the main "thread" of execution (since it will not receive an opportunity to execute, as only the child event loop will be turned) - this means relying on a promise made before the loader was called is a no-go. This shouldn't be a problem (the builtin loader, despite being written to allow async actions, is actually fully synchronous in nature), and, again, is fully mitigated by having loaders in a new context (where they cannot possibly directly depend on such a task).
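For illustration, here is a minimal sketch of that "apparently synchronous" trick using worker_threads and Atomics.wait. syncifyImport is hypothetical - the proposal's actual machinery lives inside the loader and is closer to execSync's secondary event loop - but it shows how async work can be made to look synchronous to the caller:

```js
'use strict';
const { Worker, MessageChannel, receiveMessageOnPort } = require('worker_threads');

function syncifyImport(specifier) {
  const shared = new SharedArrayBuffer(4);
  const flag = new Int32Array(shared);
  const { port1, port2 } = new MessageChannel();

  // The worker gets its own event loop and is free to do fully async work.
  new Worker(
    `
    const { workerData } = require('worker_threads');
    const flag = new Int32Array(workerData.shared);
    import(workerData.specifier).then((ns) => {
      // Module namespaces aren't cloneable, so send something serializable.
      workerData.port.postMessage(Object.keys(ns));
      Atomics.store(flag, 0, 1);
      Atomics.notify(flag, 0);
    });
    `,
    { eval: true, workerData: { shared, specifier, port: port2 }, transferList: [port2] }
  );

  Atomics.wait(flag, 0, 0); // block the calling thread until the worker signals
  return receiveMessageOnPort(port1).message; // result was posted before the signal
}

// console.log(syncifyImport('some-esm-package')); // appears synchronous to the caller
```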

By allowing esm to be require'd in cjs, we can close the difference between the esm and cjs resolvers. If the cjs resolver can resolve a file, IMO, in the esm resolver it should resolve to the same thing or issue an error - it should not be possible to get two different identities for the same specifier based on the resolver used (this implies that the esm resolver we use should be a subset of or identical to the cjs resolver). This means implementing knowledge of "type": "module" and the like into the cjs resolver, since afaik, it's not already there (though it does already reserve .mjs).

And case 2: With the above, a package can only have one runtime execution associated with it - however, that only requires shipping esm. To support older versions of node, a cjs version of a package must be shipped. An answer falls out naturally from extension priorities: older versions of node do not search for the .mjs extension. By making your entrypoint a .mjs/.js pair (with the .js as cjs), the .mjs will be found by versions of node which support esm, while the .js will be found by older versions of node. This is ofc constrained by what older versions of node already support - there's no "work to be done" to improve this, just work to ensure that it continues to work and that it does not become blocked by some other modification. Naturally, this means that in a package with a cjs fallback for older node, the entrypoint cannot be a .js esm file - however, other esm in the project can be (by placing a package.json with {"type": "module"} in whichever project subfolder has the esm source beyond the entrypoint). This, too, has been alluded to by many people in many other issues, yet has not been written down.
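Concretely, a hypothetical package exploiting this (all names are placeholders) might be laid out like so:

```text
my-pkg/
├── package.json        {"name": "my-pkg", "main": "./index"}
├── index.mjs           found first by esm-supporting node (extension priority)
├── index.js            cjs fallback; the only entrypoint older node can resolve
└── lib/
    ├── package.json    {"type": "module"}
    └── helpers.js      .js parsed as esm via the nested "type" marker
```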

TL;DR:

GeoffreyBooth commented 5 years ago

Thanks for posting this. I can see this potentially working as a solution.

A few questions:

To cover the case where a user wants to use ESM in .js but also provide a CommonJS package for older versions of Node, you have language like “in a package with a cjs fallback for an older node, the entrypoint cannot be a .js esm file - however other esm in the project can be (by placing a package.json with {"type": "module"} in whichever project subfolder has the esm source beyond the entrypoint)”. This seems awfully complicated, when the alternative is simply to use a new field to define the ESM entry point. Is there a reason you’re not just going ahead and defining a new field?

I have several issues with the approach of overloading "main" to support two modes based on extension searching (e.g. "main": "./index" where there are index.js and index.mjs side by side):

I don’t think there’s a need to couple this proposal to the fight for enabling extension searching. Just define a new field for the ESM entry point. If/when extension searching gets enabled in ESM mode, a new PR can propose removing the new ESM entry point field in favor of main with searching; but I would still be opposed to that because of the first three concerns I mentioned above, which remain even with extension searching enabled. But this proposal can stand on its own without needing extension searching.

ljharb commented 5 years ago

packages which execute different code for cjs vs esm callers is distasteful

I don't agree; there's nothing inherently distasteful to me about this. One mitigation is making the ESM and CJS module caches shared - ie, once an extensionless specifier has a value in either module format, that's the value provided to the other.

It kind of sounds like you're proposing using C++ to make otherwise-async JS code be sync. Is that accurate?

As far as your case 2 suggestion - a package shipping .js/.mjs pairs would already work great as long as, in an ESM-supporting node, require('pkg') and import 'pkg' didn't cause two files to be executed - which might indeed relay back to your case 1 suggestion.

Hopefully we can all agree that "a single package can be published that works the same on both ESM-supporting node and older node, whether it's required or imported, without causing duplication issues/complexity for stateful or singleton modules" is a good goal. There's a number of ways to solve it; this, as well as nodejs/modules#273, are certainly two possible directions. I'd love to see us attain consensus on the goal first so that our discussions boil down to implementation approach instead of debating priorities.

ljharb commented 5 years ago

As far as having a separate directory from which deep imports are resolved (a "deep root", as it were), I think that's a problem that CJS has as well, and I'd greatly prefer that node solve it for CJS and have ESM just inherit that solution, rather than punishing CJS by depriving it of that kind of solution.

GeoffreyBooth commented 5 years ago

packages which execute different code for cjs vs esm callers is distasteful

I don’t agree; there’s nothing inherently distasteful to me about this.

I don’t have an opinion on this, other than to suggest that maybe “distasteful” might not have been the best word to choose 😄 But I think it’s worth reviewing the long threads in the import file specifier resolution proposal repo that concerned how to handle the same specifier (e.g. the package name) resolving to different files in CommonJS versus ESM:

weswigham commented 5 years ago

Early on we ruled out require of ESM, though I don’t remember why. Was it because we assumed it just wouldn’t be possible technically? Is there a spec concern with loading ESM synchronously?

Yes and no, respectively. require is 100% the domain of the cjs loader and the execution of the esm modules themselves is unaffected.

If require('esm-package') becomes possible, shouldn’t require('./file.mjs') be possible too? How would that work?

Yes - I didn't explicitly mention it, but that's implicit in the "cjs loader and mjs loader should resolve the same specifier to the same identity" assertion I added at the end. It works trivially - the .mjs handler in the cjs resolver, currently defined to throw, simply defers to the esm loader and syncifies the result (see the impl for details as to how "syncifying" actually works, but it's similar to child_process's execSync, as I said before, and can ofc be iterated on to change exactly how it's done). It also implies that require('esm-package/foo') will resolve and load foo.js as esm if esm-package has "type": "module" set in scope.
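A rough sketch of the shape of that change. Module._extensions is node's real cjs extension hook, but loadEsmNamespaceSync is a hypothetical stand-in for the syncified esm loader, not an existing API:

```js
'use strict';
const Module = require('module');

// Today the .mjs entry in the cjs loader throws; under this proposal it
// would defer to the esm loader and block on the result instead.
Module._extensions['.mjs'] = function (module, filename) {
  // loadEsmNamespaceSync (hypothetical): runs the esm loader's async
  // resolve/instantiate phases, syncified as described above.
  module.exports = loadEsmNamespaceSync(filename); // returns the namespace object
};
```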

Is there a reason you’re not just going ahead and defining a new field?

Because I don't need to do so. (Less is more and so on) However were we to go with this and also finalize removing extension resolution on the main entrypoint, I would be forced to do so to still allow nodejs/modules#2 to happen, I believe.

I have several issues with the approach of overloading "main" to support two modes based on extension searching (e.g. "main": "./index" where there are index.js and index.mjs side by side):

This is purely a matter of organization and personal preference. I don't really need to or mean to pursue it in either direction here, since it's noncritical. For what it's worth, TS has long supported both side-by-side configurations and separate entrypoints. Supporting both isn't bad, and a cjs/esm-wide exports map while retaining extension searching in esm would allow that, y'know.

The only thing I'm asserting here is that within the same node runtime, the same specifier should map to the same module (regardless of if the source is a cjs or esm module). That's technically not required for a synchronous require to be useful, however I think it'd be very confusing were it not the case. I'm really just trying to assert that the cjs loader shouldn't try to load things it knows the esm loader should handle - instead, it should defer to it. Honestly, that we keep considering the two loaders "separate" disturbs me slightly - they are deeply interlinked and ultimately make up one cohesive system. While a lot of the esm loader is just dedicated to managing building the esm graph, the bits about resolution (and where user-defined loaders fit in) are less independent, IMO (and there's no spec for resolution to speak of, so nobody can really claim resolution needs to follow any spec other than "but another run-time does it different").

For example, taking my statement that specifiers within cjs and esm should resolve the same to its conclusion, that does mean that require.extensions should, to one degree or another, be respected by the esm loader (ofc, where by default every non-.mjs mapping is a deferral to the cjs loader's behaviors). However that point specifically I would like to pursue separately after this - from a migration stance, it's a given, however from a "match browser resolution at all costs" it's not, and so that's 100% a topic that will be an opinionated debate which I'd like to keep separate from this. It is not strictly required for a synchronous require to be useful, however I think it compounds its beneficial cjs ecosystem effects.

It requires users to know and understand Node’s extension precedence order

Just like how they already need to know where .node and .json rank among .js? This argument is a little weak - only people using both are exposed to it, and if you're using both, you have a problem that needs both to solve already and will need to know the priority order to work with them. Plus, .mjs is already configured to throw in cjs - the priority order is already exposed.

Users are already familiar with defining separate entry points in separate package.json fields such as "browser" and "module". I don’t see a reason to deviate from the UX they’re already familiar with, especially when the current model seems much more straightforward.

And "types" as well. And having a secondary entrypoint may be desirable, as I said before, but I don't need it for this proposal, and I do think there's much value in supporting both separate-out-dir and side-by-side-output compilation settings. While every person here has a preference (or not), I for one am always for user choice in this area. Everyone's project has unique requirements~

"main": "./index" assumes we enable extension searching in ESM mode at all, which currently it’s not. So you wouldn’t be able to able to merge this design into the implementation as things stand today, unless/until our stance on extension searching changes; but we’re waiting on feedback from users in Node 12 before revisiting that, so the earliest it could change is months from now.

Not so. I could merge it now, and not upstream it until we're happy with the feedback one way or the other. I don't see us up-streaming regularly anyway. And again, while that's incredibly useful for supporting older versions of node, it's non-critical for compatibility within the same node runtime, which is the primary focus here. It's just stating how it's very much still possible to support older versions of node without having a package that ever exposes two identities at runtime.

I don’t think there’s a need to couple this proposal to the fight for enabling extension searching. Just define a new field for the ESM entry point. If/when extension searching gets enabled in ESM mode, a new PR can propose removing the new ESM entry point field in favor of main with searching; but I would still be opposed to that because of the first three concerns I mentioned above, which remain even with extension searching enabled. But this proposal can stand on its own without needing extension searching.

I don't need to do anything one way or the other here - it's just a matter of stating that even with this, no matter how the resolution argument pans out, there's a way forward for the back compat story. I don't need to work on either direction here (though my preferences are likely known, but since I missed the out-of-band meeting where the flag was decided on, maybe not - extension searching 4 lyfe).

I don't agree; there's nothing inherently distasteful to me about this.

Anyone who relies on prototype chains is going to have serious issues with packages that create them twice - once in cjs and once in esm. Similar issues exist for unique symbols. Yes, they can be worked around by making just one instance in cjs and importing it into the esm side via a created require function or a default import - but that becomes code you can never migrate. (And you can't go the other way, since it'd force all your exports into Promises.) It's all in all not a great situation.
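A hypothetical dual-mode package 'pkg', shipping parallel cjs and esm builds that each define their own copy of the same class, makes the hazard concrete:

```js
const { Thing, isThing } = require('pkg');   // cjs copy of the package

(async () => {
  const esm = await import('pkg');           // esm copy: a second, distinct identity
  console.log(isThing(new esm.Thing()));     // false - two separate prototype chains
  console.log(esm.isThing(new Thing()));     // false, symmetrically
})();
```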

It kind of sounds like you're proposing using C++ to make otherwise-async JS code be sync. Is that accurate?

Yes, but that JS code is just JS code running within the loader itself - no user code is included, except potentially user-defined loaders, which, ofc, we'd very much like to isolate anyway (I mentioned those caveats in the OP and won't go into them again here). The sync v. async distinction is ultimately artificial - it's all continuations being triggered by something; the only difference is whether we wait on the result before executing more code or not. And again: this is very similar to how child_process's execSync works in how it manipulates a secondary event loop within the same process. We've talked about creating a general abstraction and unifying the two to make it nicer to work with - if this is appealing, I can see putting in the work to do that.

As far as your case 2 suggestion - a package shipping .js/.mjs pairs would already work great as long as, in an ESM-supporting node, require('pkg') and import 'pkg' didn't cause two files to be executed - which might indeed relay back to your case 1 suggestion.

Yep. That's why I said it's "effectively free" - almost all designs support it like that, except for some designs which remove extension resolution wholesale from the resolver (and those which do ought to have a separate entrypoint field for esm for exactly this reason).

As far as having a separate directory from which deep imports are resolve (a "deep root", as it were)

I don't think I've explicitly called anything similar to that out here, except maybe how supporting older nodes comes effectively for free with extension priorities on the main entrypoint. In any case, I don't think it's terribly important here.

Explain dual-instantiation - this thread has the conclusion

And at the time I agreed - except I've actually done research and implementation work in the intervening time and found that a better solution is quite palatable, without the downsides of having the same specifier resolve to two separate identities. We've been saying that everything's subject to change for a reason ❤️ In addition, with a synchronous require of esm and dynamic modules, we open the door to supporting the __esModule marker already emitted in TS- and babel-compiled CJS to suppress the module-as-default behavior (and have the real default named member as default instead, if present), and allow real drop-in upgrades for people using those toolchains (y'know, provided we don't outright break the resolver in other ways, but I'm being optimistic there). I think that's a laudable goal.

I don’t have an opinion on this, other than to suggest that maybe “distasteful” might not have been the best word to choose

I'd call it synonymous with "problematic" with a hint of opinion on the matter, which is IMO more accurate than just saying "problematic", no? Is it really the lack of expounding on why it's distasteful in the OP that's the real issue, or is the word "distasteful" itself distasteful?

Unrelated to all those responses: I'm pretty sure that if require'ing esm had been deemed technically feasible 2-3 years ago, we'd be doing it already, no questions asked (and if I'm honest, I was only able to look into this by looking at @devsnek's experiments in the area and limiting them a bit to specifically just syncify the resolution process - syncifying async code is not, in general, safe, but in this case it can be considered safe to do so). I don't think this is actually contentious, other than concerns over how it might impact some other stories which assume and make use of the opposite (which I've attempted to assuage here).

GeoffreyBooth commented 5 years ago

So besides splitting this apart from the extension searching debate, there’s another way we can split this up: into require of ESM, period, separate from the issue of dual packages. You could make a PR that simply enables require of specifiers that resolve as ESM, whether they’re relative files or package bare specifiers, and that could be the first PR. That might actually be a great first PR, as it would let @guybedford and @bmeck and others see the guts of how require as ESM would work without needing to get into the details of extending it into handling dual packages.

Then once that’s merged in, you could add the code for how to handle bare specifiers that could resolve to either ESM or CommonJS. (In the current implementation they’re always either one or the other, that’s why I’m thinking that the dual part could be a second PR.) I don’t want to wait months to get a solution upstreamed for dual packages, so I would encourage you to use a second field to define the ESM entry point. We can always support "main" instead (or in addition) if we enable extension searching by default at some point; but let’s not let this PR be blocked on that, if you don’t mind? I don’t see why it needs to be. The separate-folders case is a good enough reason on its own to support a second field, in my opinion, even if we later on also extend this to "main".

Re my comment on “distasteful,” I was being tongue-in-cheek (sorry for the pun). I just meant that I was assuming @ljharb responded as he did because you chose such a forceful word. I mean, he surely would’ve responded anyway, but you made it even more likely 😄

Anyway what do you think of this phased approach?

GeoffreyBooth commented 5 years ago

Just thought of one more thing you’ll need a good answer for: how does this work with dependencies? Both direct and dependencies of dependencies.

For example, I just pushed https://github.com/jashkenas/coffeescript/pull/5177, a PR for creating an ESM version of the CoffeeScript compiler for use in ESM-capable browsers. (Yes, I’m very proud of myself.) It’s an ESM version of only the existing CoffeeScript browser compiler, which itself has a more limited API than the Node API; the Node API has an option to enable transpilation via Babel, for example, while the browser API does not. Or looked at another way, my package’s CommonJS exports and ESM exports are different.

Maybe this isn’t all that common of a case, as most people will transpile to produce the secondary output format, but it’s not inconceivable; it might very well be exactly what I do to add ESM support to the Node CoffeeScript package, when Node’s ESM implementation is stable.

So say I did just that, and published coffeescript to NPM with differing exports for CommonJS and ESM, and it’s October and ESM support in Node is unflagged. A user whose project depends on my package might run fine in Node 12 but crash in Node 10, because some ESM export they’re relying on in Node 12 isn’t exported by my package’s CommonJS version in Node 10. Or vice versa, they rely on a CommonJS export in Node 10 that’s not present in Node 12. If the user wanted their project to be able to run in either Node 10 or 12, how would they resolve this issue? An if block detecting the Node version? At least in our current implementation, they could use createRequireFromPath to explicitly include the CommonJS version of the package if they wanted to; in your version, I think aside from version-detecting Node the only other option would be to do a deep import into the package to its CommonJS entry point, and hope that that entry point doesn’t change in future versions of the package.

So that can at least be worked around without too much pain, though it would be annoying. But what about dependencies of dependencies? Like, say, Express, which has 30 dependencies. What if one of its dependencies differs in its CommonJS and ESM exports? Would the ESM version even be loaded at all in Node 12, if the top-level Express is still a CommonJS package? What if Express itself gets updated to be dual; would then the ESM versions of Express’ dependencies be loaded as ESM if possible? What if any of them have different CommonJS/ESM APIs? In short, how do we keep Express from behaving differently in Node 12 as Node 10, if any number of its dependencies might change under its feet because they suddenly are exporting different ESM files than the CommonJS files that Express is expecting? This might require that Express’s authors updated its dependencies to their latest versions without testing that Express still worked in Node 12, which is unlikely for a project as high-profile as Express but surely will happen for at least some commonly-used packages.

Edit: I know I ask others to always try to provide answers for problems they pose, and I just realized I didn’t do so for the Express case, because I can’t think of any 😄 though one could argue that it’s unavoidable and still preferable to the other PR, which has some of the same issues.

weswigham commented 5 years ago

Or looked at another way, my package’s CommonJS exports and ESM exports are different

You have two different API surfaces. Ship two different packages.

A user whose project depends on my package might run fine in Node 12 but crash in Node 10, because some ESM export they’re relying on in Node 12 isn’t exported by my package’s CommonJS version in Node 10.

If you have two different APIs, ship two different packages. If your dependent is re-exposing you, it would also be re-exposing your bad practice. Two packages, one of which has an appropriate advisory "engines" field set, would have indicated this issue in advance, and also been easy to analyse as having differing and incompatible APIs.

If the user wanted their project to be able to run in either Node 10 or 12, how would they resolve this issue?

A conditional require, based on process.version, of each of two differing packages (handling the differing APIs explicitly). Since they have different APIs, they're not semver-compatible at the same version number of the underlying package and shouldn't be in the same package.
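Sketched concretely ('my-pkg' and 'my-pkg-legacy' are placeholder names for the two packages):

```js
// Version-gated selection of two explicitly different packages, as described.
const major = Number(process.version.slice(1).split('.')[0]); // 'v12.1.0' -> 12
const api = major >= 12
  ? require('my-pkg')         // the modern package
  : require('my-pkg-legacy'); // the separately published package for older node
```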

The gist is this: If you have a cjs package, and you follow semver, your upgrade path needs to look like this (to provide the same API to all consumers at a given version):

  1. Move the API away from a namespace assignment to named members, if needed (assuming we have dynamic modules; otherwise it's a bit more complex and non-obvious, and you can only present members on a default). This can be done gradually in a less-breaky way by adding them in a minor version, then deprecating the namespace assignment form in a major version and removing it in another - see the sketch after this list. The node versions supported ofc don't actually change during this process.
  2. Then, the package can introduce the esm version with the same API as the cjs version transparently - no more version bumps needed - and ship cjs providing the same API alongside (probably generated by a tool so they stay in sync). The two can run side-by-side (kept in sync) as long as desired. Older versions of node are ofc still supported, since a synchronized cjs and esm API is presented.
  3. (Optional) Remove cjs version and bump major version to indicate lack of legacy entrypoint (as I think most would consider dropping cjs support on older versions of node as "semver major").
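A hypothetical sketch of step 1, keeping the callable namespace assignment while adding named members alongside it:

```js
// index.js mid-migration (hypothetical package).
// Before: module.exports = function compile(src) { /* ... */ };

function compile(src) { /* ... */ }

module.exports = compile;          // legacy namespace-assignment form, kept for now
module.exports.compile = compile;  // named member, matching the future esm export
```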

IMO, there is no good reason a package should present two different APIs based on node version or caller module kind. Doing so feels like a violation of semver - if not explicit, in spirit at least. If different APIs are for some reason needed, the solution is always trivial - just publish an additional package with the differing APIs.

What if one of its dependencies differs in its CommonJS and ESM exports?

When they published an esm version with an incompatible API, that was a semver break, and the version number should indicate that. It should not match the dependency version until its dependents have updated to handle the differing API (which may include conditional behaviors based on the shape presented, if compat with multiple APIs is needed).

Would the ESM version even be loaded at all in Node 12, if the top-level Express is still a CommonJS package?

Anything that ships esm should be loaded as esm in runtimes that support esm.

What if Express itself gets updated to be dual; would then the ESM versions of Express’ dependencies be loaded as ESM if possible?

As much of the dependency tree as is loadable as esm is loaded as esm. Only things which only ship cjs would be loaded as cjs. On older node, only cjs would be loaded. If a dependency no longer shipped cjs, it would, ofc, be a semver major break that prevents it and its dependents from running on older versions of node.

This might require that Express’s authors updated its dependencies to their latest versions without testing that Express still worked in Node 12, which is unlikely for a project as high-profile as Express but surely will happen for at least some commonly-used packages.

A semver-compatible dependency should provide a semver-compatible API when a compatible version is selected - and that includes consistent APIs across node versions. If it does not, it has violated the expectations of semver and needs to be handled as a misbehaving dependency (ie, by locking down its versions more tightly and scrutinizing changes in it), same as if it had suddenly deleted or incompatibly modified any other important API of the library.

GeoffreyBooth commented 5 years ago

You have two different API surfaces. Ship two different packages.

So this is easy to say in the abstract, but reality is more complicated. Let me use the coffeescript package as an example.

I want to make my package easy to use in Node 12 in an ESM environment. It’s already usable via import CoffeeScript from 'coffeescript', but users expect to be able to get named exports too, e.g. import { compile } from 'coffeescript'. So I want to provide those named exports, but like most developers I want to achieve that goal with as little effort as possible. So I’ll make a new ESM entry point for my package:

import CoffeeScript from './lib/coffeescript/index.js';
const { compile } = CoffeeScript;
export { compile }

I tried this out by creating a new folder, running npm install coffeescript, saving the above as ./node_modules/coffeescript/index.mjs and setting "main" in ./node_modules/coffeescript/package.json to point to the new index.mjs file. Now in my project folder, I can create the following:

import { compile } from 'coffeescript';
console.log(compile('alert "hello!"', {bare: true}));

And it works as expected. Done, right? That one compile method is the entirety of my Node.js API as per the docs, aside from my loader which is called via require('coffeescript/register'). Let’s just ignore that for the purposes of this example, as few packages include loaders.

Anyway so as a developer I think I’m done; I’ve exposed my public API in ESM. Except there’s no concept of a public API in CommonJS, at least not programmatically. If you run console.log(Object.keys(require('coffeescript'))) in RunKit you’ll see that the CommonJS 'coffeescript' has a lot more things attached to it than just compile; for example, helpers, which was never intended to be public. Am I supposed to expose those in ESM, too? It’s not obvious that I need to.

In other words, it might not be obvious to developers that the CommonJS and ESM versions of their packages’ APIs need to match. Some developers might well just decide to use the default export and stop there. There’s no telling what developers will do, and you’ll need to engage in a campaign to educate them on doing dual packages right to avoid these issues.

There’s also the problem that even if the APIs match, by definition they’re not running the same code. For packages where the ESM is transpiled into CommonJS, you’re basically trusting that Babel (or whatever transpiler) is doing a faithful job, and that nothing in the user’s package falls into one of the edge cases where transpilation isn’t quite the same as native execution (like the little gotchas of subtle differences between Object.assign and object rest/spread). So it’s always the case under this approach that require('coffeescript') in Node 10 will be different than the same in Node 12, even if I’m not a sneaky or uninformed developer changing up my API. By definition ESM code is different.

Now of course, most of the time there won’t be an issue. (To be really cynical, I could argue that that makes this even more of an issue, as then it’s more likely to be a surprise; but I won’t go there. 😄) So the question is whether this “different code in different versions of Node” issue is worse than the “different code whether the package is used in CommonJS versus in ESM” of the other PR. I’m not sure how to answer that question; one factor surely should be how common problems are likely to occur in one versus the other, and what those problems are likely to be.

devsnek commented 5 years ago

Except there’s no concept of a public API in CommonJS...

what? how is this any different from esm? i can export const _privateThing and not document it, just as i can module.exports._privateThing and not document it.

...which was never intended to be public

this is a problem with your package, not this proposal.

weswigham commented 5 years ago

you’ll see that the CommonJS 'coffeescript' has a lot more things attached to it than just compile;

Y'all should fix that. People will come to depend on those privates if you expose them, and it'll then be a semver break to remove them. Not including them in your esm API is likewise semver-breaking, since something providing the same "version" of your API isn't actually API-compatible.

And I'm not talking in the abstract here - we deal with these discrepancies every day over at DefinitelyTyped. People moving from, eg, cjs to esm do notice the small changes to an API based on environment, and do classify them as breaks, even if the library maintainer didn't. Having a consistent API across all access patterns is expected.

So the question is whether this “different code in different versions of Node” issue is worse than the “different code whether the package is used in CommonJS versus in ESM” of the other PR

Different code within the same runtime is worse. Significantly. That's a forever cost - a legacy you can't be rid of. Support for that behavior is there forever. People don't transition node versions often, and once you've crossed the boundary, that's it - you're done; you can almost never have a similar issue again (and the issue is unlikely enough to begin with for well-behaved packages), short of repeated downgrade/upgrade cycles across the esm boundary (which would imply we've done something very wrong that necessitated repeated node downgrades).

By definition ESM code is different.

Usually and ideally in unobservable minutiae that very few reasonably rely on. Pulling a name out of a namespace and calling it is pretty universal. Reflective patterns are rarer and by nature require inspection of the object being reflected upon; if you have consumers whom you value and you value reflective stability in the minute details of your module's construction (why?), then you may need to include that in your API contract and version appropriately. Generally, it's something pretty much no one needs to care about.

GeoffreyBooth commented 5 years ago

The coffeescript package predates my involvement by several years. I didn’t design its architecture. I use it as an example just because I’m familiar with it.

I don’t need to be explained the value of a consistent API. I understand the issues. The point I’m trying to make is that in the examples I describe above, I can imagine a reasonable developer making the same mistakes that I’m tempted to make here. My point is that these mistakes might be common, and that’s an important thing to consider. The harder it is to successfully make workable dual packages, the slower the migration to ESM will be.

One way we can limit the risk of developers screwing up their migrations is to have a CommonJS dependency only receive CommonJS versions of its dependencies. In current Node, I think, if my project uses lodash@4 and request@2, and then Request uses lodash@1, both my project and Request each get the separate versions of Lodash that each expects. The same can happen with CommonJS/ESM, where my project would get ESM Lodash and the CommonJS Request dependency would get the CommonJS Lodash. Then at least I don’t need to worry about poorly migrated packages that are dependencies of dependencies; my scope of concern is limited to my project’s direct dependencies, which I have more control over.

Different code within the same runtime is worse. Significantly. That’s a forever cost - a legacy you can’t be rid of.

It’s only a cost for as long as a package author wants to maintain both CommonJS and ESM versions of their package, as there’s no “different code within the same runtime” if a package contains only ESM sources (or only CommonJS sources, for that matter). If we backport ESM to Node 10 and 8, then a package author could stop publishing CommonJS versions of their packages by next month, when Node 6 is end-of-life. If we backport ESM to Node 10, a package author could drop CommonJS by the end of this year. The transition period could be very short. (If it’s only 12+, then packages will need to dual-publish until 12 is end-of-life in April 2021.)

ljharb commented 5 years ago

I don’t think it’s reasonable to expect that node’s EOL has any impact on the ecosystem dropping support; most of my packages support down to 0.6 and will do so for the foreseeable future, and plenty of people still use node 4 in production. We need a solution that works for a long transition period; some group of people will need/want to dual-publish for a very long time regardless of node’s support period.

weswigham commented 5 years ago

One way we can limit the risk of developers screwing up their migrations is to have a CommonJS dependency only receive CommonJS versions of its dependencies. In current Node, I think, if my project uses lodash@4 and request@2, and then Request uses lodash@1, both my project and Request each get the separate versions of Lodash that each expects. The same can happen with CommonJS/ESM, where my project would get ESM Lodash and the CommonJS Request dependency would get the CommonJS Lodash. Then at least I don’t need to worry about poorly migrated packages that are dependencies of dependencies; my scope of concern is limited to my project’s direct dependencies, which I have more control over.

One way we can limit the risk of developers screwing up their migrations is to have an old-version dependency only receive old versions of its dependencies. In current Node, I think, if my project uses lodash@4 and request@2, and then Request uses lodash@1, both my project and Request each get the separate versions of Lodash that each expects. The same can happen with old versions/new versions, where my project would get new Lodash and the old Request dependency would get the old Lodash. Then at least I don't need to worry about poorly migrated packages that are dependencies of dependencies; my scope of concern is limited to my project's direct dependencies, which I have more control over.

Nothing about that situation cared about module kinds, and that's why semver is used. The solution is already in wide use. Yes, people can mess it up, but the workarounds already exist as well (version pinning). Nothing about this is a new problem. It's identical to major breaking changes in packages today.

SMotaal commented 5 years ago

My recommendation, based on meeting discussions today: can we zero in on the sync/async flattening concept and make it the subject of a meeting?

Edit: @weswigham I think there would be interest, if you feel an offline meeting first would be useful towards having the PR.

GeoffreyBooth commented 5 years ago

Per 2019-04-10 meeting, next steps:

  1. @weswigham will open a PR demonstrating require of ESM by itself, leaving aside the package parts for now.
  2. Once we have some time to review that, we’ll have an out-of-band meeting to discuss that and dual packages generally. I’ll create a doodle to organize that.

WebReflection commented 5 years ago

If I understood this proposal correctly, I think making require able to deal with ESM at all is not a migration-friendly decision in the long term; it will instead keep around both require and a third way to do things, which might stick around forever and will need to be updated in parallel whenever module source/map, or any future change to the standard module system, happens.

Having require in the wild disambiguate something that already has both syntax and extensions to disambiguate it, when necessary, also seems like unnecessary overhead for require itself.

We should move toward a place where require won't exist anymore in any ESM form, if following standardization is what all this new module effort is about.

Live bindings also might confuse require users ... so, in the end, I hope this proposal won't move forward as it stands.

ljharb commented 5 years ago

Very few modules utilize live bindings (in cjs or ESM, since it’s simulable in cjs too), so i don’t think that’s really something that needs considering.

devsnek commented 5 years ago

if we return the namespace object (which is what is proposed) then live bindings just work normally, so i don't think it's a huge issue.

WebReflection commented 5 years ago

so, are you saying:

const {a, b, c} = require('./live-bindings.mjs');

will those consts change at a distance? If that's the case, it seems unexpected/undesired.

Anyway, live bindings were the very last point of my comment, if these work anyway, other thoughts still apply.

devsnek commented 5 years ago

@WebReflection a, b, and c bindings you created would never change. only repeatedly accessing requiredOutput.a could change.
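A sketch making that concrete (counter.mjs is hypothetical; require returning the esm namespace object is the behavior this proposal describes):

```js
// counter.mjs (hypothetical):
//   export let count = 0;
//   export function increment() { count += 1; }

const ns = require('./counter.mjs'); // under this proposal: the esm namespace object
const { count } = ns;                // destructuring copies the current value

ns.increment();
console.log(count);    // 0 - the local destructured binding never changes
console.log(ns.count); // 1 - reading through the namespace sees the update
```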

WebReflection commented 5 years ago

@devsnek then live bindings don't really work, 'cause importing those instead would carry new values once changed, right?

I still don't see why require should be any more complicated, and I don't see any clear goal/win for this proposal in the long term.

Moving to ESM should implicitly mean migrating from CJS, in the long term, unless impossible.

Importing .cjs, however, would already address that.

devsnek commented 5 years ago

@WebReflection it's the same as const { a } = await import('b') vs resultOfImport.a. the only time declarative bindings are live is with static import (import { a } from 'b')

Moving to ESM should implicitly mean migrating from CJS, in the long term, unless impossible.

right... this allows modules to migrate to esm regardless of how they are consumed.

WebReflection commented 5 years ago

the only time declarative bindings are live is with static import (import { a } from 'b')

that's what I meant, yes.

this allows modules to migrate to esm regardless of how they are consumed.

by keeping require around? how does that help? 🤔

SMotaal commented 5 years ago

@WebReflection I think live bindings, which are often glossed over, are a much bigger headache in the equation — that is, with me staying neutral in this discussion, migration of pseudo-esm code is a very layered problem that maybe needs to be worked out in a separate issue (#314).

weswigham commented 5 years ago

@SMotaal cjs consumers don't expect to use "live bindings" - they're a form of value-as-a-mutable-reference that actually doesn't make much sense. Take this code:

const {readFileSync} = require("fs");
callIntoOtherCode();
readFileSync("/path/to/file");

"live bindings" of readFileSync would mean that callIntoOtherCode could change the value of readFileSync without the consumer knowing. TBH, even in ESM they're a hack to allow for circular references without zebra-striped execution; their behavior is incredibly out of place in JS, and we get "bug reports" about it over at TS every once in a while.

Given the above code, literally no cjs consumer would expect readFileSync to change, irrespective of how fs is implemented (since, well, barring runtime shenanigans, it can't). Live bindings are a language hack IMO, but beyond that they're unexpected for a cjs user. The namespace members updating (which they do) as execution goes on is the limit of what a cjs consumer expects.

WebReflection commented 5 years ago

FWIW I agree that live bindings are a hack; all I was saying is that they are in the specs, so they might surprise CJS consumers of ESM modules that use such a feature here and there.

it's the same as const { a } = await import('b')

Especially because static imports are synchronous, and so are CJS requires, it's easy to fail devs' expectations.

However, since live bindings are indeed a footgun that could produce different results if imported statically or dynamically, the issue might not be too relevant. 👋

SMotaal commented 5 years ago

@weswigham I appreciate the rundown. I think my concern is more about how existing pseudo-esm transpilers handle going from import {readFileSync} from 'fs';, and how a seasoned CJS consumer would actually do it without the magic dependencies prebuilt into their create-app.

If many (or most) projects come from a create-app, and that create-app relies on magic bindings for third-party deps which are literally more like import * as x from 'x' » const X = require('x'), then we have crazy-bindings scenarios like X.……… which are technically not statically defined live bindings, but they are not const either!

Isn't that still a practice — maybe I am outdated here :)

SMotaal commented 5 years ago

@devsnek can you please verify an additional concern I have

@WebReflection a, b, and c bindings you created would never change. only repeatedly accessing requiredOutput.a could change.

Is it the case that CJS modules also allow requiredOutputInModule1.b = x and then const y = requiredOutputInModule1.b where x === y?

If so, does that also allow const y = requiredOutputInModule2.b where y (in module 1) === y (in module 2)? — please consider this in absolute technical terms here (not better practices, which are better by default 😄)

Can we say that the two concepts of live bindings are fundamentally different, and that the closest transpilation (at least from a named-member standpoint) is import * as requiredOutput, where requiredOutput.a is literally module.exports.a?

SMotaal commented 5 years ago

Side note about emoting — many of us avoid emoji in discussions where they can distract from actually discussing things, so please don't get discouraged by any such tally.

SMotaal commented 5 years ago

@WebReflection live bindings per the ESM spec are simply different at this point — and the terms static or dynamic are misleading ways to characterize them for contrast, imho.

Please note that in a spec where statically linked modules were not the only type of module record allowed, you could technically have live bindings per the namespacing behaviour of module.exports.… prefixing, and they would exhibit the dynamic traits you are referring to, in the sense that if module.exports were to be reassigned (ie module.exports = ns2) your bound variables would reflect accordingly against their counterparts on ns2 — but all this is not what the spec is today, or what anyone knows it will or should become tomorrow.

In the real world, default breaks this ability to substitute your module.exports after the fact, because default is about the only thing that is ~statically~ (oops, did that myself — constantly) defined, next to what names the namespace actually exposes.

weswigham commented 5 years ago

@SMotaal

X.……… which are technically not statically defined live bindings

X.whatever is never a static binding... By definition it's a lookup into the X namespace every time. That's why it's what every transpiler transpiles references to destructured ES6 imports into, to emulate "live bindings". As I said before, their behavior is unintuitive to many (as a local reference can be externally mutated, while otherwise only internal mutations are usually observable) and is one of the earliest footguns someone migrating to es6 might hit. ;)

Anyways, all of this about live bindings is neither here nor there - with a synchronous require that returns the namespace object, cjs users always get a namespace (whose members may change during execution) irrespective of whether they require an esm or cjs package - nothing changes there from how cjs behaves today. In fact, so long as an esm namespace has no runtime-observable exoticness (not needed or anything, but as a hypothetical), it's impossible for a cjs consumer to know if the require resolved to a cjs or esm namespace without introspecting the resolution itself, since the objects are fairly substitutable.

This is why I keep arguing for named export support for CJS libraries: a CJS library can then still be cjs and have named exports, then migrate to esm without breaking any consumers, whether they're already using esm or still using cjs. This is a critical feature, IMO, since it's quite impossible for all of the community to upgrade to esm simultaneously. This feature (plus named exports and potential runtime recognition of the __esModule marker field) is the core expectation of current transpiler forward-compatibility. With them, every transpiler user, today, can become a node esm user tomorrow, with usually no code changes (cough barring globalish API changes cough), without breaking any existing consumers of the transpiled output. That's big game, and that so few people in this group respect that astounds me, considering the majority (69%!) of node devs use a transpiler of some kind, according to recent polls.[1]

[1]https://nodejs.org/en/user-survey-report/
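For reference, the marker in question is the one TS and Babel already emit when compiling esm to cjs - roughly this shape (the import behavior in the final comment is the proposed interop, not current node):

```js
'use strict';
// Typical transpiled cjs output for:  export default greet; export const lang = 'en';
Object.defineProperty(exports, '__esModule', { value: true });
exports.default = function greet() { return 'hi'; };
exports.lang = 'en';

// With the marker recognized as described, `import greet, { lang } from 'pkg'`
// would bind greet to exports.default rather than to the whole exports object.
```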

SMotaal commented 5 years ago

@weswigham I think this is a classic case of duplicity of intent when referring to live bindings. So, a bound export, irrespective of it having a bound import somewhere, is still a live binding nonetheless.

The optimizable semantics for exporting top-level declarations onto a namespace object in ESM, irrespective of implementation optimizations, purely takes the form of () => x in the top-level scope where export { x } occurs. That is a live binding.

The optimizable semantics for importing an exported declaration into a scope as a variable takes the form of bind x to "x" of ‹exporter› (since there is no first-class syntax to represent this yet) where import { x } from ‹exporter› occurs. That too is a live binding.

There is a natural bias when looking at code that is exporting versus code that is importing, but in both cases, if either or both sides are ESM, there is a live binding taking place respectively.

SMotaal commented 5 years ago

So, with that in mind, can we think about the execution order taking place to make an ESM namespace object available to CJS, if that namespace object circularly refers to the namespace object of the CJS module?

This goes to the core of why live bindings occur prior to execution, and why they take the form of a top-level scoped declaration, imho.

devsnek commented 5 years ago

can we think about the execution order taking place to make an ESM namespace object available to CJS, if that namespace object circularly refers to the namespace object of the CJS module?

this is that "zebra striping" thing that's been mentioned a few times, and because the namespace is returned before the execution stage it's not a problem.

SMotaal commented 5 years ago

@devsnek, absolutely - the historical use of zebra-striping proved reliable even before the native implementation of ESM in browsers. I may not have realized that layering is applicable here, or at least I inferred that it was not, based on the TDZ circularity concerns that are delaying the Dynamic Modules proposal effort.

Am I correct to infer from your input that the following will not hit the TDZ issue:

  1. Circularity CJS-1 › … › ESM-n › CJS-1
  2. Circularity ESM-1 › … › CJS-n › ESM-1

In that case, the suspense to actually play with it is killing me — any updates on the timeline for the PR?

robpalme commented 5 years ago

require(esm)

@weswigham and I had a brief call to discuss require(esm).

The main issue we focussed on was understanding the impact of require(esm) on CommonJS modules at the point in time when a (transitive) dependency becomes irreversibly async, e.g. once a dependency is upgraded/published that first introduces a top-level await or a module is migrated from CJS to WebAssembly. We agreed that ESM consumers would not be impacted by such changes in their dependencies, because both the static and dynamic forms of import tolerate async loading. So this is only a compatibility risk for modules using require().

In practice, the issue normally arises when a CJS consumer immediately uses a require()'d value without writing error-handling for the case of the value being temporarily unpopulated. Right now, most users write code that depends on the require()'d value being synchronously populated - check-before-use patterns are effectively non-existent.

The conclusion was that package authors will be responsible for judging whether they are likely to have transitive CJS consumers that would be broken when they make a release that first introduces irreversible async behaviour into one of their modules. If the package author wants to avoid breaking any transitive CJS consumers using their package, they can bump their package major version. Conversely, if the package author believes all their transitive users are ESM, this change is deemed non-breaking and no special action is required.

Post-call thoughts

My main question to the group is: do we think this will become an ecosystem compatibility footgun?

I will offer my personal experience here, of operating a reasonably large AMD module ecosystem (~10k packages) that permits async module evaluation (equivalent to top-level await) alongside both synchronous require() and promise-returning import(). Despite our usage of require() being fractionally tiny, we have had a steady stream of bugs over the years where require() fails after refactoring - when a transitive async dependency is added or moved. Package authors have not been able to successfully predict the subtle changes in timing, and rarely bump major versions. Testing does not always expose the problem, because there are many cases where the dependency graph is satisfied by luck, i.e. an async module gets loaded early via import/import(), so by the time a require() tries to load the same module, it can immediately and successfully return the value from the module registry. The only mitigation we found effective was to convert those require() sites to import/import().

As a result, I am concerned that if we introduce this hazard on an ecosystem the size of npm, we'll see a large number of breaks over a long period of time.

littledan commented 5 years ago

As far as I understand, synchronous require() necessarily places file access, parsing and bytecode generation/compilation, all on the main thread. I have no concerns about whether this is possible for WebAssembly, though top-level await does break things. By contrast, import statements or dynamic import() can all be put off the main thread, and multiple imports can be processed in parallel. The addition of ES modules could be a good opportunity to really push the ecosystem to adopt more performance-friendly patterns.

weswigham commented 5 years ago

As far as I understand, synchronous require() necessarily places file access, parsing and bytecode generation/compilation, all on the main thread

False. All it does is make the main thread wait on the results, which it would be doing async anyway. Multiple threads can still be compiling simultaneously while the main thread waits for all of them.

weswigham commented 5 years ago

It is quite literally only the main thread that is prevented from executing random stuff while the syncified promise chain is executing - the syncified chain can run whatever parallel/concurrent operations it wants, which the main thread will wait for completion on at the sync point.

weswigham commented 5 years ago

To respond to @robpalme:

The "hazard" as described already exists in cjs today, especially given async functions already exist in node (It is quite easy for someone to write an async main and set exports within it), so there being a chance that a similar hazard is introduced by the introduction of TLA or wasm modules (although wasm modules as-is are likely going to be fine), doesn't seem too great too me. The fear is overblown, IMO. The fear is more of a reason, IMO, that not having a synchronous require of esm is a hazard, since then people will fall back to interoping via import, have a top level async IIFE to await it.... and create the same problem for their consumers, but be forced to do so for every esm module they consume, rather than only those with actual async-dependent execution.

littledan commented 5 years ago

Multiple threads can still be compiling simultaneously while the main thread waits for all of them.

How do you queue up these multiple pieces of work to be run in parallel, if all the require() calls return before another one is made?

weswigham commented 5 years ago

How do you queue up these multiple pieces of work to be run in parallel, if all the require() calls return before another one is made?

Why would one assume that work can be queued up? There's potentially arbitrary code executing between those requires that could change what's actually required - code that maybe needs that module and its features to have already executed. Batching things up just isn't the cjs model. It's nice that esm semantics allow it for an esm caller, but from cjs it's not going to happen (barring cache-warming shenanigans) - and there's no reason it should. As I said, this just isn't how cjs works - thinking otherwise is trying to force the esm model onto cjs, which is the opposite of what I'm trying to do here, which is coerce esm into something cjs-compatible.

Besides, it's more that if your require pulls in an esm module that pulls in multiple wasm modules, there won't be any issue parallelising that work off-thread (there's no reason to even go off-thread until you have multiple things to compile simultaneously anyway, which, as stated, can only happen from a cjs caller if they import an esm module that pulls in multiple wasm modules).

littledan commented 5 years ago

I share your impression that CJS does not lend itself to batching things up. This makes it hard for me to see how you could make use of parallelism/off-main-thread work.

devsnek commented 5 years ago

@littledan he's saying dependencies of things you require can be batched, not the multiple things you require.

robpalme commented 5 years ago

Responding to @weswigham

It's true that in CJS you can write an async-IIFE that will assign exports in a later tick. I suspect most people who consider that pattern will understand they are creating a dangling promise hazard that may unleash zalgo.

For top-level await in ESM, it's more reasonable to use it care-free, because it's an endorsed pattern with no dangling promise. So I do not think the two cases are equivalent.

weswigham commented 5 years ago

For top-level await in ESM, it's more reasonable to use it care-free because it's an endorsed pattern with no dangling promise

While I think TLA has important usecases in the REPL context and at application entrypoints, even a TLA that blocks module graph execution introduces sequencing hazards and potential deadlocks in the context of dynamic import - one of the TLA proposals tries to hide the issues by making the "common" case of a single closed module graph execute without issue, but it still does introduce hazards. I'm just arguing that the difference between those hazards and existing hazards isn't that much, and that the benefit of allowing the vast majority of existing modules to migrate without any kind of penalty on the ecosystem (as they do not rely on anything async during execution time, as doing so today, as you said, would be regarded with great suspicion) is totally worth it. Especially since it's only, at that time, "legacy" cjs callers who'd have to "deal with" such a hazard, anyway (and workarounds like an await-able namespace symbol can be provided). It's a much smaller hazard than, say, your entire package ecosystem crumbling around you as they outright drop support for cjs, IMO.

MylesBorins commented 5 years ago

your entire package ecosystem crumbling around you as they outright drop support for cjs, IMO.

Things are not so black and white; there are lots of ways to continue to have legacy support. This, to me, feels like a non sequitur.

Take a look at this repo where i've attempted to explore one method for moving a module from CJS -> ESM in a Semver-Major without requiring interop or overloaded specifiers

CJS: https://github.com/mylesborins/node-osc
ESM: https://github.com/MylesBorins/node-osc/tree/next

In the CJS implementation there is a deep import, import 'node-osc/esm.mjs'; in the ESM implementation there is a deep import, require('node-osc/cjs').

This works and does not leave the ecosystem "crumbling". While it may not have the same ease of use and automagic loading, it does not introduce the hazards of either dual mode or require('esm').

ljharb commented 5 years ago

Needing to know the format of the thing you're importing/requiring in order to consume it, however, is a pretty big hazard to some of us (altho not, obviously, to all of us).

For every single (extensionless, to be fair) path, i think I should be able to import or require it without having to know what format it's authored in - whether it's CJS, ESM, wasm, etc.

weswigham commented 5 years ago

moving a module from CJS -> ESM in a Semver-Major

Semver-Major

There are no constraints there. You can publish an entirely different package with a semver major bump. There's nothing you can't do. It's about making a backwards compatible (aka semver minor) migration path.