Rediscussion for out-of-order CJS exec? #509

Closed guybedford closed 4 years ago

guybedford commented 4 years ago

I'd like to reopen the discussion of out-of-order CJS exec one last time, given how much criticism there has been over the lack of named exports for CommonJS in ESM.

I can revive a former PR for this, which is still sitting around somewhere, no problem.

The two "problematic semantics" as far as this group is concerned are:

  1. CJS executing during the instantiation phase instead of during the execute phase (see the sketch after this list).
  2. Because of the getter triggers, I would also prefer to move the "wrapper creation" to be done lazily on the first import, instead of for every CJS module. That is, doing the exports snapshot at import time, even if the module was required earlier. Otherwise, an app that loads hundreds of CommonJS modules would snapshot all of them on startup, which would both be slower and could trigger lots of unnecessary warning and error side effects.
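
A minimal sketch of the ordering difference in (1), with hypothetical file names:

// main.mjs
import './setup.mjs';   // ESM that configures globals / polyfills first
import './legacy.cjs';  // CJS that reads that configuration when it executes

// With in-order ("execute phase") evaluation, setup.mjs runs before legacy.cjs.
// If CJS instead executes during the instantiation phase, legacy.cjs (and every
// other CJS module in the graph) runs before any ESM body, including setup.mjs.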

I have no problem with either of these semantics and never have.

Will other members of this group finally reconsider their positions in the name of practicality for users?

This scenario is definitely one of the biggest catches for users that we can still change. It could be worth re-hashing this old issue even if the chance of changing it is small.

giltayar commented 4 years ago

I seem to remember a PR where you implemented a "golden middle": during the binding phase, Node assumes that the exports are as the imports defined them; it leaves the execution to the execution phase, and if during execution it finds that the actual exports were not the same as those imports, it throws an exception.

That would sound to me like an (almost) perfect solution.

guybedford commented 4 years ago

@giltayar yes this was the dynamic modules proposal that we put so much time into at TC39 (https://github.com/nodejs/dynamic-modules). The solution, implemented pretty much exactly as you describe, hits on some quite deep complexities:

  1. export * from 'a'; export * from 'b' is supposed to throw when a and b both export the same named export. Allowing the names to be user/consumer-defined means this validation phase needs to be post-handled or ignored somehow (see the sketch after this list).
  2. import * as ns in a cycle can expose a module namespace object for a CJS module with its exports visible before the execution of that CJS module has completed. Since the exposed exports are then the user definitions, this exposes non-determinism despite any later execution error.
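
A minimal sketch of complexity (1), with hypothetical file names, showing why the duplicate-name check cannot run up front when the CJS export names are consumer-defined:

// a.cjs
exports.parse = (s) => s.trim();

// b.cjs
exports.parse = (s) => JSON.parse(s);

// lib.mjs
export * from './a.cjs';
export * from './b.cjs';
// `parse` is exported by both, so importing it from lib.mjs is supposed to be
// rejected during linking; but if the CJS export names are only known from what
// consumers import (or only after execution), that validation has to be
// deferred or ignored.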

The dynamic modules solution was to ban export star of commonjs modules for (1) and to patch the namespace for (2), but this approach was shot down at TC39 after multiple attempts at consensus.

The simple solution being discussed here can be implemented in a PR tomorrow, and doesn't come with anything close to these complexities.

GeoffreyBooth commented 4 years ago

yes this was the dynamic modules proposal

So this proposal determined the names of the CommonJS exports by looking at what was imported, e.g. in import statements; did we ever consider determining the names of the CommonJS exports by parsing the CommonJS?

I remember there was opposition to evaluating the CommonJS in the instantiation phase, and strong opposition to evaluating CommonJS code twice; but what about just parsing it in one of these pre-evaluation phases? We would miss any CommonJS exports that are dynamically named, but perhaps that's okay? That could be a caveat we explain in the docs as a limitation of importing CommonJS into ESM; hopefully that edge case is rare enough that it shouldn't come up much.

ljharb commented 4 years ago

Is there a way to reuse the results of the parse to minimize the perf hit of that?

giltayar commented 4 years ago

I'm not a fan of parsing CJS modules, because it's a heuristic, and there would be no rule we could write in the documentation that says when it will work and when it won't. Worse: due to a change in the parsing code (which will happen), a module that worked in one version of Node will stop working in another.

Too non-deterministic for my taste.

giltayar commented 4 years ago

but this approach was shot down at TC39 after multiple attempts at consensus

True, and from TC39's perspective, they may be right. But if you're deeming CJS to be "out of TC39 scope" anyway (which I'm OK with!), then you can definitely go the "dynamic modules" route, which has the added benefit of making the execution order intuitive to the developer: I believe execution order should be "serial", not interleaved into the binding phase (which it will be if we execute CJS there), whereby all the CJS in the tree is executed first and then the ESM.

jkrems commented 4 years ago

then you can definitely go the "dynamic modules" route

The problem is that dynamic modules is the route that does need TC39 support, because it requires actual support in the VM (correct me if I'm wrong, Guy). Early execution or something like a top-level parse would both be possible if we say "CJS exists outside of the standardized module system and its execution isn't considered module execution".

Afaik the current state is:

  1. Early Execution: execute the CJS during instantiation so its export names are known. Relatively easy to simulate/recreate/modify in a userland loader. Downside is that CJS in the tree always runs before any ESM executes; ESM can never do bootstrap logic as long as there's any CJS in the tree.
  2. Dynamic Modules: the TC39 dynamic modules proposal; requires actual support in the VM.
  3. (Partial) Parse: statically analyze the CJS source for likely named exports.
  4. Default Only: what we have today; CJS is only exposed through its default export.

My personal take: I think partial parse would be a reasonable compromise and JDD's esm seemed to show that the performance impact was okay, especially since it would only affect the boundaries between ESM and CJS so the majority of modules wouldn't be double-parsed.

The bad news is that interop-require conflated module.exports.default and module.exports in some places. We would have to pick module.exports for backwards compat afaict and that would create a potentially even uncannier valley than what we have today. If we're okay with a breaking change there, the parser could look out for that magic es module property and switch to module.exports.default based on it.
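
Presumably the "magic es module property" is the __esModule flag that transpilers set on their CommonJS output; a rough sketch of that pattern and the interop decision it drives (the exported functions are placeholders):

// transpiled-dep.js: typical Babel/TypeScript CommonJS output
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
exports.default = function main() {};
exports.named = function named() {};

// Interop helpers check the flag: when __esModule is present, `import dep from
// 'transpiled-dep'` is wired to exports.default; when it is absent (plain CJS),
// the default import is wired to the whole module.exports object instead.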

jkrems commented 4 years ago

Relatively easy to simulate/recreate/modify in a userland loader.

To elaborate on this point: All the more advanced solutions have issues once somebody wants to write a userland loader. If I want to serve CJS from HTTP/archive/compiled-on-the-fly, I may have to reproduce the core behavior. Assuming the core behavior is "parse the source and then apply a specific heuristic to determine likely named exports", that quickly becomes a challenge. The same is true for any kind of static analysis tool (compiler, linter, bundler, ...). It's not a blocker but I wanted to call it out.

bmeck commented 4 years ago

We have discussed this many times, at length. I do not see a way forward and do not want to take this trade off given previous discussions.

Downside is that CJS in the tree always runs before any ESM executes. ESM can never do bootstrap logic as long as there's any CJS in the tree.

I think this might be a bit downplayed. You import not just the CJS shallowly but all of its dependencies as well. This means that:

I agree that named imports being able to pull from CJS is valuable, but I do not see the research into the effects this causes that would justify jumping on it quickly. Additionally, this trade off doesn't seem to weigh anything except the ability to have named imports from faux modules (CJS). I would like to understand why none of the above are important compared to that.

It would also be helpful to understand why the consumer needs this while the author is in a situation where updates would be too painful to make. Currently, transpiled CJS can be loaded into our ESM implementation, and you can use the full interface it provides, albeit by manually reproducing what a transpiler would do to interface with it. Things like https://github.com/addaleax/gen-esm-wrapper provide facades at build time. Understanding how the transpiled workflows are unable to maintain build-time tooling for purposes like this would also be helpful.

Right now I do see that people using transpilers are having problems when they try to stop using them, but I don't think they need to stop using them. I do see that other problems with dropping transpilers exist (async loading, mocking, magic variables, etc.) and think the best approach, at least in the short term, is to continue transpiling, and that the trade offs we take should match transpiler behavior more than this would. It would be a breaking change from transpiler behavior as well.

This trade off does not necessarily have a way out in the future since it would be another breaking change to undo this oddity if we introduce it. If we land such a change I would also never want to revert it. We would forever not match historical transpiler semantics nor match ESM semantics of order of evaluation. Currently, the semantics are somewhat easy to understand, even if they do not provide all of the features of faux modules. I value the ability to understand/teach how my program works, and introducing implicit and transitive preemption is far from desirable to me. Additional mitigations for performance like late binding also make this even worse for console based debugging and match transpilers potentially even less.

Providing named imports from CJS does not solve a variety of other issues with ESM adoption, and I am unclear on how a variety of other upcoming and existing problems wouldn't be made even worse by taking this trade off. Things like standardized JSON modules are in talks in both TC39 and WHATWG and are being looked at as only providing a default export. Providing named imports from CJS and not from JSON would just be more confusion rather than having at least a semi-uniform approach for integrating values that are not designed for ESM.

I do not want to re-discuss all of these issues unless we can discuss the trade offs on both directions.

bmeck commented 4 years ago

I forgot that this also may violate things like tick ordering of queued tasks, but I assume that is understood by the nature of this hoisting.

GeoffreyBooth commented 4 years ago

@bmeck Does your take above apply to any of the options from @jkrems’ comment, or the overall idea of executing CommonJS out of order? I'm not clear on which proposal you're pointing out problems with.

Looking at Jan's options, it sounds like the last one (Default Only) is what we have now; Early Execution is probably disqualified for all the reasons you listed in your comment about it cementing CommonJS primacy; Dynamic Modules is a no-go without TC39 buy-in; and that leaves (Partial) Parse. I don't think the parse option has all the timing and other issues you're discussing, does it? Aside from the parse possibly missing some names, and the issue with default, are there any other problems with that option?

bmeck commented 4 years ago

@GeoffreyBooth parsing does not have the timing issues, but it does have heuristic concerns. See the output transpilers produce for things like export *, which does require runtime info. I also had a stripped-down pragma PR ("use exports ...list...") a long while back, but people didn't like the perf problems it had. The complexity of a full JS parse is non-trivial, and we would need to document our exact heuristic so that transpilers could target it and it could continue to be supported.

GeoffreyBooth commented 4 years ago

Others who have thought about this more can propose better solutions, but what I had in mind for a parsing algorithm would be to keep it simple enough that it's straightforward to understand. Something like:

  1. Find all references to a non-declared exports object (i.e. an exports variable that has no var/const/let or function-parameter declaration) or a likewise non-declared module object with an exports property.
  2. For each such reference, parse references to statically defined properties of that object (such as exports.foo or exports['foo'], or module.exports.foo and module.exports['foo'], etc.).
  3. Also parse assignment of an object literal to that reference (module.exports = { foo: 1 }, etc.).

And . . . that's it? If there are any other really common ways of declaring exports, we could extend our logic to include them, but the idea would be that the docs describe this algorithm in plain English and say that CommonJS exports defined such that they're discoverable via this algorithm will be importable into ESM; and others will not be.

So yes, the transpiler output of export * from would fall into the unsupported category; but the transpiler output of export const foo = would be parseable. Ditto export function foo(), export class foo and export { foo }. The main exceptions I can find in some cursory tests are that export * from and export default would be unsupported, and we would call those out in the docs (as well as a traditional CommonJS export named default). A rough sketch of this kind of analysis follows below.
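
A sketch along these lines, assuming acorn and acorn-walk as the parser; the function name getStaticExports is made up for illustration, and it skips the scope/shadowing check from step 1 and many edge cases:

const { parse } = require('acorn');
const walk = require('acorn-walk');

// Collect export names assigned in the statically analyzable forms described
// above: exports.foo = ..., exports['foo'] = ..., module.exports.foo = ...,
// and module.exports = { foo: 1, bar }.
function getStaticExports(source) {
  const names = new Set();
  const ast = parse(source, { ecmaVersion: 'latest' });
  walk.simple(ast, {
    AssignmentExpression(node) {
      const target = node.left;
      if (target.type !== 'MemberExpression') return;
      const obj = target.object;
      // exports.foo = ... / exports['foo'] = ...
      if (obj.type === 'Identifier' && obj.name === 'exports') {
        addKey(names, target.property, target.computed);
      }
      // module.exports.foo = ... / module.exports['foo'] = ...
      if (obj.type === 'MemberExpression' &&
          obj.object.type === 'Identifier' && obj.object.name === 'module' &&
          !obj.computed && obj.property.name === 'exports') {
        addKey(names, target.property, target.computed);
      }
      // module.exports = { foo: 1, bar }
      if (obj.type === 'Identifier' && obj.name === 'module' &&
          !target.computed && target.property.name === 'exports' &&
          node.right.type === 'ObjectExpression') {
        for (const prop of node.right.properties) {
          if (prop.type === 'Property' && !prop.computed) addKey(names, prop.key, false);
        }
      }
    },
  });
  return [...names];
}

function addKey(names, key, computed) {
  if (!computed && key.type === 'Identifier') names.add(key.name);
  else if (key.type === 'Literal' && typeof key.value === 'string') names.add(key.value);
}

For example, getStaticExports("exports.foo = 1; module.exports.bar = 2;") would return ['foo', 'bar'].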

targos commented 4 years ago

Prior art: https://github.com/rollup/plugins/blob/master/packages/commonjs/src/transform.js

That module also has to handle imports, but I think it's an interesting case because users of Rollup already rely on it to support named imports from CommonJS dependencies.

MylesBorins commented 4 years ago

Regarding perf hit of a double parse.

It is my understanding that the fetch / parse / link phase can be done asynchronously, as CJS modules will all be terminal nodes in the graph. While I don't think we are doing this yet, we could move the parsing / validation off the main thread. I also think, but could be wrong, that there might be wiggle room in the spec to begin the execution phase before the validation is finished here... this might be a bit more controversial though. I'm also curious if we could potentially code-cache the result of the parsing to reuse later... if V8 does not yet have an API for something like that, it might be something we could ask about.

So, with that said, I think there is likely room to explore the double parse option and ways to architect it so that the performance hit is not significant. Is there a reason it is a non-starter from a spec perspective?
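
On the code-cache question: Node does already expose V8's compile cache for scripts through the vm module, which is adjacent to (though not the same as) caching a parse result; a small sketch:

const vm = require('vm');

const source = 'module.exports.answer = 42;';
const script = new vm.Script(source);
const cache = script.createCachedData();  // Buffer holding V8's code cache

// Later, with the same Node/V8 build and identical source, the cache lets V8
// skip a full recompile.
const reused = new vm.Script(source, { cachedData: cache });
console.log(reused.cachedDataRejected);   // false if V8 accepted the cache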

bmeck commented 4 years ago

I do not have any significant objections to parsing but would like to see more numbers around it. To my knowledge, parsing to form the export list does not violate any spec expectations (indeed this is what ESM does, and across modules no less!), but doing evaluation would be a change. If parsing is done up front, I am unsure if we would need the execution to actually be hoisted. For a variety of runtime-dependent stuff like the Babel output I linked, if it is set up to be cross-module you might be able to fully form the heuristic graph even for export * from.

guybedford commented 4 years ago

I have previously always been against parsing for the extra overhead but can possibly get behind it if we do it lazily only on the ESM/CJS boundary.

I think it would be important that the parsing is done lazily, because if we apply it to all loaded CJS modules, even Node.js users who don't use ES modules will pay this cost. That is, we should remove the eager injection into the ESM registry and do lazy snapshotting instead.

@bmeck I know you have had a problem with the idea of lazy snapshotting before, would you be able to compromise on that?

bmeck commented 4 years ago

If evaluation order is preserved I can live with it. Might still be surprising but our current CJS cache in the translator also has a similar gotcha if you try hard enough.

guybedford commented 4 years ago

Great, that sounds to me like it could well be a possibility then, with analysis as @GeoffreyBooth describes in https://github.com/nodejs/modules/issues/509#issuecomment-623626012.

GeoffreyBooth commented 4 years ago

I know we use Acorn for parsing here and there in our internals; is it possible to use V8 itself to parse JavaScript source code and return an abstract syntax tree back to Node?

On top of that, could we then submit that AST back into V8 when it’s time to evaluate that code? Rather than asking V8 to parse the source again.

If both of these are possible, we could avoid a double parse. Even if only the first is possible, it would be preferable to using Acorn both for speed and to ensure that the parsed AST we work with matches what V8 evaluates.

bmeck commented 4 years ago

V8 does not expose any AST-like structure after it parses. The AST is internal to V8 and not a public API.

MylesBorins commented 4 years ago

/cc @nodejs/v8 who might have opinions

devsnek commented 4 years ago

I'm against the cjs parsing idea. I know of a lot of cjs code that dynamically builds exports which this would fail on. Even if we do go forward with it, we will need to make sure we parse using the exact behaviour of the parser in the engine we use, which acorn doesn't fulfill. Since people also embed node within other engines (for example graaljs), I don't think we can ever reach that.

MylesBorins commented 4 years ago

Could this be a "perfect is the enemy of good" situation?

If we can statically analyze some exports with minimal performance overhead, is there a reason not to do it?

devsnek commented 4 years ago

@MylesBorins since we know the pattern exists, we are creating a breaking change in cjs where there previously was none (dynamic assignment vs static assignment), so I would not go so far as to say this is "good"

jkrems commented 4 years ago

I wouldn’t call it “breaking” when those values are still reachable from the default export. And I would assume that the entire module.exports object would be on the default export in those cases.

weswigham commented 4 years ago

since we know the pattern exists, we are creating a breaking change in cjs where there previously was none (dynamic assignment vs static assignment), so I would not go so far as to say this is "good"

Is it worth mentioning that JS autocomplete/autoimport in vscode, VS, webstorm, and a bunch of other editors, is essentially predicated on the idea that "most JS cjs module exports are statically analyzable"? And that all of those go out the window when you use more dynamic export patterns, too? Don't get me wrong, I'd still rather see the dynamic modules solution happen via some means, but I wouldn't say cjs static analysis is a wholly untested concept.

devsnek commented 4 years ago

@jkrems if you restructure your cjs module to use dynamically created properties and someone using esm depends on named exports it is by definition a breaking change. the graph will not validate. the application will exit with an error.

weswigham commented 4 years ago

Like, there's nobody advocating for people to use more dynamic cjs export names, I hope, since they cause a ton of tooling headaches elsewhere in the ecosystem, like in tree shaking or editor intellisense... Like, you can dynamically build your exports, but it's really edge-case packages that actually do, in practice, because of all the other ecosystem bits it already impedes.

devsnek commented 4 years ago

@weswigham

I don't think node core should do things that only work most of the time. That's somewhat of a philosophical argument about correctness though, so I'm not sure if that's the thing we should discuss.

Regardless of what is being advocated, people do all sorts of odd things with cjs. If we enable this behaviour, we create a refactoring hazard around otherwise invisible changes within a module, which I would classify as node being broken.

GeoffreyBooth commented 4 years ago

I don't think node core should do things that only work most of the time.

This is not one of those cases. It will work 100% of the time when the author follows the rules we specify, e.g. that the author writes CommonJS that is easily statically analyzable.

devsnek commented 4 years ago

@GeoffreyBooth "it will work 100% of the time ignoring the times when it doesn't work" i can't tell if this is satire or not.

aside from that, if you're updating your package you can just add esm wrappers. The entire point of worrying about cjs is that there are oodles of packages that won't be updated to support esm.

giltayar commented 4 years ago

My concern with parsing is not that it won't work 100% of the time, but rather that any change in the algorithm (and there are always changes/bugs in heuristic algorithms) will break previously running code, because a named export that was there will suddenly not be. That sounds dangerous to me and will lead to fear of changing the heuristics, even to fix a bug.

Yes, VSCode and others do that, but I've seen time and time again when working with them that modules of mine that weren't analyzed correctly suddenly are, and (a bit less, I have to admit) vice versa. But what is OK in an IDE, I believe, is not OK in a runtime.

Dynamic Modules is a no-go without TC39 buy-in

Are we sure about that? Have we asked the v8 team?

ryzokuken commented 4 years ago

But what is OK in an IDE, I believe is not OK in a runtime.

I would second that.

If there's a perfectly valid, syntactically correct way of writing a CJS module, I don't see why someone somewhere won't be using it, and adding behavior that only works in some cases but not the others sounds like a dangerous way to make things confusing.

devsnek commented 4 years ago

@giltayar

Have we asked the v8 team?

It would require a willful violation of the spec which is something we try to avoid.

giltayar commented 4 years ago

It would require a willful violation of the spec which is something we try to avoid.

Too bad. I hope the end result won't be that Node.js ends up executing CJS out of order, which is even worse. But I understand the hesitation.

GeoffreyBooth commented 4 years ago

"it will work 100% of the time ignoring the times when it doesn't work" i can't tell if this is satire or not.

I'm serious. If you set the bar so high that it's impossible to meet, by definition that's making the perfect the enemy of the good. If your bar is 'named exports from CommonJS, no matter what, even if they're dynamically set' and you refuse to accept anything less, then you'll get nothing.

Whereas 'named exports from CommonJS when they're defined in a way that's easily statically analyzable' is a standard that we can meet, without a too-complicated heuristic. We don't need code as complex as Rollup's, since we're not trying to be as comprehensive.

We can also test our algorithm. We can run our 'get statically-defined CommonJS named exports' function against a whole bunch of npm packages and save the names returned; and then actually require those packages and inspect the exports and compare them against our earlier results. Yes, I would expect that some of them will have exports named default or dynamically-named exports or exports that appear only because they've overridden require or some other ridiculous thing; but we can in our test determine the real-world percentage of such things for the top 100 or 1000 or however many npm packages. If that percentage of correctly parsed named exports is very high, say 99%, then we have to weigh the value of denying named exports under any circumstances against the value of providing them under defined circumstances. But first someone needs to write a PR that makes the attempt.
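
A rough sketch of that comparison test, assuming the hypothetical getStaticExports() from the earlier sketch has been saved to ./get-static-exports.js and that the sampled packages are installed locally:

const { readFileSync } = require('fs');
const getStaticExports = require('./get-static-exports.js');

const samplePackages = ['express', 'semver', 'debug'];  // hypothetical sample
let consistent = 0;
for (const name of samplePackages) {
  const entryPath = require.resolve(name);
  const staticNames = getStaticExports(readFileSync(entryPath, 'utf8'));
  const runtimeNames = Object.keys(require(name));
  // The heuristic may miss names, but every name it reports should really
  // exist on the runtime exports object.
  if (staticNames.every((n) => runtimeNames.includes(n))) consistent += 1;
  else console.log(name, { staticNames, runtimeNames });
}
console.log(`${consistent}/${samplePackages.length} packages analyzed consistently`);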

My concern with parsing is not that it won't 100% work, but rather that any change in the algorithm (and there are always changes/bugs in heuristic algorithms) will break previously running code because a named export that was there will suddenly not be.

To properly evaluate this concern we need to actually write the code. If we can write a function that's simple and looks like it won't change once it's unflagged (and I agree, that's probably going to be the case) and we can document its algorithm along the lines of https://nodejs.org/api/esm.html#esm_resolution_algorithm, then I think it's shippable.

devsnek commented 4 years ago

@GeoffreyBooth as I've said, that is indeed my bar, and I would rather not ship something that uses guesswork in node core.

devsnek commented 4 years ago

and repeating myself again, if we have to document how to write a cjs package we've probably failed, because people writing packages can/should include esm wrappers.

bmeck commented 4 years ago

@devsnek it is not guesswork, but it is not 100% coverage. Assigning to a property of a newly allocated exports object is a fairly clear sign that you want to add something to it. I'm not sure what guesswork is being done in the algorithm mentioned above; it lists a heuristic that does not seek to change over time, nor is it affected by things like locality or order of operations. It might be helpful to figure out what is guesswork so that we could remove any guessing. I think there is a firm grasp that such a heuristic might not cover all cases in which code may be written. There are similar issues with things like the name inference of anonymous functions, where the specification itself does not adopt names in all source positions, such as when a function is provided as an argument to a function call. Can you explain what the necessary qualification of things to be considered not to be guesswork?

Perhaps it would also be helpful to separate the ability to cover 100% of CJS runtime effects from static analysis that must be done ahead of time, since the proposed algorithm sits purely in the static-analysis timeline. If the stated constraint is that we must not use static analysis, it would be helpful to state that directly. However, given that static analysis can alleviate some use cases, it would also be enlightening to know what harm using it might cause for things like forwards compatibility. As @GeoffreyBooth states above, if the algorithm is rigidly documented, any failure to get automatically generated bindings is an exercise in reading the documentation to understand why something failed, not an attempt to infer guesswork.

bmeck commented 4 years ago

As a slight aside, I think we should avoid calling things/people the "enemy" or implying that something is inferior by nature (good vs perfect). I'm not clear on how those statements actually make any claims about the arguments at hand, and it is a bit confusing to read. If such a catch phrase is used, can we explain more clearly why one trade off is better, without implying that other trade offs are in some way evil?

devsnek commented 4 years ago

@bmeck it actually matches the definition of guess:

estimate or suppose (something) without sufficient information to be sure of being correct.

The suggestion is that we guess the properties of module.exports, using some sort of syntactic analysis.

Can you explain what the necessary qualification of things to be considered not to be guesswork?

For any CJS module, getExportsStatically(module) is equal to Object.keys(require(module))

bmeck commented 4 years ago

@devsnek

The suggestion is that we guess the properties of module.exports, using some sort of syntactic analysis.

It does not claim to do so, nor does it claim to do so completely; it only claims to create bindings if assignments in the special forms listed are found. In particular it was stated:

parse the references of statically-defined properties of that object

I assume you are claiming this to be guesswork based upon "without sufficient information to be sure of being correct"; however, if we are only concerned with static analysis, it seems we do not fulfill your definition of guesswork. Can you clarify why only obtaining the static values might not be considered correct/still be considered guesswork if our stated goal is to only add support for those special forms?


For any CJS module, getExportsStatically(module) is equal to Object.keys(require(module))

So to be absolutely clear: since require(module) generates some properties at runtime that cannot be statically analyzed, my assumption is that you are against any static-analysis-based solution. Can you clarify in which cases generating from static information would be a hindrance rather than an additive feature? If the claim is that it is better to only have purely guaranteed behavior matching runtime, because adding these bindings has some ill effect, that would make your concerns easier to understand.

devsnek commented 4 years ago

@bmeck

Can you clarify why only obtaining the static values might not be considered correct/still be considered guesswork if our stated goal is to only add support for those special forms?

Because I can still import the other forms, even if the modules group tries to pretend they don't exist.

Can you clarify in which cases generating from static information would be a hindrance rather than an additive feature?

As I've said a few times now, it creates breaking changes to ESM consumers from otherwise invisible changes to CJS module source. People consuming modules have to either read the source or try to use a named import and see if it works to use this feature.

bmeck commented 4 years ago

Because I can still import the other forms, even if the modules group tries to pretend they don't exist.

Can you clarify this? I'm unclear on how hoisting the special forms, so that there are two ways to access them, makes the non-special form still being available a problem. Are you stating that if we did not allow access to non-statically-analyzable members of CJS modules it would not be a problem, since we would be limited to only one form of access and would not have a super/subset problem between the special form and the normal form? We could take an alternate stance that, since we cannot identify runtime-based values, we ban anything non-static. This is actually an interesting potential alternative trade-off we could pursue, since export * and dynamic assignments to exports are semi-rare. Would you have the same objections if we limited CJS exports to purely static forms so that we do not have a mismatch?

As I've said a few times now, it creates breaking changes to ESM consumers from otherwise invisible changes to CJS module source.

Yes, but I am unclear on how it creates the breaking change; can you elaborate? Right now it seems that this would not be a breaking change in any way except adding bindings that did not previously exist.

People consuming modules have to either read the source or try to use a named import and see if it works to use this feature.

I do not believe so; they could use the REPL to check by importing a namespace and using reflection, they could always use the normal form, or they could use the docs. More pointedly, the same problem is true for all exported ESM bindings, and for CJS itself to some extent: coders need to know how they can use their dependencies by some means of understanding the module being loaded, in both CJS and ESM, and this is not changed by any algorithm. Do you have an example of where you need to know that this is available as an export in a way that is a problem for this algorithm, but where you could not reproduce a similar issue if you never knew the format of the dependency?

ljharb commented 4 years ago

It's worth noting that since module.exports can have properties that are not IdentifierNames, as well as ones that are Symbols, import * as ns from 'cjs'; Reflect.ownKeys(ns) can never match Reflect.ownKeys(module.exports), so I'm not sure that's actually a meaningful bar to set.

If you narrow it down to only worrying about own string data properties of module.exports that are IdentifierNames, then the only possible discrepancy left would be dynamically added IdentifierName properties, something which absolutely could happen, but in practice is likely to be somewhere between "exceedingly rare" and "never".
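
A small illustration of that mismatch, using a hypothetical module:

// mod.cjs
module.exports = {
  validName: 1,                    // an IdentifierName: could surface as a named export
  'not-an-identifier': 2,          // string key that is not an IdentifierName
  [Symbol.for('meta')]: 3,         // Symbol key: never visible as a namespace property
};

// import * as ns from './mod.cjs';
// Reflect.ownKeys(ns) could at most expose 'validName' (plus 'default' and
// Symbol.toStringTag), so it can never equal Reflect.ownKeys(module.exports).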

devsnek commented 4 years ago

@bmeck

Can you clarify this? I'm unclear on how hoisting the special forms having 2 ways to access them makes the non-special form still being available a problem.

Because when someone requires/imports a module they aren't thinking "ok now i'm going to import a module that is written in commonjs where the commonjs source uses static assignments to module.exports". They're thinking "ok now i'm going to import express".

I am unclear on how it creates the breaking change; can you elaborate?

// module 1.0.0: exports are assigned statically, one property at a time
module.exports = {
  a: require('./x/a'),
  b: require('./x/b'),
  // etc.
};

// module 1.0.1: same public API, but the exports are now built dynamically
const { readdirSync } = require('fs');
for (const name of readdirSync('./x')) {
  module.exports[name] = require(`./x/${name}`);
}

Do you have an example of where you need to know if this is available as an export in a way that is true for this algorithm but are unable to reproduce a similar issue if you never knew the format of the dependency?

If we assume there is documentation, because without documentation you have to guess no matter what you are importing: The documentation for the above module says module.a and module.b and etc are available for consumption. The documentation for available apis doesn't change during the version bump, because the available apis don't change. Yet in version 1.0.0, they are named exports and in 1.0.1 they are not. If that contradiction is not enough, you can compare it to an ES module's documentation (regardless of whether the module is wasm or source text or whatever), which would obviously say what the exports are.

devsnek commented 4 years ago

@ljharb that problem is bigger than cjs (wasm exports, for example) so i'm not sure its worth worrying about that much in this context.

bmeck commented 4 years ago

@devsnek

Because when someone requires/imports a module they aren't thinking "ok now i'm going to import a module that is written in commonjs where the commonjs source uses static assignments to module.exports". They're thinking "ok now i'm going to import express".

If they are importing an ESM module, don't they have the same issue of needing to know what bindings are exported? I'm confused about why needing to know the exported bindings of a CJS module is problematic when ESM requires the same knowledge.

for (const name of readdirSync('./x')) {
 module.exports[name] = require(`./x/${name}`);
}

Thanks for the example. So your concern is that people may accidentally create breakage by moving to more dynamic forms of code in some non-major version. We have seen some issues like this with how package.json#exports can often create breaking changes if the understanding of how the feature works is poor. Perhaps, since this was acceptable for package.json#exports, we could try to better understand what the difference there was and whether we can make a similar trade-off here, if that is amenable.

If we assume there is documentation, because without documentation you have to guess no matter what you are importing: The documentation for the above module says module.a and module.b and etc are available for consumption. The documentation for available apis doesn't change during the version bump, because the available apis don't change. Yet in version 1.0.0, they are named exports and in 1.0.1 they are not.

I don't understand the point about it being a contradiction. If Node asserts a clear definition about when something will be flagged as an exported binding in CJS and code goes against it, there is nothing we can do. Creating exports using a static form that flags binding names is quite similar to how the export statement also flags binding names. It might be good to have a lint rule for this, and for module authors to test their changes, but it doesn't seem like something that is unable to be understood, tested, or even handled through feature detection.

If that contradiction is not enough, you can compare it to an ES module's documentation (regardless of if the module is wasm or source text or whatever), which would obviously say what the exports are.

I don't understand this comparison; this algorithm defines a special form of static syntax, and it defines what the named exports for ESM are by using that form. As @weswigham points out, there is precedent and tooling relying on similar behavior. For comparison, WASM and Source Text Module Records also define their exports for ESM using static syntax. How is using a static syntax in CJS different from the fact that ESM also uses static syntax?

devsnek commented 4 years ago

How is using a static syntax in CJS different from the fact that ESM also uses static syntax?

if Node asserts a clear definition about when something will be flagged as an exported binding in CJS and code goes against it there is nothing we can do.

Isn't the entire point of this issue to figure out what to do about all the cjs that won't be updated, or will be updated but won't be transitioned to support esm? If someone is updating their package and wants to support esm (reads our documentation, for example) we should tell them to use an esm wrapper file. @addaleax even created a tool to do it for you: https://github.com/addaleax/gen-esm-wrapper. We shouldn't tell them to write their cjs code in a way such that node can tease details from the ast.