Divergent specifier hazard

nodejs / modules

Node.js Modules Team

MIT License

413 stars 42 forks source link

Divergent specifier hazard #371

Closed GeoffreyBooth closed 4 years ago

GeoffreyBooth commented 5 years ago

@jkrems and I created a repo to demonstrate the hazard mentioned by @MylesBorins in https://github.com/nodejs/modules/issues/273#issuecomment-492355098 and https://github.com/nodejs/modules/issues/273#issuecomment-492408041 where a single specifier resolves to separate things in ESM and CommonJS environments within the same runtime. This hazard is currently blocking dual packages, and logically would also block --es-module-specifier-resolution=node from becoming the default behavior.

Hopefully the repo illustrates the issue clearly enough that everyone can get a solid grasp on it. It seems to me that we have three options:

Accept that the hazard is a serious concern and that it therefore rules out supporting dual packages and automatic extension resolution. The current modules implementation is what ships with regard to those two features, and presumably we would remove the --es-module-specifier-resolution flag (since arguably the flag’s existence invites users to stumble into the hazard).
Acknowledge the hazard but decide that user education can overcome it. @MylesBorins and others concerned about the hazard would need to be persuaded as to why it isn’t a blocker, especially considering that it’s already come up in the real world. If the persuasion effort succeeds, we would then need to decide how far to lean into enabling features that invite the hazard (such as automatic extension resolution and dual packages).
Find a technical solution to prevent the hazard, such as making require of ESM work in a way that everyone is comfortable with (cc @weswigham). As with the previous option, we would then need to decide whether or not to support dual packages and/or automatic extension resolution.

Deciding on any of these options would lead to a resolution of the dual packages item in Phase 3 of our roadmap.

MylesBorins commented 5 years ago

TSC meeting did not result in any consensus. Folks did not have enough context to make informed decisions. I'm going to send an email to the TSC that give more context around this and we'll invite TSC members who are interested in this to attend the Modules Team. The general sentiment from the TSC was that we should hold off on unflagging for now while individuals get ramped up.

GeoffreyBooth commented 5 years ago

Verifying that pkg-plugin1 and pkg-plugin2 both depend on the same semver-major of pkg should be the end of my check for compatibility but in this case these two packages could be using different singletons. This returns me to the point that exporting CJS and ESM under separate specifiers will do almost nothing to avoid the singleton hazard.

Permitting the hazard and allowing both require('pkg') and import pkg from 'pkg' would still produce two singletons. That wouldn’t be the case if you transpiled via Babel or esm, because that would convert the ESM code into require('pkg') and therefore the CommonJS singleton would be used everywhere; but that doesn’t happen when ESM is being used natively.

See https://github.com/jkrems/singleton-issue/tree/master/dual-esm-commonjs-package. That example is running under --es-module-specifier-resolution=node, where require('pkg')/import 'pkg' is permissible and therefore the hazard is present. That example throws an error because the CommonJS and ESM singletons imported are different; one doesn’t instanceof the other. Changing 'pkg' to 'pkg/module' doesn’t change that; the CommonJS and ESM singletons are still (and will always be) different. The only difference with 'pkg/module' is that this behavior is expected, and also occurs in transpiled all-CommonJS environments.

It sounds like what you really want is require('pkg') and import 'pkg' to not just be permitted, but to also return the same singleton. That’s only possible when 'pkg' references the same files in each module system. Currently CommonJS can be imported in either, so at the moment it’s only possible when 'pkg' is all CommonJS. If Node someday supports require of ESM, then you could write 'pkg' as all ESM and it would be importable into either module system; and a transpiled CommonJS version of 'pkg' would be used in older Node environments. Separate singletons in separate Node versions isn’t a hazard.

ljharb commented 5 years ago

Not all commonJS - just the single file that contains the singleton. The rest can be dual, and can import/require the shared singleton from the same place. Using "exports", that internal file can even be kept internal.

coreyfarrell commented 5 years ago

It sounds like what you really want is require('pkg') and import 'pkg' to not just be permitted, but to also return the same singleton.

I want the documentation to specify conditions for which the hazard exists, leave it to the module maintainers to not publish modules with bugs. You've given examples of major projects which published broken versions but this happened under a lack of proper documentation about the hazard. I trust properly informed module maintainers to generally avoid creating broken releases. Being a known hazard means that it won't be as difficult to identify in the future.

Some solutions for maintainers that I can think of:

Do not use singletons or instanceof (not always possible but worth suggesting first)
Publish pure CJS or pure ESM (no dual mode)
Publish CJS code with an ESM wrapper (to control named vs default export for ESM)
Tag classes with a hidden Symbol.for('module-v1-specific-id') and check for it instead of using instanceof
Use globalThis[Symbol.for('module-v1-specific-id')] for storage of shared singleton data

If Node someday supports require of ESM, then you could write 'pkg' as all ESM and it would be importable into either module system

If this were possible it could be interesting. I'm unclear if/how it's possible, specifically my concern is that loading ESM is async and the require function is sync. How does require deal with async loader hooks and in the future loading modules with top-level await?

GeoffreyBooth commented 5 years ago

If this were possible it could be interesting. I’m unclear if/how it’s possible, specifically my concern is that loading ESM is async and the require function is sync. How does require deal with async loader hooks and in the future loading modules with top-level await?

Well that’s why some people think it can’t be done 😄 This thread is probably the place where it’s been discussed the most; see https://github.com/nodejs/modules/issues/371#issuecomment-526740456 or farther up.

GeoffreyBooth commented 5 years ago

I want the documentation to specify conditions for which the hazard exists, leave it to the module maintainers to not publish modules with bugs.

Certainly if we permit the hazard, the docs will try to guide people as clearly as possible to avoiding it. The issue though is that it’s not just an author thing; consumers also need to be aware of it. Consider the graphql case. In that one, graphql was a dual package and there was this other package graphql-yoga that imported one of the graphql singletons (I think the CommonJS version) and extended it. So if you’re a developer using graphql, and you import it, you’re getting the ESM version and that’s the only copy in your module graph; but then you add graphql-yoga to your project and now both the ESM and CommonJS versions of graphql are floating around in your module graph. The graphql-yoga plugin attaches itself to the graphql CommonJS singleton, so if you want to use graphql-yoga and your code is ESM, you need to know to require('graphql') rather than import it in order to get the CommonJS singleton that graphql-yoga is attached to. How would we explain this to users?

ljharb commented 5 years ago

The same way as one explains peer dep issues. Anything that depends on identity - === comparisons, collection keys, instanceof, or, as you point out, specific mutations - must not be duplicated.

This is solved with separate packages by ensuring that the singleton-containing-package is always a peer dep of everything that depends on it - this is how Babel, React, eslint, etc, all solve this issue already, and it's widely understood.

The new hazard is the same, just within a package - it's that if you have anything that depends on identity, it must not be duplicated (as in, not duplicated between a CJS and an ESM file). This piece would have to remain CJS to be usable in both module systems; or it could be ESM to be usable in only ESM.

In the graphql case, graphql should be a peer dep of everything in its ecosystem already, since it's the core of its ecosystem, and the onus remains on the graphql developers to ensure that there's only one file that conceptually represents a given stateful singleton (altho that doesn't even apply here).

Their bug is the identical bug as what would have happened if graphql was duplicated in someone's dep graph, it's just in a new place. By duplicating their files (presumably automated, with a build process), the developers opted in to having to know this hazard.

Also, given that import() works in CJS, this same hazard would happen even without dual mode modules unless graphql-yoga explicitly mutated both the CJS and the ESM versions - something they could have done here, too, but failed to do.

weswigham commented 5 years ago

Also, given that import() works in CJS, this same hazard would happen even without dual mode modules unless graphql-yoga explicitly mutated both the CJS and the ESM versions - something they could have done here, too, but failed to do.

How so? If require('graphql') throws because it's an esm entrypoint, import('graphql') produces the only possible copy.

jkrems commented 5 years ago

How so?

I think we should careful separate three scenarios:

Every published package is exactly CJS or ESM. Nobody ships any version that contains both. Pre-ESM and ESM are considered 100% incompatible in practice.
People ship packages that contain both CJS and ESM. Consumers use one specifier, package authors need to make sure there's no issues from that (e.g. by using shared files between implementations as necessary).
People ship packages that contain both CJS and ESM. Consumers use different specifiers to access each implementation (e.g. graphql/esm) and are responsible for dealing with any potential hazards. Package authors aren't expected to handle apps using both implementations at once.

Your comment seems to assume we're talking about (1) or it doesn't address the issue with (3): It requires perfect coordination between all packages in the dependency tree. For example:

graphql ships both CJS and ESM. ESM as exposed as graphql/esm and contains 2nd instances of all types (not safe to combine with each other).
graphql-yoga ships both CJS and ESM, the latter as graphql-yoga/esm. It uses existing transpilers, so both implementations require/import the CJS version of graphql. It may even have been published before graphql supported ESM. It has passing tests and 100% test coverage. :tada:
my-graphql-schema ships only CJS.
some-3rd-party-schema ships only ESM. It's super new and used graphql/esm.

In a world where an app owner should now figure out what to do, this isn't great. Should they use graphql/esm since all their code is written as modules? Well, no. That would break both graphql-yoga and my-graphql-schema. So... just graphql then. But then they can't use some-3rd-party-schema.

The situation would be a lot easier if the onus would be on graphql to provide a safe way to handle the hazard (scenario 2, e.g. via require guard in exports). It could share class identities (likely in individual CJS files) but have ESM wiring to provide a cleaner interface and named exports.

ljharb commented 5 years ago

@weswigham if the entry points are different, i mean, but there’s still a conceptual duplication. I’m saying that the bug is only avoidable by eschewing back compat, or properly understanding how to avoid it - in the former case, it’s hostile to many users; in the latter, there’s no hazard regardless.

coreyfarrell commented 5 years ago

The issue though is that it’s not just an author thing; consumers also need to be aware of it.

@GeoffreyBooth this is a point of disagreement we have. IMV the maintainers of each module are responsible for not producing releases which cause the singleton issue. I mentioned a few suggestions above - https://github.com/nodejs/modules/issues/371#issuecomment-542829995. I'm sure other ways to mitigate exist. This is something which consumers generally should not need to be concerned about.

The only thing end-users should need to be concerned about is how to write an import vs require. Take const _ = require('lodash'); as an example. If an ESM wrapper is provided does it still support import _ from 'lodash' or do you instead need import * as _ from 'lodash'? You can use const {uniq} = require('lodash'); but without an ESM wrapper import {uniq} from 'lodash' fails. What exports are provided would need to be documented by the module author so end-users could know what to expect.

GeoffreyBooth commented 5 years ago

consumers also need to be aware of it.

@GeoffreyBooth this is a point of disagreement we have.

I didn’t mean that consumers should need to be aware of it, just that I don’t see how they wouldn’t need to be. Just look at @jkrems’ examples and how many of them involve the consumer needing to be aware of the module format of the various modules they’re installing. Now some of that awareness will remain even in a /module approach, but the knowledge needs to be much deeper (and the debugging harder) for the “allow the hazard” approach. Your suggestions are good ones, and if we do allow the hazard I would recommend a whole section on the docs explaining the hazard and offering guidance on how to avoid it, including those approaches.

The only thing end-users should need to be concerned about is how to write an import vs require.

Agreed, but unfortunately I don’t see how this is possible in any form of dual packages. See the various examples above.

If an ESM wrapper is provided does it still support import _ from 'lodash'

You can always import the default export, regardless of wrapper. There’s no situation where a named export { uniq } from would be supported but the default _ from wouldn’t. The wrapper would also define the exported names ('uniq', etc.) so that { uniq } from would also work, which we don’t have for CommonJS packages loaded via import today. It almost goes without saying (but the docs should probably mention it) that the author of the package should create an ESM wrapper that exports the same names into ESM as are available in CommonJS.

jkrems commented 5 years ago

There seemed to be some confusion about my point above: I believe that we currently have a dual copy hazard within the pkg/pkg/esm pattern. Once an ecosystem of packages is involved, the solution requires that every package in the whole ecosystem supports the pattern and uses it consistently. I don't think this is a realistic outcome. To me it feels much more likely that the package owner of the identity-aware package could handle the issue locally than to depend on global cooperation to solve it.

GeoffreyBooth commented 4 years ago

Closing per https://github.com/nodejs/modules/issues/408#issuecomment-548210350.