microsoft / TypeScript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
https://www.typescriptlang.org
Apache License 2.0

Rethinking `module` for the present and the future #55221

Open andrewbranch opened 1 year ago

andrewbranch commented 1 year ago

What does --module actually mean?

Which of these is a better definition for the flag as it exists today? Which is a better fit for the future?

  1. The output module syntax you want to be emitted
  2. A declarative description of the module system that will process your emitted code at bundle-time or runtime

When the possible values of module were limited to amd, umd, commonjs, system, and es2015, the former definition was perfectly fine. When es2020 and es2022 were added, which added syntax features like import.meta and top-level await that couldn’t be transformed into other module emit targets besides system, it started to feel like module described not just an output format, but the intrinsic capabilities of some external system. With node16 and nodenext, the scope of the module flag suddenly expanded to include a new module format detection algorithm used by the target module system and special interop rules between module formats, while it stopped directly controlling the output format, since the format of every output file would be fully determined by Node.js’s format detection algorithm.

The latter interpretation of module, one that fully describes the target module system, works well for node16/nodenext, but trying to project that definition onto the other, older module values makes them feel kind of incoherent.

All the values except node16/nodenext are kind of weird

Some of the important characteristics of the module system described by --module nodenext are:

If we try to infer from existing behavior what the other module values say about these characteristics, the result is confusing. For example, you might expect that --module esnext means an ESM-only module system that must reject CommonJS/AMD/System modules; after all, you’re not allowed to write import foo = require("./mod") in that mode. But you are allowed to import a dependency that declares CommonJS constructs like that.

None of these module modes have any restriction on the kinds of modules that can be imported, nor do they particularly make any effort to detect what kind of module a dependency is. Essentially, type checking between modules proceeds as if everything is CommonJS, even when we’re explicitly emitting esnext. This can be observed directly by writing a default import of a .d.ts file that only declares named exports:

// @module: esnext
// @esModuleInterop: true

// @Filename: /esm.d.ts
export const x: string;

// @Filename: /main.ts
import esm from "./esm";
esm.x; // string, no error, what??

This behavior is enabled by esModuleInterop/allowSyntheticDefaultImports, but those settings should only affect how the exports of CommonJS modules appear (and arguably only to imports written in other CommonJS modules, since esModuleInterop is an emit setting that only emits code into CommonJS outputs). There’s no attempt to distinguish between what happens when two ES modules interact, two CJS modules interact, or an ES module imports a CJS module. This is perhaps, historically, because we had no idea what the actual module format of the JS file described by the declaration file is. (It would have been really nice for declaration emit to have always encoded the output module format, but here we are.)

Even if we had perfect information about the module format of every file, the distinction between I want to emit ESM and My module system can only handle ESM is potentially useful, and these old module modes can only describe the former. Essentially, they all describe the same hypothetical module system, where any module format can be loaded interchangeably.

Supporting bundlers

Webpack and esbuild vary their handling of ESM→CJS imports based on whether the importing file would be recognized as ESM according to Node.js’s module format detection algorithm. According to the node16/nodenext prior art, the module flag is the trigger that should enable this behavior.[^1] Unlike in Node.js, files in these bundlers’ module systems are not always unambiguously ESM or CJS. When a file has a .ts/.js extension, and the ancestor package.json doesn’t have a "type" field at all, they’re not treated as CJS; they just don’t get the aforementioned special Node.js-compatible import behavior.
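To make the difference between the two detection schemes concrete, here is a minimal sketch. Every name in it (DetectedFormat, PackageJsonLike, detectBundlerFormat, detectNodeFormat) is hypothetical and is not part of TypeScript, Node.js, or any bundler. It assumes the nearest package.json has already been found and parsed, and that a bundler treats "type": "commonjs" as CJS, which the text above implies but does not state outright.

```typescript
// Hypothetical names throughout; this is not TypeScript's or any bundler's API.
type DetectedFormat = "esm" | "cjs" | "ambiguous";

interface PackageJsonLike {
  type?: "module" | "commonjs";
}

// Classify a file the way the paragraph above describes bundlers doing it:
// explicit extensions win, then the nearest package.json "type" field,
// and a missing "type" leaves the file ambiguous rather than CJS.
function detectBundlerFormat(
  fileName: string,
  nearestPackageJson?: PackageJsonLike
): DetectedFormat {
  if (/\.(mjs|mts)$/.test(fileName)) return "esm";
  if (/\.(cjs|cts)$/.test(fileName)) return "cjs";
  if (nearestPackageJson?.type === "module") return "esm";
  if (nearestPackageJson?.type === "commonjs") return "cjs"; // assumption, see lead-in
  return "ambiguous";
}

// Node.js-style detection has no ambiguous case in this sketch:
// a file with no applicable "type" field is treated as CommonJS.
function detectNodeFormat(
  fileName: string,
  nearestPackageJson?: PackageJsonLike
): "esm" | "cjs" {
  const format = detectBundlerFormat(fileName, nearestPackageJson);
  return format === "ambiguous" ? "cjs" : format;
}
```

Under this sketch, only files classified as "esm" would get the Node.js-compatible ESM→CJS import behavior in a bundler; "ambiguous" files would keep the looser interop.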

Other bundlers don’t implement this Node.js compatibility behavior (at least by default). They’re already fairly well served by --module esnext, with the exception of the bug described in the previous section (#54752). It seems like we could improve on all the older module modes by including file extension and package.json "type" fields as a heuristic for when a default export should be synthesized, and to avoid emitting syntax into .mjs or .cjs files that would be invalid in Node.js. (#50647, #54573)
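One possible shape for that heuristic, as a rough sketch only: the names below (DeclaredFormat, PackageJsonLike, declaredFormat, maySynthesizeDefault) are hypothetical and do not exist in the TypeScript compiler, and the rule "never synthesize a default for a file known to be ESM" is an assumption about what the fix would look like, not a description of the actual implementation.

```typescript
// Hypothetical sketch; none of these names exist in the TypeScript compiler.
type DeclaredFormat = "esm" | "cjs" | "unknown";

interface PackageJsonLike {
  type?: "module" | "commonjs";
}

// Guess the format of the JS file a declaration file describes,
// using only its extension and the nearest package.json "type" field.
function declaredFormat(fileName: string, pkg?: PackageJsonLike): DeclaredFormat {
  if (/\.(mjs|mts)$/.test(fileName)) return "esm";
  if (/\.(cjs|cts)$/.test(fileName)) return "cjs";
  if (pkg?.type === "module") return "esm";
  if (pkg?.type === "commonjs") return "cjs";
  return "unknown";
}

// Assumed rule: a synthesized default is only plausible when the target
// could be CommonJS. A file known to be ESM exposes exactly what it exports.
function maySynthesizeDefault(
  targetFile: string,
  targetHasDefaultExport: boolean,
  pkg?: PackageJsonLike
): boolean {
  if (targetHasDefaultExport) return false; // a real default export needs no synthesis
  return declaredFormat(targetFile, pkg) !== "esm";
}
```

With a rule like this, the earlier esm.d.ts example would still type check when nothing is known about the file, but would report an error if it were named esm.d.mts or lived under a package.json with "type": "module".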

Options

Decisions I think are on the table:

My proposed minimal change:

Why I’d rather rethink module as a whole than do the minimal change:

[^1]: Today, the module format detection (the setting of impliedNodeFormat) is actually triggered by moduleResolution, not module, but I think this doesn’t make sense. #54788 swaps the trigger, and that change can go unnoticed since we already made moduleResolution: nodenext and module: nodenext inseparable at #54567.

andrewbranch commented 1 year ago

I want to draw out two things that were discussed in the design meeting #55271.

First, there was broad agreement that it would be worth updating the old esnext/commonjs/etc. modes to fix issues like #50647 by having them refuse to emit CJS syntax into .mjs files or ESM syntax into .cjs files, and that we don’t need to wait until 6.0 to at least experiment and see how breaky that kind of change would be. This could be done either by issuing a program error, or updating those modes to take file extension (and perhaps package.json "type") into account and really treat those files as the module format their extension implies, even though it may disagree in name with the module value. I’m leaning toward trying the latter, because as I discussed in the issue body, you can look at the semantics of how imports and exports in declaration files work in these modes and notice that they are not actually intended to limit program files to just ESM or just CJS. For example, in --module esnext, you’re allowed to import a declaration file that uses import x = require("...") and export = ..., so it seems to acknowledge that CJS files exist and can be imported, so there doesn’t seem to be a strong reason to refuse to emit CJS syntax into a .cjs file in that program.
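The two approaches can be contrasted in a small sketch. All names here (EmitFormat, Outcome, formatImpliedByExtension, emitOrError, emitFollowingExtension) are hypothetical and are not compiler APIs; the sketch looks only at the output file extension and leaves out the package.json "type" check mentioned as a possibility above.

```typescript
// Hypothetical sketch of the two options; not how the compiler is implemented.
type EmitFormat = "esm" | "cjs";

type Outcome =
  | { kind: "emit"; format: EmitFormat }
  | { kind: "error"; message: string };

// Only .mjs and .cjs pin down a format; a plain .js output implies nothing here.
function formatImpliedByExtension(outFile: string): EmitFormat | undefined {
  if (outFile.endsWith(".mjs")) return "esm";
  if (outFile.endsWith(".cjs")) return "cjs";
  return undefined;
}

// Option 1: issue a program error when the requested format
// contradicts the output file's extension.
function emitOrError(requested: EmitFormat, outFile: string): Outcome {
  const implied = formatImpliedByExtension(outFile);
  if (implied !== undefined && implied !== requested) {
    return {
      kind: "error",
      message: `Cannot emit ${requested} syntax into ${outFile}`,
    };
  }
  return { kind: "emit", format: requested };
}

// Option 2: let the extension override the requested format,
// so the file is really treated as the format its extension implies.
function emitFollowingExtension(requested: EmitFormat, outFile: string): Outcome {
  return { kind: "emit", format: formatImpliedByExtension(outFile) ?? requested };
}
```

For example, with a requested format of "esm" and an output file of util.cjs, the first function returns an error while the second emits CJS, which is the behavior the comment leans toward.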

Secondly, @weswigham floated the idea of creating several granular, advanced-usage flags that control individual aspects of the module system, and rolling them up into named presets reflecting real known runtimes, e.g. esbuild, webpack, etc. The individual controls that might be relevant are:

I feel fairly confident that this set of levers would let us model everything we currently have and everything that we’d like to add in the near term. It still makes me a bit uncomfortable to expose all these as public API though, as they would really take the “advanced” section of the tsconfig options to a new level. On the other hand, if we could use these to dramatically lower the barrier to giving users named presets that are a really good fit for their runtime/bundler, that might be a good tradeoff. (That does necessitate another decision about preset versioning—do we need a bun2023 and a bunnext? Or are we ok with changing presets as needed in each new TS version?)

fatcerberus commented 1 year ago

Today, the module format detection (the setting of impliedNodeFormat) is actually triggered by moduleResolution, not module, but I think this doesn’t make sense.

I could go either way on this. If you say that "resolution" is only the process of "resolve a module specifier to a file on disk", then that's fair, but I think it could be argued that the process of module resolution also covers what kind of module it resolves to (in particular imagine a world in which you could write esm:foo or commonjs:foo where the module loader sees the exact same resolved filename for both). The other thing is that module, intuitively, is the answer to the question "What environment do you want to emit modules for?", and under that interpretation it doesn't make much sense for that to control what kind of module an import resolves to, as that's a (mostly) orthogonal concern.

FWIW, I myself considered module type detection to be an inherent part of resolution when I implemented neoSphere's current module loader: https://github.com/spheredev/neosphere/blob/main/src/neosphere/module.c#L398. I think I originally tried a solution wherein module type was handled later, during module loading, but that caused more problems than it solved; for example, there was no way to say I wanted Node-like semantics for require but not for import, since all the loader saw was a filename and not, e.g., the contents of package.json.

jonkoops commented 1 year ago

We discussed the potential utility of a true ESM-only mode that would error when attempting to import unambiguously CJS dependencies. That said, no existing mode and no planned bundler mode would use this.

I would just like to express my support for this option. I increasingly have (mostly) standards compliant pure ESM codebases and dependencies, and CommonJS interoperability is steadily becoming more of a burden than a boon to productivity.