microsoft / TypeScript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
https://www.typescriptlang.org
Apache License 2.0

Dual ESM/CJS emit with tsc #54593

Open andrewbranch opened 1 year ago

andrewbranch commented 1 year ago

#54546 explored one approach to enabling dual ESM/CJS emit for packages with tsc alone. The idea was that instead of determining the module format of .ts files by looking for package.json files in their ancestor directories, we would look for them starting at the directory where each file’s .js output would be emitted (a subdirectory of outDir). That way, two tsconfig files could point to two different output directories, each pre-seeded with a package.json file that sets the module format for the output generated by that tsconfig.

This approach has two main downsides:

  1. It’s annoying to have to commit package.json files into your output directory while gitignoring everything else. Additionally, if used in combination with other tools (e.g. tsc for declaration emit but rollup for JS emit), other tools might wipe the output directory, deleting your package.json.
  2. Thinking about the implications for projects that aren’t doing dual emit, if we determine the module format based only on the output file structure, we potentially fail to analyze the behavior of ts-node, or a bundler that cares about package.json "type". This can be a problem if a project compiles with tsc for publishing, but runs input .ts files directly during development (which is a scenario I think we should make a habit of considering). Ideally, we want a solution that ensures the module format of the output agrees with that of the input, unless the output format is being intentionally changed by config (presumably for purposes of dual emitting).

I think both of these can be solved by doing two things:

  1. Introduce a compiler option that emits a package.json with { "type": "module" } or { "type": "commonjs" } (or blank, whatever) into the outDir, and use this config setting (instead of what’s already present in the outDir) to determine the module format of input files at a higher priority than any package.json files in a directory higher than outDir. This solves problem (1) above: no need to pre-seed or commit files in your build directory.
  2. Whenever a package.json in a subfolder of the common source directory / rootDir that affects the computed module format of a file is seen, emit that package.json (or a stub of it with just "type"?) into the corresponding subfolder within outDir. This solves (2), and actually solves an issue that exists today, where tsc output can be invalid for Node without manually copying a package.json that occurs inside rootDir. (@rbuckton mentioned this in team chat one time, but it didn’t get much discussion.)

Emitting package.json files would be new territory for us, but I think it’s worth it for the problems it solves.
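
For readers following along, the difference between the input-relative lookup and the output-relative lookup from #54546 can be sketched roughly as below. This is a minimal illustration, not tsc's implementation: the in-memory `Map` standing in for the file system, the names `detectFormat` and `parentDir`, and the directory layout are all assumptions made for the example, and it only covers .ts/.js files (not .mts/.cts).

```typescript
type PackageType = "module" | "commonjs";
type ModuleFormat = "esm" | "cjs";

// Hypothetical stand-in for the file system: maps a directory to the "type"
// field of the package.json it contains (undefined = no "type" field).
type PackageJsonTypes = Map<string, PackageType | undefined>;

// Returns the parent of an absolute POSIX-style directory, or undefined at the root.
function parentDir(dir: string): string | undefined {
  if (dir === "/") return undefined;
  const idx = dir.lastIndexOf("/");
  return idx <= 0 ? "/" : dir.slice(0, idx);
}

// Walks up from startDir; the nearest package.json decides the format.
// A package.json without a "type" field means CommonJS.
function detectFormat(startDir: string, packages: PackageJsonTypes): ModuleFormat {
  let dir: string | undefined = startDir;
  while (dir !== undefined) {
    if (packages.has(dir)) {
      return packages.get(dir) === "module" ? "esm" : "cjs";
    }
    dir = parentDir(dir);
  }
  return "cjs"; // no package.json found anywhere
}

// Assumed layout: a CommonJS root plus a pre-seeded ESM output directory.
const packages: PackageJsonTypes = new Map<string, PackageType | undefined>([
  ["/", "commonjs"],
  ["/dist/esm", "module"],
]);

// Lookup starting from the input file's directory (/src/index.ts).
const inputRelative = detectFormat("/src", packages); // "cjs"
// Lookup starting from where the output would be emitted (/dist/esm/index.js).
const outputRelative = detectFormat("/dist/esm", packages); // "esm"
```

The same source file gets a different answer depending on where the walk starts, which is the mismatch downside (2) is worried about.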

rbuckton commented 1 year ago

> #54546 explored one approach to enabling dual ESM/CJS emit for packages with tsc alone. The idea was that instead of determining the module format of .ts files by looking for package.json files in their ancestor directories, we would look for them starting at the directory where each file’s .js output would be emitted (a subdirectory of outDir). That way, two tsconfig files could point to two different output directories, each pre-seeded with a package.json file that sets the module format for the output generated by that tsconfig.

This isn't an approach I would recommend. At one point I was looking into using "stub" package.json files to set { "type": "module" } in a specific directory to support dual emit using the TypeScript API. At first glance, it seems like this might work as long as you only ever use relative imports, or import from packages in your node_modules. However, if you use an import map and # imports to other locations in the root of your package, those imports will no longer work.

For example, consider a structure like this:

/dist/cjs/package.json    # contains `{ "type": "commonjs" }`
/dist/cjs/index.js
/dist/cjs/index.d.ts
/dist/esm/package.json    # contains `{ "type": "module" }`
/dist/esm/index.js
/dist/esm/index.d.ts
/lib/compat.node.js       # Node-specific functionality (not a build output)
/lib/compat.browser.js    # Browser-specific functionality (not a build output)
/lib/compat.d.ts
/src/index.ts             # source to be built as a dual module
/package.json

and a root package.json like this:

{
  "type": "commonjs",
  "exports": {
    ".": {
      "require": "./dist/cjs/index.js",
      "import": "./dist/esm/index.js"
    }
  },
  "imports": {
    "#compat": {
      "types": "./lib/compat.d.ts",
      "node": "./lib/compat.node.js",
      "browser": "./lib/compat.browser.js"
    }
  }
}

You would define your index.ts file (which is built to the /dist/cjs and /dist/esm outputs) such that it imports from "#compat" to load JavaScript that is specific to the runtime environment:

import * as compat from "#compat";
...

At compile time it seems like this will work, because there is no stub package.json file in the /src directory. As such, resolution walks up to the package.json in the root and can successfully resolve the "#compat" import. However, the build outputs in /dist/cjs and /dist/esm will fail to find that import at runtime because #-style import maps are specific to the nearest package.json.
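
A rough sketch of why the lookup succeeds from /src but fails from the build outputs. It assumes an in-memory `Map` in place of the file system, and the names `findPackageScope` and `canResolveHashImport` are made up for illustration. It also only handles exact "#" keys, not patterns, so treat it as a simplification of what Node does rather than the real algorithm.

```typescript
interface PackageJson {
  type?: "module" | "commonjs";
  imports?: Record<string, unknown>;
}

// Hypothetical in-memory package layout: directory -> parsed package.json.
const packageJsons = new Map<string, PackageJson>([
  ["/", { type: "commonjs", imports: { "#compat": { node: "./lib/compat.node.js" } } }],
  ["/dist/cjs", { type: "commonjs" }],
  ["/dist/esm", { type: "module" }],
]);

// Finds the directory of the nearest package.json at or above startDir.
function findPackageScope(startDir: string): string | undefined {
  let dir = startDir;
  while (true) {
    if (packageJsons.has(dir)) return dir;
    if (dir === "/") return undefined;
    const idx = dir.lastIndexOf("/");
    dir = idx <= 0 ? "/" : dir.slice(0, idx);
  }
}

// A "#" specifier is only looked up in the "imports" of the nearest scope;
// the walk does not continue to package.json files further up.
function canResolveHashImport(specifier: string, fromDir: string): boolean {
  const scope = findPackageScope(fromDir);
  if (scope === undefined) return false;
  const imports = packageJsons.get(scope)?.imports ?? {};
  return Object.prototype.hasOwnProperty.call(imports, specifier);
}

const fromSource = canResolveHashImport("#compat", "/src"); // true: scope is "/"
const fromEsmOutput = canResolveHashImport("#compat", "/dist/esm"); // false: the stub has no "imports"
```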

I ran into this in https://github.com/esfx/esfx/tree/main/packages/equatable, which forced me to abandon that approach.

rbuckton commented 1 year ago

Instead, the approach I took was to just build .cjs outputs to /dist/cjs and then transform them to .mjs (including extension renaming) under /dist/esm.

I know we've been reticent to rewrite imports during emit, but I think building a foo.ts with dual emit to a foo.cjs and foo.mjs, and rewriting module specifiers to match the extension is probably the most reliable mechanism.
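
To make the specifier rewriting concrete, here is a minimal sketch of what the per-specifier step might look like. The function name `rewriteSpecifier` is hypothetical, and this is not the transform used in the project mentioned above. It assumes only relative specifiers that already end in a JavaScript extension should be touched.

```typescript
// Rewrites the extension of a relative module specifier to match the
// target output format. Bare specifiers ("lodash"), "#" imports, and
// non-JavaScript files ("./data.json") are returned unchanged.
function rewriteSpecifier(specifier: string, targetExt: ".mjs" | ".cjs"): string {
  const isRelative = specifier.startsWith("./") || specifier.startsWith("../");
  if (!isRelative) return specifier;
  return specifier.replace(/\.(?:js|cjs|mjs)$/, targetExt);
}

const esmSpecifier = rewriteSpecifier("./foo.cjs", ".mjs"); // "./foo.mjs"
const cjsSpecifier = rewriteSpecifier("../util/bar.js", ".cjs"); // "../util/bar.cjs"
const bareSpecifier = rewriteSpecifier("lodash", ".mjs"); // "lodash"
```

A real transform would also have to rename the emitted files and handle dynamic `import()` and `require()` calls, which this sketch leaves out.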

andrewbranch commented 1 year ago

> At compile time it seems like this will work, because there is no stub package.json file in the /src directory. As such, resolution walks up to the package.json in the root and can successfully resolve the "#compat" import. However, the build outputs in /dist/cjs and /dist/esm will fail to find that import at runtime because #-style import maps are specific to the nearest package.json.

Central to my proposal is considering not just the package.json files that already exist on disk, but the ones that we know are going to exist due to emit from this feature. Accurate compile time format detection and module resolution is table stakes and is relatively easily achievable. However, I didn’t think about the fact that this would block the root-level imports and exports, and that is definitely unfortunate.

rbuckton commented 1 year ago

> However, I didn’t think about the fact that this would block the root-level imports and exports, and that is definitely unfortunate.

As someone who would want to use this feature, this would make it completely unusable for me.

Honestly, I'd love to see package.json add some mechanism of specifying the default "type" for a subfolder. That way there would still only be a single package.json to refer to, and package.json is already read and interpreted as part of module lookup.

andrewbranch commented 1 year ago

That’s exactly what @DanielRosenwasser said 👀

andrewbranch commented 1 year ago

I do wonder if imports/exports has to be a show-stopper, though. If we emit a module-format-controlling package.json to the root of a given outDir, we could copy imports and exports, modifying relative paths in all the keys. That strikes me as simpler, safer, and more scoped to new code than trying to modify every emitted filename and every relative module specifier in the program.
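
A rough sketch of what copying and rebasing could look like, assuming a hypothetical helper named `rebaseTargets` (nothing like it exists in tsc today). It only rebases targets that live inside the output directory. As far as I understand Node's rules, targets must start with "./" and cannot climb out of the package.json's directory, so anything outside the outDir is collected separately rather than rewritten.

```typescript
// Shape of an "exports"/"imports" value: a path, null, a fallback array,
// or an object keyed by subpath or condition.
type Target = string | null | Target[] | { [key: string]: Target };

// Rebases every string target under outDirPrefix (e.g. "./dist/esm/") so it
// is relative to a package.json placed in that directory. Relative targets
// outside the directory are left unchanged and recorded in `outside`.
function rebaseTargets(target: Target, outDirPrefix: string, outside: string[]): Target {
  if (typeof target === "string") {
    if (target.startsWith(outDirPrefix)) {
      return "./" + target.slice(outDirPrefix.length);
    }
    if (target.startsWith("./")) outside.push(target);
    return target;
  }
  if (target === null) return null;
  if (Array.isArray(target)) {
    return target.map((t) => rebaseTargets(t, outDirPrefix, outside));
  }
  const result: { [key: string]: Target } = {};
  for (const [key, value] of Object.entries(target)) {
    result[key] = rebaseTargets(value, outDirPrefix, outside);
  }
  return result;
}

// The root "exports" from the example earlier in the thread.
const rootExports: Target = {
  ".": { require: "./dist/cjs/index.js", import: "./dist/esm/index.js" },
};
const outside: string[] = [];
const esmExports = rebaseTargets(rootExports, "./dist/esm/", outside);
// esmExports: { ".": { require: "./dist/cjs/index.js", import: "./index.js" } }
// outside:    ["./dist/cjs/index.js"]
```

The `outside` list is where the hard cases end up, such as the /lib files behind "#compat" in the earlier example.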

rbuckton commented 1 year ago

> I do wonder if imports/exports has to be a show-stopper, though. If we emit a module-format-controlling package.json to the root of a given outDir, we could copy imports and exports, modifying relative paths in all the keys. That strikes me as simpler, safer, and more scoped to new code than trying to modify every emitted filename and every relative module specifier in the program.

I'm not sure if there are other implications to using stub package.json files we would have to consider. I would expect that extension mangling would be far easier and far more reliable than trying to rewrite exports and imports.

There are a number of other considerations as well. Some packages expect to walk up to find package.json for version information to use in logging, for configuration information to use at runtime, etc. That would no longer work if they switched to this mode, or they'd have to rewrite that logic to cater to our behavior. There just seems to me to be a lot more that can go wrong with this approach.

fatcerberus commented 1 year ago

> modifying relative paths

Doesn't this open the exact same can of worms as rewriting paths in import specifiers?

andrewbranch commented 1 year ago

No, it opens a can of worms which is a strict subset of the can of worms you mentioned.

egasimus commented 1 year ago

Shameless plug: I wrote hackbg/ubik to solve this exact scenario (source since the link seems to be missing from NPM). It works well enough in my case (publishing CJS, ESM and DTS side by side from the same TS source) -- though admittedly it's still slightly rough around the edges. Give it a try if you want, feedback appreciated 😉

knightedcodemonkey commented 1 year ago

If you're tired of waiting for this, I would recommend @knighted/duel (of course I would).

knightedcodemonkey commented 1 year ago

> I know we've been reticent to rewrite imports during emit, but I think building a foo.ts with dual emit to a foo.cjs and foo.mjs, and rewriting module specifiers to match the extension is probably the most reliable mechanism.

@rbuckton gets it. Just make sure only one package.json file is required; the two-package.json approach is not necessary. Or just opt out of supporting the two new file extensions.

alshdavid commented 11 months ago

RE: Comment

> For example, consider a structure like this:
>
> /dist/cjs/package.json    # contains `{ "type": "commonjs" }`
> /dist/cjs/index.js
> /dist/cjs/index.d.ts
> /dist/esm/package.json    # contains `{ "type": "module" }`
> .....

This risks TypeScript emitting incorrect output code because TypeScript infers module type checking rules from the type: module property on the root package.json.

If it is set to commonjs, TypeScript will not check for things like extensions on imports - something required by ESM Node. This means you could emit code that will pass compile time checks but fail at runtime.

The approach I have taken is similar but requires adding another package.json to the src directory with type: module:

/dist/cjs/package.json    # contains `{ "type": "commonjs" }`
/dist/cjs/index.js
/dist/cjs/index.d.ts
/dist/esm/package.json    # contains `{ "type": "module" }`
/dist/esm/index.js
/dist/esm/index.d.ts
/src/package.json         # contains `{ "type": "module" }` <------ Adding this
/src/index.ts             # source to be built as a dual module
/package.json             # contains `{ "type": "commonjs" }`

and a root package.json like this:

{
  "type": "commonjs",
  "exports": {
    ".": {
      "require": "./dist/cjs/index.js",
      "import": "./dist/esm/index.js"
    }
  }
}

With this configuration and compiler options module: Node16 and moduleResolution: Node16, TypeScript will check the source using the rules for Node with native modules.

You also cannot use tsc to compile the output because TypeScript still uses the type: module property to infer the output type with module: Node16 and moduleResolution: Node16. So I use swc to compile the source to JavaScript and tsc to check the types.

jakebailey commented 11 months ago

What you're describing is exactly what tshy does, linked in your other issue: https://github.com/microsoft/TypeScript/issues/55925#issuecomment-1741827024

If you're not type checking twice, there's no guarantee that things will work the way you expect.

alshdavid commented 11 months ago

tshy is a cool solution, very Parcel-like - but I would prefer to limit additional tooling as much as possible. Listing the army of tools needed for just a basic production-ready TypeScript project already sounds like the start of an incantation 😆.

I guess I could write a build script that rewrites the package.json to toggle the type between module and commonjs before running the build command. I couldn't run the build processes in parallel though.
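
The core of such a script might look roughly like this. The function name `withPackageType` is made up, and reading and writing the actual file is left out so the sketch stays self-contained.

```typescript
// Returns package.json text with "type" set to the requested value,
// leaving every other field untouched.
function withPackageType(packageJsonText: string, type: "module" | "commonjs"): string {
  const pkg = JSON.parse(packageJsonText) as Record<string, unknown>;
  pkg["type"] = type;
  return JSON.stringify(pkg, null, 2) + "\n";
}

// Hypothetical package.json contents used only for this example.
const original = '{ "name": "my-package", "type": "commonjs" }';
const forEsmBuild = withPackageType(original, "module");
const forCjsBuild = withPackageType(original, "commonjs");
```

Both builds would read the same package.json on disk, which is why they would have to run one after the other rather than in parallel.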

Perhaps adding a compiler option like moduleOutputFormat: esmodule | commonjs | default to replace package.json inference might be better?

andrewbranch commented 11 months ago

Considering making the format detection customizable as part of #55221 (e.g. input-relative package.json, output-relative package.json, force ESM, force CommonJS)

vilicvane commented 11 months ago

I think we are really overcomplicating things. What about just bringing the compatibility back, with a suppressible warning, so we are as good as in the old days? Then we can discuss a refined solution with ease.

romainmenke commented 10 months ago

I just tried again to fix the types for all the PostCSS plugins under @csstools, and again I failed.

Our requirements:

Our users' requirements:

Our packages on npm contain both .cjs and .mjs versions. This bloats the npm package size, but it is the only way to satisfy all our requirements.

However it is impractical to also have correct types for all our users and all their configs.

The only way I see that we could do this is to prevent TypeScript from emitting types and to then manually write .d.ts files for both variants. Which kinda defeats the purpose of using TypeScript in the first place and this is a maintenance nightmare.


Hurdles/issues which I expect to be solved by the TypeScript team, not because I take it for granted that others do this work, but because I suspect that they are best positioned to solve this.

For me these two are linked. I fail to properly set up any of the hacks to dual emit declarations because I end up having to change tsconfig files, which then breaks other things. But I don't know if that breakage is expected and how to proceed from there.

I don't want to randomly change our tsconfig files because our current setup works for most of our users.

I really hope that something can be done here because this is holding back so much progress. If dual emit did work, more packages could easily set this up correctly. That would allow more end users to migrate to es modules without encountering error messages they don't know how to resolve.

romainmenke commented 6 months ago

In the end we decided to not do dual typing even though we are dual publishing as commonjs and es modules.

There is just no way to get this working in our case without making the whole thing substantially worse and harder to maintain.

We are now only providing types for es modules. This is a short term solution. In the long term we will drop support for commonjs entirely.

knightedcodemonkey commented 3 months ago

Since there is no real analogue to top level await in CommonJS, you might as well just wait until --experimental-require-module reaches Stable status.

See Loading ECMAScript modules using require().

Tobbe commented 1 month ago

@jakebailey

> If you're not type checking twice, there's no guarantee that things will work the way you expect.

If I'm only using tsc for emitting types, is there any reason to run TSC twice? Or can I just reuse the esm types for the cjs output? Is there any scenario where the esm and cjs types would differ?

jakebailey commented 1 month ago

Surely you're using tsc to typecheck, not just emit, so running twice would still be the only way to check two times that your code functions in both modes. Some dep may have totally different types depending on whether it was required versus imported.
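
As an illustration of how a dependency can end up with different types per mode, here is a simplified sketch of conditional "exports" resolution. The dependency's "exports" map and the function name `resolveConditions` are invented for the example, and arrays, null targets, and subpath patterns are left out, so this is an approximation and not the resolver tsc or Node actually uses.

```typescript
type ExportTarget = string | { [condition: string]: ExportTarget };

// Picks the first key, in object order, that is "default" or an active
// condition, and recurses into it.
function resolveConditions(target: ExportTarget, conditions: ReadonlySet<string>): string | undefined {
  if (typeof target === "string") return target;
  for (const [condition, nested] of Object.entries(target)) {
    if (condition === "default" || conditions.has(condition)) {
      const resolved = resolveConditions(nested, conditions);
      if (resolved !== undefined) return resolved;
    }
  }
  return undefined;
}

// Hypothetical dependency that ships separate declarations per format.
const depExports: ExportTarget = {
  import: { types: "./esm/index.d.mts", default: "./esm/index.mjs" },
  require: { types: "./cjs/index.d.cts", default: "./cjs/index.cjs" },
};

const typesWhenImported = resolveConditions(depExports, new Set(["types", "import"])); // "./esm/index.d.mts"
const typesWhenRequired = resolveConditions(depExports, new Set(["types", "require"])); // "./cjs/index.d.cts"
```

Because the two declaration files are separate, nothing forces them to describe the same API, so a check against one says little about the other.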

Tobbe commented 1 month ago

Thanks for your reply 🙏

> Surely you're using tsc to typecheck, not just emit

Yes, you're right 🙂 What I meant was I was using it with emitDeclarationOnly: true

Some of our packages I can get to build for CJS pretty much as they are by just (temporarily) changing to type: "commonjs". But for some I have to run TSC with module: 'commonjs', and that just feels wrong 🙁 Back to the drawing board I guess! 😅

(We're building ESM to /dist and CJS to /dist/cjs and placing a package.json with just the correct type in each of those directories)

jakebailey commented 1 month ago

> (We're building ESM to /dist and CJS to /dist/cjs and placing a package.json with just the correct type in each of those directories)

FWIW this is pretty much exactly what https://www.npmjs.com/package/tshy automates.