microsoft / TypeScript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
https://www.typescriptlang.org
Apache License 2.0

Problems with Node.js `--experimental-detect-module` #56678

Open andrewbranch opened 9 months ago

andrewbranch commented 9 months ago

Node.js’s --experimental-detect-module flag poses a problem for TypeScript. If a future release of Node.js enables it by default, there is little we can do to model its semantics in our own module system under --module nodenext, which could lead to a confusing developer experience for TypeScript and JavaScript authors relying on tsc or editor language features in projects configured for Node.js.

This is fundamentally a TypeScript problem, and I don’t know that the severity merits lobbying Node.js to change course. However, I still thought it was important to document and share as feedback.

The problem

  1. When TypeScript reads a type declaration file, it needs to know the true module format of the JavaScript file it represents.
  2. The contents of a declaration file do not always provide a reliable indication of the true module format of the corresponding JavaScript file.

When a user compiles a TypeScript file like

export default 0;

they can vary the module format of the output JavaScript file by changing the --module flag, but the output declaration file always looks like

declare const _default: 0;
export default _default;

So, that declaration file text on its own is ambiguous. It could represent this ES module:

export default 0;

or this CommonJS module:

"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
exports.default = 0;

or scripts in AMD or SystemJS format.

For TypeScript users who want to run their code in Node.js, it’s important that TypeScript can distinguish between these two cases, because it affects whether those users can access that module with certain types of imports, and what type should be assigned to the names imported from that module:

import z1 = require("export-default-zero"); // Error if resolved file is ESM

import z2 from "export-default-zero";
z2.default; // 0 if resolved file is CJS
            // undefined if resolved file is ESM

Currently, Node.js’s own module format detection algorithm resolves this ambiguity for us. The file extension of the declaration file tells us the file extension of its JavaScript counterpart, which tells us how Node.js will interpret that file’s module format:

Declaration extension   JavaScript extension   Module format
.d.mts                  .mjs                   ESM
.d.cts                  .cjs                   CJS
.d.ts                   .js                    Determined by package.json

These constraints allow us to make a safe assumption about the format of every JavaScript file represented by a declaration file. For example, when we see a .d.ts file, we know it represents a .js file whose format is determined by its package.json scope—if we find that package.json and see that it does not include a "type" field, we assume the JavaScript file contains CommonJS syntax. If that assumption is wrong, the user’s problem is that their JavaScript file is misconfigured for usage in Node.js—they don’t have a problem with TypeScript.
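The extension rules above can be sketched as a simple lookup (illustrative only; the function name and simplifications are mine, not the compiler's actual code):

```typescript
type ModuleFormat = "esm" | "cjs";

// Resolve the module format of the JavaScript file that a declaration file
// represents, per the extension table above. The fallback to CommonJS when
// package.json has no "type" field is the pre-detection assumption described
// in the surrounding text.
function formatFromDeclarationFile(
  fileName: string,
  packageJsonType?: "module" | "commonjs",
): ModuleFormat {
  if (fileName.endsWith(".d.mts")) return "esm"; // represents a .mjs file
  if (fileName.endsWith(".d.cts")) return "cjs"; // represents a .cjs file
  // .d.ts represents a .js file: defer to the package.json scope,
  // assuming CommonJS when no "type" is present.
  return packageJsonType === "module" ? "esm" : "cjs";
}
```

It is exactly that last fallback that `--experimental-detect-module` undermines, as described below.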

--experimental-detect-module prevents us from being able to make safe assumptions about the module format of .js files whose package.json scope does not define a "type". The flag makes Node.js detect the module format of those files by reading their contents, but TypeScript can’t do the same in the parallel world of declaration files due to the syntax ambiguity discussed earlier.

Possible mitigations

Resolve and read JavaScript files when declaration files are ambiguous

TypeScript avoids reading, or even checking for the existence of, JavaScript files when declaration files are found first. The declaration files currently tell the compiler everything it needs to know about the presence and contents of JavaScript files, so there’s no point in spending extra memory and file I/O reading JavaScript files. If that changes, we could potentially resolve ambiguity by looking at JavaScript content. This would come with a performance penalty. Additionally, multiple team members expressed discomfort at the prospect of giving up the invariant of declaration files being ultimate sources of truth for the compiler. As things stand now, this is the mitigation we’re least likely to pursue.

Eliminate declaration file ambiguity going forward

We could begin changing our declaration file emit format to ensure that module formats are not ambiguous in the future. This is something I strongly advocate for, but it isn’t a solution on its own, because so many existing packages are already ambiguous, and many will never update.

Assume ambiguous files are CommonJS (i.e., do nothing)

If Node.js made --experimental-detect-module the default in the future, we would advocate strongly that all packages define a "type" in their root package.json to avoid ambiguity. (We would likely consider mandating this when compiling local projects under --module nodenext.) We could rely on this becoming best practice for new and actively maintained packages, and assume that packages lacking a "type" field were published before --experimental-detect-module, when such a lack of a "type" implied CommonJS.

Type imports of ambiguous files as permissively as possible

It might be possible to do some type system trickery to make something like this work:

import z1 from "export-default-zero";
z1;         // Type: AmbiguousModule<0> ~= 0 & { default: 0 }
z1.default; // Type: 0

This would be an attempt to get out of the way and never issue a false positive error, at the expense of missing some real errors.
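A minimal sketch of the shape such a type might take (AmbiguousModule is hypothetical, not a real compiler type; the runtime helper below only works for object values, which is part of why real support would require type-system trickery rather than a plain library function):

```typescript
// Hypothetical permissive type: usable both as the value itself (the ESM
// default export) and via .default (the CJS interop shape).
type AmbiguousModule<T> = T & { default: T };

// Runtime illustration for object values only; a primitive like 0 cannot
// carry a .default property, hence the need for compiler-level support.
function makeAmbiguous<T extends object>(value: T): AmbiguousModule<T> {
  return Object.assign(value, { default: value });
}

const mod = makeAmbiguous({ answer: 0 });
// Both mod.answer and mod.default.answer type-check and resolve to 0.
```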

Allow users to assert the format of ambiguous imports

In combination with some other behavior as the default, we could allow users to resolve the ambiguity themselves, perhaps with an import attribute or some central configuration.

aduh95 commented 9 months ago

we would advocate strongly that all packages define a "type" in their root package.json to avoid ambiguity. (We would likely consider mandating this when compiling local projects under --module nodenext.

I'd be in favor of that, even if we end up using one of the other options to work around the issue, or if Node.js never makes --experimental-detect-module the default; pushing for less ambiguity is a win for the whole ecosystem IMHO.

GeoffreyBooth commented 9 months ago

we would advocate strongly that all packages define a "type" in their root package.json to avoid ambiguity.

Node also encourages everyone to always include a "type" field, especially for published packages: https://nodejs.org/api/packages.html#determining-module-system:~:text=For%20this%20reason,should%20be%20interpreted. Perhaps tsc could sometimes error when "type" is missing, such as when run in a version of Node where detect-module is enabled by default or when certain settings are specified.

andrewbranch commented 9 months ago

That’s something we might consider for a user’s project package.json, but the real issue for us is the thousands of existing npm packages that have no "type" and will never be updated to get one. We can’t reasonably error on those, but we still need to know what format their .js files are.

karlhorky commented 9 months ago

As Node.js tries to move forward and navigate the migration to ESM, I wonder if any large-scale ecosystem projects would help.

First ideas just off the top of my head to motivate conversation, of course each idea with downsides / ownership concerns:

1) Publish package details

1a) npm does a 1-time analysis of all 2.6 million packages to find out the module format, using the same heuristics Node.js does with the --experimental-detect-module flag
1b) npm provides a location where these details will be published (eg. returning the data in the existing API / registry routes or providing a completely separate location)
1c) npm performs this analysis + publishing on every publish of a module without the "type" field in package.json
1d) npm provides deprecation warnings for packages published without "type" in package.json
1e) npm fails to allow publishing of modules without "type" in package.json (after deprecation period)

1a-1c could also be taken over by a different organization eg. DefinitelyTyped / Microsoft in case the value to npm wouldn't be high enough.

2) Serve "type": "commonjs"

2a) npm serves all package.json files which are published without "type" as package.json files with "type": "commonjs"

This one feels more controversial and complicated, altering the files returned by a package. I'm sure there are a number of storage, caching and consistency concerns here (maybe also security concerns).


I don't expect these to be taken into consideration wholesale without deeper investigation and proper planning. I just mean to open the discussion that potentially some large-scale work would make the transition to ESM easier.

cc @mylesborins @lukekarrys

GeoffreyBooth commented 9 months ago

2a) npm serves all package.json files which are published without "type" as package.json files with "type": "commonjs"

Node asked npm about this when --experimental-default-type was being designed, and they said that it’s a hard requirement that files aren’t modified by npm once they’re published, for security reasons. That’s why --experimental-default-type doesn’t apply to folders below node_modules, because all typeless packages would be broken under --experimental-default-type=module and therefore the flag would be unusable unless users did some kind of post-install transform of all their dependencies to add the missing "type": "commonjs" entries.

--experimental-detect-module doesn’t have this issue, as code that can run as CommonJS will run as CommonJS, and so it doesn’t have a node_modules exception. It’s also preferable to have code behave the same way whether or not it’s under node_modules, so that the library author and the library consumer have code run the same way (at least when they’re both using the same version of Node).

GeoffreyBooth commented 9 months ago

TypeScript avoids reading, or even checking for the existence of, JavaScript files when declaration files are found first.

For what it’s worth, within Node I implemented a containsModuleSyntax function in C++ to answer this one question very quickly. We could expose this as a public API if TypeScript might find it useful. It could be exposed before --experimental-detect-module is enabled by default, so TypeScript could include logic like “if the Node version is < the version where module detection is enabled by default, assume CommonJS (prior behavior); else read the file and use Node’s containsModuleSyntax to see how Node would interpret it.” The new API would exist in any version of Node where TypeScript would need it, because of the ability to assume the current behavior for prior versions.
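A rough sketch of the detection idea being described (this is not Node's actual implementation, which parses the source with V8 and reinterprets it as ESM only when the CommonJS parse fails with an ESM-specific error; the regex below is a deliberately naive stand-in):

```typescript
// Naive illustration: treat a file as containing module syntax if it appears
// to have a top-level import/export statement or uses import.meta. Real
// detection works by parsing, not pattern matching.
function looksLikeModuleSyntax(source: string): boolean {
  return /^\s*(import|export)\s/m.test(source) || source.includes("import.meta");
}
```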

Yes this would be a performance penalty, just like detection itself is a performance penalty, which is why we encourage everyone (especially package authors) to include "type" fields. (Or potentially even add missing "type" fields to dependencies in a post-install pass.) But it would be the most direct solution, at least until packages are updated with "type" fields and/or unambiguous type definition files.

andrewbranch commented 9 months ago

I experimented with ways to mitigate the performance penalty of reading and analyzing additional JavaScript files. We have two limitations that take all the good options off the table: (1) our API is synchronous, so parallelizing the work within the same process is not practical, and (2) file I/O is delegated through a host object provided via the JavaScript API (this is what lets us read from unsaved editor files in TS Server, for example), so calling into another process (whether native code or parallelized JS) that performs file system operations is not possible.

guybedford commented 9 months ago

@andrewbranch I know you likely don't want to be implying Node.js directions here, but I'd be interested to hear if you feel this motivates Node.js not unflagging this feature in future?

andrewbranch commented 9 months ago

It’s complicated. I keep waffling between two plausible perspectives.

On one hand, if we think about the scenarios that are most likely to be problems for TypeScript in an unflagged future, those are scenarios that are currently problems for Node.js and will be fixed by unflagging this behavior—loading “unmarked” ESM from packages that included it for bundlers, for example. Meanwhile, TypeScript’s declaration files being ambiguous is a problem of TypeScript’s own creation. Node.js is doing right by their users, and it’s on TypeScript to solve their side of things.

On the other hand, given my present feedback that TypeScript won’t be able to solve our side of things to the degree that we would hope, Node.js probably needs to adjust their expectations for how much unflagging --experimental-detect-module is going to improve the end user experience. If Node.js Just Works™ in more cases than it used to, but most of those cases retain red squigglies in VS Code, that’s not a very satisfying improvement for a huge portion of Node.js users. The Node.js team might appreciate knowing that the problem lies squarely with TypeScript, but users are less interested in watching Node.js and TypeScript perform spiderman-pointing-at-spiderman.png than having an experience that works end to end, and I wonder if incrementally changing interop rules, even if in a nominally desirable direction, creates more confusion than it resolves.

I think a good analysis here would require a better understanding of the most common consequences for users. I’ve laid out a theory of how things could go wrong, but I’m not yet sure how often we would see problems and how bad they would be. I’m curious what other feedback about this Node.js has received, and whether they are still leaning towards unflagging, and on what kind of timeline?

I should also point out that because our issues center around declaration files, node_modules packages are a much bigger problem for us than local project files, and especially the loose .js file scenario. I don’t know if it would be on the table to reconsider disabling this behavior inside node_modules, but that would largely eliminate my concerns.

GeoffreyBooth commented 9 months ago

I’m curious what other feedback about this Node.js has received

We haven’t received much. There’s really only one open issue, https://github.com/nodejs/node/issues/50917, which involves the edge case of non-ESM-related parse errors in a file. The “try again as ESM” behavior is triggered by an ESM-related parse error, but if V8 errors on something else before getting to the first import or export statement, the retry never happens; and some of these parse errors are valid when parsed as ESM. The solution is to either just document this, that this is another edge case that won’t trigger running as ESM (like how we currently document top-level await as not triggering detection) or we could remove the limitation of “only ESM parse errors” and try again for any parse error. Either way, though, the solution would be simple.

Once that’s resolved, unless anything else comes up then it’s really just a question of how much more time we want to wait before considering this having been out in the wild long enough to unflag it. Optimistically the earliest that could happen would be in the next few months, sometime near the end of the 21.x line so that we get maximum feedback before 22.0.0 in April. Or we could unflag it as part of 22.0.0, though it doesn’t necessarily need to wait until then as it’s not a semver-major change. And of course it could always happen later, in the 22.x line or beyond, especially if issues are reported.

I think the fundamental thing here is that Node has accepted (or will be accepting, if we unflag this) a conditional performance hit in the name of usability. TypeScript could do the same: that’s the first option in the above list of potential solutions, where tsc reads the accompanying JavaScript file. If Node and TypeScript were managed together as one project, that would presumably be a decision made together; that if Node behaves one way, TypeScript would behave the same. I am very sympathetic to the desire to not slow down any users, but there are many ways to both match Node’s behavior and mitigate or eliminate the performance impact on TypeScript’s side; for example, some new tsconfig option like "dependencyTypeDefinitionsModuleFormat": "commonjs" / "module" / "detect", where the default is "detect" (performance hit, but matches Node) and users could set "commonjs" to preserve current behavior and current performance. I’m not sure it’s worth creating the potential footgun on the Node side of a node_modules exception for this feature, thereby limiting everyone, when TypeScript has the ability to create such an exception just for itself.
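The hypothetical option described above might look like this in a tsconfig (illustrative only; "dependencyTypeDefinitionsModuleFormat" is a name proposed in this comment, not an implemented setting):

```jsonc
{
  "compilerOptions": {
    "module": "nodenext",
    // Proposed, not implemented: how to treat ambiguous .d.ts files in
    // node_modules. "detect" would read the neighboring .js file as Node
    // does; "commonjs" would preserve current behavior and performance.
    "dependencyTypeDefinitionsModuleFormat": "detect"
  }
}
```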

fatcerberus commented 9 months ago

watching Node.js and TypeScript perform spiderman-pointing-at-spiderman.png

It’s honestly amazing how often this meme is relevant to the dev scene in general, especially JS. Tech stacks become so intertwined that everyone starts looking at what everyone else is doing for direction, and then before you know it, bam, Spider-Man meme happens.

guybedford commented 9 months ago

I don’t know if it would be on the table to reconsider disabling this behavior inside node_modules, but that would largely eliminate my concerns.

I don't think this would be an option unfortunately, since having packages work locally then break on publish and install would be a really bad footgun.

It's plausible we could unflag in 22 as @GeoffreyBooth describes. So if you do have any strong arguments against that now would be the time to mention I think.

ljharb commented 9 months ago

"not unflagging" still seems like the best option here; the usability isn't worth the risks.

andrewbranch commented 9 months ago

I momentarily forgot about the footgun of a package working during local development and breaking when installed into node_modules. I agree that’s not worth it.

My instinct is that there are non-performance reasons why relying on JavaScript files to resolve declaration file ambiguity is unappealing, but so far they’re mostly rooted in principle and history. @DanielRosenwasser and I discussed this recently and he felt the same way. Maybe he can weigh in on how much that approach feels on or off the table.

GeoffreyBooth commented 9 months ago

This isn’t really an issue yet, is it? Like until someone publishes a package where they rely on detection to work—so they put ESM syntax in .js files and leave out the type field—the ambiguous type definition files always refer to “running as CommonJS” JavaScript files, right?

And it’ll be a long time before anyone publishes a package that relies on detection without getting complaints from users, since any users installing that package in Node <= 21 would try to run it as CommonJS and it would error for them.

So it’s probably safe the vast majority of the time for TypeScript to assume that ambiguous type definition files refer to CommonJS sources. If TypeScript changes the type definition files to be unambiguous soon, then by the time the day comes that someone publishes a “relies on detection” package that runs in most of the versions of Node active by then, that person will likely also be using a future version of TypeScript that outputs an unambiguous type definition file.

There are several “most of the time” and “vast majority” qualifiers in the above, though. It’s definitely possible for someone to slip through the cracks of the edge cases and publish a “relies on detection” package that can’t run in Node < 21 but for whatever reason was generated with type definition files from TypeScript 4. That said, it’s always possible for someone to publish incorrect types for a package. I guess the question is what your tolerance level is for this kind of thing. People publish broken packages all the time that fail in certain versions of Node or certain runtimes or bundlers, so it’s perhaps acceptable to assume that ambiguous type definition files mean CommonJS, and that going forward they need to be unambiguous.

mcollina commented 9 months ago

I don't think this is an issue with this new option specifically, but rather with TS in general. More specifically, supporting CJS, ESM, TS(CJS), and TS(ESM) from a single .js file is very hard.

Specifically we have to do stuff like this in fastify:

module.exports = myfn
module.exports.default = myfn
module.exports.myfn = myfn

This needs to be matched by a .d.ts file like this:

namespace myfn {
  export const myfn: MyFn
  export { myfn as default }  
}

export = myfn

With all of this, import myFn from 'myfn' and import { myfn } from 'myfn' would work on all the combinations of CJS+ESM+TS above.
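The effect of that export pairing can be simulated directly (a sketch; myfn and its return value are placeholders, not actual fastify code):

```typescript
// Simulate what the CommonJS side above produces: a callable whose .default
// and .myfn properties all point back at the same function.
function myfn(): string {
  return "called";
}
const exported = Object.assign(myfn, { default: myfn, myfn });

// const f = require("myfn")           -> exported, callable directly
// import f from "myfn" (CJS interop)  -> f.default is also callable
// import { myfn } from "myfn"         -> the named binding is also callable
```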

Note that this problem is so significant that multiple modules have to:

const X = require('anothermodule')

const x = new X.default() // because default export is not recognized

An alternative solution for module authoring is to use https://github.com/isaacs/tshy and have both a CJS and an ESM build.

I do think the new option makes it worse, mostly because it exposes most developers to the same complex DX we have to do on the module authoring side.


I momentarily forgot about the footgun of a package working during local development and breaking when installed into node_modules. I agree that’s not worth it.

This is something that we might want to revisit. Can you open a separate issue about this feedback?


We should not plan to unflag in 22 before the TypeScript story is sorted.

However, I think simplifying the developer experience for CJS+ESM+TS is something we should strive for, as it's one of the major DX complaints we are hearing from Node.js users. Let's collaborate.

@andrewbranch @DanielRosenwasser what behavior would work for you? How do you think we can solve this problem?

andrewbranch commented 9 months ago

@mcollina I would love to talk about default exports but I think it’s a separate issue to work out, pretty much unrelated to --experimental-detect-module. Unless Node.js is prepared to allow require(esm) and recognize __esModule the same way Bun and every bundler do (and my understanding is neither of those is on the table), we’re going to need to know the true module format of every file in a program because they affect the types we need to assign to imports. Even if we were to come up with a perfect solution around default export interop, we would still like to protect people from trying to require(esm) at compile time. If both of those things were solved, --experimental-detect-module wouldn’t matter to us because interop would be transparent enough that we would never need to know whether a declaration file represented a CJS or ES module. (This is how TypeScript works when configured for Bun, tsx, bundlers, etc.) But then again, if those things were solved, --experimental-detect-module wouldn’t be necessary in the first place 😄

Unless I’m mistaken and those are up for discussion, we’ll need to continue to find ways of discerning the module format of files Node.js is going to load, and we can iterate on default export transpilation to try to make it more portable. But (a) I think it’s a separate issue, and (b) it’s a conversation that’s not specific to TypeScript; it would be better to discuss with Babel, esbuild, Rollup, etc.

GeoffreyBooth commented 9 months ago

node_modules packages are a much bigger problem for us than local project files

Could DefinitelyTyped help here? They can essentially publish new types for old/already published packages, can’t they? So perhaps a solution could look something like this:

There’s obviously more complexity than just this (for example, DefinitelyTyped currently doesn’t have types for packages that include their own, and that would need to change), but the broader point is that we do have a way of updating the types for already-published packages.

andrewbranch commented 9 months ago

Yes, DefinitelyTyped can be updated in bulk, which helps for a large number of old/stale but still-popular packages, but not for ones that ship their own types. I’m not sure we would want to use DT to add module format info for packages that do ship their own types (much less duplicate those types to DT), but Daniel and I did briefly mention the idea of shipping a Bloom filter encoding a set of popular npm packages that are definitely CommonJS. I forgot to mention it in my OP because it was one of the more out-there ideas. I’m not sure it would be a big improvement over “always assume CommonJS.”

Skalman commented 9 months ago

Here's an alternative solution that I think would be easy for developers to understand, though it would involve many more organizations than just Node.js and TypeScript, mainly npm.

I don't know if this is at all workable, but it's the easiest solution I can think of.

ljharb commented 9 months ago

I don't think it's workable for anyone to refuse to publish new packages that don't have "type".

GeoffreyBooth commented 9 months ago

  • Various tools start refusing to publish new packages that don’t have "type" set

We asked npm about this when we were working on --experimental-default-type and the answer was basically no, because the npm registry is for more than just Node stuff, and targets like browsers don’t need a type field. They also won’t ever change an already-published package, say to add the type field, because of security; current consumers of old packages might be checking integrity checksums, and those are never supposed to change.

GeoffreyBooth commented 9 months ago

I’m not sure it would be a big improvement over “always assume CommonJS.”

Stepping back for a second, what if we did assume all ambiguous type definition files were just always CommonJS? Isn’t the problem limited to a potential future case where someone publishes a package with ambiguous type definition files that only works when consumed by users with detection enabled (so ESM syntax and no type field)? But no one’s published such a package yet, and likely won’t for months if not years. If you changed the type definition output to be unambiguous in the meantime, you’d more or less prevent the problem unless someone is trying to slip through the cracks (i.e. using an old version of TypeScript while targeting only very new, currently future, versions of Node). That’s not really very different from someone intentionally publishing incorrect type definition files, which they can already do today. So if you’re okay with the types being possibly incorrect in rare cases (aren’t we only talking about default exports here?) then the problem is solved.

So it’s really a question of what your tolerance is. If you truly wanted to ensure correct types 100% of the time, then I’d think you would always read the accompanying JavaScript file and not simply trust that the type definitions alongside it are correct. But that’s not what TypeScript currently does, so we’re already compromising on correctness in favor of performance at least somewhat.

GeoffreyBooth commented 9 months ago

Actually to amend my previous comment slightly, what if:

  1. tsc is updated ASAP to start emitting unambiguous type definition files; and
  2. Ambiguous type definition files are assumed to be CommonJS, however if a type error is thrown that could be caused by an ambiguous type definition file, the accompanying JavaScript file is read to resolve the ambiguity.

This would close the edge case at the cost of performance, but only under a condition that would be rare to occur (someone relying on detection enabled by default, and who generated the type definition files using an old version of TypeScript). The vast majority of the time would be either correctly-guessed ambiguous type definition files that are CommonJS, or unambiguous type definition files.