nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

remove the warning about performance hit in --experimental-detect-module #52803

Open: mcollina opened this issue 6 months ago

mcollina commented 6 months ago

I did some experiments with --experimental-detect-module and I've found that the added cost of a "typeless" package.json (one with no "type" field) is almost negligible, around 1ms on my system.

I recommend we remove the warning.

mcollina commented 6 months ago

cc @GeoffreyBooth @joyeecheung

GeoffreyBooth commented 6 months ago

It’s certainly much reduced since the initial commit where that warning was added; probably next to nothing for CommonJS files, as that’s what we’ve optimized for. I think it’s probably still a noticeable impact for ESM files that are ambiguous (like a .js file using import/export where the package.json has no type field). Did you measure that case?

mcollina commented 6 months ago

yes... 1ms on my machine.

GeoffreyBooth commented 6 months ago

Well that’s great news, I wasn’t expecting that. 1ms for what, a single file? Or an entire app, like if you had a thousand ESM files in a "type": "module" scope and you removed the type field?

I guess the question then is what are the downsides of not trying to discourage detection? If I remember correctly, the primary intent with detection was to cover two cases:

The thinking was that in these cases the performance hit of detection was outweighed by the UX improvement of having the my-node-script command work, or having the beginner’s code work without first needing to learn about .mjs or type; but that, aside from these niche cases, most apps could and should specify the type field to avoid the performance penalty.

But if there is no performance hit, or a negligible one, then why wouldn’t everyone just rely on detection all the time? Why bother setting the type field? I feel like there must be reasons that people would want to still do so, like compatibility with build or linting tools maybe; but I don’t know concrete examples. Maybe there aren’t any, or not any compelling ones, and we shouldn’t bother trying to encourage the type field or .mjs/.cjs? It feels wrong even to suggest that, but off the top of my head I can’t think of the argument why not.

mcollina commented 6 months ago

I would not push our luck too much: at 1ms per module, 1000 modules add up to 1s, and I've seen apps with that many.

I would likely not emit the warning for the top level/entrypoint, but still for everything inside node_modules.

GeoffreyBooth commented 6 months ago

I would likely not emit the warning for the top level/entrypoint, but still for everything inside node_modules.

The problem there is that a warning for node_modules is really annoying, as the code in there is generally outside of the user’s control. Like, what are they supposed to do to address the warning: patch the dependency? Ask its maintainer to add the type field? Choose something else?

We discussed warning on node_modules early on and we were thinking that it wouldn’t matter so much for there because library authors would hear from users really fast if they published libraries that only work on Node versions with detection enabled by default. Eventually once detection has been enabled by default for years and the older versions of Node are EOL, that might be more of a reason to warn; though who knows what assumptions we should make by then. Maybe by 2027 we flip to attempting to parse as ESM first and fall back to CommonJS, if the ESM migration is finally over the hump by then, and so the only detection performance penalty is for CommonJS files relying on it.

mcollina commented 6 months ago

Then not warning at all is actually ok.

RomainLanz commented 6 months ago

I have run a benchmark on one of my "huge" projects using ESM on Windows. Here are the results.

Node 20 with "type": "module"

Benchmark 1: node bin/server.js
  Time (mean ± σ):     728.4 ms ±  10.6 ms    [User: 87.5 ms, System: 42.2 ms]
  Range (min … max):   713.6 ms … 744.4 ms    10 runs

Node 22 with "type": "module"

Benchmark 1: node bin/server.js
  Time (mean ± σ):     739.5 ms ±  35.4 ms    [User: 78.1 ms, System: 34.4 ms]
  Range (min … max):   644.4 ms … 767.0 ms    10 runs

Node 22 with --experimental-detect-module

Benchmark 1: node --experimental-detect-module .\bin\server.js
  Time (mean ± σ):     744.8 ms ±  16.2 ms    [User: 118.8 ms, System: 37.5 ms]
  Range (min … max):   728.8 ms … 785.5 ms    10 runs

GeoffreyBooth commented 6 months ago

I have run a benchmark on one of my “huge” project using ESM on Windows

How many ESM files is this, and how many lines of code?

I also don’t quite know what to make of these results. The Node 20 numbers are interesting but not really relevant; I’m comparing Node 22 with and without the flag. If I wanted to boil it down to “here’s how much slower the flag makes the app,” am I comparing “Time (mean),” so 739.5 ms to 744.8 ms, or 0.7% slower? Or is it more relevant to compare the “User” numbers, 78.1 ms to 118.8 ms, so about 52% more CPU time with the flag (equivalently, 34% less without it)?
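A percentage difference depends on which run is treated as the baseline, which is easy to mix up with CPU-time figures like these. The small sketch below is plain arithmetic on the Node 22 numbers quoted above (nothing Node-specific; `percentChange` is just an illustrative helper) and prints the change in both directions:

```javascript
// Relative change of `value` against `baseline`, as a percentage.
function percentChange(baseline, value) {
  return ((value - baseline) / baseline) * 100;
}

// Node 22 figures from the runs above: without vs. with --experimental-detect-module.
const withoutFlag = { wallMs: 739.5, userMs: 78.1 };
const withFlag = { wallMs: 744.8, userMs: 118.8 };

for (const key of ["wallMs", "userMs"]) {
  const slower = percentChange(withoutFlag[key], withFlag[key]); // baseline: no flag
  const faster = percentChange(withFlag[key], withoutFlag[key]); // baseline: flag
  console.log(`${key}: +${slower.toFixed(1)}% with the flag, ${faster.toFixed(1)}% without it`);
}
```

Against the no-flag baseline, user time rises by about 52%; the 34% figure is the same gap measured from the flagged run.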

Or is what’s really happening here that the extra CPU cost of detection is irrelevant because the async file I/O takes so much longer? As in, without the flag the CPU is more idle than with the flag, but in the end the process takes almost exactly the same amount of wallclock time because the bottleneck is the I/O and not the CPU?

joyeecheung commented 6 months ago

The warning is defending against the case where there is a package.json, but it doesn’t contain “type”: “module” even though the .js modules inside are using ESM syntax. To hit the case it is defending against, you will need to recursively remove all the “type”: “module” entries in your package.json files (assuming you have any, and that most of your dependencies use real ESM and count on this to work) and run the app with the flag; then compare that with the case where the detection flag is unnecessary (when you still have the type field everywhere). You can’t use a flagless run of the type-less setup as the baseline, because without the flag that module graph would not load at all currently.

RomainLanz commented 6 months ago

How many ESM files is this, and how many lines of code?

I am not sure how I can count the loaded files, since I am basically starting one of my applications and killing it once the HTTP server starts (so it measures the initialization time of the application).

I could re-run it many times to see whether the numbers differ and whether the "User" number seems related to removing the type field from the package.json.

you will need to recursively remove all the “type” : “modules” entry in your package.jsons

Do you mean I should also remove it from all my dependencies to create a fair test?

GeoffreyBooth commented 6 months ago

@RomainLanz I just mean count the number of JavaScript files and lines of code in your project. So delete/move away node_modules and then run commands to count the number of .js files, and sum the lines of code. (Not sure what those commands would be for Windows but I’m sure you can Google it.)

The benchmarks that we want to compare are:

Assuming you probably have only one package.json for your project, that’s all you need to do. I wouldn’t modify anything in your node_modules, as typical users realistically wouldn’t do that. If you’re using Hyperfine, please include the summary lines at the end (“such and such is 2x faster than such and such”).

joyeecheung commented 6 months ago

Do you mean I should also remove it from all my dependencies to create a fair test?

I would say yes, because removing the warning effectively encourages packages to publish without adding a type field to their package.json, and that’s what we don’t want. The warning would show up in their CI and keep packages from accidentally publishing without the type field. Otherwise it could very well be interpreted as “it’s fine to publish ESM in .js without a type field now!”, and the ecosystem would start to get slower as that becomes the norm and Node.js commonly has to second-guess the module type.

mcollina commented 6 months ago

Usually, the community is pretty good at correcting those things, so I don't expect it to be a problem.

targos commented 6 months ago

I don't think the ecosystem is going to start relying on automatic detection. This will break TypeScript, ESLint, etc.

joyeecheung commented 6 months ago

If we are confident that the ecosystem will add the type field, then those who do won’t be hitting the warning anyway. Either enough people will rely on detection, in which case we should prevent it from becoming the norm, or not enough people will, in which case only very few people will be bothered by this warning and it’s harmless to keep.

mcollina commented 6 months ago

Either enough people will rely on it so we should prevent it from being a norm, or not enough people will rely on it then only very few people get to be bothered by this warning and it’s harmless to keep.

Well, my main concern is developers who do npm init for a new app, and I would prefer they not see a warning for a few millis of delay. The DX benefits are not worth the warning.

GeoffreyBooth commented 6 months ago

And not to sound like a heretic on performance, but do we generally warn on performance concerns? Like there are all sorts of slow patterns that we could warn users about—Warning: This .forEach would be faster as for ... of—but I feel like in general we leave it as the user’s responsibility to write performant code. We try to nudge them in the performant direction, for sure, by making the performant approach the easiest or default option when possible; but if something works, and isn’t unsafe, is that worth warning about?

(Aside from experimental warnings that serve the purpose of warning about potential breaking changes; we definitely still want those warnings.)

RomainLanz commented 6 months ago

I just mean count the number of JavaScript files and lines of code in your project. So delete/move away node_modules

That would be highly complicated to count. As I said, it is a real-world project that depends heavily on third-party libraries (coming from node_modules). It is an AdonisJS application with many addons/providers that is "slow" to start either way (with or without "type": "module" in the package.json). I can provide some examples if you want to run your own benchmark.

I shared the numbers above to show that there is no real difference with and without the flag, which is why I believe we can remove the warning.

Well, my main concern is developers who do npm init for a new app, and I would prefer they not see a warning for a few millis of delay. The DX benefits are not worth the warning.

Will the warning stay when the feature becomes unflagged?

GeoffreyBooth commented 6 months ago

That would be highly complicated to count. As said, it is a real world project that depends a lot on third party libraries (coming from the node_modules).

If you can run Bash, you can count them via:

find . -type f -name "*.js" -not -path "*/node_modules/*" | wc -l

And count lines of code via:

find . -type f -name "*.js" -not -path "*/node_modules/*" -print0 | xargs -0 wc -l

I’m excluding node_modules because I don’t expect users to be using detection for their dependencies.

Will the warning stay when the feature become unflagged?

The warning about performance would presumably stay, yes, because it doesn’t relate to the feature’s experimental status.

joyeecheung commented 6 months ago

Well, my main concern is developers who do npm init for a new app, and I would prefer they not see a warning for a few millis of delay. The DX benefits are not worth the warning.

Shouldn't the proper solution to that problem be including "type" in npm init?
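Adding the field at creation time is a small change to package.json. As a rough sketch of what an init tool could do (the `ensureTypeField` helper is hypothetical, not part of npm or Node.js), the code below adds "type" only when it is missing and keeps the other fields intact. Recent npm versions can also do this from the command line with npm pkg set type=module.

```javascript
const fs = require("node:fs");
const path = require("node:path");

const VALID_TYPES = new Set(["module", "commonjs"]);

// Hypothetical helper: add a "type" field to the package.json in `dir`
// if it has none. Other fields are preserved; an existing "type" is
// never overwritten. Returns true if the file was changed.
function ensureTypeField(dir, type = "module") {
  if (!VALID_TYPES.has(type)) {
    throw new TypeError(`"type" must be "module" or "commonjs", got ${JSON.stringify(type)}`);
  }
  const file = path.join(dir, "package.json");
  const pkg = JSON.parse(fs.readFileSync(file, "utf8"));
  if (Object.prototype.hasOwnProperty.call(pkg, "type")) return false;
  pkg.type = type;
  // Rewrites with 2-space indentation; the original formatting is not preserved.
  fs.writeFileSync(file, JSON.stringify(pkg, null, 2) + "\n");
  return true;
}
```

Leaving an existing "type" untouched matters because a package that deliberately declares "commonjs" should not be flipped to ESM.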

GeoffreyBooth commented 6 months ago

Shouldn't the proper solution to that problem be including "type" in npm init?

https://github.com/nodejs/TSC/issues/1445#issuecomment-1742118965

joyeecheung commented 1 month ago

npm is considering adding the type field to npm init: https://github.com/npm/init-package-json/pull/302

(Although the general stance from npm seems to be that people shouldn't really use npm init for anything other than basic apps; for real apps they should use the create- packages, which should configure the type field properly.)

I did some local benchmarking out of curiosity: currently, loading an ESM graph with detection is about 45% slower than loading the same graph without detection (i.e. after just adding "type": "module" to package.json) https://github.com/nodejs/node/pull/55238

Running the fixtures from the benchmarks locally, the numbers look like this:

$ hyperfine "../node/out/Release/node load.js"
Benchmark 1: ../node/out/Release/node load.js
  Time (mean ± σ):      1.799 s ±  0.047 s    [User: 1.827 s, System: 0.636 s]
  Range (min … max):    1.738 s …  1.875 s    10 runs

$ echo '{ "type": "module" }' > package.json
$ hyperfine "../node/out/Release/node load.js"
Benchmark 1: ../node/out/Release/node load.js
  Time (mean ± σ):      1.135 s ±  0.027 s    [User: 1.315 s, System: 0.491 s]
  Range (min … max):    1.088 s …  1.195 s    10 runs
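The two runs above don't include Hyperfine's summary line, but a comparable "N ± M times faster" figure can be recomputed from the reported means and standard deviations. The sketch below assumes the two measurements are independent and uses standard first-order error propagation, which as far as I know is also how Hyperfine derives its relative-speed uncertainty; `relativeSpeed` is an illustrative helper, not Hyperfine code.

```javascript
// Ratio of two benchmark means plus a first-order (error-propagation) estimate
// of its standard deviation, assuming the two measurements are independent.
function relativeSpeed(slow, fast) {
  const ratio = slow.mean / fast.mean;
  const stddev = ratio * Math.hypot(slow.sd / slow.mean, fast.sd / fast.mean);
  return { ratio, stddev };
}

// Mean ± σ in seconds, taken from the two Hyperfine runs above.
const withDetection = { mean: 1.799, sd: 0.047 }; // first run: no "type" field, detection needed
const withTypeField = { mean: 1.135, sd: 0.027 }; // second run: "type": "module"

const { ratio, stddev } = relativeSpeed(withDetection, withTypeField);
console.log(`"type": "module" run is ${ratio.toFixed(2)} ± ${stddev.toFixed(2)} times faster`);
```

For these two local runs the ratio comes out at roughly 1.59 ± 0.06.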

mcollina commented 3 weeks ago

@joyeecheung good work. It seems the overhead gets worse the more files there are.

Should I close this then?

joyeecheung commented 3 weeks ago

In addition to the performance hit, I think the warning could also be updated a bit to remind users about the possibility of misinterpretation and to be more explicit about what's causing the ESM interpretation. For example, if a beginner attempts to throw a top-level await into an existing CommonJS file:

const fs = require('fs');
await fs.promises.readFile('./README.md');

Previously you got

await fs.promises.readFile('./README.md');
^^^^^

SyntaxError: await is only valid in async functions and the top level bodies of modules

Now you get

const fs = require('fs');
           ^

ReferenceError: require is not defined in ES module scope, you can use import instead

At the very least, the second hint is confusing, or not as practical, when they are already modifying a CommonJS codebase. Today it's still very common for CommonJS codebases to have a package.json without "type", so it's easy for them to fall into this, whereas ESM codebases that use .js have been setting "type": "module" anyway, so everything would have been clearer there. It would be useful for them to learn why require is now suddenly unavailable in this file. This also applies to other less obvious syntax that can trigger the ESM interpretation (e.g. const module = ..).
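Why do these particular files get interpreted as ESM? As I understand the current design (not verified here against Node's source), detection first compiles a file as CommonJS and only retries it as ESM when that compile fails with certain syntax errors that would be valid in a module. The sketch below reproduces just that first step with the real `vm.compileFunction` API, compiling source inside a function that has the same parameters as Node's CommonJS module wrapper. The `cjsSyntaxError` helper is hypothetical, and the ESM retry is not attempted.

```javascript
const vm = require("node:vm");

// Parameters of Node's CommonJS module wrapper function.
const CJS_WRAPPER_PARAMS = ["exports", "require", "module", "__filename", "__dirname"];

// Returns null if `source` compiles as a CommonJS module body, otherwise the
// SyntaxError message. Compiling does not execute the code.
function cjsSyntaxError(source) {
  try {
    vm.compileFunction(source, CJS_WRAPPER_PARAMS);
    return null;
  } catch (err) {
    if (err && err.name === "SyntaxError") return err.message;
    throw err;
  }
}

const samples = {
  "plain CommonJS": "const fs = require('fs');",
  "top-level await": "const fs = require('fs');\nawait fs.promises.readFile('./README.md');",
  "const module": "const module = {};", // redeclares a wrapper parameter
  "import statement": "import fs from 'fs';",
};

for (const [label, src] of Object.entries(samples)) {
  console.log(`${label}: ${cjsSyntaxError(src) ?? "compiles as CommonJS"}`);
}
```

Only the first sample compiles as CommonJS. The top-level await and `const module` cases fail for reasons that have nothing to do with ESM intent, which is why a hint pointing at the cause would help.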