Another side effect of moving ES loaders to a separate thread: updating `require.extensions` in an ES loader no longer has any effect, meaning that `node --loader=ts-node/esm` no longer functions properly to load both CJS and ESM TypeScript.

This makes it significantly more challenging to have loaders that work for both ESM and CJS, effectively breaking the ESM on-ramp for those of us supporting both modes. Now you need to know ahead of time what the module type is (or write the transpiled source to another path and return a short-circuit URL to that instead), rather than being able to have a single loader capable of handling both.
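For context, a minimal sketch of the kind of dual-mode loader this breaks (hypothetical file name and a stand-in `transpile()`; not ts-node's actual code): the same file registers a `require.extensions` handler for CJS and a `load` hook for ESM, and with hooks now evaluated off-thread, the `require.extensions` patch only mutates the loaders thread's copy.

```js
// loader.mjs (hypothetical): a dual CJS/ESM TypeScript loader of the kind
// described above. transpile() is a stand-in for a real TS transform.
import { readFileSync } from 'node:fs';
import { createRequire } from 'node:module';
import { fileURLToPath } from 'node:url';

const require = createRequire(import.meta.url);
const transpile = (source) => source; // stand-in for esbuild/tsc output

// CJS side: now that hooks run on a separate thread, this mutates that
// thread's copy of require.extensions, so the main thread never sees it.
require.extensions['.ts'] = (module, filename) => {
  module._compile(transpile(readFileSync(filename, 'utf8')), filename);
};

// ESM side: still works, since hooks run on the loaders thread by design.
export async function load(url, context, nextLoad) {
  if (url.endsWith('.ts')) {
    const source = readFileSync(fileURLToPath(url), 'utf8');
    return { format: 'module', source: transpile(source), shortCircuit: true };
  }
  return nextLoad(url, context);
}
```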
Verified that `node --loader=ts-node/esm foo.ts` stopped working on 4667b07cd2d6eb0295d6cd1f40163ef14ce538b0.
cc: @nodejs/loaders
As stated in the ESM doc, async activity like `console.log` (which is specifically cited in the caveat) is not guaranteed to run. Please try other methods, such as `fs.writeSync`, to confirm.
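For example, a minimal sketch of that kind of check inside a hook:

```js
// Inside a hook: write synchronously to fd 1 (stdout), bypassing the async
// console machinery that isn't guaranteed to flush before exit.
import { writeSync } from 'node:fs';

export async function resolve(specifier, context, nextResolve) {
  writeSync(1, `resolve hook reached: ${specifier}\n`);
  return nextResolve(specifier, context);
}
```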
I believe this is not a dupe of #47615 because it specifically provides `execArgv`, thus avoiding the fork-bomb.
@JakobJingleheimer I've tried `fs.writeSync()` and `process._rawDebug()` with no luck.
I'm not sure how it would work in the current design. There is (by design) one loader process for the whole Node.js process. In the example above, we are asking Node.js to start another one just for that specific thread. We need a new API to handle this case, something like `new Worker(path, { loader })`, so that the loader thread could be reused.
> We need a new API to handle this case

What you are proposing seems like an explicit API. It seems like `--experimental-loader` passed to the `Worker` constructor should implicitly do that (without needing to expose more public APIs).
It's interesting because `--loader` doesn't work in the worker's `execArgv`, but it works fine if `--loader` is passed to the main script (the worker inherits the flag).
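Roughly the situation being described (hypothetical file names):

```js
// main.mjs (hypothetical names): the flag is accepted in execArgv but
// reportedly has no effect there, whereas running the parent itself with
// `node --experimental-loader ./loader.mjs main.mjs` works (inherited).
import { Worker } from 'node:worker_threads';

new Worker(new URL('./worker.mjs', import.meta.url), {
  execArgv: ['--experimental-loader', './loader.mjs'],
});
```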
> It's interesting because `--loader` doesn't work in the worker's `execArgv`, but it works fine if `--loader` is passed to the main script (the worker inherits the flag).
I don’t think of worker threads like child processes that can have their own Node flags. I feel like they should inherit whatever the parent process has, which is what @targos is describing here.
> I don't think of worker threads like child processes that can have their own Node flags.
That is the exact purpose of the `execArgv` option to the `Worker` constructor. Some flags are not supported there, but, for example, `--require` does appear to be. And the loaders docs do claim that they follow the pattern of `--require`.
> the loaders docs do claim that they follow the pattern of `--require`
Only in the sense that the way that chaining works for multiple invocations of `--require` is the same way that chaining works for multiple invocations of `--loader`. There's not much else that loaders share in common with `require`.

Unless there's some argument for why this should be considered a bug, I think we should just update the docs for https://nodejs.org/api/worker_threads.html#new-workerfilename-options to clarify that `--loader` is one of the flags not supported by that option.
I guess my question is, are there actually technical reasons why this can't be implemented?
> I guess my question is, are there actually technical reasons why this can't be implemented?
I was about to ask this too. If it works from main, I would've expected it to work here too, but if not, I imagine it would be fairly simple to support from the esm worker 🤔 (unless we get stuck in Inception)
> I guess my question is, are there actually technical reasons why this can't be implemented?
In the current behavior, there's a loaders thread and it works for both the main application thread and all worker threads (I think); within that loaders thread there's shared scope that the loaders can use, such as to keep caches and such. If we support `execArgv` then that would require spinning up a new loaders thread specific to that worker thread, because the `--loader` argument passed to `execArgv` might be different than that used for the root process. This presents a few issues:

- A separate loaders thread would need to be created for each worker spawned with its own `execArgv`, doubling the memory/processing requirements of each worker thread.

In short, it's a lot of added complexity for both Node and for loader authors, and for what? I don't see the benefit. If you want to create a new process with different loaders, spawn a child process.
@GeoffreyBooth I agree with your assessment of supporting `execArgv`: it would cost too much memory and CPU time. I still think allowing a custom loader to be provided for each worker is very useful for all sorts of use cases (mocking, transpiling, ...). Processes are more costly and harder to manage.

IMHO, we need a new API to handle this case, something like `new Worker(path, { loader })`, so that the loader thread could be reused across multiple workers.
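To sketch the shape being proposed (a hypothetical option, not an existing Node API):

```js
// Hypothetical option from the proposal above; this is not an existing
// Node API. Each worker names its own loader, but Node could still route
// all of them through one shared loaders thread.
import { Worker } from 'node:worker_threads';

const worker = new Worker('./task.mjs', { loader: './mock-loader.mjs' });
```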
> I still think allowing a custom loader to be provided for each worker is very useful for all sorts of use cases

What would a specific use case be? When would you want to do something that only a loader can do, that you wouldn't want applied to the main application thread?

Keep in mind there's also the `registerLoader` API in progress, which allows for registering loaders after startup.
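For illustration, a sketch of registering hooks after startup using `module.register()`, which, as far as I know, is what that work eventually shipped as:

```js
// Registering hooks at runtime instead of via CLI flags. module.register()
// (Node 20.6+) takes a hooks module specifier and a parent URL.
import { register } from 'node:module';

register('./loader.mjs', import.meta.url);

// Imports resolved after this point go through the registered hooks.
await import('./app.ts'); // assumes the hooks add TypeScript support
```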
> What would a specific use case be? When would you want to do something that only a loader can do, that you wouldn't want applied to the main application thread?
CLI tools can't really set Node.js options. `registerLoader` would solve all the same use cases as `execArgv` in this case. One can always set up a "jump" worker to get things done.

I didn't know it was in the works; is there a tracking issue/PR?
Looks like the PR is https://github.com/nodejs/node/pull/46826. I believe that would work for my use case. I still think `execArgv` should be an option, but I also don't care enough to push for it if there is an alternative that works for me.
> I believe that would work for my use case.

What is your use case?
> CLI tools can't really set Node.js options.

We've been discussing adding support for `.env` files: https://github.com/nodejs/node/pull/43973#issuecomment-1249549346. This would be a shorter-term solution for being able to set `NODE_OPTIONS` from a file. Farther down the road I think we'd like to support a new field in `package.json` (or possibly even a new config file) that would do the same thing.
> What is your use case?

Using loaders in a worker without requiring users to specify CLI flags. Using a shebang is not really an option for me either, and ideally I don't want to spawn a child process just to hide the CLI flag. The fact that the loader thread is shared between workers and the main thread is not a problem for me.
> Using loaders in a worker without requiring users to specify CLI flags.

Can you explain a bit more? Is this a CLI tool? What does it do? Why does it create worker threads?
I can't go into too much detail yet unfortunately, as it is for work and still unreleased. There is a CLI, but also an SDK. The loader itself is just used for hot module reloading, so I think `registerLoader()` will work just fine.
> The loader itself is just used for hot module reloading, so I think `registerLoader()` will work just fine.
If you find a good general solution for hot module reloading, please share it.

The `registerLoader` PR is by a new contributor and I'm not sure he's working on it any more, so if you have time to help get it finished, that would be greatly appreciated.
I will take a look at the `registerLoader` PR and can take it over if the current author is no longer interested in working on it.
> In the current behavior, there's a loaders thread and it works for both the main application thread and all worker threads (I think)

That's wrong. Each worker thread has its own loaders thread. Without evidence to the contrary, I think we should consider that there is a bug somewhere.
Found the bug: when a worker is created with `execArgv`, its child loaders worker doesn't inherit the CLI arguments, so we have:

```
main thread: []
main thread loaders: []
worker thread: ['--experimental-loader', './loader.mjs']
worker thread loaders: []
```
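A rough sketch of a diagnostic along these lines (hypothetical file names; the loaders-thread readout works because the loader file itself is evaluated on the loaders thread):

```js
// loader.mjs (hypothetical): evaluated on the loaders thread, so this
// reports that thread's view of the CLI arguments.
import { writeSync } from 'node:fs';

writeSync(1, `loaders thread: ${JSON.stringify(process.execArgv)}\n`);

export function resolve(specifier, context, nextResolve) {
  return nextResolve(specifier, context);
}
```

```js
// main.mjs (hypothetical): prints the argv seen by the main thread and by
// the worker it spawns with a loader in execArgv.
import { writeSync } from 'node:fs';
import { Worker, isMainThread } from 'node:worker_threads';

writeSync(1, `${isMainThread ? 'main' : 'worker'} thread: ${JSON.stringify(process.execArgv)}\n`);

if (isMainThread) {
  new Worker(new URL(import.meta.url), {
    execArgv: ['--experimental-loader', './loader.mjs'],
  });
}
```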
Edit: sorry, pinged in the wrong issue
> > In the current behavior, there's a loaders thread and it works for both the main application thread and all worker threads (I think)
>
> That's wrong. Each worker thread has its own loaders thread. Without evidence to the contrary, I think we should consider that there is a bug somewhere.
The intended behaviour is for there to be one ESM worker, regardless of other workers. I'm not sure about child processes; perhaps if they are separable they should have their own ESM worker, but that behaviour is not currently defined.
I'm in the same boat as others right now... If `import('blob:nodedata:fb4b1f92-71eb-42f9-b371-26ed3204a625')` had just worked as intended, then I wouldn't have to deal with this loader thing right now. I guess there is also the real possibility of not being able to fetch a blob URL from a different worker / off-thread script (https://github.com/nodejs/node/issues/46557).

If we are going to have multiple loader threads, then I feel like it would be impossible to have one centralized global blob store over all threads / workers if they aren't handled in core in a centralized manner where they can share resources.
@kirrg001 when behaviour changes between releases, step 1 is to check the release notes: https://nodejs.org/en/blog/announcements/v20-release-announce#custom-esm-loader-hooks-nearing-stable

> Custom ES module lifecycle hooks supplied via loaders (`--experimental-loader=./foo.mjs`) now run in a dedicated thread, isolated from the main thread.
It seems like this thread went off-topic and was closed without a resolution.
Can it be re-opened? ESM loaders/hooks are still not working in Worker threads.
@GeoffreyBooth FWIW this limitation is one of the more popular issues in tsx: https://github.com/privatenumber/tsx/issues/354

I will refer users here to share their use cases, since you seemed interested in learning more.
Yes, let's reopen this. Currently our only workaround is to stick to an older Node version.
I think this should be fixed by #50752 whenever that is finished (help would be very welcome!) as then there would be only one hooks thread regardless of however many worker threads there are, and so there’s no need to worry about what arguments to pass to new hooks threads.
I'm not using any worker threads, but just:

```ts
import fs from 'node:fs'

import LinguistLanguages from 'linguist-languages'
import type { SupportLanguage } from 'prettier'
```

https://github.com/un-ts/prettier/actions/runs/7152253084/job/19477387552?pr=327#step:5:96

See the source code at https://github.com/un-ts/prettier/blob/master/scripts/languages.ts
@JounQin This issue is about Workers in Node. Your comment seems unrelated.
@privatenumber I came from tsx; the error is the same.
I referred people using Workers from https://github.com/privatenumber/tsx/issues/354 to share their use-cases and help out with landing https://github.com/nodejs/node/pull/50752.
I'm not sure if linking to a CI error in a large repo and leaving it for others to debug is particularly helpful here.
I'm just sharing my case; if no one is interested, that's OK. The related code is quite simple, so it shouldn't matter that the whole app is large. Or maybe it can only be reproduced in a large-scale app.
Sharing my very straightforward use case here: I run my code using tsx, because the codebase is written in TypeScript. I'd expect my workers (which are also written in TypeScript) to be able to also leverage tsx. Right now, this does not seem to work.
Similarly, I started implementing tests in my TypeScript project using ava + tsx, and got the same error.
Another use case: this prevents me from spinning up a worker thread in a test with vitest as the test runner. The purpose of the test is to reproduce and fix a concurrency issue that only occurs when running a particular piece of code from a worker thread. Writing the worker thread test piece in JavaScript isn't sufficient as it needs to import modules written in TypeScript.
The suggested solution in the vitest issue tracker doesn't work anymore due to the limitation tracked here.
This might have been fixed by https://github.com/nodejs/node/pull/52706, which is on `main` and in nightlies but hasn't gone out yet in a release; it will be in the next 22.x release. That PR won't enable the code in the top post; the intent isn't to allow different loaders per thread, but rather to have whatever was loaded for the main thread be shared with the worker threads. So in other words, if you start your app via `node --import tsx/esm ./app.ts` and then `app.ts` spawns worker threads that are TypeScript files, hopefully it should work now. You wouldn't need to pass arguments to `new Worker`.
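In sketch form (hypothetical file names), the flow being described looks like this:

```js
// app.ts, started as: node --import tsx/esm ./app.ts
// With the single shared hooks thread, the tsx hooks registered for the
// main thread should also cover the worker, so a .ts worker entry point
// can work without passing any execArgv.
import { Worker } from 'node:worker_threads';

new Worker(new URL('./task.ts', import.meta.url));
```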
@cjihrig or someone else on this thread, do you mind giving it a try and reporting back?
cc @dygabo
Triggered by this issue, I submitted PR #53006 to clarify a few things related to the single customization hooks thread introduced by #52706.
✨ Edit: For those looking for a workaround in the meantime, @hi-ogawa has a working solution described here:
https://github.com/vitest-dev/vitest/issues/5757#issuecomment-2126095729
@GeoffreyBooth thanks for championing this for us! I've just tried out the newly released v22.2.0 as you suggested, with the following simplified repro, in which the script tries to spawn a worker thread from its own file path:
```ts
import { fileURLToPath } from "node:url";
import { Worker, isMainThread } from "node:worker_threads";

const THREAD_COUNT = 2;

if (isMainThread) {
  console.log("Main thread running");
  const workers = [...new Array(THREAD_COUNT)].map(() => {
    const worker = new Worker(fileURLToPath(import.meta.url));
    // Capture threadId immediately after spawning, while the worker is alive
    const threadId = worker.threadId;
    worker.on("exit", (exitCode: number) => {
      console.log(`Worker ${threadId} exited with code ${exitCode}`);
    });
    return worker;
  });
  // The real application would post messages to the worker threads here
} else {
  console.log("Doing work");
}
```
Unfortunately I found that for both Node v22.1.0 and v22.2.0, executing this with `npx tsx src/index.ts` has the same result: it fails and logs the following:

```
Main thread running
node:internal/event_target:1090
  process.nextTick(() => { throw err; });
  ^
TypeError [ERR_UNKNOWN_FILE_EXTENSION]: Unknown file extension ".ts" for /app/src/index.ts
```
I was executing this in an ESM package (with `"type": "module"`), but changing it to `"type": "commonjs"` didn't appear to change anything.

I'm not familiar with the internals here, but I'm wondering if this is a consequence of not being able to modify `require.extensions` as per @isaacs' comment above, or something similar (e.g. would it work as expected with a custom loader for regular `.js` files, and it's just `.ts` that it doesn't recognise now?).
I'm getting `ERR_UNKNOWN_FILE_EXTENSION` for Node v20.13.1. For Node v22.2.0, I'm getting timeouts before the scripts are even run.

Edit: For Node v22.1.0 I also get `ERR_UNKNOWN_FILE_EXTENSION`. So the error changed between v22.1.0 and v22.2.0.
Update

I was missing something: you need to import the code that registers the hooks inside the worker. Once I did that, the hooks started working.
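For instance (a sketch; `register-hooks.mjs` is a hypothetical module that calls `module.register()`):

```js
// register-hooks.mjs (hypothetical): runs once per importing thread.
import { register } from 'node:module';

register('./loader.mjs', import.meta.url);
```

```js
// worker.mjs: re-import the registration module inside the worker so the
// hooks are set up for this thread as well.
import './register-hooks.mjs';
```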
Am I missing something, or are single-thread hooks not working in Node v22.2.0?
According to the changelog, this landed in v22.2.0:

> [8fbf6628d6] - module: have a single hooks thread for all workers (Gabriel Bota) https://github.com/nodejs/node/pull/52706
However, I can't seem to make this work. I'm trying to add a loader for HTTP imports, using the example in the Node.js docs (modified for HTTP instead of HTTPS). It works on the main thread, but when I spawn a worker I get the following error:

```
Error [ERR_UNSUPPORTED_ESM_URL_SCHEME]: Only URLs with a scheme in: file, data, and node are supported by the default ESM loader. Received protocol 'http:'
```

I actually do want different loaders for the main thread and the worker thread. My use case is for React 19 server components. But I can probably work around the single-hooks-thread limitation if it worked.
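For reference, hooks along these lines, adapted from the HTTPS example in the Node.js docs to plain HTTP as described above (a sketch, not the commenter's exact file):

```js
// http-hooks.mjs (a sketch): allow http:// specifiers and fetch module
// source over HTTP, following the pattern of the docs' HTTPS example.
import { get } from 'node:http';

export function resolve(specifier, context, nextResolve) {
  if (specifier.startsWith('http://')) {
    // Short-circuit so the default resolver doesn't reject the http: scheme;
    // the ERR_UNSUPPORTED_ESM_URL_SCHEME above is the default resolver
    // running in a worker where these hooks are not active.
    return { url: specifier, shortCircuit: true };
  }
  return nextResolve(specifier, context);
}

export function load(url, context, nextLoad) {
  if (url.startsWith('http://')) {
    return new Promise((resolve, reject) => {
      get(url, (res) => {
        let source = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => (source += chunk));
        res.on('end', () => resolve({ format: 'module', source, shortCircuit: true }));
      }).on('error', reject);
    });
  }
  return nextLoad(url, context);
}
```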
I see most people here are trying to use tsx to import a worker that is a TypeScript file. That still does not work; Node will throw an error:

```
TypeError [ERR_UNKNOWN_FILE_EXTENSION]: Unknown file extension ".tsx"
```

The workaround suggested here does work. However, when I pass conditions to my worker, that does not seem to work with `tsImport`, e.g.:

```ts
const worker = new TsWorker(new URL("./ClientRenderer.tsx", import.meta.url), {
  execArgv: ["--conditions", "node"], // <-- doesn't seem to be working with suggested workaround.
});
```
If you are using tsx, you can run worker threads as follows:

1. Remove `"type": "module"` from `package.json`.
2. Write imports without the `.ts` extension, using either no extension or `.js`:

   ```ts
   import { some } from "./my_dir/my_file"
   import { some } from "./my_dir/my_file.js"
   ```

3. Create the Worker using tsx, with the `execArgv` param:

   ```ts
   const worker = new Worker(require.resolve(workerPath), { execArgv: ["--require", "tsx/cjs"] })
   ```

Like this if it helps you 👍
The 'bug' was reported at https://github.com/privatenumber/tsx/issues/354#issue-1963321961 too.
Remove type: module
?? That's really counterproductive.
BTW, the `--conditions` issue is a separate issue with Node itself: https://github.com/nodejs/node/issues/50885
Version
20.0.0 (tested on main at 36e4e3d86b31de7f38125aacad810e40295fd2ae too)
Platform
macOS but probably all platforms
Subsystem
esm
What steps will reproduce the bug?
The following works in Node 18 and 19, but not 20:
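The repro files weren't preserved here; based on the rest of the thread (a worker spawned with `--experimental-loader` in `execArgv`), they were presumably along these lines (a reconstruction, not the original code):

```js
// loader.mjs (reconstruction): throw from a hook so it's obvious whether
// the loader ran at all.
export function resolve() {
  throw new Error('loader was called');
}
```

```js
// main.mjs (reconstruction): spawn a worker with the loader passed via
// execArgv; worker.mjs can be an empty file.
import { Worker } from 'node:worker_threads';

new Worker(new URL('./worker.mjs', import.meta.url), {
  execArgv: ['--experimental-loader', './loader.mjs'],
});
```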
Run `node main.mjs`. In Node 18 and 19, the exception in the loader is thrown. In Node 20 it is not. The problem is not unique to throwing; I haven't been able to see `console.log()` or any other indication that the loader is being called.

How often does it reproduce? Is there a required condition?
100% of the time for me.
What is the expected behavior? Why is that the expected behavior?
I expect it to work as it did in earlier versions of Node. This is the expected behavior because right now it doesn't seem to work at all.
What do you see instead?
See the description of the bug above.
Additional information
Just a guess, but I'm assuming this is related to https://github.com/nodejs/node/pull/44710.