ESM loaders cannot be defined via `Worker` option `execArgv` in v20

cjihrig commented 1 year ago

Version

20.0.0 (tested on main at 36e4e3d86b31de7f38125aacad810e40295fd2ae too)

Platform

macOS but probably all platforms

Subsystem

esm

What steps will reproduce the bug?

The following works in Node 18 and 19, but not 20:

// main.mjs
import { Worker } from 'node:worker_threads';

new Worker('./worker.js', {
  execArgv: ['--experimental-loader', './loader.mjs'],
});

// worker.js
'use strict';

async function main() {
  await import('node:fs');
}

main();

// loader.mjs
export function resolve (specifier, context, nextResolve) {
  throw new Error('boom');
}

Run: node main.mjs. In Node 18 and 19, the exception in the loader is thrown. In Node 20 it is not. The problem is not unique to throwing. I haven't been able to see console.log() or any other indication that the loader is being called.

How often does it reproduce? Is there a required condition?

100% of the time for me.

What is the expected behavior? Why is that the expected behavior?

I expect it to work as it did in earlier versions of Node. This is the expected behavior because right now it doesn't seem to work at all.

What do you see instead?

See the description of the bug above.

Additional information

Just a guess, but I'm assuming this is related to https://github.com/nodejs/node/pull/44710.

isaacs commented 1 year ago

Another side effect of ES loaders to a separate thread: updating require.extensions in an es loader no longer has any effect, meaning that node --loader=ts-node/esm no longer functions properly to load both cjs and mjs typescript.

This makes it significantly more challenging to have loaders that work for both esm and cjs, effectively breaking the esm on-ramp for those of us supporting both modes, since now you need to know ahead of time what the module type is (or write the transpiled source to another path and return a short circuit url to that instead), rather than being able to have a single loader capable of handling both.

isaacs commented 1 year ago

Verified that node --loader=ts-node/esm foo.ts stopped working on 4667b07cd2d6eb0295d6cd1f40163ef14ce538b0

cjihrig commented 1 year ago

cc: @nodejs/loaders

JakobJingleheimer commented 1 year ago

As stated in the ESM doc, async activity like console.log (which is specifically cited in the caveat) is not guaranteed to run.

Please try other methods, such as fs.writeSync to confirm.

I believe this is not a dupe of #47615 because it specifically provides execArgv, thus avoiding the fork-bomb.

cjihrig commented 1 year ago

@JakobJingleheimer I've tried fs.writeSync() and process._rawDebug() with no luck.

mcollina commented 1 year ago

I’m not sure how would it work in the current design. There is (by design) one loader process for the whole Node.js process. In the example above, we are asking Node.js to start another one just for that specific thread.

We need a new API to handle this case, something that:

allows us to create a "loaders worker", potentially exposing what we are already doing in core
pass that loader down to new Worker(path, { loader } ), so that the loader thread could be reused.

cjihrig commented 1 year ago

We need a new API to handle this case

What you are proposing seems like an explicit API. It seems like --experimental-loader passed to the Worker constructor should implicitly do that (without needing to expose more public APIs).

targos commented 1 year ago

It's interesting because it doesn't work --loader in the worker's execArgv, but it works fine if --loader is passed to the main script (the worker inherits the flag).

GeoffreyBooth commented 1 year ago

It’s interesting because it doesn’t work --loader in the worker’s execArgv, but it works fine if --loader is passed to the main script (the worker inherits the flag).

I don’t think of worker threads like child processes that can have their own Node flags. I feel like they should inherit whatever the parent process has, which is what @targos is describing here.

cjihrig commented 1 year ago

I don’t think of worker threads like child processes that can have their own Node flags.

That is the exact purpose of the execArgv option to the Worker constructor. Some flags are not supported there, but, for example, --require does appear to be. And the loaders docs do claim that they follow the pattern of --require

GeoffreyBooth commented 1 year ago

the loaders docs do claim that they follow the pattern of --require

Only in the sense that the way that chaining works for multiple invocations of --require is the same way that chaining works for multiple invocations of --loader. There’s not much else that loaders share in common with require.

Unless there’s some argument for why this should be considered a bug, I think we should just update the docs for https://nodejs.org/api/worker_threads.html#new-workerfilename-options to clarify that --loader is one of the flags not supported by that option.

cjihrig commented 1 year ago

I guess my question is, are there actually technical reasons why this can't be implemented?

JakobJingleheimer commented 1 year ago

I guess my question is, are there actually technical reasons why this can't be implemented?

I was about to ask this too. If it works from main, I would've expected it to work here too, but if not, I imagine it would be fairly simple to support from the esm worker 🤔 (unless we get stuck in Inception)

GeoffreyBooth commented 1 year ago

I guess my question is, are there actually technical reasons why this can’t be implemented?

In the current behavior, there’s a loaders thread and it works for both the main application thread and all worker threads (I think); within that loaders thread there’s shared scope that the loaders can use, such as to keep caches and such. If we support execArgv then that would require spinning up a new loaders thread specific to that worker thread, because the --loader argument passed to execArgv might be different than that used for the root process. This presents a few issues:

There are now multiple loaders threads, with multiple scopes, and so loader authors would need to be aware of this possibility. We would probably get calls for ways to communicate between loaders threads.
The number of loaders threads would increase linearly with the number of worker threads spawned with execArgv, doubling the memory/processing requirements of each worker thread.

In short, it’s a lot of added complexity for both Node and for loader authors, and for what? I don’t see the benefit. If you want to create a new process with different loaders, spawn a child process.

mcollina commented 1 year ago

@GeoffreyBooth I agree with your assessment with supporting execArgv: it will cost too much memory and CPU time. I still think allowing to provide a custom loader for each worker is very useful for all sorts of use cases (mocking, transpiling, ...). Processes are more costly and harder to manage.

IMHO, we need a new API to handle this case, something that:

allows us to create a "loaders worker", potentially exposing what we are already doing in core.
pass that loader down to new Worker(path, { loader } ), so that the loader thread could be reused across multiple workers.

GeoffreyBooth commented 1 year ago

I still think allowing to provide a custom loader for each worker is very useful for all sorts of use cases

What would a specific use case be? When would you want to do something that only a loader can do, that you wouldn’t want applied to the main application thread?

Keep in mind there’s also the registerLoader API in progress, that allows for registering loaders after startup.

mcollina commented 1 year ago

What would a specific use case be? When would you want to do something that only a loader can do, that you wouldn’t want applied to the main application thread?

CLI tools can't really set Node.js options. registerLoader would solve all the same use cases of execArgv in this case. One can always set up a "jump" worker to get things done. I didn't know it was in the works, is there a tracking issue/pr?

cjihrig commented 1 year ago

Looks like the PR is https://github.com/nodejs/node/pull/46826. I believe that would work for my use case.

I still think execArgv should be an option, but I also don't care enough to push for it if there is an alternative that works for me.

GeoffreyBooth commented 1 year ago

I believe that would work for my use case.

What is your use case?

CLI tools can’t really set Node.js options.

We’ve been discussing adding support for .env files: https://github.com/nodejs/node/pull/43973#issuecomment-1249549346. This would be a shorter-term solution for being able to set NODE_OPTIONS from a file. Farther down the road I think we’d like to support a new field in package.json (or possibly even a new config file) that would do the same thing.

cjihrig commented 1 year ago

What is your use case?

Using loaders in a worker without requiring users to specify CLI flags. Using a shebang is not really an option for me either, and ideally I don't want to spawn a child process just to hide the CLI flag. The fact that the loader thread is shared between workers and the main thread is not a problem for me.

GeoffreyBooth commented 1 year ago

Using loaders in a worker without requiring users to specify CLI flags.

Can you explain a bit more? Is this a CLI tool? What does it do? Why does it create worker threads?

cjihrig commented 1 year ago

I can't go into too much detail yet unfortunately, as it is for work and still unreleased. There is a CLI, but also an SDK. The loader itself is just used for hot module reloading, so I think registerLoader() will work just fine.

GeoffreyBooth commented 1 year ago

The loader itself is just used for hot module reloading, so I think registerLoader() will work just fine.

If you find a good general solution for hot module reloading, please share it.

The registerLoader PR is by a new contributor and I’m not sure he’s working on it any more, so if you have time to help get it finished, that would be greatly appreciated.

cjihrig commented 1 year ago

I will take a look at the registerLoader PR and can take it over if the current author is no longer interested in working on it.

targos commented 1 year ago

In the current behavior, there’s a loaders thread and it works for both the main application thread and all worker threads (I think)

That's wrong. Each worker thread has its own loaders thread. Without evidence of the contrary, I think we should consider that there is a bug somewhere.

targos commented 1 year ago

Found the bug: when a worker is created with execArgv, its child loaders worker doesn't inherit the CLI arguments, so we have:

main thread: [] main thread loaders: [] worker thread: ['--experimental-loader', './loader.mjs'] worker thread loaders: []

targos commented 1 year ago

Edit: sorry, pinged in the wrong issue

JakobJingleheimer commented 1 year ago

In the current behavior, there’s a loaders thread and it works for both the main application thread and all worker threads (I think)

That's wrong. Each worker thread has its own loaders thread. Without evidence of the contrary, I think we should consider that there is a bug somewhere.

The intended behaviour is for there to be 1 esm worker, regardless of other workers. I'm not sure about child process—perhaps if they are separable they should have their own esm worker, but that behaviour is not currently defined.

jimmywarting commented 1 year ago

I'm in the same boat as other right now... If import('blob:nodedata:fb4b1f92-71eb-42f9-b371-26ed3204a625') had just worked as intended, then i wouldn't have to deal with this loader thing right now... i guess it is also the real possibility of not also being able to fetch a blob url from a different worker / off -thread script (https://github.com/nodejs/node/issues/46557)

if we are going to have multiple loader threads then i feel like it would be impossible to have one centralized globally blob store over all threads / workers if they aren't handled in core in a centralized maner where they can share resources

JakobJingleheimer commented 1 year ago

@kirrg001 when behaviour changes between releases, step 1 is check the release notes:

https://nodejs.org/en/blog/announcements/v20-release-announce#custom-esm-loader-hooks-nearing-stable

Custom ES module lifecycle hooks supplied via loaders (--experimental-loader=./foo.mjs) now run in a dedicated thread, isolated from the main thread.

privatenumber commented 10 months ago

It seems like this thread went off-topic and was closed without a resolution.

Can it be re-opened? ESM loaders/hooks are still not working in Worker threads.

@GeoffreyBooth FWIW this limitation is of the more popular issues in tsx: https://github.com/privatenumber/tsx/issues/354

I will refer users to share their use-cases here since you seemed to be interested in learning more.

SynthLuvr commented 10 months ago

Yes let's reopen this. Currently our only workaround is to stick to an older node version.

GeoffreyBooth commented 10 months ago

I think this should be fixed by #50752 whenever that is finished (help would be very welcome!) as then there would be only one hooks thread regardless of however many worker threads there are, and so there’s no need to worry about what arguments to pass to new hooks threads.

JounQin commented 10 months ago

I'm not using any worker threads, but just:

import fs from 'node:fs'

import LinguistLanguages from 'linguist-languages'
import type { SupportLanguage } from 'prettier'

https://github.com/un-ts/prettier/actions/runs/7152253084/job/19477387552?pr=327#step:5:96

See source codes at https://github.com/un-ts/prettier/blob/master/scripts/languages.ts

privatenumber commented 10 months ago

@JounQin

This issue is about Workers in Node. Your comment seems unrelated.

JounQin commented 10 months ago

@privatenumber I came from tsx, the error is same.

privatenumber commented 10 months ago

I referred people using Workers from https://github.com/privatenumber/tsx/issues/354 to share their use-cases and help out with landing https://github.com/nodejs/node/pull/50752.

I'm not sure if linking to a CI error in a large repo and leaving it for others to debug is particularly helpful here.

JounQin commented 10 months ago

I'm just sharing my case, if no one is interested, that's OK. And the related codes is just simple, it doesn't matter if the whole app is large. Or it could only be reproduced on large scale app.

steve2507 commented 6 months ago

Sharing my very straightforward use case here: I run my code using tsx, because the codebase is written in typescript. I'd expect my workers (which are also written in typescript) to be able to also leverage tsx. Right now, this does not seem to work.

JoCat commented 6 months ago

Similarly, I started implementing testing in my typescript project using ava + tsx. And got the same error.

thesauri commented 5 months ago

Another use case: this prevents me from spinning up a worker thread in a test with vitest as the test runner. The purpose of the test is to reproduce and fix a concurrency issue that only occurs when running a particular piece of code from a worker thread. Writing the worker thread test piece in JavaScript isn't sufficient as it needs to import modules written in TypeScript.

The suggested solution in the vitest issue tracker doesn't work anymore due to the limitation tracked here.

GeoffreyBooth commented 5 months ago

This might have been fixed by https://github.com/nodejs/node/pull/52706, which is on main or nightly but hasn’t gone out yet in a release; it will be in the next 22.x release. That PR won’t enable the code in the top post; the intent isn’t to allow different loaders per thread, but rather to have whatever was loaded for the main thread be shared with the worker threads. So in other words, if you start your app via node --import tsx/esm ./app.ts and then app.ts spawns worker threads that are TypeScript files, hopefully it should work now. You wouldn’t need to pass arguments to new Worker.

@cjihrig or someone else on this thread, do you mind giving it a try and reporting back?

cc @dygabo

dygabo commented 5 months ago

triggered by this issue I submitted PR #53006 to clarify a few things related to the single customization hooks thread introduced by #52706

bradchristensen commented 5 months ago

✨ Edit: For those looking for a workaround in the meantime, @hi-ogawa has a working solution described here:

https://github.com/vitest-dev/vitest/issues/5757#issuecomment-2126095729

@GeoffreyBooth thanks for championing this for us! I've just tried out the newly released v22.2.0 as you suggested, with the following simplified repro, in which the script tries to spawn a worker thread from its own file path:

import { fileURLToPath } from "node:url";
import { Worker, isMainThread } from "node:worker_threads";

const THREAD_COUNT = 2;

if (isMainThread) {
  console.log("Main thread running");

  const workers = [...new Array(THREAD_COUNT)].map(() => {
    const worker = new Worker(fileURLToPath(import.meta.url));

    // Capture threadId immediately after spawning, while the worker is alive
    const threadId = worker.threadId;

    worker.on("exit", (exitCode: number) => {
      console.log(`Worker ${threadId} exited with code ${exitCode}`);
    });
    return worker;
  });

  // The real application would post messages to the worker threads here
} else {
  console.log("Doing work");
}

Unfortunately I found that for both Node v22.1.0 and v22.2.0, executing this with npx tsx src/index.ts has the same result - it fails and logs the following:

Main thread running

node:internal/event_target:1090
  process.nextTick(() => { throw err; });
                           ^
TypeError [ERR_UNKNOWN_FILE_EXTENSION]: Unknown file extension ".ts" for /app/src/index.ts

I was executing this in an ESM package (with { "type": "module" }), but changing it to "type": "commonjs" didn't appear to change anything.

I'm not familiar with the internals here but I'm wondering if this is a consequence of not being able to modify require.extensions as per @isaacs' comment above, or something similar (e.g. would it work as expected with a custom loader for regular .js files, and it's just .ts that it doesn't recognise now?)

SynthLuvr commented 5 months ago

I'm getting ERR_UNKNOWN_FILE_EXTENSION for node v20.13.1. For node v22.2.0, I'm getting timeouts before the scripts are even run.

Edit: For node v22.1.0 I also get ERR_UNKNOWN_FILE_EXTENSION. So the error changed between v22.1.0 and v22.2.0.

daniel-nagy commented 5 months ago

Update

I was missing something, you need to import the code that registers the hooks inside the worker. Once I did that the hooks started working.

Am I missing something, or are single thread hooks not working in Node v22.2.0?

According to the changelog, this landed in v22.2.0

[8fbf6628d6] - module: have a single hooks thread for all workers (Gabriel Bota) https://github.com/nodejs/node/pull/52706

However, I can't seem to make this work. I'm trying to add a loader for HTTP imports, using the example in the Nodejs docs (modified for HTTP instead of HTTPS). It works on the main thread but when I spawn a worker I get the following error.

Error [ERR_UNSUPPORTED_ESM_URL_SCHEME]: Only URLs with a scheme in: file, data, and node are supported by the default ESM loader. Received protocol 'http:'

I actually do want different loaders for the main thread and the worker thread. My use case is for React 19 server components. But I can probably workaround the single hooks thread limitation if it worked.

daniel-nagy commented 5 months ago

I see most people here are trying to use tsx to import a worker that is a TypeScript file. That still does not work. Node will throw an error

TypeError [ERR_UNKNOWN_FILE_EXTENSION]: Unknown file extension ".tsx"

The workaround suggested here does work. However, when I pass conditions to my worker that does not seem to work with tsImport. e.g.

const worker = new TsWorker(new URL("./ClientRenderer.tsx", import.meta.url), {
  execArgv: ["--conditions", "node"], // <-- doesn't seem to be working with suggested workaround.
});

Guihgo commented 5 months ago

If you are using tsx you can run work threads as follow:

Remove "type": "module", from package.json
Imports without extension .ts or .js
- Use: import {some} from "./my_dir/my_file"
- Don't use: import {some} from "./my_dir/my_file.js"

Create the Worker using tsx with execArgv param

const worker = new Worker(require.resolve(workerPath), { execArgv: ["--require", "tsx/cjs"] })

Like if its help you 👍

the 'bug' was reported at https://github.com/privatenumber/tsx/issues/354#issue-1963321961 too.

SystemParadox commented 5 months ago

Remove type: module?? That's really counterproductive.

daniel-nagy commented 5 months ago

BTW, the --conditions issue is a separate issue with Node itself https://github.com/nodejs/node/issues/50885

nodejs / node