nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
107.87k stars 29.72k forks source link

Task runner (node --run) should set lifecycle_event environment variable #52673

Open justinfagnani opened 6 months ago

justinfagnani commented 6 months ago

What is the problem this feature will solve?

Some npm scripts delegate to another task runner (like Wireit) and require the environment variable npm_lifecycle_event to be set in order to tell which script to run. Yarn and pnpm both set this variable.

The way a Wireit package.json is configured is like this:

{
  "scripts": {
    "build": "wireit",
  },
  "wireit": {
    "build": {
      "command": "tsc"
    }
  }
}

and it's run with npm run build, yarn run build, etc. So Wireit needs npm_lifecycle_event to find the script to run.

Supporting task runner like Wireit would be great because Wireit caches script results, so some invocations don't perform any work after checking the cache and the npm overhead could be noticeable.

Downstream issue: https://github.com/google/wireit/issues/1094

What is the feature you are proposing to solve the problem?

node --run set the npm_lifecycle_event environment variable to the current script.

What alternatives have you considered?

Node could set a different variable and task runners like Wireit could detect that also.

anonrig commented 6 months ago

node task runner is not a replacement for npm. Please take a look at the intentional limitations part in the documentation.

justinfagnani commented 6 months ago

node task runner is not a replacement for npm. Please take a look at the intentional limitations part in the documentation.

This issue isn't asking for a replacement to npm, but a way to know which script was run so that delegated task runners can work.

anonrig commented 6 months ago

@justinfagnani The limitations include any npm or other package managers specific environment variables. Your request is to add the npm_lifecycle_event environment variable right? We can introduce "lifecycle_event" environment variable but this won't solve your issue.

justinfagnani commented 6 months ago

A lifecycle_event variable would be great. Then task runners could look for that variable too. Right now there appears to be no way to know which script was requested.

anonrig commented 6 months ago

Makes sense. Looks like a good feature. I've reopened it and will work on it after https://github.com/nodejs/node/pull/52609 lands.

justinfagnani commented 6 months ago

Thank you @anonrig!

Looking at the Wireit source code, it looks like it would need an equivalent to npm_package_json (ie, just package_json) as well to identify the package.json file to read configuration from.

It also reads additional args, in npm via another env variable, but it might work with the additional args support you've already added (cc @aomarks). I realize there would be an understandable resistance to adding too many environment variables.

aomarks commented 6 months ago

Cool! Wireit maintainer here. I'm excited for node --run, and would love for wireit to support it.

What wireit needs to know

  1. Which script runner are we using (npm, node --run, pnpm, yarn)? So that we know where to look for the next 4 items.
  2. Which package.json are we working with?
  3. Which script is running?
  4. Should we run in --watch mode? (Wireit has its own watch mode which watches all files across the entire transitive dependency graph of a script, re-executing only each script as needed).
  5. Which additional arguments should we pass down to the process?

Example

npm, pnpm, and yarn all provide this information one way or another. It varies across package managers and across their versions. Here's an example of how it works for npm:

npm run myscript --watch -- arg1 arg2

This means "run myscript, with wireit's watch mode enabled, and pass arg1 and arg2 down to the process." The way we figure this out is:

  1. Read the npm_config_user_agent environment variable. It will be "npm/10.5.1 node/v22.0.0 darwin arm64 workspaces/false" in this example.

  2. Read the npm_package_json environment variable if set (npm 8 onwards). If it's not set, walk up the filesystem from the CWD until we find a package.json file. It will be "/path/to/package.json" in this example.

  3. Read the npm_lifecycle_event environment variable. It will be "myscript" in this example.

  4. Read the npm_config_watch environment variable. It will be "true" in this example.

  5. Read argv.slice(2). It will be ["arg1", "arg2"] in this example.

The relevant code is at https://github.com/google/wireit/blob/main/src/cli-options.ts

Suggestions

My thoughts about how this all affects node --run:

npm_lifecycle_event

I agree with the assessment above that setting npm_lifecycle_event or equivalent is the most important thing. That would unlock most of the functionality of wireit.

I would gently encourage actually just using the npm_lifecycle_event name instead of lifecycle_event, because it has sort of become a defacto standard, despite the npm_ prefix. npm, pnpm, yarn, wireit itself, and probably other runners all set npm_lifecycle_event. Another name is also fine, though, since we can just add special support for it in wireit.

npm_config_user_agent

The npm_config_user_agent environment variable would be very nice to set, since that would give us a clear signal of which runner we're dealing with, so that we can then interpret all the other environment data correctly. npm, pnpm, and yarn all set npm_config_user_agent today, so it would be nice to follow that defacto standard.

npm_package_json

The npm_package_json environment variable would be nice, but its only benefit for wireit would be a slight performance optimization, since we can always just look at the filesystem if it's not set.

--watch flag

Wireit has its own watch mode, which is enabled by setting the --watch flag. It's also likely we'll add more wireit flags in the future, things like --clean to bust the cache, or --verbose for more debugging info.

The way this works with npm is that any argument after npm run myscript but before -- is converted into the environment variable npm_config_<name>. I concede this is a bit odd, since it means something like npm run myscript --typo means the --typo flag is silently ignored. It does enable some useful functionality though, since it lets us distinguish runner flags from script flags.

Currently, node --run interprets all arguments before the -- as being arguments to node itself, so something like node --run myscript --watch -- arg1 arg2 errors.

We can't really use node --run myscript -- --watch arg1 arg2 as the syntax, because then we can't distinguish between arguments for wireit vs arguments for underlying script process.

I think the only syntax that would currently work is node --run myscript -- --watch -- arg1 arg2.

I'm not sure what I'd suggest here -- allowing delegated runner arguments would be really nice, and would match the behavior of npm/pnpm/yarn, but it does have that downside of invalid arguments getting ignored in some cases.

anonrig commented 6 months ago

@aomarks Thank you for the response. Sorry I couldn't respond in a timely manner, but I've opened a PR to add NODE_LIFECYCLE_EVENT environment variable: https://github.com/nodejs/node/pull/53032

anonrig commented 6 months ago

@aomarks I've opened one more pull-request to add NODE_RUN_PACKAGE_JSON_PATH https://github.com/nodejs/node/pull/53058 to node --run

npm_config_user_agent

Unfortunately, I don't think we should add a user_agent kind environment variable. I think it's unnecessary at this point. NODE_RUN_SCRIPT_NAME is only available when node --run is executed, and you can use the existence of that to validate if it's node --run that runs the context.

--watch

I didn't quite understand the --watch flag argument. Do you want to get access to any cli arguments passed after the script name and before the --? For example, for the following command node --run test --my-flag -- --help, do you want to know --my-flag in an environment variable?

aomarks commented 6 months ago

@aomarks I've opened one more pull-request to add NODE_RUN_PACKAGE_JSON_PATH #53058 to node --run

Awesome!

npm_config_user_agent

Unfortunately, I don't think we should add a user_agent kind environment variable. I think it's unnecessary at this point. NODE_RUN_SCRIPT_NAME is only available when node --run is executed, and you can use the existence of that to validate if it's node --run that runs the context.

👍

--watch

I didn't quite understand the --watch flag argument. Do you want to get access to any cli arguments passed after the script name and before the --? For example, for the following command node --run test --my-flag -- --help, do you want to know --my-flag in an environment variable?

Yes, exactly.

github-actions[bot] commented 2 days ago

There has been no activity on this feature request for 5 months. To help maintain relevant open issues, please add the https://github.com/nodejs/node/labels/never-stale label or close this issue if it should be closed. If not, the issue will be automatically closed 6 months after the last non-automated comment. For more information on how the project manages feature requests, please consult the feature request management document.