nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
106.19k stars 28.9k forks source link

Support .noderc or similar file-based initialization configurations? #53787

Open joyeecheung opened 1 month ago

joyeecheung commented 1 month ago

In https://github.com/nodejs/node/issues/52219 it was mentioned that for APM use cases it would be nice to have a way to register loaders without using command line flags. It occurred to me maybe what we need is an initialization config similar to the rc files out there for various applications.

For example in my .lldbinit I have these that extend LLDB to help me debug

plugin load /path/to/llnode.dylib
command script import /path/to/lldb_commands.py

It seems this is serving the same use case as the Node.js loader registration - run some scripts or register some plugin to Node.js before you actually start running it.

For loading loaders, I guess we can support something like a .noderc that is discovered when users start running Node.js, or make that part of package.json, which allows registering certain hooks before the actual application code is run. For example in the case of package.json, I guess the project pacakge.json would include something like this:

{
  "preload": [
     "apm-package/register"
   ],
  "dependencies": {
    "apm-package": "1.1.1"
  }
}

If we want something more flexible than "just loading some scripts" though, I guess a dedicated rc file would be easier to use than package.json.

cc @timfish @nodejs/loaders

GeoffreyBooth commented 1 month ago

I’ve proposed this in the past, such as in https://github.com/nodejs/node/issues/49148#issuecomment-1747102503 and https://github.com/orgs/nodejs/discussions/44975#discussioncomment-3868855. The short version is that I think we need a config file, such as node.config.json or possibly a section within package.json, and I think it needs to include all of the same options that NODE_OPTIONS includes. This will handle the use case of the test runner config file and any other features that have configuration that’s complicated enough to want to specify in a file, or have the configuration be updated by automation such as in the hooks case.

The recent .env file support was meant as a bridge to this; that effort got us the ability to parse JSON files without needing to start V8, so therefore we can parse a config file that includes V8 options early enough for them to take effect. The work toward supporting NODE_OPTIONS within .env files was meant to lay the groundwork for this.

So yes, I do think we need a config file, and it should support all of Node’s configuration (or at least as much as is in NODE_OPTIONS). That probably means it needs to be in JSON or some other format that we can parse without V8. A simple schema could probably be something like:

{
  "options": {
    // Keys can be any flag available to NODE_OPTIONS, camelcased
    "import": "tsx",
    "experimentalRequireModule": true
  }
}

Or the same within a nodejs field in package.json if that’s preferable. Ideally this file is loaded by default if it exists, and we could have a flag like --config to backport it for older lines (or maybe this is a reason to use a key in package.json, if adding this new key isn’t considered a breaking change). I think it might be slightly preferable to have a dedicated file for this rather than package.json field because a project might have multiple package.jsons and we could separately define the configuration file and then Node knows where to look for it, but I don’t feel strongly if others can explain why package.json is preferable and won’t have issues.

Another consideration is conditions; should a single file support multiple values based on condition, so you could do something like node --config=node.config.json --condition=development entry.js and it loads certain config values related to the development condition.

targos commented 1 month ago

+1, but I am very much against JSON for the format, mainly because it doesn't support comments.

ShogunPanda commented 1 month ago

+1 here as well. And I agree with @targos. If we don't create a custom format, I would choose YAML or TOML.

GeoffreyBooth commented 1 month ago

It's important that the format be easily editable by other tools, such as those registering hooks. Anything other than comment-less JSON means that those tools need a dependency, unless we create a new builtin for the format.

We've had package.json for years and it's been acceptable. A bit annoying that it doesn't support comments or trailing commas, for sure, but it's easily editable both by hand and by tools. See the recent strongly negative reaction to Bun allowing comments in package.json; that's why interoperability is important.

ShogunPanda commented 1 month ago

That was my idea. Have a new builtin format. TBH, I always thought Node should support YAML natively since it's quite pervasive.

joyeecheung commented 1 month ago

If we want to be aligned with npmrc, https://www.npmjs.com/package/ini can be an option too (there are other native INI parsers too, like https://github.com/benhoyt/inih)

and it should support all of Node’s configuration (or at least as much as is in NODE_OPTIONS

I think we should consider supporting a subset and evaluate them on a case by case basis. NODE_OPTIONS are about “flat” command line options, some of them per-process, some of them per-isolate, some of them per-environment. I think for the file based config we need something better than a flat structure to avoid the NODE_OPTIONS and execArgv inheritance validity problem again, and it will take time to address the hierarchical problem for all the possible configs (I also think the current internal classification of options may still contain quite some errors to be surfaced to the config directly)

jsumners-nr commented 1 month ago

I am in favor of a standalone file with the following opinions:

  1. I have never enjoyed adding configuration to package.json. It just feels wrong to me. It's a package manifest, not program configuration.
  2. JSON is the native format for the language we are working in, but if we are parsing prior to V8, no need to stick to that as a limitation.
  3. Comments are a necessary possibility for configuration files.
  4. INI, and in particular npm's wacko variant of it, is not ideal.
  5. Supporting JSON, YAML, and maybe TOML would be my preference.
  6. Config files should require a proper extension so that tooling (and the implementation) can easily detect the format: .noderc.json, .noderc.yaml, .noderc.yml, and .noderc.toml.
Qard commented 1 month ago

I'm a fan of KDL.

I feel like the need for a config file format parser if we go with non-json (which we should because comments) suggests it might be of value to also have something built in for generalized parsing, which could also be helpful for stream processing in many cases. I wonder if we should take that into consideration when building whatever format parser we might need. 🤔

GeoffreyBooth commented 1 month ago

Whatever format we read needs to be parseable in C++. We added simdjson to be able to parse JSON in C++, so that’s an option. The addition of --env-file gave us the ability to parse essentially the INI format.

Another option is to expand out .env files into essentially configuration files, by creating new environment variables for every option. So my example above could become an .env file like:

# node.config.env
NODE_OPTION_IMPORT=tsx
NODE_OPTION_EXPERIMENTAL_REQUIRE_MODULE=true

And it gets loaded via node --env-file=node.config.env app.js just like any other .env file, and the options are read from these new environment variables. This has the benefit of automatically being inherited into child processes whenever env is passed down. We already have util.parseEnv to parse this into an object, and we could add a corresponding setter method to help convert such objects back into INI format strings.

joyeecheung commented 1 month ago

I think making this opt-in would bring us back to square one, especially if the surface is limited to a flat list, then it’s not really too different than .env; but that’s also inconsistent with the broader software convention of loading a rc file by default.

joyeecheung commented 1 month ago

this has the benefit of automatically being inherited into child processes whenever env is passed down.

I think this is actually a flaw, not a benefit, because a flat list of environment variables is not a great way for inherited configurations, see https://github.com/nodejs/node/issues/41103 - child workers and contexts need granular control of configurations.

GeoffreyBooth commented 1 month ago

I was thinking that whatever config file we choose would be loaded automatically as a semver-major change, and a flag could specify it for older Node versions. A flag might be used for the latest version too in case the user wants to specify a file that’s not the default filename or location.

I’m not sure what about https://github.com/nodejs/node/issues/41103 involves any of this. We should just filter out options that don’t make sense to inherit, both currently to resolve that issue and in the future when we add a proper config file. We should do that filtering however the problematic options are defined, such as via environment variables or NODE_OPTIONS or via a file. Users already have ways to exert granular control of what flags and/or environment variables are passed into child processes or workers, via execArgv or env.

isaacs commented 1 month ago

We all hate json. No comments, excessive quoting, no multi line strings, no trailing commas, etc.

But:

A nodejs section in package.json is ideal. Make it so that if it's a string, then that's the name of a file to load, like many other tools do. Then we don't have to agree on the perfect spelling for the filename.

The beauty of json is that no one has to agree on very much.

joyeecheung commented 1 month ago

A nodejs section in package.json is ideal. Make it so that if it's a string, then that's the name of a file to load, like many other tools do. Then we don't have to agree on the perfect spelling for the filename.

I am leaning towards "a field specified in package.json pointing to a file" too primarily because to load this by default, it adds overhead to the startup to probe the file system; And since we already probe the file system for package.json, the overhead of probing another field in it would be the smallest.

But I do like to see a fixed (at least base) name of the file, because I like the concept of universally recognizable manifest of projects, just like when I check out a JS project/package, if I want to see dependency information I go to package.json, if I want to see TypeScript information I go to tsconfig.json, if I want to see their eslint configs I go to any file that contains the word eslint etc. Maybe something we can do is to support multiple formats (like what eslint does), and JSON would be the first to support, but they all have the same basename, and people just pick the format they like, provided that we do add support for other formats later on.

joyeecheung commented 1 month ago

Users already have ways to exert granular control of what flags and/or environment variables are passed into child processes or workers, via execArgv or env.

The word "user" is unclear here. The core of https://github.com/nodejs/node/issues/41103 is that the end user spawning the process and the library spawning the workers can be controlled by different parties. What happens when the end user wants to apply a configuration to all the child workers (spawned by third party), but not in the main thread running their own code? Do they go over all the worker spawning code in node_modules and update the code in an ad-hoc way? What if they need to persist this setup?

I don't think the use of environment variables or command line flags are optimal for threads in the first place - you don't get system equivalent of these, because the concept of environment variables or command line flags are already per-process - like if you do setenv() and getenv() on Linux, the effect is already per-process. The non-standard copies Node.js adds to the worker options are artificial and bound to be somewhat awkward. They are a good enough abstraction when we just want to provide an API with values readily available in the main thread via process.execArgv and process.env. But we are designing an external file-based configuration here, which has not existed for the main thread anyway, so I don't think we should limit the thinking in a model that's already forced.

I also think it maybe worth supporting package-scoped runtime configurations, and in that case another flat configuration that can only be applied globally wouldn't work great either. In Node.js, chlid workers can form trees, and packages can form trees, so we need to integrate that tree structure into the configuration design, as one-dimensional lists are bound to be awkward to apply on a two-dimensional tree.

targos commented 1 month ago

Using a field in package.json to opt in sounds like a very good idea! We could allow a string to point to a specific file, or a boolean to load with a predefined name.

ljharb commented 1 month ago

That’s the approach we took with https://github.com/pkgjs/support - some advantages of using a separate file for the “meat” include granular codeowners control, and keeping weight out of the published packument (if it’s a published package).

meyfa commented 1 month ago

How could this feature be used successfully across different package.json scripts? Especially regarding loaders - most of my packages and applications today have different loader requirements during linting, testing, and when running in production. (Tests are often the only time that I have a TypeScript loader registered, for instance.)

Since Node.js - in contrast to something like ESLint - can serve multiple purposes within a single project, only being able to supply a single config that has to cater to all of them - without causing side effects - seems problematic at best. (Unless I missed something and this was somehow already addressed in the design ideas proposed above.)

GeoffreyBooth commented 1 month ago

How could this feature be used successfully across different package.json scripts?

I think we would ship a flag, like --config, in addition to whatever other ways there are to load a config file (package.json field, etc.). If I were designing it, it would be allowed multiple times and we would merge together the objects from successive calls. That way you could have scripts like:

"scripts": {
  "lint": "node --config=node.base.config.json --config=node.lint.config.json eslint",
  "test": "node --config=node.base.config.json --config=node.test.config.json test.js"
}
joyeecheung commented 1 month ago

Since Node.js - in contrast to something like ESLint - can serve multiple purposes within a single project, only being able to supply a single config that has to cater to all of them - without causing side effects - seems problematic at best. (Unless I missed something and this was somehow already addressed in the design ideas proposed above.)

That was similar to why I mentioned that we shouldn’t use a flat list for configs. One way to do it is to define a format that allows conditions in the config file. Conditions can include some kind of expressions involving environment variables. In your script fields in package.json, you can set the environment variables that match the desirable conditions in your rc file. Typical script runners should pass down the environment variables to the process launched for the selected script, and if that process is running Node.js (via shebang or something) during startup we locate the config file first, match the environment variables against the conditions in the config file, and apply configurations accordingly.

I think we would ship a flag, like --config

I think for scripts, Node.js CLI options wouldn’t work very well - typically scripts run executables that don’t necessarily take or pass down Node.js CLI options.

joyeecheung commented 1 month ago

Actually going back to OP I wonder if we should just support preloaded JavaScript files specified in package.json, because a declaration-based config can always be very limited (that’s why people end up with eslintrc.js). A JavaScript file gives you maximum flexibility to do this kind of condition matching, and to configure the process - we could also consider making the preloaded files applied to child workers by default, and the preloaded file can decide whether it wants to skip the configurafion when it sees it’s not preloaded by the main thread.

The bits missing from this would be the ability to specify configs that need to be applied before Node.js is ready to load user JS (say, ICU data). But that subset would be more limited and may be easier to specify in a declaration-based format.

GeoffreyBooth commented 1 month ago

I wonder if we should just support preloaded JavaScript files specified in package.json

JavaScript files wouldn't be editable by other tools, for example to register hooks.

targos commented 1 month ago

Maybe something we can do is to support multiple formats (like what eslint does), and JSON would be the first to support

SGTM

joyeecheung commented 1 month ago

JavaScript files wouldn't be editable by other tools, for example to register hooks.

If we go with the simplest APIs in the OP:

{
  "preload": [
     "apm-package/register"
   ],
  "dependencies": {
    "apm-package": "1.1.1"
  }
}

tools can simply update the package.json to add preload scripts.

timfish commented 1 month ago

Would this preload apply only to package.json#scripts or to any node process where this package.json is resolved as the closest?

RedYetiDev commented 1 month ago

Although a configuration file is an attractive concept, it could introduce a security risk by allowing arbitrary files to change Node.js behavior.

I'm concerned that if this feature is implemented without an opt-in mechanism, it could pose a risk.

jasnell commented 1 month ago

Putting the data format aside, we should talk about resolution of these files. Where would they live relative to the application? Current working directory only? In the node_modules path? I would want to avoid a pattern like ~/.noderc where a single instance of this file is automatically picked up by every node.js app as there is a greater opportunity to have unintended impact on any node.js app running locally.

targos commented 1 month ago

They would live relative to the package.json file that defines them or at an absolute pas if the reference is absolute.

jsumners-nr commented 1 month ago

One way to do it is to define a format that allows conditions in the config file. Conditions can include some kind of expressions involving environment variables.

In my opinion, it is best for configuration files to be static text files incapable of introducing logic. Logic in configuration introduces complexity that can be very difficult to keep track of.

meyfa commented 1 month ago

Logic in configuration introduces complexity that can be very difficult to keep track of.

Agreed. Nothing stops developers from creating a .js file that spawns a new Node.js process with the correct flags set, if more complex logic is required for the specific project. This is already possible today, I've done it a few times, and it works great.

timfish commented 1 month ago

Nothing stops developers from creating a .js file that spawns a new Node.js process with the correct flags set

If that was the case, --import would suffice. The problem this issue tries to solve is when frameworks and CLIs don't offer a way to modify the Node cli args.

joyeecheung commented 1 month ago

Although a configuration file is an attractive concept, it could introduce a security risk by allowing arbitrary files to change Node.js behavior.

a .rc file is a very common software convention, I am not seeing this is more dangerous than say .bashrc/.zhsrc/.npmrc. In our threat model, we trust the file system and the operating system. If the attacker has access to the on-disk file configuration, and can change package.json, then what they can do is already out of our security guarantees, there are probably a lot more bad things they can do with that capability.

joyeecheung commented 1 month ago

For the script correspondence, I wonder if we can use the following solution:

{
   "noderc": "./.noderc.json",
   "scripts": {
      "test":  "cross-env NODE_V8_COVERAGE=./coverage/tmp TEST_ENV=true node --test test.ts",
      "start":  "node run.ts"
   },
  "dependencies": {
    "@opentelemetry/api": "^1.9.0",
    "@opentelemetry/auto-instrumentations-node": "^0.48.0",
    "@opentelemetry/sdk-metrics": "^1.25.1",
    "@opentelemetry/sdk-node": "^0.52.1",
    "@opentelemetry/sdk-trace-node": "^1.25.1",
    "cross-env": "^7.0.3",
    "tsx": "^4.16.2"
  }
}

In .noderc.json:

{
    "preload": [
      "tsx",
      "./instrumentation.ts"
    ]
}

In instrumentation.ts

import process from 'node:process';
import { NodeSDK } from '@opentelemetry/sdk-node';
import { ConsoleSpanExporter } from '@opentelemetry/sdk-trace-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import {
  PeriodicExportingMetricReader,
  ConsoleMetricExporter,
} from '@opentelemetry/sdk-metrics';

if (!process.env.TEST_ENV) {     //  <- this needs to be edited by whoever controlling the script fields
  const sdk = new NodeSDK({
    traceExporter: new ConsoleSpanExporter(),
    metricReader: new PeriodicExportingMetricReader({
      exporter: new ConsoleMetricExporter(),
    }),
    instrumentations: [getNodeAutoInstrumentations()],
  });

  sdk.start();
}

The preloaded files can either be files under node_modules, or a file authored by the project, which can do the environment setting and skipping as they like, and load additional files if they want to (note that for certain packages like open telemetry, it's already necessary for the user to author a file instead of using a pre-baked one).

GeoffreyBooth commented 1 month ago

I think the design of the config file needs to include everything that’s in NODE_OPTIONS, even if it’s not all implemented at first. We at least need to avoid boxing ourselves in from potentially supporting all of Node’s configuration in this file.

joyeecheung commented 1 month ago

I think that can be a follow-up, like a execArgv field in the .noderc config, or "env": "./.env". I think using the flat list representation is problematic given https://github.com/nodejs/node/issues/53787#issuecomment-2222645542 but if someone insists that an array of options is the best way to represent this type of configs and wants to implement it that way, that's fine.

joyeecheung commented 1 month ago

Would this preload apply only to package.json#scripts or to any node process where this package.json is resolved as the closest?

I think we can either

  1. If there is a package.json in the current working directory, check the noderc field in it. Don't look it up anywhere else.
  2. Or, check all the package.json in the parent paths from the entrypoint module (so it will look up the directory tree), and use the nearest one with noderc.

1 would be the cheapest and it should match strictly with the package.json providing the scripts. For non-script entrypoints, it requires users to launch the process from where the package.json is at - maybe that would be an issue for executables? 2 is slightly more expensive but we already do that for "type" field lookup, and it works better with executables.

GeoffreyBooth commented 1 month ago

I think using the flat list representation is problematic

I don’t think it needs to be a flat list. We just need to design the config file such that the design is flexible enough to support all of the options we currently support via flags. It could be a flat list, like NODE_OPTIONS is, but I’m happy to support something more organized. Let’s just think about this from the start so that we don’t box ourselves in to a design that is hard to extend to cover everything.

I think we can either

I would go with option 2, but either is fine.

joyeecheung commented 3 weeks ago

Some notes about this issue from the loaders call yesterday (was just me an @marco-ippolito )

joyeecheung commented 3 weeks ago

Let’s just think about this from the start so that we don’t box ourselves in to a design that is hard to extend to cover everything.

I think we can address this with the version field, worst case we add a new version that is better suited for the use cases we discover after the first iteration gets out. I expect us to develop this for a few iterations before reaching stability and settle down on a last stable version, and maybe as the last stable version really gets adopted, we drop support for older versions. (It may not be that much complex to maintain multiple versions though, if the general direction is making it more flexible - I expect what might happen is that we convert version N and version N+1 to the same internal representation, while N typically has fewer fields or a simplified structure compared to N+1).

marco-ippolito commented 3 weeks ago

I think with versioning its "easy" to get started because its possible to migrate to a newer config without too much disruption, and versions can go through a deprecation cycle.

jsumners-nr commented 3 weeks ago

Going with "INI files" is basically "let's invent a new config format," because "INI files" are not standardized in any way what-so-ever. I don't understand the desire to do that. The closest thing to an INI standardized format is TOML, which isn't really designed to be INI but very simple configurations can look like INI (https://github.com/toml-lang/toml#comparison-with-other-formats).

My vote is against "INI files." Being able to reference documentation about the configuration format is vital.

marco-ippolito commented 3 weeks ago

Going with "INI files" is basically "let's invent a new config format," because "INI files" are not standardized in any way what-so-ever. I don't understand the desire to do that. The closest thing to an INI standardized format is TOML, which isn't really designed to be INI but very simple configurations can look like INI (https://github.com/toml-lang/toml#comparison-with-other-formats).

My vote is against "INI files." Being able to reference documentation about the configuration format is vital.

I think the idea is to get inspiration from .ini files of other languages, not use .ini files, but .json

joyeecheung commented 3 weeks ago

Going with "INI files" is basically "let's invent a new config format"

Maybe I wasn't being clear, we were basically thinking about "a JSON representation of the schema that other software typically conveys using the INI format". So a naive draft can be:

{
  "version": 1,
   "preload": [ "amaro"],
   "[test]": {
      "env": { "MY_CUSTOM_TEST_ENV": "foo" }
   },
   "[test.coverage]": {
      "env": { "NODE_V8_COVERAGE": "./.coverage" }
   },
   "[start]": {
      "preload": [ "./instrumentation.ts" ]
   }
}

TBD how to map between sections in the rc file to script fields or bin fields, maybe it can be done via an environment variable, or a cli flag, but that needs to be understood by the script runners/bin runners.

jsumners-nr commented 3 weeks ago

The clarification makes sense. Thank you.

rluvaton commented 4 days ago

After using the native test runner for quite some time, the addition of a config file will greatly improve the experience and IDE integrations

Adding a config file will help the test runner with:

  1. Adding global setup and teardown which is not possible right now using the command line and only available programmatically by manually calling some function before the run and calling a function after the tests finish which means that running a single file is much harder as you change the run file
  2. IDE integration support, with a config file, we can preload some script before any test and it will be applied for each test we run

I encountered too many times the thought - it would be so much easier with config file