nodejs / node


Entrypoint Hooks (carry over discussion from Austin Collab Summit) #43408

Open jasnell opened 2 years ago

jasnell commented 2 years ago

Originally had this as a discussion in https://github.com/nodejs/node/discussions/43384


At the Austin Collaborator Summit, there was significant discussion around the need for a more well-defined startup lifecycle with a clearer boundary between the preload phase and the loading/evaluation of the user entry point. The use cases include more reliable handling for APMs, dynamic transpilers, diagnostic tooling, and more. I took the task of working up an initial proposal. Here is that proposal:


Entrypoint Hooks

Currently, the Node.js startup process consists of a single bootstrap phase in which the Node.js core internal mechanisms and environment are set up followed by the loading and instantiation of the user-provided entry point script.

stateDiagram-v2
  state "Node.js Startup" as A
  state "Preloads (sync eval)" as B
  state "User entry point script (sync eval)" as C
  state "Start event loop" as D
  state "Process preload and entry point async tasks" as E
  state "Run event loop" as F
  [*] --> A
  A --> B
  B --> C
  C --> D
  D --> E
  E --> F
  F --> [*]

The User Entry point here is the script that is provided as the argument to the node binary (e.g. node foo.js, foo.js is the User Entry point).

Historically with Node.js, there have always been scenarios where it is desirable to load and run code before the User Entry point performs any actions. This can be accomplished with several methods:

While each of these has historically been effective, each suffers from a number of limitations, not the least of which is the lack of a clear separation between the execution of the preload code and the user entry point. Take, for instance, the following example:

Imagine a preload script with a single line of code:

// preload.js
setImmediate(() => console.log('preload'));

And a User Entrypoint script with the following:

// entry.js
console.log('entrypoint');

Now run the node binary as:

node -r ./preload.js entry.js

The order of the statements printed will be:

entrypoint
preload

This is because, while the preload script does run before the entry point script, it schedules async activity that is not invoked until the event loop has started, which happens only after the entry point script has been evaluated. While waiting for the preload script to complete, a lot of user code can run.

In other words, while there is a clear boundary at which preload can begin, there is no such boundary for when preload completes.
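The ordering above can be reproduced in a single file. This is only a sketch: the two commented regions stand in for preload.js and entry.js, and the `order` array is an illustrative name that does not appear in the proposal.

```javascript
// Single-file stand-in for `node -r ./preload.js entry.js`.
const order = [];

// "preload.js": schedules work that can only run once the event loop turns.
setImmediate(() => {
  order.push('preload');
  console.log(order.join(' -> ')); // prints "entrypoint -> preload"
});

// "entry.js": evaluated synchronously, before any scheduled callback fires.
order.push('entrypoint');
```

At the moment the entry point code finishes evaluating, `order` contains only `'entrypoint'`, which is the missing completion boundary the text describes.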

This is a proposal for establishing a clearer lifecycle boundary.

Proposal

In the proposed new model, a new Entrypoint Hook phase is introduced into the Node.js startup following the completion of the bootstrap. During the Entrypoint Hook phase, one or more preload scripts can be loaded and evaluated in a user-defined order, in precisely the same way that preload scripts (using the -r argument) are loaded today, with one very important distinction: immediately after loading and evaluating these preload scripts, the Node.js event loop is started to allow any asynchronous operations initiated by those scripts to run to completion. When there are no further async tasks for that first run of the event loop to complete, the entry point hook phase of the bootstrap is considered complete, the event loop is reset, and the user entry point is loaded and evaluated, continuing the Node.js startup just as it does today. If there are no preload scripts to run, this entire new phase is skipped.

stateDiagram-v2
  state "Node.js Startup" as A
  state "Preloads (sync eval)" as B
  state "Start event loop" as C
  state "Process preload async tasks" as D
  state "Stop event loop" as E
  state "User entry point script (sync eval)" as F
  state "Start event loop" as G
  state "Process entry point async tasks" as H
  state "Run event loop" as I
  state "Entry point hook phase" as J
  state "User entry point run phase" as K
  [*] --> A
  A --> B
  state J {
    B --> C
    C --> D
    D --> E
    E --> F
  }
  state K {
    F --> G
    G --> H
    H --> I
    I --> [*]
  }

With this approach, the preload scripts run during the Entrypoint Hook phase are permitted to fully complete and can alter the user entry point before it begins.

Importantly, at the end of the entry point hook phase, there are no pending async tasks of any kind carrying over into the evaluation of the user entry point script. The entry point hooks may allocate handles that persist across the boundary between phases (e.g. network handles, file descriptors, etc) but those will have no pending i/o by the end of the phase.
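The phase boundary described above can be roughly approximated in user land to show the intended ordering. This is a sketch under a strong assumption: each hook returns a promise that settles when its async work is done. The actual proposal would instead run the event loop until it is empty. `runWithEntrypointHooks`, `preload`, and `entry` are hypothetical names, not Node.js APIs.

```javascript
// Hypothetical helper, not a Node.js API: run every hook to completion,
// then evaluate the entry point.
async function runWithEntrypointHooks(hooks, entry) {
  for (const hook of hooks) {
    await hook(); // wait for this hook's async work before moving on
  }
  return entry();
}

const log = [];

// A preload hook whose work finishes asynchronously.
const preload = () => new Promise((resolve) => {
  setImmediate(() => {
    log.push('preload');
    resolve();
  });
});

// Stand-in for the user entry point.
const entry = () => {
  log.push('entrypoint');
  return log;
};

const done = runWithEntrypointHooks([preload], entry);
done.then((result) => console.log(result.join(' -> '))); // prints "preload -> entrypoint"
```

Unlike the `-r` example earlier, the preload work here is finished before the entry point starts, which is the guarantee the new phase is meant to provide.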

Use Case: Serverless

In the serverless use case, a serverless host environment can use the entry point hook phase to load any supporting framework code and run any initialization it needs before loading the actual user entry point script.

Use Case: APMs/Diagnostic Tools

In the APM use case, diagnostic tools can use the entry point hook phase to load and prepare any diagnostic instrumentation they need, even if that tooling is initialized asynchronously (e.g. to query the file system or network for license or configuration data).

Use Case: Dynamic Transpilers

Because the entry point hook is guaranteed to run to completion before the start of the user entry point, it can be used to implement dynamic transpilation of the user entry point before that entry point is evaluated. For instance, a TypeScript entry point hook can transpile a TypeScript file passed in as the user entry point and trigger Node.js to load and execute the compiled JavaScript result rather than trying to run the TypeScript file that was provided.

What about startup time? Cold starts?

Entrypoint Hook scripts will have an impact on Node.js binary startup time when used. There are, fortunately, mechanisms for mitigating such costs. It would be possible, for instance, to capture a snapshot of the preloads such that loading and initial evaluation cost is reduced in exactly the same way that we have created snapshots of the Node.js bootstrap and are working to create snapshots of the user entry point. Preloads, however, are not trivial and effort will need to be made to ensure a minimal performance cost.

What is the relationship to Loaders?

Pluggable loaders are invoked as a result of require() or import (static or dynamic). The entry point hooks run once immediately upon start of the Node.js process or worker thread startup, and that is it. As such, they serve two entirely different purposes.

Flarna commented 2 years ago

Are async hooks really broken by using an ESM entry point? Clearly several async operations have already been executed by that time, but usually no one relies on asyncId starting at 1.

Or is it expected that during this startup phase already transactions are created which need to be tracked by AsyncLocalStorage?

GeoffreyBooth commented 2 years ago

Are async hooks really broken by using an ESM entry point?

This is what I'd like more clarification on. Basically, involving ESM at startup (so an ESM entry, using --loader or --import, or if core starts using promises as part of bootstrap beyond the ESM loader) means that unsettled promises already exist when the first line of user code runs. When I read https://nodejs.org/api/async_hooks.html I don't find any mention of concerns about startup, but apparently it's an undocumented requirement of async_hooks that they be initialized before the first promise of any kind—user or Node internal—is created. That's the assumed premise of the proposal on this thread. If anyone can point me to any resources explaining this, especially what can't be achieved if this requirement is unmet, I would appreciate it.

At the collaborator summit someone said that Datadog had an experimental instrumentation library that worked with Node ESM, and that it would leave experimental status once loaders did. Can anyone explain more about this, too? Does this ESM library have any limitations compared with the CommonJS equivalent, especially in relation to async_hooks?

mcollina commented 2 years ago

We are lacking quite a few user friendly explanations of async_hooks. When async_hooks are active, every asynchronous activity in Node.js has an id and a parent id. In other terms, there is a causality link between all asynchronous activities. async_hooks enable developers to attach custom behavior and data to all asynchronous actions. The fundamental premise of this approach is to be able to set up the hooks before any asynchronous activity, otherwise they won't have a complete instrumentation, with hard to debug side effects.
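As a small illustration of the id / parent id link described above, the following sketch records every resource seen by an init hook and then walks the chain of triggerAsyncId values backwards. The `resources` map and the `outer` / `inner` callback names are illustrative only.

```javascript
const async_hooks = require('node:async_hooks');

// asyncId -> { type, triggerAsyncId }, filled in by the init hook.
const resources = new Map();

async_hooks.createHook({
  init(asyncId, type, triggerAsyncId) {
    // Only record data here; logging inside a hook can itself trigger hooks.
    resources.set(asyncId, { type, triggerAsyncId });
  },
}).enable();

// Execution context in which this script is being evaluated.
const rootId = async_hooks.executionAsyncId();

setTimeout(function outer() {
  setImmediate(function inner() {
    // Walk from the current resource back through its recorded parents.
    const chain = [];
    let id = async_hooks.executionAsyncId();
    while (resources.has(id)) {
      const { type, triggerAsyncId } = resources.get(id);
      chain.push(type);
      id = triggerAsyncId;
    }
    // The walk stops at the first id the hook never saw an init for.
    console.log(chain.join(' <- '), '| ends at root:', id === rootId);
  });
}, 0);
```

The walk ends at the first id that has no recorded init. Resources created before the hook was enabled are exactly such gaps, which is why the point at which hooks get registered matters.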

GeoffreyBooth commented 2 years ago

I appreciate everyone trying to convey the importance of async_hooks being registered before any async activity occurs, but I struggle to understand why it’s so vital, or if it’s even happening. Perhaps some code might help. When I tried the useESMLoader = true experiment mentioned above, I got lots of failing tests. The first one in the list was https://github.com/nodejs/node/blob/main/test/parallel/test-async-hooks-correctly-switch-promise-hook.js, so I looked into that one. Here’s a simplified version:

const async_hooks = require('async_hooks');
const process = require('process');

const promises = new Map();

async_hooks.createHook({
  init(asyncId, _type, triggerAsyncId, _resource) {
    promises.set(asyncId, { asyncId, triggerAsyncId });
  },
  before(asyncId) {
    if (promises.has(asyncId)) {
      promises.get(asyncId).before = true;
    }
  },
  after(asyncId) {
    if (promises.has(asyncId)) {
      promises.get(asyncId).after = true;
    }
  },
  promiseResolve(asyncId) {
    if (promises.has(asyncId)) {
      promises.get(asyncId).promiseResolve = true;
    }
  }
}).enable();

async function main() {
  return Promise.resolve();
}

main();

process.on('exit', () => {
  console.log(promises.values());
});

I saved this as test-async-hooks.cjs and ran it under Node 18.7.0 as node ./test-async-hooks.cjs. It printed this output:

[Map Iterator] {
  {
    asyncId: 6,
    triggerAsyncId: 1,
    before: true,
    after: true,
    promiseResolve: true
  },
  { asyncId: 7, triggerAsyncId: 1, promiseResolve: true },
  {
    asyncId: 8,
    triggerAsyncId: 7,
    before: true,
    promiseResolve: true,
    after: true
  }
}

Next I copied it to test-async-hooks.mjs and rewrote the require statements to import statements, leaving everything else identical. Running that as node ./test-async-hooks.mjs printed this output:

[Map Iterator] {
  {
    asyncId: 11,
    triggerAsyncId: 0,
    before: true,
    after: true,
    promiseResolve: true
  },
  { asyncId: 12, triggerAsyncId: 0, promiseResolve: true },
  {
    asyncId: 14,
    triggerAsyncId: 13,
    before: true,
    promiseResolve: true,
    after: true
  },
  {
    asyncId: 15,
    triggerAsyncId: 12,
    before: true,
    promiseResolve: true,
    after: true
  }
}

The hooks captured 3 promises in the CommonJS version versus 4 promises in the ESM version, but in both versions there was one promise which triggered the init and promiseResolve hooks but not the before or after hooks. This leads me to a few questions:

In short, at least from this tiny test it seems like Node already violates the rule that async hooks must be registered before any async activity occurs, even for CommonJS. If I had to guess, I would think that all that this means is that instrumentation libraries simply can’t display that activity in their dashboards; and that some async activity can’t be traced back to its parents, but since the parent is something within Node core it’s probably not of much interest to the average application developer. What else am I missing?

mcollina commented 2 years ago

If it’s so important that async hooks are registered before any async activity, doesn’t this pending promise already violate that requirement? If so, what specific consequences entail as a result?

Likely nothing because that promise is very likely a one-off, without children. However if that generated a tree of asynchronous activity, that would be problematic.


Most people in this thread that worked on async_hooks would classify them as problematic to maintain. A significant refactor means that we will play whack-a-mole for bugs for a while.


async_hooks is not just about promises. We call them resources, and this includes everything that can be asynchronous in Node.js. async_hooks is designed so that hooks can be initialized before anything else is even loaded.

In commonjs, we can always execute something before any other code is loaded. In ESM, our entry point is executed last, after the initialization of all the modules we imported is completed. If any of those other modules starts something asynchronous in their initialization, we won't be able to track them.

@bengl can likely explain better how all of this currently works, and if/how having a clear entrypoint hook would make their integration significantly easier too.

GeoffreyBooth commented 2 years ago

In commonjs, we can always execute something before any other code is loaded. In ESM, our entry point is executed last, after the initialization of all the modules we imported is completed. If any of those other modules starts something asynchronous in their initialization, we won’t be able to track them.

If the requirement is just that users can register async hooks before executing any user code, in order to ensure that all user code-generated async activity can be tracked, that’s a huge difference. We can achieve that today, either through the just-landed --import or through a carefully written ESM entry like this:

await import('./register-async-hooks.js')
await import('./app.js')

Because these are dynamic import()s, no user code anywhere in the tree descending from app.js will be evaluated until after register-async-hooks.js runs and resolves. The brand-new --import flag should allow the same thing at the CLI level, like node --import ./register-async-hooks.js ./app.js.

Or to put it another way, in ESM too we can always execute something before any other code is loaded. If that’s all that’s needed, then I think the module systems are already equivalent with regard to async_hooks?
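The sequencing argument can be checked without creating extra files by importing data: URL modules. This is a sketch only: the two inline modules stand in for register-async-hooks.js and app.js, and `moduleFrom` and `loadOrder` are hypothetical names. It is written as CommonJS, so the awaits sit inside an async function rather than at the top level.

```javascript
// Build a data: URL module from source text so the sketch needs no extra files.
const moduleFrom = (source) =>
  'data:text/javascript,' + encodeURIComponent(source);

globalThis.loadOrder = [];

async function main() {
  // Stand-in for register-async-hooks.js.
  await import(moduleFrom('globalThis.loadOrder.push("register-async-hooks");'));
  // Stand-in for app.js: not loaded or evaluated until the import above settles.
  await import(moduleFrom('globalThis.loadOrder.push("app");'));
  return globalThis.loadOrder;
}

const loaded = main();
loaded.then((order) => console.log(order.join(' -> ')));
```

Because the second import() is not even requested until the first has resolved, nothing in the second module's graph can run ahead of the registration module.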

Qard commented 2 years ago

Just to clarify the behaviour of that specific test: the promise without a before/after is not created before async_hooks is registered. If it were, init would never have run for it at all, and it would therefore be skipped entirely in the other hook functions because of the has(...) checks.

That promise actually represents the bit of code between the start of an async function and its first await. Due to spec weirdness, a promise is generated representing that part of the function, but it runs synchronously without a continuation so it never triggers the before and after. If you look at the last promise, you'll notice the triggerAsyncId matches the asyncId of that different promise which would normally only be the case if it ran during a continuation callback of it, but in this case it gets connected directly without a continuation.

What matters to avoid crashes with async_hooks is that new init events can correctly resolve the triggerAsyncId edge back to its initiator. If async_hooks was not loaded when that parent was initiated it may produce a resource stack which doesn't reflect reality and then try to unwind past what it knows how to unwind.

Generally async_hooks has a bunch of safety checks to try and gracefully handle conditions where it doesn't understand the state, but they've proven insufficient numerous times in the past so I wouldn't trust it to be able to recover safely.

Also, it has been my experience that most users of async_hooks have generally failed at making their own use of it safe, making assumptions like a before will always have an after, which isn't true if you stop it in a callback before the after for that callback is reached. More commonly I've seen an expectation that before/after should have had an init that was seen, which in your case of having an unresolved promise from before the entrypoint top-level could result in a before/after appearing with no seen init.

These implicit connections between events are why I've always strongly believed async_hooks was a bad design with far too many footguns. This is why I pushed for things like AsyncLocalStorage in core to have APIs which would behave in a much more predictable and understandable way. Unfortunately, ALS doesn't solve all the use cases of async_hooks, and there's been little effort to introduce better APIs for those other use cases.
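For comparison, a minimal sketch of the AsyncLocalStorage style of API mentioned above: a store is bound to a callback with run() and read back with getStore(), with no manual pairing of init/before/after events. The `requestId` field and the `seen` array are illustrative names.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

const als = new AsyncLocalStorage();
const seen = [];

als.run({ requestId: 'req-1' }, () => {
  // Synchronous code inside run() sees the store...
  seen.push(als.getStore().requestId);

  // ...and so does async work scheduled from inside it.
  setImmediate(() => {
    seen.push(als.getStore().requestId);
    console.log(seen); // prints [ 'req-1', 'req-1' ]
  });
});

// Outside of run() there is no store.
const outside = als.getStore(); // undefined
```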