AsyncLocalStorage without Async Hooks

jasnell commented 1 year ago

I've recently gone through and implemented a subset of the AsyncLocalStorage API for Cloudflare's workerd runtime, and I've done so without implementing any part of the async_hooks API under the covers, proving that it's not only possible but results in a significant performance improvement and greatly simplified implementation. I propose that we could greatly improve our implementation by eliminating dependency on async_hooks and adopting a model similar to what we've implemented in workerd.

First, an overview of how things currently work in Node.js...

At any given time, there is a "current execution context". This context is strongly associated with specific individual objects (timers, nextticks, promises, uv handles, etc).
For every AsyncLocalStorage instance, there is an associated hidden property on the resource associated with the current execution context. Whenever als.run(...) is called, we temporarily mutate the value associated with that key and run a function. Any async resources that are created while that function is running ends up receiving a full copy of that map.

This design ends up being very inefficient as it means we are allocating and copying the context for every single object. For applications that are particularly promise heavy, this gets very expensive very quickly.

The way we implemented this in workerd is far more efficient:

At any given time, there is a current AsyncContextFrame. By default there is a logical "root" frame but we do not actually need to allocate anything to represent it.

Whenever we call als.run(...), we create a new AsyncContextFrame that inherits the context of the current, and set the new frame as current. Any async resource that is created simply acquires a reference to the current AsyncContextFrame.

The context is only copied when a new AsyncContextFrame is created, which happens far less frequently than creating a new async resource.

Further, we do not need to rely on V8's PromiseHooks API to propagate context through promises. There is a built-in v8 API v8::Context::SetContinuationPreservedEmbedderData() that can do the propagation for us much more efficiently. Whenever we enter an AsyncContextFrame, we set the ContinuationPreservedEmbedderData to an opaque object that wraps a reference to the frame. V8 takes care of properly associating that with all promise continuations.

There's a lot more to it that I'm glossing over here but the basic idea is that AsyncLocalStorage is absolutely possible to implement without async_hooks and I think there's a ton of value in doing so.

There are some caveats:

In the workerd implementation, we have chosen not to implement the als.enterWith(...) and als.disable() APIs. With the new model, these would need to be removed as they make things way more complicated than they are worth.
Handling around the unhandledRejection event would need to change (see https://github.com/nodejs/node/issues/46262)
It's a massive change that touches lots of code. The way async context references are held on to by async resources would need to change dramatically, so it's not a small ask to change things.

The point of this issue is to see if there is interest in an approach to AsyncLocalStorage that is completely decoupled from async_hooks before @flakey5 and I decide whether or not to work on it.

/cc @mcollina @vdeturckheim @bmeck @bengl @nodejs/async_hooks

vdeturckheim commented 1 year ago

tl;dr I'm +1 on this

How AsyncLocalStorage works under the hood is a detail, the goal was/is to have an API solving most of the real world use cases with minimal needs to know anything about how node/the event loop/promises work
if we can make it faster, we should (optimising the bad Map part has been on my backlog for way too long

Bit more context about the APIs that would go in this case:

als.run(...) was designed to be the safe thing to use as it does not have a lot of corner cases
als.enterWith(...) was requested for the APM use case: in term of instrumentation, it might be challenging to inject a callback or a promise at certain point. Could we still have a way to handle this? (cc @bengl @Flarna who have been doing much more instrumentation than me in the past couple years)
als.disable() is mostly here to give a way out if there is a performance problem, if we are very confident about it, I think it's not a big deal to remove

Side node, as of today, only run and AsyncResource are stable so the scope of the proposed changes is technically acceptable.

Flarna commented 1 year ago

enterWith() can be usually replaced by run if one instruments by monkeypatching/wrapping a function. It adds overhead of one more closure but that's likely acceptable. For cases where one just gets an event/notification that a thing has started and one more that it has ended but the code inbetween is inaccessible for instrumentation requires to use enterWith(). It's not that common and can be solved on the caller side but not all libraries like to adapt their code for tracing.

Do you have already an idea how to integrate this into non promise use cases where SetContinuationPreservedEmbedderData() is of no help?

Any async resources that are created while that function is running ends up receiving a full copy of that map

Could you point to the location where a map is copied? actually only a reference to the store should be set on new resources (assuming here the store in an object not e.g. a string).

There is a built-in v8 API v8::Context::SetContinuationPreservedEmbedderData() that can do the propagation for us much more efficiently.

Is there some documentation available for this? In v8 headers I see only Sets a value that will be stored on continuations and reset while the continuation runs. which is not really enough for me to understand what this is actually doing.

mcollina commented 1 year ago

I'm +1 on reducing the overhead of AsyncLocalStorage to minimum. It would be a massive progress for Node.js.

I'm worrying about potential breakages. Unless you want to put the new implementation behind a flag, the best time to land this semver-major change is for v21, so we have ~1 year to iron out all the bugs before it goes into a LTS line.

jasnell commented 1 year ago

I'd be fine with putting the new implementation behind a flag. Doing so might make the implementation a bit trickier with keeping both models implemented simultaneously but the risk of breakage is non-trivial.

jasnell commented 1 year ago

Do you have already an idea how to integrate this into non promise use cases where SetContinuationPreservedEmbedderData() is of no help?

Yes, that's the part that is fairly complicated. In workerd we do this by capturing a counted reference to the current async context and setting the scope before entering the relevant callbacks, much like we do with the internal callback scope in Node.js. The advantage we have in workerd currently is that we have a much smaller set of async resource types so it's pretty simple. We'd effectively do the same with the new Node.js implementation but I still have to work out all of the implementation details.

Is there some documentation available for this? In v8 headers I see only Sets a value that will be stored on continuations and reset while the continuation runs. which is not really enough for me to understand what this is actually doing.

Unfortunately no, the documentation for this API is severely lacking. I'm hoping the v8 folks will expand the documentation on this relatively soon.

jridgewell commented 1 year ago

als.enterWith(...) was requested for the APM use case: in term of instrumentation, it might be challenging to inject a callback or a promise at certain point. Could we still have a way to handle this?

It's easy to implement.

Do you have already an idea how to integrate this into non promise use cases where SetContinuationPreservedEmbedderData() is of no help?

This is a side-effect of V8 not fully implementing the spec's HostMakeJobCallback and HostCallJobCallback. All jobs are supposed to preserve the continuation data, but V8 has only preserved it for Promise jobs. If V8 fully implemented, Node could use these two hooks for all queueing behaviors.

In v8 headers I see only Sets a value that will be stored on continuations and reset while the continuation runs. which is not really enough for me to understand what this is actually doing.

In the spec, these are the equivalent to exposing the [[HostDefined]] slot on JobCallback records. Related the previous answer, V8 is only carrying this [[HostDefined]] forward for PromiseJobs.

If you're comfortable with code splunking, the relevant links are:

PromiseFulfill/Reject (settling with a value): https://source.chromium.org/chromium/chromium/src/+/main:v8/src/builtins/promise-misc.tq;l=75-107;drc=39c3a97e848a7ecd1fa95e738771cc61d6d72552
PromiseThen (adopting a thenable): https://source.chromium.org/chromium/chromium/src/+/main:v8/src/builtins/promise-misc.tq;l=271-287;drc=39c3a97e848a7ecd1fa95e738771cc61d6d72552
Running promise jobs:https://source.chromium.org/chromium/chromium/src/+/main:v8/src/builtins/builtins-microtask-queue-gen.cc;l=191-327;drc=09769fa654c1c2d43005594163616ffbc51bc62f
(my search: https://source.chromium.org/search?q=continuation_preserved_embedder_data%20-filepath:%5Eout%20-filepath:%5Egen)

That also points us to the unimplemented other job types:

https://source.chromium.org/chromium/chromium/src/+/main:v8/src/builtins/builtins-microtask-queue-gen.cc;l=147-189;drc=09769fa654c1c2d43005594163616ffbc51bc62f

Flarna commented 1 year ago

If I understand this correct the context seen in the then/catch handler would be that one active at the time then() was called an no longer promise init. Is this correct?

For await it should be the same. For common cases where a non async function creates and returns a Promise and the caller immediatelly calls .then() it's also the same.

jridgewell commented 1 year ago

If I understand this correct the context seen in the then/catch handler would be that one active at the time then() was called an no longer promise init.

Yes. It should be unobservably the same because the .then() inits the new promise, so they'd have the same context.

For await it should be the same

Correct.

For common cases where a non async function creates and returns a Promise and the caller immediatelly calls .then() it's also the same.

Yup.

jridgewell commented 1 year ago

Testing with my implementation of this proposal, thenables are broken:

const ctx = new AsyncContext()

const queue = [];

const thenable = {
  then(onRes, _onRej) {
    queue.push("thenable: " + ctx.get());
    onRes();
  },
};

const out = ctx.run(1, () => {
  queue.push("new Promise");
  const p = new Promise(res => res(thenable));

  queue.push("p.then");
  const p2 = p.then(() => thenable);

  queue.push("p2.then");
  return p2.then(() => {
    queue.push("promise: " + ctx.get());
  });
});

queue.push("out.then");
out.then(() => {
  queue.push("done");
  //hook.disable();
  console.log(queue);
});

[
  'new Promise',
  'p.then',
  'p2.then',
  'out.then',
  'thenable: undefined',
  'thenable: undefined',
  'promise: 1',
  'done'
]

Because it uses EnqueueMicrotask, which doesn't preserve the context: https://source.chromium.org/chromium/chromium/src/+/main:v8/src/builtins/promise-resolve.tq;l=189;drc=4c61bb3131b7951ed2ed896b4df6110b1e5c072f

AndreasMadsen commented 1 year ago

I understand the goal here is to eliminate the dependence on async_hooks. But I see some similarities between this proposal and https://github.com/nodejs/diagnostics/issues/389.

Qard commented 1 year ago

I would definitely be very supportive of such a change. :)

This overlaps a lot with the work I started in #42393 too, but I haven't had time to continue work on it. It's also worth noting that my changes for TracingChannel in #44943 includes some efforts to eliminate the need for enterWith(...) for APMs via channel.bindStore(...) and channel.runStores(...).

I think we have a clear path to a good state here, there's just a bunch of work that needs to be done to get there.

GeoffreyBooth commented 1 year ago

This would seem to point a way forward for actually, finally, honest-to-god deprecation of async_hooks.createHooks. In particular, this provides two big benefits to instrumentation vendors that will hopefully motivate them to migrate from createHooks to AsyncLocalStorage:

It’s a lot faster, which is more than a little desirable for instrumentation libraries; the vendor that imposes the last CPU/memory cost on production apps would seem to have a competitive advantage I would think.
It’s cross-platform, letting a vendor’s instrumentation library support not just Node but also CloudFlare and other environments that implement this becoming-standardized AsyncLocalStorage.

@jasnell Could we perhaps have this live somewhere other than on node:async_hooks? Maybe on node:diagnostics? That could provide a way to keep the old and new implementations separate, and we could just deprecate the old one. We should probably stop adding anything to node:async_hooks, since most people associate it overall with async_hooks.createHooks which we’re trying to sunset. Thank you for doing this!

cc @nodejs/modules @nodejs/loaders ; related: https://github.com/nodejs/node/discussions/45711

Qard commented 1 year ago

@GeoffreyBooth I don't agree that this provides a path to deprecating async_hooks. There are other uses outside of context management. It would get the biggest use case off the costly and at times unstable core of async_hooks which I see as a major advantage though. Also makes it easier to restructure async_hooks if we get the highly depended on use case of context management off of it, making breaking changes less ecosystem impacting.

legendecas commented 1 year ago

Would the experimental onPropagate constructor option introduced in https://github.com/nodejs/node/pull/45386 work without async_hooks?

jasnell commented 1 year ago

Would the experimental onPropagate constructor option introduced in https://github.com/nodejs/node/pull/45386 work without async_hooks?

Hmmm good question. I'm not sure to be honest... I don't think so? At least not in its current form.

In the new model that I'm proposing, the async context is not propagated every time a new AsyncResource is created but every time the context is mutated using asyncLocalStorage.run() (essentially copy-on-write). Whenever an AsyncResource is created, it simply grabs a reference to the current storage context and having to invoke a callback function every time would be quite costly for performance.

What I'm thinking would likely be a better solution (and more likely to be something we could implement with this new model) is to associate a tag/namespace with the store that must be echoed back in the asyncLocalStorage.getStore() call in order to retrieve the stored value.

const { AsyncLocalStorage } = require('async_hooks');

const als = new AsyncLocalStorage({ tag: 'abc' });

const { 0: fn1, 1: fn2 } = als.run(123, () => {
  return [
    AsyncResource.bind(() => als.getStore('abc')),
    AsyncResource.bind(() => als.getStore()),
  ]
});

console.log(fn1());  // 123, because the tag passed in to als.getStore matched
console.log(ln2()); // undefined, because the tag did not match.

legendecas commented 1 year ago

In the new model that I'm proposing, the async context is not propagated every time a new AsyncResource is created but every time the context is mutated using asyncLocalStorage.run() (essentially copy-on-write). Whenever an AsyncResource is created, it simply grabs a reference to the current storage context and having to invoke a callback function every time would be quite costly for performance.

Sounds pretty similar to @jridgewell's investigation https://github.com/legendecas/proposal-async-context/pull/15 on the mutation of the global state and the possible optimization based on the assumption.

jasnell commented 1 year ago

I honestly think we should consider re-thinking onPropagate now with this in mind, before it sits for too long in its current state.

jasnell commented 1 year ago

I'll open an issue specifically to discuss onPropagate => https://github.com/nodejs/node/issues/46374

Flarna commented 1 year ago

@jasnell corrected your link above.

nodejs / node

AsyncLocalStorage without Async Hooks #46265