nodejs / help

:sparkles: Need help with Node.js? File an Issue here. :rocket:
1.48k stars 281 forks source link

AsyncLocalStorage doesn't propagate across Promise chains? #3041

Closed yairk closed 4 months ago

yairk commented 4 years ago

I've been playing with AsyncLocalStorage with the purpose of using it for propagating context across web requests. I had issues getting it to work in Jest-based unit-tests: specifically, stores set in beforeEach would not propagate to the tests. Ultimately I realized that the test's async context was created prior to the beforeEach hook being actually run.

While trying to investigate this, I came up with the following code:

const async_hooks = require('async_hooks');
const fs = require('fs');
const util = require('util');

let asyncLocalStorage = new async_hooks.AsyncLocalStorage();
asyncLocalStorage.enterWith({blah: 7});

function log(...args) {
  fs.writeSync(1, `${util.format(...args)}\n`);
}

function getCurrentId() {
  return async_hooks.executionAsyncId();
}

new Promise((res, rej) => {
    log(`first promise: ${getCurrentId()}`);
    res();
    log(`first promise done`);
}).then(() => {
    log(`second promise: ${getCurrentId()}`);
    asyncLocalStorage.enterWith({id: 7});
    log(`id in second promise=${asyncLocalStorage.getStore().id}`);
    log(`second promise done`);
}).then(() => {
    log(`third promise: ${getCurrentId()}`);
    log(`id in third promise=${asyncLocalStorage.getStore().id}`);
});

With some debug prints enabled (courtesy of async-local-storage), the above code gives the following output:

2(PROMISE) init by 1
first promise: 1
first promise done
3(PROMISE) init by 2    <-- when first promise is done, context for the second promise is init'd
4(PROMISE) init by 3    <-- but context for third promise is also initiated, before the second promise even ran!
second promise: 3
id in second promise=7  <-- so the async store is set only after the context for the third promise is init'd
second promise done
third promise: 4
id in third promise=undefined  <-- which means AsyncLocalStorage did not copy the store to the third promise

Am I missing something in expecting the store to propagate to the third promise as well?

mhdawson commented 4 years ago

@vdeturckheim, @quard FYI

yairk commented 3 years ago

@vdeturckheim, @Quard will you be able to weigh in on the above?

vdeturckheim commented 3 years ago

Hey, sorry for the late reply, for some reason I missed the ping. I need to run a couple tests - I will do it tonight an circle back :)

vdeturckheim commented 3 years ago

So, this is somehow an expected behavior tbh, I slightly changed the code to

'use strict';
'use strict';
const async_hooks = require('async_hooks');
const fs = require('fs');
const util = require('util');

let asyncLocalStorage = new async_hooks.AsyncLocalStorage();
asyncLocalStorage.enterWith({blah: 7});

function log(...args) {
    fs.writeSync(1, `${util.format(...args)}\n`);
}

function getCurrentId() {
    return async_hooks.executionAsyncId();
}

async_hooks.createHook({
    init(asyncId, type, triggerAsyncId, resource) {
        log(`${triggerAsyncId} => ${asyncId}`);
    }
}).enable();

log(`before all async id: ${async_hooks.executionAsyncId()}`);
Promise.resolve()
    .then(() => {
        log(`first promise async id: ${async_hooks.executionAsyncId()}`);
        log(`first promise: ${getCurrentId()}`);
        log(`before enterWith, id in first promise=${asyncLocalStorage.getStore().id} and blah=${asyncLocalStorage.getStore().blah}`);
        asyncLocalStorage.enterWith({id: 7});
        log(`id in first promise=${asyncLocalStorage.getStore().id} and blah=${asyncLocalStorage.getStore().blah}`);
        log(`first promise done`);
    })
    .then(() => {
        log(`second promise async id: ${async_hooks.executionAsyncId()}`);
        log(`second promise: ${getCurrentId()}`);
        log(`id in second promise=${asyncLocalStorage.getStore().id} and blah=${asyncLocalStorage.getStore().blah}`);
    });

and it logs

before all async id: 1
1 => 2
2 => 3
3 => 4
first promise async id: 3
first promise: 3
before enterWith, id in first promise=undefined and blah=7
id in first promise=7 and blah=undefined
first promise done
second promise async id: 4
second promise: 4
id in second promise=undefined and blah=7

Basically, the second promise async scope is created before the first one ran. At this point, they share the same object as strore. When updating the store in the promise, this was not spread because there was no new call to init.

That's a bit why I actually recommend to always avoid using enterWith and use run as much as possible (the callback-using syntax of run makes it way harder to fall in these situations). I see a few hacks we could use in the API to fix this behavior but I don't see any being memory safe or that would not introduce other complicated cases. Maybe @Qard or @Flarna can see something I miss.

(cc @bmeck I think this is one of the cases that concerns you regarding this API).

bmeck commented 3 years ago

correct

Qard commented 3 years ago

This is expected behaviour. The init of each promise is when .then(...) is called, not when the callback given to it is called. AsyncLocalStorage only follows the init path of async_hooks, while what you seem to want here is for it to follow the promiseResolve path. Even run(...) is not going to do what you want here:

const als = new AsyncLocalStorage()

als.run({ foo: 'bar' }, () => {
  Promise.resolve()
    .then(() => {
      return als.run({ foo: 'baz' }, () => Promise.resolve())
    })
    .then(() => {
      console.log(als.getStore()) // contains foo: bar
    })
})
yairk commented 3 years ago

@Qard @vdeturckheim thanks for looking into this! Indeed, this matches my experience that AsyncLocalStorage only propagates its store at the time of an init.

I encountered this when creating tests using jest (which is based on jasmine). Essentially I found that if I set the store in a beforeEach hook, it would not be visible to the test body. Turns out that jasmine first builds a Promise chain and only then starts invoking the hooks and the test.

I'm tempted to implement my own flavor of AsyncLocalStorage which would also propagate the store in a PromiseResolve hook. So in this example here it would use the init hook to keep track of the parent-child relationships, and once the first Promise is resolved, it would propagate the store to the child contexts created by this Promise.

Are there caveats here that I'm not anticipating? This is my first time working with async hooks so my thinking here could be completely wrong.

Qard commented 3 years ago

That sounds reasonable to me. You will just need to make sure you have a way to get the promise resource from the promiseResolve hook so you can propagate the context at that point. It might also make sense as an extra setting for AsyncLocalStorage to propagate in that way, if desired. Just be aware that there is a very specific reason this is not the default behaviour. Allow me to explain.

A promise is conceptually not itself async per-se, but rather it can be seen as an abstraction of something else async like an event emitter is sync until you emit from within some other asyncrony. But the fact that a promise handler is scheduled on the microtask queue makes it a form of queueing system. Normally the suggested way to handle function queuing systems is to wrap them with AsyncResource. Due to the low-level nature of promises this is not possible so we instead use PromiseHook which roughly corresponds to the async_hooks lifecycle events.

But why does init happen where it does, and why is promiseResolve a separate thing? From the user perspective, AsyncResource ensures a context links back to the user code which supplied the callback and not to internals somewhere such as a connection pool which could potentially trigger from a completely unrelated context. It's the same case here in that what happens within a then handler is internal to the promise whereas the attachment of that then should link back to the context in which it was attached. This problem mostly goes away in async/await as the continuations are effectively nested rather than chained, due to how the spec works, making the outer resolve and the init effectively the same context.

The graph produced by async_hooks is not a pure logical graph, it is an abstract conceptual graph, expressing the structure of the program as seen by the user, not as understood by the underlying system which views it purely as a sequence of interleaved events received from the event loop and mapped to callbacks.

yairk commented 3 years ago

I (think I) get what you mean by async_hooks producing a graph that corresponds to the structure seen by the user.

The way I'm currently thinking about my use case is that I want a particular piece of context to travel across "scheduling details" such as async/await, Promises, queues, events, etc. Seems like this is in conflict with your description of the way async_hooks works - from my perspective, I don't care when the Promise context is created; I only care about the actual execution of my code and the places where it "branches". From this perspective, moving to the next Promise in the chain should not lose the context.

If I understand you correctly, you're saying that AsyncResource is the right tool for this job, but that it doesn't work well with Promises? Are PromiseHooks exposed to Javascript code, or are they considered internal?

As for enhancing (or "forking") AsyncLocalStorage, I think this can create problems if some of the Promises do async work, for example:

'use strict';
const async_hooks = require('async_hooks');
const fs = require('fs');
const util = require('util');

let asyncLocalStorage = new async_hooks.AsyncLocalStorage();
asyncLocalStorage.enterWith({blah: 7});

function log(...args) {
    fs.writeSync(1, `${util.format(...args)}\n`);
}

async_hooks.createHook({
    init(asyncId, type, triggerAsyncId, resource) {
        log(`${triggerAsyncId} => ${asyncId}`);
    },
    promiseResolve(asyncId) {
        log(`resolve ${asyncId}`);
    }
}).enable();

log(`before all async id: ${async_hooks.executionAsyncId()}`);
Promise.resolve()
    .then(async () => {
        log(`first promise async id: ${async_hooks.executionAsyncId()}`);
        await new Promise((res, rej) => setTimeout(() => res(), 1000));  // <-- creates a new async context
        asyncLocalStorage.enterWith({id: 7});
        log(`first promise async id: ${async_hooks.executionAsyncId()}`);
        log(`id in first promise=${asyncLocalStorage.getStore().id} and blah=${asyncLocalStorage.getStore().blah}`);
        log(`first promise done`);
    })
    .then(() => {
        log(`second promise async id: ${async_hooks.executionAsyncId()}`);
        log(`id in second promise=${asyncLocalStorage.getStore().id} and blah=${asyncLocalStorage.getStore().blah}`);
    });

Now, the second Promise runs in a child context of the first Promise, but enterWith is running in a sibling context to the second Promise. So it's not strictly about parent-child relationships anymore...

I can probably figure out how to deal with this scenario, but this again makes me wary that the real problem here is with what I expect, not with the existing language mechanisms.

Qard commented 3 years ago

PromiseHook is part of V8. I'm working on a rewrite which should (eventually) enable it to be exposed in JavaScript cleanly, as a separate API, but it's not there yet. Currently it has a partly JavaScript-facing internal API, but only conditionally, so it's not yet possible for it to be used directly.

As for context flowing through promises, there's two ways to look at that: what you want is the internal path which flows through the resolving logic, whereas what it does now is step around that, binding the callback directly to the point where then(...) was called. This is important because a promise is a container which can be passed around and then(...) could be run from anywhere. If you were to flow context from the resolve path, that path might be entirely unrelated to the actual user code path, for example, a common pattern in database libraries is to create a connection promise out-of-band from a request, and do a then(...) continuation on it whenever the connection is needed. Flowing from the resolve path in this case would point outside of the request, which would be incorrect. By linking context of a then/catch handler directly to the point where it is called, it ensures the context within that callback is conceptually valid in expressing the relation in user code between the point where the handler is attached and the point where it is run. It's still possible to interact with the stored object if the promise resolution is a direct result of actions with the request, but replacing the object will not flow into the handler because the resolve and the corresponding promise are decoupled with a microtask barrier between when the resolve actually happens and when the handler runs--the handler could theoretically be put off indefinitely even though the resolve has already happened, depending on interactions with the microtask queue.

yairk commented 3 years ago

To your database example, I understand why one would expect the ALS store to be propagated during init, but I would argue that it would make sense to also propagate it across the resolution path. If the DB library created a connection promise and attached a then continuation, I think it's very plausible that any ALS store updates made in the connection promise, should also be propagated to the continuation. And for cases where this is not desired, we can have another construct with similar semantics to AsyncLocalStorage.run which would run a piece of code under a certain context without "leaking" it to the rest of the promise.

I played some more with async_hooks and realized it can be challenging for me to propagate the context correctly across Promise resolutions. For example, with the following code:

'use strict';
const async_hooks = require('async_hooks');
const fs = require('fs');
const util = require('util');

let asyncLocalStorage = new async_hooks.AsyncLocalStorage();
asyncLocalStorage.enterWith({blah: 7});

function log(...args) {
    fs.writeSync(1, `${util.format(...args)}\n`);
}

async_hooks.createHook({
    init(asyncId, type, triggerAsyncId, resource) {
        log(`${triggerAsyncId} => ${asyncId} (${type})`);
    },
    promiseResolve(asyncId) {
        log(`resolve ${asyncId}`);
    },
    after(asyncId) {
        log(`after ${asyncId}`);
    },
    destroy(asyncId) {
        log(`destroy ${asyncId}`);
    },
}).enable();

log(`before all async id: ${async_hooks.executionAsyncId()}`);
Promise.resolve()
    .then(async () => {
        // runs in context id 3
        log(`first promise async id: ${async_hooks.executionAsyncId()}`);
        await new Promise((res, rej) => setTimeout(() => res(), 1000));
        // runs in context id 8 (grandparent: 3)
        Promise.resolve()
            .then(() => {
                // runs in context id 11 (grandparent: 8)
                log(`subpromise: ${async_hooks.executionAsyncId()}`);
            });
        // still runs in context id 8 (grandparent: 3)
        asyncLocalStorage.enterWith({id: 7});
        log(`first promise async id: ${async_hooks.executionAsyncId()}`);
        log(`id in first promise=${asyncLocalStorage.getStore().id} and blah=${asyncLocalStorage.getStore().blah}`);
        log(`first promise done`);
    })
    .then(() => {
        // runs in context id 4 (parent: 3)
        log(`second promise async id: ${async_hooks.executionAsyncId()}`);
        log(`id in second promise=${asyncLocalStorage.getStore().id} and blah=${asyncLocalStorage.getStore().blah}`);
    });

I get the following output:

before all async id: 1
1 => 2 (PROMISE)
resolve 2
2 => 3 (PROMISE)
3 => 4 (PROMISE)
3 => 5 (PROMISE)
first promise async id: 3
3 => 6 (PROMISE)
3 => 7 (Timeout)
6 => 8 (PROMISE)
resolve 3
after 3
5 => 9 (PROMISE)
resolve 6
after 7
8 => 10 (PROMISE)
resolve 10
10 => 11 (PROMISE)
first promise async id: 8
id in first promise=7 and blah=undefined
first promise done
resolve 5
resolve 8
after 8
subpromise: 11
resolve 11
after 11
resolve 3
resolve 9
after 9
second promise async id: 4
id in second promise=undefined and blah=7
resolve 4
after 4
destroy 7

So now I set my store in the first Promise, but in a context that's a grandchild of the Promise context. That context is not a parent of the context of the second Promise. Rather, they are siblings (or even cousin and uncle...). So this means I have to do a more sophisticated relationship-analysis when deciding where I need to propagate the store upon promiseResolve.

To complicate things further, I wouldn't expect store updates done within the sub-Promise to propagate to the second Promise, but I can't come up with a good mechanism for ensuring this, without relying on the exact context parent/child tree which seems incredibly fragile. I thought that maybe the after or destroy hooks would help me see where a context was completely destroyed, but I added the logs and I don't see a way to use them.

It seems like the only thing to do at this point is to keep in mind the potential pitfalls and use the API very carefully. I'm also looking at https://www.npmjs.com/package/async-local-storage, which is similar to AsyncLocalStorage with one key difference that during a store lookup it dynamically traverses the parent/child tree to find where the store was set. This solves my scenario, with the cost of introducing other pitfalls (sometimes changing the store will affect contexts that were created before the store update).

ghost commented 2 years ago

hey @yairk any update about the issue? We are experience the same in node 16.17 and jest. If we init AsyncLocalStorage in before each its being lost in the second test suite. Note: jest is ran with --runInBand

yairk commented 2 years ago

hey @yairk any update about the issue? We are experience the same in node 16.17 and jest. If we init AsyncLocalStorage in before each its being lost in the second test suite. Note: jest is ran with --runInBand

@ezekiel25-17 we did not find a way to do this purely with AsyncLocalStorage. We ended up creating a hack: we created a wrapper around AsyncLocalStorage that either retrieves data from ALS or from a global variable that is only set in the tests...

jpreynat commented 1 year ago

Hi everyone 👋

I've been running into this kind of scenario with enterWith too. However, I noticed something even more strange when using it.

Here are two examples:

Maybe through your testing, someone could shed some light on why the second one acts like this.

Note that both examples are run using Node 16.18.0 without any transpiler (which I initially thought could be messing with the compiled code).

First example

const { AsyncLocalStorage } = require('async_hooks');

const als = new AsyncLocalStorage();

/** Runs the main code and reads the state of the AsyncLocalStorage after execution */
async function main() {
    console.log('main 1, store:', als.getStore());

    await doSomething();
    console.log('main 2, store:', als.getStore());
}

/**
 * Sets the AsyncLocalStorage value with `enterWith` for the remainder of the current synchronous execution
 * https://nodejs.org/api/async_context.html#asynclocalstorageenterwithstore
 */
function prepareALS(value) {
    als.enterWith(value);
}

/** Play with the store */
async function doSomething() {
    console.log('doSomething 1, store:', als.getStore());

    prepareALS(1);
    console.log('doSomething 2, store:', als.getStore());

    prepareALS(2);
    console.log('doSomething 3, store:', als.getStore());

    prepareALS(3);
    console.log('doSomething 4, store:', als.getStore());
}

main()
    .then(() => console.log('done'))
    .catch((err) => console.error(err));

Resulting logs:

main 1, store: undefined <= expected, no value set yet
doSomething 1, store: undefined <= expected, no value set yet
doSomething 2, store: 1 <= expected, value just set to 1 in the current context
doSomething 3, store: 2 <= expected, value just set to 2 in the current context
doSomething 4, store: 3 <= expected, value just set to 3 in the current context
main 2, store: 3 <= expected, value was lastly set to 3 in the synchronous context
done

Second example

Notice that only the prepareALS function was changed from sync to async

const { AsyncLocalStorage } = require('async_hooks');

const als = new AsyncLocalStorage();

/** Runs the main code and reads the state of the AsyncLocalStorage after execution */
async function main() {
    console.log('main 1', als.getStore());

    await doSomething();
    console.log('main 2', als.getStore());
}

/**
 * Sets the AsyncLocalStorage value with `enterWith` for the remainder of the current synchronous execution
 * https://nodejs.org/api/async_context.html#asynclocalstorageenterwithstore
 */
async function prepareALS(value) {
    als.enterWith(value);
}

/** Play with the store */
async function doSomething() {
    console.log('doSomething 1', als.getStore());

    await prepareALS(1);
    console.log('doSomething 2', als.getStore());

    await prepareALS(2);
    console.log('doSomething 3', als.getStore());

    await prepareALS(3);
    console.log('doSomething 4', als.getStore());
}

main()
    .then(() => console.log('done'))
    .catch((err) => console.error(err));

Resulting logs:

main 1, store: undefined <= expected, no value set yet
doSomething 1, store: undefined <= expected, no value set yet
doSomething 2, store: 1 <= expected, value just set to 1 in the current context
doSomething 3, store: 2 <= expected, value just set to 2 in the current context
doSomething 4, store: 3 <= expected, value just set to 3 in the current context
main 2, store: 1 <= ?! unexpected
done

So at the end of the second example, I see two possible coherent results:

However, the ALS ends up keeping the value set in the first asynchronous call for some reason that I can't explain. Thanks for your help.

Flarna commented 1 year ago

You changed a bit more then prepareALS - you changed also doSomething do await it. each await creates a a few promises and async operations.

Only the call to await prepareALS(1) is on the sync path therefore this value is finally on the store at the end of main(). All the others run in a new async operation and as a result the enterWith() calls done there have no impact to the initial sync operation.

jpreynat commented 1 year ago

Hi @Flarna. Thanks for the quick reply and the explanation, that makes sense 👍

So I've finally managed to reach the state that I wanted through a "simple" workaround and ensure the last value set on the AsyncLocalStorage, even in a further Promise, is properly readable when back in the main sync context. This could help other people trying to achieve the same behavior (which is what I initially wanted). (CC @yairk @ezekiel25-17 if that works for you).

The idea is simply to always bind the execution context of a function that gets the AsyncLocalStorage store to the latest context that set the store value via enterWith.

Let me know if you see anything that could go wrong with this method, any comments are welcome 🙂

const { AsyncLocalStorage, AsyncResource } = require('async_hooks');

const als = new AsyncLocalStorage();

/** Runs the main code and reads the state of the AsyncLocalStorage after execution */
async function main() {
    console.log('main 1, store:', getLatestStore());

    await doSomething();

    console.log('main 2, store:', getLatestStore());
}

/** Our mutable function allowing to get the ALS store in the last context that it was set */
let getLatestStore = getStore;
function getStore() {
    return als.getStore();
}

/**
 * Sets the AsyncLocalStorage value with `enterWith` for the remainder of the current synchronous execution
 * https://nodejs.org/api/async_context.html#asynclocalstorageenterwithstore
 */
async function prepareALS(value) {
    als.enterWith(value);
    // Set `getLatestStore` to return the store in the current context
    getLatestStore = AsyncResource.bind(getStore);
}

/** Play with the store */
async function doSomething() {
    console.log('doSomething 1, store:', getLatestStore());

    await prepareALS(1);
    console.log('doSomething 2, store:', getLatestStore());

    await prepareALS(2);
    console.log('doSomething 3, store:', getLatestStore());

    await prepareALS(3);
    console.log('doSomething 4, store:', getLatestStore());
}

main()
    .then(() => console.log('done'))
    .catch((err) => console.error(err));

Resulting logs:

main 1, store: undefined
doSomething 1, store: undefined
doSomething 2, store: 1
doSomething 3, store: 2
doSomething 4, store: 3
main 2, store: 3
done
Flarna commented 1 year ago

Honestly speaking I don't know the use case here. Usually using enterWith is a bad idea and run should be used. enterWith was intended for situations where it is not possible to wrap the relevant code and use run. There might be also situations where you really want to set something on local store for the remainder of the current synchronous execution but you should really know that only relevant code is running there (incl. node internals,...).

I assume in your final setup getLatestStore will be instantiated once per request/task or whatever you execute instead once per application. Otherwise I would assume that parallel tasks overwrite getLatestStore. if not you could avoid using AsyncLocalStorage and just store the value into this variable instead.

I also don't understand why prepareALS is an async function but maybe that's just to illustrate that doSomething has to await something.

Qard commented 1 year ago

I would recommend against using enterWith(...). There's a strongly-worded warning against its use in the docs for a good reason. There are lots of cases like this one where persisting until the end of the sync tick can be very confusing. It's much better to use the run(...) function as it's more clearly scoped.

jenseng commented 1 year ago

I filed https://github.com/nodejs/node/issues/45848 which is basically a dup of this issue, but there is one little wrinkle I don't see called out in the comments above. While I get that enterWith is problematic and that inner async context shouldn't propagate to the outer async scope, what's odd is that sometimes it does, which I think trips people up when they try to use enterWith in jest.

Consider these examples where the inner async context behavior changes depending on whether other stores are in play and if you use async/await syntax vs. promises:

  1. Promises ✅
    const { AsyncLocalStorage } = require("async_hooks");
    const als = new AsyncLocalStorage();
    Promise.resolve()
      .then(() => als.enterWith("inner nested async"))
      .then(() => console.log(als.getStore())) // undefined 👍 
    console.log(als.getStore()); // undefined 👍 
  2. async/await with unrelated ALS stuff happening ✅
    const { AsyncLocalStorage } = require("async_hooks");
    const als = new AsyncLocalStorage();
    const unrelatedAls = new AsyncLocalStorage();
    unrelatedAls.run("unrelated", () => {}); // Could also be in `Promise.resolve().then(() => { ... });`
    async function main() {
      await Promise.resolve().then(() => als.enterWith("inner nested async"));
      console.log(als.getStore()); // undefined 👍 
    }
    main().then(() => console.log(als.getStore())); // undefined 👍
    console.log(als.getStore()); // undefined 👍 
  3. async/await without any other ALS stuff happening ❌
    const { AsyncLocalStorage } = require("async_hooks");
    const als = new AsyncLocalStorage();
    async function main() {
      await Promise.resolve().then(() => als.enterWith("inner nested async"));
      console.log(als.getStore()); // "inner nested async" 😱
    }
    main().then(() => console.log(als.getStore())); // undefined 👍
    console.log(als.getStore()); // undefined 👍

Because the behavior changes depending on seemingly unrelated ALS stuff, in Jest land this can manifest itself as test suites that pass/fail depending on whether they are run in isolation.

github-actions[bot] commented 5 months ago

It seems there has been no activity on this issue for a while, and it is being closed in 30 days. If you believe this issue should remain open, please leave a comment. If you need further assistance or have questions, you can also search for similar issues on Stack Overflow. Make sure to look at the README file for the most updated links.

github-actions[bot] commented 4 months ago

It seems there has been no activity on this issue for a while, and it is being closed. If you believe this issue should remain open, please leave a comment. If you need further assistance or have questions, you can also search for similar issues on Stack Overflow. Make sure to look at the README file for the most updated links.