tc39 / proposal-async-context

Async Context for JavaScript
https://tc39.es/proposal-async-context/
Creative Commons Zero v1.0 Universal
588 stars 14 forks source link

Reducing nested callbacks #60

Open denk0403 opened 1 year ago

denk0403 commented 1 year ago

This is my first time going through this proposal, and as I was looking at the following example, I couldn't help but feel like I was in callback hell again:

const asyncVar = new AsyncContext.Variable();

// Sets the current value to 'top', and executes the `main` function.
asyncVar.run("top", main);

function main() {
  // AsyncContext.Variable is maintained through other platform queueing.
  setTimeout(() => {
    console.log(asyncVar.get()); // => 'top'

    asyncVar.run("A", () => {
      console.log(asyncVar.get()); // => 'A'

      setTimeout(() => {
        console.log(asyncVar.get()); // => 'A'
      }, randomTimeout());
    });
  }, randomTimeout());

  // AsyncContext.Variable runs can be nested.
  asyncVar.run("B", () => {
    console.log(asyncVar.get()); // => 'B'

    setTimeout(() => {
      console.log(asyncVar.get()); // => 'B'
    }, randomTimeout());
  });
}

I understand that this is probably not a practical example, but regardless, the proposal only offers a callback-style API. So it got me thinking about how we could flatten these callbacks, and I realized that there is an opportunity to interoperate with the Explicit Resource Management proposal. Specifically, AsyncContext.Variable instances could also have a set() method which returns a Promise that resolves to a disposable resource when the context value is updated, and upon disposing the resource, the context is changed back.

In the case of the example above, we would be able to clean it up like so:

const asyncVar = new AsyncContext.Variable();

// Sets the current value to 'top', and executes the `main` function.
asyncVar.run("top", main);

function main() {
  // AsyncContext.Variable is maintained through other platform queueing.
  setTimeout(() => {
    console.log(asyncVar.get()); // => 'top'

    using _ = await asyncVar.set("A");
    console.log(asyncVar.get()); // => 'A'

    setTimeout(() => {
      console.log(asyncVar.get()); // => 'A'
    }, randomTimeout());
  }, randomTimeout());

  // AsyncContext.Variable runs can be nested.
  using _ = await asyncVar.set("B");
  console.log(asyncVar.get()); // => 'B'

  setTimeout(() => {
    console.log(asyncVar.get()); // => 'B'
  }, randomTimeout());
}

To be clear, the lifetime of an AsyncContext value using set() isn't quite semantically equivalent to using run(), but this idea is more about readability and exposing interoperability with other patterns.

It's a pretty nitpicky suggestion, but I'd be interested in hearing what other people think about it.

bathos commented 1 year ago

I understand that this is probably not a practical example

In a real example, the callbacks would typically be input to an API responsible for the invocation of those callbacks in some way (event handlers, timer scheduling, etc), hence the need for managing their “async contexts” at all, right? For inline code, AsyncContext.Variable already exists — “LexicalScope.Variable”.

let asyncVar;

// Sets the current value [of the lexical binding] to 'top', and executes the `main` function.
asyncVar = 'top';
main();

function main() {
  // [lexical scope] is maintained through other platform queueing.
  setTimeout(() => {
    console.log(asyncVar); // => 'top'

    {
      let asyncVar = 'A';
      console.log(asyncVar); // => 'A'

      setTimeout(() => {
        console.log(asyncVar); // => 'A'
      }, randomTimeout());
    }
  }, randomTimeout());

  // [lexical scopes] can be nested.
  {
    let asyncVar = 'B';
    console.log(asyncVar); // => 'B'

    setTimeout(() => {
      console.log(asyncVar); // => 'B'
    }, randomTimeout());
  }
}

At least that’s my understanding — I could be missing something, but my impression was that “it’s about callbacks” was pretty intrinsic to the motivations for the API.

denk0403 commented 1 year ago

For inline code, AsyncContext.Variable already exists — “LexicalScope.Variable”.

Yes, the fully inline case is trivial and not something that we can't already do with lexically-scoped variables. However, in other cases, AsyncContexts aren't equivalent to lexically-scoped variables, in that they still behave as contexts which can bypass argument drilling for deeply nested (async and non-async) functions. And ultimately my suggestion is about preserving that ability while reducing the callbacks necessary to update the context values.

Even the use cases doc provides a sync implementation for run() that (if you squint your eyes a bit) looks a lot like the pattern the explicit resource management aims to reduce:

`run()` Sync generators (w/o explicit resource management)
```js function run(id: string, cb: () => T) { let prevId = currentId; try { currentId = id; return cb(); } finally { currentId = prevId; } } ``` ```js function * g() { // critical resource const handle = acquireFileHandle(); try { ... } finally { handle.release(); // cleanup } } ```
jridgewell commented 1 year ago

We had actually discussed using using bindings and allowing general mutability of the context a while ago, but decided against it. The big problem I have with this is that if you miss the using (or the manual dispose at the end of the block), you've now leaked your data into other execution contexts. With the callback API, that's impossible, you always have a correct execution context that will be cleaned on exit.

denk0403 commented 1 year ago

We had actually discussed using using bindings and allowing general mutability of the context a while ago, but decided against it.

Thank you for the link to that discussion, it was a very interesting read.

The big problem I have with this is that if you miss the using (or the manual dispose at the end of the block), you've now leaked your data into other execution contexts.

That's a valid point. It's certainly a worse kind of error to leak data between execution contexts than it is to forget to close/release some resource. However, another issue of the callback-style API lies in the use and setting of multiple async contexts at a time. My understanding of the current API is that multiple contexts can only be set at once by daisy chaining run() calls like so:

{
  const ctxA = new AsyncContext.Variable();
  const ctxB = new AsyncContext.Variable();
  const ctxC = new AsyncContext.Variable();

  ctxA.run(1, () => {
    return ctxB.run(2, () => {
      return ctxC.run(3, sum);
    });
  }); // 6
}

That's a bit of an eyesore in my opinion. With using bindings, we could simplify that daisy chain to just:

{
  const ctxA = new AsyncContext.Variable();
  const ctxB = new AsyncContext.Variable();
  const ctxC = new AsyncContext.Variable();

  using _a = ctxA.set(1);
  using _b = ctxB.set(2);
  using _c = ctxC.set(3);
  sum(); // 6
}

However, in hindsight, even the discards look out of place, and as you mentioned, accidentally leaking data is too risky. An obvious workaround would be to combine those contexts into one object, but it doesn't always make sense to group data. So perhaps the better solution is to actually provide some API to run() multiple contexts simultaneously. Perhaps a static compose() method:

namespace AsyncContext {
  export declare function compose<F extends (...args: unknown[]) => unknown>(
    assignments: [AsyncContext.Variable<unknown>, unknown][],
    fn: F,
    ...args: Parameters<F>
  ): ReturnType<F>;
}

AsyncContext.compose(
  [
    [ctxA, 1],
    [ctxB, 2],
    [ctxC, 3],
  ],
  sum
); // 6;

Such a function has the advantage that the internal implementation could optimize away the callbacks and instead manipulate the contexts directly, leading to better memory usage.

olee commented 8 months ago

Imho all of this can be solved in library code which actually makes use of the async context. Also, I think that many libraries making use of this feature would only ever create a single AsyncContext.Variable and store an object or whatever with context data within it.

An example I could think of would be a middleware like to following to provide information about the request. That would allow a generic logger to use this information without passing all of this along to any deeply nested code:

interface TraceInfo {
    traceId: string;
    userId: string;
    url: string;
}

const traceVar = new AsyncContext.Variable<TraceInfo>();

// export helper to get current trace info
export function getTraceInfo() {
    return traceVar.get();
}

export async function traceInfoMiddleware(ctx: Context, next: NextFunction) {
    // Collect tracing information
    const traceId = createRandomTraceId();
    const traceInfo: TraceInfo = {
        traceId,
        userId: ctx.user.id,
        url: ctx.req.url,
    };
    // Run next middleware with trace info in async context now
    await traceVar.run(traceInfo, next);
}

That is also probably the #1 feature I'm dying to use this for once it becomes available 🚀

Edit: This could probably serve as a solution if you really need to set many variables at once:

export function composeAsyncVariables<T>(cb: () => T, ...variables: [AsyncContext.Variable<any>, any][]): T {
    if (variables.length === 0) {
        return cb();
    } else {
        const [variable, value] = variables[0];
        return variable.run(value, () => composeAsyncVariables(cb, ...variables.slice(1)));
    }
}

await composeAsyncVariables(
    next,
    [var1, value1],
    [var2, value2],
);
Qard commented 8 months ago

It's worth calling out that different parts of the data a user wants stored will likely need to change at different parts of the graph. For example, in tracing we likely have a single trace id stored for a request and all descending async activity, however we would also want to store a span id from the start to the end of a particular interesting execution we want to draw a span around. A span may contain many inner behaviours. If we just stuff everything in one big object we have to reallocate a copy of other things like the trace id reference, which has not changed, every time we change the span id. We would actually prefer to have different stores for different singular purposes.

We actually also have many internal things we store for internal visibility, so as it is currently we'd be doing a lot of object rebuilds per request if we tried to fit everything in one object. Whereas if we store each of our differently propagating pieces of data in their own separate stores most of them can just reuse the same store value as the outer async context rather than doing a fresh run and building a new state all the time.

legendecas commented 7 months ago

The big problem I have with this is that if you miss the using (or the manual dispose at the end of the block), you've now leaked your data into other execution contexts.

I am prototyping adding using declaration support to AsyncLocalStorage: https://github.com/nodejs/node/pull/52065. To address unexpected declarations without using, a leakage check is performed when a task (AsyncResource in Node.js) callback is finished. If there is any active disposable async local storage value, an error is emitted to hint at such misuse.

shicks commented 7 months ago

I understand that this was discussed in yesterday's meeting, and the decision was that it could be added later as a follow-on. I want to caution that this would likely be a very difficult follow-on to polyfill.

Consider a situation in which the engine has a working AsyncContext implementation, but does not support this usage pattern. It's not clear that one could polyfill this short of abandoning the entire built-in implementation in favor of a brand new polyfill, which isn't cheap. Worse, it may be difficult to determine statically whether the extra functionality is needed at all (i.e. just seeing using _ = someName.set(value) isn't a given without following the reference to someName, possibly in a different file, and determining its type).

Given this difficulty, it seems to me that if we think it likely that this will land eventually, then there may be good reasons to do it now rather than later.

olee commented 7 months ago

@Qard

It's worth calling out that different parts of the data a user wants stored will likely need to change at different parts of the graph. For example, in tracing we likely have a single trace id stored for a request and all descending async activity, however we would also want to store a span id from the start to the end of a particular interesting execution we want to draw a span around. A span may contain many inner behaviours. If we just stuff everything in one big object we have to reallocate a copy of other things like the trace id reference, which has not changed, every time we change the span id. We would actually prefer to have different stores for different singular purposes.

We actually also have many internal things we store for internal visibility, so as it is currently we'd be doing a lot of object rebuilds per request if we tried to fit everything in one object. Whereas if we store each of our differently propagating pieces of data in their own separate stores most of them can just reuse the same store value as the outer async context rather than doing a fresh run and building a new state all the time.

I totally understand your concerns. But even in such cases - where additional properties might be needed in some spans - I still believe it would be less costly to just create a clone of the state object with additional properties than running many async contexts together, considering the amount of work that the JS engine has to perform behind the scenes. Compared to that a simple { ...currentContext, spans: [..currentContext.spans, newSpan ] } feels both easier to work with and I think would probably be even more performant or pretty much equally as fast as setting multiple async contexts. There's also the cost of not only setting multiple async variables, but also retrieving multiple of them to get all required values.

Qard commented 7 months ago

It's definitely not less costly. I have measured this extensively at Datadog through various permutations of AsyncLocalStorage designs, including with heavy edits to V8.

A trace id is set once per request. A span id is typically set a few hundred to a thousand times per request, sometimes tens of thousands in extreme cases. It's not uncommon for context to be propagated across async barriers a hundred thousand or so times per request. I have seen well over a million async transitions in a single request before.

The present state of Node.js is effectively doing an object rebuild on every transition, and the performance is...not great. Doing a map clone and edit on-change in JS and then only passing through the frame reference otherwise is about 15x faster. In my experiments with ContinuationPreservedEmbedderData in V8 it can be made an order of magnitude faster with a native map that can be directly memory-cloned and only the changed slot modified. The closer that separation of differently-timed changing data lives to the metal, the faster it is.

This is roughly what we've tried to achieve with the AsyncContextFrame concept used in Cloudflare and my in-progress rewrite of AsyncLocalStorage in Node.js. It spawned from a need to heavily prioritize performance of propagating data over performance of altering data, since most context data changes far less frequently than it propagates, while still having reasonable performance for the things which do change somewhat often.

An AsyncContextFrame is only built when a store.run(...) happens. Otherwise, the frame value is passed through directly by-reference. By moving the separation machinery to a lower level rather than tracking it at object level, it enabled the optimization of propagation by-reference while at the same time also making it even more desirable to have separate stores as it reduced the frequency that data needed to change for the context management to flow.

It's not as optimal as it could be as implemented in JS it can't do the memory-cloned map optimization, but even with rebuilding the map it still managed to be much faster than the object clone version.

jridgewell commented 7 months ago

An interesting case is generators:

function* gen() {
  using _ = storage.disposable(value);

  yield 1;

  // what context is restored here?
}

Of course we'd want value restored into the context, but how would that work? We'd have to mutate the generator's internal [[GeneratorAsyncContextSnapshot]] field to point to the new mapping, because we can't modify the old one directly (something else may have snapshotted it). Or we'd have to re-enter the disposable every time the generator resumed so that it could push the modified mapping onto the stack again. Either way it's messy.

olee commented 7 months ago

That's @Qard, that's some very interesting insights which I didn't expect.

littledan commented 7 months ago

@jridgewell I don't understand your question; why would there be any mutation?

jridgewell commented 6 months ago

Because I'm still within the same scope, the using binding isn't discarded at the end of a yield. Why wouldn't you expect value to be the current value?

jridgewell commented 6 months ago

Related: https://github.com/rbuckton/proposal-using-enforcement/issues/1

mhofman commented 6 months ago

I think avoiding nesting can be solved without explicit resource management if we're willing to require some setup for the functions that want to switch within their invocation.

const varA = new AsyncContext.Variable();
const varB = new AsyncContext.Variable();

const doStuff = AsyncContext.bindVariables((setVariable, foo) => {
  setVariable(varA, 123);
  setVariable(varB, foo);

  someOtherCall();
});

doStuff("foo");
jridgewell commented 6 months ago

I think that achieves the same 1-layer-nesting that runAll does.

mhofman commented 6 months ago

I think that achieves the same 1-layer-nesting that runAll does.

Right, but with the additional ability to conditionally switch variables during the execution of the function. Not sure if that's a use case, to be honest I'm fuzzy on the motivations.

shicks commented 6 months ago

It's not clear to me that runAll really addresses the issue in the OP: in their example, there's only a single var being set at each point.

To my mind, as long as you own the context, there's not really anything too troubling about mutating a variable within that context. The problem with the disposable approach is that you may be in a shared context and thus leak your mutations somewhere unexpected.

A simple userland workaround if you happen to own the variable is to add an extra level of indirection: use a Variable<() => T> instead of a Variable<T>, and then when you run with a new value, you can hold onto a setter that only affects the context you already own. This has poor ergonomics and doesn't work for variables you don't own - but ultimately it's context ownership that should matter, not variable ownership.

@mhofman's suggestion of exposing a setter that only works in your owned context seems like a good compromise, though I suspect the API could be less confusing as a "run" rather than a "bind" (but maybe there's a performance reason I'm unaware of where the bind approach makes more sense?):

const result = v.runWithSetter((setV, boundArg1, boundArg2) => {
  // do stuff
  setV(newValue);
  // do more stuff
  return someResult;
}, arg1, arg2);

Or the batched approach

const result = AsyncContext.runWithSetter((setVariable, ...boundArgs) => {
  setVariable(varA, 123);
  setVariable(varB, 456);
  // do stuff
  setVariable(varA, 789);
  // do more stuff
  return someResult;
}, ...passedArgs);

I could also imagine a method or function decorator doing this a bit more automatically (though the changed signature might be a dealbreaker depending on how TypeScript ultimately handles that):

@bindVariables
function(setVariable, arg1, arg2) { /* ... */ }

But I agree that without a better understanding of the use case, it's hard to justify. Different approaches to solving this are easier/harder to polyfill, particularly in an incremental way. For what it's worth, I'm inclined to strictly enforce wrapping AsyncContext.Variable with a userland AsyncVar class in my own organization, in order to maintain an extra level of indirection, at least while these questions are still shaking out. This allows transparently implementing the workaround I mentioned above, and even potentially experimenting with some of these mutation APIs if there's developer demand for it.

jridgewell commented 6 months ago

@shicks

It's not clear to me that runAll really addresses the issue in the OP: in their example, there's only a single var being set at each point.

See https://github.com/tc39/proposal-async-context/issues/60#issuecomment-1632019120 for the authors intention. runAll should solve it with 1-layer of nesting, the same as runWithSetter.

suggestion of exposing a setter that only works in your owned context seems like a good compromise, though I suspect the API could be less confusing as a "run" rather than a "bind" (but maybe there's a performance reason I'm unaware of where the bind approach makes more sense?):

I don't understand why moving mutation into the callback is a better choice than having them grouped outside the callback with runAll. I think that introduces a new "is this mutable" problem that doesn't exist with scoped callbacks. runAll is a direct parallel to run, just with multiple variables done before calling the cb.

If both runAll and runWithSetter require introducing 1 layer of nesting, I'd much rather runAll.

shicks commented 6 months ago

runAll allows de-nesting when setting multiple variables at a single point in time, and can be easily accomplished in userland.

runWithSetter handles the above use case but also allows de-nesting when setting a single variable at multiple points in time:

v.run(1, () => {
  const a = doSomething();
  v.run(2, () => {
    const b = doSomethingElse();
    v.run(3, () => {
      putItAllTogether(a, b);
    });
  });
});

vs.

runWithSetter(set, () => {
  set(v, 1);
  const a = doSomething();
  set(v, 2);
  const b = doSomethingElse();
  set(v, 3);
  putItAllTogether(a, b);
});

This latter use case is not possible in userland unless you have control over the variable and are willing to add an extra layer of indirection, which likely also hurts usability. One might argue that if you don't own the variable, you shouldn't be able to mutate it - but in this case, you do own the scope in which you're mutating it, and that's at least something. It satisfies the condition that you can't affect your caller's scope, but it does have the potential downside that functions you call can no longer rely on variables being immutable across async continuations (i.e. the values can change out from under them), which may still be a concern.