How CLS should work with promises

overlookmotel commented 8 years ago

With AsyncWrap taking shape, it feels like it could be a good time to determine how CLS (or a CLS-like module that uses AsyncWrap) should work with promises.

After delving into how promises and CLS interact, I've come to the conclusion that there are 3 different ways in which a CLS/Promise shim can work.

All three approaches have their own logic, and they're all incompatible with each other.

It's not clear which is the "correct" way. The behaviour of native promises with CLS (through the shim provided by async-listener) follows one convention, cls-q and cls-bluebird follow another.

The 3 conventions

Here are the 3 different approaches:

Convention 1: Callback style

This is the behavior of native JS Promises.

The context at the end of the last .then() callback is maintained for the next .then() callback. Where and when the .then() is added to the promise chain is irrelevant.

For CLS purposes, the following are treated the same:

fs.readFile('foo.txt', function(text) {
    console.log(text);
});

fs.readFile('foo.txt').then(function(text) {
    console.log(text);
});

i.e. promises are essentially sugar for callbacks, rather than a distinct syntax with different behavior.

If the code inside a .then() callback loses CLS context (due to using a queue or similar), then the shim would NOT correct this.

On the positive side, it allows a CLS context to be created within a .then() callback and the rest of the promise chain that follows runs within that context. This could be useful e.g. for middleware.

Promise.resolve().then(function() {
    return new Promise(function(resolve) {
        ns.run(function() {
            ns.set('foo', 123);
            resolve();
        });
    });
}).then(function() {
    console.log(ns.get('foo')); // prints 123
});

Convention 2: Follow promise chain

CLS context is set at the time of the promise's creation. Any promises which chain on from another promise inherit the same context.

This is the same as (1) except:

If a .then() callback loses context, context is restored for the next .then() callback
If a new CLS context is created within a .then() callback, it is NOT maintained for the next .then() in the promise chain

ns.run(function() {
    ns.set('foo', 123);
    Promise.resolve().then(function() {
        return loseContext(); // returns a promise, but loses CLS context
    }).then(function() {
        // original CLS context has been restored
        console.log(ns.get('foo')); // prints 123
    });
});

var promise;
ns.run(function() {
    ns.set('foo', 123);
    promise = Promise.resolve();
});

ns.run(function() {
    ns.set('foo', 456);
    promise.then(function() {
        console.log(ns.get('foo')); // prints 123
    });
});

Convention 3: Listener attachment context

CLS context for execution of .then() callback is defined at time .then() is called. This is not necessarily the same context as the previous promise in the chain.

Similarly to (2), if a .then() callback loses context, this doesn't affect context for the next .then() in the chain.

This appears to be the convention followed by cls-q and cls-bluebird.

var promise;
ns.run(function() {
    ns.set('foo', 123);
    promise = Promise.resolve();
});

ns.run(function() {
    ns.set('foo', 456);
    promise.then(function() {
        console.log(ns.get('foo')); // prints 456
    });
});

Difference between the three

The following code demonstrates the difference between the 3 conventions. It will log "This Promise implementation follows convention X", where X depends on which approach the promise shim takes.

var promise;
ns.run(function() {
    ns.set('test', 2);
    promise = new Promise(function(resolve) {
        ns.run(function() {
            ns.set('test', 1);
            resolve();
        });
    });
});

ns.run(function() {
    ns.set('test', 3);
    promise.then(function() {
        console.log('This Promise implementation follows convention ' + ns.get('test'));
    });
});

NB With native JS promises you get "This Promise implementation follows convention 1". With cls-q or cls-bluebird you get "This Promise implementation follows convention 3".

Which way is best?

I think this is debatable. It depends on how you conceptualize promises and the control flow they represent.

Convention 1 is the simplest and isn't opinionated about what a promise control flow represents.

Native JS Promises follow this convention, so there's an argument other promise shims should follow the same convention to avoid confusion.

This doesn't cover the common use case of patching where a library like redis loses CLS context within it. However, there's a strong separation of concerns argument that a shim for a promise library should just shim the promise library. If another library loses CLS context, then that library should be shimmed. i.e. solve the problem that redis loses context with cls-redis not cls-bluebird!

Convention 2 conceptualizes a promise chain as a set of connected actions.

Imagine multiple tasks running in parallel, each composed of multiple steps e.g. read a file, transform it, write it out again. Each task run is represented by a promise chain.

Now if you want to add an extra step to each of the tasks (e.g. notify a server when task is done), you'd add an extra .then() to the end of the promise chain for each task. You would expect each callback to run in the CLS context for that task.

Convention 3 conceptualizes a promise chain as a queue.

Imagine a resource which can only be accessed by one process at a time. The queue for access is represented by a promise. When a process finishes accessing the resource, it resolves the promise and the next in the queue (next promise in the chain) then starts up. If you want access to the resource, you add a .then() to the promise chain.

If a running task (e.g. serving an HTTP request), gets the resource and then continues on with other things, you would expect promises chained on after accessing the resource to execute in the CLS context of the task, NOT the context of the preceding item in the resource queue.

function Resource() {
    this.promise = Promise.resolve();
}

Resource.prototype.read = function() {
    this.promise = this.promise.then(function() {
        return fs.readFileAsync('/path/to/resource'); // NB returns promise
    });
    return this.promise;
};

var resource = new Resource();

// somewhere else in code
function doTheDo() {
    return resource.read().then(function(resourceContent) {
        // do something with the resource's content
    });
}

Conclusion

I'm not pushing for one convention over another. I just thought it'd be useful to lay out what I think are the 3 different choices and their implications.

What I do suggest is that if there's some consensus on which convention is best, this be set out in a set of tests, so everyone can be sure that the cls-promise implementation they're using is compliant.

It would also clear up what's a bit of an ambiguity - there's been some confusion for example here: https://github.com/TimBeyer/cls-bluebird/pull/1#issuecomment-47889243.

I've made a start on a test suite here: https://github.com/overlookmotel/cls-bluebird-test

Anyone have any thoughts on this?

Qard commented 8 years ago

The important note here is that all of these stages of the promise lifecycle are important.

To me, there are 5 points in the lifecycle of any call which produces an async task, including promises, which one might want to link context between in different ways:

The start of a call which produces an async task
The end of a call which produces an async task
The point when a callback is assigned to an async task
The start of a callback for an async task
The end of a callback for an async task

Depending on the style of API (callbacks vs promises) points 2 and 3 may switch places. For repeatable actions (like http.createServer(...) receiving requests), 4 and 5 may trigger many times.

Rather than picking one source of context, I'd rather see a way to allow the source to be configurable. That could be by allowing the user to define app-wide how a given context tree should be nested, or it could be done by producing a traversable structure that can be used to walk back through the tree to where one expects a particular bit of data to be. It may also be possible to combine the two somehow, with a hinting API for the call sites to individually describe how their child tasks should navigate the tree.

It's a complicated problem which I feel CLS may be ill-equipped to solve in its current state.

Jeff-Lewis commented 8 years ago

I would love to hear @ofrobots's thoughts.

parasyte commented 8 years ago

Followup comment from conversation in #66.

Promises present an interesting pattern to async programming in JavaScript, but they are quite distinct from similarly named synchronization primitives in other languages. In all honesty, I didn't "get" the point of promises until I came across a few different articles by notable authors:

You're Missing The Point of Promises by Promises/A+ editor Domenic Denicola
A series of articles on Promises by Kyle Simpson

The first thing to note, as pointed out by @domenic is that promises, for better or worse, are functional in nature, not imperative. He explains it like this (emphasis his own):

then is not a mechanism for attaching callbacks to an aggregate collection. It’s a mechanism for applying a transformation to a promise, and yielding a new promise from that transformation.

The second important point covered by both authors is that a promise, once resolved or rejected, is immutable. The state of that promise may never change. It follows from this logic that whatever state exists in the CLS context at the time of a promise's resolution is a part of that promise's immutable state.

Given this establishment of what a promise is and the example above under the section titled Difference between the three, I'm confident that "convention 1" is the only one of the three that fits with the goals of promises.

The reasons these authors (and others like them) are so passionate about educating developers on promises, are the same reasons held by everyone in this thread; to establish a single interpretation of "safe" asynchronous programming in JavaScript. The safety I refer to is the guarantee of immutability, and ability to reason about asynchronous code.

The latter two conventions would make it very difficult to reason about the state of a promise, given its CLS context is not captured at resolution time. Even if they are attempts to resolve the context-loss problem... But the closing statement on "convention 1" above describes the proper solution better than I could:

If another library loses CLS context, then that library should be shimmed. i.e. solve the problem that redis loses context with cls-redis not cls-bluebird!

There are also a few example use cases provided above. I won't go into detail on any of these. I might propose, however, that promises could be the wrong tool for approaching those problems. You don't want to implement a non-terminating consumer for a message queue with promises, for example (something I foolishly tried the other day while sleep deprived). The functional nature of promises does not work correctly with the consumer pattern. You're better off with an event listener pattern! (But you can still use promises within your event handlers.)

FWIW, my two cents. :+1:

Qard commented 8 years ago

Agreed, convention 1 feels most natural to me too.

domenic commented 8 years ago

Convention 3 seems way more correct. That is what we are doing for zones.

domenic commented 8 years ago

This seems very unrelated to promises, by the way. It's more about the semantics of CLS and similar abstractions. At least for zones, the semantics are that no matter what mechanism you use to schedule your code later---setTimeout, promises, events---you want to have the context inside the scheduled code be the same as the context that performed the scheduling. This is essential to the nature of asynchronous context propagation in all cases I've seen so far. It also is the only thing that makes sense with async-await:

// context is c1
await promise;
// context must still be c1

Under convention 1 or 2, context could in some cases no longer be c1, which is clearly wrong.

Perhaps an analogy with thread-local storage would make it clearer. Let's say that you had the ability to resolve a promise from a background thread. (In fact, this is done all over the place in the web platform.) Should code in the foreground thread that reacts to that promise suddenly be using the thread-local storage of the background thread? Clearly not. My understanding is that convention 2 would make it always use the background thread context, whereas convention 1 would make it do so under certain weird circumstances.

Again, in all cases context propagation should not depend on the scheduling mechanism, but should preserve the invariant that the context propagates from the scheduling code to the code that is scheduled.

(My understanding of domains is that failing to enforce this invariant is one of their bigger failures; they do something weird for event emitters where event emitters carry a domain with them, which overrides the zone in which the event handler is called, instead of allowing the domain to propagate from the scheduling code to the event handler code.)

overlookmotel commented 8 years ago

@parasyte By the way, the link to @domenic's article is broken in your comment. Could you edit the link please? It appears to be this article you were referring to: https://blog.domenic.me/youre-missing-the-point-of-promises/

ofrobots commented 8 years ago

+1 to the comments from Domenic. I would like to add that convention 3 leads to better module interoperability.

I like to conceptualize what CLS is trying to do by considering async stack traces. Imagine I am using module that returns me a promise:

const cowsComeHome = require('cows').comeHome(); // a promise of cows coming home

function foo() {
    cowsComeHome.then(() => {
        // what's my context?
    });
}

From my code's point of view, the async stack trace ought to have foo on it. A third party module shouldn't affect my context. Convention 3 makes third party modules work better together.

overlookmotel commented 8 years ago

@trevnorris Do you have any opinion on this?

Looks like async-listener uses convention 1 in its patching of Promise (though I'm not sure if that was also in your patch to node core, or introduced by @othiym23 in his polyfill).

othiym23 commented 8 years ago

Patching Promise was introduced by @hayes. @trevnorris's core API preceded landing ES Promises in V8 (and the concomitant challenge of handling the new microtask queue).

overlookmotel commented 8 years ago

@othiym23 Thanks for the info. It's quite hard to trace the chain of events with async listener landing and then not landing in node core.

@hayes Would you mind giving your thoughts on this debate?

parasyte commented 8 years ago

@overlookmotel Fixed. Thanks for the heads up.

hayes commented 8 years ago

I'm on my phone in a car, so please excuse any typos. I had alot of conversations about the pros and cons of option 1 vs 3, and at the time option 3 seemed like it made the most sense when considering all the consumers of async-listener, but CLS may not share all the same concerns.

The most compelling reason we had for choosing 1 over 3 was for libraries doing some sort of monitoring or introspection (async stack traces, New Relic, etc). In those cases, the thing that holds up a promise from being fulfilled is still relevant after it has been fulfilled.

Option 1 provides causality at a fairly low level, option 3 seems to be much closer to capturing user intent.

function login(id) {
  dB.first({id}).from('users').then(setLastLogin)
}

With option 1, setLastLogin was called as a result of getting the user from the database, with option 3 it was called because login was called.

In the context of a library used for state management in applications (as opposed to monitoring tools) I do mostly agree with @domenic and async functions are a very compelling argument for using option 3.

Basically with option 1 callbacks are continually nested inside eachother, while with option 3, the "stack" is reset for each then. option 3 is most inline with async/await, and when compared to how a sync equivalent would work, the call stack would be reset between each call the await/then.

I still think option 1 makes sense in some contexts, especially monitoring, but I don't think that is the majority use case for CLS.

parasyte commented 8 years ago

I have to agree that option 3 is required for async-await.

I argue that the coroutine-style async paradigm provided by await has vastly different use cases from promises; The former is good at hiding the complexity of callbacks entirely, whilst maintaining context in a traditional call stack model; The latter is useful for transforming the context via middleware.

domenic commented 8 years ago

I disagree strongly that async/await and promises are meant to be used differently. My article linked previously ("You're Missing the Point of Promises") tries to make this clear.

Qard commented 8 years ago

This module was originally created for the monitoring purpose, and it's even called continuation-local-storage, implying continued availability of context data into the future. Maybe there needs to be another module called context-local-storage, or something like that, which implements the third style?

domenic commented 8 years ago

I disagree that anything but 3 correctly gives correct continual availabilityof context data into the future.

Qard commented 8 years ago

Unless I'm misunderstanding your thinking, the third approach would mean that data added within the processing of a yielded promise would not be available in the context that did the yielding, after the yield completes.

This is exactly what is needed for the monitoring use. It's less of a compartmentalized context tree and more of a sequence of transformations over time to one greater context. In most application performance monitoring cases, the "greater context" is an entire request.

hayes commented 8 years ago

Not exactly, contexts are nested, contexts that are active before a yielded promise would still be active insides the context that is yielded to, and data can be added to those contexts. New contexts created inside a yielded promise would no longer be active after the promise is resolved.

But that's kinda besides the point, option 3 is definitely wrong for monitoring, which was the initial intent of this library. I think your point about the name/purpose of this library being continuation rather than context is a good one. If we reframe things in terms of continuation, a handler for a promise is a continuation from the call site of the call to then/catch.

This makes the definition of the 5 events you mentioned earlier pretty clear. With option 3, the definitions are less obvious, especially when dealing with adding handlers to resolved promises. This can lead to very unexpected contexts code that cache results in promises.

Personally I've been very frustrated with the behavior of both option 1 and 3 for monitoring, depending on how promises were being used in the application. I think the question of how to handle the adding of handlers to resolved promises is important. Option 3 handles this case in a very predictable way, but I do see a lot valid reasons to use option 1 as well.

On Sun, Jun 19, 2016, 7:22 PM Stephen Belanger notifications@github.com wrote:

Unless I'm misunderstanding your thinking, the third approach would mean that data added within the processing of a yielded promise would not be available in the context that did the yielding, after the yield completes.

This is exactly what is needed for the monitoring use. It's less of a compartmentalized context tree and more of a sequence of transformations over time to one greater context. In most application performance monitoring cases, the "greater context" is an entire request.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/othiym23/node-continuation-local-storage/issues/64#issuecomment-227037851, or mute the thread https://github.com/notifications/unsubscribe/AAx1j9GSEz1axfwbJiI7D0QmBvvToOonks5qNflsgaJpZM4Icj3S .

Qard commented 8 years ago

Maybe it'd make sense to store references to the state snapshot at each of the transition points in the async task lifetime, allowing the relevant snapshot to be accessed manually, if you want to look at something different than what the default nesting resolution produces?

hayes commented 8 years ago

It's an interesting idea, I'm not sure what that would look like. It seems like it would require essentially doing both, but adding the option 3 behavior on top of option 1 is fairly trivial. Figuring out how to store and access the contexts sounds more complicated. I wonder if it would be possible instead to provide a continuation strategy that could be used with various promise instrumentations

On Sun, Jun 19, 2016, 9:22 PM Stephen Belanger notifications@github.com wrote:

Maybe it'd make sense to store references to the state snapshot at each of the transition points in the async task lifetime, allowing the relevant snapshot to be accessed manually, if you want to look at something different than what the default nesting resolution produces?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/othiym23/node-continuation-local-storage/issues/64#issuecomment-227047782, or mute the thread https://github.com/notifications/unsubscribe/AAx1j_xpRNyQATbhrp9HEJLiZNuC-T5vks5qNhWRgaJpZM4Icj3S .

Qard commented 8 years ago

That could be possible too--adding a strategy option to the namespace constructor.

idanfelz commented 7 years ago

All my application is written in promises chain adn some callback style. i need the convention 1. Currently, there is some workaround for wotking with native promise? without bind every callback to the 'then' and 'catch' manually... thanks.

reggi commented 6 years ago

What's the status of this question? Is this still unresolved?

overlookmotel commented 6 years ago

@reggi Not resolved, but I suppose now pretty irrelevant.

I went for convention 3 in my re-write of cls-bluebird. But from my understanding, async_hooks follows convention 1.

This implementation of CLS is likely to be retired in favour of a new implementation of CLS based on async_hooks, so I guess convention 1 has won the day.

reggi commented 6 years ago

Just finished watching this video by Thomas Watson on Instrumenting Node.js in Production, in the video Thomas goes deep into how the [async_hooks](https://nodejs.org/api/async_hooks.html) (then async_wrap) api works.

Given that async_hooks is a feature of these later versions of node, is there any hope for a polyfill that will work with older versions of Node? Will babel's async / await & bluebird Promise ever work with CLS on Node 4 or 6?

othiym23 commented 6 years ago

I would love to see CLS brought up to date with modern versions of Node. async_hooks is on its way out of being experimental, so the time is right to rebase CLS on that. A version that's based on async-listener should definitely stick around for a while yet, because there are still LTS versions of Node that don't include the finished version of the async_hooks API. I'd also like to see support for Promises (both Bluebird and native) incorporated into CLS proper. Promises are very widely used now, and CLS is much less useful because it doesn't handle them properly!

However, somebody needs do the work necessary to add (and debug, which is the hard part) that functionality, and it's probably not going to be me. My interests are elsewhere at the moment, and I really don't have the resources necessary to deal with the tricky and technical work required. I'm happy to review PRs and hand out maintainer bits to motivated contributors, but that's about the limits of what I think I can do.

Not all of the work is on the CLS side, either. If you're interested, you should take a look at the results in https://github.com/nodejs/benchmarking/issues/181 – right now, the combination of async_hooks and Promises is pretty costly, and for applications like APM, or for frameworks that make heavy use of async / await using the two together could be prohibitively expensive.

othiym23 / node-continuation-local-storage