nodejs / diagnostics

Node.js Diagnostics Working Group
MIT License
537 stars 78 forks source link

[async_hooks] Cannot properly propagate context in its current state #300

Closed rochdev closed 4 years ago

rochdev commented 5 years ago

As recommended by @mcollina in https://github.com/nodejs/node/issues/22360#issuecomment-491714078 I am opening a new issue here.

Currently, Async Hooks are implemented directly in Node. This means that newer constructs like async/await cannot be properly supported since they are language constructs interpreted directly by the underlying engine. The current workaround for that is to ask the maintainers of every module to implement AsyncResource. This model doesn't work as it cannot be expected that the authors of almost 1 million modules validate if they need to implement AsyncResource. This applies to application developers as well.

This issue impacts mostly APM vendors since we are by far the heaviest users of context propagation, but it also impacts anyone using Async Hooks to simply propagate arbitrary data across their application. For us it means we have to patch every single module we know of that should have implemented AsyncResource, and we find new ones pretty much every month. For developers who are using Async Hooks directly, it means the feature is simply broken and will randomly lose context depending on the dependencies of the project.

I unfortunately don't have enough information about how this feature evolved to propose a solution, but I think this should definitely be investigated as it means Async Hooks in their current state are basically broken and won't work in many unpredictable cases. In theory porting the feature to v8 could fix the issue, but now that Node supports different engines, this wouldn't work, meaning it should probably land as an ECMAScript standard instead.

Does anyone know what is the status of domenic/zones? Are there any alternatives in the meantime to make sure Async Hooks work natively with JavaScript constructs without the entire community having to validate their modules?

References:

mhdawson commented 5 years ago

Being able to propagate context is a complex and challenging problem. We have a number of APM vendors involved in the Diagnostic WG who have helped move the implementation forward and validate against their use cases.

Work on the feature is done by volunteers so progress depends on how much time people can squeeze in so the more people to help out the better.

The best way to help move async hooks towards an implementation that works for your use case is to get involved in the work on async hooks and the discussions in the Diagnostics WG.

sam-github commented 5 years ago

Given that some new JS language features (async/await) can be opaque to context tracking, I am curious if any APM vendors who support both Node.js and browser-side JS have any comments. I'd think the context tracking would be an issue in the browser, too, so the both the problem and solution will involve more technology than just Node.js.

rochdev commented 5 years ago

@mhdawson Thanks for the quick answer! I would gladly get involved and participate, but I need to understand first what the expectations for async_hooks are. As mentioned above, the current behavior will lead to constant patching being required for many existing and new modules by APM vendors, and break many applications unexpectedly.

Right now the feature is experimental, so limitations like this are understandable. I'm just not sure it would be wise to make it stable with such limitation since it doesn't work according to the expected behavior of propagating the context regardless of the dependencies used. Of course there is an exception for any kind of internal task queue since these are impossible to follow even with an implementation at the language level.

I understand that supporting this at the language level could take years for a proposal to be approved and actually land, hence this issue to discuss if anything can be done to mitigate the issue. If this discussion already happened and a consensus was reached, please let me know as well.

Do you happen to know if any work was done to try and make this an actual ECMAScript standard, or what happened to the Zones proposal? I feel like after 10 years and many different implementations like async-listener, domains, AsyncWrap, Zones and Async Hooks it should become obvious that an actual standard is necessary to have a final solution that works 100% and it might be worth the effort to move towards that.

Flarna commented 5 years ago

@rochdev AsyncHooks are not limited to async context propagation within JavaScript alone. One major point in context propagation are JS built ins like native promises and await. Here I think the engine (or even the standard) should provide a mechanism for context tracking. It would be perfect to have this in place whenever a new feature is released. This is not only helpful for APMs on Node side it's also easer to debug any application if your debugger is able to track context. Currently V8 has hooks to ensure that AsyncHooks work with Promises and await (but still improvements are possible). Thenables are missing here.

Besides the JS builtins a runtime like node does a lot I/O and AsyncHooks are used to track context across e.g. streaming from a file, TCP connections,... This happens outside of the JavaScript engine therefore something outside of the engine must link request and results. Sure the JS standard could provide API where runtimes like node tell the engine which context is executed once some native I/O event has been triggered. But as you can see every node module which is doing some I/O needs to play around with AsyncResources.

Besides that "correct" context propagation is often not that clear or depends on the use case. Consider a database driver which is using a single TCP connection. APMs usually wan't to see each db transaction and pass context across such transactions. From low level I/O point of view someone may prefer to see individual TCP packets as transactions. AsyncHooks are able to provide both, the TCP layer is in node core and implemented there, the high level database view needs to be implemented by the specific database driver (or some layer on top of it).

Sometimes the above described user space queuing issues resolved "automatically" if promises/await is used as this bundles high level requests. But there are a lot pure callback based APIs out there.

So at least I don't see a single area where to "fix" this problem once for all. One issue is for sure that AsyncHooks are still experimental and as a result modules usually don't want to use it as it may break on every release and/or they have to adapt on different APIs for different nodejs versions (even minor).

As far as I know there is also no reliable context propagation in other languages. Sure, in Java ThreadLocals often work or in .NET AsyncLocal. But once you start using some thread pool, worker queues in Java it's again not that easy/clear to track the context.

@sam-github I will try to get hold of a JS-Agent colleague to get some answers regarding their issues/solution in context propagation in browser.

Flarna commented 5 years ago

I had a short chat with the JS-Agent colleague regarding await/promises/thenables in browers. It seems that this is not yet a significant topic for them. Reason is that there are too much browsers out in the wild which do not support awaitor even promises so these are features not frequently used at the moment for compatibility reasons. Once they get used also in browsers JS-Agent will end up in similar problems as we do already now in node - but browsers have no AsyncHooks to help them.

rochdev commented 5 years ago

Thenables are missing here.

This is precisely what this issue is about. So basically what you are saying is that native promises have hooks in v8 because they are created by v8? In this case I can understand the argument that a promise created by a library should have hooks in the library. At least for the specific case of promises. But then what is the expectation for promise libraries that are pure JS and can run in any environment? Should they have different context propagation mechanisms according to each environment?

but browsers have no AsyncHooks to help them

This is definitely an issue. However, as you mentioned it probably won't be a problem for the next few years since an application must support many different browsers whereas a Node app needs to support only a single version of Node. I think it would definitely be beneficial to start the discussion as soon as possible to make sure that this is supported when all browsers start supporting async/await. I'm not super familiar with Promise Hooks, but maybe something similar could be exposed directly in JavaScript so that they could be hooked into without having to rely on C++?

mhdawson commented 5 years ago

In terms of:

Do you happen to know if any work was done to try and make this an actual ECMAScript standard, or what happened to the Zones proposal? I feel like after 10 years and many different implementations like async-listener, domains, AsyncWrap, Zones and Async Hooks it should become obvious that an actual standard is necessary to have a final solution that works 100% and it might be worth the effort to move towards that.

@mike-kaufman had some some work on defining terminology for Async context and a model that might have been suitable for going into ECMAScript but its kind of stalled out. I'm not as up to date on Zones as some of the other members but I don't think it was seen as the future solution to this issue.

pauldraper commented 4 years ago

Are there any alternatives in the meantime to make sure Async Hooks work natively with JavaScript constructs without the entire community having to validate their modules?

Short answer: no.

Every module must be modified to support a context propagation API (or have the support shimmed).

This model doesn't work as it cannot be expected that the authors of almost 1 million modules validate if they need to implement AsyncResource. This applies to application developers as well.

This is true of other languages as well. Fortunately, only a very small percentage of code deals with non-linear flow control and needs to worry about it.

There really isn't an alternative though. If you're doing arbitrarily complicated flow control, you have to let some API know how they logically relate.

there are too much browsers out in the wild which do not support await or even promises so these are features not frequently used at the moment for compatibility reasons.

async/await was officially standardized in 2016, and at least 93% of web users support it. To reduce sizes, some application ship both ES5 and ES2016+ versions. I know many applications that simply don't support IE at all.

The bigger factor IMO is that there is simply no solution for browser native async/await.

I'm not as up to date on Zones as some of the other members but I don't think it was seen as the future solution to this issue.

Zone.js works well enough, though its ES standardization has been withdrawn due to error-handling objections by Node.js team. The library has been maintained by the Angular team in the Angular repo. There was an effort to support it via async_hooks but the more advanced Zone.js API proves difficult. Possible implementation


Here's the facts

  1. The best support for context propagation is a language standard.

  2. Like other languages (e.g. Java), JavaScript lacks a language standard for context propogation.

  3. Propagation of async/await is impossible on browser platforms.

  4. Node.js platform standard for propagation is async_hooks. This is currently the closest candidate to a de facto standard. GraalVM. To my knowledge, there are no browser shims for async_hooks, though it would be possible to create some (sans native async/await support of course).

Flarna commented 4 years ago

At least the thenables part has been fixed via https://github.com/nodejs/node/pull/33778

github-actions[bot] commented 4 years ago

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

ctcpip commented 1 day ago

https://github.com/tc39/proposal-async-context