Open bmeurer opened 6 years ago
There's been a related discussion in https://github.com/nodejs/promises/issues/31 earlier this year, which yielded a couple of interesting improvements and insights.
Pinging @nodejs/performance @nodejs/async_hooks
I'd like to use this forum to come to an agreement on what the concrete use cases are (at least estimate that), which aspects of the performance matter the most (i.e. promises created inside of C++ vs. in JS land), and what are useful benchmarks to drive the performance work in a meaningful way.
Particularly about async_hooks and promises or promises in general?
For async_hooks and promises - I think a server-application benchmark could be good here - with contexts through async functions through several layers of calls - I'd then listen to promiseResolve
and write interesting things about the app while sending it "user data" and measure how long it takes. "interesting things" can be things like how many times the database is hit, request latency for different request stages etc.
For promises in general there are at least 2-3 good ideas in that thread that can get native promises finally faster than userland alternatives like bluebird. Namely skilling the microtask queue when possible and inlining calls - not to mention async iterators that aren't fast yet but could revolutionize programming in Node.js if they were.
I'd like to use this forum to come to an agreement on what the concrete use cases are (at least estimate that), which aspects of the performance matter the most (i.e. promises created inside of C++ vs. in JS land), and what are useful benchmarks to drive the performance work in a meaningful way.
Particularly about async_hooks and promises or promises in general?
Improving promise performance in general is also a good idea, but this particular issue is primarily about the impact of async_hooks
, i.e. about the performance cliff that you fall off when you enable async_hooks
.
As for promises in general and in particular async
/await
and async iterators, we will need good benchmarks as well to drive meaningful performance work. I've started to collect some ideas for promise performance improvements, but I don't feel like it's a good approach to measure our success in terms of the bluebird benchmarks.
I think @mcollina had some comments about that before. Can we derive a simple representative performance test from fastify to get things going?
Specifically, regarding PromiseHooks
and performance I would like to see:
kDestroy
is added. I think in many cases it is straightforward to know when a promise
can no longer be refered too. Thus we could emit the destroy
hook much sooner, in other cases it would be fine to wait for garbage collection, but if V8 could instrument that for us directly it would be great.
I've gotten many real-life complains from both APM providers and other async_hooks
users that they experience a "memory leak". The reality is, that there is no memory leak but that it unintuitive that the destroy
event is not emitted as soon as possible but rather at garbage collection.
I'm guessing this would improve performance a lot, as we would no longer have to wrap the promise object itself to get notified about garbage collection. Essentially we could just create an unreferenced basic object on kInit
and set then just set the asyncId
and triggerAsyncId
(two doubles) on the internal fields.
Extra feature requests that are not performance related.
Know the resolve
value in kResolve
. https://github.com/angular/zone.js/pull/795 currently needs this.
Don't break executionAsyncId()
and triggerAsyncId()
when async_hooks
is not enabled. This might be very theoretical. It would involve enabling PromiseHooks
unconditionally. If we also had kDestroy
we would only need to set two doubles on the internal fields.
/cc @addaleax
@AndreasMadsen Thanks for the feedback, that's very interesting. These seem to be more on the feature request side.
I think in many cases it is straightforward to know when a promise can no longer be refered too.
Do you have a suggestion/intuition here how to do that?
I'm guessing this would improve performance a lot, as we would no longer have to wrap the promise object itself to get notified about garbage collection. Essentially we could just create an unreferenced basic object on kInit and set then just set the asyncId and triggerAsyncId (two doubles) on the internal fields.
So you're saying that by changing the API and adding machinery to implement kDestroy
on the V8 side we should be able to improve performance significantly?
cc @gsathya @caitp
Do you have a suggestion/intuition here how to do that?
Often promises are only "referenced" once, either by .then()
chaining or they are returned and used in await
. At least for the user, it is quite obvious that the promise can't be referenced after that.
function wait(ret, ms) {
// hint: the promise is directly returned
return new Promise(function (resolve, reject) {
setTimeout(() => resolve(ret), ms);
});
}
function main() {
// hint: the promise is directly used in `await` without being assigned to a variable
await wait(1, 10) // after this line the user expects the `destroy` hook to emit
await wait(2, 10) // same for this promise
await wait(3, 10) // same for this promise
}
So you're saying that by changing the API and adding machinery to implement
kDestroy
on the V8 side we should be able to improve performance significantly?
Yes, that would be my guess. Wrapping and unwrapping the promise
object is definitely the most expensive part of our PromiseHooks
integration. Not having to do that, should improve performance.
cc @jaro-sevcik
Often promises are only "referenced" once, either by .then() chaining or they are returned and used in await. At least for the user, it is quite obvious that the promise can't be referenced after that.
If V8 could detect that with escape analysis and optimize the return value of that function to an entirely different implementation that would be awesome. Fast libraries all hack this by having the single listener case optimized - but an engine that can prove it and avoid allocating extra stuff would be awesome. In terms of promise hooks I suspect it would also simplify things a lot for the common case.
On the benchmarks side, we can also consider Hapi v17, which is completely async-await based.
The problem with real-world code is that it typically involves a database, and that makes it very hard to measure small improvements. IMHO we should measure:
I fear this would have to be a synthetic application.
I fear this would have to be a synthetic application.
This is what Doxbee did (the bluebird benchmark suite) which you're familiar with - we can fork and extend it.
@benjamingr I think we are all more concern on overall impact of async await rather than a comparative measurement between implementations. Plus, I think we should focus on overall performance including a web framework, a database and HTTP requests rather than just the promise/async await layer.
Thanks @AndreasMadsen, @benjamingr and @mcollina! This is definitely interesting.
Yes, that would be my guess. Wrapping and unwrapping the promise object is definitely the most expensive part of our PromiseHooks integration. Not having to do that, should improve performance.
This seems surprising to me. I would think that calling out to the javascript hook defined by the user from C++ to be the most expensive part. Have you tried benchmarking with and without the wrapping?
If V8 could detect that with escape analysis and optimize the return value of that function to an entirely different implementation that would be awesome.
The way escape analysis works is by figuring out if a given allocation is unobservable. But, we lose all that with PromiseHooks as the kInit
will cause the promise to escape. PromiseHooks enable more observable behavior, potentially restricting optimization.
Plus, I think we should focus on overall performance including a web framework, a database and HTTP requests rather than just the promise/async await layer.
@mcollina I agree, 💯. The hard part would be to try and get reproducible results from such a benchmark though.
@AndreasMadsen -- Can you link to a real world usage of async hooks? How are developers using async hooks? Is it for async stack traces? Is the kResolve
hook useful? How is the kDestroy
hook used? Is it useful for context propagation? PromiseHooks have a big surface area -- they give you the promise and let you do whatever. If we can find patterns in real world and benchmark these patterns, then we can start to optimize for that and go from there.
@mcollina My gut feeling is that we will need some kind of micro-benchmarks to drive individual work items. Plus real-world code to see overall impact and find appropriate areas worth looking into.
@benjamingr Escape analysis could indeed eliminate promises (sometimes), but with promise hooks that becomes impossible, since every promise already escapes/leaks to the hooks.
This seems surprising to me. I would think that calling out to the javascript hook defined by the user from C++ to be the most expensive part. Have you tried benchmarking with and without the wrapping?
I admit, I have not done a full matrix comparison. But this was the observation when we initially added the wrapping. It also depends on what hooks are used. I remember that we have some numbers for this but I can't find them.
The way escape analysis works is by figuring out if a given allocation is unobservable. But, we lose all that with PromiseHooks as the kInit will cause the promise to escape.
I can't think of any that actually use the promise
object. If we had the kDestroy
hook we could properly live without it. We would also need kInit
to allow us to return a custom id
(the asyncId
) and if PromiseHook
would carry that context information for the other events. The parentPromise
would then have to be replaced with that id.
Can you link to a real world usage of async hooks?
Is it for async stack traces?
That and continues local storage. Hopefully, we will see it being used as a profiling tool too.
Is the
kResolve
hook useful?
kResolve
is used by https://github.com/angular/zone.js/pull/795 - but it is not enough for what they want. Either they need the resolved value or an extra event.
How is the
kDestroy
hook used?
It is not implemented by PromiseHooks
so it isn't used. But it would directly replace our destroy
hook. The destroy
hook is used to clean up information that is stored in a new Map
with the asyncId
as the index value.
Is it useful for context propagation?
Very.
Is the
kResolve
hook useful?
kResolve
is used by https://github.com/angular/zone.js/pull/795 - but it is not enough for what they want. Either they need the resolved value or an extra event.
I've heard that zone.js
might be abandoned by the Angular folks. And async_hooks
are available on the Node side only, so they still need a different solution for client side. Not sure who to ask about this?
@mcollina
I think we are all more concern on overall impact of async await rather than a comparative measurement between implementations. Plus, I think we should focus on overall performance including a web framework, a database and HTTP requests rather than just the promise/async await layer.
I think what we're measuring async/await
against here (with async_hooks) is callbacks. I agree with the database assumption, I'd focus on the most common stack and write the same app in two ways (callbacks and promises) and add instrumentation (for async hooks).
I tend to agree with database/web framework and http requests - but we need to make sure those are as thin as possible so we don't end up with an implementation that is fine "for the average case" but completely breaks for apps that have a harder performance requirement.
What about a benchmark that does:
In general, it's mostly a scope question of whether Node.js should optimize for "the average app" or for "performance sensitive apps".
@bmeurer
Escape analysis could indeed eliminate promises (sometimes), but with promise hooks that becomes impossible, since every promise already escapes/leaks to the hooks.
I'm not sure I follow - so I'll clarify my request/suggestion. Escape analysis can (probably) greatly optimize most of the modern usage of promises in async/await where we're awaiting a direct invocation of an async function - and use much simpler promise code.
async function bar() {
}
(async function foo() {
// bar is also an async function, this call only has one listener and doesn't escape
// this is, presumably "the common case".
await bar();
})();
In this case, we don't actually need to allocate a real promise object - we can continue the execution of foo when bar
ends. Properties of promises like multicast and assimilation aren't "required". We don't need await
to call Promise.resolve
on its parameter since we know it's an async function or anything like that either.
If we can prove other nice properties for bar
(like that it doesn't throw) it gets even better and we can inline bar
entirely in foo
(and 'defer a microtick' since we're awaiting it). This would let us drop any allocation or book-keeping altogether.
I realize this is far from the actual implementation but the benefit is also huge and it doesn't sound "too crazy" (I hope :)).
For my specific request: when/if this is optimized, you can inline async_hooks
calls too in the above code. Instead of destroy
when a promise allocated by bar
gets GCd, there is no promise created (or even a call to bar whose code is inlined in foo) and destroy
is emitted synthetically.
@benjamingr The thing with promise hooks (as required by async_hooks
) is that we need to pass the JSPromise
object to Node, which immediately makes all escape analysis useless as TurboFan has no idea what Node does with the object and cannot perform scalar replacement of the fields.
The thing with promise hooks (as required by async_hooks) is that we need to pass the JSPromise object to Node, which immediately makes all escape analysis useless as TurboFan has no idea what Node does with the object and cannot perform scalar replacement of the fields.
@bmeurer As said, we don't actually need the JSPromise
object in node, we just need a way to pass context.
That's not how it's implemented currently. I think I need to look at the Node core side again. Don't you need to slap ids onto the JSPromise
object?
@bmeurer yes we do. If PromiseHooks
allowed us to set those ids (just one 64bit id really) in kInit
and then passed them to the other events we wouldn't need the JSPromise
object.
Our PromiseHook
code is here: https://github.com/nodejs/node/blob/master/src/async_wrap.cc#L283
@AndreasMadsen But we still need to materialize the JSPromise
object to pass it to the kInit
hook. So you already defeat the escape analysis at that point. Would it be possible to eliminate the need to pass a concrete JSPromise
object to the promise hooks completely?
Would it be possible to eliminate the need to pass a concrete JSPromise object to the promise hooks completely?
This is what I'm saying! We are only using it propagating context, so if PromiseHooks
provided a different way of doing this, such as propagating a 64bit id set in kInit
then that would work just fine.
@AndreasMadsen Staring at the PromiseHook
on the Node side, you store the PromiseWrap
object on the JSPromise
and you do check the parent JSPromise
during kInit
. So this would be a major refactoring on how this works. A few questions come to mind here:
PromiseWrap
for every JSPromise
?JSPromise
objects.@gsathya does that make sense to you as well? Do we have other users of the PromiseHook
?
Staring at the
PromiseHook
on the Node side, you store thePromiseWrap
object on theJSPromise
and you do check the parentJSPromise
duringkInit
.
Correct. The PromiseWrap
holds the ids (asyncId
and triggerAsyncId
) and help us emits the destroy event. We also send it to the user as the resource
object, where it contains two values:
resource = {
promise: JSPromise,
// lookup parent asyncId on the internal field of the parent PromiseWrap
parentId: parentJSPromise.[[PromiseWrap]].[[asyncId]]
}
So this would be a major refactoring on how this works.
On our side, it would be API breaking but I don't see a big issue here. What would be missing is the resource.promise
object but I have never seen anyone actually use that. And I don't think it is worth the performance overhead.
Can we get rid of the PromiseWrap for every JSPromise?
yes. It would require two things:
kDestroy
event.These are the only critical things we use the PromiseWrap
object for and hence the JSPromise
as well.
Can we move the id slapping logic to V8 instead? So that V8 can set these internal fields upon creating of JSPromise objects.
yes
We would need at least one id (asyncId
). We have another id (triggerAsyncId
) but we could store that in a map as the asyncId
is unique for each promise.
@bmeurer,
I've heard that zone.js might be abandoned by the Angular folks. And async_hooks are available on the Node side only, so they still need a different solution for client side. Not sure who to ask about this?
Angular current provide a NgNoopZone
to improve performance, it will not load zone.js
,but it is just an additional option when user want to fully control when to do change detection
.
currently in zone.js
, I want to know when promise
is really resolved instead of get a kResolve
callback because I want to
asynchooks
based zone.js
.zone.js
be able to use native async/await
.I am still looking for walk around, but it seems to be a v8
feature request. I also post a feature request there , https://bugs.chromium.org/p/v8/issues/detail?id=6862
@AndreasMadsen @bmeurer
So this would be a major refactoring on how this works.
On our side, it would be API breaking but I don't see a big issue here. What would be missing is the resource.promise object but I have never seen anyone actually use that. And I don't think it is worth the performance overhead.
Note that async-hooks are still experimental, so the cost of making a semver breaking change is not that high, specially if it has significant benefits. Regardless, if V8 was to implement this today, this would only get into Node 10, a semver major release. Whether or not the change can be backported to Node 8 and 9 would depend on the complexity rather than semver-ness of the impact to async-hooks API.
@bmeurer I've tried to implement a user request context using Async Hooks. I wrote about it a blog post. We used it for debugging purposes as well when we had problems to understand where exactly our bug is.
@guyguyon can you share information about the performance impact and what parts were painful in that experience?
I know the idea is to share production experience but I didnt dare to upload it to production yet :/. From our Dev experience, I'm deleting a request context using the destroy
hook. Since on some resources it will be called by the garbage collector, it caused the request context object to grow fast.
I set up a test case with the popular Bluebird and Wikipedia benchmarks (used by the V8 project), which shows that the impact of just having async_hooks
enabled (i.e. the PromiseHook enabled in V8) results in significant performance drops.
@bmeurer do you have some numbers?
The bmeurer/async-hooks-performance-impact repository has the numbers:
We've had to rollback a staged production deploy after confirming a severe performance degradation as well. Same goal as @guyguyon - creating request based contexts The numbers from @bmeurer seem to back this up.
@ruimarinho Oh that's pretty bad. Was it due to Promise heavy code? Or just old fashioned callback style? Also what kind of software did you try to deploy using async_hooks
?
@bmeurer yes - heavy Promise code (using async/await). Full stack http request server, including redis and database connections. We are going to try enabling async_hooks
without associating any other code to confirm if the issue is simply related to activating them. We used domains
which in 9.3.0+ run on top of async_hooks
. We're also using bluebird for most of the modules. /cc @kurayama.
@ruimarinho Ok, if you're using bluebird promises, that's going to be slower than native promises already by itself. async
/await
though will always use native promises.
Accidentally commented on the TSC thread instead of this one. So for future reference, I also did a test run of simple hapi
and koa
servers (using @mcollina's autocannon
tester), again with and without async_hooks
enabled to get more real-worldish numbers (the Promise benchmarks arguably really stress promises pretty heavily). The results were pretty interesting (with latest Node 9.4.0):
The koa
test is super flaky, so the performance difference could also be noise, but for hapi
, which makes heavy use of async
/await
, there's pretty consistent 30% performance drop with just an empty init
hook. See bmeurer/async-hooks-performance-impact for the benchmarks and additional information.
And to provide even more data for the discussion: The performance drop also increases with the number of hooks being used (maybe not unsurprising). For example for the Promise
benchmarks, we see additional performance regressions:
Very interesting data @bmeurer, thanks for the insights on this subject. We definitely hit the with async_hooks (all)
scenario since the current domains implementation on top of async_hooks
registers all handlers. That's more in line with what we've seen in production.
async/await though will always use native promises.
Does this hold true even even when await Promise.all([p1, p2])
where const Promise = require('bluebird')
?
Does this hold true even even when
await Promise.all([p1, p2])
whereconst Promise = require('bluebird')
?
In that case you'll probably use bluebird promises for the Promise.all
and native promises for the await
, which I guess is not what you want. The ES2017 spec forces await
and async function
to always use native promises (which I consider a good thing FWIW). I think @gsathya brought this up earlier.
Let me know whether it is not relevant and I should delete my post, but I experienced quite a bad increase in CPU usage when I activated async_hooks
(Node 8 and 11) - see https://theholyjava.wordpress.com/2018/11/01/beware-the-performance-cost-of-async_hooks-node-8/ for details.
I would recommend to check out the latest Node 8, Node 10 and Node 11, as the runtimes that are used in the blog post are quite old, or not supported anymore (Node 9). There should have been some improvement in the area.
Overall async_hooks
are very costly, especially with promises.
Thanks. I did try the latest Node 11 (the last graph in the blog post) with no difference.
On Thu, Nov 1, 2018, 11:10 AM Matteo Collina notifications@github.com wrote:
I would recommend to check out the latest Node 8, Node 10 and Node 11, as the runtimes that are used in the blog post are quite old, or not supported anymore (Node 9). There should have been some improvement in the area.
Overall async_hooks are very costly, especially with promises.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/nodejs/benchmarking/issues/181#issuecomment-434994261, or mute the thread https://github.com/notifications/unsubscribe-auth/AAmJPg_nTlHJe-Qr1xFoG-rEVv8Fq2rGks5uqsilgaJpZM4REc5B .
Tried it on Node 12.2 on Windows 10 U. Still costly.
regular Bluebird-doxbee: 126 ms.
init Bluebird-doxbee: 551 ms.
full Bluebird-doxbee: 707 ms.
regular Bluebird-parallel: 170 ms.
init Bluebird-parallel: 1048 ms.
full Bluebird-parallel: 1316 ms.
regular Wikipedia: 334 ms.
init Wikipedia: 2129 ms.
full Wikipedia: 2621 ms.
regular hapiserver: 5477.2 reqs.
init hapiserver: 3266.6 reqs.
full hapiserver: 2727.3 reqs.
regular koaserver: 8231.8 reqs.
init koaserver: 7309.6 reqs.
full koaserver: 6771.1 reqs.
I would recommend to check out the latest Node 8, Node 10 and Node 11, as the runtimes that are used in the blog post are quite old, or not supported anymore (Node 9). There should have been some improvement in the area.
Overall
async_hooks
are very costly, especially with promises.
@mcollina Can you shortly explain the reasons for the substantial performance impact?
Is it possible to reduce performance impact of async_hooks in the future (in future Node.js versions)? Or the performance will be always so bad because of the nature of async_hooks and how they work?
@alekbarszczewski it is mostly blocked on people doing the product work and suggesting concrete improvements.
If anyone can provide pointers of where to start to try and fix this, and/or a tl;dr of why it's so slow, I can try to look into it.
@rochdev how much time will you have to invest? It will be a relatively big ramp up.
@rochdev my suggestion is to come to the next https://github.com/nodejs/diagnostics meeting. We are looking for a Champion to help with moving async hooks forward.
As per the calendar the next meeting is scheduled for Oct 9 (next wed) and an issue will be opened in the repo next Monday with the meeting links etc.
I've already started a related discussions on Twitter earlier here, but Twitter doesn't scale for this kind of discussion, so I thought I should move it here, so it's easier to follow the discussion. For background: @jasnell approached me at NodeConfEU in November 2017, asking for help on the performance of
async_hooks
on the V8 side, especially when it comes to the Promise lifecycle hooks. And I've talked briefly with @thlorenz about it during the conference (although I have to admit that I was using the wrong terminology and thus kinda ruined the conversation, sorry). And we had chatted briefly about this as well with @trevnorris during one of the CTC-V8 calls.Since Promise based APIs are likely becoming a (big) thing for Node 10 and at the same time there are estimates (i.e. by @ofrobots) that by the end of next year every Node server in enterprise is going to run with
async_hooks
, we should start a discussion about this early one to stay ahead of the problem before it becomes a real problem for our users.In the last year @gsathya and @caitp already spent a lot of effort optimizing promises in V8 in general, so we're in a pretty good position wrt. baseline performance!
I'd like to use this forum to come to an agreement on what the concrete use cases are (at least estimate that), which aspects of the performance matter the most (i.e. promises created inside of C++ vs. in JS land), and what are useful benchmarks to drive the performance work in a meaningful way. I think we need both a set of micro-benchmarks to drive certain, in addition to real-world code that makes heavy use of promises, i.e. koa.js or fastify servers maybe? I feel that these benchmarks are most important, otherwise we might easily sink a lot of effort into work that doesn't help real-world use cases in the end. That's why I opened the issue on the benchmarking WG.
Comments and contributions are very welcome!