nodejs / benchmarking

Node.js Benchmarking Working Group
https://benchmarking.nodejs.org
161 stars 47 forks source link

Performance impact of async_hooks #181

Open bmeurer opened 6 years ago

bmeurer commented 6 years ago

I've already started a related discussions on Twitter earlier here, but Twitter doesn't scale for this kind of discussion, so I thought I should move it here, so it's easier to follow the discussion. For background: @jasnell approached me at NodeConfEU in November 2017, asking for help on the performance of async_hooks on the V8 side, especially when it comes to the Promise lifecycle hooks. And I've talked briefly with @thlorenz about it during the conference (although I have to admit that I was using the wrong terminology and thus kinda ruined the conversation, sorry). And we had chatted briefly about this as well with @trevnorris during one of the CTC-V8 calls.

Since Promise based APIs are likely becoming a (big) thing for Node 10 and at the same time there are estimates (i.e. by @ofrobots) that by the end of next year every Node server in enterprise is going to run with async_hooks, we should start a discussion about this early one to stay ahead of the problem before it becomes a real problem for our users.

In the last year @gsathya and @caitp already spent a lot of effort optimizing promises in V8 in general, so we're in a pretty good position wrt. baseline performance!

I'd like to use this forum to come to an agreement on what the concrete use cases are (at least estimate that), which aspects of the performance matter the most (i.e. promises created inside of C++ vs. in JS land), and what are useful benchmarks to drive the performance work in a meaningful way. I think we need both a set of micro-benchmarks to drive certain, in addition to real-world code that makes heavy use of promises, i.e. koa.js or fastify servers maybe? I feel that these benchmarks are most important, otherwise we might easily sink a lot of effort into work that doesn't help real-world use cases in the end. That's why I opened the issue on the benchmarking WG.

Comments and contributions are very welcome!

bmeurer commented 6 years ago

There's been a related discussion in https://github.com/nodejs/promises/issues/31 earlier this year, which yielded a couple of interesting improvements and insights.

Pinging @nodejs/performance @nodejs/async_hooks

benjamingr commented 6 years ago

I'd like to use this forum to come to an agreement on what the concrete use cases are (at least estimate that), which aspects of the performance matter the most (i.e. promises created inside of C++ vs. in JS land), and what are useful benchmarks to drive the performance work in a meaningful way.

Particularly about async_hooks and promises or promises in general?

For async_hooks and promises - I think a server-application benchmark could be good here - with contexts through async functions through several layers of calls - I'd then listen to promiseResolve and write interesting things about the app while sending it "user data" and measure how long it takes. "interesting things" can be things like how many times the database is hit, request latency for different request stages etc.

For promises in general there are at least 2-3 good ideas in that thread that can get native promises finally faster than userland alternatives like bluebird. Namely skilling the microtask queue when possible and inlining calls - not to mention async iterators that aren't fast yet but could revolutionize programming in Node.js if they were.

bmeurer commented 6 years ago

I'd like to use this forum to come to an agreement on what the concrete use cases are (at least estimate that), which aspects of the performance matter the most (i.e. promises created inside of C++ vs. in JS land), and what are useful benchmarks to drive the performance work in a meaningful way.

Particularly about async_hooks and promises or promises in general?

Improving promise performance in general is also a good idea, but this particular issue is primarily about the impact of async_hooks, i.e. about the performance cliff that you fall off when you enable async_hooks.

As for promises in general and in particular async/await and async iterators, we will need good benchmarks as well to drive meaningful performance work. I've started to collect some ideas for promise performance improvements, but I don't feel like it's a good approach to measure our success in terms of the bluebird benchmarks.

I think @mcollina had some comments about that before. Can we derive a simple representative performance test from fastify to get things going?

AndreasMadsen commented 6 years ago

Specifically, regarding PromiseHooks and performance I would like to see:

kDestroy is added. I think in many cases it is straightforward to know when a promise can no longer be refered too. Thus we could emit the destroy hook much sooner, in other cases it would be fine to wait for garbage collection, but if V8 could instrument that for us directly it would be great.

I've gotten many real-life complains from both APM providers and other async_hooks users that they experience a "memory leak". The reality is, that there is no memory leak but that it unintuitive that the destroy event is not emitted as soon as possible but rather at garbage collection.

I'm guessing this would improve performance a lot, as we would no longer have to wrap the promise object itself to get notified about garbage collection. Essentially we could just create an unreferenced basic object on kInit and set then just set the asyncId and triggerAsyncId (two doubles) on the internal fields.


Extra feature requests that are not performance related.

/cc @addaleax

bmeurer commented 6 years ago

@AndreasMadsen Thanks for the feedback, that's very interesting. These seem to be more on the feature request side.

I think in many cases it is straightforward to know when a promise can no longer be refered too.

Do you have a suggestion/intuition here how to do that?

I'm guessing this would improve performance a lot, as we would no longer have to wrap the promise object itself to get notified about garbage collection. Essentially we could just create an unreferenced basic object on kInit and set then just set the asyncId and triggerAsyncId (two doubles) on the internal fields.

So you're saying that by changing the API and adding machinery to implement kDestroy on the V8 side we should be able to improve performance significantly?

bmeurer commented 6 years ago

cc @gsathya @caitp

AndreasMadsen commented 6 years ago

Do you have a suggestion/intuition here how to do that?

Often promises are only "referenced" once, either by .then() chaining or they are returned and used in await. At least for the user, it is quite obvious that the promise can't be referenced after that.

function wait(ret, ms) {
  // hint: the promise is directly returned
  return new Promise(function (resolve, reject) {
    setTimeout(() => resolve(ret), ms);
  });
}

function main() {
 // hint: the promise is directly used in `await` without being assigned to a variable
 await wait(1, 10) // after this line the user expects the `destroy` hook to emit
 await wait(2, 10) // same for this promise
 await wait(3, 10) // same for this promise
}

So you're saying that by changing the API and adding machinery to implement kDestroy on the V8 side we should be able to improve performance significantly?

Yes, that would be my guess. Wrapping and unwrapping the promise object is definitely the most expensive part of our PromiseHooks integration. Not having to do that, should improve performance.

bmeurer commented 6 years ago

cc @jaro-sevcik

benjamingr commented 6 years ago

Often promises are only "referenced" once, either by .then() chaining or they are returned and used in await. At least for the user, it is quite obvious that the promise can't be referenced after that.

If V8 could detect that with escape analysis and optimize the return value of that function to an entirely different implementation that would be awesome. Fast libraries all hack this by having the single listener case optimized - but an engine that can prove it and avoid allocating extra stuff would be awesome. In terms of promise hooks I suspect it would also simplify things a lot for the common case.

mcollina commented 6 years ago

On the benchmarks side, we can also consider Hapi v17, which is completely async-await based.

The problem with real-world code is that it typically involves a database, and that makes it very hard to measure small improvements. IMHO we should measure:

  1. receive a request, performs a query in the DB, returns value as JSON
  2. receive a request, performs a query in the DB, performs another query in the DB, returns a JSON
  3. receive a request, performs a query in the DB, send an HTTP request somewhere (maybe using fetch), returns a JSON

I fear this would have to be a synthetic application.

benjamingr commented 6 years ago

I fear this would have to be a synthetic application.

This is what Doxbee did (the bluebird benchmark suite) which you're familiar with - we can fork and extend it.

mcollina commented 6 years ago

@benjamingr I think we are all more concern on overall impact of async await rather than a comparative measurement between implementations. Plus, I think we should focus on overall performance including a web framework, a database and HTTP requests rather than just the promise/async await layer.

gsathya commented 6 years ago

Thanks @AndreasMadsen, @benjamingr and @mcollina! This is definitely interesting.

Yes, that would be my guess. Wrapping and unwrapping the promise object is definitely the most expensive part of our PromiseHooks integration. Not having to do that, should improve performance.

This seems surprising to me. I would think that calling out to the javascript hook defined by the user from C++ to be the most expensive part. Have you tried benchmarking with and without the wrapping?

If V8 could detect that with escape analysis and optimize the return value of that function to an entirely different implementation that would be awesome.

The way escape analysis works is by figuring out if a given allocation is unobservable. But, we lose all that with PromiseHooks as the kInit will cause the promise to escape. PromiseHooks enable more observable behavior, potentially restricting optimization.

Plus, I think we should focus on overall performance including a web framework, a database and HTTP requests rather than just the promise/async await layer.

@mcollina I agree, 💯. The hard part would be to try and get reproducible results from such a benchmark though.

@AndreasMadsen -- Can you link to a real world usage of async hooks? How are developers using async hooks? Is it for async stack traces? Is the kResolve hook useful? How is the kDestroy hook used? Is it useful for context propagation? PromiseHooks have a big surface area -- they give you the promise and let you do whatever. If we can find patterns in real world and benchmark these patterns, then we can start to optimize for that and go from there.

bmeurer commented 6 years ago

@mcollina My gut feeling is that we will need some kind of micro-benchmarks to drive individual work items. Plus real-world code to see overall impact and find appropriate areas worth looking into.

@benjamingr Escape analysis could indeed eliminate promises (sometimes), but with promise hooks that becomes impossible, since every promise already escapes/leaks to the hooks.

AndreasMadsen commented 6 years ago

This seems surprising to me. I would think that calling out to the javascript hook defined by the user from C++ to be the most expensive part. Have you tried benchmarking with and without the wrapping?

I admit, I have not done a full matrix comparison. But this was the observation when we initially added the wrapping. It also depends on what hooks are used. I remember that we have some numbers for this but I can't find them.

The way escape analysis works is by figuring out if a given allocation is unobservable. But, we lose all that with PromiseHooks as the kInit will cause the promise to escape.

I can't think of any that actually use the promise object. If we had the kDestroy hook we could properly live without it. We would also need kInit to allow us to return a custom id (the asyncId) and if PromiseHook would carry that context information for the other events. The parentPromise would then have to be replaced with that id.

Can you link to a real world usage of async hooks?

Is it for async stack traces?

That and continues local storage. Hopefully, we will see it being used as a profiling tool too.

Is the kResolve hook useful?

kResolve is used by https://github.com/angular/zone.js/pull/795 - but it is not enough for what they want. Either they need the resolved value or an extra event.

How is the kDestroy hook used?

It is not implemented by PromiseHooks so it isn't used. But it would directly replace our destroy hook. The destroy hook is used to clean up information that is stored in a new Map with the asyncId as the index value.

Is it useful for context propagation?

Very.

bmeurer commented 6 years ago

Is the kResolve hook useful?

kResolve is used by https://github.com/angular/zone.js/pull/795 - but it is not enough for what they want. Either they need the resolved value or an extra event.

I've heard that zone.js might be abandoned by the Angular folks. And async_hooks are available on the Node side only, so they still need a different solution for client side. Not sure who to ask about this?

benjamingr commented 6 years ago

@mcollina

I think we are all more concern on overall impact of async await rather than a comparative measurement between implementations. Plus, I think we should focus on overall performance including a web framework, a database and HTTP requests rather than just the promise/async await layer.

I think what we're measuring async/await against here (with async_hooks) is callbacks. I agree with the database assumption, I'd focus on the most common stack and write the same app in two ways (callbacks and promises) and add instrumentation (for async hooks).

I tend to agree with database/web framework and http requests - but we need to make sure those are as thin as possible so we don't end up with an implementation that is fine "for the average case" but completely breaks for apps that have a harder performance requirement.

What about a benchmark that does:

In general, it's mostly a scope question of whether Node.js should optimize for "the average app" or for "performance sensitive apps".


@bmeurer

Escape analysis could indeed eliminate promises (sometimes), but with promise hooks that becomes impossible, since every promise already escapes/leaks to the hooks.

I'm not sure I follow - so I'll clarify my request/suggestion. Escape analysis can (probably) greatly optimize most of the modern usage of promises in async/await where we're awaiting a direct invocation of an async function - and use much simpler promise code.

async function bar() {

}

(async function foo() {
  // bar is also an async function, this call only has one listener and doesn't escape
  // this is, presumably "the common case".
  await bar(); 
})();

In this case, we don't actually need to allocate a real promise object - we can continue the execution of foo when bar ends. Properties of promises like multicast and assimilation aren't "required". We don't need await to call Promise.resolve on its parameter since we know it's an async function or anything like that either.

If we can prove other nice properties for bar (like that it doesn't throw) it gets even better and we can inline bar entirely in foo (and 'defer a microtick' since we're awaiting it). This would let us drop any allocation or book-keeping altogether.

I realize this is far from the actual implementation but the benefit is also huge and it doesn't sound "too crazy" (I hope :)).

For my specific request: when/if this is optimized, you can inline async_hooks calls too in the above code. Instead of destroy when a promise allocated by bar gets GCd, there is no promise created (or even a call to bar whose code is inlined in foo) and destroy is emitted synthetically.

bmeurer commented 6 years ago

@benjamingr The thing with promise hooks (as required by async_hooks) is that we need to pass the JSPromise object to Node, which immediately makes all escape analysis useless as TurboFan has no idea what Node does with the object and cannot perform scalar replacement of the fields.

AndreasMadsen commented 6 years ago

The thing with promise hooks (as required by async_hooks) is that we need to pass the JSPromise object to Node, which immediately makes all escape analysis useless as TurboFan has no idea what Node does with the object and cannot perform scalar replacement of the fields.

@bmeurer As said, we don't actually need the JSPromise object in node, we just need a way to pass context.

bmeurer commented 6 years ago

That's not how it's implemented currently. I think I need to look at the Node core side again. Don't you need to slap ids onto the JSPromise object?

AndreasMadsen commented 6 years ago

@bmeurer yes we do. If PromiseHooks allowed us to set those ids (just one 64bit id really) in kInit and then passed them to the other events we wouldn't need the JSPromise object.

Our PromiseHook code is here: https://github.com/nodejs/node/blob/master/src/async_wrap.cc#L283

bmeurer commented 6 years ago

@AndreasMadsen But we still need to materialize the JSPromise object to pass it to the kInit hook. So you already defeat the escape analysis at that point. Would it be possible to eliminate the need to pass a concrete JSPromise object to the promise hooks completely?

AndreasMadsen commented 6 years ago

Would it be possible to eliminate the need to pass a concrete JSPromise object to the promise hooks completely?

This is what I'm saying! We are only using it propagating context, so if PromiseHooks provided a different way of doing this, such as propagating a 64bit id set in kInit then that would work just fine.

bmeurer commented 6 years ago

@AndreasMadsen Staring at the PromiseHook on the Node side, you store the PromiseWrap object on the JSPromise and you do check the parent JSPromise during kInit. So this would be a major refactoring on how this works. A few questions come to mind here:

@gsathya does that make sense to you as well? Do we have other users of the PromiseHook?

AndreasMadsen commented 6 years ago

Staring at the PromiseHook on the Node side, you store the PromiseWrap object on the JSPromise and you do check the parent JSPromise during kInit.

Correct. The PromiseWrap holds the ids (asyncId and triggerAsyncId) and help us emits the destroy event. We also send it to the user as the resource object, where it contains two values:

resource = {
  promise: JSPromise,
  // lookup parent asyncId on the internal field of the parent PromiseWrap
  parentId: parentJSPromise.[[PromiseWrap]].[[asyncId]]
}

So this would be a major refactoring on how this works.

On our side, it would be API breaking but I don't see a big issue here. What would be missing is the resource.promise object but I have never seen anyone actually use that. And I don't think it is worth the performance overhead.

Can we get rid of the PromiseWrap for every JSPromise?

yes. It would require two things:

These are the only critical things we use the PromiseWrap object for and hence the JSPromise as well.

Can we move the id slapping logic to V8 instead? So that V8 can set these internal fields upon creating of JSPromise objects.

yes

We would need at least one id (asyncId). We have another id (triggerAsyncId) but we could store that in a map as the asyncId is unique for each promise.

JiaLiPassion commented 6 years ago

@bmeurer,

I've heard that zone.js might be abandoned by the Angular folks. And async_hooks are available on the Node side only, so they still need a different solution for client side. Not sure who to ask about this?

Angular current provide a NgNoopZone to improve performance, it will not load zone.js,but it is just an additional option when user want to fully control when to do change detection.

currently in zone.js, I want to know when promise is really resolved instead of get a kResolve callback because I want to

I am still looking for walk around, but it seems to be a v8 feature request. I also post a feature request there , https://bugs.chromium.org/p/v8/issues/detail?id=6862

ofrobots commented 6 years ago

@AndreasMadsen @bmeurer

So this would be a major refactoring on how this works.

On our side, it would be API breaking but I don't see a big issue here. What would be missing is the resource.promise object but I have never seen anyone actually use that. And I don't think it is worth the performance overhead.

Note that async-hooks are still experimental, so the cost of making a semver breaking change is not that high, specially if it has significant benefits. Regardless, if V8 was to implement this today, this would only get into Node 10, a semver major release. Whether or not the change can be backported to Node 8 and 9 would depend on the complexity rather than semver-ness of the impact to async-hooks API.

guyguyon commented 6 years ago

@bmeurer I've tried to implement a user request context using Async Hooks. I wrote about it a blog post. We used it for debugging purposes as well when we had problems to understand where exactly our bug is.

benjamingr commented 6 years ago

@guyguyon can you share information about the performance impact and what parts were painful in that experience?

guyguyon commented 6 years ago

I know the idea is to share production experience but I didnt dare to upload it to production yet :/. From our Dev experience, I'm deleting a request context using the destroy hook. Since on some resources it will be called by the garbage collector, it caused the request context object to grow fast.

bmeurer commented 6 years ago

I set up a test case with the popular Bluebird and Wikipedia benchmarks (used by the V8 project), which shows that the impact of just having async_hooks enabled (i.e. the PromiseHook enabled in V8) results in significant performance drops.

mike-kaufman commented 6 years ago

@bmeurer do you have some numbers?

bmeurer commented 6 years ago

The bmeurer/async-hooks-performance-impact repository has the numbers:

Results for Node 9.3.0

ruimarinho commented 6 years ago

We've had to rollback a staged production deploy after confirming a severe performance degradation as well. Same goal as @guyguyon - creating request based contexts The numbers from @bmeurer seem to back this up.

bmeurer commented 6 years ago

@ruimarinho Oh that's pretty bad. Was it due to Promise heavy code? Or just old fashioned callback style? Also what kind of software did you try to deploy using async_hooks?

ruimarinho commented 6 years ago

@bmeurer yes - heavy Promise code (using async/await). Full stack http request server, including redis and database connections. We are going to try enabling async_hooks without associating any other code to confirm if the issue is simply related to activating them. We used domains which in 9.3.0+ run on top of async_hooks. We're also using bluebird for most of the modules. /cc @kurayama.

bmeurer commented 6 years ago

@ruimarinho Ok, if you're using bluebird promises, that's going to be slower than native promises already by itself. async/await though will always use native promises.

bmeurer commented 6 years ago

Accidentally commented on the TSC thread instead of this one. So for future reference, I also did a test run of simple hapi and koa servers (using @mcollina's autocannon tester), again with and without async_hooks enabled to get more real-worldish numbers (the Promise benchmarks arguably really stress promises pretty heavily). The results were pretty interesting (with latest Node 9.4.0):

Results for Node 9.4.0

The koa test is super flaky, so the performance difference could also be noise, but for hapi, which makes heavy use of async/await, there's pretty consistent 30% performance drop with just an empty init hook. See bmeurer/async-hooks-performance-impact for the benchmarks and additional information.

And to provide even more data for the discussion: The performance drop also increases with the number of hooks being used (maybe not unsurprising). For example for the Promise benchmarks, we see additional performance regressions:

ruimarinho commented 6 years ago

Very interesting data @bmeurer, thanks for the insights on this subject. We definitely hit the with async_hooks (all) scenario since the current domains implementation on top of async_hooks registers all handlers. That's more in line with what we've seen in production.

async/await though will always use native promises.

Does this hold true even even when await Promise.all([p1, p2]) where const Promise = require('bluebird')?

bmeurer commented 6 years ago

Does this hold true even even when await Promise.all([p1, p2]) where const Promise = require('bluebird')?

In that case you'll probably use bluebird promises for the Promise.all and native promises for the await, which I guess is not what you want. The ES2017 spec forces await and async function to always use native promises (which I consider a good thing FWIW). I think @gsathya brought this up earlier.

holyjak commented 5 years ago

Let me know whether it is not relevant and I should delete my post, but I experienced quite a bad increase in CPU usage when I activated async_hooks (Node 8 and 11) - see https://theholyjava.wordpress.com/2018/11/01/beware-the-performance-cost-of-async_hooks-node-8/ for details.

mcollina commented 5 years ago

I would recommend to check out the latest Node 8, Node 10 and Node 11, as the runtimes that are used in the blog post are quite old, or not supported anymore (Node 9). There should have been some improvement in the area.

Overall async_hooks  are very costly, especially with promises.

holyjak commented 5 years ago

Thanks. I did try the latest Node 11 (the last graph in the blog post) with no difference.

On Thu, Nov 1, 2018, 11:10 AM Matteo Collina notifications@github.com wrote:

I would recommend to check out the latest Node 8, Node 10 and Node 11, as the runtimes that are used in the blog post are quite old, or not supported anymore (Node 9). There should have been some improvement in the area.

Overall async_hooks are very costly, especially with promises.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/nodejs/benchmarking/issues/181#issuecomment-434994261, or mute the thread https://github.com/notifications/unsubscribe-auth/AAmJPg_nTlHJe-Qr1xFoG-rEVv8Fq2rGks5uqsilgaJpZM4REc5B .

dnutels commented 5 years ago

Tried it on Node 12.2 on Windows 10 U. Still costly.

regular Bluebird-doxbee: 126 ms.
init Bluebird-doxbee: 551 ms.
full Bluebird-doxbee: 707 ms.
regular Bluebird-parallel: 170 ms.
init Bluebird-parallel: 1048 ms.
full Bluebird-parallel: 1316 ms.
regular Wikipedia: 334 ms.
init Wikipedia: 2129 ms.
full Wikipedia: 2621 ms.
regular hapiserver: 5477.2 reqs.
init hapiserver: 3266.6 reqs.
full hapiserver: 2727.3 reqs.
regular koaserver: 8231.8 reqs.
init koaserver: 7309.6 reqs.
full koaserver: 6771.1 reqs.
omeraha commented 5 years ago

I would recommend to check out the latest Node 8, Node 10 and Node 11, as the runtimes that are used in the blog post are quite old, or not supported anymore (Node 9). There should have been some improvement in the area.

Overall async_hooks  are very costly, especially with promises.

@mcollina Can you shortly explain the reasons for the substantial performance impact?

alekbarszczewski commented 4 years ago

Is it possible to reduce performance impact of async_hooks in the future (in future Node.js versions)? Or the performance will be always so bad because of the nature of async_hooks and how they work?

benjamingr commented 4 years ago

@alekbarszczewski it is mostly blocked on people doing the product work and suggesting concrete improvements.

rochdev commented 4 years ago

If anyone can provide pointers of where to start to try and fix this, and/or a tl;dr of why it's so slow, I can try to look into it.

mhdawson commented 4 years ago

@rochdev how much time will you have to invest? It will be a relatively big ramp up.

mhdawson commented 4 years ago

@rochdev my suggestion is to come to the next https://github.com/nodejs/diagnostics meeting. We are looking for a Champion to help with moving async hooks forward.

As per the calendar the next meeting is scheduled for Oct 9 (next wed) and an issue will be opened in the repo next Monday with the meeting links etc.