tailcallhq / graphql-benchmarks

Setup to compare graphql frameworks
https://tailcall.run

Issue with the N+1 benchmark #273

Closed · kyri-petrou closed this 1 month ago

kyri-petrou commented 2 months ago

Seems that tailcall went from ~7k reqs/s to ~60k reqs/s even for the baseline query that doesn't contain N+1 queries.

Is this an issue with the benchmark or shall I add global request batching to Caliban as well?

Also, what's the reason for not using the batching endpoint for users, such as `/users?id=1&id=2`, etc.? I ran the benchmark locally and more than 50-60% of the CPU was being used by the nginx proxy, meaning that we're no longer benchmarking GraphQL performance.
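
To be concrete about what I mean by per-request batching through that endpoint, here's a rough sketch (purely illustrative; the `dataloader` package, the `User` shape, and the upstream URL are stand-ins rather than anything the benchmark implementations actually use):

```typescript
// Illustrative only: per-request, per-field batching using the well-known
// `dataloader` package. Every user referenced while resolving a single
// GraphQL request is collected and fetched with ONE call to the batching
// endpoint instead of N individual calls.
import DataLoader from "dataloader";

interface User {
  id: number;
  name: string;
}

// Stand-in upstream base URL, not the benchmark's actual address.
const UPSTREAM = "http://localhost:3000";

async function fetchUsersBatch(ids: readonly number[]): Promise<User[]> {
  // One request for all ids: /users?id=1&id=2&id=3...
  const query = ids.map((id) => `id=${id}`).join("&");
  const res = await fetch(`${UPSTREAM}/users?${query}`);
  const users: User[] = await res.json();
  // DataLoader expects results in the same order as the requested keys.
  const byId = new Map(users.map((u): [number, User] => [u.id, u]));
  return ids.map((id) => byId.get(id)!);
}

// Created once per incoming GraphQL request, so batching (and caching)
// never crosses request boundaries.
export function makeUserLoader(): DataLoader<number, User> {
  return new DataLoader(fetchUsersBatch);
}
```

With a loader created per request, `await Promise.all(ids.map((id) => loader.load(id)))` collapses into a single upstream call, and nothing is shared across request boundaries.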

tusharmath commented 2 months ago

Yeah we did 💪

We were quite inspired by the optimizations you have been doing 🙌. We'll publish a blog post about how we made such an improvement; most of it comes from being able to look ahead of time at what API calls are going to be made and how, which lets us create many short circuits along the way.
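
At a very high level (this is only a hypothetical illustration of the idea, not our actual code), the point is that a query can be compiled into a reusable execution plan once, so repeated requests only pay for the data fetching:

```typescript
// Hypothetical illustration of ahead-of-time planning; not tailcall's code.
// Because the schema and the upstream REST shape are known up front, a query
// can be compiled once into a plan of upstream calls and reused for every
// subsequent request with the same query string.
type Step = { endpoint: string; dependsOn?: string };
type Plan = { steps: Step[] };

const planCache = new Map<string, Plan>();

function compilePlan(query: string): Plan {
  // Placeholder for the real parse/validate/plan work, which would walk the
  // query AST and map fields to upstream calls.
  const steps: Step[] = [{ endpoint: "/posts" }];
  if (query.includes("user")) {
    steps.push({ endpoint: "/users?id=...", dependsOn: "/posts" });
  }
  return { steps };
}

export function getPlan(query: string): Plan {
  let plan = planCache.get(query);
  if (plan === undefined) {
    plan = compilePlan(query);
    planCache.set(query, plan); // plan is reused; only data fetching runs per request
  }
  return plan;
}
```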

Is this an issue with the benchmark or shall I add global request batching to Caliban as well?

The benchmarks are fine; we do want to add more realistic ones. Feel free to make changes in Caliban to improve performance if you can. We will raise an issue if something doesn't meet the test expectations.

Also, what's the reason for not using the batching endpoint for users, such as `/users?id=1&id=2`, etc.?

The bulk API is a separate test; it's less about performance and more about being "smart".

I ran the benchmark locally and more than 50-60% of the CPU was being used by the nginx proxy, meaning that we're no longer benchmarking GraphQL performance.

You are right; I want to fix that too. Thread and process prioritization plays a major role here. We plan to invest in dedicated infrastructure so that the benchmarks run on hardware that isn't shared between Nginx, the GraphQL server, and wrk.

kyri-petrou commented 2 months ago

We were quite inspired by the optimizations you have been doing 🙌. We'll publish a blog post about how we made such an improvement; most of it comes from being able to look ahead of time at what API calls are going to be made and how, which lets us create many short circuits along the way.

I'm really happy that the work we've been doing was an inspiration! That said, I think there is something buggy in either the benchmark or the implementation of the dedup directive. When it is removed, performance plummets for both the "baseline" and the N+1 benchmark (the latter is expected, of course). Is it expected to also affect the baseline benchmark? If so, that's totally fine; I'm just not sure whether it's a bug or a feature.

tusharmath commented 2 months ago

Yes, baseline performance was expected to improve with those changes. Some of the optimizations were in beta for a while and weren't enabled until very recently. In upcoming releases, we expect to improve performance by a further ~45%, according to our internal benchmarks.

kyri-petrou commented 1 month ago

@tusharmath I'm sorry, but I'm failing to see how enabling N+1 optimizations (i.e., the `@dedup` directive) improves the performance of a query that doesn't contain any fields that can be N+1 optimized. If you remove the `@dedup` directive, performance drops for both benchmarks, even though it shouldn't affect the baseline `posts { title }` query.

This can only mean that the `@dedup` directive batches requests:

  1. Globally (across request boundaries), and
  2. In their entirety, rather than at the level of individual fields (see the sketch below).
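
To make the distinction concrete, here's a rough sketch of the two scopes (illustrative only; none of these names come from the tailcall or Caliban codebases):

```typescript
// Illustrative only; names and shapes come from neither codebase.

// (1) Request-scoped, per-field dedup: a cache of user lookups is created for
// each incoming GraphQL request and discarded with it, so duplicate `user`
// fields inside ONE request collapse into a single upstream call.
function makeRequestContext(fetchUser: (id: number) => Promise<unknown>) {
  const userCache = new Map<number, Promise<unknown>>();
  return {
    loadUser(id: number): Promise<unknown> {
      let p = userCache.get(id);
      if (p === undefined) {
        p = fetchUser(id);
        userCache.set(id, p);
      }
      return p;
    },
  };
}

// (2) Global, whole-request dedup: concurrent identical queries from
// *different* clients share one execution and its upstream calls. In effect
// this is a shared response cache in front of the executor, not per-field
// N+1 batching.
const inflight = new Map<string, Promise<unknown>>(); // process-wide, keyed by the whole query

function executeDeduped(
  query: string,
  execute: (q: string) => Promise<unknown>
): Promise<unknown> {
  let result = inflight.get(query);
  if (result === undefined) {
    result = execute(query).finally(() => inflight.delete(query));
    inflight.set(query, result);
  }
  return result;
}
```

The first collapses duplicate `user` lookups within one request; the second answers whole requests from work done for other clients, which is a response cache rather than N+1 batching.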

Look, if you want to convince the world that tailcall is fast, perhaps do so by comparing apples with apples and not by putting a global cache in front of the query execution. At the moment, all implementations except tailcall N+1-optimize only the `user` field and batch within the scope of a single request.

Honestly, I'm fairly certain that tailcall got faster, perhaps even faster than Caliban, and it can potentially get much faster. But please don't insult our intelligence by putting a global cache in place (yes, I looked at the implementation of `@dedup` in the tailcall repo), trying to convince us that it's due to "optimizations that were in beta", and then closing the issue.