stargate / stargate

An open source data gateway
https://stargate.io
Apache License 2.0
824 stars 163 forks source link

Analyse direct executor usage in the `BridgeImpl` #2006

Open ivansenic opened 2 years ago

ivansenic commented 2 years ago

We were doing some initial performance testing with the Docs API V2 and once of the most disturbing things for us so far was the metrics and the tracing reports that the DescribeKeyspace method on the Bridge was taking too much time from the client perspective.

In the current state, we observed this metrics that showcase that the method has:

grpc_server_processing_duration_seconds_count{method="DescribeKeyspace",methodType="UNARY",service="stargate.StargateBridge",statusCode="OK",} 400036.0
grpc_server_processing_duration_seconds_sum{method="DescribeKeyspace",methodType="UNARY",service="stargate.StargateBridge",statusCode="OK",} 1151.208204639
grpc_client_processing_duration_seconds_count{method="DescribeKeyspace",methodType="UNARY",module="sgv2-docsapi",service="stargate.StargateBridge",statusCode="OK",} 400036.0
grpc_client_processing_duration_seconds_sum{method="DescribeKeyspace",methodType="UNARY",module="sgv2-docsapi",service="stargate.StargateBridge",statusCode="OK",} 2842.293467427

It was quite disturbing to see a method that should be a sub-millisecond method to be recorded as the 7ms on the client side, with over 4ms overhead from the server-side on the same machine. We started exploring and we, most likely by accident, tackled the .directExecutor() usage on the BridgeImpl..

We changed this to a dedicated executor with 16 threads: .executor(Executors.newScheduledThreadPool(16)), and got quite the opposite times:

grpc_server_processing_duration_seconds_count{method="DescribeKeyspace",methodType="UNARY",service="stargate.StargateBridge",statusCode="OK",} 700102.0
grpc_server_processing_duration_seconds_sum{method="DescribeKeyspace",methodType="UNARY",service="stargate.StargateBridge",statusCode="OK",} 512.994728568
grpc_client_processing_duration_seconds_count{method="DescribeKeyspace",methodType="UNARY",module="sgv2-docsapi",service="stargate.StargateBridge",statusCode="OK",} 500099.0
grpc_client_processing_duration_seconds_sum{method="DescribeKeyspace",methodType="UNARY",module="sgv2-docsapi",service="stargate.StargateBridge",statusCode="OK",} 643.713077082

It's really unclear to me why is this and if usage of the direct executor is justified, but we also observed improved response times for the read and write scenarios in V2:

Direct Executor Executor Diff
Read Mean=25.16ms (99.000ptile=42ms) Mean=21.55ms (99.000ptile=31ms) 16% faster
Write Mean=39.69ms (99.000ptile=68ms) Mean=36.25ms (99.000ptile=55ms) 9% faster

We should analyze if the direct executor should be used here or not.

Note that all data is from write and read docs scenario using 60 nb threads on a single machine.

ivansenic commented 2 years ago

@mpenick Could you have a look on this please?

ivansenic commented 2 years ago

Showcasing differences in tracing examples:

Executor: image

Direct executor: image

ivansenic commented 2 years ago

Reconfirmation needed after #2008 is merged.

ivansenic commented 2 years ago

The theory why is this happening that the DescribeKeyspace calls are not offloaded to another thread, as for the CQL querying. This is most likely the reason for that strange timing. @mpenick and I think that it would be the best:

This should be done for all the gRPC call that are not doing the CQL query.