vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0

gRPC settings improvement #3929

Open rafael opened 6 years ago

rafael commented 6 years ago

Description

I've started researching gRPC settings to see whether there are knobs we can tweak in order to:

The following are some ideas that I think we could consider.

KeepAlive

HTTP2 Window Size
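For reference, grpc-go exposes the HTTP/2 flow-control windows as dial and server options. This is a minimal sketch that assumes a hypothetical `grpcopts` package and uses placeholder sizes (1 MiB per stream, 16 MiB per connection). The real values would need benchmarking against actual payload sizes and round-trip times.

```go
package grpcopts

import "google.golang.org/grpc"

// Placeholder sizes (hypothetical); tune by benchmarking.
const (
	streamWindowSize = 1 << 20  // 1 MiB flow-control window per stream
	connWindowSize   = 16 << 20 // 16 MiB flow-control window per connection
)

// WindowDialOptions sets the client-side HTTP/2 windows.
func WindowDialOptions() []grpc.DialOption {
	return []grpc.DialOption{
		grpc.WithInitialWindowSize(streamWindowSize),
		grpc.WithInitialConnWindowSize(connWindowSize),
	}
}

// WindowServerOptions sets the matching server-side windows.
func WindowServerOptions() []grpc.ServerOption {
	return []grpc.ServerOption{
		grpc.InitialWindowSize(streamWindowSize),
		grpc.InitialConnWindowSize(connWindowSize),
	}
}
```

In grpc-go, window sizes below 64 KiB are ignored. Setting an explicit window also turns off grpc-go's BDP-based automatic window sizing, so a fixed value is not automatically a win.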

Client Connection Timeouts

Compression

Set up compression at the RPC layer to reduce the amount of data sent over the wire. We could look into using gzip or snappy in gRPC.
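grpc-go ships a gzip compressor in `google.golang.org/grpc/encoding/gzip`, which registers itself when imported. Snappy is not built in and has to be plugged in as a custom `encoding.Compressor`. The sketch below assumes the `github.com/golang/snappy` package. `snappyCompressor`, `CompressionDialOptions`, and the `grpcopts` package are hypothetical names, not Vitess code. Both the client and the server must register a compressor under the same name, because the server rejects requests that use a compressor it has not registered.

```go
package grpcopts

import (
	"io"

	"github.com/golang/snappy"
	"google.golang.org/grpc"
	"google.golang.org/grpc/encoding"
	_ "google.golang.org/grpc/encoding/gzip" // importing registers the built-in "gzip" compressor
)

// snappyCompressor adapts github.com/golang/snappy to gRPC's
// encoding.Compressor interface (hypothetical sketch, not Vitess's code).
type snappyCompressor struct{}

func (snappyCompressor) Name() string { return "snappy" }

// Compress wraps the outgoing message writer in a snappy stream writer;
// gRPC calls Close on it to flush the compressed data.
func (snappyCompressor) Compress(w io.Writer) (io.WriteCloser, error) {
	return snappy.NewBufferedWriter(w), nil
}

// Decompress wraps the incoming message reader in a snappy stream reader.
func (snappyCompressor) Decompress(r io.Reader) (io.Reader, error) {
	return snappy.NewReader(r), nil
}

func init() {
	// Client and server binaries both need this registration.
	encoding.RegisterCompressor(snappyCompressor{})
}

// CompressionDialOptions makes every call on the connection send compressed
// requests using the named, registered compressor ("snappy" or "gzip").
func CompressionDialOptions(name string) []grpc.DialOption {
	return []grpc.DialOption{
		grpc.WithDefaultCallOptions(grpc.UseCompressor(name)),
	}
}
```

Snappy generally trades compression ratio for CPU, while gzip does the reverse, so the better choice depends on whether the links or the hosts are the bottleneck.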

Circuit Breakers

We should look into protecting gRPC calls with circuit breakers. That way, we could fail fast when requests are failing because of downstream component failures (e.g., fail requests fast in the gate if it can't connect to a tablet).

derekperkins commented 6 years ago

@rafael Have you done any testing with https://github.com/golang/protobuf/releases/tag/v1.1.0 yet?

The serialization logic (for both Marshal and Unmarshal) has been optimized. Testing inside Google demonstrates that the new implementation is about 1.3x to 2.1x faster.

rafael commented 6 years ago

Not yet! That's something that we should add to this list for sure.

derekperkins commented 6 years ago

I also seem to remember that @tirsen or someone was experimenting with snappy compression a while back.

rafael commented 6 years ago

Oh, I didn't know @tirsen had already worked on compression. Here it is:

https://github.com/vitessio/vitess/blob/master/go/vt/grpcclient/snappy.go

@tirsen, did you roll out this change to prod in your setup?

derekperkins commented 6 years ago

Regarding client timeouts, the gRPC client should already support client-side timeouts, but I'm not sure how that would work with the MySQL protocol. @bbeaudreault and @hmcgonig are using server-side gRPC timeouts, I think mostly for load balancing. https://vitess.slack.com/archives/C0PQY0PTK/p1520879177000696

rafael commented 6 years ago

Right now, when using the MySQL protocol, there are no timeouts on the gRPC call. I'm planning to work on that as part of this: https://github.com/vitessio/vitess/issues/3718#issuecomment-388473606.

The timeout I'm proposing here is a connection timeout, not the general per-call gRPC timeout that you can already set in the client.
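For a connection timeout, as opposed to a per-call deadline, grpc-go provides `grpc.WithConnectParams`. Its `MinConnectTimeout` field bounds how long each connection attempt may take. This sketch assumes grpc-go v1.25 or later and a hypothetical `grpcopts` package and helper name. grpc-go treats the value as a minimum and waits longer when the current reconnect backoff delay is larger.

```go
package grpcopts

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/backoff"
)

// ConnectTimeoutDialOptions bounds each connection attempt (hypothetical
// helper). grpc-go uses max(connectTimeout, current backoff delay) as the
// per-attempt deadline, so connectTimeout acts as a lower bound.
func ConnectTimeoutDialOptions(connectTimeout time.Duration) []grpc.DialOption {
	return []grpc.DialOption{
		grpc.WithConnectParams(grpc.ConnectParams{
			Backoff:           backoff.DefaultConfig, // keep the default reconnect backoff
			MinConnectTimeout: connectTimeout,
		}),
	}
}
```

A call such as `ConnectTimeoutDialOptions(5 * time.Second)` would then be appended to the tablet dial options. The 5-second value is only an illustration.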

derekperkins commented 6 years ago

cc @dweitzman

rafael commented 6 years ago

@zmagg also found today that KeepAlive can already be set on the server side, so we only need to add the counterpart on the client side.

There are many knobs in https://github.com/grpc/grpc-go/blob/master/keepalive/keepalive.go#L42; I think we should allow setting all of them.
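On the client side, those knobs map to `keepalive.ClientParameters` (`Time`, `Timeout`, `PermitWithoutStream`), which are passed via `grpc.WithKeepaliveParams`. The sketch below exposes them as flags. The flag names, defaults, and `grpcopts` package are hypothetical, not existing Vitess flags. The server's `keepalive.EnforcementPolicy` also has to tolerate the client's ping rate. If the client pings more often than the server's `MinTime` allows (5 minutes by default in grpc-go), the server closes the connection with a `too_many_pings` GOAWAY.

```go
package grpcopts

import (
	"flag"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

// Hypothetical flag names and defaults; flag.Parse must run before the
// option builders below are called.
var (
	keepaliveTime = flag.Duration("grpc_keepalive_time", 30*time.Second,
		"ping after this much inactivity on the connection (grpc-go raises values below 10s to 10s)")
	keepaliveTimeout = flag.Duration("grpc_keepalive_timeout", 10*time.Second,
		"close the connection if a ping is not acknowledged within this time")
	keepalivePermitWithoutStream = flag.Bool("grpc_keepalive_permit_without_stream", false,
		"send pings even when there are no active RPCs")
)

// KeepaliveDialOptions applies the client-side keepalive knobs.
func KeepaliveDialOptions() []grpc.DialOption {
	return []grpc.DialOption{
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                *keepaliveTime,
			Timeout:             *keepaliveTimeout,
			PermitWithoutStream: *keepalivePermitWithoutStream,
		}),
	}
}

// KeepaliveServerOptions sets an enforcement policy that accepts the
// client's ping rate, so the server does not send a "too_many_pings" GOAWAY.
func KeepaliveServerOptions() []grpc.ServerOption {
	return []grpc.ServerOption{
		grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
			MinTime:             *keepaliveTime / 2, // comfortably below the client's ping interval
			PermitWithoutStream: *keepalivePermitWithoutStream,
		}),
	}
}
```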