tensorflow / serving

A flexible, high-performance serving system for machine learning models
https://www.tensorflow.org/serving
Apache License 2.0

Add arguments to configure gRPC completion queue parameters #2141

Open yarikmarkov opened 1 year ago

yarikmarkov commented 1 year ago

I found a performance issue: TensorFlow Serving adds a significant, unexplained network delay to tail latencies under higher traffic loads.

My setup was a client and a TensorFlow Serving server on the same host, so in theory the client-side latency should be roughly equal to the server-side latency. The server was running a simple CPU model. In my experiment, client and server latency started to diverge at 50 QPS for p99.99 tail latency. At 200 QPS the divergence reached 120-150 ms, even though the server latency was around 30 ms.
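For anyone who wants to reproduce the comparison, here is a minimal sketch of how the client/server gap at a given percentile could be computed from two sets of latency samples. The function names and the nearest-rank percentile definition are my assumptions for illustration. They are not the exact tooling used in the experiment above.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least
// `pct` percent of all samples are less than or equal to it.
// Takes the vector by value because it sorts its own copy.
double Percentile(std::vector<double> samples_ms, double pct) {
  if (samples_ms.empty() || pct <= 0.0 || pct > 100.0) {
    throw std::invalid_argument("need samples and 0 < pct <= 100");
  }
  std::sort(samples_ms.begin(), samples_ms.end());
  std::size_t rank = static_cast<std::size_t>(
      std::ceil(pct / 100.0 * static_cast<double>(samples_ms.size())));
  // Clamp to [1, N] to guard against floating-point rounding.
  rank = std::max<std::size_t>(rank, 1);
  rank = std::min<std::size_t>(rank, samples_ms.size());
  return samples_ms[rank - 1];
}

// Hypothetical helper: client-observed minus server-reported latency
// at the same percentile (for example pct = 99.99).
double TailLatencyGapMs(const std::vector<double>& client_ms,
                        const std::vector<double>& server_ms,
                        double pct) {
  return Percentile(client_ms, pct) - Percentile(server_ms, pct);
}
```

With client and server on the same host, `TailLatencyGapMs(client, server, 99.99)` would be expected to stay close to 0. Note that p99.99 needs on the order of tens of thousands of samples before the value is meaningful.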

While debugging, I discovered that TensorFlow Serving does not set the gRPC completion queue parameters. They are left at the defaults: 1 completion queue, with a minimum of 1 poller and a maximum of 2 pollers. This appears to be a major bottleneck for applications that care about tail latency.

The parameters in question are: grpc::ServerBuilder::SyncServerOption::NUM_CQS, grpc::ServerBuilder::SyncServerOption::MIN_POLLERS and grpc::ServerBuilder::SyncServerOption::MAX_POLLERS

After I added code to configure these parameters and set them to values larger than the defaults, the divergence between server and client latency dropped to almost 0.
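A rough sketch of the kind of change described above, not the actual patch. `grpc::ServerBuilder::SetSyncServerOption` is the gRPC C++ call that takes these options. Everything else here (`SyncServerConfig`, `NormalizeSyncServerConfig`, `ApplySyncServerConfig`) is a hypothetical name I made up for illustration and does not exist in TensorFlow Serving. The function is a template so that it compiles without the gRPC headers. In the server it would be called with a `grpc::ServerBuilder` before `BuildAndStart()`.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical holder for values that would come from new command-line
// flags. The initial values mirror the defaults reported in this issue.
struct SyncServerConfig {
  int num_cqs = 1;      // number of completion queues
  int min_pollers = 1;  // minimum polling threads per completion queue
  int max_pollers = 2;  // maximum polling threads per completion queue
};

// Keep the values sane: at least one queue, at least one poller, and
// max_pollers never below min_pollers.
SyncServerConfig NormalizeSyncServerConfig(SyncServerConfig cfg) {
  cfg.num_cqs = std::max(cfg.num_cqs, 1);
  cfg.min_pollers = std::max(cfg.min_pollers, 1);
  cfg.max_pollers = std::max(cfg.max_pollers, cfg.min_pollers);
  return cfg;
}

// Builder is expected to be grpc::ServerBuilder (or a test double that
// exposes the same SyncServerOption enum and SetSyncServerOption method).
template <typename Builder>
void ApplySyncServerConfig(Builder& builder, const SyncServerConfig& raw) {
  const SyncServerConfig cfg = NormalizeSyncServerConfig(raw);
  assert(cfg.min_pollers <= cfg.max_pollers);
  builder.SetSyncServerOption(Builder::SyncServerOption::NUM_CQS,
                              cfg.num_cqs);
  builder.SetSyncServerOption(Builder::SyncServerOption::MIN_POLLERS,
                              cfg.min_pollers);
  builder.SetSyncServerOption(Builder::SyncServerOption::MAX_POLLERS,
                              cfg.max_pollers);
}
```

Which values work best will depend on the model and the host, so these would need to be tuned per deployment rather than hard-coded.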

Please add command-line arguments and the corresponding code in TensorFlow Serving to configure them.

It seems people have investigated this issue before: https://discuss.tensorflow.org/t/tensorflow-serving-grpc-mode/11613

singhniraj08 commented 1 year ago

@yarikmarkov,

Just to confirm, do you want TF Serving to have arguments for updating these parameters in the gRPC code base?

yarikmarkov commented 1 year ago

@singhniraj08 Exactly. I would like TF Serving to have command-line arguments that set the values of these parameters when the gRPC server is initialized.