Add gRPC interceptors to TF Serving to capture transport-level overheads

salliewalecka commented 2 years ago

Feature Request

Describe the problem the feature is intended to solve

I want to close the gap between latency as reported by :tensorflow:core:graph_run_time_usecs_histogram_bucket and latency as seen by the client by adding transport-level tracing. That would give me additional metrics for network delay, request queuing, and request serialization and deserialization on the server side. I've experienced high latencies that cannot be explained by TensorBoard or the graph latency, which has turned out to be a blocker for launching some models.

Describe the solution

I want to get metrics similar to DoorDash's implementation of this tracing using gRPC interceptors. However, client interceptors alone are not enough: we need server-side interceptors to track the whole request lifecycle. Thus, we need the ability to add interceptors that report the duration of each request event. I'm not sure what the exact mechanism should be for collecting these metrics once the interceptors produce them, but they should somehow end up in the metrics endpoint that Prometheus scrapes.
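To make the request concrete, here is a minimal sketch of the per-phase timing I have in mind, written in standard-library Python only. It is not gRPC's interceptor API and not TF Serving code: TimingInterceptor, PhaseHistogram, the bucket bounds, and the metric name hypothetical_request_phase_usecs are all made up for illustration. A real server-side interceptor would hook the equivalent points in the gRPC server, and could also observe queuing time, which this wrapper cannot see.

```python
import json
import time
from bisect import bisect_left
from typing import Callable, Dict, List

# Hypothetical bucket upper bounds in microseconds (not TF Serving's real buckets).
BUCKETS_USECS: List[float] = [100, 1_000, 10_000, 100_000, float("inf")]
PHASES = ("deserialize", "handler", "serialize")


class PhaseHistogram:
    """Per-bucket counts plus running sum and count for one request phase."""

    def __init__(self) -> None:
        self.counts = [0] * len(BUCKETS_USECS)
        self.total_usecs = 0.0
        self.observations = 0

    def observe(self, usecs: float) -> None:
        # bisect_left returns the first bucket whose upper bound is >= usecs.
        self.counts[bisect_left(BUCKETS_USECS, usecs)] += 1
        self.total_usecs += usecs
        self.observations += 1


class TimingInterceptor:
    """Wraps the server-side request lifecycle and times each phase."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock  # injectable so the timing can be tested
        self.histograms: Dict[str, PhaseHistogram] = {
            phase: PhaseHistogram() for phase in PHASES
        }

    def _timed(self, phase: str, fn: Callable, arg):
        start = self._clock()
        result = fn(arg)
        self.histograms[phase].observe((self._clock() - start) * 1e6)
        return result

    def intercept(self, raw_request: bytes, deserialize: Callable,
                  handler: Callable, serialize: Callable) -> bytes:
        request = self._timed("deserialize", deserialize, raw_request)
        response = self._timed("handler", handler, request)
        return self._timed("serialize", serialize, response)

    def render(self, metric: str = "hypothetical_request_phase_usecs") -> str:
        """Render cumulative buckets in a Prometheus-style text layout."""
        lines = []
        for phase, hist in self.histograms.items():
            cumulative = 0
            for bound, count in zip(BUCKETS_USECS, hist.counts):
                cumulative += count
                le = "+Inf" if bound == float("inf") else f"{bound:g}"
                lines.append(
                    f'{metric}_bucket{{phase="{phase}",le="{le}"}} {cumulative}')
            lines.append(f'{metric}_sum{{phase="{phase}"}} {hist.total_usecs:g}')
            lines.append(f'{metric}_count{{phase="{phase}"}} {hist.observations}')
        return "\n".join(lines)


# Usage: JSON stands in for protobuf, and a lambda stands in for the model call.
interceptor = TimingInterceptor()
reply = interceptor.intercept(
    b'{"x": 2}',
    json.loads,
    lambda req: {"y": req["x"] * 2},
    lambda resp: json.dumps(resp).encode(),
)
print(reply.decode())
print(interceptor.render())
```

Comparing the handler histogram against the existing graph run time histogram would show how much of the client-observed latency is spent outside graph execution. The remainder would be network delay and queuing, which need timestamps taken before the request reaches the handler.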

Describe alternatives you've considered

We can't get the information we need with client-only metrics. We have looked through all the other metrics offered by TF Serving, and none of them help us explain the extra non-graph latency. We've run latency tests from different points in our infrastructure, but having these metrics would be really valuable for pinpointing the source of the latency.

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Container-Optimized OS
TensorFlow Serving installed from (source or binary): docker image
TensorFlow Serving version: 2.5.2 and 2.6.1

salliewalecka commented 2 years ago

DoorDash provides some good pseudocode for their interceptor. If you could point me to the analogous spot in your server where the tracer could be added, that would also be appreciated.

ndeepesh commented 2 years ago

Hello, is there any progress on this request? Or could you point to some code where we can add custom client interceptors? That would be great.
