Closed rnburn closed 6 years ago
LGTM 👍
From a coding/implementation perspective, this all looks good to me, but I’m concerned about the general idea/design of a single span
around a streaming client or server, which can last up to hours. This definitely breaks the definition of a span
, so I think it's worthwhile to step back and discuss the general paradigm of instrumenting a streaming RPC.
Some quick thoughts (all IMHO):
span
of a streaming RPC should be finished when the stream is successfully established, this applies to both client and server streams.message
sent on the above stream should have its own span
, finishing when the SendMsg
call finishes (could be an error).RecvMsg
call should have its own span.span
in 1), but that'll likely create huge long traces, so I don't actually think we should do it.I agree that tracing individual requests in the stream would be useful, but gRPC only allows you to send metadata when you initiate the stream, so I don’t see how you could propagate context if you created a span per-request.
I can see creating a span for the entire lifetime of the stream as being useful for shorter lived streams. I’m not sure what you could do for long-lived streams without being able to send context for individual requests.
Any motion on this PR? It would be nice to have in the official release.
Thanks
Any motion on this would indeed be nice. From my point of view tracing complete streams instead of every message is better than not having tracing support for streaming rpc calls at all.
My personal problem is that the master of moby/buildkit depends on that PR, and that breaks stuff on my side.
Thanks in advance!
Merging this in. Sorry for the wait!
FYI: this repo is about to be moved to https://github.com/opentracing-contrib/go-grpc, and future development will happen there, so that we can monitor PR activity better and with more accountability.
This PR adds additional interceptors to
otgrpc
so that streaming RPCs can be traced.One of the more challenging problems with streaming RPCs is figuring out when to finish the span on the client-side. Unlike other interceptors, gRPC's client-side stream interceptor functions don't enclose the entire RPC, but rather return a ClientStream that the invoker then interacts with. Based on the error codes returned by ClientStream's methods you can figure out in most use-cases when the span finishes, though there's special cases that are problematic (example).
I added code for the problematic cases that at least eventually finish the spans and clean up resources. And I reached out on StackOverflow to see if to see if there's any other solution, but I'm not sure it's possible with the current API to do any better.
I also added testing coverage that follows the structure of this test in gRPC Go.