open-telemetry / opentelemetry-go-contrib

Collection of extensions for OpenTelemetry-Go.
https://opentelemetry.io/
Apache License 2.0

otelgrpc: bidi stream incremental span processor flush #436

Open anuragbeniwal opened 3 years ago

anuragbeniwal commented 3 years ago

For gRPC instrumentation, the otelgrpc implementation adds one span per stream, with span events tracking the individual sent and received messages. For long-running bidirectional streams, that span can keep accumulating events in memory and grow very large.

  1. Are there any benchmarks and guidelines for such scenarios?

  2. Is there, or could there be, an "incremental" span processor mechanism that is invoked periodically and flushes (and exports?) the current state of spans even when the stream has not yet finished? (Fundamentally, this would be the inverse value proposition of the batching processor.)
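To make the second question concrete, here is a minimal sketch of the "incremental flush" idea using only the Go standard library. It is not part of otelgrpc or the OpenTelemetry SDK: `MessageEvent`, `ChunkedRecorder`, and the flush callback are hypothetical names invented for illustration, and the sketch assumes a single goroutine per stream (no locking).

```go
package main

import "fmt"

// MessageEvent is a hypothetical stand-in for one send/receive span event.
type MessageEvent struct {
	Direction string // "SENT" or "RECEIVED"
	ID        int    // message counter within the stream
}

// ChunkedRecorder is a hypothetical helper, NOT an OpenTelemetry API.
// It buffers events for one long-lived stream and hands them to flush in
// bounded chunks, so memory does not grow for as long as the stream is open.
type ChunkedRecorder struct {
	max   int
	buf   []MessageEvent
	flush func(chunk []MessageEvent)
}

// NewChunkedRecorder returns a recorder that flushes every max events.
func NewChunkedRecorder(max int, flush func([]MessageEvent)) *ChunkedRecorder {
	if max < 1 {
		max = 1
	}
	return &ChunkedRecorder{max: max, buf: make([]MessageEvent, 0, max), flush: flush}
}

// Add records one event and flushes as soon as the buffer is full.
func (r *ChunkedRecorder) Add(ev MessageEvent) {
	r.buf = append(r.buf, ev)
	if len(r.buf) >= r.max {
		r.Flush()
	}
}

// Flush exports whatever is buffered, even though the stream has not ended.
func (r *ChunkedRecorder) Flush() {
	if len(r.buf) == 0 {
		return
	}
	chunk := make([]MessageEvent, len(r.buf))
	copy(chunk, r.buf) // copy so the callback may retain the slice
	r.buf = r.buf[:0]
	r.flush(chunk)
}

func main() {
	rec := NewChunkedRecorder(2, func(chunk []MessageEvent) {
		fmt.Printf("flushing %d events\n", len(chunk))
	})
	for i := 1; i <= 5; i++ {
		rec.Add(MessageEvent{Direction: "RECEIVED", ID: i})
	}
	rec.Flush() // stream closed: export the remainder
}
```

With a chunk size of 2 and five events, the callback fires three times: two full chunks while the stream is open, and one final chunk of a single event on close. Whether a real span processor could export a span's partial state this way is exactly the open question here.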

kanekv commented 3 years ago

@MrAlias I have the same question. I guess the way it works is that the interceptor starts a long-running span for the whole stream, and implementations can then add child spans/events to the root span. Are those sent to the server promptly, without waiting for the root span to finish, or does the root span have to end before the whole batch can be sent?

benmathews commented 3 months ago

My company has a central component that I'm trying to add tracing to. It involves a gRPC stream endpoint. The current implementation doesn't work for us. I don't want (and I can't imagine anyone wanting) a never-ending span on these streaming calls. It results in traces that are too large to be processed. And even if we could process, store, and visualize them, they aren't the unit of work I want a trace to cover. Can we modify the implementation to create spans per message? Or, as the linked issue suggests, provide a mechanism to specify when the spans should be created?

In Slack, @dmathieu asked for some research into how other languages/libraries handle this.

The specification does seem to imply that the Go library is partially correct, in that it states one span per RPC and makes no distinction for stream connections. The spec does state that:

In the lifetime of an RPC stream, an event for each message sent/received on client and server spans SHOULD be created.

I don't believe the Go library creates the events.

It appears that Java has an implementation similar to Go's.

An interesting quote

This is uncharted territory. With gRPC streaming, I believe, the state of art is to instrument a request/response, but not instrument a single message. Since the stream may have an app’s lifetime and be reused to send independent messages, it makes sense to instrument each message separately. Unfortunately, this is fully custom since there is no metadata propagation on the message or standard way to handle messages. Manual instrumentation is the best we can do. Still, HTTP semantics doesn’t really apply to those micro-calls (they don’t have HTTP methods, URL, or status codes).

(Source: "Long polling and streaming")

So while it appears that the Go implementation conforms to the spec, it doesn't help me with my problem. For now, I've filtered out the streaming gRPC call and created a new root span in the code that consumes the stream. This breaks up the trace, but leaves the remaining fragments usable.