uber / tchannel-go

Go implementation of a multiplexing and framing protocol for RPC calls
http://uber.github.io/tchannel/
MIT License
490 stars 84 forks source link

Fix span leak for inbound & outbound #922

Open ChenX1993 opened 3 months ago

ChenX1993 commented 3 months ago

Problem

Span leak is a issue in observability tracing where a span is started, but is not finished. The span will not be sent to tracing backend if it is not finished, hence causing incomplete traces.

span := tracer.StartSpan(...)
...
span.Finish()   // span leak happens when this line is not executed.

Internally at Uber, we fond incomplete traces from time to time due to span leak caused in tchannel-go library. In the happy path, the span is properly finished, but the span leak issue in tchannel-go happens in error path when a function early returns with error.
E.g. https://github.com/uber/tchannel-go/blob/af146f5a54ac5787563a6857ed9b6ff26394f0ca/thrift/client.go#L139-L152 In above code snippet, A span is created within c.startCall, and is finished within readResponse when reader.Close() is executed. If there is an error occurred in writeArgs, then the span leak happens.

This PR is to fix the span leak issue.

Change Summary

  1. Add span component tag following the OpenTracing semantic conventions to indicate the software package that created spans.
  2. For inbound: 2.1. Add finishSpanWithOptions method that allows span.Finish() to be invoked many times in InboundCallResponse. 2.2 . Use finishSpanWithOptions in dispatchInbound to always try to finish a span after the handler is called.
  3. For outbound: 3.1. Add finishSpanWithOptions in OutboundCallResponse similar to inbound case 3.2. Use finishSpanWithOptions json, thrift and raw clients.

Next

At Uber, YARPC (yarpc-go) is often used as a umbrella RPC framework with gRPC, TChannel and HTTP as transports. When YARPC is used with TChannel, the span leak issue does also exist.

With this PR, the span leak in inbound is expected to be fixed for YARPC as well, but for outbound, we need to add defer call.Response().Done here in yarpc-go after this PR is merged.

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 88.83%. Comparing base (49a0061) to head (6e0853b). Report is 3 commits behind head on dev.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## dev #922 +/- ## ========================================== - Coverage 89.15% 88.83% -0.33% ========================================== Files 43 43 Lines 4517 4548 +31 ========================================== + Hits 4027 4040 +13 - Misses 373 389 +16 - Partials 117 119 +2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.