w3c / trace-context

Trace Context
https://w3c.github.io/trace-context/

Question about tracecontext in load-balancer scenarios #575

Open kalyanaj opened 1 week ago

kalyanaj commented 1 week ago

Hello all,

Wanted to get your feedback on the following use case related to tracecontext:

Let's say there's a global load balancer service (say, an L7 load balancer offered by a cloud provider) that is internally composed of multiple microservices. Let's say it wants to use distributed tracing (DT) to improve life for its on-call engineers.

Let's say an application uses the above load-balancer service to efficiently route requests to the right backend. For example, let's say this application-level code has two parts (a client and a backend service):

The request goes as:

A (client application) -> B (cloud-provider-offered load balancer service with multiple internal microservices) -> C (backend service)

Now, if B is not just propagating the original context but is actively participating in the trace, then the parent-child relationship between A and C will get broken: C's server span is parented to one of B's spans, but the spans emitted by A and C might be going to a different observability backend than those of B, so that backend never sees the spans that connect them.
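
To make the break concrete, here is a hypothetical sequence of traceparent headers (the trace ID is borrowed from the spec's examples; the span IDs are made up):

```
A -> B:  traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-aaaaaaaaaaaaaaaa-01   (parent-id = A's client span)
B -> C:  traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-bbbbbbbbbbbbbbbb-01   (parent-id = some span inside B)
```

C parents its server span on a span ID that the observability backend of A and C will never receive.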

In such a situation, it looks to me like B should restore the original trace context (the one it got from A) before finally calling C. For this, it will likely need to store and propagate that original trace context (at least A's span ID and trace flags) in tracestate, so that it can use it when it reaches the point where it needs to restore it (and then clear that tracestate entry before actually forwarding the request to C).
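
A minimal sketch of what B's ingress and egress hops could do, in Go. The tracestate key `lb` and both helpers are my invention for illustration; nothing here is mandated by the spec, and a real load balancer would integrate this with its tracer rather than manipulate headers by hand:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// Hypothetical tracestate key under which B parks A's traceparent;
// the key name "lb" is illustrative, not part of the spec.
const savedKey = "lb"

// Ingress: before B's own tracer replaces the parent-id, save the incoming
// traceparent as a tracestate entry. A traceparent value contains neither
// "," nor "=", so it is a legal tracestate value as-is.
func saveOriginal(r *http.Request) {
	tp := r.Header.Get("traceparent")
	if tp == "" {
		return
	}
	entry := savedKey + "=" + tp
	if ts := r.Header.Get("tracestate"); ts != "" {
		r.Header.Set("tracestate", entry+","+ts)
	} else {
		r.Header.Set("tracestate", entry)
	}
}

// Egress (the hop that finally calls C): restore A's traceparent and drop
// the bookkeeping entry, so C sees the context exactly as A sent it.
func restoreOriginal(r *http.Request) {
	var kept []string
	for _, member := range strings.Split(r.Header.Get("tracestate"), ",") {
		k, v, ok := strings.Cut(strings.TrimSpace(member), "=")
		if ok && k == savedKey {
			r.Header.Set("traceparent", v) // back to A's span-id and flags
			continue
		}
		if member != "" {
			kept = append(kept, member)
		}
	}
	r.Header.Set("tracestate", strings.Join(kept, ","))
}

func main() {
	req, _ := http.NewRequest("GET", "http://b.internal/route", nil)
	req.Header.Set("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-aaaaaaaaaaaaaaaa-01")
	saveOriginal(req)

	// ...B's internal hops overwrite traceparent while forwarding tracestate...
	req.Header.Set("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-bbbbbbbbbbbbbbbb-01")

	restoreOriginal(req)
	fmt.Println(req.Header.Get("traceparent")) // A's original context again
	fmt.Println(req.Header.Get("tracestate"))  // bookkeeping entry removed
}
```

One design consequence of this scheme: every internal hop of B must forward tracestate untouched, and only the final egress hop may consume the `lb` entry.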

Wanted to get your feedback on the above thinking - and if you can think of any problems with this approach, or if there's a better approach.

basti1302 commented 1 week ago
  1. How would B know that there are other tracers downstream?
  2. Moreover, how would B know that its upstream (A) and its downstream (C) are sending telemetry to the same observability backend?
  3. If B is actually comprised of several individual services itself, then for this to work, every service within B (or rather, every tracer within any service within B) would need to know whether its outgoing requests go to another internal service or leave its scope of control (that is, go to a foreign downstream such as C).

Now, for (1) and (2) one could argue that every cloud provider load balancer should just always behave like this (restore the original trace context for outbound requests), on the off chance that (1) and (2) are true. But that would also mean that our imaginary cloud provider would need to pay the extra complexity and computation cost for this behavior on every request, when in reality only a small subset of users would benefit. (For example, in most cases where a cloud provider load balancer is involved, A and C really do belong to different parties and send telemetry to different observability backends as well, or to no backend at all.) To me this sounds like a somewhat unreasonable expectation.

And for (3) you could argue that since all of B is within the scope of control of one party, they could know which of their services need to save the original context to tracestate and which services need to restore it for downstream propagation. But what if the cloud provider uses general-purpose implementations for trace context handling (say, an OTel SDK with auto-instrumentation)? Would you have them interfere with the automatic context propagation? Do custom tracing? This sounds like a high bar to me as well.

That being said, it is worth noting that the observability backend that A and C are talking to can still see that all spans belong to one trace, since the trace ID is identical throughout the whole distributed transaction. But it is true that, without further help, it cannot restore the correct tree structure because the direct parent-child relation is lost.

An alternative approach could be this, assuming A and C are under the control of the same party: if A knows that it is talking to a load balancer that will break the direct parent-child relationship, it could store the span ID of the span representing the outgoing request in tracestate and send it downstream. Then, C could pick up that information from tracestate and replace the parent span ID in the span representing the incoming request before reporting it to the observability backend. It could even store the "original" incoming parent span ID in a separate span attribute. The observability backend that A and C are talking to could use this additional attribute to recognize that the request went through third-party services that do not report to the same backend.
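
A rough sketch of that idea, again in Go. The tracestate key `app` and both helper functions are my invention; a real implementation would sit in the tracers' span processors rather than in handler code:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// Invented tracestate key carrying A's span ID across B;
// purely illustrative, not something defined by the spec.
const appKey = "app"

// A's side: alongside the normal traceparent, record the outgoing span's
// ID in tracestate so that it survives B's re-parenting.
func annotateOutgoing(req *http.Request, spanID string) {
	entry := appKey + "=" + spanID
	if ts := req.Header.Get("tracestate"); ts != "" {
		req.Header.Set("tracestate", entry+","+ts)
	} else {
		req.Header.Set("tracestate", entry)
	}
}

// C's side: prefer the span ID carried in tracestate over the parent-id
// field of traceparent; keep the latter so it can be reported as an attribute.
func effectiveParent(req *http.Request) (parentID, originalParent string) {
	if parts := strings.Split(req.Header.Get("traceparent"), "-"); len(parts) == 4 {
		originalParent = parts[2] // whichever span in B called C directly
	}
	for _, member := range strings.Split(req.Header.Get("tracestate"), ",") {
		if k, v, ok := strings.Cut(strings.TrimSpace(member), "="); ok && k == appKey {
			return v, originalParent // re-parent C's server span onto A's span
		}
	}
	return originalParent, ""
}

func main() {
	// A sends a request; its client span ID is aaaaaaaaaaaaaaaa.
	req, _ := http.NewRequest("GET", "http://c.example/work", nil)
	req.Header.Set("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-aaaaaaaaaaaaaaaa-01")
	annotateOutgoing(req, "aaaaaaaaaaaaaaaa")

	// B re-parents the request onto one of its internal spans.
	req.Header.Set("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-bbbbbbbbbbbbbbbb-01")

	// C picks the tracestate-carried span ID as its parent.
	parent, original := effectiveParent(req)
	fmt.Println("parent span ID for C's span:", parent) // A's span ID
	fmt.Println("kept as a span attribute:", original)  // B's span ID
}
```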

In a former life (while still at Instana) I implemented this latter strategy quite successfully to deal with exactly this type of situation: requests going through third-party services that send their spans elsewhere.

Unfortunately, this type of situation -- a distributed transaction going through some infrastructure that is monitored by a different observability solution -- is quite common in my experience, and to the best of my knowledge there is no ideal one-size-fits-all solution. (Not unless we specify an API through which observability backends can talk to each other to fetch those missing spans.)