w3c / trace-context

Trace Context
https://w3c.github.io/trace-context/

Proposal: Add a "propagation-only-parent" flag to be set to true if the parent is a non-tracing service #550

Open bogdandrutu opened 1 year ago

bogdandrutu commented 1 year ago

Background

Starting from the definition about the "minimum" interaction:

At a minimum they MUST propagate the traceparent and tracestate headers and guarantee traces are not broken. This behavior is also referred to as forwarding a trace.

Let's assume the following scenario, where three services are called during a request, A -> B -> C (Service A calls Service B, which calls Service C), and assume that the owner of Service B wants tracing disabled, so they do only the "minimum" interaction defined above. The trace will then look like Service A -> Service C (Service A calls Service C), which is not what actually happened.
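A minimal sketch of the scenario above, assuming a `traceparent` of the form `version-trace_id-parent_id-flags`. The helper names (`new_traceparent`, `forward`, `parent_id`) are made up for illustration and are not part of any tracing library.

```python
import secrets


def new_traceparent(trace_id=None):
    """A participating service: reuse the trace id, mint a fresh span id."""
    trace_id = trace_id or secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)                # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def forward(headers):
    """Service B, 'minimum' interaction: copy the trace headers unchanged."""
    return {k: v for k, v in headers.items() if k in ("traceparent", "tracestate")}


def parent_id(traceparent):
    """Extract the parent-id field (third dash-separated field)."""
    return traceparent.split("-")[2]


# Service A creates a span and calls Service B.
a_out = {"traceparent": new_traceparent()}
# Service B does not trace; it only forwards the headers to Service C.
b_out = forward(a_out)
# Service C reads its parent from the incoming header.
c_parent = parent_id(b_out["traceparent"])
```

Here `c_parent` is the span id created by Service A, so a backend that only sees spans from A and C will draw A as the direct caller of C.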

Proposal

Add some information to the trace-flags indicating that the request recently passed through a service that did not participate in the trace. For this we can define a "propagation-only-parent" flag bit that has the following behavior:

How does this help?

Knowing whether or not the parent participates in the trace is critical information that the backend can use when showing parent-child relationships and to inform the user about the correct connection between the services. In this case, what the backend may do is simply inform the user that there are intermediate services that are not participating in the trace (the backend cannot know how many there are or what they are doing, since they decided not to participate in the trace).
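As a rough illustration of how such a bit might be set and read, here is a sketch under stated assumptions. The bit value `0x04` is chosen arbitrarily, since the proposal does not assign one. The function names are hypothetical, and the rule that a forwarding service sets the bit is inferred from the issue title rather than taken from a specified behavior.

```python
SAMPLED = 0x01
# Hypothetical bit position; the proposal does not assign a value.
PROPAGATION_ONLY_PARENT = 0x04


def parse(traceparent):
    """Split a version-00 traceparent into its four fields."""
    version, trace_id, parent_id, flags = traceparent.split("-")
    return version, trace_id, parent_id, int(flags, 16)


def forward_with_flag(traceparent):
    """Non-participating service: keep all ids, set the hypothetical bit."""
    version, trace_id, parent_id, flags = parse(traceparent)
    flags |= PROPAGATION_ONLY_PARENT
    return f"{version}-{trace_id}-{parent_id}-{flags:02x}"


def parent_link_is_indirect(traceparent):
    """Receiver or backend side: was at least one untraced hop reported?"""
    return bool(parse(traceparent)[3] & PROPAGATION_ONLY_PARENT)


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = forward_with_flag(incoming)
```

With this sketch, `outgoing` ends in `-05` (sampled plus the hypothetical bit), and a backend could mark the edge to the recorded parent as indirect instead of showing a direct call.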

Questions

yurishkuro commented 1 year ago

My concern with this is what it means to be a "service". Is an Istio sidecar a service? Should it set this bit? If it doesn't, how about a standalone proxy like nginx?

bogdandrutu commented 1 year ago

I do agree that in some scenarios, like a sidecar, skipping this may be the right thing, but if we have a "standalone service" which is not a proxy/lb/sidecar, it would be good to know this info.

Happy to have these kinds of comments/recommendations. Also, maybe we make this an "optional" flag, and the service owner can decide whether or not to set it based on whether it makes sense for the service type. So not having the flag set does not mean that your parent is traced, but that the parent is a "relevant" service (or whatever we call it).

I do agree that there are caveats, but I do also see a gap here, and would like to have a solution that at least works for real cases like we have in my current company :).

basti1302 commented 11 months ago

Sorry for the late reply. We discussed this in a working group meeting. It would be helpful to describe possible use cases in more detail. There is a description about that in the "How does this help?" section already, but so far it remains a bit vague.

Knowing whether or not the parent participates in the trace is critical information that the backend can use when showing parent-child relationships and to inform the user about the correct connection between the services. In this case, what the backend may do is simply inform the user that there are intermediate services that are not participating in the trace (the backend cannot know how many there are or what they are doing, since they decided not to participate in the trace).

So the only information a downstream service can get from this is whether or not there were any intermediate services, but no additional context or information. How would an observability backend use this information? How is displaying this helpful for users of an observability product? Do you have examples of real world use cases for that information?

and would like to have a solution that at least works for real cases like we have in my current company :).

Can you expand on that real world use case?

SergeyKanzhelev commented 11 months ago

This can be achieved by utilizing tracestate. The tracestate approach is also better (as discussed today at the meeting) in another use case, where a "proxy" service retries and wants to append the retry number to tracestate without changing traceparent. A multi-value tracestate key/value pair will work better than a boolean flag.
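A minimal sketch of what the tracestate approach could look like, assuming a made-up key `myproxy` and a made-up value format `retry:N`. Neither is defined by any specification. The helper follows the general tracestate convention of placing the newest entry first and capping the list at 32 members, and it simplifies the trimming rules by dropping entries from the right.

```python
def upsert_tracestate(tracestate, key, value, max_members=32):
    """Add or replace one key=value entry and move it to the front."""
    members = [m.strip() for m in tracestate.split(",") if m.strip()]
    # Drop any previous entry for this key so it is not duplicated.
    members = [m for m in members if m.split("=", 1)[0] != key]
    members.insert(0, f"{key}={value}")
    # Simplified trimming: keep the leftmost (most recently updated) entries.
    return ",".join(members[:max_members])


# A proxy on its second retry updates its own (hypothetical) entry.
updated = upsert_tracestate("vendor1=abc,myproxy=retry:1", "myproxy", "retry:2")
# A forwarding-only service with no incoming tracestate adds a marker.
fresh = upsert_tracestate("", "myproxy", "forwarded")
```

The `traceparent` header is untouched in both cases; only the proxy's own entry in `tracestate` changes.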

bogdandrutu commented 9 months ago

This can be achieved by utilizing tracestate. The tracestate approach is also better (as discussed today at the meeting) in another use case, where a "proxy" service retries and wants to append the retry number to tracestate without changing traceparent. A multi-value tracestate key/value pair will work better than a boolean flag.

Feedback:

This will not work for us unless the tracestate is somehow standardized as well. Even for the number of retries, what "key" is used in the tracestate? Same here, what key is used for this? How does a proxy provider like Envoy agree with a service provider like Snowflake on that key?

bogdandrutu commented 5 months ago

@SergeyKanzhelev ping on this