w3c / trace-context

Trace Context
https://w3c.github.io/trace-context/
Other
468 stars 75 forks source link

Standardize format for "simple" baggages (a.k.a "tags") #13

Closed bogdandrutu closed 6 years ago

bogdandrutu commented 7 years ago

Background:

Open Questions:

yurishkuro commented 7 years ago

Other open questions:

yurishkuro commented 7 years ago

btw, I am obviously biased towards "baggage" because of OpenTracing, but I think "tags" is way too generic and ambiguous term, while "baggage" is unusual enough to not be confused with anything else.

CodingFabian commented 7 years ago

I do not understand why separation of concerns is not applied in OpenTracing, and I wonder why this pollution is proposed to infect the idea of a trace context spec. What is really needed for vendor interoperability of tracing is a clear definition of how to understand an active trace and how to continue it, preserving the trace. The actual semantics of what this trace is has been subject to the tracing system used.

In my opinion the whole idea of baggage transport is seriously flawed, and I seriously had hoped that this spec will be the remedy, reducing it to the minimal need of understanding a trace and being able to continue it.

If you want a tldr of why I dislike baggage or tags, then here is the focal point: Baggage or Tags lack semantics. They consume bandwidth without explaining anybody what they mean. Because they are "optional" in most specs, this means they are unusable for interoperability. And due to their optional nature they are unusable for control flow. The way it is used now is a vendor specific quirk of tracers, which decide to carry some meaningless key value pairs along, while sending other vendor specific key value pairs to the trace system.

yurishkuro commented 7 years ago

@CodingFabian

I do not understand why separation of concerns is not applied in OpenTracing, and I wonder why this pollution is proposed to infect the idea of a trace context spec.

Instrumentation for tracing consists of 90% of general purpose distributed context propagation, and 10% of the actual tracing. There are initiatives to separate DCP into its own layer (e.g. https://github.com/tracingplane). We discussed it at the inception of OpenTracing (which actually started as "Distributed Context Propagation" project), but decided that we want the API to be easy for tracing first, and simply expose the DCP (baggage) API which would have to be implemented anyway, since tracing cannot exist without DCP.

I think the same reasoning applies here. It's certainly possible to limit the scope of this spec to only the tracing piece, except that there have been already many questions "what about this" which can be easily solved with general-purpose baggage spec.

It is not clear from your comment if you think the notion of general purpose DCP itself flawed, or any specific implementations of it. If it's the former, then I can say this:

  1. http://www.cs.brown.edu/~rfonseca/pubs/fonseca-tracing-1973.pptx
  2. http://pivottracing.io/mace15pivot.pdf
  3. at Uber we have several projects in flight that are using baggage for propagating things like tenancy, auth & claims, fault instructions for chaos engineering, business flow identifiers, markers for capacity planning, cost accounting, contextualized metrics.

Baggage or Tags lack semantics.

Yes, that is the main feature of baggage. Actually, they do have semantics for components that care about specific items. And components that don't care don't need to know the semantics, they only need to pass the data along to the next layer. If every baggage item had semantics that every one of 5000 microservices understood, we'd need to change all 5000 microservices if we want to add one more baggage item, even if only 1% of them actually care about interpreting it.

yurishkuro commented 7 years ago

Added another open question to my https://github.com/TraceContext/tracecontext-spec/issues/13#issuecomment-329980665 above regarding name spacing of baggage keys.

codefromthecrypt commented 6 years ago

just like RFCs (we we aren't even at the stage of) change a bunch of times, we can expect whatever we do here to change. I would like to follow-up with @CodingFabian to figure out what we can do around this.

For experimentation sake, the Correlation-Context seems on track to help as long as whatever we hope to achieve with this ends up in some sort of tests (ex maybe some python or other easy-access language tests here in this repo).

bikeshedding the name a bit, I think Correlation-Context is associated with the trace-context header, maybe trace-context-meta isn't a bad choice as it also prefix-matches (assuming it is sent iff trace-context is). meta is bad, just throwing ideas

bhs commented 6 years ago

Truly just throwing this out there in the spirit of sharing ideas (translation: I'm not sure I like what I'm about to propose, either)...

Another way to look at this is that there's one set of trace context fields that are for tracer-internal purposes, and potentially another set of trace context fields for tracing-user purposes. The "internal" fields are basically what's been discussed elsewhere in this repo: trace-ids, span-ids, and sampling bits. The "user" fields would be things like census-style accounting entities, security tokens, and so on. The latter maps almost 1:1 with the opentracing "baggage" concept, of course, though this is a different way of framing it.

Separating the "user" fields from the "internal" fields seems safer from an interference and isolation standpoint. I wonder (and this is the most questionable part here) if we could use precisely the same format or meta-format for the "user" and "internal" fields – could be an elegant unification. (Though I suspect it won't be compact enough for the tracer-internal state)

tylerbenson commented 6 years ago

@bhs While I like the idea of avoiding potential naming collisions between trace internals and user space, if this gets popular and becomes more standardized, I also see a potential for a third grouping of collisions from frameworks/libraries. That may be something to consider when trying to decide to split or not.

bhs commented 6 years ago

@tylerbenson that's a great point. I still like the idea of segmenting state that's specific to tracer internals... wouldn't want it to get lost or truncated because of overzealous user behavior.

wu-sheng commented 6 years ago

@bogdandrutu Since we have Correlation Context, and @SergeyKanzhelev suggest for ttl property. For me, baggages are somthing ttl=infinite. Do we still this separated head?

Maybe you will say, baggages are somthing of application data, Correlation Context are something about tracer propagation. That is the only possible difference for me, now.

christian-schwarzbauer commented 6 years ago

hi everybody,

I'd like to add our point of view at Dynatrace to this discussion, if I may. :-) Generally saying, our tracing concepts are pretty similar, but we also have some "internal tracing data" that is only useful for our tools.

So I absolutely vote for something like the baggage concept to propagate tool-internal information - which mostly consists of various IDs for us. I also kinda like the name bagagge as suggested by @yurishkuro and keeping the "Trace-Context" prefix as @adriancole suggested, would lead to something like "Trace-Context-Baggage" which would sound ok to me.

As the majority of this thread seems to do, I also suggest to separate the headers between "general tracing data" and "internal tracing data" (= baggage). I also suggest to only have one header for all the baggage -> should make it easier to get it through firewalls/ESBs and such if there's only 2 specific headers. In any case, if there are multiple tools/vendors involved in one trace, they must not interfere with one another.

So from my point of view the basic behavior guidelines should be:

I also don't see the need to specify the exact encoding, but general tracecontext rules could be recommended: e.g. use '-' as separator, base16 encoding of numeric values I would just go for something simple like

<tool1_name>=<tool1_bagagge>;<tool2_name>=<tool2_bagagge>;...

and a specific tool's bagagge could look like this (but it's actually up to the tool)

01-127-session42-87e9a6f1   // ';' and '=' may not be used and have to be encoded by tool itself (e.g. via url-encoding)

Of course there should be no tool name collision, but I guess that should be doable. :-)

For a binary representation there should of course also be size information included, which I don't see necessary for the HTTP header representation. I also would not include required version information, that's - again - up to the tool to implement on its own.

Having said all that an actual bagagge HTTP header could IMHO simply look like this:

Trace-Context-Baggage: uber=05-taxi31337-27;dynatrace=01-127-session42-87e9a6f1

@yurishkuro sorry for the Uber sample, that was the first name that came to my mind ;-)

codefromthecrypt commented 6 years ago

So from my point of view the basic behavior guidelines should be:

if there is bagagge coming in, it has to be passed along tools may modify their own bagagge, but must not touch bagagge from other tools baggage may be dropped, but only if really really necessary

What you are describing is very much like actual Baggage, except simplified quite a bit with text-header encoding. Main points are we are firewalling contexts between tools as demonstrated here https://github.com/JonathanMace/tracingplane-opentracing/tree/master/src/main/baggage cc @JonathanMace

JonathanMace commented 6 years ago

Another question to add:

Suppose I have some async components, or suppose I propagate my baggage in RPC responses. How do I combine a key that has two different values for a tag?


FWIW here's the paper we wrote about baggage: http://cs.brown.edu/people/jcmace/papers/mace2017layered.pdf

The paper's a bit roundabout in its message, but is certainly a good resource for this discussion (feel free to contact me to discuss it, especially if you disagree with any of our assertions)

Also not-quite-as-relevant, here's the upcoming SOSP paper on Canopy, Facebook's e2e tracing system: http://cs.brown.edu/people/jcmace/papers/kaldor2017canopy.pdf


If code matters more than words, I've begun implementing our baggage stuff for other languages (e.g., https://github.com/tracingplane/tracingplane-go/blob/master/atomlayer/api.go is our minimal implementation of TP propagation in Go, at 118 LOC) and instrumenting other platforms (e.g., shamelessly butchering https://github.com/JonathanMace/spring-cloud-sleuth). I'll use the sock shop demo (https://microservices-demo.github.io/) to illustrate some baggage concepts. Also we hacked a goroutine-local context into go (https://github.com/JonathanMace/tracing-framework-go/blob/master/cmd/modify-runtime/modify.go) which is cool.

SergeyKanzhelev commented 6 years ago

I believe this issue is addressed by introduction of Trace-Context-Ext for logger internal fields and Correlation-Context for user-defined bag of properties. Closing the issue. Please re-open for more discussion