openzipkin / b3-propagation

Repository that describes and sometimes implements B3 propagation
Apache License 2.0

What should a trace id and span id look like? #5

Open codefromthecrypt opened 8 years ago

codefromthecrypt commented 8 years ago

from a question: https://github.com/spring-cloud/spring-cloud-sleuth/issues/400#issuecomment-246878146

codefromthecrypt commented 8 years ago

B3 ids are fixed-length, lower-hex encoded values, e.g. "48485a3953bb6124". These are easier to copy and paste than numeric values or UUIDs, which contain hyphens. They are expected to be fully random. This is important because some samplers are probabilistic and assume each bit is equally likely to be set.

Traditionally, the start of a trace (root span) has the same value for trace id and span id. The root span has no parent id. Its child would share a trace id with its parent, but provision a new span id.

Ex. root span:

X-B3-TraceId: 48485a3953bb6124
X-B3-SpanId: 48485a3953bb6124

And its child

X-B3-TraceId: 48485a3953bb6124
X-B3-ParentSpanId: 48485a3953bb6124
X-B3-SpanId: 42e1e27066118385
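
The root and child header sets above could be produced along these lines. This is only a sketch: the function names (`new_id`, `root_headers`, `child_headers`) are made up for illustration and are not taken from any Zipkin library, and using Python's `secrets` module as the randomness source is an assumption.

```python
import secrets


def new_id() -> str:
    """Return a random 64-bit id as 16 lower-hex characters."""
    return secrets.token_hex(8)


def root_headers() -> dict:
    """Headers for a root span: trace id equals span id, no parent id."""
    span_id = new_id()
    return {"X-B3-TraceId": span_id, "X-B3-SpanId": span_id}


def child_headers(parent: dict) -> dict:
    """Headers for a child: same trace id, parent's span id, fresh span id."""
    return {
        "X-B3-TraceId": parent["X-B3-TraceId"],
        "X-B3-ParentSpanId": parent["X-B3-SpanId"],
        "X-B3-SpanId": new_id(),
    }


if __name__ == "__main__":
    root = root_headers()
    print(root)
    print(child_headers(root))
```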

Since spans are contained within the namespace of a trace, and traces usually contain on the order of hundreds of spans or fewer, there's little likelihood that a 64-bit span id will ever clash.
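
To put a rough number on "little likelihood", here is a sketch using the standard birthday approximation, which assumes ids are uniformly random and independent. The function name `clash_probability` is hypothetical.

```python
import math


def clash_probability(n_ids: int, bits: int) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / (2 * 2**bits))."""
    exponent = n_ids * (n_ids - 1) / (2 * 2**bits)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # A few hundred span ids inside one trace.
    print(clash_probability(300, 64))
    # The same id width shared by about four billion ids.
    print(clash_probability(2**32, 64))
```

With a few hundred ids the probability is vanishingly small, while at around 2**32 ids drawn from the same 64-bit space it rises to roughly 39%.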

However, 64-bit trace identifiers can clash in high-traffic circumstances, such as client-originated traces (from devices or cars, for example) or very high-volume websites (like Twitter). For this reason, 128-bit support will be added for trace identifiers (via #1). When it is added, these identifiers will have the following conventional behavior.

Ex. 128 bit root span:

X-B3-TraceId: 463ac35c9f6413ad48485a3953bb6124
X-B3-SpanId: 48485a3953bb6124

This allows a "compatibility mode", where a system that chooses to look only at the lower 64 bits of a 128-bit trace id sees exactly the same values as under prior practice.

NOTE: At the time of this writing, 128-bit ids are not in use yet, and won't be until at least #1 is merged.
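
As a sketch of the 128-bit root convention shown above: the root span id is reused as the lower 64 bits of the trace id, so the rightmost 16 hex characters match what a 64-bit-only reader would have seen. The function name `root_headers_128` is hypothetical.

```python
import secrets


def root_headers_128() -> dict:
    """Headers for a 128-bit root span whose lower 64 bits equal the span id."""
    low = secrets.token_hex(8)   # lower 64 bits, doubles as the root span id
    high = secrets.token_hex(8)  # upper 64 bits, prepended to the trace id
    return {"X-B3-TraceId": high + low, "X-B3-SpanId": low}


if __name__ == "__main__":
    headers = root_headers_128()
    # The rightmost 16 characters are the "compatibility mode" view.
    print(headers["X-B3-TraceId"][-16:] == headers["X-B3-SpanId"])
```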

yurishkuro commented 8 years ago

@adriancole prepending high 64bits to X-B3-TraceId might break id parsing in the non-upgraded clients, which makes it quite hard to deploy somewhere where tracing is already rolled out, because 100s of microservices cannot all upgrade the client libraries overnight. Do you have any thoughts on a possible (incremental) upgrade path?

codefromthecrypt commented 8 years ago

So the node that starts the trace decides whether to use 128 bits. The compatibility mode I mentioned works when all-zero high bits are not serialized and the node that starts the trace decides not to start a 128-bit trace.
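
A sketch of what "high bits of all zeros are not serialized" could look like, assuming a tracer holds the trace id as two 64-bit integers. That representation and the name `format_trace_id` are assumptions for illustration only.

```python
def format_trace_id(high: int, low: int) -> str:
    """Serialize a trace id as lower hex, omitting all-zero high bits."""
    if high == 0:
        # A 64-bit trace: emit 16 characters, exactly as before.
        return f"{low:016x}"
    # A 128-bit trace: high bits first, then low bits, 32 characters total.
    return f"{high:016x}{low:016x}"


if __name__ == "__main__":
    print(format_trace_id(0, 0x48485A3953BB6124))
    print(format_trace_id(0x463AC35C9F6413AD, 0x48485A3953BB6124))
```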

The thinking is that users should do a wave of updates where they toss the high bits of an X-B3-TraceId that is longer than 64 bits on ingest. Once that's in, the node that starts traces can start them at 128 bits, even if it is lossy on the other side. This is a trivial change in any language, and a better alternative than permanently defining an additional B3 trace id header.
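
For illustration, a tolerant reader that tosses the high bits on ingest might look like the sketch below. The name `read_trace_id_64` is hypothetical, and accepting only 16- or 32-character values and returning `None` for anything else are assumptions rather than specified behavior.

```python
from typing import Optional

_LOWER_HEX = set("0123456789abcdef")


def read_trace_id_64(header_value: str) -> Optional[str]:
    """Return the lower 64 bits of an X-B3-TraceId value, or None if malformed."""
    if len(header_value) not in (16, 32):
        return None
    if any(c not in _LOWER_HEX for c in header_value):
        return None
    # Keep the rightmost 16 hex characters; any high bits are discarded.
    return header_value[-16:]


if __name__ == "__main__":
    print(read_trace_id_64("463ac35c9f6413ad48485a3953bb6124"))
    print(read_trace_id_64("48485a3953bb6124"))
```

A 64-bit value passes through unchanged, so a reader like this can be deployed before any node starts 128-bit traces.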

This probably hints at an operations story where you do analysis on the propagation mode of a tracer.

codefromthecrypt commented 8 years ago

the general tradeoff is this:

One mitigator of time is that we are in open source and can update tracers quite quickly, especially if only tossing high bits. Also, the sooner we start, the sooner convergence starts. The later we start, the later tolerant reading libraries get out there, and the longer convergence takes.

For example, the good thing about B3 being historically under-specified is that many people have had to change their code in the last year. Many of these people are still active in zipkin and are able to upgrade their apps. Also, client-originated traces are a novelty for many, so the deployment challenge is patching servers for the most part. The same folks that updated their servers to fix a B3 goof earlier this year can make a change to toss or support longer trace ids.

yurishkuro commented 8 years ago

Thanks. I had a different approach in mind, sending an alternative header in parallel with the old one during the transition, but I like your approach better. Both have a similar time horizon: you can't flip the switch until the first wave of upgrades reaches critical mass, and once the switch is flipped, the non-upgraded holdout services won't be able to parse the header and will start new traces.

codefromthecrypt commented 8 years ago

added #6 for tracking library updates to 128bit