Position zero in tracestate identifies the traceparent

codefromthecrypt commented 6 years ago

I believe the left-most position of the tracestate should identify the calling trace (the one that wrote traceparent). There were arguments against doing this at one workshop, but it makes a lot of sense if we just make this a requirement. That's why I've done this in https://github.com/w3c/distributed-tracing/pull/72

In doing so, we remove the ability to split tracestate across multiple header values (as perhaps they might not be put back in order?) I think this is ok. For example, we don't define behavior of multiple traceparent entries, so even special casing multiple tracestate entries is a bit of a strange optimization especially with a length constraint of 512 bytes.

yurishkuro commented 6 years ago

Can we list the arguments for and against that?

codefromthecrypt commented 6 years ago

Note to whoever wants to respond to this (even if future self) this was discussed at the last tracing workshop so notes might be there and/or video. Possibly not.

christian-schwarzbauer commented 6 years ago

I will start with arguments for this:

when trying to correlate the spans, you can instantly decide if the parent span was created by your tracing system or another system
- if yes -> you should be able to look it up in your system
- if not -> in this case you can skip searching in your system or you can ask the other tracing system, if possible
when receiving such a tracestate, you can also immediately tell if the parent was created by your system, in case you need different handling of something in your tracing code

IMHO this behavior should make it a lot easier to correlate spans of traces that have been created by multiple tracing systems.

yurishkuro commented 6 years ago

@christian-schwarzbauer couldn't the same be achieved without position-based approach? E.g.

receiver looks for its own vendor key in tracestate and parses span id
it then parses traceparent and compares span id
only if span IDs match it can infer that the previous hop was traced with the same vendor

SergeyKanzhelev commented 6 years ago

few disadvantages of positioning are:

not every trace library wants to know where it sends telemetry to. Sometimes it will be better to semantically recognize certain tracestate keys and read/update them if it knows what to do with them
traceparent was created to optimize identities that standard "recommend" for distributed tracing correlation. So basic interoperability should be on that level without the need to always parse tracestate
unnatural requirement for http. Http treats headers with comma-separated values as concatenate-able

codefromthecrypt commented 6 years ago

Sergey your first point is less about positioning and more about opacity right? Isnt arbitrary keys in trace state something we discussed as not viable in the past (due to others clobbering eachothers values etc). Moreover it makes this like correlation context again which was distinctly what we decided not to do. If you want to reopen the issue of opacity again that is ok but it is a different issue than ordering.

The whole point of order at least first position is that you can label yourself. That label does not exist in traceparent so to the degree we support labeling has to exist here and you need some mechanism to know which is the incoming.

On the point of not knowing who you are.. a sort of unknown label is imho better than not knowing which entry was your direct upstream (ex was this you?).

Otherwise we get into complexity of another key to say which tracestate is first ;) seems unnecessarily defensive especially as no one is xoding yet to even know if anything works

Do you feel contrary to any of this?

On 15 Mar 2018 05:49, "Sergey Kanzhelev" notifications@github.com wrote:

few disadvantages of positioning are:

not every trace library wants to know where it sends telemetry to. Sometimes it will be better to semantically recognize certain tracestate keys and read/update them if it knows what to do with them

traceparent was created to optimize identities that standard "recommend" for distributed tracing correlation. So basic interoperability should be on that level without the need to always parse tracestate

unnatural requirement for http. Http treats headers with comma-separated values as concatenate-able

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/w3c/distributed-tracing/issues/80#issuecomment-373186364, or mute the thread https://github.com/notifications/unsubscribe-auth/AAD615sV88Nd8U1-js2S5IzYD7A4eT7fks5teZBQgaJpZM4Sl85x .

nicmunroe commented 6 years ago

Is this specifically about ordering of name/value pairs, or is this also about the name-only optimization that has been discussed (where there's no =[opaqueValue] following the name, which indicates that the parent is that vendor's golden copy)?

If this is just about ordering of name/value pairs, then I'm ok with either way although I have some minor concerns about HTTP libraries not honoring the order you specify (no idea if this is an actual problem in the real world - just a concern that it might be). Assuming that's not an issue we need to worry about then I can see the benefit of knowing who the incoming caller is.

If this is about the name-only optimization then I have some other thoughts, but won't spam them here until I know they're relevant to this issue.

codefromthecrypt commented 6 years ago

One thing we should really call out is the problem we are solving and its context matters. For example we arent talking about header concatenation, multiple headers etc. We are talking about one header currently with a 512 char limit, and the data format inside that value.

It is indeed specified with commas. I have worked with dozens of http middleware and clients and yet to see any that routinely reorganize data inside a header value. I am pretty sure we can find a spec that doesnt suggest this is the case either. But anyway if there is tension about the comma delimiter i am more inclined to use semicolon :)

The key values here are like wingtips=yourtracestate;xray=theirs

Just like the doc example.. there is no header concat defined except correlation context which is separate from this.

codefromthecrypt commented 6 years ago

Ps by context matters i mean to say that at the point we go to a length that implies header concat.. yeah we have to deal with that as it invalidates this.

codefromthecrypt commented 6 years ago

ex I don't know this to be the case, but for example amazon ARN alone can be pretty big, probably longer than 512bytes as it can contain user-provided data. It uses colon delimiter, maybe because it looks nicer maybe because there is some justifiable fear about commas with big headers? If there is a technical concern with commas, let's smoke it out and consider alternatives. OTOH, I feel like we can debunk this possibly lower priority to starting to code something meanwhile.

x-amz-sns-subscription-arn: arn:aws:sns:us-west-2:123456789012:MyTopic:c9135db0-26c4-47ec-8998-413945fb5a96

SergeyKanzhelev commented 6 years ago

Adrian, I'm not talking about opacity. It is more like multiple inheritance problem. Like if there is an Azure service - it may want to participate in multiple ways.

First, it should be a good citizen and make all the necessary updates to trace-parent. Ideally being a generic trace system.

Second, service might want to tell it's identity to the downstream. So downstream will know where to take the span information from.

Third, if it knows about other microsoft tracers upstream that injected hierarchical identity - it will increment it. However, this will ONLY happen if hierarchical identity was started by somebody else. So Default behavior will not inflate the tracestate unnecessary.

Now this service doesn't actually know what's its real identity. Is it generic trace system? Is it Azure? Is it Microsoft hierarchical? So it will not be clear which part to make a leftmost.

nicmunroe commented 6 years ago

The RFC 2616 HTTP spec section 4.2 specifically answers my question on ordering, and I think makes it clear that if we want to support tracestate as a standard multiple-same-named-header then we need to use commas.

The relevant text in that section regarding commas (emphasis mine):

Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]. It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma.

The relevant text in that section regarding ordering:

The order in which header fields with the same field-name are received is therefore significant to the interpretation of the combined field value, and thus a proxy MUST NOT change the order of these field values when a message is forwarded.

So if we want to be able to support multiple-same-named headers for tracestate (i.e. the multiple headers tracestate: wingtips=foo and tracestate: xray=bar are equivalent to tracestate: wingtips=foo,xray=bar) then the delimiter must be a comma. We could define tracestate such that it is not intended to be used as a multiple-same-named-header and then we'd be free to use whatever delimiter we want, but I think that's going to be more confusing. I'd prefer defining it using comma as the delimiter so that multiple-same-named-header is implicitly and explicitly supported.

With all that context in mind, I think I'm on board with defining the first position as identifying the traceparent. I think there's value in being able to identify the source of traceparent, and this solution seems relatively straightforward. And with the HTTP spec saying that ordering of multiple headers matters and must be preserved my concern about arbitrary/accidental reordering is significantly lessened.

I don't have a strong opinion though and would probably be ok with another solution e.g. @yurishkuro's solution seemed like it would work to allow you to know whether traceparent was yours, although it would be a bit more effort (and if it wasn't yours you wouldn't necessarily be able to determine who sent it).

yurishkuro commented 6 years ago

@SergeyKanzhelev I think we reached a general agreement that

trace-state = Set(pair(vendor-name, vendor-trace-state))

where vendor-trace-state is a completely opaque blob for any other vendor.

This ticket is about it being a List, rather than a Set, but below holds for both. So when you say "service might want to tell it's identity to the downstream", it's either

a specific functionality that exists in MS implementation, in which case it's going to be a part of that opaque blob and is not worth discussing here, or
it's something you're proposing as part of the common trace context interface, and I don't think we have any agreement on that aspect (e.g. Dapper style tracers don't do this kind of caller name propagation).

Similar point about hierarchical - I think you're trying to add more semantics to the trace-state than we agreed to. I suggest a different ticket for that.

@nicmunroe good find about same-name headers order being guaranteed. It still doesn't preclude some wonky RPC framework implementation that might expose headers as Map<String, Set<String>> instead of Map<String, List<String>>, but I would be inclined to discount that possibility and always expect a List, which unblocks Adrian's position-0 semantics proposal.

The semi-colon alternative would probably work too, I assume we can stuff up to 4k into a single header (that's the limit on cookie size - https://tools.ietf.org/html/rfc6265#section-6.1), so unless the trace has lots of hops through too many different vendors there's enough room for vendor states. And we could define the behavior that if there isn't enough room the lowest hop can start dropping entries at the tail of the list. My only concern with semi-colon is that it's typically used as a separator between main portion and options for a given segment, e.g. <cookie-name>=<cookie-value>; Domain=<domain-value>. If we use it as segment separator we limit extensibility.

nicmunroe commented 6 years ago

I share the concern about semi-colon as I've usually seen it used to specify properties/options for a given header, not as a delimiter, so using it as a delimiter could lead to confusion or incorrect assumptions by people who don't carefully read the spec.

Plus anything other than comma means we can't support tracestate being a multi-value header (or if we can then it's going to be more complex). Which means we have to explicitly call that out in the spec, hope people don't use it as a multi-value header anyway, and define what to do when it inevitably happens.

So I'm leaning strongly toward comma as delimiter, and also leaning somewhat towards the core of this issue that position zero identifies the parent. Seems like a reasonable solution and optimization for those who read the spec, and for those who don't they can still determine which one is theirs by explicitly comparing their state against the parent.

SergeyKanzhelev commented 6 years ago

Similar point about hierarchical - I think you're trying to add more semantics to the trace-state than we agreed to. I suggest a different ticket for that.

No, I'm not trying to argue with what we agreed on. We discussed that any tracing system may know the structure of other tracing systems's tracestate. So it may understand and mutate it if necessary. Single system may also be controlled by multiple tracing systems. Platform exposes the mechanism to access and modify tracestate and subscribers - tracing vendors - mutate each own's state. In example of Azure service - it is kind of controlled by multiple parties. As it participates in generating telemetry for multiple scenarios/graphs.

yurishkuro commented 6 years ago

We discussed that any tracing system may know the structure of other tracing systems's tracestate. So it may understand and mutate it if necessary.

Hm, ok. Hopefully that doesn't actually require a change in the tracestate header format.

It would be good if you could provide a real world example of the header in this scenario, to make this less abstract.

SergeyKanzhelev commented 6 years ago

Real world example (not a commitment that everything will be implemented exact this way. We are still hashing out details internally). Take Azure API Management Service. It is basically load balancer on steroids where by setting up policies you can configure secure, standardized access to your APIs. It can combine multiple APIs into single API, translate between formats, etc.

This service may operate in different modes. With the new headers the modes would be:

proxy headers if telemetry collection not enabled
mutate traceparent as generic tracing provider when telemetry collection enabled
mutate traceparent as generic tracing provider & mutate microsoft-specific hierarchical identifier (if any upstream services already has this hierarchical id set in tracestate)

Note, in last case service doesn't want to do hard switch completely to "hierarchical". It is not practical for downstream consumer to see it's behavior flipping depending on caller.

Also telemetry from this service may be consumed by external vendor AND be shown in some Azure experience. First will see it as a generic tracing provider. Second may have extra features available via hierarchical identifiers.

codefromthecrypt commented 6 years ago

Thanks for these details, Nic. We should copy them to another issue about "continuations" or something as currently with 512char limit I dont think we need to split. However, it may happen soon and your research can be reused whenever that occurs and definitely it colors the comma. Thanks!

On 16 Mar 2018 01:44, "Nic Munroe" notifications@github.com wrote:

The RFC 2616 HTTP spec section 4.2 https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2 specifically answers my question on ordering, and I think makes it clear that if we want to support tracestate as a standard multiple-same-named-header then we need to use commas.

The relevant text in that section regarding commas (emphasis mine):

Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]. It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma.

The relevant text in that section regarding ordering:

The order in which header fields with the same field-name are received is therefore significant to the interpretation of the combined field value, and thus a proxy MUST NOT change the order of these field values when a message is forwarded.

So if we want to be able to support multiple-same-named headers for tracestate (i.e. the multiple headers tracestate: wingtips=foo and tracestate: xray=bar are equivalent to tracestate: wingtips=foo,xray=bar) then the delimiter must be a comma. We could define tracestate such that it is not intended to be used as a multiple-same-named-header and then we'd be free to use whatever delimiter we want, but I think that's going to be more confusing. I'd prefer defining it using comma as the delimiter so that multiple-same-named-header is implicitly and explicitly supported.

With all that context in mind, I think I'm on board with defining the first position as identifying the traceparent. I think there's value in being able to identify the source of traceparent, and this solution seems relatively straightforward. And with the HTTP spec saying that ordering of multiple headers matters and must be preserved my concern about arbitrary/accidental reordering is significantly lessened.

I don't have a strong opinion though and would probably be ok with another solution e.g. @yurishkuro https://github.com/yurishkuro's solution seemed like it would work to allow you to know whether traceparent was yours, although it would be a bit more effort (and if it wasn't yours you wouldn't necessarily be able to determine who sent it).

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/w3c/distributed-tracing/issues/80#issuecomment-373464056, or mute the thread https://github.com/notifications/unsubscribe-auth/AAD61zFqNhRK-b7pJCFk5rf6r66yODONks5teqiHgaJpZM4Sl85x .

codefromthecrypt commented 6 years ago

Sergey, thanks for the concrete example. If I understand correctly, this is suggesting there are multiple state entries representing the same system and same position in it. For example, to say position 0 might not be good in such a complex protocol. Would it make it different though if one of your state entries were? For example just moving one of the 2 or more you are updating so that downstream, say wingtips, knows they certainly are not the direct upstream?

Correct me if I misunderstood the multi-state scenario seems like you are describing.. if i did dummy header values could be a nice way to unpack

On 16 Mar 2018 07:25, "Sergey Kanzhelev" notifications@github.com wrote:

Real world example (not a commitment that everything will be implemented exact this way. We are still hashing out details internally). Take Azure API Management Service. It is basically load balancer on steroids where by setting up policies you can configure secure, standardized access to your APIs. It can combine multiple APIs into single API, translate between formats, etc.

This service may operate in different modes. With the new headers the modes would be:

proxy headers if telemetry collection not enabled

mutate traceparent as generic tracing provider when telemetry collection enabled

mutate traceparent as generic tracing provider & mutate microsoft-specific hierarchical identifier (if any upstream services already has this hierarchical id set in tracestate)

Note, in last case service doesn't want to do hard switch completely to "hierarchical". It is not practical for downstream consumer to see it's behavior flipping depending on caller.

Also telemetry from this service may be consumed by external vendor AND be shown in some Azure experience. First will see it as a generic tracing provider. Second may have extra features available via hierarchical identifiers.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/w3c/distributed-tracing/issues/80#issuecomment-373554389, or mute the thread https://github.com/notifications/unsubscribe-auth/AAD616LzV-OtkM27Tok7KDWK1Y5IgGQfks5teviAgaJpZM4Sl85x .

codefromthecrypt commented 6 years ago

I concur with Yuri's most recent summary btw

SergeyKanzhelev commented 6 years ago

Last statement that it's a set of opaque named items - no concerns here. My comment is mostly about the ordering and positioning. I don't think we can make platform decide which tracing vendor it is. For me ideal world is that everybody is just a generic tracing system with the standard mutations. And tracestate has hints for specific vendors. Like cluster-id example from DynaTrace - where to store this span to. And note which span was last seen by "definitely DynaTrace agent".

Not picking on Dynatrace - vendor A may trace 1st and 3rd components. And import data from 2nd. So it will benefit everybody if 2nd component will behave as generic tracer with standard mutations, not hiding it's mutations in 2nd-specific tracestate.

codefromthecrypt commented 6 years ago

Last statement that it's a set of opaque named items - no concerns here. My comment is mostly about the ordering and positioning. I don't think we can make platform decide which tracing vendor it is. For me ideal world is that everybody is just a generic tracing system with the standard mutations. And tracestate has hints for specific vendors. Like cluster-id example from DynaTrace - where to store this span to. And note which span was last seen by "definitely DynaTrace agent".

Not picking on Dynatrace - vendor A may trace 1st and 3rd components. And import data from 2nd. So it will benefit everybody if 2nd component will behave as generic tracer with standard mutations, not hiding it's mutations in 2nd-specific tracestate.

Maybe the term "vendor" is causing some grief. Let's think of this as what you affected? When writing the propagation state, when you change a value of a tracestate entry you must move it left. We are talking about data policy more than vendors. Obviously you should not modify data you do not own, so if you do so, you move that to position 0, and any other you modify.

We are yet to be in a position of violating opacity, for example, uncanning someone else's data and modifying it without reporting it into their system. If you modify dynatrace's cluster id value without their IDs, I'd suspect you will create a misleading trace state datum.

codefromthecrypt commented 6 years ago

PS if you want to continue the 1-many part, where traceparent is a generic reference to many simultaneous mutations to tracestate, again this is hugely complicated and should be discussed somewhere beside position zero being direct upstream.. do you agree?

codefromthecrypt commented 6 years ago

by the way.. If we decide to accept propagator coordinating several tracers with a generic upstream (possible!) then I agree this also will violate this design aspect (as well others!). My question, similar to the header continuation, is do we accept that today. If so, this should be a primary function documented etc as people now need to support this. In positional nature, it might imply a different technique, either a grouping construct in trace state (which solves multiple entries at expense of brackets or something), or another header or another field that says which of many keys are direct upstream. Point is that even if I think this is ok, I think it can be solved incrementally, as we still need to flesh out that feature against many things unrelated to position 0, but inclusive of it. Make sense?

yurishkuro commented 6 years ago

Maybe the term "vendor" is causing some grief.

some grief, certainly :-) In particular, even if Company X and Y both running Zipkin (same "vendor") it doesn't mean that they want the behavior of Zipkin tracer in the edge service to be the same as in the internal services, i.e. they have the same reasons to mistrust external trace ID as X-Ray/Stackdriver do. I may be way off topic here.

I think we're running into a typical issue of discussing implementation details before agreeing on the requirements, as in the behavior we expect this standard to impose on the participants. For example, do we really need the notion of generic provider? Do we expect some nodes behave like chameleons as in Sergey's example (supporting different vendor tracestate formats)?

codefromthecrypt commented 6 years ago

@yurishkuro the label was supposed to represent the tracing system, not the brand. We did yank "generic provider", however we minimally need a means to implicate a state entry with a format (which is why generic provider was created in feb). I agree that mappings between things are currently not in scope, however the format characters inside the label don't preclude this.

codefromthecrypt commented 6 years ago

Ps yuri mentioned in gitter an alternative for knowing which tracestate is the parent.. essentially a special tracestate field.

Ex parent=congo

This would work for randomly sorted fields, and complicate things only a little. It starts feeling like amazon's existing format in this regard as they label a field for parent. Though in their case the standard format prevents them from having double entry concern (ex having multiple labels to identify and define the parent). Position zero eliminates this extra part and has nice side effects like stack.

Anyway I will make some code for existing way and that way if people decide to make it not position zero it will break tests and you can see code impact https://github.com/openzipkin/brave/issues/657

yurishkuro commented 6 years ago

Reposting from https://github.com/w3c/distributed-tracing/issues/81#issuecomment-381230750, there are two problems with the dictionary:

how does the current node know that the previous node used the same or different tracer
if the tracestate header grows too long and the current node must drop some of the other vendor entries, how does it choose which ones to drop?

Two solutions to the first issue have been proposed above:

parse your own key and compare with traceparent to ensure you were the previous hop, or
reserve a special key parent={vendor} that indicates the previous hop.

There is no clean solution to the second issue, only dropping keys randomly.

SergeyKanzhelev commented 6 years ago

@yurishkuro I think ordering may be a good thing. Specifically for dropping of long headers.

What is the main reason to know your immediate parent's tracer? If the first one doesn't match most implementation will search for the next one that match it. And than extract traceparent in one form or another from it. This way you have more chances to correlate if there is a service mesh was set in-between one day.

Also we may need to support:

property-like tracestate keys. Basically something that helps carries extra information to locate the trace in the telemetry database. Like cluster-id or tenant-id.
scenario I described in https://github.com/w3c/distributed-tracing/issues/54#issuecomment-381875812 where a single component will need to place two different tracestate keys. It is unclear which one should be first there.

That's said I tend to agree that recomendation should be to put your key in the beginning of the tracestate. And tracestate should be ordered.

So if we will put the language of ordering (MUST preserve order of tracestate keys) and recommend to put your key first (SHOULD move modified and added keys to the first position) - will it satisfy scenarios you have in mind?

yurishkuro commented 6 years ago

Yes, having an ordered set makes more sense semantically, ie it has some tangible benefits.

AloisReitbauer commented 6 years ago

As stated in another issue, I don't think we need ordering. The only reason so far is to understand what the parent tracer is. We have the trace-parent header and we should use it for this. This will also make dropping the trace-state header easier. If it gets/ would get too long it is dropped entirely and only trace-parent is set.

discostu105 commented 6 years ago

From a practical point of view, ordering tracestate as suggested makes sense:

The tracestate string needs to be parsed (split) and then reassembled. At parsing, I need to find and interpret my tracing systems part anyway. The other tracing system parts need to be kept in memory. I end up with: dynatrace_part, part_before_dynatrace, part_after_dynatrace.
When reassembling tracestate for an outgoing call, I need to create a new tracestate for our own system. I can then simply append the other tracing system parts:
```
non_dynatrace_part = part_before_dynatrace + part_after_dynatrace;
tracestate = createNewTraceState(...) + non_dynatrace_part ;
```

Of course this is vastly oversimplified. What I am trying to say: Putting the last tracing system in front is more efficient to implement. Of course this wouldn't be hard to code in Java (String.split(","), String.join(), ...), but I did it in C++ and tried to avoid allocations as much as possible. In that implementation it was just very natural to re-assemble the string that way.

SergeyKanzhelev commented 6 years ago

Closed via #164

w3c / trace-context

Position zero in tracestate identifies the traceparent #80