w3c / trace-context

Trace Context
https://w3c.github.io/trace-context/
Other
468 stars 75 forks source link

Clarify the HTTP header name choice for trace #119

Closed serialseb closed 6 years ago

serialseb commented 6 years ago

Hi,

The following text made me rise an eyebrow, so I thought I'd give some feedback, maybe the text could clarify those points for future readers.

While HTTP headers are conventionally delimited by hyphens, the trace context header names are not. Rather, they are lowercase concatenated "traceparent" and "tracestate" respectively. The departure from convention is due to practical concerns of propagation. Trace context is unlike typical http headers, which are point-to-point and do not propagate through other systems like messaging. Different systems have different constraints. For example, some cannot read case insensitively, and others forbid the hyphen character. Even if we could suggest not using the same format for such systems, we know many systems transparently copy http headers into fields. This class of concerns only exist when we choose to support mixed case with hyphens. By choosing not to, we open trace context integration beyond http at the cost of a conventional distraction.

  1. Http headers are case insensitive so I'm unsure why the lowercase mention exists. A specification cannot redefine HTTP, and nothing can or would prevent proxies and intermediaries from re-casing as they see fit.
  2. Most http headers are not point-to-point, there is a limited list alongside the ones in the Connection response header.
  3. The reference to "fields" is confusing here, I'm not sure what it refers to.
  4. If many systems transparently copy http headers into fields, then it'll end up copying a vast majority of headers using the convention, so I'm unsure what scenario we cater for here, especially as most pre-existing and deployed tracing headers use the dash, without triggering interoperability problems.

Many thanks.

codefromthecrypt commented 6 years ago

the recasing is interesting though odd. which product do you know of takes lowercase and makes it uppercase?

I have seen the other way around (http2 for ex downcases) but not the other way (upcasing). If something did, the only possibly sensible thing it could do is uppercase the first character which is still easier to handle than hyphens.

As this spec is only concerned with trace data if other things are recased, it doesn't impact us.

Which tracing system are you working on by the way?

yurishkuro commented 6 years ago

Personally, the no-hyphen rule bothers me too. The current draft is specifically talking about HTTP headers, where hyphens are perfectly fine. Another protocol may have a different specification. The theoretic case of some system that receives HTTP requests and translates them, say, into RabbitMQ messages by blindly copying all HTTP headers to message headers is not going to work anyway, since most common HTTP headers have hyphens, as well as headers used by existing tracing systems.

Even in HTTP world it's not uncommon to have infra that doesn't allow dashes (e.g. CGI, WSGI - they will also uppercase, @adriancole), but people often assume that dash and underscore are interchangable.

codefromthecrypt commented 6 years ago

when we say something doesn't work we should qualify that.

for example tools like camel map hyphens and underscores to avoid problems like these but they do so in ways not exactly the same as other tools. For example this mapping is incompatible with the approach used in opentracing jms.

to say removing hyphens doesn't help with fields required for tracing to work is false. other tools and the heuristics they use for inconsistent mapping can continue even if we fix things for tracing.

Removing the hyphens absolutely obviates this problem of hyphen or underscore mapping provably. nothing has changed fact wise since we decided this either.

serialseb commented 6 years ago

@adriancole thank you for taking the time to respond.

My interest in this is one of implementation, as I work on distributed systems and am a web framework author; and as a member of the wider standards community.

The recasing to uppercase is indeed an odd idea, although there is prior art in cgi gates of old times. There is however a whole set of http frameworks that would parse http headers, turn them into an object model, and re-serialize them on the other end. It is also common for operators configuring reverse-proxy to mistype the casing.

From a standards perspective, defining a header as case sensitive is not possible without redefining the semantics of HTTP itself. I woudl imagine that it would prevent a registration of the header as per BCP90

From an implementation perspective, the header itself may be comma-separated, so it could be folded into one or split in multiple headers, as per the specification. As with most things in HTTP, if it can be done it will be done, in intermediaries that the operator of a tracing system, or the author of the headers, will have no control over.

RFC7231 Section 8.3.1 says on the subject

Whether the field is a single value or whether it can be a list (delimited by commas; see Section 3.2 of [RFC7230]).

If it does not use the list syntax, document how to treat messages where the field occurs multiple times (a sensible default would be to ignore the field, but this might not always be the right choice).

Note that intermediaries and software libraries might combine multiple header field instances into a single one, despite the field's definition not allowing the list syntax. A robust format enables recipients to discover these situations (good example: "Content-Type", as the comma can only appear inside quoted strings; bad example: "Location", as a comma can occur inside a URI).

Within those constraints, I find it difficult to imagine using the http headers as one serialisation stable enough to transfer with no modification across protocols without some level of parsing, and that will mean some level of processing for things that you cannot disallow http to do, by design, like case sensitivity in header field names and the presence of multiple headers for lists.

At this point, I'm not sure that the lack of hyphen looks as attractive anymore, but it's entirely likely that i do not understand the mappings between tracing systems that you refer to, and I expect other readers and implementers of specifications may stumble upon the same difficulty, hence my suggestion to extend the description of the incompatibilities between toolsets leading to the choice of the hyphen.

Either way, if the goal is to have one simple mapping from a single header to whatever other system of choice exists, then lists with commas seem to prevent this from happening, as for example most json-based messaging systems will not allow the presence of duplicate keys in objects. It would probably be wiser to have a space-separated list instead, and a processing rule (as per above) when multiple are present.

If the goal, on the other hand, is to define semantics that integrate well with HTTP, it may be necessary to provide the steps to generate an idiomatic single key/value pair representation, separate from the definition of the tracing elements themselves, along the lines of what was used rather successfully in RFC8288.

On a side-note, my original research lead me from the pull request to the minutes that alluded to a decision made at a workshop, for which I couldn't find any minutes. This would explain why, as part of the wider spec community, the facts leading to that decision are unknown enough that I opened this issue.

codefromthecrypt commented 6 years ago

you seem to be raising many sorts of issues in the same issue comment. let me try to figure out what you want:

  1. you want us to intentionally add hyphens even though folks with practice know this to cause issues in distributed tracing which by definition is more than http
  2. you want us to intentionally not suggest lowercase for header name encoding
  3. you want us to accept concatenated headers on things that shouldn't be. ex right now trace headers in the wild would fail of people concatenated them.

all of these things seem thoroughly documented by your research but also destroy practical value.

I wouldn't want such a spec as it is worse than status quo. others can feel free to chime in.

asbjornu commented 6 years ago
  1. you want us to intentionally add hyphens even though folks with practice know this to cause issues in distributed tracing which by definition is more than http

While I understand why you would want some definition power of the header's value, I don't understand the need to control its name. As @serialseb writes, proxies, caches, middleware and whatnot is going to rewrite these names regardless of what you write in this specification. Most proxy providers probably won't even know about its existence for many years to come. You can't redefine HTTP; it is well defined in RFC 7230.

Also, just the idea of taking one protocol's header name and plunging it into another without any modification sounds very weird to me. The value, I can understand, but not the name. The name will have to obey the syntax of the protocol it is being used in. In binary protocol such as MQTT, the header won't even have a name – it will just be represented by a number.

  1. you want us to intentionally not suggest lowercase for header name encoding

If you want to make suggestions, they should at least conform with the protocol which they apply to. If you want to ensure compatibility across protocols, provide guidance for each protocol you envision that this standard is going to be used in. For most open and standard protocols, I assume you're going to have to formally register the header, in which case you will need to adhere to the syntax of each individual protocol anyway.

  1. you want us to accept concatenated headers on things that shouldn't be. ex right now trace headers in the wild would fail of people concatenated them.

This is how HTTP, SMTP, POP3, IMAP and all other protocols using MIME headers work. Headers can be split, concatenated and juggled around for every hop a message takes from the sender to the receiver. The standard allows for a great deal of flexibility and you can't break this 45 years of legacy with your specification.

serialseb commented 6 years ago

@adriancole I don't want you to do anything. You're writing an open specification that, as currently written, would suffer from significant compatibility issues with HTTP. I suggested changes to either confirm better to HTTP, or alternatively, to continue in the current direction but without the compatibility problems.

Either way, as is this specification wouldn't be usable on HTTP as deployed on the web, hence my feedback. If you believe it to be unhelpful, or incorrect, I'd suggest outreaching to some of the w3c resources, as they're much more authoritative than me on the question.

If you wish me not to provide feedback at this stage in the standardisation, please feel free to say so, I'm in no way trying to impose.

codefromthecrypt commented 6 years ago

while you are good at citing things, several facts remain in quite practical terms. you elude to something being unusable which is quite false

codefromthecrypt commented 6 years ago

I am also not interested in arguing with you. Just I hope that someone here can apply things in practical terms like for example what this specification is actually about.

for example, we have trace identifiers in headers today. while it is spec possible for someone to comma delimit an extra trace ID, that is malformed in the sense of the trace ID even if not in the sense of http.

moreover while it is possible to add hyphens in field names this only matters if you do. we can choose not to for any reason especially in context of tracing. a different approach would need to make sense in the sense of what we are actually doing here. notably cross systems.

finally while yes a stupid middleware could take a lowercase header name and make it every other character capital, this is just dumb. In practical terms the least efforts are required if we are consistent across all of our field name instructions.. http being one of them (w3c being an arbitrary choice vs other bodies). In practical terms implementations will want to check for uppercase first character as that is maybe even likely. however this doesn't mean we must recommend fixed name for the header.

I don't want to argue ad there is limited time and spending it in circles isn't best for either of us. for example, I have choice to spend time arguing about nuance without any context to what we are doing or maybe work on something. I plead with you to concentrate on what we are doing before offering change requests, but of course your opinions are fine even without relevance to tracing.

dret commented 6 years ago

just chiming in here: @serialseb and @asbjornu to my understanding have a very valid point here: you may be defining something that attempts to be cross-protocol, and that's fine. but it is hard to imagine that you can do so without carefully examining and respecting the rules of each individual protocol you want to cover. i haven't look at any of your other "bindings", but for HTTP, you definitely have a bit of adjustment to do. and as @asbjornu said, it is hard to imagine that it will scale to rely on blindly copying "headers" and "values" across protocols, as all of them will have different definitions of how to handle both (he alluded to MQTT, and there certainly are others).

codefromthecrypt commented 6 years ago

so in summary, my 2p is to suggest this:

codefromthecrypt commented 6 years ago

by the way we don't need to use imagination. there are definitely products that mostly blindly copy such as apache camel. When they do this, they use their own reformatting rules for hyphens. hyphen is the biggest problem much bigger than case format.

https://github.com/openzipkin/brave/issues/584

folks who actively disbelieve that this exists can, and they can also subvert the design aspects that accommodate this for something that seems more theoretically coherent with http. the side effect besides undoing the discussions and work around this is that it makes the format less attractive as it continues the spaghetti problem. personally I will stop working on the spec if this happens as it would be more fruitful to work on just b3 as we can reuse it more effectively due to reasons discussed.

your call

AloisReitbauer commented 6 years ago

Summarising this issue, we have two items to discuss:

dret commented 6 years ago

absolutely nothing keeps you from not using hyphens in your header field name. just name them as you wish.

On Apr 30, 2018, at 10:20, Adrian Cole notifications@github.com wrote:

by the way we don't need to use imagination. there are definitely products that mostly blindly copy such as apache camel. When they do this, they use their own reformatting rules for hyphens. hyphen is the biggest problem much bigger than case format.

https://github.com/openzipkin/brave/issues/584

folks who actively disbelieve that this exists can, and they can also subvert the design aspects that accommodate this for something that seems more theoretically coherent with http. the side effect besides undoing the discussions and work around this is that it makes the format less attractive as it continues the spaghetti problem. personally I will stop working on the spec if this happens as it would be more fruitful to work on just b3 as we can reuse it more effectively due to reasons discussed.

your call — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

serialseb commented 6 years ago

Due to the tone of this conversation, I’m withdrawing from making further comments.

SergeyKanzhelev commented 6 years ago

@serialseb I'm sorry this thread came into this. I appreciate your feedback and opinion.

We want to make this specification work for real life scenarios. In order to achieve interoperability it should be as strict as possible while accounting for the history and prior art. @adriancole and others working on it has a great experience building distributed tracing systems. And saw a lot of misunderstanding, lax and corner-cutting implementations of standards. Which made code a big mess of if statements accounting for such implementations. You might have seen it firsthand as well. That's said -- the spec should be clear on not dismissing prior art, rather suggest trace systems how to operate in "Http world".

One additional feedback to what @AloisReitbauer listed I'm taking is that rationale and headers overview section should be re-worked to not make reasoning of chosen non-traditional names clearer.

dret commented 6 years ago

On 2018-04-30 18:25, Sergey Kanzhelev wrote:

One additional feedback to what @AloisReitbauer https://github.com/AloisReitbauer listed I'm taking is that rationale and headers overview section should be re-worked to not make reasoning of chosen non-traditional names clearer.

it really is very simple:

SergeyKanzhelev commented 6 years ago

@dret sure! The rationale docs and introductions is a way to be transparent of the thinking process AND more importantly -- the way to highlight non-traditional for http choice. This text makes it visible and explicit for implementors.

nicmunroe commented 6 years ago

You can't redefine HTTP; it is well defined in RFC 7230.

If you want to make suggestions, they should at least conform with the protocol which they apply to.

You're writing an open specification that, as currently written, would suffer from significant compatibility issues with HTTP.

Either way, as is this specification wouldn't be usable on HTTP as deployed on the web, hence my feedback.

you can't break this 45 years of legacy with your specification.

We should probably re-word some stuff to make clear what is a suggestion (i.e. lowercasing - which we can't enforce since HTTP treats header names as case-insensitive, but we can encourage) vs. required (i.e. no hyphens in the name - which we are certainly allowed to do). These are oversights or artifacts of early drafts - not an intentional attempt to subvert HTTP. C'mon - nobody is attempting to redefine or break HTTP with this spec...

With that out of the way: it seems to me that since the main purpose of this tracing spec is to maximize interoperability, we should focus on making decisions that promote success by default, even in the face of middleware that does strange things or tooling that is implemented without carefully read the spec. That's what the no-hyphen-header-names and lowercase-suggestion stuff is about - successful integration by default, in the wild, based on hard-earned prior experience:

However much we feel that middleware shouldn't do certain things we have empirical proof that it does happen, and will continue to happen if we make the same mistakes from the past. We might gain some moral high ground by saying "sorry your stuff is broken but we decided to go with hyphens and not encourage lowercase header names because it was more HTTP conventional - you need to fix your libraries". And that middleware (e.g. apache camel) would likely respond "cool story, but we're established and popular and not going to break our existing customers for your tracing spec". The end result is that actual users have broken tracing and nobody is happy. Hence the no-hyphen names and lowercase suggestions: success by default.

Headers can be split, concatenated and juggled around for every hop a message takes from the sender to the receiver. The standard allows for a great deal of flexibility and you can't break this 45 years of legacy with your specification.

As @serialseb mentioned, in RFC 7231 Section 8.3.1, we are certainly allowed to define Whether the field is a single value or whether it can be a list (delimited by commas; see Section 3.2 of [RFC7230]). Again, nobody here is trying to break HTTP. As @adriancole said, traceparent is clearly a single value header. tracestate may or may not be - that's being actively discussed (see #107, #81, and probably others). In any case we can and should define what good citizen implementations should do when they receive a request that contains multiple traceparent and/or tracestate headers, as I agree it is true that we can't control what intermediaries do and there's a good chance broken impls will accidentally do a headers.add("traceparent", traceParentValue) rather than headers.set(...) someday, and it would be good to give receivers of that broken tracing header guidance on how to handle it gracefully.

SergeyKanzhelev commented 6 years ago

Added some clarification for the name choice in spec. Closing issue

alexweej commented 3 years ago

I know I'm very, very late to this party but I just... wat.

ioquatix commented 3 years ago

I've implemented HTTP/1 & HTTP/2 clients and servers and spent a lot of time reading through the relevant RFCs/specs. Since I wrote these clients and servers, I wanted to add tracing, so I was really excited to see this spec. Well done on standardising it.

But then I saw the header names and the related comments regarding hyphens and wanted to understand the rational behind such a decision.

I'm curious what actually would have broken using more standard names like "Trace-Parent" and "Trace-State" which just semantically seem much more consistent with the entire HTTP model. When I read the spec and noticed the comment regarding hyphens, I wanted to know what was the justification.

Reading this thread, I just don't understand why anyone could possibly map between HTTP headers and some non-HTTP key-value metadata without some kind of transformation. So the whole idea of "interoperability" just feels like a stretch without understanding the exact "hard won experience" that would lead to such a trade-off.

I guess I understand the logic, but I'm also a bit disappointed as the headers feel out of place and a bit ugly/inconsistent with the rest of HTTP (which it seems this spec is mostly in relation to). I know backwards compatibility is important but I believe specs need to be forward looking. Maybe I'm too idealistic but I always believe there will be a lot more code written in the future than has been written in the past, so I personally care more about the future and less about the past.

Anyway, I'm guessing that ship has sailed?

dyladan commented 3 years ago

I've implemented HTTP/1 & HTTP/2 clients and servers and spent a lot of time reading through the relevant RFCs/specs. Since I wrote these clients and servers, I wanted to add tracing, so I was really excited to see this spec. Well done on standardising it.

But then I saw the header names and the related comments regarding hyphens and wanted to understand the rational behind such a decision.

I'm curious what actually would have broken using more standard names like "Trace-Parent" and "Trace-State" which just semantically seem much more consistent with the entire HTTP model. When I read the spec and noticed the comment regarding hyphens, I wanted to know what was the justification.

Reading this thread, I just don't understand why anyone could possibly map between HTTP headers and some non-HTTP key-value metadata without some kind of transformation. So the whole idea of "interoperability" just feels like a stretch without understanding the exact "hard won experience" that would lead to such a trade-off.

I guess I understand the logic, but I'm also a bit disappointed as the headers feel out of place and a bit ugly/inconsistent with the rest of HTTP (which it seems this spec is mostly in relation to). I know backwards compatibility is important but I believe specs need to be forward looking. Maybe I'm too idealistic but I always believe there will be a lot more code written in the future than has been written in the past, so I personally care more about the future and less about the past.

Anyway, I'm guessing that ship has sailed?

Most of these decisions were made before my time on the working group, but I suspect the ship has indeed sailed. At least for traceparent and tracestate, they are official w3c recommendations and I see no workable path forward to change them. For headers which are not yet released, it would be possible to change still at this point if there is a concrete benefit. We discussed using more typical header casing for baggage (options were baggage, Baggage, correlationcontext, and Correlation-Context). In the end, we settled on the all lowercase baggage mostly because it was consistent with headers already recommended by the working group.

ioquatix commented 3 years ago

I would have gone for:

Trace-Parent
Trace-State
Trace-Context

Case is irrelevant (even more so with HTTP/2). Trace- should have been treated like a namespace.