Would a common http response id header be helpful?

codefromthecrypt commented 7 years ago

@nicmunroe and @jcarres-mdsol mentioned something that isn't unusual. Should we log the trace id of a zipkin trace as an http response header? This originally came up in b3, but it is really cross-cutting.

For example, it is the same concern with GRPC, Jaeger and HTrace (who use a different propagation than B3, but still report to zipkin). It will also be the same concern with GRPC tracing.

Historically, some have copied the B3-TraceId header to the response. This has an advantage of parity when using B3, but also the disadvantage of being scoped to B3.

Not everyone who does http propagation use B3, yet everyone who use zipkin have compatible trace identifiers.
B3 also is also more a tracing internal than a user spec. If we expose this header, we add lock-in to a very specific propagation style, which is likely to be obviated at some point especially when we do single-sided spans.
Response is scope creep for B3. For example, it is an outbound propagation style, so it doesn't actually have scope for the response, even if manually implemented seems convenient.

Another option is to define a response header for the zipkin trace id, and use it without regards to propagation. Ex "x-zipkin-traceid"

this would be in zipkin-specific documentation for the tracer, which is the default if a zipkin tracer.
this doesn't couple to a specific propagation system.
if we end up defining log correlation, we could build on this.
this might imply copying the same value twice where a different trace identifier is in use. ex the htrace response header and also the zipkin one.

Another option is status quo aka punt.. add documentation for those who encode an application-specific trace identifier. Ex "x-wingtips-trace" or something.

this could be documented where other things like log context correlation are
this doesn't couple to a specific propagation system
this has a disadvantage of being different. Ex zipkin users might need to learn 3 or more different response headers.

cc @shalako @marcingrzejszczak @openzipkin/cross-implementation-team-zipkin-pinpoint-htrace-etc

codefromthecrypt commented 7 years ago

cc @openzipkin/core @prat0318 @yurishkuro

codefromthecrypt commented 7 years ago

Forgot one point about response headers. If we added one, we should be careful to document that it might be absent because..

there was no trace provisioned
a security policy stripped it
instrumentation doesn't support it

shakuzen commented 7 years ago

Tangentially related: perhaps we should avoid prefixing headers with X- per RFC6648. Particularly with headers newly introduced.

jcarres-mdsol commented 7 years ago

We are currently exposing B3 trace id header in the responses of the UI. Something less specific would be an improvement. As I do not think anyone will run several tracing systems at the same time, what about taking ownership of Trace-Id or similar name?

nicmunroe commented 7 years ago

The word "trace" is overloaded enough I could see Trace-Id maybe conflicting with other use cases that are not related to distributed tracing. If the non-zipkin-specific header name route is chosen, what about Distributed-Trace-Id or DTrace-Id to try and reduce collision space?

codefromthecrypt commented 7 years ago

I'm in favor of a zipkin-trace-id or similar (esp since http/2 downcases). Reason is that the shape of the ID matters, and I'd want people to confidently paste this into the UI or use in an API call. If we loosened to any given system, it might lead to mistakes like UUID or numeric formatting.

Thoughts?

basvanbeek commented 7 years ago

I'm in favor of not doing this at all. If we need to correlate requests with tracing, metrics and other recording facilities we should create the correlation identifier at the source where the request starts, not have it start somewhere downstream and report back to the caller, with the already mentioned issues of it potentially never making its way back to the final response.

So instead of sending an implementation detail from Zipkin back to the caller for correlation purposes we could establish a request header per transport (HTTP header, gRPC metadata key-val, etc.) to automatically binary annotate as a correlation-id. This currently works if you roll your own but maybe we can provide some guidance on how to do this and add features to the various tracers to come up with some standard work flow without stepping outside of our scope.

What I currently do on my platform is automatically create a correlation id at the front-end whenever an API call is made by the browser. This correlation id is sent over an HTTP header X-Correlation-Id which is then in my service middleware binary annotated and added to the per request contextual logger.

The benefit is that you can now automatically use this correlation-id which is always available in the front-end to be part of issue reports by customers. With this correlation-id being available in all my logs and traces to troubleshoot and not bound to any specific backend choice I make wrt instrumentation.

Since people might have various ideas on what this correlation-id should look like from both the ke value as well as the payload value we should make this configurable if we decide to add code to tracers for this. Of course we could set sensible defaults for the keys used across the various transports.

The one thing to note is that propagation of the correlation-id is a concern of the middlewares people use. Do we really want to take that on as our concern? What about traces not being sampled and potentially not propagating downwards? I see Zipkin tracing as a best effort concern. It should not be responsible for making sure you can correlate a request with log files for instance.

nicmunroe commented 7 years ago

I guess I'm not seeing the distinction between creating a separate correlation ID at the source where the request starts, or using the trace ID (which is also created at the source where the request starts)?

I'm waffling on whether I think it should be recommended that trace ID should be returned (whatever the name) because I don't want to squash the legitimate use case of a separate correlation ID. But I do think that explicitly saying "you should not return the zipkin trace ID" is limiting and potentially creates a bunch of extra unnecessary hassle for some use cases.

Looking specifically through the wingtips lens: by default wingtips puts the trace ID into the logger MDC so all log messages can be tagged with the trace ID, and it sets (not adds) the trace ID to the HTTP response headers. Currently the header key name used is X-B3-TraceId but that's arbitrary - it could be anything - the important point is that a caller hitting a wingtips service can trivially extract the trace ID from the response headers and use that to directly look up the trace in a zipkin UI, or find all log messages related to that trace, or any combination regardless of the span storage and retrieval mechanisms. No need for an extra correlation step.

I'm not saying that's the best way to do things, but it works really well for us, just like the correlation ID works really well for you. So like I said, waffling.

I'm fine with recommending a different response header name than X-B3-TraceId though.

codefromthecrypt commented 7 years ago

here's a quick summary for next person.

propagation doesn't always use the same mechanism and some even concatenate things into the same header (ex TraceContext does)

Best advice to someone choosing to add the trace ID to the response is to use a stable name, like "zipkin-trace-id" which is decoupled from propagation. This causes the least confusion and ensures what's written back is only the trace id!

openzipkin / openzipkin.github.io

Would a common http response id header be helpful? #48