lightstep / lightstep-tracer-javascript

Lightstep distributed tracing library for Node.js and the browser
https://lightstep.com
MIT License

Only half of the traceid is used when using the LightstepPropagator. #258

Closed kirbysayshi closed 4 years ago

kirbysayshi commented 4 years ago

Not sure if anyone is still using this library, but I believe I found a bug (unsure of the correct behavior).

If a traceId is 32 hex characters (128 bits), the SpanContext splits it into upper and lower halves: https://github.com/lightstep/lightstep-tracer-javascript/blob/67019d6773f26dc97495e418818ffe8b08e702b2/src/imp/span_context_imp.js#L51-L54

But when the X-Cloud-Trace-Context header is created for propagation, only the lower section is used since only the private _traceGUID is accessed: https://github.com/lightstep/lightstep-tracer-javascript/blob/67019d6773f26dc97495e418818ffe8b08e702b2/src/imp/propagator_ls.js#L25

A fix would be to instead use the traceGUID function: https://github.com/lightstep/lightstep-tracer-javascript/blob/67019d6773f26dc97495e418818ffe8b08e702b2/src/imp/span_context_imp.js#L31-L33

As it is today, this results in headers like:

X-Cloud-Trace-Context 5d22122a52f8d5400000000000000000/7676538251468309227;o=1

The zeros are a result of only using the bottom "bits" of the split traceId.
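To make the symptom concrete, here is a minimal sketch of the split described above. The names `splitTraceId` and `SplitTraceId` and the sample ID are hypothetical and are not taken from the library; the sketch only assumes that a 32-character ID is cut into two 16-character halves.

```typescript
// Hypothetical model of the upper/lower split; not the library's actual code.
interface SplitTraceId {
  upper: string; // most-significant 64 bits as 16 hex chars
  lower: string; // least-significant 64 bits as 16 hex chars
}

function splitTraceId(traceId: string): SplitTraceId {
  if (!/^[0-9a-f]{32}$/i.test(traceId)) {
    throw new Error(`expected 32 hex characters, got "${traceId}"`);
  }
  return { upper: traceId.slice(0, 16), lower: traceId.slice(16) };
}

const sample = "0123456789abcdef5d22122a52f8d540"; // made-up 128-bit ID
const { upper, lower } = splitTraceId(sample);

const full = upper + lower; // what a "use both halves" change would emit
const lowerOnly = lower;    // what reading only the lower half yields

console.log(full);      // 0123456789abcdef5d22122a52f8d540
console.log(lowerOnly); // 5d22122a52f8d540
```

Anything that reads only the lower half loses the first 16 characters, regardless of how the result is padded afterwards.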

Is there a specific reason why the Lightstep propagator does not use both parts of the trace id? I'm unsure if there is more implicit information there, such as a parent trace, that I don't know about. I did notice that the dd propagator encodes more information in there:

https://github.com/lightstep/lightstep-tracer-javascript/blob/67019d6773f26dc97495e418818ffe8b08e702b2/src/imp/propagator_dd.js#L24-L29

While the B3 propagator also skips some:

https://github.com/lightstep/lightstep-tracer-javascript/blob/67019d6773f26dc97495e418818ffe8b08e702b2/src/imp/propagator_b3.js#L21-L24

OpenCensus' Stackdriver implementation appears to use all the bits:

https://github.com/census-instrumentation/opencensus-node/blob/ef5712fd3b279b0e80494322231232047b06f9e6/packages/opencensus-propagation-stackdriver/src/stackdriver-format.ts#L80-L87

Is this a bug, or is it expected / spec'ed behavior?

kayousterhout commented 4 years ago

This is the expected behavior. The underlying reason is that Lightstep only accepts 64-bit trace IDs (16 hex characters), but context propagation formats typically allow 128-bit trace IDs (32 hex characters). We truncate those trace IDs (by taking the least-significant 64 bits / 8 bytes / 16 hex chars) before sending them to a satellite. Typically, propagators send the full 128-bit trace ID and the truncation happens later, right before we send data to a satellite. For the Lightstep context propagation format, however, we assume the data is eventually going to a Lightstep satellite, so we do the truncation earlier, as part of context propagation.
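As a rough illustration of that truncation (a sketch, not the tracer's actual code; `truncateTraceId` is a hypothetical name), keeping the least-significant 64 bits of a hex trace ID amounts to keeping its last 16 characters:

```typescript
// Hypothetical helper: keep the least-significant 64 bits of a hex trace ID.
function truncateTraceId(traceIdHex: string): string {
  if (!/^[0-9a-f]{1,32}$/i.test(traceIdHex)) {
    throw new Error(`not a hex trace ID of at most 128 bits: "${traceIdHex}"`);
  }
  // Left-pad first so short IDs keep their numeric value, then take the low 16 chars.
  return traceIdHex.padStart(16, "0").slice(-16).toLowerCase();
}

console.log(truncateTraceId("0123456789abcdef5d22122a52f8d540")); // 5d22122a52f8d540
console.log(truncateTraceId("5d22122a52f8d540"));                 // 5d22122a52f8d540
console.log(truncateTraceId("abc"));                              // 0000000000000abc
```

Two 128-bit IDs that differ only in their upper half map to the same 64-bit ID, so the upper half cannot be recovered once the truncation has happened.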

The B3 spec dictates that it will propagate what it receives, so if it receives just 16 hex chars, it will propagate those 16 (that's the behavior in the code snippet you linked); otherwise it will propagate the full 128-bit (32 hex chars) trace ID.

I'm not sure whether there is something wrong in the X-Cloud-Trace-Context header you posted. That looks like a GCP-specific header format, which I don't think could have been emitted by the code here (but perhaps it was emitted by a connected service?).

Let me know if this all makes sense or if there seem to be remaining bugs here! I'll be out next week but I added @andrewhsu here to help with any further issues.

kirbysayshi commented 4 years ago

@kayousterhout Thank you very much for the clear explanation! I'm sorry I didn't know that Lightstep only used 64-bit trace IDs. Not knowing that was very confusing when comparing this implementation to the new OpenTelemetry libraries! I'm glad I know now.

You are also correct that the X-Cloud-Trace-Context header is not in the code in this repo; again, my apologies. It's from an internal extension of the Tracer from this repo that does something like this to construct the header:

public inject(
  spanContext: any,
  format: string,
  carrier: LightstepCarrier,
): void {
  // Let the base tracer populate the carrier with the ot-tracer-* entries first.
  LightstepTracer.prototype.inject.call(this, spanContext, format, carrier);
  const traceGuid = carrier['ot-tracer-traceid'];
  const spanGuid = carrier['ot-tracer-spanid'];
  // Zero-pad the trace ID on the right to the 32 hex characters the header expects.
  const traceGuidForHeader: string = traceGuid.padEnd(32, '0');
  // The header carries the span ID as a decimal string rather than hex.
  const spanGuidForHeader: string = hexToIntString(spanGuid);
  const traceValue = `${traceGuidForHeader}/${spanGuidForHeader};o=1`;
  carrier['X-Cloud-Trace-Context'] = traceValue;
}

Since it's using the values placed into carrier by LightstepTracer.prototype.inject, I assumed showing the header was a good example to illustrate the most significant bits being dropped (since it's basically just using the values from this library), but forgot that the header was GCP/OpenCensus. Sorry for the confusion.
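For completeness, `hexToIntString` is an internal helper that is not shown above. The following is only a sketch of what such a helper might look like, assuming it converts a 64-bit hex span ID into the decimal string that X-Cloud-Trace-Context uses for the span ID. The name `hexToDecimalString` is hypothetical. `BigInt` is used because a 64-bit value can exceed the range a JavaScript number represents exactly.

```typescript
// Hypothetical stand-in for the internal hexToIntString helper.
function hexToDecimalString(hex: string): string {
  if (!/^[0-9a-f]{1,16}$/i.test(hex)) {
    throw new Error(`expected up to 16 hex characters, got "${hex}"`);
  }
  // BigInt parses the "0x"-prefixed string without losing precision.
  return BigInt("0x" + hex).toString(10);
}

console.log(hexToDecimalString("ff"));               // 255
console.log(hexToDecimalString("0000000000000010")); // 16
console.log(hexToDecimalString("ffffffffffffffff")); // 18446744073709551615
```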