open-telemetry / opentelemetry-js

OpenTelemetry JavaScript Client
https://opentelemetry.io
Apache License 2.0
2.77k stars 812 forks source link

OpenTracing shim doesn't change context #2016

Open morigs opened 3 years ago

morigs commented 3 years ago

OpenTelemetry: v0.17.0 Node: v15.5.0

I'm not sure if I'm doing something wrong/ We have the following situation: The application initializes OTel using node-sdk and OpenTracing shim in standard way like in examples. It also adds http instrumentation. The application passes OpenTracing shim to the underlying libraries (actually app plugins) which create spans using opentracing API. One of the libraries performs HTTP request and creates span for this. OpenTelemetry HTTP instrumentation span actually belongs to the root span instead of span create by the library. So trace has the following structure:

- HTTP root span (created by instrumentation, also active context is set here)
`--- Application span (created using OT API, creates new context and passes it below using childOf)
      `--- Library span with HTTP request (create using OT API, uses context from childOf)
`--- HTTP outgoing span (created by instrumentation, uses global active span which is root span)

But I expected to see the following:

- HTTP root span (created by instrumentation)
`--- Application span (created using OT API)
      `--- Library span with HTTP request (create using OT API)
            `--- HTTP outgoing span (created by instrumentation)

Usually user code creates new context for every level of nesting using api.context.with but this is impossible inside libraries which using OpenTracing API. This actually means that OpenTracing Shim is incompatible with instrumentations / plugins I think it's impossible to fix but at least this should be mentioned in the docs

dyladan commented 3 years ago

Hmm shouldn't those libraries be using tracer.withSpan though? I feel like we should be able to shim that to work properly

morigs commented 3 years ago

Do you mean withSpan from OpenTracing? I don't know about such method. And I wasn't able to find any information about it. We can change that libraries but they can use only opentracing API.

Although I believe it should be mentioned in the docs that OpenTracing shim doesn't support active context changing and instrumentations that use it

dyladan commented 3 years ago

appears i was misremembering. We should be able to respect the childOf option in the shim though

https://opentracing-javascript.surge.sh/interfaces/spanoptions.html#childof

dyladan commented 3 years ago

Obviously there is a context loss somewhere but a look over the code doesn't show anything obvious to me. Can you provide a minimal reproduction?

Flarna commented 3 years ago

If I understand this correct OpenTracing will never set a span on context but OpenTelemetry picks up parents only from context. Looks obvious that OpenTelemetry instrumentations will never use a span from OpenTracing as parent.

morigs commented 3 years ago

Yeah, I'll do an example later (on weekends I hope). But for now the context is lost because the only way to set is to call api.context.with. OpenTracing shim just passes existing context down using the third parameter of startSpan and doesn't change global active context which is stored in AsyncLocalStorage. May be if there was another way to change active context it would be possible to shim that. But I think AsyncLocalStorage API won't allow that

dyladan commented 3 years ago

If I understand this correct OpenTracing will never set a span on context but OpenTelemetry picks up parents only from context. Looks obvious that OpenTelemetry instrumentations will never use a span from OpenTracing as parent.

opentracing API passes parents around explicitly. they don't need to pick anything up from context because they're provided directly to startSpan so we should be able to match them up

dyladan commented 3 years ago

Yeah, I'll do an example later (on weekends I hope). But for now the context is lost because the only way to set is to call api.context.with. OpenTracing shim just passes existing context down using the third parameter of startSpan and doesn't change global active context which is stored in AsyncLocalStorage. May be if there was another way to change active context it would be possible to shim that. But I think AsyncLocalStorage API won't allow that

Oh I see where the disconnect is. I'm not an expert in AsyncLocalStorage but at least our context manager doesn't have an API for this. We could add it if it does turn out to be possible but I think users are discouraged from modifying the current active context without creating a new context layer.

At the least you are correct this should be documented.

Flarna commented 3 years ago

opentracing API passes parents around explicitly. they don't need to pick anything up from context because they're provided directly to startSpan so we should be able to match them up

That works as long as you stay in the OpenTracing world. But if there is an (autoinstrumented) OpenTelemetry instrumenation somewhere on the way noone will pass the OpenTracing span to it - instead it wil pick up that one from active context.

I agree with @morigs that this is not really fixable because the concept of automatic context propagation doesn't exist in OpenTracing. A user could fix the gap only by calling some OpenTelemetry API - so by leaving the OpenTracing world.

I think it should work to use a span created by OTraceShim in api.context.with(ShimSpan, () => {}).

But if you can modify the code where OpenTracing is used it's also possible to use OTel directly at all places.

Flarna commented 3 years ago

But I think AsyncLocalStorage API won't allow that

We should avoid to use any APIs from a concrete context manager to ensure this stays portable to e.g. browser and older nodejs versions.

morigs commented 3 years ago

@Flarna

But if you can modify the code where OpenTracing is used it's also possible to use OTel directly at all places.

This a big But 😄

This is impossible in our case. So I think we won't use any instrumentations for now. This is sad situation of course

morigs commented 3 years ago

In case somebody is interested in diving in here's the minimal reproduction. It requires opentelemetry-js@0.18.0 and Jaeger at localhost:6832

Code Spoiler ```javascript 'use strict'; const { HttpTraceContext, ParentBasedSampler, AlwaysOnSampler } = require('@opentelemetry/core'); const opentelemetry = require("@opentelemetry/sdk-node"); const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http'); const { JaegerExporter } = require('@opentelemetry/exporter-jaeger'); const { TracerShim } = require('@opentelemetry/shim-opentracing'); const opentracing = require('opentracing'); const jaegerExporter = new JaegerExporter({ serviceName: 'http_server_opentracing', flushInterval: 100 }); const httpInstrumentation = new HttpInstrumentation(); opentelemetry.api.diag.setLogger(new opentelemetry.api.DiagConsoleLogger(), 80); const sdk = new opentelemetry.NodeSDK({ traceExporter: jaegerExporter, textMapPropagator: new HttpTraceContext(), instrumentations: [httpInstrumentation], autoDetectResources: true, sampler: new ParentBasedSampler({ root: new AlwaysOnSampler() }), }); sdk.start().then(() => startServer(process.env.PORT || 4000)); const tracer = new TracerShim(opentelemetry.api.trace.getTracer('opentracing-shim')); const http = require('http'); const superagent = require('superagent'); function startServer(port) { const server = http.createServer(handleRequest); server.listen(port, (err) => { if (err) throw err; console.log(`Server is listening on ${port}`); }); } async function handleRequest(req, res) { const parentSpan = tracer.extract( opentracing.FORMAT_HTTP_HEADERS, req.headers, ); const span = tracer.startSpan('handle_request', { childOf: parentSpan, }); await doSomething(span); res.writeHead(200, { 'Content-Type': 'application/json' }); res.write( JSON.stringify({ status: 'OK', traceId: span.context().toTraceId() }), ); res.end(); span.finish(); } async function doSomething(parentSpan) { const span = tracer.startSpan('do_something', { childOf: parentSpan }); const res = await doRequest(span); span.log({ res: res.body }); span.finish(); } async function doRequest(parentSpan) { const span = tracer.startSpan('do_request', { childOf: parentSpan }); const res = await superagent('https://httpbin.org/json'); span.finish(); return res; } ```

When running this code I'm getting the following result image

As you can see HTTPS GET is located under HTTP GET, not under do_request

obecny commented 3 years ago

if this is related to context propagation for outgoing request then it is the same error I have described here https://github.com/open-telemetry/opentelemetry-js/issues/1535 which describes error when clients send request to server and server responds back to client, then you have also 2 separate traces

morigs commented 3 years ago

@obecny not enough information to be sure, but I have experienced very similar behavior with this issue. Thus I think it's related