opentracing / lua-bridge-tracer

Provides an implementation of the Lua OpenTracing API on top of the C++ API
Apache License 2.0
14 stars 13 forks source link

`extract` returning context object, but new trace is being created #8

Closed dbousque closed 5 years ago

dbousque commented 5 years ago

Hi there !

Thanks a lot for the project. I'm planning on using it, things work well, but not for the extract part. I am receiving HTTP headers of this kind { "uber-trace-id" = "<some-trace-context>" }. Both tracer:text_map_extract and tracer:http_headers_extract should work well in this case, but none do.

These functions return LuaSpanContext objects. By modifying LuaTracer::extract in the CPP source, I was able to confirm that we do finish the function execution with :

*userdata = new LuaSpanContext{std::move(span_context)};
luaL_getmetatable(L, LuaSpanContext::description.metatable);
lua_setmetatable(L, -2);
return 1;

But trying to use the resulting span context in tracer:start_span("op name", {["references"] = {{"child_of", span_context}}}) will not create the reference correctly, even though doing the same with a span context coming from parent_span:context() works correctly.

I am quite confused, I have been trying to find the source of the issue for the past few hours without success. Looks like it might be some kind of mishaps in the communication between this project and the tracer implementation plugin (I'm using jaeger).

Do you have an idea where it could come from ?

I tried with the following setups :

Failing example :

local headers = {}
headers["uber-trace-id"] = "<some-trace-context>"
local parent_span_context = tracer:text_map_extract(headers)
local span = tracer:start_span("op name",  {["references"] = {{"child_of", parent_span_context}}})
local child_span = tracer:start_span("op name 2",  {["references"] = {{"child_of", span:context()}}})
child_span.finish()
span.finish()

Both "op name" and "op name 2" would be reported, with "op name 2" as the child, but the parent span would be lost and no link would be made with the source parent span coming from the headers.

rnburn commented 5 years ago

That's probably not the full context. Try injecting a span context then looking at all the key values. If you don't give the full context, the extract will fail.

dbousque commented 5 years ago

I did so, text_map_inject and http_headers_inject are setting uber-trace-id only (which works well). I am doing this in a web environment. In our infrastructure, we have nginx -> kong -> webservice. I am using your project to add jaeger to our kong server. In kong, if I just forward the uber-trace-id header value (and only this value, no other headers) coming from nginx to the target webservice, the webservice is able to pick up nginx's span context and add children spans.

So the issue arises when extracting the tracing context from the headers. In the "failing example" above, the trace id of the context is not picked up, and the span:context() is referencing a new trace, created for the occasion : I can see it on jaeger's UI :)

I can try to add a failing test using jaeger to the test suite if you wish, something like that (uber-trace-id : https://www.jaegertracing.io/docs/1.7/client-libraries/#value) :

local context = { ["uber-trace-id"] = "1111111111111111:2222222222222222:3333333333333333:1" }
local span_context = tracer:text_map_extract(context)
local carrier = {}
tracer:text_map_inject(span_context, carrier)
assert.are.equal(context["uber-trace-id"], carrier["uber-trace-id"])
rnburn commented 5 years ago

Could you try the change in https://github.com/opentracing/lua-bridge-tracer/pull/9?

dbousque commented 5 years ago

It worked well, thanks a lot @rnburn :) Would be really nice if you could publish a release with the fix.

dbousque commented 5 years ago

Hi @rnburn,

Actually, it was a mistake on my part : I thought I was testing your fix, but in fact I tested the master branch of your fork, which dates back to before this fix and the previous one (context-fix). The latest release of this repo still contains the issue, as well as the fix-inject branch on your fork (which are the same).

So it looks like the issue was actually introduced by context-fix. If I checkout to the 696c6f34acc25d5caf6d0b6eac33b978f04b0909 commit, the issue doesn't appear.