kim / opentracing

OpenTracing (https://opentracing.io) for Haskell
Apache License 2.0
40 stars 12 forks source link

Jaeger Support #7

Closed kim closed 6 years ago

mjrussell commented 6 years ago

@kim just wanted to say thanks for all your effort so far! I've been playing with using this library with jaeger and even in it it's experimental state its been so far so good (one exception being jaeger seems crash when I throw a lot of traffic at it, could be just the all-in-one container running out of mem). I had to make a tweak to pass in the Process/Service name but it looks like you've got a solution for that. Keep up the good work and let me know if you need any help testing/bouncing ideas back and forth. Your changes in #5 really cleaned stuff up!

kim commented 6 years ago

Hey @mjrussell, thanks for the thanks :) Good to hear people find this useful.

If you have ideas around automating integration tests with the various backends, or have some working setup already, please let me know! A lot of details are encoded not in the spec, but in the implementations, so having something in place will be a prerequisite for an actual release.

mjrussell commented 6 years ago

@kim I downloaded the Jaeger all in one image and was using this library's master with a minor tweak. But I was getting traces to appear in their UI which was awesome. One thing I did notice was if I sent enough UDP messages at it, I'd wind up with the following crash message:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x8d88c6]

goroutine 36 [running]:
github.com/jaegertracing/jaeger/cmd/agent/app/reporter/tchannel.(*Reporter).EmitBatch(0xc420176c40, 0x0, 0x0, 0x0)
    /home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/reporter/tchannel/reporter.go:119 +0x86
github.com/jaegertracing/jaeger/thrift-gen/jaeger.(*agentProcessorEmitBatch).Process(0xc420224a30, 0xc400000000, 0xfe3ec0, 0xc420342780, 0xfe3ec0, 0xc420342780, 0xc420256200, 0x89fa18, 0xc420224a70)
    /home/travis/gopath/src/github.com/jaegertracing/jaeger/thrift-gen/jaeger/agent.go:137 +0xc9
github.com/jaegertracing/jaeger/thrift-gen/jaeger.(*AgentProcessor).Process(0xc42023c360, 0xfe3ec0, 0xc420342780, 0xfe3ec0, 0xc420342780, 0x0, 0x0, 0x0)
    /home/travis/gopath/src/github.com/jaegertracing/jaeger/thrift-gen/jaeger/agent.go:111 +0x302
github.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).processBuffer(0xc4202265a0)
    /home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:110 +0x179
created by github.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).Serve
    /home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:82 +0x62

Not yet sure if thats just because the all in one ran out of memory (it stores traces in-memory) or there is an issue with the messages sent by this library. I was gong to setup their recommended production setup and see if the problem persisted. Happy to take a swing at this PR though and add it into my project and see if its working.

I'd also like to hear more about your plan for propagation (can discuss in a separate issue). I say the new types but Im not yet sure how they fit it and how to potentially make a custom propagator (we use websockets instead of HTTP so headers won't work but I can pass along ids inside the messages)

kim commented 6 years ago

the following crash message

Aww, this looks bad - like the jaeger agent deserializes an incomplete thrift message off the wire, and then segfaults when trying to access its fields. Not sure how this can happen. Would you mind sharing some code which demonstrates the issue?

plan for propagation

Well, I think most other language implementations just go through TextMap for any kind of not-HTTP propagation. For WebSockets, it would be rather application-dependent as it depends on the message format. A simple solution would be to encode the TextMap as JSON (which you get for free as it's just a HashMap Text Text) and embed that in your message.

The spec also demands an opaque "binary" carrier format, which I've held off supporting until now as it isn't clear what the benefits are when the structure is not specified -- apart from a potential de/serialisation speedup compared to going through a TextMap, at the expense of potential runtime failure because of incompatible de/serialisers at either end. Since it can also not be defined generically, I'm not sure how to encode it -- the only choice I see is that some concrete Prism' ByteString SpanContext would have to be supplied by the user when constructing Propagations.

Note, however, that you're not obliged to use traceInject/traceExtract from the library in your instrumentation code -- you just need to get hold of a SpanContext by whatever means suits your use case.

mjrussell commented 6 years ago

Aww, this looks bad - like the jaeger agent deserializes an incomplete thrift message off the wire, and then segfaults when trying to access its fields. Not sure how this can happen. Would you mind sharing some code which demonstrates the issue?

Yeah let me try to setup a minimal example to work through. Strange that the auto generated Thrift message would have an issue. Also that it only happens when we send many messages. I wonder if its related to how the ThriftReporter packs up the messages and flushes them over UDP.

Thanks for the notes on propagation, I think I need to dive into that a bit more and will let you know if I run into any issues threading it along my set of services.

kim commented 6 years ago

@mjrussell I was able to reproduce your issue by running some traced actions concurrently.

3c00cd4 fixes is by using a Handle as the thrift Transport, which provides safe concurrent access. Note, however, that this only works as long as we leave the Handle in unbuffered mode (ie. flush after each and every Span being emitted). This is most likely not the fastest solution, but I'll leave a more elaborate implementation to when someone actually needs it.

mjrussell commented 6 years ago

Sorry I don't know how I missed the response to this but thanks so much for finding the issue and fixing it! We've been focused on a few other tasks lately but we definitely still plan to use your library in our production apps with Jaeger. Appreciate all the hard work and will send any issues/feedback/PRs as we find them.