jaegertracing / jaeger-client-go

🛑 This library is DEPRECATED!
https://jaegertracing.io/
Apache License 2.0
1.39k stars 288 forks source link

SIGSEGV in jaeger-client-go/reporter.go #155

Closed yqf3139 closed 7 years ago

yqf3139 commented 7 years ago

I was try to migrate from the zipkin backend to jaeger, so everything I need is to swap the tracer.

To init the tracer:

// Sample configuration for testing. Use constant sampling to sample every trace
// and enable LogSpan to log every span via configured Logger.
config := jaegercfg.Configuration{
    Sampler: &jaegercfg.SamplerConfig{
        Type:  jaeger.SamplerTypeConst,
        Param: 1,
    },
    Reporter: &jaegercfg.ReporterConfig{
    LogSpans: true,
        LocalAgentHostPort: "jaeger-all-in-one-agent.fission:5775",
    },
}
// Example logger and metrics factory. Use github.com/uber/jaeger-client-go/log
// and github.com/uber/jaeger-lib/metrics respectively to bind to real logging and metrics
// frameworks.
jLogger := jaegerlog.StdLogger
jMetricsFactory := metrics.NullFactory
// Initialize tracer with a logger and a metrics factory
closer, err := config.InitGlobalTracer(
    "router",
    jaegercfg.Logger(jLogger),
    jaegercfg.Metrics(jMetricsFactory),
)
if err != nil {
    log.Printf("Could not initialize jaeger tracer: %s", err.Error())
    return
}
defer closer.Close()

And then error occured:

2017-05-31T04:36:39.214604075Z panic: runtime error: invalid memory address or nil pointer dereference
2017-05-31T04:36:39.214734078Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x80492cc]
2017-05-31T04:36:39.214753293Z 
2017-05-31T04:36:39.214814925Z goroutine 16 [running]:
2017-05-31T04:36:39.214944289Z sync/atomic.AddUint64(0x19f3a424, 0xffffffff, 0xffffffff, 0x19b1c700, 0x1)
2017-05-31T04:36:39.215025990Z  /home/yqf/bin/go/src/sync/atomic/asm_386.s:112 +0xc
2017-05-31T04:36:39.215083702Z github.com/fission/fission/vendor/github.com/uber/jaeger-client-go.(*remoteReporter).processQueue(0x19f3a400)
2017-05-31T04:36:39.215193739Z  /home/yqf/dev/mygo/src/github.com/fission/fission/vendor/github.com/uber/jaeger-client-go/reporter.go:234 +0x107
2017-05-31T04:36:39.215252715Z created by github.com/fission/fission/vendor/github.com/uber/jaeger-client-go.NewRemoteReporter
2017-05-31T04:36:39.215361183Z  /home/yqf/dev/mygo/src/github.com/fission/fission/vendor/github.com/uber/jaeger-client-go/reporter.go:200 

When the error occured, r.queueLength is 0. Appreciate any helps.

black-adder commented 7 years ago

Looking at the stack trace you provided, I found this ticket: https://github.com/golang/go/issues/13868

I need to investigate if it's related but it points to the same line of code that the stack trace referenced.

yurishkuro commented 7 years ago

We need to move queueLength to the top of the remoteReporter struct, and leave a comment explaining why, with a link to that Go issue.

yqf3139 commented 7 years ago

Thanks.

I ran the jaeger image on a GCP machine which is 64 bit. Still cannot fully understand where the problem is if it is running on a 64 bit machine.

black-adder commented 7 years ago

@yqf3139 before I put in the fix for this, do you mind applying the change locally and running your service again? I can't reproduce this issue on my end.

In jaeger-client-go/reporter.go change

type remoteReporter struct {
    reporterOptions
    sender       transport.Transport
    queue        chan *zipkincore.Span
    queueLength  int64 // signed because metric's gauge is signed
    queueDrained sync.WaitGroup
    flushSignal  chan *sync.WaitGroup
}

to

type remoteReporter struct {
        queueLength  int64 // signed because metric's gauge is signed
    reporterOptions
    sender       transport.Transport
    queue        chan *zipkincore.Span
    queueDrained sync.WaitGroup
    flushSignal  chan *sync.WaitGroup
}
yqf3139 commented 7 years ago

@black-adder I will have a try.

yqf3139 commented 7 years ago

@black-adder Thanks, problem solved by making the change. We use GOARCH=386 to build, so the problem is caused by the go issue.

tomersimis-zz commented 7 years ago

Had the same problem and reordering the remoteReporter fields did work.