rnburn / zipkin-cpp-opentracing

OpenTracing Tracer implementation for Zipkin in C++
Apache License 2.0
51 stars 45 forks source link

zipkin hangs on exit #30

Closed nefethael closed 5 years ago

nefethael commented 5 years ago

Hello all,

When I close zipkin tracer, it hangs on writer_.join(); in zipkin_reporter_impl.cc.

I'm not sure about what happens but don't we need to unlock lock_guard in ReporterImpl::makeWriterExit before writecond.notify_all();?

Regards, Vincent

rnburn commented 5 years ago

It would be better to release that lock, but it's not necessary. notify_all doesn't block on the other threads and it will release the lock when makeWriterExit returns.

Notes from the notify_all reference:

The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s); in fact doing so is a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock.

But we should do something like this.

If you can reproduce consistently, can you attach a debugger to the process to see what this recorder thread is doing or add some print statements.

It might be hanging in this transport code that's uploading the span information.

nefethael commented 5 years ago

Sorry for delay, indeed it seems to be a curl problem

#0  0x0000003d0660888d in __GI___pthread_timedjoin_ex (threadid=139711098750720, thread_return=0x0, abstime=0x0, block=<optimized out>) at /usr/src/debug/glibc/2.27-r0/git/nptl/pthread_join_common.c:89
#1  0x0000003d0aeba6c3 in std::thread::join() () from target:/usr/lib/libstdc++.so.6
#2  0x0000003d082201b6 in zipkin::ReporterImpl::~ReporterImpl() () from target:/usr/lib/libzipkin.so.0
#3  0x0000003d082202d9 in zipkin::ReporterImpl::~ReporterImpl() () from target:/usr/lib/libzipkin.so.0
#4  0x0000003d0821d221 in ?? () from target:/usr/lib/libzipkin.so.0
#5  0x0000003d08a29b30 in ?? () from target:/usr/lib/libzipkin_opentracing.so.0
#6  0x000000000044cc75 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x720710) at /add/sdk/2.5.1/sysroots/corei7-64-cube-linux/usr/include/c++/7.3.0/bits/shared_ptr_base.h:154
#7  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x70a4b8, __in_chrg=<optimized out>) at /add/sdk/2.5.1/sysroots/corei7-64-cube-linux/usr/include/c++/7.3.0/bits/shared_ptr_base.h:684
#8  std::__shared_ptr<opentracing::v2::Tracer, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x70a4b0, __in_chrg=<optimized out>)
    at /add/sdk/2.5.1/sysroots/corei7-64-cube-linux/usr/include/c++/7.3.0/bits/shared_ptr_base.h:1123
#9  std::shared_ptr<opentracing::v2::Tracer>::~shared_ptr (this=0x70a4b0, __in_chrg=<optimized out>) at /add/sdk/2.5.1/sysroots/corei7-64-cube-linux/usr/include/c++/7.3.0/bits/shared_ptr.h:93
#0  0x0000003d062e82b9 in __GI___poll (fds=0x7f1106696f30, nfds=1, timeout=1000) at /usr/src/debug/glibc/2.27-r0/git/sysdeps/unix/sysv/linux/poll.c:29
#1  0x0000003d0aa2edf9 in ?? () from target:/usr/lib/libcurl.so.4
#2  0x0000003d0aa292d2 in curl_multi_wait () from target:/usr/lib/libcurl.so.4
#3  0x0000003d0aa241f3 in curl_easy_perform () from target:/usr/lib/libcurl.so.4
#4  0x0000003d08220eb1 in zipkin::ZipkinHttpTransporter::transportSpans(zipkin::SpanBuffer&) () from target:/usr/lib/libzipkin.so.0
#5  0x0000003d08220847 in zipkin::ReporterImpl::writeReports() () from target:/usr/lib/libzipkin.so.0
#6  0x0000003d0aeba46f in ?? () from target:/usr/lib/libstdc++.so.6
#7  0x0000003d06607477 in start_thread (arg=0x7f1106698700) at /usr/src/debug/glibc/2.27-r0/git/nptl/pthread_create.c:463
#8  0x0000003d062f208f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
nefethael commented 5 years ago

@rnburn Do you think we could add CURLOPT_TIMEOUT curl option somewhere?

rnburn commented 5 years ago

Yes, that makes sense. Could you put in a PR for it?

nefethael commented 5 years ago

I'll try to create one tomorrow. Do you think adding timeout param in ZipkinHttpTransporter constructor would be acceptable?