odygrd / quill

Asynchronous Low Latency C++ Logging Library
MIT License
1.54k stars 158 forks source link

core dump in quill:flush() #276

Closed mbhardwa closed 1 year ago

mbhardwa commented 1 year ago

I have a test program which works totally fine with quill v2.7.0 , I cloned on Feb 10 With the latest update from master I get a crash in quill:flush() This is what i see on stdout when I call quill::flush()

terminate called after throwing an instance of 'std::bad_variant_access' what(): Unexpected index Aborted (core dumped)

If I remove quill::flush() its all works fine

Running on Centos7 , gcc 7.3.1, compiled with -DQUILL_NO_EXCEPTIONS -DQUILL_USE_BOUNDED_QUEUE the quill lib and test program are compiled with same exact flags Test program is just 4 threads logging data .

This is the main int main() { quill::Config cfg; cfg.default_queue_capacity = 524288 ; quill::configure(cfg); quill::start(); std::cout<<" quill start 2"<<std::endl; quill::Handler file_handler = quill::rotating_file_handler(base_filename, "w", quill::FilenameAppend::None, 1001024, 5); //quill::Handler file_handler = quill::rotating_file_handler(base_filename, "w", 1001024, 5); file_handler->set_pattern("%(ascii_time) | %(level_name) | %(message)", "%Y-%m-%d %H:%M:%S.%Qus"); std::cout<<" create logger "<<std::endl; quill::Logger logger = quill::create_logger("rotating", file_handler); //quill::Logger logger = quill::get_logger(); logger->set_log_level(quill::LogLevel::TraceL3); LOG_TRACE_L3(logger, "This is a log trace l3 example {}", 1); LOG_TRACE_L2(logger, "This is a log trace l2 example {} {}", 2, 2.3); LOG_TRACE_L1(logger, "This is a log trace l1 {} example", "string"); //runBenchmark(logger); run_threads(logger,4);

std::cout<<" start flush "<<std::endl;

quill::flush();
std::cout<<" end flush "<<std::endl;

}

odygrd commented 1 year ago

hey I wrote you on this but probably missed it https://github.com/odygrd/quill/issues/273

Quill as a library doesn't have to be recompiled anymore to change the queue type in master. Therefore QUILL_USE_BOUNDED_QUEUE was removed from cmake. Previously this was propagating the flag from the library to your targets but it isn't there anymore

I think that is the issue, can you try add this flag to your benchmark in cmake and let me know if it works ? 1) When compiling quill lib you don't need to pass -DQUILL_USE_BOUNDED_QUEUE anymore, it has been removed from cmake options 2) For you specific target e.g. benchmark can you try to add this to your cmake please

target_compile_definitions(your_benchmark_target PUBLIC QUILL_USE_BOUNDED_QUEUE)
odygrd commented 1 year ago

I believe the above should solve your issue.. If you still have any problems please reopen

shuhaowu commented 1 year ago

@odygrd this is not fixed. I have a reproduceable example here: https://github.com/shuhaowu/quill-bad-variant-bug-report. I'm using 3.3.1 to test. In this example, I have the log being emitted from a std::thread, and then I call quill::flush in the main thread prior to exit.

The problem is the line of quill::flush. This method returns a local thread context here. If I call quill::flush for the first time in a new thread (like the main thread), that line for some reason has QUILL_QUEUE_TYPE to be quill::detail::QueueType::UnboundedBlocking. This causes an unbounded queue to be constructed in ThreadContext::_spsc_queue. Later on, in clear_invalid_and_empty_thread_contexts, the code loops through all the thread contexts and it calls ThreadContext::spsc_queue<QUILL_QUEUE_TYPE>, but this time, QUILL_QUEUE_TYPE is set appropriately to quill::detail::QueueType::BoundedNonBlocking, which then leads to the bad variant access.

To reproduce this, you can follow the instructions in the readme of the repo linked above. Then, run the program with:

gdb build/quill_bad_variant

Set 2 break points:

(gdb) b ThreadContext::ThreadContext
(gdb) b quill::flush

Run the program:

(gdb) run

You will encounter ThreadContext::ThreadContext for the first time, which is correct somehow. You can check queue_type

Thread 3 "quill_bad_varia" hit Breakpoint 1, quill::detail::ThreadContext::ThreadContext (this=0x7ffff7fba080, queue_type=quill::detail::QueueType::BoundedNonBlocking, default_queue_capacity=4096, initial_transit_event_buffer_capacity=64, huge_pages=false) at /home/shuhao/projects/quill-bug-report/build/_deps/quill-src/quill/include/quill/detail/ThreadContext.h:37
37    explicit ThreadContext(QueueType queue_type, uint32_t default_queue_capacity,

(gdb) p queue_type
$1 = quill::detail::QueueType::BoundedNonBlocking

Simply continue, as there's nothing to be gained here.

(gdb) c

You'll hit quill::flush, which is fine. Simply continue again.

Thread 1 "quill_bad_varia" hit Breakpoint 2, quill::flush () at /home/shuhao/projects/quill-bug-report/build/_deps/quill-src/quill/src/Quill.cpp:169
169 void flush() { detail::LogManagerSingleton::instance().log_manager().flush(); }
(gdb) c

You'll hit ThreadContext::ThreadContext a second time, which is called from quill::flush. We can check queue_type again, which this time is UnboundedBlocking

Thread 1 "quill_bad_varia" hit Breakpoint 1, quill::detail::ThreadContext::ThreadContext (this=0x7ffff7fb6080, queue_type=quill::detail::QueueType::UnboundedBlocking, default_queue_capacity=4096, initial_transit_event_buffer_capacity=64, huge_pages=false) at /home/shuhao/projects/quill-bug-report/build/_deps/quill-src/quill/include/quill/detail/ThreadContext.h:37
37    explicit ThreadContext(QueueType queue_type, uint32_t default_queue_capacity,
(gdb) p queue_type
$2 = quill::detail::QueueType::UnboundedBlocking

Can verify that it is called from quill::flush:

(gdb) bt
#0  quill::detail::ThreadContext::ThreadContext (this=0x7ffff7fb6080, queue_type=quill::detail::QueueType::UnboundedBlocking, default_queue_capacity=4096, 
    initial_transit_event_buffer_capacity=64, huge_pages=false) at /home/shuhao/projects/quill-bug-report/build/_deps/quill-src/quill/include/quill/detail/ThreadContext.h:37
#1  0x000055555559cdb5 in quill::detail::ThreadContextCollection::ThreadContextWrapper<(quill::detail::QueueType)0>::ThreadContextWrapper (this=0x7ffff7e98760, 
    thread_context_collection=..., default_queue_capacity=4096, initial_transit_event_buffer_capacity=64, huge_pages=false)
    at /home/shuhao/projects/quill-bug-report/build/_deps/quill-src/quill/include/quill/detail/ThreadContextCollection.h:57
#2  0x00005555555b28ac in quill::detail::ThreadContextCollection::local_thread_context<(quill::detail::QueueType)0> (
    this=0x555555629280 <quill::detail::LogManagerSingleton::instance()::instance+384>)
    at /home/shuhao/projects/quill-bug-report/build/_deps/quill-src/quill/include/quill/detail/ThreadContextCollection.h:146
#3  quill::detail::LogManager::flush (this=0x555555629100 <quill::detail::LogManagerSingleton::instance()::instance>)
    at /home/shuhao/projects/quill-bug-report/build/_deps/quill-src/quill/include/quill/detail/LogManager.h:157
#4  0x00005555555b1fcf in quill::flush () at /home/shuhao/projects/quill-bug-report/build/_deps/quill-src/quill/src/Quill.cpp:169
#5  0x000055555555fb56 in main () at /home/shuhao/projects/quill-bug-report/main.cc:33

Continue again, and if you are lucky (which for me is 90%+ of the time), you should pause at the bad variant access throw:

(gdb) c
[...]
terminate called after throwing an instance of 'std::bad_variant_access'
  what():  std::get: wrong index for variant

Thread 2 "Quill_Backend" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff77ff640 (LWP 127683)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737345746496) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.

The backtrace is very large:

(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737345746496) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737345746496) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737345746496, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7842476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff78287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7ca2b9e in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7cae20c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007ffff7cad1e9 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007ffff7cad959 in __gxx_personality_v0 () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007ffff7eb3884 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#10 0x00007ffff7eb3f41 in _Unwind_RaiseException () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#11 0x00007ffff7cae4cb in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#12 0x0000555555560bee in std::__throw_bad_variant_access (__what=0x5555555ea320 "std::get: wrong index for variant") at /usr/include/c++/11/variant:1314
#13 0x0000555555560c25 in std::__throw_bad_variant_access (__valueless=false) at /usr/include/c++/11/variant:1322
#14 0x000055555557105f in std::get<2ul, std::monostate, quill::detail::UnboundedQueue, quill::detail::BoundedQueueImpl<unsigned int> > (
    __v=std::variant<std::monostate, quill::detail::UnboundedQueue, quill::detail::BoundedQueueImpl<unsigned int>> [index 1] = {...}) at /usr/include/c++/11/variant:1681
#15 0x000055555556d0a8 in std::get<quill::detail::BoundedQueueImpl<unsigned int>, std::monostate, quill::detail::UnboundedQueue, quill::detail::BoundedQueueImpl<unsigned int> > (
    __v=std::variant<std::monostate, quill::detail::UnboundedQueue, quill::detail::BoundedQueueImpl<unsigned int>> [index 1] = {...}) at /usr/include/c++/11/variant:1122
#16 0x0000555555562289 in quill::detail::ThreadContext::spsc_queue<(quill::detail::QueueType)4> (this=0x7ffff7fb6080) at /home/shuhao/projects/quill-bug-report/build/_deps/quill-src/quill/include/quill/detail/ThreadContext.h:98
#17 quill::detail::ThreadContextCollection::_find_and_remove_invalidated_thread_contexts()::{lambda(quill::detail::ThreadContext*)#2}::operator()(quill::detail::ThreadContext*) const (__closure=0x7ffff77feaef, 
    thread_context=0x7ffff7fb6080) at /home/shuhao/projects/quill-bug-report/build/_deps/quill-src/quill/include/quill/detail/ThreadContextCollection.h:340
#18 0x0000555555574d43 in __gnu_cxx::__ops::_Iter_pred<quill::detail::ThreadContextCollection::_find_and_remove_invalidated_thread_contexts()::{lambda(quill::detail::ThreadContext*)#2}>::operator()<__gnu_cxx::__normal_iterator<quill::detail::ThreadContext**, std::vector<quill::detail::ThreadContext*, std::allocator<quill::detail::ThreadContext*> > > >(__gnu_cxx::__normal_iterator<quill::detail::ThreadContext**, std::vector<quill::detail::ThreadContext*, std::allocator<quill::detail::ThreadContext*> > >) (this=0x7ffff77feaef, __it=0x7ffff7fb6080) at /usr/include/c++/11/bits/predefined_ops.h:318
#19 0x0000555555571d1b in std::__find_if<__gnu_cxx::__normal_iterator<quill::detail::ThreadContext**, std::vector<quill::detail::ThreadContext*, std::allocator<quill::detail::ThreadContext*> > >, __gnu_cxx::__ops::_Iter_pred<quill::detail::ThreadContextCollection::_find_and_remove_invalidated_thread_contexts()::{lambda(quill::detail::ThreadContext*)#2}> >(__gnu_cxx::__normal_iterator<quill::detail::ThreadContext**, std::vector<quill::detail::ThreadContext*, std::allocator<quill::detail::ThreadContext*> > >, __gnu_cxx::__normal_iterator<quill::detail::ThreadContext**, std::vector<quill::detail::ThreadContext*, std::allocator<quill::detail::ThreadContext*> > >, __gnu_cxx::__ops::_Iter_pred<quill::detail::ThreadContextCollection::_find_and_remove_invalidated_thread_contexts()::{lambda(quill::detail::ThreadContext*)#2}>, std::random_access_iterator_tag) (__first=0x7ffff7fb6080, __last=0x7ffff7fb6080, __pred=...)
    at /usr/include/c++/11/bits/stl_algobase.h:2099
#20 0x000055555556d5c4 in std::__find_if<__gnu_cxx::__normal_iterator<quill::detail::ThreadContext**, std::vector<quill::detail::ThreadContext*, std::allocator<quill::detail::ThreadContext*> > >, __gnu_cxx::__ops::_Iter_pred<quill::detail::ThreadContextCollection::_find_and_remove_invalidated_thread_contexts()::{lambda(quill::detail::ThreadContext*)#2}> >(__gnu_cxx::__normal_iterator<quill::detail::ThreadContext**, std::vector<quill::detail::ThreadContext*, std::allocator<quill::detail::ThreadContext*> > >, __gnu_cxx::__normal_iterator<quill::detail::ThreadContext**, std::vector<quill::detail::ThreadContext*, std::allocator<quill::detail::ThreadContext*> > >, __gnu_cxx::__ops::_Iter_pred<quill::detail::ThreadContextCollection::_find_and_remove_invalidated_thread_contexts()::{lambda(quill::detail::ThreadContext*)#2}>) (__first=0x7ffff7fb6080, __last=0x7ffff7fb6080, __pred=...) at /usr/include/c++/11/bits/stl_algobase.h:2114
#21 0x0000555555568b37 in std::find_if<__gnu_cxx::__normal_iterator<quill::detail::ThreadContext**, std::vector<quill::detail::ThreadContext*, std::allocator<quill::detail::ThreadContext*> > >, quill::detail::ThreadContextCollection::_find_and_remove_invalidated_thread_contexts()::{lambda(quill::detail::ThreadContext*)#2}>(__gnu_cxx::__normal_iterator<quill::detail::ThreadContext**, std::vector<quill::detail::ThreadContext*, std::allocator<quill::detail::ThreadContext*> > >, __gnu_cxx::__normal_iterator<quill::detail::ThreadContext**, std::vector<quill::detail::ThreadContext*, std::allocator<quill::detail::ThreadContext*> > >, quill::detail::ThreadContextCollection::_find_and_remove_invalidated_thread_contexts()::{lambda(quill::detail::ThreadContext*)#2}) (__first=0x7ffff7fb6080, __last=0x7ffff7fb6080, __pred=...) at /usr/include/c++/11/bits/stl_algo.h:3910
#22 0x000055555556237d in quill::detail::ThreadContextCollection::_find_and_remove_invalidated_thread_contexts (this=0x555555629280 <quill::detail::LogManagerSingleton::instance()::instance+384>)
    at /home/shuhao/projects/quill-bug-report/build/_deps/quill-src/quill/include/quill/detail/ThreadContextCollection.h:333
#23 0x0000555555561e93 in quill::detail::ThreadContextCollection::clear_invalid_and_empty_thread_contexts (this=0x555555629280 <quill::detail::LogManagerSingleton::instance()::instance+384>)
    at /home/shuhao/projects/quill-bug-report/build/_deps/quill-src/quill/include/quill/detail/ThreadContextCollection.h:202
#24 0x0000555555566539 in quill::detail::BackendWorker::_main_loop (this=0x555555629500 <quill::detail::LogManagerSingleton::instance()::instance+1024>)
    at /home/shuhao/projects/quill-bug-report/build/_deps/quill-src/quill/include/quill/detail/backend/BackendWorker.h:1068
#25 0x000055555556294f in quill::detail::BackendWorker::run()::{lambda()#1}::operator()() const (__closure=0x55555563e058)
    at /home/shuhao/projects/quill-bug-report/build/_deps/quill-src/quill/include/quill/detail/backend/BackendWorker.h:313
#26 0x000055555557aa66 in std::__invoke_impl<void, quill::detail::BackendWorker::run()::{lambda()#1}>(std::__invoke_other, quill::detail::BackendWorker::run()::{lambda()#1}&&) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#27 0x000055555557a9d2 in std::__invoke<quill::detail::BackendWorker::run()::{lambda()#1}>(quill::detail::BackendWorker::run()::{lambda()#1}&&) (__fn=...) at /usr/include/c++/11/bits/invoke.h:96
#28 0x000055555557a942 in std::thread::_Invoker<std::tuple<quill::detail::BackendWorker::run()::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0x55555563e058) at /usr/include/c++/11/bits/std_thread.h:259
#29 0x000055555557a8de in std::thread::_Invoker<std::tuple<quill::detail::BackendWorker::run()::{lambda()#1}> >::operator()() (this=0x55555563e058) at /usr/include/c++/11/bits/std_thread.h:266
#30 0x000055555557a61e in std::thread::_State_impl<std::thread::_Invoker<std::tuple<quill::detail::BackendWorker::run()::{lambda()#1}> > >::_M_run() (this=0x55555563e050) at /usr/include/c++/11/bits/std_thread.h:211
#31 0x00007ffff7cdc253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#32 0x00007ffff7894b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#33 0x00007ffff7926a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

The function is at #16, where we can check for the identity of ThreadContext and _spsc_queue

(gdb) up 16
#16 0x0000555555562289 in quill::detail::ThreadContext::spsc_queue<(quill::detail::QueueType)4> (this=0x7ffff7fb6080) at /home/shuhao/projects/quill-bug-report/build/_deps/quill-src/quill/include/quill/detail/ThreadContext.h:98
98        return std::get<BoundedQueue>(_spsc_queue);
(gdb) p _spsc_queue
$3 = std::variant<std::monostate, quill::detail::UnboundedQueue, quill::detail::BoundedQueueImpl<unsigned int>> [index 1] = {{_producer = 0x555555641d00, _consumer = 0x555555641d00, _huge_pages = false}}

We noted that ThreadContext is at 0x7ffff7fb6080, which matches the ThreadContext from the earlier breakpoint where the UnboundedQueue is constructed. This explains the bad variant access.

I suspect the problem is some sort of header include problem where the value of QUILL_QUEUE_TYPE is messed up. Haven't debugged that far tho, as I don't know enough of the code base.

odygrd commented 1 year ago

Thanks a lot for reporting this and the detailed report. This happened because the function definition was in the .cpp file and it was using the default queue_type that quill was originally compiled with. The pull request should fix this