Closed htfy96 closed 7 years ago
The two bugs are probably caused by data race in eventloop::handle_event: Event e { data->fd }
and eventloop:modify_event
which changes data->fd
in another thread. Current workaround is to force handle_event
to acquire map_mutex
before reading data->fd
https://github.com/htfy96/c10k-server/commit/7ce2d49d912b1af59b4d5c1bd439abe6eaaf4f47
Still broken in https://travis-ci.org/htfy96/c10k-server/jobs/195955076 7ce2d49.
1x SEGV 2x unhandled exception read when socket is closed
No DR was reported
With logbt
we acquired backtrace of SEGV:
https://travis-ci.org/htfy96/c10k-server/jobs/196185054
worker_thread-test: ../nptl/pthread_mutex_lock.c:81: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
Core was generated by `/data/build/worker_thread-test'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
58 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7ff0ced02700 (LWP 48))]
Thread 4 (Thread 0x7ff0cfd04700 (LWP 46)):
#0 0x00007ff0d0105ced in write () at ../sysdeps/unix/syscall-template.S:84
#1 0x00007ff0d0088927 in _IO_new_file_write (f=0x7ff0d03d0600 <_IO_2_1_stdout_>, data=0x891af0, n=707) at fileops.c:1271
#2 0x00007ff0d008a492 in new_do_write (to_do=707, data=0x891af0 "\n\n", '~' <repeats 158 times>, "\n\nworker_thread-test is a Catch v1.6.1 h"..., fp=0x7ff0d03d0600 <_IO_2_1_stdout_>) at fileops.c:526
#3 _IO_new_do_write (fp=0x7ff0d03d0600 <_IO_2_1_stdout_>, data=0x891af0 "\n\n", '~' <repeats 158 times>, "\n\nworker_thread-test is a Catch v1.6.1 h"..., to_do=707) at fileops.c:502
#4 0x00007ff0d0087ec8 in _IO_new_file_sync (fp=0x7ff0d03d0600 <_IO_2_1_stdout_>) at fileops.c:882
#5 0x00007ff0d007ca1c in __GI__IO_fflush (fp=0x7ff0d03d0600 <_IO_2_1_stdout_>) at iofflush.c:40
#6 0x00007ff0d06fd98e in std::ostream::flush() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x0000000000467d14 in Catch::ConsoleReporter::printTestCaseAndSectionHeader() ()
#8 0x0000000000467efc in Catch::ConsoleReporter::lazyPrint() ()
#9 0x00000000004683dc in Catch::ConsoleReporter::assertionEnded(Catch::AssertionStats const&) ()
#10 0x000000000045f881 in Catch::RunContext::assertionEnded(Catch::AssertionResult const&) ()
#11 0x0000000000448f6d in Catch::ResultBuilder::handleResult(Catch::AssertionResult const&) ()
#12 0x000000000044a292 in Catch::ResultBuilder::captureExpression() ()
#13 0x000000000046b9c1 in Catch::RunContext::handleFatalErrorCondition(std::string const&) ()
#14 0x00000000004579e5 in Catch::fatal(std::string const&, int) ()
#15 0x0000000000457a8c in Catch::FatalConditionHandler::handleSignal(int) ()
#16 <signal handler called>
#17 0x00007ff0c8000078 in ?? ()
#18 0x000000000047dc35 in c10k::EventLoop::loop() ()
#19 0x0000000000445201 in std::thread::_Impl<std::_Bind_simple<std::reference_wrapper<c10k::detail::WorkerThread<ServerHandler> > ()> >::_M_run() ()
#20 0x00007ff0d06a78f0 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#21 0x00007ff0d097b6ca in start_thread (arg=0x7ff0cfd04700) at pthread_create.c:333
#22 0x00007ff0d01160af in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
Thread 3 (Thread 0x7ff0d0da9740 (LWP 45)):
#0 __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1 0x00007ff0d007d992 in __GI__IO_fwrite (buf=0x7ff0d03d1760 <_IO_stdfile_1_lock>, size=1, count=26, fp=0x7ff0d012407c <__lll_lock_wait_private+28>) at iofwrite.c:37
#2 0x00007ff0d06fe9f6 in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x000000000046794b in Catch::ConsoleReporter::printOpenHeader(std::string const&) ()
#4 0x0000000000467ae5 in Catch::ConsoleReporter::printTestCaseAndSectionHeader() ()
#5 0x0000000000467efc in Catch::ConsoleReporter::lazyPrint() ()
#6 0x00000000004683dc in Catch::ConsoleReporter::assertionEnded(Catch::AssertionStats const&) ()
#7 0x000000000045f881 in Catch::RunContext::assertionEnded(Catch::AssertionResult const&) ()
#8 0x0000000000448f6d in Catch::ResultBuilder::handleResult(Catch::AssertionResult const&) ()
#9 0x000000000044a292 in Catch::ResultBuilder::captureExpression() ()
#10 0x000000000044e158 in Catch::ResultBuilder::useActiveException(Catch::ResultDisposition::Flags) ()
#11 0x0000000000452e02 in Catch::runTests(Catch::Ptr<Catch::Config> const&) ()
#12 0x000000000043099a in main ()
Thread 2 (Thread 0x7ff0cf503700 (LWP 47)):
#0 0x00007ff0d09848fd in accept () at ../sysdeps/unix/syscall-template.S:84
#1 0x0000000000432954 in std::thread::_Impl<std::_Bind_simple<____C_A_T_C_H____T_E_S_T____0()::{lambda()#1} ()> >::_M_run() ()
#2 0x00007ff0d06a78f0 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ff0d097b6ca in start_thread (arg=0x7ff0cf503700) at pthread_create.c:333
#4 0x00007ff0d01160af in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
Thread 1 (Thread 0x7ff0ced02700 (LWP 48)):
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1 0x00007ff0d00453ea in __GI_abort () at abort.c:89
#2 0x00007ff0d003bbb7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x7ff0d09873c2 "mutex->__data.__owner == 0", file=file@entry=0x7ff0d09873a5 "../nptl/pthread_mutex_lock.c", line=line@entry=81, function=function@entry=0x7ff0d0987530 <__PRETTY_FUNCTION__.8476> "__pthread_mutex_lock") at assert.c:92
#3 0x00007ff0d003bc62 in __GI___assert_fail (assertion=assertion@entry=0x7ff0d09873c2 "mutex->__data.__owner == 0", file=file@entry=0x7ff0d09873a5 "../nptl/pthread_mutex_lock.c", line=line@entry=81, function=function@entry=0x7ff0d0987530 <__PRETTY_FUNCTION__.8476> "__pthread_mutex_lock") at assert.c:101
#4 0x00007ff0d097e068 in __GI___pthread_mutex_lock (mutex=<optimized out>) at ../nptl/pthread_mutex_lock.c:81
#5 0x000000000047dc62 in c10k::EventLoop::loop() ()
#6 0x0000000000445171 in std::thread::_Impl<std::_Bind_simple<std::reference_wrapper<c10k::detail::WorkerThread<ClientHandler> > ()> >::_M_run() ()
#7 0x00007ff0d06a78f0 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x00007ff0d097b6ca in start_thread (arg=0x7ff0ced02700) at pthread_create.c:333
#9 0x00007ff0d01160af in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
Here is the most detailed backtrace until now from travis CI:
Thread 3 (Thread 0x7f49e28c2700 (LWP 45)):
#0 0x00007f49e324d323 in std::locale::locale() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x00007f49e324a9c8 in std::ios_base::_M_init() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x00007f49e32a1c51 in std::basic_ios<char, std::char_traits<char> >::init(std::basic_streambuf<char, std::char_traits<char> >*) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007f49e32c3d26 in std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::basic_ostringstream(std::_Ios_Openmode) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x0000559c47a0744e in Catch::ConsoleReporter::SummaryColumn::addRow (this=0x7f49e28bd900, count=2) at /data/vendor/Catch/include/reporters/catch_reporter_console.hpp:329
#5 0x0000559c47a079ed in Catch::ConsoleReporter::printTotals (this=0x559c48c54fa0, totals=...) at /data/vendor/Catch/include/reporters/catch_reporter_console.hpp:364
#6 0x0000559c47a05ea3 in Catch::ConsoleReporter::testRunEnded (this=0x559c48c54fa0, _testRunStats=...) at /data/vendor/Catch/include/reporters/catch_reporter_console.hpp:97
#7 0x0000559c479fa5e4 in Catch::RunContext::handleFatalErrorCondition (this=0x7ffe07827440, message="SIGSEGV - Segmentation violation signal") at /data/vendor/Catch/include/internal/catch_run_context.hpp:255
#8 0x0000559c479f8312 in Catch::fatal (message="SIGSEGV - Segmentation violation signal", exitCode=-11) at /data/vendor/Catch/include/internal/catch_fatal_condition.hpp:19
#9 0x0000559c479f83d7 in Catch::FatalConditionHandler::handleSignal (sig=11) at /data/vendor/Catch/include/internal/catch_fatal_condition.hpp:59
#10 <signal handler called>
#11 0x00007f49dc0008b0 in ?? ()
#12 0x0000559c47a5c052 in std::_Mem_fn_base<void (c10k::Connection::*)(c10k::Event const&), true>::_M_call<std::shared_ptr<c10k::Connection>&, c10k::Event const&> (this=0x7f49dc0069b0, __ptr=std::shared_ptr (empty) 0x0) at /usr/include/c++/5/functional:634
#13 0x0000559c47a5be0d in std::_Mem_fn_base<void (c10k::Connection::*)(c10k::Event const&), true>::operator()<std::shared_ptr<c10k::Connection>&, c10k::Event const&, void> (this=0x7f49dc0069b0, __object=std::shared_ptr (empty) 0x0) at /usr/include/c++/5/functional:610 // why do we have an empty shared_ptr here?
#14 0x0000559c47a5b8ab in std::_Bind<std::_Mem_fn<void (c10k::Connection::*)(c10k::Event const&)> (std::shared_ptr<c10k::Connection>, std::_Placeholder<1>)>::__call<void, c10k::Event const&, 0ul, 1ul>(std::tuple<c10k::Event const&>&&, std::_Index_tuple<0ul, 1ul>) (this=0x7f49dc0069b0, __args=<unknown type in /data/build/worker_thread-test, CU 0x17316a, DIE 0x1b7cca>) at /usr/include/c++/5/functional:1074
#15 0x0000559c47a5aa28 in std::_Bind<std::_Mem_fn<void (c10k::Connection::*)(c10k::Event const&)> (std::shared_ptr<c10k::Connection>, std::_Placeholder<1>)>::operator()<c10k::Event const&, void>(c10k::Event const&) (this=0x7f49dc0069b0) at /usr/include/c++/5/functional:1133
#16 0x0000559c47a5903c in std::_Function_handler<void (c10k::Event const&), std::_Bind<std::_Mem_fn<void (c10k::Connection::*)(c10k::Event const&)> (std::shared_ptr<c10k::Connection>, std::_Placeholder<1>)> >::_M_invoke(std::_Any_data const&, c10k::Event const&) (__functor=..., __args#0=...) at /usr/include/c++/5/functional:1871
#17 0x0000559c47a4ff28 in std::function<void (c10k::Event const&)>::operator()(c10k::Event const&) const (this=0x7f49d4000f28, __args#0=...) at /usr/include/c++/5/functional:2267
#18 0x0000559c47a4d745 in c10k::EventLoop::<lambda(const epoll_event&)>::operator()(const epoll_event &) const (__closure=0x7f49e28bec60, ev=...) at /data/src/c10k/event_loop.cpp:64
#19 0x0000559c47a4e004 in std::for_each<epoll_event*, c10k::EventLoop::handle_events(epoll_event*, epoll_event*)::<lambda(const epoll_event&)> >(epoll_event *, epoll_event *, c10k::EventLoop::<lambda(const epoll_event&)>) (__first=0x7f49e28bed6c, __last=0x7f49e28bed54, __f=...) at /usr/include/c++/5/bits/stl_algo.h:3767
#20 0x0000559c47a4d79e in c10k::EventLoop::handle_events (this=0x7ffe078264c0, st=0x7f49e28bed60, ed=0x7f49e28bed54) at /data/src/c10k/event_loop.cpp:65
#21 0x0000559c47a4d5da in c10k::EventLoop::loop (this=0x7ffe078264c0) at /data/src/c10k/event_loop.cpp:49
#22 0x0000559c479de0ce in c10k::detail::WorkerThread<ServerHandler>::operator() (this=0x7ffe078264c0) at /data/src/c10k/worker_thread.hpp:42
#23 0x0000559c479ddf59 in std::__invoke<c10k::detail::WorkerThread<ServerHandler>>(c10k::detail::WorkerThread<ServerHandler>&) (__f=...) at /usr/include/c++/5/functional:201
#24 0x0000559c479ddd73 in std::reference_wrapper<c10k::detail::WorkerThread<ServerHandler> >::operator()<>() const (this=0x559c48c59528) at /usr/include/c++/5/functional:428
#25 0x0000559c479ddb30 in std::_Bind_simple<std::reference_wrapper<c10k::detail::WorkerThread<ServerHandler> > ()>::_M_invoke<>(std::_Index_tuple<>) (this=0x559c48c59528) at /usr/include/c++/5/functional:1531
#26 0x0000559c479dd30a in std::_Bind_simple<std::reference_wrapper<c10k::detail::WorkerThread<ServerHandler> > ()>::operator()() (this=0x559c48c59528) at /usr/include/c++/5/functional:1520
#27 0x0000559c479dc92f in std::thread::_Impl<std::_Bind_simple<std::reference_wrapper<c10k::detail::WorkerThread<ServerHandler> > ()> >::_M_run() (this=0x559c48c59510) at /usr/include/c++/5/thread:115
#28 0x00007f49e32658f0 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#29 0x00007f49e35396ca in start_thread (arg=0x7f49e28c2700) at pthread_create.c:333
#30 0x00007f49e2cd40af in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
#1 0x0000562d556e639c in std::_Mem_fn<void (c10k::Connection::*)(c10k::Event const&)>::_M_call<std::shared_ptr<c10k::Connection>&, c10k::Event const&> (this=0x7fbb70003b30, __ptr=std::shared_ptr (count 1744830584, weak 32698) 0x7fbb68002770) at /usr/include/c++/4.9/functional:526
Since shared_cnt block is placed at the end of Handler, I guess there exists overflow in this program :(
#9 0x000055ab953e3c3e in c10k::EventLoop::handle_events (this=0x7ffdde173f30, st=0x7f953002fd60, ed=0x7f953002fd54) at /data/src/c10k/event_loop.cpp:65
WTF
Finally fixed in https://github.com/htfy96/c10k-server/commit/d0a931efce8ef1b222128addee25b3a656ae9b1f https://github.com/htfy96/c10k-server/commit/458d1772e22819f30133a5ad859a8dd16131b06c https://github.com/htfy96/c10k-server/commit/7ce2d49d912b1af59b4d5c1bd439abe6eaaf4f47 https://github.com/htfy96/c10k-server/commit/8d3abb5833412634ed944e04dbb607c30d5fbad6 https://github.com/htfy96/c10k-server/commit/75928e8be75e60edd377952ca28b84ec035b6c10 https://github.com/htfy96/c10k-server/commit/29daa0739ca506a07b3ce80830fb6329c5ba8bb5
https://travis-ci.org/htfy96/c10k-server/jobs/195788297