Open cyx1231st opened 4 years ago
@gleb-cloudius can you please look into this issue?
@bhalevy we also see this occasionally. here is a more recent trace from a few days ago. it looks like when conn_q is destroyed during thread-local storage clean-up, a new continuation is placed onto a reactor task queue that doesn't get cleaned up before the process exits.
Direct leak of 88 byte(s) in 1 object(s) allocated from:
#0 0x557c91ca5b7d in operator new(unsigned long) /v/llvm/llvm/src/compiler-rt/lib/asan/asan_new_delete.cpp:95:3
#1 0x557ca3e3cc08 in void seastar::future<void>::schedule<seastar::internal::promise_base_with_type<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> >, seastar::noncopyable_function<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > ()>, seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > seastar::future<void>::then_impl_nrvo<seastar::noncopyable_function<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > ()>, seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > >(seastar::noncopyable_function<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > ()>&&)::'lambda'(seastar::internal::promise_base_with_type<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> >&&, seastar::noncopyable_function<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > ()>&, seastar::future_state<seastar::internal::monostate>&&)>(seastar::noncopyable_function<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > ()>&&, seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> >&&, seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > seastar::future<void>::then_impl_nrvo<seastar::noncopyable_function<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > ()>, seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > 
>(seastar::noncopyable_function<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > ()>&&)::'lambda'(seastar::internal::promise_base_with_type<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> >&&, seastar::noncopyable_function<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > ()>&, seastar::future_state<seastar::internal::monostate>&&)&&) /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/future.hh:1395:20
#2 0x557ca3e3ca72 in seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > seastar::future<void>::then_impl_nrvo<seastar::noncopyable_function<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > ()>, seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > >(seastar::noncopyable_function<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > ()>&&) /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/future.hh:1585:9
#3 0x557ca3e3c7a3 in seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > seastar::future<void>::then_impl<seastar::noncopyable_function<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > ()>, seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > >(seastar::noncopyable_function<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > ()>&&) /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/future.hh:1619:16
#4 0x557ca3e3aaed in seastar::internal::future_result<seastar::noncopyable_function<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > ()>, void>::future_type seastar::internal::call_then_impl<seastar::future<void> >::run<seastar::noncopyable_function<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > ()> >(seastar::future<void>&, seastar::noncopyable_function<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > ()>&&) /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/future.hh:1248:20
#5 0x557ca3e3a2fa in seastar::lowres_clock seastar::future<void>::then<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > seastar::get_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, seastar::lowres_clock>&, unsigned long, seastar::basic_semaphore<seastar::named_semaphore_exception_factory, seastar::lowres_clock>::time_point)::'lambda'(), seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > >(seastar::named_semaphore_exception_factory&&) /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/future.hh:1544:16
#6 0x557ca3c8e150 in seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock> > seastar::get_units<seastar::named_semaphore_exception_factory, seastar::lowres_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, seastar::lowres_clock>&, unsigned long, seastar::basic_semaphore<seastar::named_semaphore_exception_factory, seastar::lowres_clock>::time_point) /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/semaphore.hh:558:37
#7 0x557ca3acf373 in seastar::smp_message_queue::submit_item(unsigned int, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >, std::__1::unique_ptr<seastar::smp_message_queue::work_item, std::__1::default_delete<seastar::smp_message_queue::work_item> >) /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3202:9
#8 0x557ca4715cde in seastar::futurize<std::__1::invoke_result<seastar::net::conntrack::handle::~handle()::'lambda'()>::type>::type seastar::smp_message_queue::submit<seastar::net::conntrack::handle::~handle()::'lambda'()>(unsigned int, seastar::smp_submit_to_options, seastar::net::conntrack::handle::~handle()::'lambda'()&&) /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/smp.hh:269:9
#9 0x557ca471571e in seastar::futurize<std::__1::invoke_result<seastar::net::conntrack::handle::~handle()::'lambda'()>::type>::type seastar::smp::submit_to<seastar::net::conntrack::handle::~handle()::'lambda'()>(unsigned int, seastar::smp_submit_to_options, seastar::net::conntrack::handle::~handle()::'lambda'()&&) /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/smp.hh:357:44
#10 0x557ca4714a6d in seastar::futurize<std::__1::invoke_result<seastar::net::conntrack::handle::~handle()::'lambda'()>::type>::type seastar::smp::submit_to<seastar::net::conntrack::handle::~handle()::'lambda'()>(unsigned int, seastar::net::conntrack::handle::~handle()::'lambda'()&&) /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/smp.hh:376:16
#11 0x557ca4714613 in seastar::net::conntrack::handle::~handle() /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/net/posix-stack.hh:88:19
#12 0x557ca4704ebf in seastar::net::posix_ap_server_socket_impl::connection::~connection() /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/net/posix-stack.hh:135:12
#13 0x557ca473298f in std::__1::pair<std::__1::tuple<int, seastar::socket_address> const, seastar::net::posix_ap_server_socket_impl::connection>::~pair() /vectorized/llvm/bin/../include/c++/v1/__utility/pair.h:40:29
#14 0x557ca4732930 in void std::__1::__destroy_at<std::__1::pair<std::__1::tuple<int, seastar::socket_address> const, seastar::net::posix_ap_server_socket_impl::connection>, 0>(std::__1::pair<std::__1::tuple<int, seastar::socket_address> const, seastar::net::posix_ap_server_socket_impl::connection>*) /vectorized/llvm/bin/../include/c++/v1/__memory/construct_at.h:56:13
#15 0x557ca47328cc in void std::__1::destroy_at<std::__1::pair<std::__1::tuple<int, seastar::socket_address> const, seastar::net::posix_ap_server_socket_impl::connection>, 0>(std::__1::pair<std::__1::tuple<int, seastar::socket_address> const, seastar::net::posix_ap_server_socket_impl::connection>*) /vectorized/llvm/bin/../include/c++/v1/__memory/construct_at.h:81:5
#16 0x557ca4732630 in void std::__1::allocator_traits<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<std::__1::tuple<int, seastar::socket_address>, seastar::net::posix_ap_server_socket_impl::connection>, void*> > >::destroy<std::__1::pair<std::__1::tuple<int, seastar::socket_address> const, seastar::net::posix_ap_server_socket_impl::connection>, void, void>(std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<std::__1::tuple<int, seastar::socket_address>, seastar::net::posix_ap_server_socket_impl::connection>, void*> >&, std::__1::pair<std::__1::tuple<int, seastar::socket_address> const, seastar::net::posix_ap_server_socket_impl::connection>*) /vectorized/llvm/bin/../include/c++/v1/__memory/allocator_traits.h:317:9
#17 0x557ca473233b in std::__1::__hash_table<std::__1::__hash_value_type<std::__1::tuple<int, seastar::socket_address>, seastar::net::posix_ap_server_socket_impl::connection>, std::__1::__unordered_map_hasher<std::__1::tuple<int, seastar::socket_address>, std::__1::__hash_value_type<std::__1::tuple<int, seastar::socket_address>, seastar::net::posix_ap_server_socket_impl::connection>, std::__1::hash<std::__1::tuple<int, seastar::socket_address> >, std::__1::equal_to<std::__1::tuple<int, seastar::socket_address> >, true>, std::__1::__unordered_map_equal<std::__1::tuple<int, seastar::socket_address>, std::__1::__hash_value_type<std::__1::tuple<int, seastar::socket_address>, seastar::net::posix_ap_server_socket_impl::connection>, std::__1::equal_to<std::__1::tuple<int, seastar::socket_address> >, std::__1::hash<std::__1::tuple<int, seastar::socket_address> >, true>, std::__1::allocator<std::__1::__hash_value_type<std::__1::tuple<int, seastar::socket_address>, seastar::net::posix_ap_server_socket_impl::connection> > >::__deallocate_node(std::__1::__hash_node_base<std::__1::__hash_node<std::__1::__hash_value_type<std::__1::tuple<int, seastar::socket_address>, seastar::net::posix_ap_server_socket_impl::connection>, void*>*>*) /vectorized/llvm/bin/../include/c++/v1/__hash_table:1572:9
#18 0x557ca4730831 in std::__1::__hash_table<std::__1::__hash_value_type<std::__1::tuple<int, seastar::socket_address>, seastar::net::posix_ap_server_socket_impl::connection>, std::__1::__unordered_map_hasher<std::__1::tuple<int, seastar::socket_address>, std::__1::__hash_value_type<std::__1::tuple<int, seastar::socket_address>, seastar::net::posix_ap_server_socket_impl::connection>, std::__1::hash<std::__1::tuple<int, seastar::socket_address> >, std::__1::equal_to<std::__1::tuple<int, seastar::socket_address> >, true>, std::__1::__unordered_map_equal<std::__1::tuple<int, seastar::socket_address>, std::__1::__hash_value_type<std::__1::tuple<int, seastar::socket_address>, seastar::net::posix_ap_server_socket_impl::connection>, std::__1::equal_to<std::__1::tuple<int, seastar::socket_address> >, std::__1::hash<std::__1::tuple<int, seastar::socket_address> >, true>, std::__1::allocator<std::__1::__hash_value_type<std::__1::tuple<int, seastar::socket_address>, seastar::net::posix_ap_server_socket_impl::connection> > >::~__hash_table() /vectorized/llvm/bin/../include/c++/v1/__hash_table:1511:5
#19 0x557ca47034d8 in std::__1::unordered_multimap<std::__1::tuple<int, seastar::socket_address>, seastar::net::posix_ap_server_socket_impl::connection, std::__1::hash<std::__1::tuple<int, seastar::socket_address> >, std::__1::equal_to<std::__1::tuple<int, seastar::socket_address> >, std::__1::allocator<std::__1::pair<std::__1::tuple<int, seastar::socket_address> const, seastar::net::posix_ap_server_socket_impl::connection> > >::~unordered_multimap() /vectorized/llvm/bin/../include/c++/v1/unordered_map:2026:5
#20 0x7f98dcaf238e in __call_tls_dtors (/lib64/libc.so.6+0x4038e) (BuildId: 6e3c087aca9b39549e4ba92c451f1e399b586e28)
here is a reproducer. basically, if a socket is to be handled on some non-main thread, then conn_q has a connection added to it from the main thread. if the target shard doesn't call accept or abort_accept for some reason, then the entry is never removed from conn_q.
then, when the reactor threads are exiting, the conn_q thread local is destroyed and a background future is scheduled late in the shutdown sequence. it appears that this happens before the thread-local reactor destructor runs. it seems the reactor destructor either isn't cleaning up at this level of detail or lsan is having trouble tracking something.
+ss::logger lg("ok");
+
+struct server {
+    ss::future<> start() {
+        ss::listen_options lo;
+        lo.reuse_address = true;
+        lo.set_fixed_cpu(ss::smp::count - 1);
+        s = ss::engine().listen(
+          ss::socket_address(ss::net::inet_address("127.0.0.1"), 9092), lo);
+        if (ss::this_shard_id() == 0) {
+            (void)ss::with_gate(g, [this] {
+                return s.accept().then_wrapped([](auto far) {
+                    try {
+                        auto ar = far.get();
+                        lg.info("accepted");
+                    } catch (...) {
+                        lg.info(
+                          "accepted (error): {}", std::current_exception());
+                    }
+                });
+            });
+        }
+        return ss::now();
+    }
+
+    ss::future<> stop() {
+        if (ss::this_shard_id() == 0) {
+            s.abort_accept();
+        }
+        return g.close();
+    }
+
+    ss::server_socket s;
+    ss::gate g;
+};
+
+SEASTAR_THREAD_TEST_CASE(memleak) {
+    ss::sharded<server> service;
+    service.start().get();
+    service.invoke_on_all([](server& s) { return s.start(); }).get();
+    auto c = ss::make_socket();
+    c.connect(ss::socket_address(ss::net::inet_address("127.0.0.1"), 9092))
+      .then_wrapped([](auto f) {
+          try {
+              f.get();
+          } catch (...) {
+          }
+      })
+      .get();
+    c.shutdown();
+    service.stop().get();
+}
it also appears this race could occur even if a core calls abort_accept on its socket. say this happens in some ss::sharded<>::stop() sequence. if abort_accept is called on some non-main core, the main core could still race with an accepted connection and add an entry to the other core's conn_q thread_local container. this would lead to the same leak.
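for anyone trying to picture the ordering problem without Seastar in the way, here is a minimal stand-alone sketch in plain C++. everything in it (toy_task_queue, toy_handle, simulate_shutdown) is a hypothetical stand-in invented for illustration, not Seastar's actual types or shutdown code. it only models the assumption described above: a destructor schedules cleanup work, and if that destructor runs after the queue has been drained for the last time, the work is left pending.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reactor task queue (not Seastar's type).
struct toy_task_queue {
    std::vector<std::function<void()>> pending;

    void schedule(std::function<void()> fn) { pending.push_back(std::move(fn)); }

    // Runs every queued task, including tasks queued while running.
    void run_all() {
        while (!pending.empty()) {
            auto fn = std::move(pending.back());
            pending.pop_back();
            fn();
        }
    }
};

// Hypothetical stand-in for conntrack::handle: the destructor does not
// release its resource directly, it schedules a task that does.
struct toy_handle {
    toy_task_queue* queue;
    int* released;

    toy_handle(toy_task_queue* q, int* r) : queue(q), released(r) {}
    toy_handle(const toy_handle&) = delete;
    toy_handle& operator=(const toy_handle&) = delete;

    ~toy_handle() {
        int* r = released;
        queue->schedule([r] { ++*r; });
    }
};

// Simulates shutdown and returns how many tasks were still pending after the
// queue ran for the last time. `released` counts cleanup tasks that ran.
inline std::size_t simulate_shutdown(bool clear_before_last_run, int& released) {
    toy_task_queue q;
    std::vector<std::unique_ptr<toy_handle>> conn_q;  // models the thread-local container
    conn_q.push_back(std::make_unique<toy_handle>(&q, &released));

    if (clear_before_last_run) {
        conn_q.clear();  // explicit teardown while tasks can still run
    }
    q.run_all();     // last time the "reactor" runs tasks
    conn_q.clear();  // late destruction, like TLS clean-up at thread exit
    return q.pending.size();
}
```

calling simulate_shutdown(false, n) leaves one task pending and n untouched, which is the shape of the leak in the trace. calling simulate_shutdown(true, n) empties the container while the queue can still run, so nothing is left pending. this only illustrates the ordering; it says nothing about which fix is appropriate for Seastar itself.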
@gleb-cloudius please look into this issue
Cc @xemul
@dotnwat if you're able to reproduce the issue, it would be ideal if you could send a fix for it that you've validated.
@bhalevy i had several solutions that seemed to work, but they were all rather ugly. it wasn't until after i hacked these together that i had a full understanding of the race condition that is happening.
i'll post a message to seastar-dev for further discussion.
Any progress on this issue? I ran into this issue today.
@niekbouman there is a discussion going on here https://github.com/scylladb/seastar/pull/1265
Some critical tasks can be missed during engine exit, causing LeakSanitizer failure. Notably, there is a chance to lose the destruction task of foreign_ptr<lw_shared_ptr<conntrack::load_balancer>> submitted by conntrack::~handle() because its future is ignored, see https://github.com/scylladb/seastar/blob/65a8bedb02eaa0c82ae2ee3e47085bc73fb05382/include/seastar/net/posix-stack.hh#L87-L90
Detailed leak report: