Closed travisdowns closed 3 months ago
There are 11 stalls in this snippet.
6 of them have the following userspace backtrace:
(inlined by) seastar::reactor::block_notifier(int) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:1329
{22.3.4-1/opt/redpanda/libexec/redpanda} 0x42abf: ?? at ??:0
{22.3.4-1/opt/redpanda/libexec/redpanda} 0x105eab: ?? at ??:0
{22.3.4-1/opt/redpanda/libexec/redpanda} 0x514081a: seastar::file_desc::read(void*, unsigned long) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/posix.hh:224
(inlined by) seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34::operator()() const at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:289
{22.3.4-1/opt/redpanda/libexec/redpanda} 0x5140bc4: decltype(static_cast<seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&>(fp)()) std::__1::__invoke<seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&>(seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&) at /vectorized/llvm/bin/../include/c++/v1/type_traits:3640
(inlined by) std::__1::invoke_result<seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&>::type std::__1::invoke<seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&>(seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&) at /vectorized/llvm/bin/../include/c++/v1/__functional/invoke.h:93
(inlined by) auto seastar::internal::future_invoke<seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&, seastar::internal::monostate>(seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&, seastar::internal::monostate&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/future.hh:1223
(inlined by) seastar::future<seastar::temporary_buffer<char> > seastar::future<void>::then_impl_nrvo<seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34, seastar::future<seastar::temporary_buffer<char> > >(seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&&)::'lambda'(seastar::internal::promise_base_with_type<seastar::temporary_buffer<char> >&&, seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&, seastar::future_state<seastar::internal::monostate>&&)::operator()(seastar::internal::promise_base_with_type<seastar::temporary_buffer<char> >&&, seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&, seastar::future_state<seastar::internal::monostate>&&) const::'lambda'()::operator()() const at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/future.hh:1596
(inlined by) void seastar::futurize<seastar::future<seastar::temporary_buffer<char> > >::satisfy_with_result_of<seastar::future<seastar::temporary_buffer<char> > seastar::future<void>::then_impl_nrvo<seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34, seastar::future<seastar::temporary_buffer<char> > >(seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&&)::'lambda'(seastar::internal::promise_base_with_type<seastar::temporary_buffer<char> >&&, seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&, seastar::future_state<seastar::internal::monostate>&&)::operator()(seastar::internal::promise_base_with_type<seastar::temporary_buffer<char> >&&, seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&, seastar::future_state<seastar::internal::monostate>&&) const::'lambda'()>(seastar::internal::promise_base_with_type<seastar::temporary_buffer<char> >&&, seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/future.hh:2134
(inlined by) seastar::future<seastar::temporary_buffer<char> > seastar::future<void>::then_impl_nrvo<seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34, seastar::future<seastar::temporary_buffer<char> > >(seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&&)::'lambda'(seastar::internal::promise_base_with_type<seastar::temporary_buffer<char> >&&, seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&, seastar::future_state<seastar::internal::monostate>&&)::operator()(seastar::internal::promise_base_with_type<seastar::temporary_buffer<char> >&&, seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&, seastar::future_state<seastar::internal::monostate>&&) const at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/future.hh:1589
(inlined by) seastar::continuation<seastar::internal::promise_base_with_type<seastar::temporary_buffer<char> >, seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34, seastar::future<seastar::temporary_buffer<char> > seastar::future<void>::then_impl_nrvo<seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34, seastar::future<seastar::temporary_buffer<char> > >(seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&&)::'lambda'(seastar::internal::promise_base_with_type<seastar::temporary_buffer<char> >&&, seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&, seastar::future_state<seastar::internal::monostate>&&), void>::run_and_dispose() at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/future.hh:781
{22.3.4-1/opt/redpanda/libexec/redpanda} 0x510b25f: seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2330
In userspace this is is in the do_read_some
call, but ultimately the stack is in the kernel inside read(2)
. We have already confirmed the FD is readable, so this should be doing only on-CPU work, not blocking. Decoding the kernel trace would be important here, but this can only be done on the machine which produced it (due to dependence on the exact kernel image and also KASLR). @brianzer0 is that something you could help with?
This is not the first time I've seen stalls in do_read_some
.
I seem to be having issues decoding traces, can someone help instruct me on this?
Also here is a more recent trace:
Jan 5 21:46:31 cdn-maple-as-132 rpk[850584]: Reactor stalled for 52 ms on shard 81. Backtrace: 0x50eceaf 0x50ecbc2 0x42abf 0x105eab 0x514081a 0x5140bc4 0x510b25f 0x510ef37 0x5152275 0x50acb4f 0x91016 0x1166cf
Jan 5 21:46:31 cdn-maple-as-132 rpk[850584]: kernel callstack: 0xffffffffffffff80 0xffffffff82e8cc2c 0xffffffff82a9b068 0xffffffff82a9b158 0xffffffff82a9b852 0xffffffff82a98aa3 0xffffffff82a9951d 0xffffffff82a997f5 0xffffffffc098e61a 0xffffffffc098e7b4 0xffffffffc098f7bd 0xffffffff82c57337 0xffffffff82cbe8ae 0xffffffff82c54f4a 0xffffffff82c57876 0xffffffffc19d3ed0 0xffffffff82c57337 0xffffffff82c579a6 0xffffffff82cfde49 0xffffffff82d00172 0xffffffff82d21597 0xffffffff82d22bca 0xffffffff82d23142 0xffffffff82d0d8a9 0xffffffff82d0dc78 0xffffffff82c2cc37 0xffffffff82c2cf9c 0xffffffff82c2f9e1 0xffffffff82c2fa99 0xffffffff82e7bb1b 0xffffffff8300007c
Jan 5 21:46:31 cdn-maple-as-132 rpk[850584]: kernel callstack: 0xffffffffffffff80 0xffffffff82e8cc2c 0xffffffff82a9b068 0xffffffff82a9b158 0xffffffff82a9b852 0xffffffff82a98aa3 0xffffffff82a9951d 0xffffffff82a997f5 0xffffffffc098e4fd 0xffffffffc098e7b4 0xffffffffc098f7bd 0xffffffff82c57337 0xffffffff82cbe8ae 0xffffffff82c54f4a 0xffffffff82c57876 0xffffffffc19d3ed0 0xffffffff82c57337 0xffffffff82c579a6 0xffffffff82cfde49 0xffffffff82d00172 0xffffffff82d21597 0xffffffff82d22bca 0xffffffff82d23142 0xffffffff82d0d8a9 0xffffffff82d0dc78 0xffffffff82c2cc37 0xffffffff82c2cf9c 0xffffffff82c2f9e1 0xffffffff82c2fa99 0xffffffff82e7bb1b 0xffffffff8300007c
Jan 5 21:46:31 cdn-maple-as-132 rpk[850584]: kernel callstack: 0xffffffffffffff80 0xffffffff82e8cc2c 0xffffffff82a9b068 0xffffffff82a9b158 0xffffffff82a9b852 0xffffffff82a98aa3 0xffffffff82a9951d 0xffffffff82a997f5 0xffffffffc098e4fd 0xffffffffc098e7b4 0xffffffffc098f7bd 0xffffffff82c57337 0xffffffff82cbe8ae 0xffffffff82c54f4a 0xffffffff82c57876 0xffffffffc19d3ed0 0xffffffff82c57337 0xffffffff82c579a6 0xffffffff82cfde49 0xffffffff82d00172 0xffffffff82d21597 0xffffffff82d22bca 0xffffffff82d23142 0xffffffff82d0d8a9 0xffffffff82d0dc78 0xffffffff82c2cc37 0xffffffff82c2cf9c 0xffffffff82c2f9e1 0xffffffff82c2fa99 0xffffffff82e7bb1b 0xffffffff8300007c
Jan 5 21:46:31 cdn-maple-as-132 rpk[850584]: kernel callstack: 0xffffffffffffff80 0xffffffff82e8cc2c 0xffffffff82a9b068 0xffffffff82a9b158 0xffffffff82a9b852 0xffffffff82a98aa3 0xffffffff82a9951d 0xffffffff82a997f5 0xffffffffc098e61a 0xffffffffc098e7b4 0xffffffffc098f7bd 0xffffffff82c57337 0xffffffff82cbe8ae 0xffffffff82c54f4a 0xffffffff82c57876 0xffffffffc19d3ed0 0xffffffff82c57337 0xffffffff82c579a6 0xffffffff82cfde49 0xffffffff82d00172 0xffffffff82d21597 0xffffffff82d22bca 0xffffffff82d23142 0xffffffff82d0d8a9 0xffffffff82d0dc78 0xffffffff82c2cc37 0xffffffff82c2cf9c 0xffffffff82c2f9e1 0xffffffff82c2fa99 0xffffffff82e7bb1b 0xffffffff8300007c
Jan 5 21:46:31 cdn-maple-as-132 rpk[850584]: kernel callstack: 0xffffffffffffff80 0xffffffff82e8cc2c 0xffffffff82a9b068 0xffffffff82a9b158 0xffffffff82a9b852 0xffffffff82a98aa3 0xffffffff82a9951d 0xffffffff82a997f5 0xffffffffc098eeb6 0xffffffffc098f6a6 0xffffffff82c57337 0xffffffff82cbe8ae 0xffffffff82c54f4a 0xffffffff82c57876 0xffffffffc19d3ed0 0xffffffff82c57337 0xffffffff82c579a6 0xffffffff82cfde49 0xffffffff82d00172 0xffffffff82d21597 0xffffffff82d1c7cb 0xffffffff82d2a2f0 0xffffffff82c35080 0xffffffff82c350eb 0xffffffff82d0dc83 0xffffffff82c2cc37 0xffffffff82c2cf9c 0xffffffff82c2f9e1 0xffffffff82c2fa99 0xffffffff82e7bb1b 0xffffffff8300007c
Jan 5 21:46:31 cdn-maple-as-132 rpk[850584]: kernel callstack: 0xffffffffffffff80 0xffffffff82e8cc2c 0xffffffff82a9b068 0xffffffff82a9b158 0xffffffff82a9b852 0xffffffff82a98aa3 0xffffffff82a9951d 0xffffffff82a997f5 0xffffffffc098e61a 0xffffffffc098e7b4 0xffffffffc098f7bd 0xffffffff82c57337 0xffffffff82cbe8ae 0xffffffff82c54f4a 0xffffffff82c57876 0xffffffffc19d3ed0 0xffffffff82c57337 0xffffffff82c579a6 0xffffffff82cfde49 0xffffffff82d00172 0xffffffff82d21597 0xffffffff82d22bca 0xffffffff82d23142 0xffffffff82d0d8a9 0xffffffff82d0dc78 0xffffffff82c2cc37 0xffffffff82c2cf9c 0xffffffff82c2f9e1 0xffffffff82c2fa99 0xffffffff82e7bb1b 0xffffffff8300007c
Jan 5 21:46:31 cdn-maple-as-132 rpk[850584]: kernel callstack: 0xffffffffffffff80 0xffffffff82e8cc2c 0xffffffff82a9b068 0xffffffff82a9b158 0xffffffff82a9b852 0xffffffff82a98aa3 0xffffffff82a9951d 0xffffffff82a997f5 0xffffffffc098e4fd 0xffffffffc098e7b4 0xffffffffc098f7bd 0xffffffff82c57337 0xffffffff82cbe8ae 0xffffffff82c54f4a 0xffffffff82c57876 0xffffffffc19d3ed0 0xffffffff82c57337 0xffffffff82c579a6 0xffffffff82cfde49 0xffffffff82d00172 0xffffffff82d21597 0xffffffff82d22bca 0xffffffff82d23142 0xffffffff82d0d8a9 0xffffffff82d0dc78 0xffffffff82c2cc37 0xffffffff82c2cf9c 0xffffffff82c2f9e1 0xffffffff82c2fa99 0xffffffff82e7bb1b 0xffffffff8300007c
Jan 5 21:46:31 cdn-maple-as-132 rpk[850584]: kernel callstack: 0xffffffffffffff80 0xffffffff82e8cc2c 0xffffffff82a9b068 0xffffffff82a9b158 0xffffffff82a9b852 0xffffffff82a98aa3 0xffffffff82a9951d 0xffffffff82a997f5 0xffffffffc098eeb6 0xffffffffc098f6a6 0xffffffff82c57337 0xffffffff82cbe8ae 0xffffffff82c54f4a 0xffffffff82c57876 0xffffffffc19d3ed0 0xffffffff82c57337 0xffffffff82c579a6 0xffffffff82cfde49 0xffffffff82d00172 0xffffffff82d21597 0xffffffff82d2a2f0 0xffffffff82c35080 0xffffffff82c350eb 0xffffffff82d0f65d 0xffffffff82d47f3c 0xffffffff82c2c9d2 0xffffffff8277112a 0xffffffff827737b3 0xffffffff82773e37 0xffffffff82e7bb1b 0xffffffff8300007c
22.x is EOL
This issue was split out of https://github.com/redpanda-data/redpanda/issues/7853 to track the reactor stall specific part of the issue.
Version & Environment
Redpanda version: (use
rpk version
): v22.3.4 (rev 5be3e8e) Cluster Info: 10 node bare-metal Host:Operating System (e.g. from
/etc/os-release
):What went wrong?
After upgrading from 22.2.4 to 22.3.4 I am seeing lots of reactor stalls and timeouts (1+ stall per second per node). This is with one producer sending ~200k records/s (~50mb/s) of data (Custom java producer using official kafka producer libs).
What should have happened instead?
Stalls and timeouts shouldn't happen... This is concerning and I don't want to use this cluster with all the errors.
How to reproduce the issue?
Additional information
Here is a log snippet:
JIRA Link: CORE-1117