redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com
9.69k stars 590 forks source link

Reactor stalled for 43ms in `read_iobuf_exactly` #10423

Open ballard26 opened 1 year ago

ballard26 commented 1 year ago

In: https://ci-artifacts.dev.vectorized.cloud:443/vtools/7317/0187c1bf-8e3c-401e-bda2-49c5fa727564/vbuild/output/ducktape-release-clang-amd64.tgz

Backtrace:

WARNING:root:decoded: tmp/ducktape/results/2023-04-27--001/PartitionBalancerScaleTest/test_partition_balancer_with_many_partitions/type=big_partitions/293/RedpandaService-0-139902321644048/ip-172-31-2-158/redpanda.log:2230:Reactor stalled for 43 ms on shard 0. Backtrace:
[Backtrace #78]
{tmp/libexec/redpanda} 0x636072f: void seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::'lambda'(seastar::frame)>(seastar::backtrace_buffer::append_backtrace_oneline()::'lambda'(seastar::frame)&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/util/backtrace.hh:60
 (inlined by) seastar::backtrace_buffer::append_backtrace_oneline() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:797
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:816
{tmp/libexec/redpanda} 0x6361b42: seastar::internal::cpu_stall_detector::generate_trace() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:1386
 (inlined by) seastar::internal::cpu_stall_detector::maybe_report() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:1123
 (inlined by) seastar::internal::cpu_stall_detector::on_signal() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:1143
 (inlined by) seastar::reactor::block_notifier(int) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:1423
{tmp/libexec/redpanda} 0x42abf: ?? at ??:0
{tmp/libexec/redpanda} 0x11753b: ?? at ??:0
{tmp/libexec/redpanda} 0x6419d65: seastar::file_desc::recv(void*, unsigned long, int) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/posix.hh:238
 (inlined by) seastar::reactor_backend_uring::recv_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor_backend.cc:1883
{tmp/libexec/redpanda} 0x635442c: seastar::pollable_fd_state::recv_some(seastar::internal::buffer_allocator*) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:463
{tmp/libexec/redpanda} 0x655d90b: seastar::pollable_fd::recv_some(seastar::internal::buffer_allocator*) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/internal/pollable_fd.hh:177
 (inlined by) seastar::net::posix_data_source_impl::get() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/net/posix-stack.cc:597
{tmp/libexec/redpanda} 0x61ed4ad: seastar::data_source::get() at /vectorized/include/seastar/core/iostream.hh:77
 (inlined by) seastar::input_stream<char>::read_up_to(unsigned long) at /vectorized/include/seastar/core/iostream-impl.hh:261
 (inlined by) read_iobuf_exactly(seastar::input_stream<char>&, unsigned long)::$_3::operator()(iobuf&, unsigned long&) const::'lambda0'()::operator()() const at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-05222bd6360c4da24-1/redpanda/redpanda/src/v/bytes/iobuf.cc:64
{tmp/libexec/redpanda} 0x61ee838: seastar::internal::do_until_state<read_iobuf_exactly(seastar::input_stream<char>&, unsigned long)::$_3::operator()(iobuf&, unsigned long&) const::'lambda'(), read_iobuf_exactly(seastar::input_stream<char>&, unsigned long)::$_3::operator()(iobuf&, unsigned long&) const::'lambda0'()>::run_and_dispose() at /vectorized/include/seastar/core/loop.hh:299
{tmp/libexec/redpanda} 0x638204f: seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2557
 (inlined by) seastar::reactor::run_some_tasks() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3020
{tmp/libexec/redpanda} 0x6385cc7: seastar::reactor::do_run() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3189
{tmp/libexec/redpanda} 0x6383099: seastar::reactor::run() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3072
{tmp/libexec/redpanda} 0x6298a81: seastar::app_template::run_deprecated(int, char**, std::__1::function<void ()>&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/app-template.cc:265
{tmp/libexec/redpanda} 0x6296b9f: seastar::app_template::run(int, char**, std::__1::function<seastar::future<int> ()>&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/app-template.cc:156
{tmp/libexec/redpanda} 0x1ff05de: application::run(int, char**) at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-05222bd6360c4da24-1/redpanda/redpanda/src/v/redpanda/application.cc:330
{tmp/libexec/redpanda} 0x66b4889: main at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-05222bd6360c4da24-1/redpanda/redpanda/src/v/redpanda/main.cc:22
{tmp/libexec/redpanda} 0x2d58f: ?? at ??:0
{tmp/libexec/redpanda} 0x2d648: ?? at ??:0
{tmp/libexec/redpanda} 0x1feaaa4: _start at ??:0

JIRA Link: CORE-1292

ballard26 commented 1 year ago

Haven't been able to replicate this issue. However, at 25ms for a reactor stall I am seeing a lot of false positives for kernel interrupts. One common interrupt I'm seeing is;

[Backtrace #1]
{<kernel>} 0xffffffffffffff80: 0xffffffffffffff80 (1069588464 bytes after last symbol)
{<kernel>} 0xffffffff93921401: _raw_spin_unlock_irqrestore+0x21
{<kernel>} 0xffffffff92b3887a: __wake_up_common_lock+0x8a
{<kernel>} 0xffffffff92b38fc0: __wake_up_sync_key+0x20
{<kernel>} 0xffffffff935e908b: sock_def_readable+0x3b
{<kernel>} 0xffffffff936f5601: tcp_data_ready+0x31
{<kernel>} 0xffffffff936f637c: tcp_rcv_established+0x61c
{<kernel>} 0xffffffff9370537e: tcp_v4_do_rcv+0x16e
{<kernel>} 0xffffffff93708671: tcp_v4_rcv+0xea1
{<kernel>} 0xffffffff936cf8bc: ip_protocol_deliver_rcu+0x3c
{<kernel>} 0xffffffff936cfbb1: ip_local_deliver_finish+0x81
{<kernel>} 0xffffffff936cfc4e: ip_local_deliver+0x6e
{<kernel>} 0xffffffff936cfd4f: ip_sublist_rcv_finish+0x6f
{<kernel>} 0xffffffff936cfedd: ip_sublist_rcv+0x17d
{<kernel>} 0xffffffff936d02c9: ip_list_rcv+0xf9
{<kernel>} 0xffffffff93616719: __netif_receive_skb_list_core+0x229
{<kernel>} 0xffffffff93616d2e: netif_receive_skb_list_internal+0x18e
{<kernel>} 0xffffffff9361743a: napi_complete_done+0x7a
{<kernel>} 0xffffffffc024a5c2: ena_io_poll+0x112
{<kernel>} 0xffffffff936175d3: __napi_poll+0x33
{<kernel>} 0xffffffff93617a35: net_rx_action+0x185
{<kernel>} 0xffffffff93c000d8: __softirqentry_text_start+0xd8
{<kernel>} 0xffffffff92ad1edd: __irq_exit_rcu+0x8d
{<kernel>} 0xffffffff92ad223e: irq_exit_rcu+0xe
{<kernel>} 0xffffffff9390cc2a: common_interrupt+0x4a
{<kernel>} 0xffffffff93a00de7: asm_common_interrupt+0x27
github-actions[bot] commented 4 months ago

This issue hasn't seen activity in 3 months. If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.

dotnwat commented 4 months ago

i can't imagine that read_iobuf_exactly would stall for 43ms in user space. does reactor stall reports give any indication of stall time in user vs kernel space?