redpanda-data / redpanda


Assertion in seastar fair_queue under low memory conditions #3521

Open NyaliaLui opened 2 years ago

NyaliaLui commented 2 years ago

Version & Environment

Redpanda version: v21.11.3-si-beta8

The following backtrace was seen on BYOC during long-running tests for shadow indexing.

[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/util/backtrace.hh:59
 (inlined by) seastar::backtrace_buffer::append_backtrace() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:754
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:784
 (inlined by) seastar::print_with_backtrace(char const*, bool) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:796
 (inlined by) seastar::sigsegv_action() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3568
 (inlined by) operator() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3554
 (inlined by) __invoke at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3550
?? ??:0
seastar::fair_queue::dispatch_requests(std::__1::function<void (seastar::fair_queue_entry&)>) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/fair_queue.cc:260
seastar::io_queue::poll_io_queue() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/io_queue.cc:581
seastar::reactor::flush_pending_aio() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:1535
 (inlined by) seastar::reactor::io_queue_submission_pollfn::poll() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2398
seastar::reactor::poll_once() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2904
 (inlined by) operator() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2802
 (inlined by) decltype ((std::__1::forward<seastar::reactor::do_run()::$_76&>({parm#1}))()) std::__1::__invoke<seastar::reactor::do_run()::$_76&>(seastar::reactor::do_run()::$_76&) at /vectorized/llvm/bin/../include/c++/v1/type_traits:3694
 (inlined by) bool std::__1::__invoke_void_return_wrapper<bool, false>::__call<seastar::reactor::do_run()::$_76&>(seastar::reactor::do_run()::$_76&) at /vectorized/llvm/bin/../include/c++/v1/__functional_base:317
 (inlined by) std::__1::__function::__alloc_func<seastar::reactor::do_run()::$_76, std::__1::allocator<seastar::reactor::do_run()::$_76>, bool ()>::operator()() at /vectorized/llvm/bin/../include/c++/v1/functional:1558
 (inlined by) std::__1::__function::__func<seastar::reactor::do_run()::$_76, std::__1::allocator<seastar::reactor::do_run()::$_76>, bool ()>::operator()() at /vectorized/llvm/bin/../include/c++/v1/functional:1732
std::__1::__function::__value_func<bool ()>::operator()() const at /vectorized/llvm/bin/../include/c++/v1/functional:1885
 (inlined by) std::__1::function<bool ()>::operator()() const at /vectorized/llvm/bin/../include/c++/v1/functional:2560
 (inlined by) seastar::reactor::do_run() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2828
operator() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3996
 (inlined by) decltype ((std::__1::forward<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92&>({parm#1}))()) std::__1::__invoke<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92&>(seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92&) at /vectorized/llvm/bin/../include/c++/v1/type_traits:3694
 (inlined by) void std::__1::__invoke_void_return_wrapper<void, true>::__call<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92&>(seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92&) at /vectorized/llvm/bin/../include/c++/v1/__functional_base:348
 (inlined by) std::__1::__function::__alloc_func<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92, std::__1::allocator<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92>, void ()>::operator()() at /vectorized/llvm/bin/../include/c++/v1/functional:1558
 (inlined by) std::__1::__function::__func<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92, std::__1::allocator<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92>, void ()>::operator()() at /vectorized/llvm/bin/../include/c++/v1/functional:1732
std::__1::__function::__value_func<void ()>::operator()() const at /vectorized/llvm/bin/../include/c++/v1/functional:1885
 (inlined by) std::__1::function<void ()>::operator()() const at /vectorized/llvm/bin/../include/c++/v1/functional:2560
 (inlined by) seastar::posix_thread::start_routine(void*) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/posix.cc:60
/opt/redpanda/lib/libpthread.so.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=7b0cdaf878ab4f99078439d864af70a5fd7b5a2c, for GNU/Linux 3.2.0, stripped

JIRA Link: CORE-816

NyaliaLui commented 2 years ago

The Grafana logs before the segfault:

2022-01-16T15:59:50.042059903Z stderr F INFO  2022-01-16 15:59:50,041 [shard 5] cluster - state_machine.cc:139 - State machine for ntp={kafka/test-2k-e/1360} handles std::bad_alloc (std::bad_alloc)
2022-01-16T15:59:50.042196205Z stderr F INFO  2022-01-16 15:59:50,042 [shard 5] cluster - state_machine.cc:139 - State machine for ntp={kafka/test-2k-e/1392} handles std::bad_alloc (std::bad_alloc)
2022-01-16T15:59:50.047890614Z stderr F INFO  2022-01-16 15:59:50,047 [shard 11] cluster - state_machine.cc:139 - State machine for ntp={kafka/test-2k-e/220} handles std::bad_alloc (std::bad_alloc)
2022-01-16T15:59:50.049975165Z stderr F INFO  2022-01-16 15:59:50,049 [shard 5] cluster - state_machine.cc:139 - State machine for ntp={kafka/test-2k-e/1968} handles std::bad_alloc (std::bad_alloc)
2022-01-16T15:59:50.059292565Z stderr F WARN  2022-01-16 15:59:50,059 [shard 2] raft - [group_id:16808, {kafka/test-2k-e/1418}] consensus.cc:473 - Node {id: {5}, revision: {1115}} recovery failed - std::bad_alloc (std::bad_alloc)
2022-01-16T15:59:50.075019872Z stderr F INFO  2022-01-16 15:59:50,074 [shard 0] cluster - state_machine.cc:139 - State machine for ntp={kafka/test-2k-e/1830} handles std::bad_alloc (std::bad_alloc)
2022-01-16T15:59:50.075653681Z stderr F INFO  2022-01-16 15:59:50,075 [shard 0] cluster - state_machine.cc:139 - State machine for ntp={kafka/test-2k-e/1798} handles std::bad_alloc (std::bad_alloc)
2022-01-16T15:59:50.078474045Z stderr F INFO  2022-01-16 15:59:50,078 [shard 0] cluster - state_machine.cc:139 - State machine for ntp={kafka/test-2k-e/1734} handles std::bad_alloc (std::bad_alloc)
2022-01-16T15:59:50.09439617Z stderr F INFO  2022-01-16 15:59:50,094 [shard 9] cluster - state_machine.cc:139 - State machine for ntp={kafka/test-2k-e/1176} handles std::bad_alloc (std::bad_alloc)

NyaliaLui commented 2 years ago

Brokers were configured with the new storage_read_buffer_size and storage_read_readahead_count settings. Running kubectl edit cluster -n <namespace> <cluster name> shows:

spec:
   additionalConfiguration:
     redpanda.default_topic_replications: "3"
     redpanda.id_allocator_replication: "3"
     redpanda.storage_read_buffer_size: "32768"
     redpanda.storage_read_readahead_count: "2"

The topic:

SUMMARY
=======
NAME        test-2k-e
PARTITIONS  2048
REPLICAS    3

CONFIGS
=======
KEY                     VALUE                          SOURCE
cleanup.policy          delete                         DYNAMIC_TOPIC_CONFIG
compression.type        producer                       DEFAULT_CONFIG
message.timestamp.type  CreateTime                     DEFAULT_CONFIG
partition_count         2048                           DYNAMIC_TOPIC_CONFIG
redpanda.datapolicy     function_name:  script_name:   DEFAULT_CONFIG
redpanda.remote.read    true                           DYNAMIC_TOPIC_CONFIG
redpanda.remote.write   true                           DYNAMIC_TOPIC_CONFIG
replication_factor      3                              DYNAMIC_TOPIC_CONFIG
retention.bytes         2147483648                     DYNAMIC_TOPIC_CONFIG
retention.ms            604800000                      DEFAULT_CONFIG
segment.bytes           1073741824                     DYNAMIC_TOPIC_CONFIG
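
Since the issue is about low memory, a rough back-of-envelope on the read buffer settings may help put the numbers above in context. This is only a sketch: the per-reader memory model (readahead_count buffers of storage_read_buffer_size bytes each) and the "one active reader per replica" worst case are assumptions made for illustration, not measured Redpanda behavior, and the broker count is not stated in this issue, so the total is cluster-wide.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the cluster and topic configuration above.
constexpr std::uint64_t read_buffer_size   = 32768; // storage_read_buffer_size
constexpr std::uint64_t readahead_count    = 2;     // storage_read_readahead_count
constexpr std::uint64_t partitions         = 2048;  // PARTITIONS
constexpr std::uint64_t replication_factor = 3;     // REPLICAS

// ASSUMPTION (illustrative only): one reader holds readahead_count buffers
// of read_buffer_size bytes each.
constexpr std::uint64_t bytes_per_reader = read_buffer_size * readahead_count;

// Total partition replicas across the whole cluster.
constexpr std::uint64_t total_replicas = partitions * replication_factor;

// ASSUMPTION (illustrative only): every replica has exactly one active reader.
constexpr std::uint64_t worst_case_bytes = bytes_per_reader * total_replicas;

static_assert(bytes_per_reader == 64ull * 1024, "64 KiB per reader");
static_assert(total_replicas == 6144, "2048 partitions x 3 replicas");
static_assert(worst_case_bytes == 384ull * 1024 * 1024, "384 MiB cluster-wide");
```

Under these assumptions the read buffers alone would account for roughly 384 MiB across the cluster, which is not by itself an explanation for the bad_alloc messages.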

NyaliaLui commented 2 years ago

Possibly related to #3458

The segfaults in this issue and in #3458 have different backtraces, but both occurred after bad_alloc exceptions in the state_machine.

jcsp commented 2 years ago

This is an assertion in the seastar storage code, suggesting it has been called with a priority class that is not registered.

 259 void fair_queue::update_shares_for_class(class_id id, uint32_t shares) {
> 260     assert(id < _priority_classes.size());
 261     auto& pc = _priority_classes[id];
 262     assert(pc);
 263     pc->update_shares(shares);
 264 }
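
To make the failure mode concrete, here is a minimal, self-contained model of a class table indexed by id. It is not Seastar code: class_registry, priority_class_data, and the throwing update_shares are hypothetical names used only to show how a lookup with an id that was never registered (out of range, or an empty slot) fails the same two checks as the asserts above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a per-class scheduling record.
struct priority_class_data {
    std::uint32_t shares;
};

// Hypothetical model of a queue that keeps its classes in a vector indexed by id.
class class_registry {
    std::vector<std::unique_ptr<priority_class_data>> _classes;

public:
    // Registering an id grows the table if needed and fills the slot.
    void register_class(std::size_t id, std::uint32_t shares) {
        if (id >= _classes.size()) {
            _classes.resize(id + 1);
        }
        _classes[id] = std::make_unique<priority_class_data>(priority_class_data{shares});
    }

    // Mirrors the two asserts: the id must be in range and the slot non-empty.
    bool is_registered(std::size_t id) const {
        return id < _classes.size() && _classes[id] != nullptr;
    }

    // Checked variant for illustration: throws where the original asserts.
    void update_shares(std::size_t id, std::uint32_t shares) {
        if (!is_registered(id)) {
            throw std::out_of_range("priority class not registered");
        }
        _classes[id]->shares = shares;
    }

    std::uint32_t shares(std::size_t id) const {
        if (!is_registered(id)) {
            throw std::out_of_range("priority class not registered");
        }
        return _classes[id]->shares;
    }
};
```

In this model, registering only id 1 leaves id 0 as an empty slot and id 7 out of range; both are rejected by update_shares. Whether the real crash came from an out-of-range id or an empty slot is not established in this thread.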

NyaliaLui commented 2 years ago

I had to go further back in the Grafana logs, but there was also the following "Exceptional future ignored" message around the same time as the segfault:

Exceptional future ignored: std::bad_alloc (std::bad_alloc), backtrace: 0x3a559e4 0x3741bb2 0x154b02f 0x15bd83b 0x384955c 0x381e628 0x37d4523 0x382433d 0x378201f /opt/redpanda/lib/libpthread.so.0+0x9298 /opt/redpanda/lib/libc.so.6+0x1006a2

With this backtrace:

[Backtrace #0]
void seastar::backtrace<seastar::current_backtrace_tasklocal()::$_3>(seastar::current_backtrace_tasklocal()::$_3&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/util/backtrace.hh:59
 (inlined by) seastar::current_backtrace_tasklocal() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/util/backtrace.cc:86
 (inlined by) seastar::current_tasktrace() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/util/backtrace.cc:137
 (inlined by) seastar::current_backtrace() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/util/backtrace.cc:170
seastar::report_failed_future(std::exception_ptr const&) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/future.cc:210
 (inlined by) seastar::report_failed_future(seastar::future_state_base::any&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/future.cc:218
seastar::future_state_base::any::check_failure() at /vectorized/include/seastar/core/future.hh:567
 (inlined by) seastar::future_state<seastar::internal::monostate>::clear() at /vectorized/include/seastar/core/future.hh:609
 (inlined by) ~future_state at /vectorized/include/seastar/core/future.hh:614
 (inlined by) ~future at /vectorized/include/seastar/core/scheduling.hh:43
 (inlined by) raft::consensus::maybe_step_down() at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-09f40697b81d3241f-1/vectorized/redpanda/vbuild/release/clang/../../../src/v/raft/consensus.cc:140
operator() at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-09f40697b81d3241f-1/vectorized/redpanda/vbuild/release/clang/../../../src/v/raft/consensus.cc:104
 (inlined by) seastar::noncopyable_function<void ()>::direct_vtable_for<raft::consensus::consensus(detail::base_named_type<int, model::node_id_model_type, std::__1::integral_constant<bool, true> >, detail::base_named_type<long, raft::raft_group_id_type, std::__1::integral_constant<bool, true> >, raft::group_configuration, simple_time_jitter<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > >, storage::log, raft::scheduling_config, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >, raft::consensus_client_protocol, seastar::noncopyable_function<void (raft::leadership_status)>, storage::api&, std::__1::optional<std::__1::reference_wrapper<raft::recovery_throttle> >)::$_32>::call(seastar::noncopyable_function<void ()> const*) at /vectorized/include/seastar/util/noncopyable_function.hh:124
seastar::noncopyable_function<void ()>::operator()() const at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/util/noncopyable_function.hh:209
 (inlined by) void seastar::reactor::complete_timers<seastar::timer_set<seastar::timer<seastar::lowres_clock>, &seastar::timer<seastar::lowres_clock>::_link>, boost::intrusive::list<seastar::timer<seastar::lowres_clock>, boost::intrusive::member_hook<seastar::timer<seastar::lowres_clock>, boost::intrusive::list_member_hook<>, &seastar::timer<seastar::lowres_clock>::_link> >, seastar::reactor::do_expire_lowres_timers()::$_65>(seastar::timer_set<seastar::timer<seastar::lowres_clock>, &seastar::timer<seastar::lowres_clock>::_link>&, boost::intrusive::list<seastar::timer<seastar::lowres_clock>, boost::intrusive::member_hook<seastar::timer<seastar::lowres_clock>, boost::intrusive::list_member_hook<>, &seastar::timer<seastar::lowres_clock>::_link> >&, seastar::reactor::do_expire_lowres_timers()::$_65&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:1242
 (inlined by) seastar::reactor::do_expire_lowres_timers() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2280
 (inlined by) seastar::reactor::lowres_timer_pollfn::poll() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2443
seastar::reactor::poll_once() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2904
 (inlined by) operator() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2802
 (inlined by) decltype ((std::__1::forward<seastar::reactor::do_run()::$_76&>({parm#1}))()) std::__1::__invoke<seastar::reactor::do_run()::$_76&>(seastar::reactor::do_run()::$_76&) at /vectorized/llvm/bin/../include/c++/v1/type_traits:3694
 (inlined by) bool std::__1::__invoke_void_return_wrapper<bool, false>::__call<seastar::reactor::do_run()::$_76&>(seastar::reactor::do_run()::$_76&) at /vectorized/llvm/bin/../include/c++/v1/__functional_base:317
 (inlined by) std::__1::__function::__alloc_func<seastar::reactor::do_run()::$_76, std::__1::allocator<seastar::reactor::do_run()::$_76>, bool ()>::operator()() at /vectorized/llvm/bin/../include/c++/v1/functional:1558
 (inlined by) std::__1::__function::__func<seastar::reactor::do_run()::$_76, std::__1::allocator<seastar::reactor::do_run()::$_76>, bool ()>::operator()() at /vectorized/llvm/bin/../include/c++/v1/functional:1732
std::__1::__function::__value_func<bool ()>::operator()() const at /vectorized/llvm/bin/../include/c++/v1/functional:1885
 (inlined by) std::__1::function<bool ()>::operator()() const at /vectorized/llvm/bin/../include/c++/v1/functional:2560
 (inlined by) seastar::reactor::do_run() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2828
operator() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3996
 (inlined by) decltype ((std::__1::forward<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92&>({parm#1}))()) std::__1::__invoke<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92&>(seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92&) at /vectorized/llvm/bin/../include/c++/v1/type_traits:3694
 (inlined by) void std::__1::__invoke_void_return_wrapper<void, true>::__call<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92&>(seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92&) at /vectorized/llvm/bin/../include/c++/v1/__functional_base:348
 (inlined by) std::__1::__function::__alloc_func<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92, std::__1::allocator<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92>, void ()>::operator()() at /vectorized/llvm/bin/../include/c++/v1/functional:1558
 (inlined by) std::__1::__function::__func<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92, std::__1::allocator<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92>, void ()>::operator()() at /vectorized/llvm/bin/../include/c++/v1/functional:1732
std::__1::__function::__value_func<void ()>::operator()() const at /vectorized/llvm/bin/../include/c++/v1/functional:1885
 (inlined by) std::__1::function<void ()>::operator()() const at /vectorized/llvm/bin/../include/c++/v1/functional:2560
 (inlined by) seastar::posix_thread::start_routine(void*) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/posix.cc:60
/opt/redpanda/lib/libpthread.so.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=7b0cdaf878ab4f99078439d864af70a5fd7b5a2c, for GNU/Linux 3.2.0, stripped
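
The backtrace above shows report_failed_future being reached from a future destructor inside raft::consensus::maybe_step_down, meaning a future holding std::bad_alloc was destroyed without anything consuming its exception. The pattern can be sketched in plain C++ as follows. This is a toy model, not Seastar: tiny_future, failing_op, drop_result, and handle_result are hypothetical names, and the log vector stands in for the real logger.

```cpp
#include <cassert>
#include <exception>
#include <new>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a future that may carry an exception.
class tiny_future {
    std::exception_ptr _ex;
    bool _observed = false;
    std::vector<std::string>* _log;

public:
    tiny_future(std::exception_ptr ex, std::vector<std::string>& log)
      : _ex(std::move(ex)), _log(&log) {}

    // Moving transfers the stored exception; the source is left empty.
    tiny_future(tiny_future&& o) noexcept
      : _ex(std::exchange(o._ex, nullptr)), _observed(o._observed), _log(o._log) {}

    tiny_future(const tiny_future&) = delete;
    tiny_future& operator=(const tiny_future&) = delete;

    // Consuming the exception marks the failure as handled.
    template <typename Handler>
    void handle_exception(Handler&& h) {
        _observed = true;
        if (_ex) {
            h(_ex);
        }
    }

    // A failed future destroyed without being observed gets reported.
    ~tiny_future() {
        if (_ex && !_observed) {
            _log->push_back("Exceptional future ignored");
        }
    }
};

// Hypothetical operation that fails with std::bad_alloc.
inline tiny_future failing_op(std::vector<std::string>& log) {
    return tiny_future(std::make_exception_ptr(std::bad_alloc()), log);
}

// The result is discarded, so the destructor reports the ignored failure.
inline void drop_result(std::vector<std::string>& log) {
    failing_op(log);
}

// The exception is consumed explicitly, so nothing is reported as ignored.
inline void handle_result(std::vector<std::string>& log) {
    tiny_future f = failing_op(log);
    f.handle_exception([&log](std::exception_ptr ep) {
        try {
            std::rethrow_exception(ep);
        } catch (const std::bad_alloc&) {
            log.push_back("handled bad_alloc");
        }
    });
}
```

In this model, drop_result leaves one "Exceptional future ignored" entry in the log, while handle_result leaves only the handler's own entry.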