redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

storage: failed assert at segment_appender.cc:321 '_prev_head_write->available_units() == 1' #3495

Closed by NyaliaLui 2 years ago

NyaliaLui commented 2 years ago

Version & Environment

Redpanda version: v21.11.3-si-beta7
Shadow Index Cache has 300GB of space
On BYOC

The failed assert

ERROR 2022-01-13 20:47:32,705 [shard 9] assert - Assert failure: (../../../src/v/storage/segment_appender.cc:321) '_prev_head_write->available_units() == 1' Unexpected pending head write {no_of_chunks:64, closed:0, fallocation_offset:33554432, committed_offset:33431920, bytes_flush_pending:0}
ERROR 2022-01-13 20:47:32,705 [shard 9] assert - Backtrace below:
0x3a4ff84 0x32525e1 0x3252c49 0x37cb5d4 0x37cea97 0x381e8dd 0x377c5bf /opt/redpanda/lib/libpthread.so.0+0x9298 /opt/redpanda/lib/libc.so.6+0x1006a2
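
The offsets printed in the assertion message can be read as plain arithmetic. The following is a minimal sketch, assuming the field meanings suggested by their names (fallocation_offset as the end of the preallocated region, committed_offset as the bytes already written). The struct and helper names are hypothetical and are not Redpanda's types.

```cpp
#include <cstdint>

// Hypothetical mirror of two fields printed in the assertion message.
// Their meanings are inferred from the field names, not from Redpanda's source.
struct appender_state {
    std::uint64_t fallocation_offset;
    std::uint64_t committed_offset;
};

// Bytes between the committed offset and the end of the preallocated region.
constexpr std::uint64_t preallocated_remaining(const appender_state& s) {
    return s.fallocation_offset - s.committed_offset;
}

// Converts a count of mebibytes to bytes.
constexpr std::uint64_t mib(std::uint64_t n) { return n * 1024 * 1024; }

// Values copied from the log line above.
constexpr appender_state logged{33554432, 33431920};

// The fallocation offset is exactly 32 MiB.
static_assert(logged.fallocation_offset == mib(32));
// 122512 bytes (about 120 KiB) lie between the two offsets.
static_assert(preallocated_remaining(logged) == 122512);
```

Under that reading, roughly 120 KiB of the 32 MiB preallocated region was still unwritten when the assertion fired.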

Topic:

SUMMARY
=======
NAME        test-2k
PARTITIONS  2048
REPLICAS    3

CONFIGS
=======
KEY                     VALUE                          SOURCE
cleanup.policy          delete                         DYNAMIC_TOPIC_CONFIG
compression.type        producer                       DEFAULT_CONFIG
message.timestamp.type  CreateTime                     DEFAULT_CONFIG
partition_count         2048                           DYNAMIC_TOPIC_CONFIG
redpanda.datapolicy     function_name:  script_name:   DEFAULT_CONFIG
redpanda.remote.read    true                           DYNAMIC_TOPIC_CONFIG
redpanda.remote.write   true                           DYNAMIC_TOPIC_CONFIG
replication_factor      3                              DYNAMIC_TOPIC_CONFIG
retention.bytes         134217728                      DYNAMIC_TOPIC_CONFIG
retention.ms            300000                         DYNAMIC_TOPIC_CONFIG
segment.bytes           67108864                       DYNAMIC_TOPIC_CONFIG
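
For readers who want the raw numbers in familiar units, here is a small sketch that restates the topic settings above as compile-time checks. The constant names are made up for illustration; only the values come from the table.

```cpp
#include <cstdint>

// Values copied from the topic configuration above.
constexpr std::uint64_t retention_bytes = 134217728;
constexpr std::uint64_t segment_bytes   = 67108864;
constexpr std::uint64_t retention_ms    = 300000;
constexpr std::uint64_t partitions      = 2048;
constexpr std::uint64_t replicas        = 3;

constexpr std::uint64_t mib = 1024 * 1024;

static_assert(retention_bytes == 128 * mib);          // 128 MiB retained
static_assert(segment_bytes == 64 * mib);             // 64 MiB per segment
static_assert(retention_bytes / segment_bytes == 2);  // two full segments fit in the retention size
static_assert(retention_ms == 5 * 60 * 1000);         // 5 minutes
static_assert(partitions * replicas == 6144);         // partition replicas for this topic
```

In other words, the topic keeps about two 64 MiB segments or five minutes of data, and its 2048 partitions at replication factor 3 amount to 6144 partition replicas.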

Brokers were configured with the new storage_read_buffer_size and storage_read_readahead_count configs. Run kubectl edit cluster -n <namespace> <cluster name> to see:

spec:
   additionalConfiguration:
     redpanda.default_topic_replications: "3"
     redpanda.id_allocator_replication: "3"
     redpanda.storage_read_buffer_size: "4096"
     redpanda.storage_read_readahead_count: "1"

NyaliaLui commented 2 years ago

From the backtrace

[Backtrace #0]
void seastar::backtrace<seastar::current_backtrace_tasklocal()::$_3>(seastar::current_backtrace_tasklocal()::$_3&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/util/backtrace.hh:59
 (inlined by) seastar::current_backtrace_tasklocal() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/util/backtrace.cc:86
 (inlined by) seastar::current_tasktrace() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/util/backtrace.cc:137
 (inlined by) seastar::current_backtrace() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/util/backtrace.cc:170
?? ??:0
decltype ((std::__1::forward<seastar::with_semaphore<seastar::semaphore_default_exception_factory, storage::segment_appender::do_next_adaptive_fallocation()::$_17, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, storage::segment_appender::do_next_adaptive_fallocation()::$_17&&)::{lambda(auto:1)#1}&>({parm#1}))(std::__1::forward<seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock> >({parm#2}))) std::__1::__invoke<seastar::with_semaphore<seastar::semaphore_default_exception_factory, storage::segment_appender::do_next_adaptive_fallocation()::$_17, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, storage::segment_appender::do_next_adaptive_fallocation()::$_17&&)::{lambda(auto:1)#1}&, seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock> >(seastar::with_semaphore<seastar::semaphore_default_exception_factory, storage::segment_appender::do_next_adaptive_fallocation()::$_17, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, storage::segment_appender::do_next_adaptive_fallocation()::$_17&&)::{lambda(auto:1)#1}&, seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&&) at /vectorized/llvm/bin/../include/c++/v1/type_traits:3694
 (inlined by) std::__1::invoke_result<seastar::with_semaphore<seastar::semaphore_default_exception_factory, storage::segment_appender::do_next_adaptive_fallocation()::$_17, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, storage::segment_appender::do_next_adaptive_fallocation()::$_17&&)::{lambda(auto:1)#1}&, seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock> >::type std::__1::invoke<seastar::with_semaphore<seastar::semaphore_default_exception_factory, storage::segment_appender::do_next_adaptive_fallocation()::$_17, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, storage::segment_appender::do_next_adaptive_fallocation()::$_17&&)::{lambda(auto:1)#1}&, seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock> >(seastar::with_semaphore<seastar::semaphore_default_exception_factory, storage::segment_appender::do_next_adaptive_fallocation()::$_17, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, storage::segment_appender::do_next_adaptive_fallocation()::$_17&&)::{lambda(auto:1)#1}&, seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&&) at /vectorized/llvm/bin/../include/c++/v1/functional:2989
 (inlined by) auto seastar::internal::future_invoke<seastar::with_semaphore<seastar::semaphore_default_exception_factory, storage::segment_appender::do_next_adaptive_fallocation()::$_17, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, storage::segment_appender::do_next_adaptive_fallocation()::$_17&&)::{lambda(auto:1)#1}&, seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock> >(seastar::with_semaphore<seastar::semaphore_default_exception_factory, storage::segment_appender::do_next_adaptive_fallocation()::$_17, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, storage::segment_appender::do_next_adaptive_fallocation()::$_17&&)::{lambda(auto:1)#1}&, storage::segment_appender::do_next_adaptive_fallocation()::$_17&&) at /vectorized/include/seastar/core/future.hh:1211
 (inlined by) operator() at /vectorized/include/seastar/core/future.hh:1582
 (inlined by) void seastar::futurize<seastar::future<void> >::satisfy_with_result_of<seastar::future<seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock> >::then_impl_nrvo<seastar::with_semaphore<seastar::semaphore_default_exception_factory, storage::segment_appender::do_next_adaptive_fallocation()::$_17, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, storage::segment_appender::do_next_adaptive_fallocation()::$_17&&)::{lambda(auto:1)#1}, seastar::future<void> >(seastar::with_semaphore<seastar::semaphore_default_exception_factory, storage::segment_appender::do_next_adaptive_fallocation()::$_17, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, storage::segment_appender::do_next_adaptive_fallocation()::$_17&&)::{lambda(auto:1)#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::with_semaphore<seastar::semaphore_default_exception_factory, storage::segment_appender::do_next_adaptive_fallocation()::$_17, std::__1::chrono::steady_clock>(seastar::basic_semaphore<auto:1, auto:3>&, unsigned long, auto:2&&)::{lambda(auto:1)#1}&, seastar::future_state<seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock> >&&)#1}::operator()(seastar::internal::promise_base_with_type<void>&&, seastar::with_semaphore<seastar::semaphore_default_exception_factory, storage::segment_appender::do_next_adaptive_fallocation()::$_17, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, storage::segment_appender::do_next_adaptive_fallocation()::$_17&&)::{lambda(auto:1)#1}&, seastar::future_state<seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock> >&&) const::{lambda()#1}>(seastar::internal::promise_base_with_type<void>&&, seastar::with_semaphore<seastar::semaphore_default_exception_factory, storage::segment_appender::do_next_adaptive_fallocation()::$_17, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, storage::segment_appender::do_next_adaptive_fallocation()::$_17&&)::{lambda(auto:1)#1}&&) at /vectorized/include/seastar/core/future.hh:2120
 (inlined by) operator() at /vectorized/include/seastar/core/future.hh:1575
 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::with_semaphore<seastar::semaphore_default_exception_factory, storage::segment_appender::do_next_adaptive_fallocation()::$_17, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, storage::segment_appender::do_next_adaptive_fallocation()::$_17&&)::{lambda(auto:1)#1}, seastar::future<seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock> >::then_impl_nrvo<seastar::with_semaphore<seastar::semaphore_default_exception_factory, storage::segment_appender::do_next_adaptive_fallocation()::$_17, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, storage::segment_appender::do_next_adaptive_fallocation()::$_17&&)::{lambda(auto:1)#1}, seastar::future<void> >(seastar::with_semaphore<seastar::semaphore_default_exception_factory, storage::segment_appender::do_next_adaptive_fallocation()::$_17, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, storage::segment_appender::do_next_adaptive_fallocation()::$_17&&)::{lambda(auto:1)#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::with_semaphore<seastar::semaphore_default_exception_factory, storage::segment_appender::do_next_adaptive_fallocation()::$_17, std::__1::chrono::steady_clock>(seastar::basic_semaphore<auto:1, auto:3>&, unsigned long, auto:2&&)::{lambda(auto:1)#1}&, seastar::future_state<seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock> >&&)#1}, seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock> >::run_and_dispose() at /vectorized/include/seastar/core/future.hh:767
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2230
 (inlined by) seastar::reactor::run_some_tasks() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2639
seastar::reactor::do_run() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2808
operator() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3996
 (inlined by) decltype ((std::__1::forward<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92&>({parm#1}))()) std::__1::__invoke<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92&>(seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92&) at /vectorized/llvm/bin/../include/c++/v1/type_traits:3694
 (inlined by) void std::__1::__invoke_void_return_wrapper<void, true>::__call<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92&>(seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92&) at /vectorized/llvm/bin/../include/c++/v1/__functional_base:348
 (inlined by) std::__1::__function::__alloc_func<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92, std::__1::allocator<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92>, void ()>::operator()() at /vectorized/llvm/bin/../include/c++/v1/functional:1558
 (inlined by) std::__1::__function::__func<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92, std::__1::allocator<seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92>, void ()>::operator()() at /vectorized/llvm/bin/../include/c++/v1/functional:1732
std::__1::__function::__value_func<void ()>::operator()() const at /vectorized/llvm/bin/../include/c++/v1/functional:1885
 (inlined by) std::__1::function<void ()>::operator()() const at /vectorized/llvm/bin/../include/c++/v1/functional:2560
 (inlined by) seastar::posix_thread::start_routine(void*) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/posix.cc:60

jcsp commented 2 years ago

@dotnwat @mmaslankaprv does this look familiar to you guys at all? It came up during shadow indexing testing but looks to be in the normal append path.

dotnwat commented 2 years ago

I haven't seen this before, and segment appender has been pretty solid. But I've been staring at the code for a little while tonight and I'm not completely convinced that it couldn't happen. I need to sit down with morning brain and stare at it again.

@NyaliaLui @jcsp is this reproducible or have you seen it again?

If it does look like it should be a logically impossible scenario then we may benefit from adding more context into the assertion log message to see if we could identify any corrupt memory (e.g. by looking at the available units of the semaphore when the assertion failed).
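
As a rough illustration of that suggestion, here is a minimal sketch of a check that reports the observed unit count alongside the appender state. It assumes nothing about Redpanda's actual code: unit_counter is a hypothetical stand-in for the semaphore behind _prev_head_write, and check_head_write is an invented helper, not an existing function.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical stand-in for the semaphore guarding the head write. It only
// tracks a unit count and is not Seastar's semaphore.
class unit_counter {
public:
    explicit unit_counter(std::size_t units) : _units(units) {}
    std::size_t available_units() const { return _units; }
    void take(std::size_t n) {
        assert(n <= _units);
        _units -= n;
    }
    void give(std::size_t n) { _units += n; }

private:
    std::size_t _units;
};

// Returns an empty string when the invariant holds. Otherwise returns the
// failure text, including the unit count that was actually observed.
std::string check_head_write(
  const unit_counter& prev_head_write, const std::string& appender_state) {
    if (prev_head_write.available_units() == 1) {
        return {};
    }
    std::ostringstream msg;
    msg << "Unexpected pending head write " << appender_state
        << " (available_units: " << prev_head_write.available_units() << ")";
    return msg.str();
}
```

With the count in the message, a value of 0 would point at a head write that is still outstanding, while a large or nonsensical value would be more consistent with corrupted memory.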

jcsp commented 2 years ago

@dotnwat I think this was one of the crashes that wasn't exactly reproducible in isolation, but appeared among a bunch of other instability once we started throwing bad_allocs in places that didn't handle them well. @NyaliaLui is the authority on whether we've seen it more recently though.

NyaliaLui commented 2 years ago

We have not seen this failure recently, and I have been unable to reproduce it locally.