redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com
9.65k stars 589 forks source link

[archival] Assertion when losing leadership and uploading to cloud storage #10242

Closed rockwotj closed 1 year ago

rockwotj commented 1 year ago

I reproduced this today in cdt.

DEBUG 2023-04-20 18:47:09,037 [shard 0] archival - [fiber792~8|0|30000ms kafka/t-repeater/1354] - ntp_archiver_service.cc:911 - Uploading segment {source segme
nt offsets: {term:1, base_offset:1, committed_offset:19, dirty_offset:19}, exposed_name: {1-1-v1.log}, starting_offset: 1, file_offset: 0, content_length: 6246
57, final_offset: 19, final_file_offset: 624657, term: 1, source names: {/var/lib/redpanda/data/kafka/t-repeater/1354_18/1-1-v1.log}} to {"d73894d5/kafka/t-rep
eater/1354_18/1-19-624657-1-v1.log.2"}
DEBUG 2023-04-20 18:47:09,037 [shard 0] cloud_storage - [fiber792~8~0|1|30000ms] - remote.cc:411 - Uploading segment to path {"d73894d5/kafka/t-repeater/1354_1
8/1-19-624657-1-v1.log.2"}, length 624657
DEBUG 2023-04-20 18:47:09,037 [shard 0] archival - [fiber792 kafka/t-repeater/1354] - ntp_archiver_service.cc:1161 - creating remote segment index: d73894d5/ka
fka/t-repeater/1354_18/1-19-624657-1-v1.log.2.index
...
WARN  2023-04-20 18:47:19,615 [shard 0] cloud_storage - [fiber792~8~0|1|19422ms] - remote.cc:426 - lost leadership or term changed during upload, current leade
rship status: false, current term: 2, original term: 2: cancelled uploading {"d73894d5/kafka/t-repeater/1354_18/1-19-624657-1-v1.log.2"} to {panda-bucket-8e193
03a-dfab-11ed-90f8-372391025d12}
...
DEBUG 2023-04-20 18:47:19,624 [shard 0] client_pool - client_pool.cc:317 - releasing a client, pool size: 0, capacity: 20
WARN  2023-04-20 18:47:19,624 [shard 0] cloud_storage - [fiber216~11~0|1|19224ms] - remote.cc:500 - Uploading segment {"b78d29d4/kafka/t-repeater/1610_18/33-53-690406-1-v1.log.2"} to {panda-bucket-8e19303a-dfab-11ed-90f8-372391025d12}, {failed}, segment not uploaded
DEBUG 2023-04-20 18:47:19,624 [shard 0] http - iobuf_body.cc:82 - reader - finish called
INFO  2023-04-20 18:47:19,624 [shard 0] storage - segment.cc:735 - Creating new segment /var/lib/redpanda/data/kafka/t-repeater/454_18/86-2-v1.log
DEBUG 2023-04-20 18:47:19,625 [shard 0] storage - segment_reader.cc:130 - Opening segment file /var/lib/redpanda/data/kafka/t-repeater/454_18/86-2-v1.log
DEBUG 2023-04-20 18:47:19,625 [shard 2] tx - [{kafka/t-repeater/1832}] - rm_stm.cc:2860 - Removing abort indexes 0 with offset < 0
INFO  2023-04-20 18:47:19,625 [shard 0] storage - segment.cc:735 - Creating new segment /var/lib/redpanda/data/kafka/t-repeater/502_18/84-2-v1.log
DEBUG 2023-04-20 18:47:19,625 [shard 0] storage - segment_reader.cc:130 - Opening segment file /var/lib/redpanda/data/kafka/t-repeater/502_18/84-2-v1.log
DEBUG 2023-04-20 18:47:19,625 [shard 2] cluster - ntp: {kafka/t-repeater/1832} - archival_metadata_stm.cc:731 - creating snapshot at offset: 85, remote start_offset: -9223372036854775808, last_offset: 0
DEBUG 2023-04-20 18:47:19,626 [shard 1] dns_resolver - Query name s3.us-west-2.amazonaws.com (INET)
INFO  2023-04-20 18:47:19,626 [shard 1] kafka - server.cc:159 - Disconnected 172.31.1.168:40024 (shutting down, Connection reset by peer)
DEBUG 2023-04-20 18:47:19,626 [shard 2] tx - [{kafka/t-repeater/1552}] - rm_stm.cc:2860 - Removing abort indexes 0 with offset < 0
DEBUG 2023-04-20 18:47:19,627 [shard 2] cluster - ntp: {kafka/t-repeater/1552} - archival_metadata_stm.cc:731 - creating snapshot at offset: 92, remote start_offset: -9223372036854775808, last_offset: 0
DEBUG 2023-04-20 18:47:19,627 [shard 2] tx - [{kafka/t-repeater/488}] - rm_stm.cc:2860 - Removing abort indexes 0 with offset < 0
DEBUG 2023-04-20 18:47:19,627 [shard 2] cluster - ntp: {kafka/t-repeater/488} - archival_metadata_stm.cc:731 - creating snapshot at offset: 92, remote start_offset: -9223372036854775808, last_offset: 0
DEBUG 2023-04-20 18:47:19,627 [shard 0] tx - [{kafka/t-repeater/938}] - rm_stm.cc:2860 - Removing abort indexes 0 with offset < 0
DEBUG 2023-04-20 18:47:19,627 [shard 0] cluster - ntp: {kafka/t-repeater/938} - archival_metadata_stm.cc:731 - creating snapshot at offset: 98, remote start_offset: -9223372036854775808, last_offset: 0
WARN  2023-04-20 18:47:19,627 [shard 0] archival - [fiber792 kafka/t-repeater/1354] - ntp_archiver_service.cc:1050 - skipping segment index upload to d73894d5/kafka/t-repeater/1354_18/1-19-624657-1-v1.log.2.index because segment upload was unsuccessful: {"d73894d5/kafka/t-repeater/1354_18/1-19-624657-1-v1.log.2"}
ERROR 2023-04-20 18:47:19,627 [shard 0] assert - Assert failure: (/home/rockwood/Workspace/redpanda/src/v/storage/segment_reader.cc:241) '_parent == nullptr' Must close before destroying
DEBUG 2023-04-20 18:47:19,627 [shard 1] storage - readers_cache.cc:225 - {kafka/t-repeater/1255} - evicting reader from cache, range [70,102]
ERROR 2023-04-20 18:47:19,627 [shard 0] assert - Backtrace below:
0x62841b4 0x54c925e 0x54d6fe0 0x53ae51d 0x5367acf 0x53aa38a 0x6005def 0x6009a67 0x6006e39 0x5f1c821 0x5f1a93f 0x1f1ad1e 0x63388e9 /opt/redpanda/lib/libc.so.6+0x29d8f /opt/redpanda/lib/libc.so.6+0x29e3f 0x1f15204
   --------
   seastar::internal::coroutine_traits_base<cloud_storage::upload_result>::promise_type

Here's the decoded stacktrace:


[Backtrace #0]
void seastar::backtrace<seastar::current_backtrace_tasklocal()::$_3>(seastar::current_backtrace_tasklocal()::$_3&&) at /home/rockwood/Workspace/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/include/seastar/util/backtrace.hh:60
 (inlined by) seastar::current_backtrace_tasklocal() at /home/rockwood/Workspace/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/util/backtrace.cc:86
 (inlined by) seastar::current_tasktrace() at /home/rockwood/Workspace/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/util/backtrace.cc:137
 (inlined by) seastar::current_backtrace() at /home/rockwood/Workspace/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/util/backtrace.cc:170
storage::segment_reader_handle::~segment_reader_handle() at /home/rockwood/Workspace/redpanda/src/v/storage/segment_reader.cc:241
std::__1::__optional_destruct_base<storage::segment_reader_handle, false>::~__optional_destruct_base() at /home/rockwood/Workspace/redpanda/vbuild/llvm/install/bin/../include/c++/v1/optional:238
 (inlined by) storage::concat_segment_data_source_impl::~concat_segment_data_source_impl() at /home/rockwood/Workspace/redpanda/src/v/storage/segment_reader.h:190
 (inlined by) storage::concat_segment_data_source_impl::~concat_segment_data_source_impl() at /home/rockwood/Workspace/redpanda/src/v/storage/segment_reader.h:190
seastar::lw_shared_ptr_no_esft<detail::input_stream_fanout<char> >::~lw_shared_ptr_no_esft() at /home/rockwood/Workspace/redpanda/vbuild/release/clang/rp_deps_install/include/seastar/core/shared_ptr.hh:164
 (inlined by) seastar::internal::lw_shared_ptr_accessors_no_esft<detail::input_stream_fanout<char> >::dispose(seastar::lw_shared_ptr_counter_base*) at /home/rockwood/Workspace/redpanda/vbuild/release/clang/rp_deps_install/include/seastar/core/shared_ptr.hh:220
 (inlined by) seastar::lw_shared_ptr<detail::input_stream_fanout<char> >::~lw_shared_ptr() at /home/rockwood/Workspace/redpanda/vbuild/release/clang/rp_deps_install/include/seastar/core/shared_ptr.hh:317
 (inlined by) detail::fanout_data_source<char>::~fanout_data_source() at /home/rockwood/Workspace/redpanda/src/v/utils/stream_utils.h:222
 (inlined by) detail::fanout_data_source<char>::~fanout_data_source() at /home/rockwood/Workspace/redpanda/src/v/utils/stream_utils.h:222
std::__1::default_delete<seastar::data_source_impl>::operator()(seastar::data_source_impl*) const at /home/rockwood/Workspace/redpanda/vbuild/llvm/install/bin/../include/c++/v1/__memory/unique_ptr.h:54
 (inlined by) std::__1::unique_ptr<seastar::data_source_impl, std::__1::default_delete<seastar::data_source_impl> >::reset(seastar::data_source_impl*) at /home/rockwood/Workspace/redpanda/vbuild/llvm/install/bin/../include/c++/v1/__memory/unique_ptr.h:315
 (inlined by) std::__1::unique_ptr<seastar::data_source_impl, std::__1::default_delete<seastar::data_source_impl> >::~unique_ptr() at /home/rockwood/Workspace/redpanda/vbuild/llvm/install/bin/../include/c++/v1/__memory/unique_ptr.h:269
 (inlined by) seastar::data_source::~data_source() at /home/rockwood/Workspace/redpanda/vbuild/release/clang/rp_deps_install/include/seastar/core/iostream.hh:63
 (inlined by) seastar::input_stream<char>::~input_stream() at /home/rockwood/Workspace/redpanda/vbuild/release/clang/rp_deps_install/include/seastar/core/iostream.hh:262
 (inlined by) std::__1::pair<seastar::input_stream<char>, seastar::input_stream<char> >::~pair() at /home/rockwood/Workspace/redpanda/vbuild/llvm/install/bin/../include/c++/v1/__utility/pair.h:40
 (inlined by) archival::ntp_archiver::upload_segment(archival::upload_candidate, std::__1::vector<seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>, std::__1::allocator<seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock> > >, std::__1::optional<std::__1::reference_wrapper<retry_chain_node> >) (.resume) at /home/rockwood/Workspace/redpanda/src/v/archival/ntp_archiver_service.cc:1086
std::__1::coroutine_handle<seastar::internal::coroutine_traits_base<cloud_storage::upload_result>::promise_type>::resume() const at /home/rockwood/Workspace/redpanda/vbuild/llvm/install/bin/../include/c++/v1/__coroutine/coroutine_handle.h:168
 (inlined by) seastar::internal::coroutine_traits_base<cloud_storage::upload_result>::promise_type::run_and_dispose() at /home/rockwood/Workspace/redpanda/vbuild/release/clang/rp_deps_install/include/seastar/core/coroutine.hh:78
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /home/rockwood/Workspace/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2557
 (inlined by) seastar::reactor::run_some_tasks() at /home/rockwood/Workspace/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3020
seastar::reactor::do_run() at /home/rockwood/Workspace/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3189
seastar::reactor::run() at /home/rockwood/Workspace/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3072
seastar::app_template::run_deprecated(int, char**, std::__1::function<void ()>&&) at /home/rockwood/Workspace/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/app-template.cc:265
seastar::app_template::run(int, char**, std::__1::function<seastar::future<int> ()>&&) at /home/rockwood/Workspace/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/app-template.cc:156
application::run(int, char**) at /home/rockwood/Workspace/redpanda/src/v/redpanda/application.cc:330
main at /home/rockwood/Workspace/redpanda/src/v/redpanda/main.cc:22
LLVMSymbolizer: error reading file: No such file or directory
?? at ??:0
?? at ??:0
_start at ??:0

Should I reopen this or create a new issue?

Originally posted by @rockwotj in https://github.com/redpanda-data/redpanda/issues/8427#issuecomment-1516905092

rockwotj commented 1 year ago

Sweet I found the issue. I need to figure out how to write a test, but I'll raise a PR and seek some help on said test.

piyushredpanda commented 1 year ago

Nice work, @rockwotj !