redpanda-data / redpanda


CI Failure (segmentation fault) in `RandomNodeOperationsTest`.`test_node_operations` #9961

Closed by jcsp 1 year ago

jcsp commented 1 year ago

https://buildkite.com/redpanda/vtools/builds/7041#01876cca-dddd-4336-9b28-c06c0a94b31b

Module: rptest.tests.random_node_operations_test
Class:  RandomNodeOperationsTest
Method: test_node_operations
Arguments:
{
  "compacted_topics": true,
  "enable_controller_snapshots": true,
  "enable_failures": true,
  "num_to_upgrade": 0
}

Failing node log: https://ci-artifacts.dev.vectorized.cloud/vtools/7041/01876cca-dddd-4336-9b28-c06c0a94b31b/vbuild/ducktape/results/2023-04-10--001/RandomNodeOperationsTest/test_node_operations/enable_failures=True.num_to_upgrade=0.compacted_topics=True.enable_controller_snapshots=True/440/RedpandaService-0-281473258371968/ip-172-31-5-161/

Segmentation fault on shard 0.
Backtrace:
  0x5395473
  0x53e8c27
  linux-vdso.so.1+0x7db
  0x38dc81f
  0x40c0c0f
  0x40d6a8f
  0x40bee83
  0x53ac78b
  0x53e6d63
  0x53b1a77
  0x53afaf7
  0x52e6347
  0x52e4b43
  0x1def177
  0x564dfdf
  /opt/redpanda/lib/libc.so.6+0x2b1c7
  /opt/redpanda/lib/libc.so.6+0x2b29f
  0x1de9f2f                   
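
The raw frames above need to be split into main-binary addresses and shared-library offsets before they can be symbolized. A minimal sketch of that step, assuming GNU `addr2line` is available and that the exact binary from this build (with debug symbols) is at hand; `path/to/redpanda` is a hypothetical placeholder, not the real install path, and the helper names are made up for illustration:

```python
import re
from typing import List, Optional, Tuple

# A frame is (module, offset); module is None for addresses in the main binary.
Frame = Tuple[Optional[str], int]

_FRAME_RE = re.compile(r"^(?:(?P<module>\S+)\+)?(?P<addr>0x[0-9a-fA-F]+)$")


def parse_backtrace(text: str) -> List[Frame]:
    """Parse lines such as '0x5395473' or '/opt/redpanda/lib/libc.so.6+0x2b1c7'."""
    frames: List[Frame] = []
    for line in text.splitlines():
        match = _FRAME_RE.match(line.strip())
        if match:  # headers such as 'Backtrace:' and blank lines are skipped
            frames.append((match.group("module"), int(match.group("addr"), 16)))
    return frames


def addr2line_command(frames: List[Frame], binary: str) -> List[str]:
    """Build an addr2line invocation for the main-binary frames only."""
    addrs = [hex(addr) for module, addr in frames if module is None]
    # -f: print function names, -C: demangle, -i: expand inlined frames
    return ["addr2line", "-e", binary, "-f", "-C", "-i", *addrs]


sample = """Backtrace:
  0x5395473
  0x53e8c27
  linux-vdso.so.1+0x7db
  /opt/redpanda/lib/libc.so.6+0x2b1c7
"""
frames = parse_backtrace(sample)
command = addr2line_command(frames, "path/to/redpanda")  # hypothetical binary path
```

The resulting `command` list can be passed to `subprocess.run`. Library frames are kept in `frames` with their module name so they can be resolved separately against the matching shared object.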

jcsp commented 1 year ago

Marking this sev/medium: it is a crash, but it happens during shutdown.

jcsp commented 1 year ago

Here's an example on amd64, in a docker run:

https://buildkite.com/redpanda/redpanda/builds/26804#01876ef3-96c3-4342-a42f-5555b2205a7a

https://ci-artifacts.dev.vectorized.cloud/redpanda/26804/01876ef3-96c3-4342-a42f-5555b2205a7a/vbuild/ducktape/results/2023-04-11--001/RandomNodeOperationsTest/test_node_operations/enable_failures=True.num_to_upgrade=0.compacted_topics=False.enable_controller_snapshots=True/109/RedpandaService-0-140297778252384/docker-rp-11/

Segmentation fault on shard 0.
Backtrace:
  0x5f65666
  0x5fc8fd6
  /opt/redpanda_installs/ci/lib/libc.so.6+0x42abf
  0x3f11932
  0x488d5ad
  0x48a7ce1
  0x488b00e
  0x5f810b2
  0x5fc6998
  0x5f874c1
  0x5f84849
  0x5ea8af1
  0x5ea6c0f
  0x1ee2f0e
  0x6299259
  /opt/redpanda_installs/ci/lib/libc.so.6+0x2d58f
  /opt/redpanda_installs/ci/lib/libc.so.6+0x2d648
  0x1edd424

jcsp commented 1 year ago

I decoded the amd64 example; it looks like a crash in the controller snapshot code:

std::__1::__tree_end_node<std::__1::__tree_node_base<void*>*>* std::__1::__tree_next_iter<std::__1::__tree_end_node<std::__1::__tree_node_base<void*>*>*, std::__1::__tree_node_base<void*>*>(std::__1::__tree_node_base<void*>*) at /vectorized/llvm/bin/../include/c++/v1/__tree:?
 (inlined by) std::__1::__tree_const_iterator<std::__1::__value_type<seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true> >, std::__1::__tree_node<std::__1::__value_type<seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true> >, void*>*, long>::operator++() at /vectorized/llvm/bin/../include/c++/v1/__tree:925
 (inlined by) std::__1::__map_const_iterator<std::__1::__tree_const_iterator<std::__1::__value_type<seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true> >, std::__1::__tree_node<std::__1::__value_type<seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true> >, void*>*, long> >::operator++() at /vectorized/llvm/bin/../include/c++/v1/map:919
 (inlined by) void absl::lts_20220623::container_internal::btree<absl::lts_20220623::container_internal::map_params<seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true>, std::__1::less<seastar::basic_sstring<char, unsigned int, 15u, true> >, std::__1::allocator<std::__1::pair<seastar::basic_sstring<char, unsigned int, 15u, true> const, seastar::basic_sstring<char, unsigned int, 15u, true> > >, 256, false> >::insert_iterator_unique<std::__1::__map_const_iterator<std::__1::__tree_const_iterator<std::__1::__value_type<seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true> >, std::__1::__tree_node<std::__1::__value_type<seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true> >, void*>*, long> >, bool>(std::__1::__map_const_iterator<std::__1::__tree_const_iterator<std::__1::__value_type<seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true> >, std::__1::__tree_node<std::__1::__value_type<seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true> >, void*>*, long> >, std::__1::__map_const_iterator<std::__1::__tree_const_iterator<std::__1::__value_type<seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true> >, std::__1::__tree_node<std::__1::__value_type<seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true> >, void*>*, long> >, int) at /vectorized/include/absl/container/internal/btree.h:2186
 (inlined by) void absl::lts_20220623::container_internal::btree_set_container<absl::lts_20220623::container_internal::btree<absl::lts_20220623::container_internal::map_params<seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true>, std::__1::less<seastar::basic_sstring<char, unsigned int, 15u, true> >, std::__1::allocator<std::__1::pair<seastar::basic_sstring<char, unsigned int, 15u, true> const, seastar::basic_sstring<char, unsigned int, 15u, true> > >, 256, false> > >::insert<std::__1::__map_const_iterator<std::__1::__tree_const_iterator<std::__1::__value_type<seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true> >, std::__1::__tree_node<std::__1::__value_type<seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true> >, void*>*, long> > >(std::__1::__map_const_iterator<std::__1::__tree_const_iterator<std::__1::__value_type<seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true> >, std::__1::__tree_node<std::__1::__value_type<seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true> >, void*>*, long> >, std::__1::__map_const_iterator<std::__1::__tree_const_iterator<std::__1::__value_type<seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true> >, std::__1::__tree_node<std::__1::__value_type<seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true> >, void*>*, long> >) at /vectorized/include/absl/container/internal/btree_container.h:323
 (inlined by) cluster::config_manager::fill_snapshot(cluster::controller_snapshot&) const at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-09b9ddfa1ee5022e4-1/redpanda/redpanda/src/v/cluster/config_manager.cc:911
operator() at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-09b9ddfa1ee5022e4-1/redpanda/redpanda/src/v/cluster/controller_stm.cc:80
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<cluster::controller_stm::maybe_make_snapshot(seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>)::$_1::operator()<cluster::config_manager>(cluster::config_manager&) const::{lambda()#1}>(cluster::config_manager&&) at /vectorized/include/seastar/core/future.hh:2149
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<cluster::controller_stm::maybe_make_snapshot(seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>)::$_1::operator()<cluster::config_manager>(cluster::config_manager&) const::{lambda()#1}>(cluster::config_manager&&, seastar::internal::monostate) at /vectorized/include/seastar/core/future.hh:1993
 (inlined by) seastar::future<void> seastar::future<void>::then_impl<cluster::controller_stm::maybe_make_snapshot(seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>)::$_1::operator()<cluster::config_manager>(cluster::config_manager&) const::{lambda()#1}, seastar::future<void> >(cluster::config_manager&&) at /vectorized/include/seastar/core/future.hh:1615
 (inlined by) seastar::internal::future_result<cluster::controller_stm::maybe_make_snapshot(seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>)::$_1::operator()<cluster::config_manager>(cluster::config_manager&) const::{lambda()#1}, void>::future_type seastar::internal::call_then_impl<seastar::future<void> >::run<cluster::controller_stm::maybe_make_snapshot(seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>)::$_1::operator()<cluster::config_manager>(cluster::config_manager&) const::{lambda()#1}>(seastar::future<void>&, cluster::config_manager&&) at /vectorized/include/seastar/core/future.hh:1248
 (inlined by) seastar::future<void> seastar::future<void>::then<cluster::controller_stm::maybe_make_snapshot(seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>)::$_1::operator()<cluster::config_manager>(cluster::config_manager&) const::{lambda()#1}, seastar::future<void> >(cluster::config_manager&&) at /vectorized/include/seastar/core/future.hh:1534
 (inlined by) operator()<cluster::config_manager> at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-09b9ddfa1ee5022e4-1/redpanda/redpanda/src/v/cluster/controller_stm.cc:79
 (inlined by) operator()<cluster::topic_updates_dispatcher &, cluster::security_manager &, cluster::members_manager &, cluster::config_manager &, cluster::feature_backend &, cluster::bootstrap_backend &> at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-09b9ddfa1ee5022e4-1/redpanda/redpanda/src/v/cluster/controller_stm.cc:83
 (inlined by) decltype ((static_cast<cluster::controller_stm::maybe_make_snapshot(seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>)::$_2>({parm#1}))(static_cast<cluster::topic_updates_dispatcher&>({parm#2}), static_cast<cluster::security_manager&>({parm#2}), static_cast<cluster::members_manager&>({parm#2}), static_cast<cluster::config_manager&>({parm#2}), static_cast<cluster::feature_backend&>({parm#2}), static_cast<cluster::bootstrap_backend&>({parm#2}))) std::__1::__invoke_constexpr<cluster::controller_stm::maybe_make_snapshot(seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>)::$_2, cluster::topic_updates_dispatcher&, cluster::security_manager&, cluster::members_manager&, cluster::config_manager&, cluster::feature_backend&, cluster::bootstrap_backend&>(cluster::controller_stm::maybe_make_snapshot(seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>)::$_2&&, cluster::topic_updates_dispatcher&, cluster::security_manager&, cluster::members_manager&, cluster::config_manager&, cluster::feature_backend&, cluster::bootstrap_backend&) at /vectorized/llvm/bin/../include/c++/v1/type_traits:3648
 (inlined by) decltype(auto) std::__1::__apply_tuple_impl<cluster::controller_stm::maybe_make_snapshot(seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>)::$_2, std::__1::tuple<cluster::topic_updates_dispatcher&, cluster::security_manager&, cluster::members_manager&, cluster::config_manager&, cluster::feature_backend&, cluster::bootstrap_backend&>&, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul>(cluster::controller_stm::maybe_make_snapshot(seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>)::$_2&&, std::__1::tuple<cluster::topic_updates_dispatcher&, cluster::security_manager&, cluster::members_manager&, cluster::config_manager&, cluster::feature_backend&, cluster::bootstrap_backend&>&, std::__1::__tuple_indices<0ul, 1ul, 2ul, 3ul, 4ul, 5ul>) at /vectorized/llvm/bin/../include/c++/v1/tuple:1595
 (inlined by) decltype(auto) std::__1::apply<cluster::controller_stm::maybe_make_snapshot(seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>)::$_2, std::__1::tuple<cluster::topic_updates_dispatcher&, cluster::security_manager&, cluster::members_manager&, cluster::config_manager&, cluster::feature_backend&, cluster::bootstrap_backend&>&>(cluster::controller_stm::maybe_make_snapshot(seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>)::$_2&&, std::__1::tuple<cluster::topic_updates_dispatcher&, cluster::security_manager&, cluster::members_manager&, cluster::config_manager&, cluster::feature_backend&, cluster::bootstrap_backend&>&) at /vectorized/llvm/bin/../include/c++/v1/tuple:1604
 (inlined by) cluster::controller_stm::maybe_make_snapshot(seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>) at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-09b9ddfa1ee5022e4-1/redpanda/redpanda/src/v/cluster/controller_stm.cc:82
raft::mux_state_machine<cluster::topic_updates_dispatcher, cluster::security_manager, cluster::members_manager, cluster::config_manager, cluster::feature_backend, cluster::bootstrap_backend>::maybe_write_snapshot() at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-09b9ddfa1ee5022e4-1/redpanda/redpanda/src/v/raft/mux_state_machine.h:481
operator() at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-09b9ddfa1ee5022e4-1/redpanda/redpanda/src/v/cluster/controller_stm.cc:45
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<cluster::controller_stm::snapshot_timer_callback()::$_0>(cluster::controller_stm::snapshot_timer_callback()::$_0&&) at /vectorized/include/seastar/core/future.hh:2149
 (inlined by) auto seastar::futurize_invoke<cluster::controller_stm::snapshot_timer_callback()::$_0>(cluster::controller_stm::snapshot_timer_callback()::$_0&&) at /vectorized/include/seastar/core/future.hh:2180
 (inlined by) auto seastar::internal::invoke_func_with_gate<cluster::controller_stm::snapshot_timer_callback()::$_0>(seastar::gate&, cluster::controller_stm::snapshot_timer_callback()::$_0&&) at /vectorized/include/seastar/core/gate.hh:221
 (inlined by) auto seastar::try_with_gate<cluster::controller_stm::snapshot_timer_callback()::$_0>(seastar::gate&, cluster::controller_stm::snapshot_timer_callback()::$_0&&) at /vectorized/include/seastar/core/gate.hh:261
 (inlined by) auto ssx::spawn_with_gate_then<cluster::controller_stm::snapshot_timer_callback()::$_0>(seastar::gate&, cluster::controller_stm::snapshot_timer_callback()::$_0&&) at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-09b9ddfa1ee5022e4-1/redpanda/redpanda/src/v/ssx/future-util.h:282
 (inlined by) cluster::controller_stm::snapshot_timer_callback() at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-09b9ddfa1ee5022e4-1/redpanda/redpanda/src/v/cluster/controller_stm.cc:44
seastar::noncopyable_function<void ()>::operator()() const at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/util/noncopyable_function.hh:209
 (inlined by) void seastar::reactor::complete_timers<seastar::timer_set<seastar::timer<seastar::lowres_clock>, &seastar::timer<seastar::lowres_clock>::_link>, boost::intrusive::list<seastar::timer<seastar::lowres_clock>, boost::intrusive::member_hook<seastar::timer<seastar::lowres_clock>, boost::intrusive::list_member_hook<>, &seastar::timer<seastar::lowres_clock>::_link> >, seastar::reactor::do_expire_lowres_timers()::$_71>(seastar::timer_set<seastar::timer<seastar::lowres_clock>, &seastar::timer<seastar::lowres_clock>::_link>&, boost::intrusive::list<seastar::timer<seastar::lowres_clock>, boost::intrusive::member_hook<seastar::timer<seastar::lowres_clock>, boost::intrusive::list_member_hook<>, &seastar::timer<seastar::lowres_clock>::_link> >&, seastar::reactor::do_expire_lowres_timers()::$_71&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:1368
 (inlined by) seastar::reactor::do_expire_lowres_timers() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2377
seastar::reactor::poll_once() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3003
 (inlined by) operator() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2900
 (inlined by) decltype ((static_cast<seastar::reactor::do_run()::$_81&>({parm#1}))()) std::__1::__invoke<seastar::reactor::do_run()::$_81&>(seastar::reactor::do_run()::$_81&) at /vectorized/llvm/bin/../include/c++/v1/type_traits:3640
 (inlined by) bool std::__1::__invoke_void_return_wrapper<bool, false>::__call<seastar::reactor::do_run()::$_81&>(seastar::reactor::do_run()::$_81&) at /vectorized/llvm/bin/../include/c++/v1/__functional/invoke.h:30
 (inlined by) std::__1::__function::__alloc_func<seastar::reactor::do_run()::$_81, std::__1::allocator<seastar::reactor::do_run()::$_81>, bool ()>::operator()() at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:180
 (inlined by) std::__1::__function::__func<seastar::reactor::do_run()::$_81, std::__1::allocator<seastar::reactor::do_run()::$_81>, bool ()>::operator()() at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:354
std::__1::__function::__value_func<bool ()>::operator()() const at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:507
 (inlined by) std::__1::function<bool ()>::operator()() const at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:1184
 (inlined by) seastar::reactor::do_run() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2927
seastar::reactor::run() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2789
seastar::app_template::run_deprecated(int, char**, std::__1::function<void ()>&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/app-template.cc:265
seastar::app_template::run(int, char**, std::__1::function<seastar::future<int> ()>&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/app-template.cc:156
application::run(int, char**) at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-09b9ddfa1ee5022e4-1/redpanda/redpanda/src/v/redpanda/application.cc:329
main at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-09b9ddfa1ee5022e4-1/redpanda/redpanda/src/v/redpanda/main.cc:22
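
Most of the decoded trace above is standard-library and Seastar plumbing. A small sketch for reducing it to the frames that live in the Redpanda source tree, assuming the `function at path:line` layout shown above and that project files sit under a `/src/v/` directory (true for the paths in this trace); the function name and the shortened sample paths are illustrative only:

```python
from typing import List, Tuple


def project_frames(decoded: str, marker: str = "/src/v/") -> List[Tuple[str, str]]:
    """Return (function, 'relative/path:line') for frames located under `marker`."""
    result = []
    for line in decoded.splitlines():
        text = line.strip()
        if text.startswith("(inlined by) "):
            text = text[len("(inlined by) "):]
        # The source location follows the last ' at ' on the line.
        function, sep, location = text.rpartition(" at ")
        if not sep or marker not in location:
            continue
        result.append((function, location.split(marker, 1)[1]))
    return result


# Paths are shortened here; the real ones carry a long build-agent prefix.
sample = """\
 (inlined by) cluster::config_manager::fill_snapshot(cluster::controller_snapshot&) const at /builds/redpanda/src/v/cluster/config_manager.cc:911
operator() at /builds/redpanda/src/v/cluster/controller_stm.cc:80
seastar::reactor::run() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2789
"""
for function, location in project_frames(sample):
    print(f"{location}  {function}")
```

Applied to the full trace, this keeps frames such as `config_manager.cc:911`, `controller_stm.cc:80`, and `mux_state_machine.h:481`, which is the path from the snapshot timer callback into `fill_snapshot`.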

ztlpn commented 1 year ago

This should be fixed by https://github.com/redpanda-data/redpanda/pull/9908