redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

failure in `TxFeatureFlagTest.test_disabling_transactions_after_they_being_used` #2626

Closed twmb closed 2 years ago

twmb commented 2 years ago

https://buildkite.com/vectorized/redpanda/builds/3237#cc8deec5-9ad8-4eef-86af-c440ae06f576

test_id:    rptest.tests.tx_feature_flag_test.TxFeatureFlagTest.test_disabling_transactions_after_they_being_used
--
  | status:     FAIL
  | run time:   1 minute 24.397 seconds
  |  
  |  
  | CalledProcessError(1, ['kcat', '-b', 'docker_n_35:9092,docker_n_33:9092,docker_n_34:9092', '-P', '-t', 'tx-topic', '-X', 'transactional.id=test-tx-id'])
  | Traceback (most recent call last):
  | File "/usr/local/lib/python3.8/dist-packages/ducktape/tests/runner_client.py", line 135, in run
  | data = self.run_test()
  | File "/usr/local/lib/python3.8/dist-packages/ducktape/tests/runner_client.py", line 215, in run_test
  | return self.test_context.function(self.test)
  | File "/root/tests/rptest/tests/tx_feature_flag_test.py", line 41, in test_disabling_transactions_after_they_being_used
  | kcat.produce_one(tx_topic.name, msg='test-msg', tx_id='test-tx-id')
  | File "/root/tests/rptest/clients/kafka_cat.py", line 37, in produce_one
  | return self._cmd_raw(cmd, input=f"{msg}\n")
  | File "/root/tests/rptest/clients/kafka_cat.py", line 48, in _cmd_raw
  | res = subprocess.check_output(
  | File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
  | return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  | File "/usr/lib/python3.8/subprocess.py", line 516, in run
  | raise CalledProcessError(retcode, process.args,
  | subprocess.CalledProcessError: Command '['kcat', '-b', 'docker_n_35:9092,docker_n_33:9092,docker_n_34:9092', '-P', '-t', 'tx-topic', '-X', 'transactional.id=test-tx-id']' returned non-zero exit status 1.
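
The traceback above only reports the exit status, because `subprocess.check_output` does not include the child's stderr in the exception message. A minimal sketch of a wrapper that surfaces stderr on failure follows. The helper name `run_capturing_stderr` is hypothetical and is not part of `rptest`; the demo spawns a Python child process rather than `kcat` so that it runs anywhere.

```python
import subprocess
import sys


def run_capturing_stderr(cmd, input_text=None):
    """Run cmd and return its stdout; on a non-zero exit, raise an
    error whose message includes the child's stderr.

    Hypothetical helper for illustration, not the rptest implementation.
    """
    res = subprocess.run(cmd, input=input_text, capture_output=True, text=True)
    if res.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with status {res.returncode}: {res.stderr.strip()}"
        )
    return res.stdout


if __name__ == "__main__":
    # Stand-in for a failing client: a child that writes to stderr and exits 1.
    failing = [sys.executable, "-c",
               "import sys; sys.stderr.write('broker unavailable'); sys.exit(1)"]
    try:
        run_capturing_stderr(failing)
    except RuntimeError as err:
        print(err)
```

With a wrapper like this, the client's own error text would appear in the test report next to the exit status, which can shorten the path from a failed produce to the broker-side cause.
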
jcsp commented 2 years ago

This is a real bug. One of the redpanda nodes (docker_n_33) crashed:

TRACE 2021-10-12 18:01:14,081 [shard 2] cluster - rm_partition_frontend.cc:238 - processing name:begin_tx, ntp:{kafka/tx-topic/0}, pid:{producer_identity: id=1, epoch=0}, tx_seq:1
WARN  2021-10-12 18:01:14,081 [shard 1] cluster - rm_partition_frontend.cc:282 - rm_stm::begin_tx({kafka/tx-topic/0},...) failed with cluster::tx_errc:1
*** stack smashing detected ***: terminated
jcsp commented 2 years ago

This crash is happening on the very first produce to the newly created topic, so probably not related to https://github.com/vectorizedio/redpanda/issues/2602 (the issue for which this test was added).

rystsov commented 2 years ago

I think I found a potential root cause. Around rm_partition_frontend.cc:282 we invoke tx_helpers::sleep_abortable, which in turn calls ss::sleep_abortable. The latter generates noise in the log but doesn't propagate an exception to the upper layer:

TRACE 2021-10-13 17:31:02,269 [shard 2] exception - Throw exception at:
0x34380f4 0x311187a 0x2a1780e0bda7 0xff6656 0x137f977 0x31b2866 0x3237ece 0x323986b 0x31b5a39 0x320551d 0x31631ff /lib/x86_64-linux-gnu/libpthread.so.0+0x944f /lib/x86_64-linux-gnu/libc.so.6+0x117d52
TRACE 2021-10-13 17:31:02,269 [shard 2] exception - Throw exception at:
0x34380f4 0x311187a 0x2a1780e0c1d2 /home/denis/vectorized/redpanda/vbuild/release/clang/rp_deps_install/lib/libc++.so.1+0x47e38 0x31266cf 0x31b2214 0x31b56d7 0x320551d 0x31631ff /lib/x86_64-linux-gnu/libpthread.so.0+0x944f /lib/x86_64-linux-gnu/libc.so.6+0x117d52
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void> seastar::future<void>::handle_exception<seastar::future<void> seastar::sleep_abortable<std::__1::chrono::steady_clock>(std::__1::chrono::steady_clock::duration)::'lambda'(std::exception_ptr)>(std::__1::chrono::steady_clock&&)::'lambda'(std::__1::chrono::steady_clock&&), seastar::futurize<std::__1::chrono::steady_clock>::type seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void> seastar::future<void>::handle_exception<seastar::future<void> seastar::sleep_abortable<std::__1::chrono::steady_clock>(std::__1::chrono::steady_clock::duration)::'lambda'(std::exception_ptr)>(std::__1::chrono::steady_clock&&)::'lambda'(std::__1::chrono::steady_clock&&)>(seastar::future<void> seastar::future<void>::handle_exception<seastar::future<void> seastar::sleep_abortable<std::__1::chrono::steady_clock>(std::__1::chrono::steady_clock::duration)::'lambda'(std::exception_ptr)>(std::__1::chrono::steady_clock&&)::'lambda'(std::__1::chrono::steady_clock&&)&&)::'lambda'(seastar::internal::promise_base_with_type<void>&&, seastar::future<void> seastar::future<void>::handle_exception<seastar::future<void> seastar::sleep_abortable<std::__1::chrono::steady_clock>(std::__1::chrono::steady_clock::duration)::'lambda'(std::exception_ptr)>(std::__1::chrono::steady_clock&&)::'lambda'(std::__1::chrono::steady_clock&&)&, seastar::future_state<seastar::internal::monostate>&&), void>
   --------
   seastar::internal::coroutine_traits_base<bool>::promise_type
   --------
   seastar::internal::coroutine_traits_base<cluster::begin_tx_reply>::promise_type
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<std::__1::vector<cluster::begin_tx_reply, std::__1::allocator<cluster::begin_tx_reply> > >, seastar::internal::extract_values_from_futures_vector<seastar::future<cluster::begin_tx_reply> >::future_type seastar::internal::complete_when_all<seastar::internal::extract_values_from_futures_vector<seastar::future<cluster::begin_tx_reply> >, seastar::future<cluster::begin_tx_reply> >(std::__1::vector<seastar::future<cluster::begin_tx_reply>, std::__1::allocator<seastar::future<cluster::begin_tx_reply> > >&&, std::__1::vector<seastar::future<cluster::begin_tx_reply>, std::__1::allocator<seastar::future<cluster::begin_tx_reply> > >::iterator)::'lambda'(seastar::internal::extract_values_from_futures_vector<seastar::future<cluster::begin_tx_reply> >), seastar::futurize<seastar::internal::extract_values_from_futures_vector<seastar::future<cluster::begin_tx_reply> > >::type seastar::future<cluster::begin_tx_reply>::then_wrapped_nrvo<seastar::future<std::__1::vector<cluster::begin_tx_reply, std::__1::allocator<cluster::begin_tx_reply> > >, seastar::internal::extract_values_from_futures_vector<seastar::future<cluster::begin_tx_reply> >::future_type seastar::internal::complete_when_all<seastar::internal::extract_values_from_futures_vector<seastar::future<cluster::begin_tx_reply> >, seastar::future<cluster::begin_tx_reply> >(std::__1::vector<seastar::future<cluster::begin_tx_reply>, std::__1::allocator<seastar::future<cluster::begin_tx_reply> > >&&, std::__1::vector<seastar::future<cluster::begin_tx_reply>, std::__1::allocator<seastar::future<cluster::begin_tx_reply> > >::iterator)::'lambda'(seastar::internal::extract_values_from_futures_vector<seastar::future<cluster::begin_tx_reply> >)>(seastar::future<cluster::begin_tx_reply>&&)::'lambda'(seastar::internal::promise_base_with_type<std::__1::vector<cluster::begin_tx_reply, std::__1::allocator<cluster::begin_tx_reply> > >&&, 
seastar::internal::extract_values_from_futures_vector<seastar::future<cluster::begin_tx_reply> >::future_type seastar::internal::complete_when_all<seastar::internal::extract_values_from_futures_vector<seastar::future<cluster::begin_tx_reply> >, seastar::future<cluster::begin_tx_reply> >(std::__1::vector<seastar::future<cluster::begin_tx_reply>, std::__1::allocator<seastar::future<cluster::begin_tx_reply> > >&&, std::__1::vector<seastar::future<cluster::begin_tx_reply>, std::__1::allocator<seastar::future<cluster::begin_tx_reply> > >::iterator)::'lambda'(seastar::internal::extract_values_from_futures_vector<seastar::future<cluster::begin_tx_reply> >)&, seastar::future_state<cluster::begin_tx_reply>&&), cluster::begin_tx_reply>
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<cluster::add_paritions_tx_reply>, cluster::tx_gateway_frontend::do_add_partition_to_tx(cluster::tm_transaction, seastar::shared_ptr<cluster::tm_stm>, cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_7, seastar::future<cluster::add_paritions_tx_reply> seastar::future<std::__1::vector<cluster::begin_tx_reply, std::__1::allocator<cluster::begin_tx_reply> > >::then_impl_nrvo<cluster::tx_gateway_frontend::do_add_partition_to_tx(cluster::tm_transaction, seastar::shared_ptr<cluster::tm_stm>, cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_7, seastar::future<cluster::add_paritions_tx_reply> >(cluster::tx_gateway_frontend::do_add_partition_to_tx(cluster::tm_transaction, seastar::shared_ptr<cluster::tm_stm>, cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_7&&)::'lambda'(seastar::internal::promise_base_with_type<cluster::add_paritions_tx_reply>&&, cluster::tx_gateway_frontend::do_add_partition_to_tx(cluster::tm_transaction, seastar::shared_ptr<cluster::tm_stm>, cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_7&, seastar::future_state<std::__1::vector<cluster::begin_tx_reply, std::__1::allocator<cluster::begin_tx_reply> > >&&), std::__1::vector<cluster::begin_tx_reply, std::__1::allocator<cluster::begin_tx_reply> > >
   --------
   seastar::internal::coroutine_traits_base<cluster::add_paritions_tx_reply>::promise_type
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<cluster::add_paritions_tx_reply>, seastar::future<cluster::add_paritions_tx_reply>::finally_body<auto seastar::futurize<std::__1::result_of<cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5::operator()(cluster::tx_gateway_frontend&) const::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>) const::'lambda'() ()>::type>::type seastar::with_semaphore<seastar::semaphore_default_exception_factory, cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5::operator()(cluster::tx_gateway_frontend&) const::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>) const::'lambda'(), std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5::operator()(cluster::tx_gateway_frontend&) const::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>) const::'lambda'()&&)::'lambda'(seastar::semaphore_default_exception_factory)::operator()<seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock> 
>(seastar::semaphore_default_exception_factory)::'lambda'(), false>, seastar::futurize<seastar::semaphore_default_exception_factory>::type seastar::future<cluster::add_paritions_tx_reply>::then_wrapped_nrvo<seastar::future<cluster::add_paritions_tx_reply>, seastar::future<cluster::add_paritions_tx_reply>::finally_body<auto seastar::futurize<std::__1::result_of<cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5::operator()(cluster::tx_gateway_frontend&) const::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>) const::'lambda'() ()>::type>::type seastar::with_semaphore<seastar::semaphore_default_exception_factory, cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5::operator()(cluster::tx_gateway_frontend&) const::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>) const::'lambda'(), std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5::operator()(cluster::tx_gateway_frontend&) const::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>) 
const::'lambda'()&&)::'lambda'(seastar::semaphore_default_exception_factory)::operator()<seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock> >(seastar::semaphore_default_exception_factory)::'lambda'(), false> >(cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5::operator()(cluster::tx_gateway_frontend&) const::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>) const::'lambda'()&&)::'lambda'(seastar::internal::promise_base_with_type<cluster::add_paritions_tx_reply>&&, seastar::future<cluster::add_paritions_tx_reply>::finally_body<auto seastar::futurize<std::__1::result_of<cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5::operator()(cluster::tx_gateway_frontend&) const::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>) const::'lambda'() ()>::type>::type seastar::with_semaphore<seastar::semaphore_default_exception_factory, cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5::operator()(cluster::tx_gateway_frontend&) const::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>) const::'lambda'(), 
std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5::operator()(cluster::tx_gateway_frontend&) const::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>) const::'lambda'()&&)::'lambda'(seastar::semaphore_default_exception_factory)::operator()<seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock> >(seastar::semaphore_default_exception_factory)::'lambda'(), false>&, seastar::future_state<cluster::add_paritions_tx_reply>&&), cluster::add_paritions_tx_reply>
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<cluster::add_paritions_tx_reply>, seastar::future<cluster::add_paritions_tx_reply>::finally_body<cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5::operator()(cluster::tx_gateway_frontend&) const::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>) const::'lambda0'(), false>, seastar::futurize<seastar::future<cluster::add_paritions_tx_reply> >::type seastar::future<cluster::add_paritions_tx_reply>::then_wrapped_nrvo<seastar::future<cluster::add_paritions_tx_reply>, seastar::future<cluster::add_paritions_tx_reply>::finally_body<cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5::operator()(cluster::tx_gateway_frontend&) const::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>) const::'lambda0'(), false> >(seastar::future<cluster::add_paritions_tx_reply>::finally_body<cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5::operator()(cluster::tx_gateway_frontend&) const::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>) const::'lambda0'(), false>&&)::'lambda'(seastar::internal::promise_base_with_type<cluster::add_paritions_tx_reply>&&, 
seastar::future<cluster::add_paritions_tx_reply>::finally_body<cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5::operator()(cluster::tx_gateway_frontend&) const::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>) const::'lambda0'(), false>&, seastar::future_state<cluster::add_paritions_tx_reply>&&), cluster::add_paritions_tx_reply>
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::smp_message_queue::async_work_item<seastar::future<cluster::add_paritions_tx_reply> seastar::sharded<cluster::tx_gateway_frontend>::invoke_on<cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5, seastar::future<cluster::add_paritions_tx_reply> >(unsigned int, seastar::smp_submit_to_options, cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5&&)::'lambda'()>::run_and_dispose()::'lambda'(cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5), seastar::futurize<cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5>::type seastar::future<cluster::add_paritions_tx_reply>::then_wrapped_nrvo<void, seastar::smp_message_queue::async_work_item<seastar::future<cluster::add_paritions_tx_reply> seastar::sharded<cluster::tx_gateway_frontend>::invoke_on<cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5, seastar::future<cluster::add_paritions_tx_reply> >(unsigned int, seastar::smp_submit_to_options, cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5&&)::'lambda'()>::run_and_dispose()::'lambda'(cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5)>(seastar::smp_message_queue::async_work_item<seastar::future<cluster::add_paritions_tx_reply> 
seastar::sharded<cluster::tx_gateway_frontend>::invoke_on<cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5, seastar::future<cluster::add_paritions_tx_reply> >(unsigned int, seastar::smp_submit_to_options, cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5&&)::'lambda'()>::run_and_dispose()::'lambda'(cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5)&&)::'lambda'(seastar::internal::promise_base_with_type<void>&&, seastar::smp_message_queue::async_work_item<seastar::future<cluster::add_paritions_tx_reply> seastar::sharded<cluster::tx_gateway_frontend>::invoke_on<cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5, seastar::future<cluster::add_paritions_tx_reply> >(unsigned int, seastar::smp_submit_to_options, cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5&&)::'lambda'()>::run_and_dispose()::'lambda'(cluster::tx_gateway_frontend::add_partition_to_tx(cluster::add_paritions_tx_request, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >)::$_5)&, seastar::future_state<cluster::add_paritions_tx_reply>&&), cluster::add_paritions_tx_reply>

However, when we pass an abort_source to ss::sleep_abortable, the noise disappears. So my hunch is that this internal Seastar error causes *** stack smashing detected ***: terminated, and to fix it we should use the ss::sleep_abortable overload that takes an abort_source.

twmb commented 2 years ago

https://buildkite.com/vectorized/redpanda/builds/3308#e1b94ef1-9168-4026-8ff2-20926750cded (ran without 2647, though)

jcsp commented 2 years ago

PR #2647 isn't conclusively behind this, but is a good candidate. Reopen if the failure reoccurs.

jcsp commented 2 years ago

This failed again on the run after merging this PR, so it looks like something else is wrong too: https://buildkite.com/vectorized/redpanda/builds/3337#d002ed10-39da-41a9-bcf2-7b6066603e41

rystsov commented 2 years ago

The test fails because the transaction coordinator's topic has a replication factor of 1, so when the node hosting it crashes, redpanda can't process transactional requests anymore. Why the node crashes is unknown. I've created an issue to track that problem.

I've reproduced the error on an arm64 node and checked that increasing the replication factor of the transactional topics makes TxFeatureFlagTest.test_disabling_transactions_after_they_being_used pass even with a crashing node. I'm sending a PR to unblock the test.
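
The effect of the replication factor can be sketched with simple quorum arithmetic. The snippet below assumes a majority-quorum model, in which a partition stays available while more than half of its replicas are alive. The function names are hypothetical, and the value 3 is only an illustrative replication factor, not necessarily the one used in the PR.

```python
def tolerated_failures(replication_factor: int) -> int:
    """Number of replicas that can crash while a majority remains alive."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return (replication_factor - 1) // 2


def is_available(replication_factor: int, crashed_replicas: int) -> bool:
    """True if the surviving replicas still form a majority."""
    alive = replication_factor - crashed_replicas
    majority = replication_factor // 2 + 1
    return alive >= majority


if __name__ == "__main__":
    for rf in (1, 3):
        print(rf, tolerated_failures(rf), is_available(rf, 1))
```

Under this model a replication factor of 1 tolerates no crashes, so a single crashed node takes the coordinator topic offline, whereas a factor of 3 survives one crash. This matches the observation that the test passes with a higher replication factor even though a node still crashes.
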