Closed rthallamko3 closed 10 months ago
Jira Link: DB-6398
If a transaction fails to be promoted, then the rollback of the transaction can result in FATALs.
F20230501 22:15:22 ../../src/yb/rpc/rpc.cc:359] Check failed: *handle == calls_.end()
    @ 0x55d980d047d7 google::LogMessage::SendToLog()
    @ 0x55d980d0571d google::LogMessage::Flush()
    @ 0x55d980d05c29 google::LogMessageFatal::~LogMessageFatal()
    @ 0x55d9817728d9 yb::rpc::Rpcs::RegisterAndStart()
    @ 0x55d980f96727 yb::client::YBTransaction::Impl::SendAbortToOldStatusTabletIfNeeded()
    @ 0x55d980f9889d yb::client::YBTransaction::Impl::UpdateTransactionStatusLocationDone()
    @ 0x55d980f98ee8 std::__1::__function::__func<>::operator()()
    @ 0x55d980fb23bf yb::client::(anonymous namespace)::TransactionRpcBase::Finished()
    @ 0x55d980fb2790 std::__1::__function::__func<>::operator()()
    @ 0x55d981752b13 yb::rpc::OutboundCall::InvokeCallbackSync()
    @ 0x55d9817563ab yb::rpc::OutboundCall::InvokeCallback()
    @ 0x55d9817489af yb::rpc::LocalYBInboundCall::Respond()
    @ 0x55d981773d7e yb::rpc::RpcContext::RespondSuccess()
    @ 0x55d981ae8618 yb::tserver::TabletServiceImpl::UpdateTransactionStatusLocation()
    @ 0x55d981bd7ede std::__1::__function::__func<>::operator()()
    @ 0x55d981be081f yb::tserver::TabletServerServiceIf::Handle()
    @ 0x55d9817fee9e yb::rpc::ServicePoolImpl::Handle()
    @ 0x55d981744faf yb::rpc::InboundCall::InboundCallTask::Run()
    @ 0x55d98180da03 yb::rpc::(anonymous namespace)::Worker::Execute()
    @ 0x55d981e5461f yb::thread::SuperviseThread()
    @ 0x7fa9029da694 start_thread
    @ 0x7fa902edc41d __clone
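For illustration only: the failed check `*handle == calls_.end()` inside `yb::rpc::Rpcs::RegisterAndStart()` suggests that the handle passed in is expected not to refer to an already registered call. The toy model below sketches that invariant and shows how a reused handle could be reported as an error rather than a crash. `ToyRpcs`, `InvalidHandle`, `Unregister`, and `SecondRegistrationRejected` are hypothetical names invented for this sketch. This is not YugabyteDB's actual code, and it is not a claim about the root cause or the eventual fix.

```cpp
#include <cassert>
#include <list>
#include <string>

// Toy stand-in for an RPC registry. Assumption: this only mimics the
// invariant implied by the check message; it is not yb::rpc::Rpcs.
class ToyRpcs {
 public:
  using Calls = std::list<std::string>;
  using Handle = Calls::iterator;

  // A handle that refers to no registered call.
  Handle InvalidHandle() { return calls_.end(); }

  // Registers a call under the handle. Returns false, instead of aborting
  // the process, when the handle still refers to an earlier call.
  bool RegisterAndStart(const std::string& call, Handle* handle) {
    if (*handle != calls_.end()) {
      return false;
    }
    *handle = calls_.insert(calls_.end(), call);
    return true;
  }

  // Removes the call and resets the handle so that it can be reused.
  void Unregister(Handle* handle) {
    if (*handle != calls_.end()) {
      calls_.erase(*handle);
      *handle = calls_.end();
    }
  }

 private:
  Calls calls_;
};

// Hypothetical scenario: two abort attempts share one handle and the first
// has not been unregistered when the second starts.
bool SecondRegistrationRejected() {
  ToyRpcs rpcs;
  ToyRpcs::Handle abort_handle = rpcs.InvalidHandle();
  bool first = rpcs.RegisterAndStart("abort #1", &abort_handle);
  bool second = rpcs.RegisterAndStart("abort #2", &abort_handle);
  return first && !second;
}
```

In this sketch the second registration is rejected because the handle was never reset. A hard check in the same place would terminate the process instead.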
Note that this was seen on a geo-partitioned cluster that did not have the fix for https://github.com/yugabyte/yugabyte-db/issues/16108.
Even with the fix for https://github.com/yugabyte/yugabyte-db/issues/16108, it would be good to avoid the FATAL in case the same failure can be triggered in other ways.
Jepsen tests run with geo-partitioning hit this issue. If we want to run Jepsen tests on the 2.18 and 2.20 branches, the fix for this issue needs to be backported to both branches as well.