XClusterExternalTabletSplitITest.MasterFailoverDuringProducerPostSplitOps is flaky because TabletSplitITestBase::WriteRowsAndFlush appears to be flaky.
The flush failed since the tablet is in BOOTSTRAPPING state. Do we see this flakiness in other tablet splitting tests as well? Can we change WriteRowsAndFlush to call IsSplittingComplete and WaitForLoadBalancerToStabilize before sending the flush request?
[m-1] W1116 01:38:48.365814 30489 async_flush_tablets_task.cc:76] TS b351b8cb7a7f4adeab9fd7d15e28f9fe: flush tablets failed: Illegal state (yb/tserver/service_util.cc:271): Tablet 91e63e4d04344dbb8a9710c21f3bebe4 not RUNNING: BOOTSTRAPPING (tablet server error 12) (raft group state error 0)
[m-1] W1116 01:38:48.365839 30489 async_rpc_tasks.cc:350] b351b8cb7a7f4adeab9fd7d15e28f9fe Flush Tablets RPC (task=0x000015e0f9b945b8, state=kRunning): Aborted (yb/master/async_rpc_tasks.cc:349): Reached maximum number of retries (0) for request b351b8cb7a7f4adeab9fd7d15e28f9fe Flush Tablets RPC, task=0x15e0f9b945b8 state=kRunning
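A rough sketch of the proposed ordering (wait until the tablets are ready, then flush), written as standalone C++ so it does not depend on the test harness. `WaitFor` and `FlushWhenReady` here are hypothetical helpers, not the existing YugabyteDB functions, and the `ready` predicate only stands in for checks such as IsSplittingComplete and WaitForLoadBalancerToStabilize, whose real signatures are not assumed. Since the log shows the flush RPC being aborted with a maximum of 0 retries, the sketch waits up front rather than relying on a retry.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: polls `condition` until it returns true or `timeout`
// elapses. Returns true if the condition held before the deadline.
bool WaitFor(const std::function<bool()>& condition,
             std::chrono::milliseconds timeout,
             std::chrono::milliseconds poll_interval =
                 std::chrono::milliseconds(10)) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    if (condition()) {
      return true;
    }
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;
    }
    std::this_thread::sleep_for(poll_interval);
  }
}

// Hypothetical wrapper: sends the flush only once `ready` reports true.
// `ready` stands in for "splitting is complete and no tablet is still
// BOOTSTRAPPING"; `send_flush` stands in for the flush request.
// Returns false without flushing if readiness is not reached in time.
bool FlushWhenReady(const std::function<bool()>& ready,
                    const std::function<bool()>& send_flush,
                    std::chrono::milliseconds timeout) {
  if (!WaitFor(ready, timeout)) {
    return false;
  }
  return send_flush();
}
```

In the test, the readiness predicate would presumably wrap the existing split and load balancer checks, so the flush is never sent while a post-split tablet is still bootstrapping.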
Jira Link: DB-14125
Issue created in Slack from a message.