yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

Tablet split test flakiness #24980

Open yugabyte-ci opened 6 days ago

yugabyte-ci commented 6 days ago

Jira Link: DB-14125

XClusterExternalTabletSplitITest.MasterFailoverDuringProducerPostSplitOps is flaky because TabletSplitITestBase::WriteRowsAndFlush seems to be flaky.

The flush failed because the tablet was still in the BOOTSTRAPPING state, and the Flush Tablets RPC was aborted without a retry (the log reports a maximum of 0 retries). Do we see this flakiness in other tablet-splitting tests as well? Can we change WriteRowsAndFlush to call IsSplittingComplete and WaitForLoadBalancerToStabilize before sending the flush request?

[m-1] W1116 01:38:48.365814 30489 async_flush_tablets_task.cc:76] TS b351b8cb7a7f4adeab9fd7d15e28f9fe: flush tablets failed: Illegal state (yb/tserver/service_util.cc:271): Tablet 91e63e4d04344dbb8a9710c21f3bebe4 not RUNNING: BOOTSTRAPPING (tablet server error 12) (raft group state error 0)
[m-1] W1116 01:38:48.365839 30489 async_rpc_tasks.cc:350] b351b8cb7a7f4adeab9fd7d15e28f9fe Flush Tablets RPC (task=0x000015e0f9b945b8, state=kRunning): Aborted (yb/master/async_rpc_tasks.cc:349): Reached maximum number of retries (0) for request b351b8cb7a7f4adeab9fd7d15e28f9fe Flush Tablets RPC, task=0x15e0f9b945b8 state=kRunning
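
A rough, standalone sketch of the "wait until ready, then flush" ordering suggested above might look like the following. It does not use the real test helpers: `WaitUntil`, `FakeCluster`, `AllTabletsRunning`, and `FlushWhenReady` are hypothetical names invented for illustration, and the fake cluster only simulates tablets leaving BOOTSTRAPPING after a few polls. In the actual test, the readiness check would presumably be IsSplittingComplete plus WaitForLoadBalancerToStabilize.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: polls `condition` until it returns true or `timeout`
// elapses. Returns true if the condition was met in time.
bool WaitUntil(const std::function<bool()>& condition,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds poll_interval =
                   std::chrono::milliseconds(10)) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    if (condition()) return true;
    if (std::chrono::steady_clock::now() >= deadline) return false;
    std::this_thread::sleep_for(poll_interval);
  }
}

// Hypothetical stand-in for cluster state; not a YugabyteDB type.
struct FakeCluster {
  explicit FakeCluster(int not_ready_polls)
      : not_ready_polls(not_ready_polls) {}

  // Reports "not running" for the first `not_ready_polls` calls, which
  // simulates tablets that are still BOOTSTRAPPING after a split.
  bool AllTabletsRunning() { return not_ready_polls-- <= 0; }

  // Simulated flush; only counts how often it was issued.
  bool Flush() {
    ++flushes;
    return true;
  }

  int not_ready_polls;
  int flushes = 0;
};

// Sends the flush only once every tablet reports running. If the wait times
// out, no flush is sent, so the caller sees a clear timeout rather than an
// "Illegal state" error from a tablet that is still bootstrapping.
bool FlushWhenReady(FakeCluster& cluster, std::chrono::milliseconds timeout) {
  if (!WaitUntil([&cluster] { return cluster.AllTabletsRunning(); }, timeout)) {
    return false;
  }
  return cluster.Flush();
}
```

The point of the ordering is that the flush RPC in the log above is not retried, so the readiness wait has to happen on the caller's side before the request is sent.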

Issue created in Slack from a message.