yugabyte / yugabyte-db


[DocDB] Make xCluster semi-automatic mode setup survive master restarts #24746

Open mdbridge opened 1 week ago

mdbridge commented 1 week ago

Jira Link: DB-13826

Description

The XClusterOutboundReplicationGroupParameterized.MasterRestartDuringCheckpoint test covers a case where we do not survive a master restart while checkpointing.

The failure is due to the following error:

   2188:[P-m-1] W1022 09:10:57.290823 1869164544 xcluster_outbound_replication_group.cc:316] xClusterOutboundReplicationGroup rg1 :Failed to checkpoint namespace 00004000000030008000000000000000: Service unavailable (yb/master/catalog_manager.cc:2105): Catalog manager is shutting down. State: 3

This task is to fix semi-automatic mode so that it survives master restarts even in this case. In addition, this test (or a new one) should be fixed so that it deterministically detects that the current code fails in this case.
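
To illustrate one possible shape of such a fix, here is a minimal, self-contained sketch. It is not the actual YugabyteDB code and not necessarily the approach that will be taken. It assumes that a "Service unavailable" result from the checkpoint step (as in the log above) can be treated as transient and retried rather than recorded as a checkpoint failure. `Code`, `SimpleStatus`, and `RetryWhileUnavailable` are hypothetical names invented for this example. In the real system the retry would presumably be driven by the master after it comes back up, which this sketch does not model.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for a status type; NOT YugabyteDB's Status class.
enum class Code { kOk, kServiceUnavailable, kOther };

struct SimpleStatus {
  Code code;
  std::string message;
  bool ok() const { return code == Code::kOk; }
};

// Runs `op` until it returns something other than kServiceUnavailable,
// or until `max_attempts` attempts have been made. Success and
// non-transient errors are returned immediately without further retries.
inline SimpleStatus RetryWhileUnavailable(
    const std::function<SimpleStatus()>& op, int max_attempts) {
  assert(max_attempts > 0);
  SimpleStatus last{Code::kOther, "no attempts made"};
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    last = op();
    if (last.code != Code::kServiceUnavailable) {
      return last;  // Success or a non-transient failure: stop retrying.
    }
  }
  return last;  // Still unavailable after exhausting all attempts.
}
```

A deterministic test could follow the same idea: inject an operation that reports "Service unavailable" a fixed number of times, then check that the setup eventually succeeds instead of stopping at the first error.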

Note that this test has been disabled in the meantime.

Also note that there is a separate task to make automatic mode survive master restarts:

Issue Type

kind/bug


mdbridge commented 1 week ago

Task to disable the test: https://github.com/yugabyte/yugabyte-db/issues/24745