Enhancing VTGate buffering for MoveTables and Shard by Shard Migration
TL;DR
Today, VTGate buffers queries that fail during reparenting and resharding operations (aka cluster events). VTGate
watches the topo for these cluster events. When a query fails due to one of these operations, buffering kicks in (with a
timeout). When the operation ends, the buffered queries are (re-)executed.
Because these events are at the shard level, the query serving layer detects errors, buffers, and retries after a query
is planned: in particular, the keyspace scope for the query execution has already been selected.
MoveTables and Shard By Shard Migration operations require the query to execute on the (new) target keyspace. So we
need to make two types of changes to support buffering for these operations:
For such operations, add buffering at a higher layer so that we can replan.
Add new signals as bookends for these operations
Motivation
MoveTables and Shard By Shard Migrations are both triggered by the MoveTables command and move tables from one keyspace
to another. However, while a normal MoveTables is a single step in which the tables move into all shards of the target
keyspace, with Shard By Shard the same table can be served from different keyspaces.
While these workflows are in progress, the queries are served from the source keyspace. Note that both the source and
target VSchemas contain these tables at this point. The ambiguity of where to route a table is resolved using
RoutingRules (for MoveTables) and ShardRoutingRules (for Shard By Shard).
When we execute a SwitchWrites (i.e. SwitchTraffic for primaries) we have a small period where the tables are not
routable for new queries due to the process of switching, which is as follows:
Add the tables being moved to a DeniedTables attribute of the source shard's TabletControl object in the topo and
record the GTID.
This will cause any new DMLs to fail with an "enforce denied tables" error, which currently gets reported to the
client.
Wait for the workflow to catch up to the recorded GTID.
Update routing rules to point the tables to the target.
VTGates, which are watching the topo changes, see the new routing rules. They then route the queries to the target
keyspace, where there are no DeniedTables, and hence the query succeeds.
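The switching steps above can be sketched as follows. This is a simplified, hypothetical model: the type and function names (`shard`, `routingRules`, `switchWrites`, `caughtUp`) are illustrative stand-ins, not actual Vitess APIs.

```go
package main

import "fmt"

// shard models the pieces of the source shard's topo record relevant here:
// the DeniedTables list in TabletControl and the recorded GTID.
type shard struct {
	deniedTables []string
	gtid         string
}

// routingRules maps a table name to the keyspace it is routed to.
type routingRules map[string]string

// switchWrites mirrors the four steps: deny writes on the source, wait for
// the workflow to catch up to the recorded GTID, then repoint routing rules.
func switchWrites(src *shard, rules routingRules, tables []string, target string, caughtUp func(gtid string) bool) {
	// 1. Add the moved tables to the source shard's DeniedTables; record the GTID.
	src.deniedTables = append(src.deniedTables, tables...)
	recorded := src.gtid

	// 2. From this point, new DMLs against these tables fail with
	//    "enforce denied tables".

	// 3. Wait for the workflow to catch up to the recorded GTID.
	for !caughtUp(recorded) {
	}

	// 4. Update routing rules so the tables point at the target keyspace.
	for _, t := range tables {
		rules[t] = target
	}
}

func main() {
	src := &shard{gtid: "gtid-100"}
	rules := routingRules{"customer": "source_ks"}
	switchWrites(src, rules, []string{"customer"}, "target_ks", func(string) bool { return true })
	fmt.Println(rules["customer"]) // target_ks
}
```

Between steps 1 and 4 there is no keyspace that will accept writes for these tables, which is exactly the window this RFC wants to cover with buffering.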
As shown above, all queries to the tables being moved will fail, and the client needs to wait and retry. This RFC
discusses how we might augment the current buffering mechanism to also support MoveTables and Shard By Shard Migrations.
Detecting Transitions
We need to add two new signals to the existing set of signals that VTGate watches for (using the KeyspaceEventWatcher):
MoveTablesSwitchStarted and MoveTablesSwitchCompleted
ShardByShardMigrationStarted and ShardByShardMigrationCompleted
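A minimal sketch of how these bookend signals could be modeled in the watcher. The event names follow the signals above, but the `keyspaceEvent` type and `bookend` helper are hypothetical, not existing KeyspaceEventWatcher code.

```go
package main

import "fmt"

// keyspaceEvent enumerates the new transition signals proposed above.
type keyspaceEvent int

const (
	moveTablesSwitchStarted keyspaceEvent = iota
	moveTablesSwitchCompleted
	shardByShardMigrationStarted
	shardByShardMigrationCompleted
)

// bookend returns the completion signal that ends buffering for a given
// start signal, and false if the event is not a start signal.
func bookend(start keyspaceEvent) (keyspaceEvent, bool) {
	switch start {
	case moveTablesSwitchStarted:
		return moveTablesSwitchCompleted, true
	case shardByShardMigrationStarted:
		return shardByShardMigrationCompleted, true
	}
	return 0, false
}

func main() {
	end, ok := bookend(moveTablesSwitchStarted)
	fmt.Println(end == moveTablesSwitchCompleted, ok) // true true
}
```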
There are two sources of start signals:
The topo changes that accompany the start and end of switching and
Query failures that happen during the transition.
Note that the topo changes need to be seen by the vtgate and vttablet topo watchers. This can lead to inherent races
where queries might get planned for an incorrect version of the vschema, which is exacerbated by plan caches.
MoveTablesSwitchStarted
Query error "enforce denied tables"
Queries will fail at vttablet when they check against the denied table list. The error is returned to vtgate.
Existence of DeniedTables in the shard's TabletControl records and the fact that these tables are routed to the
source keyspace in the SrvVSchema's RoutingRules.
MoveTablesSwitchCompleted
Existence of DeniedTables in the shard's TabletControl records and the fact that the table is routed to the target
keyspace in the SrvVSchema's RoutingRules.
ShardByShardMigrationStarted
Query error "enforce denied tables"
Queries will fail at vttablet when they check against the denied table list. The error is returned to vtgate.
Existence of DeniedTables in the shard's TabletControl records and the fact that the failing shard's
ShardRoutingRules are pointing to the source keyspace
ShardByShardMigrationCompleted
Existence of DeniedTables in the shard's TabletControl records and the fact that the failing shard's
ShardRoutingRules are pointing to the target keyspace
Buffering changes
The signals for detecting the start and end of these events will continue to be in the same layer they are today: the
KeyspaceEventWatcher in vtgate. We add new types of events to the ones we are already detecting.
The "enforce denied tables" error will not be handled by the buffering layer in vtgate's tabletgateway but passed on to
the vtgate Executor where we will recompute the plan.
Note that the error is only passed on once buffering stops: at the shard buffer layer where we buffer today, the
re-executed query would continue to be routed to the source keyspace, resulting in the same "enforce denied tables"
error.
The errors due to cluster events that are currently handled (reparenting and resharding) are retried in vcursor_impl.
The errors due to the new events will be retried in the vtgate Executor's newExecute(), where we will recompute the
plan.
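The retry shape in the Executor can be sketched as a replan-and-re-execute loop. This is a hypothetical outline of the control flow, not the actual newExecute() code: `plan` and `run` are stand-ins for planning (which re-resolves the keyspace from the latest routing rules) and execution.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errDeniedTables = errors.New("enforce denied tables")

func isDeniedTablesErr(err error) bool {
	return err != nil && strings.Contains(err.Error(), "enforce denied tables")
}

// executeWithReplan re-executes on each denied-tables failure, but crucially
// recomputes the plan first, rather than retrying the stale plan that still
// targets the source keyspace.
func executeWithReplan(plan func() string, run func(keyspace string) error, maxRetries int) error {
	var err error
	for i := 0; i <= maxRetries; i++ {
		ks := plan() // replanning picks up the updated routing rules
		if err = run(ks); !isDeniedTablesErr(err) {
			return err
		}
	}
	return err
}

func main() {
	routedTo := "source_ks"
	plan := func() string { return routedTo }
	run := func(ks string) error {
		if ks == "source_ks" {
			routedTo = "target_ks" // simulate vtgate observing the topo update
			return errDeniedTables
		}
		return nil
	}
	fmt.Println(executeWithReplan(plan, run, 3)) // <nil>
}
```

The key design point is where the loop restarts: retrying in vcursor_impl would re-use the already-selected keyspace, while retrying in the Executor lets planning choose the target keyspace.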
Assumptions
Only one MoveTables operation is in progress at a given time. Currently, this is an implicit expectation within Vitess
which we should add validations for.