vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0
18.73k stars 2.1k forks source link

(WIP) RFC: Enhancing VTGate buffering for MoveTables and Shard by Shard Migration #13464

Closed rohit-nayak-ps closed 2 months ago

rohit-nayak-ps commented 1 year ago

Enhancing VTGate buffering for MoveTables and Shard by Shard Migration

TL;DR

Today VTGate buffers queries that fail during reparenting and resharding operations aka cluster events. VTGate watches the topo for these cluster events. When a query fails due to one of these operations, buffering kicks in (with a timeout). When the operation ends the buffered queries are (re-)executed.

Because these events are at the shard level, the query serving layer where we detect errors, buffer and retry are done after a query is planned: in particular, the keyspace scope for the query execution is already selected.

MoveTables and Shard By Shard Migrations operations require the query to execute on the (new) target keyspace. So we need to make two types of changes to support buffering for these operations:

Motivation

MoveTables and Shard By Shard Migrations, while triggered by the MoveTables command move tables from one keyspace to another. However, while the normal MoveTables is a single step where the tables move into all shards of the target keyspace, for Shard By Shard, we could have the same table served from different keyspaces.

While these workflows are In Progress the queries are served from the source keyspace. Note that both the source and target VSchemas which have these tables at this point. The ambiguity of which table to route is resolved using RoutingRules (for MoveTables) and ShardRoutingRules (for Shard By Shard).

When we execute a SwitchWrites (i.e. SwitchTraffic for primaries) we have a small period where the tables are not routable for new queries due to the process of switching, which is as follows:

  1. Add the tables being moved to a DeniedTables attribute of the source shard's TabletControl object in the topo and record the GTID.
  2. This will cause any new DMLs to fail with an "enforce denied tables" error, which currently gets reported to the client.
  3. Wait for the workflow to catchup to the recorded GTID.
  4. Update routing rules to point the tables to the target.
  5. VTGates, which are watching the topo changes, see the new routing rules. They then route the queries to the target keyspace, where there are no DeniedTables, and hence the query succeeds.

As you see above all queries to the tables being moved will fail and the client needs to wait and retry. This RFC discusses how we might augment the current buffering mechanism to also support MoveTables and Shard By Shard Migrations.

Detecting Transitions

We need to add two new signals to the existing set of signals that VTGate watches for (using the KeyspaceEventWatcher):

There are two sources of start signals:

Note that the topo changes need to be seen by the vtgate and vttablet topo watchers. This can lead to inherent races where queries might get planned for an incorrect version of the vschema, which is exacerbated by plan caches.

MoveTablesSwitchStarted

MoveTablesSwitchCompleted

ShardByShardMigrationStarted

ShardByShardMigrationCompleted

Buffering changes

The signals for detecting the start and end of these events will continue to be in the same layer they are today: the KeyspaceEventWatcher in vtgate. We add new types of events to the ones we are already detecting.

The "enforce denied tables" error will not be handled by the buffering layer in vtgate's tabletgateway but passed on to the vtgate Executor where we will recompute the plan.

Note that the error is only passed once buffering stops: at the shard buffer layer where we are buffering, the re-executed query continues to be routed to the source keyspace resulting in the "enforce denied tables" error.

The errors due to cluster events that are currently handled (reparenting and resharding) are retried in vcursor_impl. The errors due to the new events will be retried in the vtgate Executor's newExecute(), where we will recompute the plan.

Assumptions

mattlord commented 2 months ago

Closing this as completed via https://github.com/vitessio/vitess/pull/8700 and https://github.com/vitessio/vitess/pull/13507