vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0
18.55k stars 2.09k forks source link

Multi-tenant import support in Vitess #15403

Closed rohit-nayak-ps closed 4 months ago

rohit-nayak-ps commented 7 months ago

Feature Description

This issue describes proposed support in Vitess for imports from a multi-tenant database cluster into a single Vitess cluster.

Motivation

The user has a multi-tenant database architecture. They have several separate MySQL databases, with common schemas, one per tenant. Each database will be imported using one MoveTables workflow, using an external keyspace pointing to the source, with a common target keyspace. All tables, on source and target, will have a tenant_id column,uniquely identifying the tenant.

We can run multiple such workflows in parallel, until all tenants are imported. Just like normal imports, we would also like reverse workflows to be running: to allow for rollback.

Routing rules should, ideally, be set up, so that queries targeting a specific tenant id would be routed correctly to the target, if the tenant has already been imported, or to the source if the import workflow is in progress but has not yet been switched to the target.

Many setups with multi-tenancy will use a multi-schema approach, where each tenant has their own MySQL database on a single server. In this case the queries will use a database qualifier, which matches their database name. So when they are running the migrations into Vitess, they will continue to use the same database qualifier.

Assumptions

Design

New Keyspace VSchema Attribute

A new keyspace level attribute in the vschema for the keyspace will indicate that this keyspace supports multiple tenants and also the column which identifies the tenant and the column's data type.

Per-TenantId MoveTables

Two new options will be added to MoveTables

The tenant id corresponds to the tenant of the source keyspace from which data is being migrated. The column name/type are already specified in the keyspace vschema.

The source keyspace alias can be used interchangeably with the keyspace name of the unmanaged source keyspace that points to the cluster from which we are importing.

Tenant-based filtering

An additional filter will be added to the binlogsource filter query for all the tables both for the forward and reverse VReplication workflows.

Tenant-based routing rules

We add a new concept: Keyspace Routing Rules. This will be a new topo object and part of the SrvVSchema. It will be a list of FromKeyspace to a ToKeyspace.

These will have a higher scope than the current (table-based) Routing Rules. If Keyspace Routing Rules are present, the table routing rules will not be checked.

Two new vtctldclient commands ApplyKeyspaceRoutingRules and GetKeyspaceRoutingRules.

Initial assumption is that all traffic be switched at the same time. So replica reads and primary reads/writes will need to be switched together.

Keyspace Routing Rules Example

This is an example of how the keyspace routing rules will look for a tenant migration for tenant id 12345. User sends qualified queries using the qualifier user_12345. For each tenant we create an unmanaged keyspace called source_12345. The target keyspace is target. Note that we never reroute the target, so the target is never the source in any of these routing rules. This is because the target has multiple tenants and we cannot reroute it to any single source.

The MoveTables command will look like:

MoveTables --target-keyspace target Create 
                   --all-tables
                   --tenant-id 12345
                   --workflow import_12345 
                   --source-keyspace source_12345
                   --source-keyspace-alias  user_12345

The associated keyspace routing rules before cutover:

{
  "user_12345": "source_12345",
  "source_12345": "source_12345"
}

The associated keyspace routing rules after cutover:

{
  "user_12345": "target",
  "source_12345": "target"
}
harshit-gangal commented 5 months ago

I have a few questions

  1. Why do we need source-keyspace and source-keyspace-alias?
  2. Are we still holding the assumption that traffic switch will happen at same time for read and write?
rohit-nayak-ps commented 4 months ago

Fixed via issues referenced above