vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0

Feature Request: Only create workflow streams on relevant shards while creating a MoveTables workflow for multi-tenant migrations #15748

Closed: rohit-nayak-ps closed this issue 3 months ago

rohit-nayak-ps commented 5 months ago

Feature Description

A common case when migrating from a multi-schema, multi-tenant architecture to a sharded Vitess cluster is that each tenant is scoped to a single Vitess shard. Most tenants are small, and spreading a tenant across shards would turn its queries into cross-shard queries, causing a significant performance regression.

Currently Vitess allows a shard subset to be specified for every MoveTables subcommand except Create, to optimize the shard-by-shard migration use case (ref: #9987). Create did not need it for shard-by-shard migrations, because the source shards already imply what the target shards will be.

However, for multi-tenant migrations, the target shards are determined by the target keyspace's VSchema/Vindex. In the scenario above, the current logic creates workflow streams on all target shards, even though only one shard will actually receive data. For a cluster with a large number of shards and multiple concurrent tenant migrations, this is very wasteful and significantly reduces the available VReplication bandwidth.

Ideally, MoveTables would allow specifying the target shards on which VReplication should run, or infer them from the VSchema.
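To illustrate the "infer them from the VSchema" option, here is a minimal Go sketch of selecting only the relevant target shards. It relies on the fact that Vitess shard names such as `-40` or `80-c0` are hex-encoded key ranges, with an empty bound meaning unbounded. It assumes the tenant's keyspace IDs have already been computed by the target keyspace's vindex. The type and function names (`keyRange`, `parseShard`, `relevantShards`) and the sample keyspace ID are hypothetical; this is not Vitess's `key` package API and not the implementation used in the fix.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
	"strings"
)

// keyRange is a simplified, hypothetical stand-in for a Vitess shard key range.
// An empty Start or End means the range is unbounded on that side.
type keyRange struct {
	Start, End []byte
}

// parseShard turns a shard name such as "-80", "80-c0" or "c0-" into a keyRange.
// "-" and "0" are treated as covering the whole key space.
func parseShard(name string) (keyRange, error) {
	if name == "-" || name == "0" {
		return keyRange{}, nil
	}
	parts := strings.SplitN(name, "-", 2)
	if len(parts) != 2 {
		return keyRange{}, fmt.Errorf("invalid shard name %q", name)
	}
	start, err := hex.DecodeString(parts[0])
	if err != nil {
		return keyRange{}, err
	}
	end, err := hex.DecodeString(parts[1])
	if err != nil {
		return keyRange{}, err
	}
	return keyRange{Start: start, End: end}, nil
}

// contains reports whether id lies in the half-open range [Start, End),
// comparing keyspace IDs as byte strings.
func (kr keyRange) contains(id []byte) bool {
	return (len(kr.Start) == 0 || bytes.Compare(kr.Start, id) <= 0) &&
		(len(kr.End) == 0 || bytes.Compare(id, kr.End) < 0)
}

// relevantShards returns, in input order, only the target shards whose key
// range contains at least one of the tenant's keyspace IDs. Workflow streams
// would then be created only on these shards.
func relevantShards(shards []string, keyspaceIDs [][]byte) ([]string, error) {
	var out []string
	for _, s := range shards {
		kr, err := parseShard(s)
		if err != nil {
			return nil, err
		}
		for _, id := range keyspaceIDs {
			if kr.contains(id) {
				out = append(out, s)
				break
			}
		}
	}
	return out, nil
}

func main() {
	shards := []string{"-40", "40-80", "80-c0", "c0-"}
	// Hypothetical 8-byte keyspace ID for a single tenant.
	tenant := [][]byte{{0x9a, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07}}
	got, err := relevantShards(shards, tenant)
	if err != nil {
		panic(err)
	}
	fmt.Printf("streams needed on %v (%d of %d shards)\n", got, len(got), len(shards))
}
```

Run as-is, this prints `streams needed on [80-c0] (1 of 4 shards)`. With many shards and several concurrent tenant migrations, the gap between "all shards" and "relevant shards" grows with the shard count, which is the waste this issue describes.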

rohit-nayak-ps commented 3 months ago

Fixed via #15746