vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0

VReplication commands are inefficient for workflows involving a limited number of shards #14204

Closed: rohit-nayak-ps closed this issue 8 months ago

rohit-nayak-ps commented 11 months ago

Overview of the Issue

VReplication workflow metadata is stored in each target shard that participates in the workflow. As a result, every workflow command that shows status or changes the state of a workflow needs to check every shard to see whether it participates in the workflow.

This becomes extremely inefficient when a keyspace has many shards but only a few of them participate in the workflow. Consider a 256-shard keyspace where we are running a partial MoveTables for a single shard: for every Workflow Show, vtctld contacts all 256 primaries, even though only one of them needs to be contacted.
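To make the cost concrete, here is a minimal Go sketch of the fan-out pattern described above, under stated assumptions. The shard type, newKeyspace, showWorkflow, the shard names, and the workflow name partial_mt are all hypothetical illustrations, not Vitess APIs, and the real vtctld may issue these calls concurrently rather than in a simple loop. The sketch only models the property the issue describes: participation is recorded on each target shard rather than in a central index, so a status command has to ask every primary.

```go
package main

import "fmt"

// shard is a hypothetical model of one target shard's primary. Each primary
// keeps its own workflow metadata; there is no central list of participants.
type shard struct {
	name      string
	workflows map[string]bool // workflow name -> workflow has streams on this shard
}

// newKeyspace builds n hypothetical shards with no workflows on them.
func newKeyspace(n int) []shard {
	shards := make([]shard, n)
	for i := range shards {
		shards[i] = shard{name: fmt.Sprintf("shard-%03d", i), workflows: map[string]bool{}}
	}
	return shards
}

// showWorkflow mimics a status command that must ask every primary whether it
// participates, because participation is only recorded on the shards themselves.
// It returns the participating shards and the number of primaries contacted.
func showWorkflow(shards []shard, workflow string) (participants []string, calls int) {
	for _, s := range shards {
		calls++ // one round trip to this primary, whether it participates or not
		if s.workflows[workflow] {
			participants = append(participants, s.name)
		}
	}
	return participants, calls
}

func main() {
	shards := newKeyspace(256)
	// A partial MoveTables that only touches a single shard.
	shards[42].workflows["partial_mt"] = true

	participants, calls := showWorkflow(shards, "partial_mt")
	fmt.Printf("participants=%v calls=%d wasted=%d\n", participants, calls, calls-len(participants))
}
```

Running it prints participants=[shard-042] calls=256 wasted=255: 255 of the 256 round trips return nothing useful, and that overhead is paid again on every command against the workflow.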

The problem is worse for SwitchTraffic, which can end up timing out because of the up-front overhead of wasted calls to non-participating primaries.

We originally chose not to store any workflow information in the topo. Because workflows run distributed across shards, synchronizing their state in the topo would be hard and prone to races.

A couple of options to resolve this:

Reproduction Steps

-

Binary Version

-

Operating System and Environment details

-

Log Fragments

No response

maxenglander commented 11 months ago

Related: https://github.com/vitessio/vitess/issues/13777