VReplication workflow metadata is stored in each target shard that participates in the workflow. As a result, every workflow command that shows status or changes the state of a workflow needs to check every shard to see whether it is participating in the workflow.
This can get extremely inefficient when a large number of shards are involved and the workflow has only a few shards participating. Consider a 256-shard keyspace where we are doing a partial MoveTables for a single shard. For every `Workflow Show`, vtctld will contact 256 primaries even though only one needs to be contacted.
It gets worse for `SwitchTraffic`, which could end up timing out because of the initial overhead of wasted calls to non-participating primaries.
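The fan-out described above can be sketched as follows. This is a minimal illustration, not the actual Vitess code; the `shard` type and `findParticipants` helper are hypothetical stand-ins for the topo types and the per-primary RPCs.

```go
package main

import "fmt"

// shard is an illustrative stand-in for a target shard; "participates"
// models whether the shard's primary has rows for the workflow.
type shard struct {
	name         string
	participates bool
}

// findParticipants models the current behaviour: every target-shard
// primary is queried just to learn whether it takes part in the workflow.
func findParticipants(shards []shard) (participating []string, rpcs int) {
	for _, s := range shards {
		rpcs++ // one RPC to this shard's primary
		if s.participates {
			participating = append(participating, s.name)
		}
	}
	return participating, rpcs
}

func main() {
	// A 256-shard keyspace with a partial MoveTables touching one shard.
	shards := make([]shard, 256)
	for i := range shards {
		shards[i] = shard{name: fmt.Sprintf("shard-%d", i)}
	}
	shards[7].participates = true

	p, rpcs := findParticipants(shards)
	fmt.Printf("participating=%v rpcs=%d\n", p, rpcs)
}
```

One participant still costs 256 RPCs; for a mutating command each wasted call also adds latency that counts against the command's timeout.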
We originally chose not to store any workflow information in the topo: since workflows run distributed across shards, synchronizing state in the topo would be hard, with potential races.
A couple of options to resolve this:
1. Let each command take a "hint" of which shards are participating, so that we contact only those primaries.
However, this is neither reliable nor convenient: the user needs to keep track of which shards are involved. If only a subset is provided, read-only commands like `Show`/`Progress` would merely return partial information, but mutating commands like `SwitchTraffic` or `Complete` would end up with inconsistent processing, which is unacceptable.
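A sketch of the hint option and its failure mode, assuming a hypothetical `shardsToContact` helper (not a Vitess API): with an accurate hint only the named primaries are contacted, but a stale or partial hint silently drops participants.

```go
package main

import "fmt"

// shardsToContact models the "hint" option: only the shards the user
// names are contacted; with no hint we fall back to contacting everyone.
func shardsToContact(allShards, hint []string) []string {
	if len(hint) == 0 {
		return allShards
	}
	hinted := make(map[string]bool, len(hint))
	for _, s := range hint {
		hinted[s] = true
	}
	var out []string
	for _, s := range allShards {
		if hinted[s] {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	all := []string{"-40", "40-80", "80-c0", "c0-"}
	// Accurate hint: only the one participating primary is contacted.
	fmt.Println(shardsToContact(all, []string{"80-c0"}))
	// Stale hint that misses the real participant: a mutating command run
	// over this subset would change some shards and silently skip others.
	fmt.Println(shardsToContact(all, []string{"-40"}))
}
```

For `Show`/`Progress` the second case only means incomplete output; for `SwitchTraffic` or `Complete` it means divergent state across shards.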
2. Store "read-only" information about the workflow in the topo, in the target keyspace's record.
These entries would be written only when a workflow is created, and only non-mutating attributes (workflow name, workflow type, source shards, target shards) would be stored; none of these can change once a workflow is created. The VReplication commands can then use them to filter down to the participating primaries.
If a cache entry is not found (for pre-existing workflows), we could either continue with the current behaviour or attempt to populate the cache entry, with appropriate locking.
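The second option, including the fallback for workflows created before the cache existed, might look roughly like this. The `workflowInfo` and `keyspaceRecord` types are hypothetical shapes for illustration, not the actual topo schema.

```go
package main

import "fmt"

// workflowInfo holds only the non-mutating attributes that would be
// written once, at workflow creation time.
type workflowInfo struct {
	Name         string
	Type         string // e.g. "MoveTables"
	SourceShards []string
	TargetShards []string
}

// keyspaceRecord models the target keyspace's topo record carrying the
// read-only workflow cache.
type keyspaceRecord struct {
	workflows map[string]workflowInfo
}

// targetShards returns the cached participant list, or (nil, false) for a
// pre-existing workflow with no cache entry, in which case the caller
// falls back to contacting every shard (the current behaviour).
func (k *keyspaceRecord) targetShards(workflow string) ([]string, bool) {
	wf, ok := k.workflows[workflow]
	if !ok {
		return nil, false
	}
	return wf.TargetShards, true
}

func main() {
	ks := &keyspaceRecord{workflows: map[string]workflowInfo{
		"wf1": {
			Name: "wf1", Type: "MoveTables",
			SourceShards: []string{"80-c0"},
			TargetShards: []string{"80-c0"},
		},
	}}
	if shards, ok := ks.targetShards("wf1"); ok {
		fmt.Println("contact only:", shards)
	}
	if _, ok := ks.targetShards("old-wf"); !ok {
		fmt.Println("no cache entry: fall back to all shards")
	}
}
```

Because nothing here is mutated after creation, reads need no synchronization; only the optional lazy population of missing entries would need topo locking.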