vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0

Bug Report: RebuildKeyspaceGraph takes too long with 128 shards and federated topo #14669

Status: Closed (mdlayher closed this 9 months ago)

mdlayher commented 9 months ago

Overview of the Issue

When deploying a Vitess keyspace with 128 shards and a federated topo server distributed across different geographical regions, we have found that calls to vtctld's RebuildKeyspaceGraph can take 30+ seconds to complete.

We have tracked this down to the topo server's FindAllShardsInKeyspace:

return vterrors.Wrapf(err, "GetShard(%v, %v) failed", keyspace, shard)

Here the wrapped error is a context cancelation that occurs after 30+ seconds.

I will send a PR.

Reproduction Steps

Spin up a large sharded keyspace with a federated topo server and attempt to issue RebuildKeyspaceGraph via vtctld.

Binary Version

Reproduces on all current versions.

Operating System and Environment details

n/a

Log Fragments

I1201 00:57:09.815553       1 locks.go:246] Locking keyspace test for action RebuildKeyspace
I1201 00:57:39.799610       1 locks.go:280] Unlocking keyspace test for action RebuildKeyspace with error deadline exceeded: /vitess/user-data/vitess-xxx/global/keyspaces/test/shards/fa-fc/Shard

Note that the action fails after approximately 30 seconds due to the caller's timeout.

mdlayher commented 9 months ago

As a note, I've seen similar issues with vtorc and would encourage adding concurrency there as well:

2023-12-01 00:08:51.248 GetShard(test, 84-86) failed
2023-12-01 00:08:51.248 E1201 00:08:51.248059       1 keyspace_shard_discovery.go:129] deadline exceeded: /vitess/user-data/vitess-xxx/global/keyspaces/test/shards/84-86/Shard

rohit-nayak-ps commented 9 months ago

> As a note, I've seen similar issues with vtorc and would encourage adding concurrency there as well:

@GuptaManan100, cc for visibility

GuptaManan100 commented 9 months ago

Thank you @mdlayher for fixing it for VTOrc too ❤️