Closed eseokoh closed 3 years ago
Thank you for filing this issue. It turns out that buffering does not work correctly for the resharding cutover case. At the code level, in the gateway (both DiscoveryGateway and TabletGateway), failover detection is done per keyspace/shard. When we are doing a resharding cutover, there is no "FailoverEnd" for the original shard, so there is no way for vtgate to stop buffering and start routing the requests again.
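To make the failure mode concrete, here is a minimal sketch of per-keyspace/shard buffering. It is not the actual gateway code: `shardKey`, `shardBuffers`, and the keyspace name `commerce` are hypothetical names chosen for illustration, and the only behavior modeled is that a "failover end" event clears buffering solely for the exact shard it names.

```go
package main

import "fmt"

// shardKey identifies a buffer by keyspace and shard, mirroring the
// per-keyspace/shard granularity described above. Hypothetical type.
type shardKey struct {
	Keyspace, Shard string
}

// shardBuffers is a simplified, hypothetical stand-in for vtgate's buffer state.
type shardBuffers struct {
	buffering map[shardKey]bool
}

func newShardBuffers() *shardBuffers {
	return &shardBuffers{buffering: make(map[shardKey]bool)}
}

// FailoverStart begins buffering for one keyspace/shard.
func (b *shardBuffers) FailoverStart(k shardKey) { b.buffering[k] = true }

// FailoverEnd stops buffering, but only for the exact keyspace/shard given.
func (b *shardBuffers) FailoverEnd(k shardKey) { delete(b.buffering, k) }

// IsBuffering reports whether requests for k are still being held.
func (b *shardBuffers) IsBuffering(k shardKey) bool { return b.buffering[k] }

func main() {
	b := newShardBuffers()
	source := shardKey{"commerce", "-80"}

	// Reparent: start and end arrive for the same shard, so buffering stops.
	b.FailoverStart(source)
	b.FailoverEnd(source)
	fmt.Println("after reparent:", b.IsBuffering(source))

	// Resharding cutover: the source shard goes away and only the new
	// target shards report in, so no end event ever names the source shard.
	b.FailoverStart(source)
	b.FailoverEnd(shardKey{"commerce", "-40"})
	b.FailoverEnd(shardKey{"commerce", "40-80"})
	fmt.Println("after resharding cutover:", b.IsBuffering(source))
}
```

In this model the source shard is still marked as buffering after the cutover, which matches the symptom reported in this issue: nothing keyed on the original shard ever signals the end.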
Fixing this properly will require a two-part change.
@rohit-nayak-ps can you document here how we would detect a "resharding cutover in progress"? @harshit-gangal you had mentioned that there might be one place where we can call the "cutover detection" so that we can fail queries immediately and not buffer them. Can you document that here?
The SwitchWrites flow related to the shard cutover is as follows:
This is achieved by setting QueryServiceDisabled to true on the ShardTabletControl record in SrvKeyspace for the source shards. Primary tablets of those shards are refreshed so that they transition state into Not Serving.
Wait for the target shards to catch up to the source positions
Remove the non-serving shards and add the new target shards to SrvKeyspace
This is achieved by topo.MigrateServedType() updating the partitions to remove the source shards and add the new ones. The partition.ShardReferences list now has the new set of shards that span the keyspace. Primary tablets of the new shards are refreshed so that they transition into a Serving state.
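The topology changes in this flow can be sketched roughly as follows. This is only a model under stated assumptions: `partition` and `shardTabletControl` are simplified, hypothetical stand-ins for the SrvKeyspace records, `disableQueryService` and `migrateServedType` are illustrative helpers (not the real `topo.MigrateServedType()` signature), and the tablet refreshes and the catch-up wait are not modeled.

```go
package main

import "fmt"

// shardTabletControl is a simplified, hypothetical stand-in for the
// ShardTabletControl record in SrvKeyspace.
type shardTabletControl struct {
	Name                 string
	QueryServiceDisabled bool
}

// partition is a simplified, hypothetical stand-in for one SrvKeyspace partition.
type partition struct {
	ShardReferences     []string
	ShardTabletControls []shardTabletControl
}

// disableQueryService marks each source shard as having its query service disabled.
func disableQueryService(p *partition, sources []string) {
	for _, s := range sources {
		p.ShardTabletControls = append(p.ShardTabletControls,
			shardTabletControl{Name: s, QueryServiceDisabled: true})
	}
}

// migrateServedType removes the source shards from the shard references
// and appends the target shards.
func migrateServedType(p *partition, sources, targets []string) {
	drop := make(map[string]bool, len(sources))
	for _, s := range sources {
		drop[s] = true
	}
	var refs []string
	for _, r := range p.ShardReferences {
		if !drop[r] {
			refs = append(refs, r)
		}
	}
	p.ShardReferences = append(refs, targets...)
}

func main() {
	p := &partition{ShardReferences: []string{"-80", "80-"}}
	sources := []string{"-80"}
	targets := []string{"-40", "40-80"}

	disableQueryService(p, sources) // source shards stop serving
	// (waiting for the targets to catch up is not modeled here)
	migrateServedType(p, sources, targets) // swap shard references

	fmt.Println("shard references:", p.ShardReferences)
	fmt.Println("tablet controls:", p.ShardTabletControls)
}
```

The sketch appends the new shards without re-sorting by key range and leaves the tablet control entries in place; whether the real flow cleans those up is not something this thread states.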
So iiuc the guards for vtgate to stop/start buffering during a resharding cutover will be:
RESHARDING_CUTOVER_STARTED => If a shard found in vtgate's plan has its ShardTabletControl disabled, then don't execute the query; buffer it instead
RESHARDING_CUTOVER_COMPLETED => SrvKeyspace watcher detects a change in partition.ShardReferences and stops buffering
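The two guards above could look something like the sketch below. It is an assumption-laden illustration rather than a proposed vtgate patch: `partition`, `cutoverStarted`, and `cutoverCompleted` are hypothetical names, the disabled flag is flattened into a map keyed by shard name, and "change in ShardReferences" is interpreted as the set of shard names differing between two snapshots.

```go
package main

import "fmt"

// partition is a hypothetical, flattened view of what a SrvKeyspace
// watcher might see for one partition.
type partition struct {
	ShardReferences []string
	DisabledShards  map[string]bool // shard name -> QueryServiceDisabled
}

// cutoverStarted reports whether the shard chosen by the plan has its
// query service disabled, in which case the query would be buffered.
func cutoverStarted(p partition, plannedShard string) bool {
	return p.DisabledShards[plannedShard]
}

// cutoverCompleted reports whether the set of shard references changed
// between two snapshots, which would be the signal to stop buffering.
func cutoverCompleted(prev, next partition) bool {
	if len(prev.ShardReferences) != len(next.ShardReferences) {
		return true
	}
	seen := make(map[string]bool, len(prev.ShardReferences))
	for _, s := range prev.ShardReferences {
		seen[s] = true
	}
	for _, s := range next.ShardReferences {
		if !seen[s] {
			return true
		}
	}
	return false
}

func main() {
	during := partition{
		ShardReferences: []string{"-80", "80-"},
		DisabledShards:  map[string]bool{"-80": true},
	}
	done := partition{ShardReferences: []string{"-40", "40-80", "80-"}}

	fmt.Println("buffer query for -80:", cutoverStarted(during, "-80"))
	fmt.Println("buffer query for 80-:", cutoverStarted(during, "80-"))
	fmt.Println("stop buffering:", cutoverCompleted(during, done))
}
```

Comparing shard-name sets rather than slices keeps the completion check independent of ordering, which seems reasonable if the watcher cannot rely on a stable order.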
Harshit and I discussed this briefly and he will add more details in vtgate-speak as to possible solutions.
Overview of the issue
This is another part of #7059.
VTGate does not detect the end of the cutover. The buffered requests are drained only when the buffering duration times out.
Reproduction Steps
Please see #7059 for the repro steps; the repro is the same.
View error
Once you cutover, you will see VTGate logs like this:
Please notice the log:
However, the switch usually takes only 2-3 seconds.
Observations
I don't know whether recordExternallyReparentedTimestamp works for the resharding case. When I experimentally delete the primary pod, vtgate correctly detects the end of cutover.