vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0

while doing cutover for resharding, the end of cutover is not automatically detected #7061

eseokoh closed this issue 3 years ago

eseokoh commented 3 years ago

Overview of the issue

This is another part of #7059.

VTGate does not detect the end of the cutover. The buffered requests are drained only when the buffering duration times out.

Reproduction Steps

Please see #7059 for the repro steps; they are the same as for that issue.

View error

Once you cut over, you will see VTGate logs like this:

I1120 02:45:32.307383       7 shard_buffer.go:280] Starting buffering for shard: ycsb/- (window: 10s, size: 5000000, max failover duration: 10s) (A failover was detected by this seen error: Code: FAILED_PRECONDITION
vttablet: rpc error: code = FailedPrecondition desc = operation not allowed in state NOT_SERVING
.)
I1120 02:45:32.314849       7 tablet_health_check.go:110] HealthCheckUpdate(Serving State): tablet: zone1-868377885 (10.171.174.217) serving true => false for ycsb/- (MASTER) reason: healthCheck update
I1120 02:45:42.307964       7 shard_buffer.go:545] Stopping buffering for shard: ycsb/- after: 10.0 seconds due to: stopping buffering because failover did not finish in time (10s). Draining 65 buffered requests now.
I1120 02:45:43.128362       7 shard_buffer.go:565] Draining finished for shard: ycsb/- Took: 820.334657ms for: 65 requests.

Please note this log message:

stopping buffering because failover did not finish in time (10s). Draining 65 buffered requests now.

However, the switch usually takes only 2-3 seconds.

Observations

I don't know whether recordExternallyReparentedTimestamp works for the resharding case.

    // timestamp is `th.MasterTermStartTime`.
    if timestamp <= sb.externallyReparented {
        return
    }

When I experimentally delete the primary pod, vtgate correctly detects the end of cutover.
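
To illustrate the suspicion above, here is a minimal sketch of how such a guard behaves, assuming that during a resharding cutover the source shard keeps its primary and therefore reports an unchanged term start time. The `shardBuffer` type, its fields, and `recordTimestamp` are hypothetical simplifications for illustration, not the actual vtgate code.

```go
package main

import "fmt"

// shardBuffer is a hypothetical, simplified stand-in for vtgate's
// per-shard buffer state.
type shardBuffer struct {
	buffering            bool
	externallyReparented int64 // last primary term start time seen
}

// recordTimestamp mirrors the guard quoted above: only a strictly newer
// primary term start time is treated as the end of a failover.
func (sb *shardBuffer) recordTimestamp(timestamp int64) {
	if timestamp <= sb.externallyReparented {
		return
	}
	sb.externallyReparented = timestamp
	sb.buffering = false
}

func main() {
	// Reparent: a new primary reports a newer term start time.
	reparent := &shardBuffer{buffering: true, externallyReparented: 100}
	reparent.recordTimestamp(200)
	fmt.Println("still buffering after reparent:", reparent.buffering)

	// Resharding cutover (assumption): same primary, same term start time.
	cutover := &shardBuffer{buffering: true, externallyReparented: 100}
	cutover.recordTimestamp(100)
	fmt.Println("still buffering after cutover:", cutover.buffering)
}
```

Under that assumption the guard returns early in the cutover case, so buffering would only end when the window times out, which matches the log above.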

deepthi commented 3 years ago

Thank you for filing this issue. It turns out that buffering does not work correctly for the resharding cutover case. At the code level, in the gateway (both DiscoveryGateway and TabletGateway), failover detection is done per keyspace/shard. When we are doing a resharding cutover, there is no "FailoverEnd" for the original shard, so there is no way for vtgate to stop buffering and start routing the requests again.

To fix this properly will require a 2-part fix.

  1. We need to send back some information from vttablet that vtgate can use to distinguish a reparenting failover from a resharding cutover. Gateway code can then exclude resharding cutover from the current buffering. Gateway should just return an error to the caller (which is still in vtgate) for resharding cutover errors.
  2. We need to implement buffering for resharding cutover at a higher level. We need to check for the specific error information that we are adding to vttablet and propagating up from gateway. Looking at the code, it seems like we may need to do this for every type of query separately (insert / delete etc.) because the targets will change after the cutover and we need to resolve the shards again.

deepthi commented 3 years ago

@rohit-nayak-ps can you document here how we would detect a "resharding cutover in progress"? @harshit-gangal you had mentioned that there might be one place where we can call the "cutover detection" so that we can fail queries immediately and not buffer them. Can you document that here?

rohit-nayak-ps commented 3 years ago

The SwitchWrites flow related to the shard cutover is as follows:

  1. Writes are stopped on the source shards by disabling query service, and GTID positions are recorded

This is achieved by setting QueryServiceDisabled to true on the ShardTabletControl record in SrvKeyspace for the source shards. Primary tablets of those shards are refreshed so that they transition state into Not Serving.

  2. Wait for the target shards to catch up to the source positions

  3. Remove the non-serving shards and add the new target shards to SrvKeyspace

This is achieved by topo.MigrateServedType() updating the partitions to remove the source shards and add the new ones. The partition.ShardReferences list now has the new set of shards that span the keyspace. Primary tablets of the new shards are refreshed so that they transition into a Serving state.
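
The three steps above can be sketched as state changes on an in-memory record. This is only an illustration: the `srvKeyspace` struct is a hypothetical flattening of SrvKeyspace and its ShardTabletControl records, replication positions are modelled as plain integers rather than GTID sets, and the method names are made up for the example.

```go
package main

import "fmt"

// srvKeyspace is a hypothetical, flattened stand-in for the SrvKeyspace
// record: the serving shard list plus a per-shard "query service disabled" flag.
type srvKeyspace struct {
	shardReferences      []string
	queryServiceDisabled map[string]bool
}

// stopWrites models step 1: disable query service on the source shards.
func (ks *srvKeyspace) stopWrites(sources []string) {
	for _, s := range sources {
		ks.queryServiceDisabled[s] = true
	}
}

// caughtUp models step 2 with integer positions (an assumption; real
// positions are GTID sets).
func caughtUp(sourcePos, targetPos int) bool {
	return targetPos >= sourcePos
}

// migrateServedType models step 3: drop the source shards from the serving
// list and add the target shards.
func (ks *srvKeyspace) migrateServedType(sources, targets []string) {
	remove := make(map[string]bool, len(sources))
	for _, s := range sources {
		remove[s] = true
	}
	var kept []string
	for _, s := range ks.shardReferences {
		if !remove[s] {
			kept = append(kept, s)
		}
	}
	ks.shardReferences = append(kept, targets...)
}

func main() {
	ks := &srvKeyspace{
		shardReferences:      []string{"-"},
		queryServiceDisabled: map[string]bool{},
	}
	ks.stopWrites([]string{"-"})
	if caughtUp(42, 42) {
		ks.migrateServedType([]string{"-"}, []string{"-80", "80-"})
	}
	fmt.Println(ks.shardReferences, ks.queryServiceDisabled)
}
```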

So iiuc the guards for vtgate to stop/start buffering during a resharding cutover will be:

RESHARDING_CUTOVER_STARTED => If Shard found in vtgate's plan has its ShardTabletControl disabled then don't execute and buffer

RESHARDING_CUTOVER_COMPLETED => SrvKeyspace watcher detects change in partition.ShardReferences and stops buffering
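
A minimal sketch of these two guards, assuming vtgate can see a snapshot of the partition before and after the SrvKeyspace update. The `partition` struct and both function names are hypothetical simplifications of the topo records, not the real API.

```go
package main

import (
	"fmt"
	"sort"
)

// partition is a hypothetical simplification of a SrvKeyspace partition.
type partition struct {
	shardReferences      []string
	queryServiceDisabled map[string]bool // from ShardTabletControl
}

// cutoverStarted is the RESHARDING_CUTOVER_STARTED guard: the shard chosen
// by the plan has query service disabled, so buffer instead of executing.
func cutoverStarted(p partition, shard string) bool {
	return p.queryServiceDisabled[shard]
}

// cutoverCompleted is the RESHARDING_CUTOVER_COMPLETED guard: the watcher
// sees a different set of shard references than before.
func cutoverCompleted(before, after partition) bool {
	a := append([]string(nil), before.shardReferences...)
	b := append([]string(nil), after.shardReferences...)
	sort.Strings(a)
	sort.Strings(b)
	if len(a) != len(b) {
		return true
	}
	for i := range a {
		if a[i] != b[i] {
			return true
		}
	}
	return false
}

func main() {
	before := partition{
		shardReferences:      []string{"-"},
		queryServiceDisabled: map[string]bool{"-": true},
	}
	after := partition{shardReferences: []string{"-80", "80-"}}
	fmt.Println("started:", cutoverStarted(before, "-"))
	fmt.Println("completed:", cutoverCompleted(before, after))
}
```

Comparing sorted copies keeps the check independent of the order in which shards are listed and leaves the watched snapshots unmodified.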

Harshit and I discussed this briefly and he will add more details in vtgate-speak as to possible solutions.