Open nolouch opened 4 years ago
A timeout is unpredictable: no one knows what an appropriate value is. I suggest using a streaming gRPC call or a long-lived HTTP connection. The evict-leader-scheduler is added once the call/connection is established, and removed once the call/connection is aborted or finished.
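A minimal sketch of that lifecycle, written in Python for brevity rather than as PD code. `FakePD`, `evict_leader_session`, and the scheduler naming are hypothetical stand-ins; the point is only that the scheduler's lifetime is tied to the session, so it is removed whether the session finishes or aborts.

```python
from contextlib import contextmanager


class FakePD:
    """Hypothetical in-memory stand-in for PD's scheduler list."""

    def __init__(self):
        self.schedulers = set()

    def add_scheduler(self, name):
        self.schedulers.add(name)

    def remove_scheduler(self, name):
        self.schedulers.discard(name)


@contextmanager
def evict_leader_session(pd, store_id):
    """Keep an evict-leader scheduler alive only while the session is open."""
    name = f"evict-leader-scheduler-{store_id}"  # illustrative naming only
    pd.add_scheduler(name)  # call/connection established
    try:
        yield name
    finally:
        # Runs on normal completion and on abort (exception) alike.
        pd.remove_scheduler(name)


pd = FakePD()
with evict_leader_session(pd, store_id=1) as name:
    assert name in pd.schedulers  # leaders are evicted while the upgrade runs
assert not pd.schedulers  # scheduler is gone once the session ends
```

With this shape no timeout value has to be guessed: the server side only needs to notice that the call or connection has ended.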
I'd like to propose another approach.
Instead of depending on a special scheduler, we can introduce a new store state. Let's just call it `UNLOAD`. With the state, suppose we want to upgrade a TiKV node:
- … `UNLOAD`.
- … `UNLOADED`.
- … `putStore` command from the TiKV node, it updates the node's state to `Up`.
@disksing I suggested a similar solution in chat. However, this doesn't solve the case where the user aborts the operation, in which case TiKV doesn't have to be restarted.
The solution I suggested above doesn't require a new scheduler and also works in all known cases.
BR also hits the same problem. BR temporarily removes `balance-region-scheduler`, `balance-leader-scheduler`, ... to speed up restoration, and finally adds these schedulers back. But if BR is killed during restoration, these schedulers are lost. So BR needs PD to provide a way to remove schedulers temporarily, such as a TTL option when removing a scheduler.
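To make the TTL idea concrete, here is a rough sketch in Python (not PD code). `TTLPauses` and its methods are hypothetical; the clock is passed in explicitly so the behaviour is easy to follow. A scheduler is paused rather than deleted, and the pause lapses on its own if the client never comes back.

```python
class TTLPauses:
    """Hypothetical registry of temporarily paused schedulers."""

    def __init__(self):
        self._paused_until = {}  # scheduler name -> expiry timestamp (seconds)

    def pause(self, name, ttl, now):
        """Pause `name` for `ttl` seconds; calling again renews the pause."""
        self._paused_until[name] = now + ttl

    def resume(self, name):
        """Explicit resume, for a client that finishes cleanly."""
        self._paused_until.pop(name, None)

    def is_paused(self, name, now):
        """A scheduler is paused only while its TTL has not expired."""
        expiry = self._paused_until.get(name)
        return expiry is not None and now < expiry


pauses = TTLPauses()
pauses.pause("balance-region-scheduler", ttl=300, now=0)
assert pauses.is_paused("balance-region-scheduler", now=100)
# The client is killed and never calls resume(); the pause still ends at t=300.
assert not pauses.is_paused("balance-region-scheduler", now=300)
```

A long-running client would call `pause` periodically to renew the TTL, so a crash costs at most one TTL interval of paused scheduling.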
Currently PD supports the following TTL-based API:
This issue requests a TTL-based API to:
- … (`evict-leader-scheduler`)
BR requests 3 TTL-based APIs to:
- … (`balance-*` and `shuffle-*`)
- … (`max-merge-region-{keys,size}`)
- … (`{leader,region}-schedule-limit` and `max-snapshot-count`)
Question: what should happen when multiple services require conflicting settings? In GC-TTL the conflict resolution is simple: just set the safepoint to the minimum over all alive services. But for the new APIs, say service A registers to remove scheduler X and service B registers to add the same scheduler X: how should this be resolved?
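For reference, the GC-TTL rule mentioned here can be sketched as follows (Python, illustrative only). The function name, the data layout, and the service names `br` and `cdc` are assumptions made for the example. The rule works because safepoints are ordered, so the minimum satisfies every alive service; an add/remove request for a scheduler has no such ordering.

```python
def effective_safepoint(services, now):
    """Return the minimum safepoint over services whose TTL is still alive.

    `services` maps a service name to (safepoint, expires_at).
    Returns None when no service is alive.
    """
    alive = [sp for sp, expires_at in services.values() if expires_at > now]
    return min(alive) if alive else None


services = {"br": (100, 50), "cdc": (80, 20)}  # hypothetical registrations
assert effective_safepoint(services, now=10) == 80   # both alive: take the min
assert effective_safepoint(services, now=30) == 100  # "cdc" has expired
assert effective_safepoint(services, now=60) is None  # nothing alive
```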
I see two solutions for now:
1. Select only some specific schedulers and configs, with a clear direction of resolution, e.g. `evict-leader-scheduler` can only be registered to be added, not removed; the `balance-*` schedulers can only be removed, not added; `max-merge-region-size` can only be decreased, not increased, etc.
2. First-come-first-serve: while a service TTL for a particular scheduler/config is alive, no other services can register a TTL on the same scheduler/config.
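The two options can be sketched roughly as follows (Python, not PD code). `ALLOWED`, `direction_allowed`, and `FirstComeRegistry` are hypothetical names, and the table only encodes the examples given in the list; whether the same service may renew its own registration is an assumption of this sketch.

```python
# Option 1: each key may only be moved in one direction by a TTL request.
ALLOWED = {
    "evict-leader-scheduler": "add",
    "balance-leader-scheduler": "remove",
    "balance-region-scheduler": "remove",
    "max-merge-region-size": "decrease",
}


def direction_allowed(key, action):
    """Accept a TTL request only if it goes in the key's fixed direction."""
    return ALLOWED.get(key) == action


# Option 2: first-come-first-serve ownership per scheduler/config key.
class FirstComeRegistry:
    def __init__(self):
        self._owners = {}  # key -> (service, expires_at)

    def register(self, key, service, ttl, now):
        """Return True if `service` now holds `key`, False if it is taken."""
        owner = self._owners.get(key)
        if owner is not None:
            holder, expires_at = owner
            if expires_at > now and holder != service:
                return False  # another service's TTL is still alive
        # Free, expired, or a renewal by the current holder (assumed allowed).
        self._owners[key] = (service, now + ttl)
        return True


reg = FirstComeRegistry()
assert reg.register("scheduler-X", "A", ttl=60, now=0)
assert not reg.register("scheduler-X", "B", ttl=60, now=30)  # A still holds it
assert reg.register("scheduler-X", "B", ttl=60, now=60)      # A's TTL expired
```

Option 1 never rejects a caller but needs a per-key rule; option 2 needs no per-key rule but forces the second service to wait or fail.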
We also need to consider the interaction with the existing dynamic (permanent) changes. For instance, if a service has registered to set `max-snapshot-count` to 40, what effect do we get if we run
- `pd-ctl config set max-snapshot-count 2`?
- `pd-ctl config set max-snapshot-count 80`?
I think the first-come-first-serve solution is better, for two reasons:
- … `{leader,region}-schedule-limit`
For removing schedulers we could use the "Pause" API (#1831), which is available on 3.1 and 4.0.
Feature Request
Describe your feature request related problem
To upgrade a TiKV cluster, we use `evict-leader-scheduler` to ensure the TiKV node being restarted has no leaders. However, during rolling upgrades we have repeatedly hit the problem that the `evict-leader-scheduler` was not deleted. To better solve this, we can provide a special evict-leader scheduler with a timeout for the deployment tool.
Describe the feature you'd like
Add a special evict-leader scheduler with a timeout.
Describe alternatives you've considered
The timeout should be suitable in most cases.
Teachability, Documentation, Adoption, Migration Strategy