tikv / pd

Placement driver for TiKV
Apache License 2.0

Reduce the conflict when multiple scheduling is running #3778

Closed nolouch closed 1 year ago

nolouch commented 3 years ago

Background

Scenario 1: After a drop table operation, a large number of merge operations occur continuously while balance-leader is restricted by leader-schedule-limit, which results in a conflict.

Scenario 2: The store limit is shared by multiple schedulers.

Details

Store Limit - Add Peer | Store Limit - Remove Peer | leader-schedule-limit | hot-region-schedule-limit | merge-schedule-limit | region-schedule-limit | replica-schedule-limit
BalanceLeader × × × × × ×
BalanceRegion × × × ×
EvictLeader × × × × × ×
HotRegion × ×
Label
ScatterRange × × ×
learnerChecker × × × × × × ×
replicaChecker × × ×
ruleChecker × × ×
mergeChecker × ×

rleungx commented 3 years ago

Some existing problems are also described in https://github.com/tikv/pd/issues/3807

bufferflies commented 3 years ago

Some existing problems are also described in #3749

bufferflies commented 3 years ago

There are two ways to solve the problem of schedulers influencing each other. The first way: an operator that enters the operator controller should have only one OpKind, even if the region has many OpKinds. The priority is below (ordered by operator cost):

OpMerge > OpHotRegion > OpRange > OpSplit > OpReplica > OpRegion > OpLeader
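The priority rule above could be sketched roughly as follows. This is only an illustration: the flag names come from the list above, but the bit layout, the `priority` slice, and the `primaryKind` helper are assumptions made for this sketch and are not taken from PD's code.

```go
package main

import "fmt"

// OpKind is a bit set of operator kinds. The flag names follow the list
// above; the values are assumptions for this sketch, not PD's definitions.
type OpKind uint32

const (
	OpLeader OpKind = 1 << iota
	OpRegion
	OpReplica
	OpSplit
	OpRange
	OpHotRegion
	OpMerge
)

// priority lists the kinds from highest to lowest cost, as proposed above.
var priority = []OpKind{
	OpMerge, OpHotRegion, OpRange, OpSplit, OpReplica, OpRegion, OpLeader,
}

// primaryKind (hypothetical helper) reduces a multi-kind operator to the
// single highest-priority kind it carries. It returns 0 if no known flag
// is set.
func primaryKind(kind OpKind) OpKind {
	for _, k := range priority {
		if kind&k != 0 {
			return k
		}
	}
	return 0
}

func main() {
	// A merge operator that also transfers a leader counts only as OpMerge.
	fmt.Println(primaryKind(OpMerge|OpLeader) == OpMerge) // true
	// A region move that also transfers a leader counts only as OpRegion.
	fmt.Println(primaryKind(OpRegion|OpLeader) == OpRegion) // true
}
```

With a reduction like this, each running operator would be charged against exactly one limit, so a burst of merge operators would no longer consume the leader quota.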

The other way: modify the scheduler/checker isAllowed condition by subtracting the operators of the other OpKinds, for example:

balance region scheduler: count(OpRegion) - count(OpMerge) - count(OpHotRegion) < region scheduler limit

balance leader scheduler: count(OpLeader) - count(OpMerge) - count(OpHotRegion) < leader scheduler limit
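A minimal sketch of the subtraction rule, to make the arithmetic concrete. The `counts` type and the `allowed` function are hypothetical names for this example, not PD's API. Clamping the result at zero is also an assumption added here, because the formula as written can go negative when merge or hot-region operators do not carry the kind being checked.

```go
package main

import "fmt"

// counts maps an operator kind name to the number of running operators that
// carry that kind. An operator with several kinds is counted once per kind.
type counts map[string]uint64

// allowed (hypothetical) applies the proposed rule: operators that carry
// OpMerge or OpHotRegion are subtracted from the count charged to the given
// kind. The clamp at zero is an assumption of this sketch.
func allowed(c counts, kind string, limit uint64) bool {
	used := int64(c[kind]) - int64(c["OpMerge"]) - int64(c["OpHotRegion"])
	if used < 0 {
		used = 0
	}
	return uint64(used) < limit
}

func main() {
	// Four running merge operators that also carry OpLeader, with an
	// example leader limit of 4.
	c := counts{"OpLeader": 4, "OpMerge": 4}

	// Plain check: the merges fill the leader quota, balance-leader is blocked.
	fmt.Println(c["OpLeader"] < 4) // false

	// Proposed check: the merges are subtracted, balance-leader may run.
	fmt.Println(allowed(c, "OpLeader", 4)) // true
}
```

This mirrors Scenario 1 from the issue description: under the plain count, continuous merges starve balance-leader, while the subtracted count leaves the leader quota free.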

bufferflies commented 3 years ago

The relation between OpKind and scheduler (checker) is below. In the past: (image)

bufferflies commented 3 years ago

Let remove-extras-peer not limit #3865

lhy1024 commented 1 year ago

@bufferflies Can we close it?