tikv / pd

Placement driver for TiKV
Apache License 2.0

balance-learner-schedule: balance learners among stores(tiflash). #8278

Open AndreMouche opened 3 months ago

AndreMouche commented 3 months ago

Feature Request

Describe your feature request related problem

Describe the feature you'd like

Currently, balance-region-scheduler only considers the balance across all stores, without considering peer roles (learner or follower) or store types (TiKV or TiFlash).
However, for TiFlash users, if the distribution of regions (learners) among TiFlash instances becomes unbalanced, it may lead to computational hotspots that slow down performance. From the following logic, we can see that balance-region-scheduler chooses source stores in order of region-score. If the number of learner regions on TiFlash is small, the region-score of a TiFlash node will always be the smallest, so TiFlash nodes never get the chance to be balanced by balance-region, which leads to an imbalance of peers across the TiFlash nodes.

https://github.com/tikv/pd/blob/fca469ca33eb5d8b5e0891b507c87709a00b0e81/pkg/schedule/schedulers/balance_region.go#L139-L144
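To illustrate the ordering problem, here is a minimal, self-contained sketch. It is not the PD code linked above: the `Store` type, its fields, and `sourceCandidates` are hypothetical simplifications, and the scores are made-up numbers. It only assumes that source stores are tried from the highest region-score to the lowest.

```go
package main

import (
	"fmt"
	"sort"
)

// Store is a hypothetical, simplified stand-in for a PD store.
type Store struct {
	ID          uint64
	IsTiFlash   bool
	RegionScore float64 // assumed to grow with the number of peers on the store
}

// sourceCandidates returns a copy of stores ordered by RegionScore, highest
// first, which is the order a score-based balancer would try them as sources.
func sourceCandidates(stores []Store) []Store {
	sorted := append([]Store(nil), stores...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].RegionScore > sorted[j].RegionScore
	})
	return sorted
}

func main() {
	stores := []Store{
		{ID: 1, RegionScore: 900},
		{ID: 2, RegionScore: 880},
		{ID: 3, RegionScore: 910},
		{ID: 4, IsTiFlash: true, RegionScore: 120}, // TiFlash store holding most learners
		{ID: 5, IsTiFlash: true, RegionScore: 20},  // nearly empty TiFlash store
	}
	for _, s := range sourceCandidates(stores) {
		fmt.Printf("store %d tiflash=%v score=%.0f\n", s.ID, s.IsTiFlash, s.RegionScore)
	}
}
```

In this toy data, store 4 holds six times as many learners as store 5, yet both TiFlash stores sort after every TiKV store, so they are the last candidates to be considered as a source.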

In summary, I think we need a scheduler, something like a balance-learner-scheduler, to balance the distribution of learner peers among the stores.
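A rough sketch of what such a scheduler could compare is below. This is only an assumption about one possible approach, not an existing PD API: `LearnerStore`, `pickLearnerMove`, and the `tolerance` parameter are hypothetical names. The idea is to compare TiFlash stores only with each other, by learner count, instead of ranking them against TiKV stores.

```go
package main

import "fmt"

// LearnerStore is a hypothetical, simplified view of a store.
type LearnerStore struct {
	ID           uint64
	IsTiFlash    bool
	LearnerCount int
}

// pickLearnerMove looks at TiFlash stores only. It returns the store with the
// most learner peers as source and the one with the fewest as target.
// ok is false when there are fewer than two TiFlash stores or when the gap
// between source and target is within tolerance.
func pickLearnerMove(stores []LearnerStore, tolerance int) (source, target LearnerStore, ok bool) {
	count := 0
	for _, s := range stores {
		if !s.IsTiFlash {
			continue // TiKV stores are ignored entirely
		}
		count++
		if count == 1 {
			source, target = s, s
			continue
		}
		if s.LearnerCount > source.LearnerCount {
			source = s
		}
		if s.LearnerCount < target.LearnerCount {
			target = s
		}
	}
	if count < 2 || source.LearnerCount-target.LearnerCount <= tolerance {
		return LearnerStore{}, LearnerStore{}, false
	}
	return source, target, true
}

func main() {
	stores := []LearnerStore{
		{ID: 1},
		{ID: 2},
		{ID: 4, IsTiFlash: true, LearnerCount: 120},
		{ID: 5, IsTiFlash: true, LearnerCount: 20},
	}
	if src, dst, ok := pickLearnerMove(stores, 10); ok {
		fmt.Printf("move a learner from store %d to store %d\n", src.ID, dst.ID)
	}
}
```

A real scheduler would also need to respect placement rules, store limits, and pending operator influence, which this sketch leaves out.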