Open HuSharp opened 3 months ago
i'm interesting in the issue.
/assign @River2000i
When lightning imports tables, the table's region key is encoded in the table ID, while table IDs are created consecutively(basically next to each other). https://github.com/tikv/client-go/blob/6ba909c4ad2de65b5b36d0e5036d0a85f3154cc0/tikv/split_region.go#L241-L247
PD schedule scatter region base on region_count
in all store. And schedule new region to fewest store.
https://github.com/tikv/pd/blob/13174b5d4cab4d958f0b4ea2718ea7bb74992bc7/pkg/schedule/scatter/region_scatterer.go#L346
PD will compare region_count
base on the group
. For now, gourp
define by table ID.(every table belong to a group
)
https://github.com/tikv/pd/blob/13174b5d4cab4d958f0b4ea2718ea7bb74992bc7/pkg/schedule/scatter/region_scatterer.go#L367
Summary:
group
level, scatter base on the region_count
in a group
. It will not scatter region to the store , which not contain the region belongs to the group
.region_count
in whole cluster. Region is balanced in whole cluster, but not balanced in group
or table
level.root cause:
selectNewPeer()
will pick origin store in first time trigger scatter region. Since it will compare all stores's region count in engineContext
, but first time call context.selectedPeer.Get()
will get 0 in all stores, so it will not scatter region in all stores. https://github.com/tikv/pd/blob/13174b5d4cab4d958f0b4ea2718ea7bb74992bc7/pkg/schedule/scatter/region_scatterer.go#L444
If we want to schedule scatter region in cluster level, we can call ScatterRegion
with the same group
. It will be an options for caller.
Development Task
Background
balance-scheduler
.Balance-Region
will not schedule empty region, There is a hardcode in https://github.com/tikv/pd/blob/7e18a69734f2ae391516625ec0230c05b363f5c5/pkg/schedule/filter/region_filters.go#L150-L153Problems faced
For lightning importing 1 million tables(one table corresponds to one region), even though there are more than 3 stores, consecutive region keys will generate a lot of regions aggregations in the first 3 stores. And since regions are not scheduled, the three stores have a high probability of OOM.