tikv / pd

Placement driver for TiKV
Apache License 2.0

Enhance the region balance of the 1M tables imported by lightning #8424

Open HuSharp opened 1 month ago

HuSharp commented 1 month ago

Development Task

Background

Problems faced

When Lightning imports 1 million tables (one table corresponds to one region), the consecutive region keys cause a large number of regions to pile up on the first 3 stores, even when the cluster has more than 3 stores. Because these regions are not scheduled away, those three stores have a high probability of OOM.


River2000i commented 1 month ago

I'm interested in this issue.

River2000i commented 1 month ago

/assign @River2000i

River2000i commented 1 month ago

When Lightning imports tables, each table's region key is encoded from the table ID, and table IDs are allocated consecutively (basically next to each other). https://github.com/tikv/client-go/blob/6ba909c4ad2de65b5b36d0e5036d0a85f3154cc0/tikv/split_region.go#L241-L247
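To illustrate why consecutive table IDs produce neighbouring region keys, here is a minimal sketch. It assumes a TiDB-style table prefix (the byte `t` followed by the table ID as a big-endian integer with the sign bit flipped, so byte order matches numeric order). The function names `tablePrefix` and `consecutiveKeysAscending` are hypothetical and are not taken from client-go or PD.

```go
package main

import (
	"bytes"
	"encoding/binary"
)

// tablePrefix is a hypothetical helper that sketches a TiDB-style table key:
// 't' followed by the table ID encoded as 8 big-endian bytes with the sign
// bit flipped, so that comparing bytes gives the same order as comparing IDs.
func tablePrefix(tableID int64) []byte {
	key := make([]byte, 9)
	key[0] = 't'
	binary.BigEndian.PutUint64(key[1:], uint64(tableID)^(1<<63))
	return key
}

// consecutiveKeysAscending reports whether the prefixes of n consecutive
// table IDs starting at first are strictly ascending in byte order, i.e.
// whether the tables occupy one contiguous run of the key space.
func consecutiveKeysAscending(first int64, n int) bool {
	prev := tablePrefix(first)
	for i := 1; i < n; i++ {
		next := tablePrefix(first + int64(i))
		if bytes.Compare(prev, next) >= 0 {
			return false
		}
		prev = next
	}
	return true
}
```

Under this assumption, `consecutiveKeysAscending(100, 1000)` returns true: the 1000 tables map to 1000 adjacent key ranges, so the regions split from them start out next to each other.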

PD scatters regions based on the region_count of every store, and schedules a new region to the store with the fewest regions. https://github.com/tikv/pd/blob/13174b5d4cab4d958f0b4ea2718ea7bb74992bc7/pkg/schedule/scatter/region_scatterer.go#L346 PD compares region_count per group. For now, a group is defined by table ID (each table belongs to its own group). https://github.com/tikv/pd/blob/13174b5d4cab4d958f0b4ea2718ea7bb74992bc7/pkg/schedule/scatter/region_scatterer.go#L367

Summary:

  1. If there are many newly split empty regions (with close key ranges), they all end up on the same stores. At group level, scattering is based on the region_count within a group, so a region is not scattered to a store that holds no region belonging to that group.
  2. At cluster level, scattering is based on the region_count across the whole cluster. Regions are balanced across the whole cluster, but not at group or table level.

Root cause: selectNewPeer() picks the origin store the first time scatter region is triggered for a group. It compares the region counts of all stores in engineContext, but on the first call context.selectedPeer.Get() returns 0 for every store, so the region is not scattered across the stores. https://github.com/tikv/pd/blob/13174b5d4cab4d958f0b4ea2718ea7bb74992bc7/pkg/schedule/scatter/region_scatterer.go#L444 If we want to scatter regions at cluster level, we can call ScatterRegion with the same group. This would be an option for the caller.
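The effect described above can be reproduced with a toy model. This is a simplified sketch and not PD's actual code. It assumes one peer per region, that every new region starts on the same origin store, and that the origin store wins ties when per-group counts are equal. The names `pickStore` and `simulate` are hypothetical.

```go
package main

// pickStore is a hypothetical stand-in for the selection step: it returns the
// store with the fewest regions already scattered in this group, and keeps
// the origin store unless some other store has a strictly smaller count.
func pickStore(groupCount map[uint64]int, stores []uint64, origin uint64) uint64 {
	best := origin
	for _, s := range stores {
		if groupCount[s] < groupCount[best] {
			best = s
		}
	}
	return best
}

// simulate scatters `tables` single-region tables that all start on origin
// and returns how many regions end up on each store. With sharedGroup=false
// every table gets a fresh group (all counts are 0); with sharedGroup=true
// all tables share one group, so the counts accumulate across calls.
func simulate(tables int, stores []uint64, origin uint64, sharedGroup bool) map[uint64]int {
	placed := make(map[uint64]int)
	shared := make(map[uint64]int)
	for i := 0; i < tables; i++ {
		counts := shared
		if !sharedGroup {
			counts = make(map[uint64]int) // first scatter in a new group
		}
		target := pickStore(counts, stores, origin)
		counts[target]++
		placed[target]++
	}
	return placed
}
```

In this model, `simulate(600, []uint64{1, 2, 3, 4, 5, 6}, 1, false)` leaves all 600 regions on store 1, because each per-table group starts with zero counts and the origin store is kept. With `sharedGroup` set to true, the same call places 100 regions on each of the six stores, which matches the suggestion of calling ScatterRegion with the same group.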