pingcap / tidb


Support min number of region to split and scatter in tidb lightning #51494

Open. guoshouyan opened this issue 7 months ago.

guoshouyan commented 7 months ago

Feature Request

Currently, in the TiDB Lightning physical backend, Lightning splits and scatters regions before it imports the data. The number of regions it splits into depends on the size of the source data and the default region size of the TiDB cluster. Can we add a parameter to the Lightning config to specify the minimum number of regions to split into, so that: number of regions to split = max(minimum number of regions in the config, data size / region size)?
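For illustration, a minimal Go sketch of the proposed rule; everything here (function name, parameters) is hypothetical and not part of the actual Lightning code or config:

```go
package main

import "fmt"

// regionsToSplit is a hypothetical helper illustrating the proposed behaviour:
// pre-split into the larger of the configured minimum and the size-based
// estimate Lightning already computes.
func regionsToSplit(minRegions, dataSize, regionSize int64) int64 {
	bySize := (dataSize + regionSize - 1) / regionSize // data size / region size, rounded up
	if bySize < minRegions {
		return minRegions
	}
	return bySize
}

func main() {
	// Example with numbers from this thread: an ~88 MiB table, ~512 MiB regions,
	// and a configured minimum of 20 regions.
	fmt.Println(regionsToSplit(20, 88<<20, 512<<20)) // prints 20
}
```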

Is your feature request related to a problem? Please describe:

We have a use case where a user imports a very small amount of data (less than 1 GiB) that serves a high QPS. After we use Lightning to import the data, the table has only one region, so all the traffic goes to that one region, which causes TiKV to be slow and also affects the traffic of other tables. The issue only resolves after TiKV splits and scatters that region to other nodes. We think it's better to have the ability to let Lightning split more regions for some tables.

Describe the feature you'd like:

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

guoshouyan commented 7 months ago

This is an example PR showing what I want to do: https://github.com/pingcap/tidb/pull/51526. I will add tests later.

D3Hunter commented 7 months ago

> has a high QPS

What's the access pattern, and what is the ratio of read to write operations?

Even if Lightning splits the data into smaller regions, PD will try to merge them back (since they are undersized regions) if they don't become hotspots for a while. You would need to set the merge_option attribute on the table to prevent that, which I don't think should be done on the Lightning side. See also Troubleshoot Hotspot Issues.
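For reference, that table attribute is set with a SQL statement along these lines (the table name `t` is only a placeholder):

```sql
-- Ask PD not to merge this table's regions back together
ALTER TABLE t ATTRIBUTES 'merge_option=deny';
```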

guoshouyan commented 7 months ago

> What's the access pattern, and what is the ratio of read to write operations?
>
> Even if Lightning splits the data into smaller regions, PD will try to merge them back (since they are undersized regions) if they don't become hotspots for a while. You would need to set the merge_option attribute on the table to prevent that, which I don't think should be done on the Lightning side. See also Troubleshoot Hotspot Issues.

Around 40k QPS to one region. Our use case is that our users use Lightning to import new data into a new table, and then we switch traffic over from the old table to the new table. Regions will not be merged by PD if they get enough read traffic; in our case, even though the overall data size is smaller than one region, PD still splits it into 20 regions, and since our QPS is constant, PD doesn't merge those regions. But every time we import a new table and switch traffic to it, we need to wait for PD to split the regions, which causes a latency spike. So we want to split the regions at the beginning.

D3Hunter commented 7 months ago

> Around 40k QPS

All reads?

guoshouyan commented 7 months ago

> Around 40k QPS
>
> All reads?

Yes.

guoshouyan commented 7 months ago

I realized that my PR above does not work: in the function here, sizeProps.iter returns more than one row at a time. Previously I was thinking of reducing keysLimit and sizeLimit to make it split into more regions, but if sizeProps.iter returns much more than the keysLimit and sizeLimit, it will still return a single range.

Do you have any suggestions @D3Hunter?

Frank945946 commented 7 months ago

@guoshouyan As you mentioned, "a user imports a very small amount of data (less than 1 GiB) that serves a high QPS. After we use Lightning to import the data, the table has only one region, so all the traffic goes to that one region, which causes TiKV to be slow and also affects the traffic of other tables." Does that mean the region size is more than 1 GiB? The official documentation states that the recommended range for the region size is [48 MiB, 258 MiB]. If the region size is 48 MiB, a 1 GiB table will be split into 22 regions. Therefore, I believe it is better to set the region size within [48 MiB, 258 MiB] instead of introducing a new parameter to Lightning.
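For reference, this region size is controlled on the TiKV side; a sketch of the relevant TiKV config, assuming the documented region-split-size key under the coprocessor section (the value shown is only an example):

```toml
# tikv.toml (sketch): keep regions within the recommended size range
[coprocessor]
region-split-size = "96MiB"
```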

guoshouyan commented 7 months ago

> @guoshouyan As you mentioned, "a user imports a very small amount of data (less than 1 GiB) that serves a high QPS [...]" Does that mean the region size is more than 1 GiB? The official documentation states that the recommended range for the region size is [48 MiB, 258 MiB]. If the region size is 48 MiB, a 1 GiB table will be split into 22 regions. Therefore, I believe it is better to set the region size within [48 MiB, 258 MiB] instead of introducing a new parameter to Lightning.

Previously we hit a PD OOM issue because we had too many regions in the cluster, so we increased the region size to 512 MB to reduce the number of regions. That has worked well for a long time because most of our tables are large. Currently we have 1M regions in one cluster and 700k in the other, and we don't want to reduce the region size, which could give us even more regions.

We just have some small use cases where a table is about 88 MB. If we import it with the defaults, it will have only one region, so we are thinking of using Lightning to pre-split the regions.

guoshouyan commented 7 months ago

But on the other hand, I doubt whether this feature will work in Lightning, because of the issue I mentioned here.

guoshouyan commented 7 months ago

Hi @Frank945946, @D3Hunter, any thoughts on this?

D3Hunter commented 7 months ago

> I realized that my PR above does not work: in the function here, sizeProps.iter returns more than one row at a time. Previously I was thinking of reducing keysLimit and sizeLimit to make it split into more regions, but if sizeProps.iter returns much more than the keysLimit and sizeLimit, it will still return a single range.
>
> Do you have any suggestions @D3Hunter?

The granularity of the size property is 4 MiB (or 40 Ki KV pairs), so the split size might be a little larger than that.

What's the expected region size in your case that would mitigate the hotspot issue?

guoshouyan commented 6 months ago

> The granularity of the size property is 4 MiB (or 40 Ki KV pairs), so the split size might be a little larger than that.
>
> What's the expected region size in your case that would mitigate the hotspot issue?

Hi @D3Hunter, do you mind showing me where Lightning controls the granularity of the size property? Also, during Tuesday's meeting with PingCAP, they mentioned that they're talking to the PD team about providing a feature to specify a different region size for a specific table. One question we have: even if we could specify a different region size for a specific table, if that region size is smaller than the granularity of the size property in Lightning, it still won't fix this problem, right?

D3Hunter commented 6 months ago

Yes, the split region size is >= the property granularity: https://github.com/pingcap/tidb/blob/e03da4d858d62fa440a70d6bb786ac21dd47ed6e/br/pkg/lightning/backend/local/local.go#L90-L91
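Judging from the "4 MiB (or 40 Ki KV pairs)" granularity quoted earlier in this thread, the linked lines presumably define constants along these lines; the identifiers and layout below are an assumption, not a verbatim copy of the source:

```go
// Assumed shape of the size-property granularity defaults in the Lightning
// local backend: a property sample point is recorded roughly every 4 MiB of
// data or every 40 Ki key-value pairs, so split ranges cannot be finer than that.
const (
	propSizeIndexDistance = 4 * 1024 * 1024 // ~4 MiB between samples
	propKeysIndexDistance = 40 * 1024       // ~40 Ki KV pairs between samples
)
```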

> specifying a different region size for a specific table

This way we can avoid PD merging those smaller regions when they become cold for a while (assuming they are not labeled with merge_option=deny).