pingcap / tiflow

This repo maintains DM (a data migration platform) and TiCDC (change data capture for TiDB)
Apache License 2.0

Owner: table-level load-balancing granularity between captures leads to a performance bottleneck. #1207

Open dengqee opened 3 years ago

dengqee commented 3 years ago

Feature Request

Is your feature request related to a problem? Please describe:

It seems that the granularity of load balancing between captures is the table, and every table counts as a workload of 1. If a table is hot, the capture handling it becomes a bottleneck, which hinders performance scaling.

Describe the feature you'd like:

Make the region the granularity of task migration, which would give better performance and make full use of capture resources.

Describe alternatives you've considered:

Design an algorithm to calculate the workload of a table, reflecting the amount of data change in that table. Load balancing could then perform better even when the granularity remains the table.
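
To make the alternative above concrete, here is a minimal sketch of weight-aware table placement. It is not TiCDC's scheduler: the function name `assign_tables`, the capture IDs, and the per-table weights (imagined here as an observed change rate) are all hypothetical. It places the heaviest table first on whichever capture currently carries the least total weight.

```python
import heapq


def assign_tables(table_weights, captures):
    """Greedy sketch: heaviest table first, onto the least-loaded capture.

    table_weights: dict of table name -> estimated workload (hypothetical metric,
                   e.g. observed row changes per second).
    captures:      list of capture IDs (hypothetical names).
    """
    # Min-heap of (accumulated weight, capture ID).
    heap = [(0.0, capture) for capture in sorted(captures)]
    heapq.heapify(heap)
    assignment = {capture: [] for capture in captures}

    # Sort by descending weight; break ties by table name for determinism.
    ordered = sorted(table_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    for table, weight in ordered:
        load, capture = heapq.heappop(heap)
        assignment[capture].append(table)
        heapq.heappush(heap, (load + weight, capture))
    return assignment


if __name__ == "__main__":
    weights = {"orders": 900.0, "users": 50.0, "logs": 40.0, "audit": 10.0}
    print(assign_tables(weights, ["capture-1", "capture-2"]))
```

With these made-up weights, the hot table gets a capture to itself and the three quiet tables share the other one, whereas counting each table as 1 would split them two and two. A single table is still handled by one capture, so this does not remove the hot-table bottleneck that region-level granularity would address.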

amyangfei commented 3 years ago

Thanks for your proposal. The problem you describe does exist: currently TiCDC can't scale out if a single table is a hotspot. We have considered changing the load-balance granularity to the region level, but that would require too many changes to the current architecture. We are currently refactoring the core module in TiCDC, but we don't plan to change the load-balance granularity in the near term.

oryx2 commented 3 years ago

It seems the current load-balancing granularity between captures is the tables of a single changefeed. See the scenario below:

1. Set up a cluster: tiup playground --ticdc 3
2. Create a changefeed that processes only a single table, and do this three times so that there are three different changefeeds: /cdc cli changefeed create --pd http://127.0.0.1:2379 --sink-uri "blackhole://" --config "config/changefeed.toml"
3. One capture stays idle.

Maybe we could select an idle capture by looking across all changefeeds.
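
The difference between the two views can be sketched with a toy model. This is only an illustration under stated assumptions, not how the TiCDC owner actually schedules: `pick_capture`, `place_single_table_changefeeds`, and the capture and changefeed IDs are hypothetical, and ties are broken by capture ID.

```python
from collections import Counter


def pick_capture(captures, placements, changefeed=None):
    """Return the capture with the fewest tables.

    placements: list of (changefeed ID, capture ID), one entry per table.
    changefeed: if given, only tables of that changefeed are counted
                (per-changefeed view); if None, all tables are counted.
    """
    counts = Counter()
    for cf, capture in placements:
        if changefeed is None or cf == changefeed:
            counts[capture] += 1
    # Ties go to the smallest capture ID (an assumption of this sketch).
    return min(sorted(captures), key=lambda c: counts[c])


def place_single_table_changefeeds(changefeeds, captures, global_view):
    """Place one table per changefeed, using either view of the load."""
    placements = []
    for cf in changefeeds:
        scope = None if global_view else cf
        placements.append((cf, pick_capture(captures, placements, scope)))
    return placements


if __name__ == "__main__":
    captures = ["capture-1", "capture-2", "capture-3"]
    changefeeds = ["cf-1", "cf-2", "cf-3"]
    print(place_single_table_changefeeds(changefeeds, captures, global_view=False))
    print(place_single_table_changefeeds(changefeeds, captures, global_view=True))
```

In this toy model, the per-changefeed view sees every capture as empty each time and keeps choosing the same one, while the global view spreads the three tables over the three captures. The exact idle pattern in a real cluster depends on the scheduler's tie-breaking, which this sketch does not attempt to reproduce.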