pingcap / tidb


Improve lightning sample efficiency #33285

Open fubinzh opened 2 years ago

fubinzh commented 2 years ago
  1. Currently tidb-lightning doesn't support parallel sampling, so sampling can take a long time when there are many tables to sample.
  2. tidb-lightning has to run sampling again even when resuming from a lightning checkpoint.

e.g. when tidb-lightning imports 60k tables, sampling may take about 20 minutes, and it has to be repeated when resuming from a lightning checkpoint.

[root@zhengrong-24 lightning]# grep "sample" -A1 tidb-lightning.log.3 | head
[2022/03/21 16:05:52.757 +08:00] [INFO] [check_info.go:996] ["sample file start"] [table=16unit_8_agent_order_fs]
[2022/03/21 16:05:52.842 +08:00] [INFO] [check_info.go:1080] ["Sample source data"] [table=16unit_8_agent_order_fs] [IndexRatio=1.6089431042036386] [IsSourceOrder=true]
[2022/03/21 16:05:52.843 +08:00] [INFO] [check_info.go:996] ["sample file start"] [table=99944unit_0_game_bets_game_tag_analysis_68]
[2022/03/21 16:05:52.948 +08:00] [INFO] [check_info.go:1080] ["Sample source data"] [table=99944unit_0_game_bets_game_tag_analysis_68] [IndexRatio=2.1529074709211184] [IsSourceOrder=true]
[2022/03/21 16:05:52.949 +08:00] [INFO] [check_info.go:996] ["sample file start"] [table=99944unit_4_game_bets_game_tag_analysis_13]
[2022/03/21 16:05:53.050 +08:00] [INFO] [check_info.go:1080] ["Sample source data"] [table=99944unit_4_game_bets_game_tag_analysis_13] [IndexRatio=2.154797652704456] [IsSourceOrder=true]
[2022/03/21 16:05:53.051 +08:00] [INFO] [check_info.go:996] ["sample file start"] [table=63unit_2_game_bets_game_tag_34]
[2022/03/21 16:05:53.148 +08:00] [INFO] [check_info.go:1080] ["Sample source data"] [table=63unit_2_game_bets_game_tag_34] [IndexRatio=1.869930364913189] [IsSourceOrder=true]
[2022/03/21 16:05:53.149 +08:00] [INFO] [check_info.go:996] ["sample file start"] [table=64unit_1_game_bets_game_tag_21]
[2022/03/21 16:05:53.241 +08:00] [INFO] [check_info.go:1080] ["Sample source data"] [table=64unit_1_game_bets_game_tag_21] [IndexRatio=1.8727387995575133] [IsSourceOrder=true]
[root@zhengrong-24 lightning]# grep "sample" -A1 tidb-lightning.log.3 | tail
[2022/03/21 16:25:06.969 +08:00] [INFO] [check_info.go:996] ["sample file start"] [table=55unit_0_game_bets_all_analysis]
[2022/03/21 16:25:07.068 +08:00] [INFO] [check_info.go:1080] ["Sample source data"] [table=55unit_0_game_bets_all_analysis] [IndexRatio=1.9778091139419542] [IsSourceOrder=true]
[2022/03/21 16:25:07.069 +08:00] [INFO] [check_info.go:996] ["sample file start"] [table=54unit_0_game_bets_all_analysis]
[2022/03/21 16:25:07.171 +08:00] [INFO] [check_info.go:1080] ["Sample source data"] [table=54unit_0_game_bets_all_analysis] [IndexRatio=1.9706607650964274] [IsSourceOrder=true]
[2022/03/21 16:25:07.172 +08:00] [INFO] [check_info.go:996] ["sample file start"] [table=24unit_0_game_bets_all_analysis]
[2022/03/21 16:25:07.272 +08:00] [INFO] [check_info.go:1080] ["Sample source data"] [table=24unit_0_game_bets_all_analysis] [IndexRatio=1.979501191729837] [IsSourceOrder=true]
[2022/03/21 16:25:07.273 +08:00] [INFO] [check_info.go:996] ["sample file start"] [table=19unit_0_game_bets_all_analysis]
[2022/03/21 16:25:07.372 +08:00] [INFO] [check_info.go:1080] ["Sample source data"] [table=19unit_0_game_bets_all_analysis] [IndexRatio=1.985183783660232] [IsSourceOrder=true]
[2022/03/21 16:25:07.373 +08:00] [INFO] [check_info.go:996] ["sample file start"] [table=57unit_0_game_bets_all_analysis]
[2022/03/21 16:25:07.472 +08:00] [INFO] [check_info.go:1080] ["Sample source data"] [table=57unit_0_game_bets_all_analysis] [IndexRatio=1.9682700665034294] [IsSourceOrder=true]
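
In the logs above, tables are sampled one after another, at roughly 100 ms per table. A minimal sketch of what the two requests could look like is given below. It is not tidb-lightning's actual code: `sampleResult`, `sampleFunc`, `sampleTables`, and the `cached` map (standing in for results saved with a checkpoint) are all hypothetical names made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// sampleResult mirrors the fields printed in the "Sample source data" log line.
// The type itself is hypothetical, not a tidb-lightning type.
type sampleResult struct {
	Table         string
	IndexRatio    float64
	IsSourceOrder bool
}

// sampleFunc is a hypothetical stand-in for the per-table sampling routine.
type sampleFunc func(table string) (sampleResult, error)

// sampleTables samples the given tables with at most `concurrency` workers.
// Tables already present in `cached` (assumed to be restored from a
// checkpoint) are reused instead of being sampled again.
func sampleTables(tables []string, concurrency int, cached map[string]sampleResult, sample sampleFunc) (map[string]sampleResult, error) {
	if concurrency < 1 {
		concurrency = 1
	}
	results := make(map[string]sampleResult, len(tables))
	var (
		mu       sync.Mutex // guards results and firstErr
		wg       sync.WaitGroup
		firstErr error
	)
	jobs := make(chan string)

	// Start a fixed number of workers that pull table names from jobs.
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for table := range jobs {
				res, err := sample(table)
				mu.Lock()
				if err != nil {
					if firstErr == nil {
						firstErr = fmt.Errorf("sample %s: %w", table, err)
					}
				} else {
					results[table] = res
				}
				mu.Unlock()
			}
		}()
	}

	// Feed only the tables that have no cached result.
	for _, table := range tables {
		if res, ok := cached[table]; ok {
			mu.Lock()
			results[table] = res
			mu.Unlock()
			continue
		}
		jobs <- table
	}
	close(jobs)
	wg.Wait()
	return results, firstErr
}

func main() {
	tables := []string{"t1", "t2", "t3", "t4"}
	// Pretend t1 was already sampled before the restart.
	cached := map[string]sampleResult{
		"t1": {Table: "t1", IndexRatio: 1.6, IsSourceOrder: true},
	}
	// A fake sampler; real sampling would read the source data files.
	fake := func(table string) (sampleResult, error) {
		return sampleResult{Table: table, IndexRatio: 2.0, IsSourceOrder: true}, nil
	}
	results, err := sampleTables(tables, 2, cached, fake)
	if err != nil {
		fmt.Println("sampling failed:", err)
		return
	}
	fmt.Println("tables with sample results:", len(results))
}
```

The worker count is bounded so that sampling 60k tables does not open 60k files at once. How the cached results would actually be stored in a checkpoint is left open here.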

fubinzh commented 2 years ago

/cc @gozssky