pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
36.86k stars 5.8k forks source link

lightning import stuck when PD/TiKV has label #46658

Open RenlySir opened 1 year ago

RenlySir commented 1 year ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

1) deploy tidb in 6 machines, 128c256G,tidb v7.3.0 2) 1 tidb+2 tikv every machine(2 tikv get lable host1/host2) 3)lightning impot csv is slow 4)see lightning log and pd-ctl, always scattering or split region; 5) csv max size is 4GiB,total 20~30GiB,about 1000 tables; 6)then we use tidb-lightning v7.1.0 import tidb v7.3.0,but still get same problem; 7)we deploy tidb v7.1.0,and use tidb-lightning v7.1.0 import csv,still get same problem; 8) finally,we delete pd and tikv label,use tidb-lightning v7.1.0 import csv to tidb v7.1.0, it's fast;

2. What did you expect to see? (Required)

1)lightning should not always scattering or split region; 2) fast import csv into tidb;

3. What did you see instead (Required)

1)always scattering or split region;

4. What is your TiDB version? (Required)

v7.3.0 & v7.1.0

RenlySir commented 1 year ago

[2023/09/04 17:08:50.032 +08:00] [INFO] [import.go:1997] [new_collation_enabled] [enabled=true] [2023/09/04 17:08:50.032 +08:00] [INFO] [import.go:829] ["restore all schema start"] [2023/09/04 17:08:50.038 +08:00] [INFO] [import.go:848] ["restore all schema completed"] [takeTime=6.41073ms] [] [2023/09/04 17:08:50.052 +08:00] [INFO] [precheck_impl.go:1008] ["datafile to check"] [db=pm001] [table=pm_budget_cost_list_middle] [path=pm001.pm_budget_cost_list_middle.csv] [2023/09/04 17:08:50.066 +08:00] [WARN] [util.go:201] ["check table empty failed with no retry"] [query="SELECT 1 FROM pm001.pm_budget_cost_list_middle USE INDEX() LIMIT 1"] [retryCnt=0] [error="sql: no rows in result set"] [2023/09/04 17:08:50.066 +08:00] [INFO] [version.go:429] ["detect server version"] [type=TiDB] [version=7.3.0] [2023/09/04 17:08:50.068 +08:00] [INFO] [version.go:429] ["detect server version"] [type=TiDB] [version=7.3.0] [2023/09/04 17:08:50.069 +08:00] [INFO] [get_pre_info.go:670] ["sample file start"] [table=pm_budget_cost_list_middle] [2023/09/04 17:08:51.211 +08:00] [INFO] [get_pre_info.go:775] ["Sample source data"] [table=pm_budget_cost_list_middle] [IndexRatio=0] [IsSourceOrder=true] [2023/09/04 17:08:51.232 +08:00] [INFO] [import.go:1518] ["restore all tables data start"] [2023/09/04 17:08:51.238 +08:00] [INFO] [checksum_helper.go:52] ["set tidb_backoff_weight to default"] [backoff_weight=6] [2023/09/04 17:08:51.241 +08:00] [INFO] [tikv_mode.go:60] ["switch tikv mode"] [mode=Import] [2023/09/04 17:08:51.246 +08:00] [INFO] [import.go:1657] ["restore table start"] [table=pm001.pm_budget_cost_list_middle] [2023/09/04 17:08:51.246 +08:00] [INFO] [table_import.go:267] ["load engines and files start"] [table=pm001.pm_budget_cost_list_middle] [2023/09/04 17:08:51.246 +08:00] [WARN] [region.go:356] ["file is too big to be processed efficiently; we suggest splitting it at 256 MB each"] [file=pm001.pm_budget_cost_list_middle.csv] [size=4776157572] [2023/09/04 17:08:51.246 +08:00] [INFO] [region.go:293] [makeTableRegions] [filesCount=1] [MaxChunkSize=268435456] [RegionsCount=1] [BatchSize=107374182400] [cost=140.12µs] [2023/09/04 17:08:51.246 +08:00] [INFO] [table_import.go:310] ["load engines and files completed"] [table=pm001.pm_budget_cost_list_middle] [enginesCnt=2] [filesCnt=1] [takeTime=194.83µs] [] [2023/09/04 17:08:51.247 +08:00] [INFO] [version.go:429] ["detect server version"] [type=TiDB] [version=7.3.0] [2023/09/04 17:08:51.254 +08:00] [INFO] [backend.go:228] ["open engine"] [engineTag=pm001.pm_budget_cost_list_middle:-1] [engineUUID=0a8aa2d4-6bf7-5d91-bf94-8637af809b77] [2023/09/04 17:08:51.254 +08:00] [INFO] [table_import.go:448] ["import whole table start"] [table=pm001.pm_budget_cost_list_middle] [2023/09/04 17:08:51.254 +08:00] [INFO] [table_import.go:495] ["restore engine start"] [table=pm001.pm_budget_cost_list_middle] [engineNumber=0] [2023/09/04 17:08:51.254 +08:00] [INFO] [table_import.go:629] ["encode kv data and write start"] [table=pm001.pm_budget_cost_list_middle] [engineNumber=0] [2023/09/04 17:08:51.257 +08:00] [INFO] [backend.go:228] ["open engine"] [engineTag=pm001.pm_budget_cost_list_middle:0] [engineUUID=2b800536-e0e0-566a-bd51-d86778581ff0] [2023/09/04 17:08:51.268 +08:00] [INFO] [chunk_process.go:205] ["restore file start"] [table=pm001.pm_budget_cost_list_middle] [engineNumber=0] [fileIndex=0] [path=pm001.pm_budget_cost_list_middle.csv:0] [2023/09/04 17:11:24.389 +08:00] [INFO] [engine.go:1513] ["compact sst"] [fileCount=4] [size=536939978] [count=6298225] [cost=2.33863221s] [file=/data2/sort-dir/0a8aa2d4-6bf7-5d91-bf94-8637af809b77.sst/e832f81c-e6f9-41f6-894e-ce27b1afc6c1.sst] [2023/09/04 17:11:24.390 +08:00] [INFO] [engine.go:821] ["write data to local DB"] [size=536939978] [kvs=6298225] [files=1] [sstFileSize=53147468] [file=/data2/sort-dir/0a8aa2d4-6bf7-5d91-bf94-8637af809b77.sst/e832f81c-e6f9-41f6-894e-ce27b1afc6c1.sst] [firstKey=7480000000000000645F69800000000000000100038000000000000003] [lastKey=7480000000000000645F698000000000000005016161383565623563FF6432303434383430FF6233353434353233FF6366313163303735FF0000000000000000F703F80000000011D1E3] [2023/09/04 17:13:37.887 +08:00] [INFO] [engine.go:1513] ["compact sst"] [fileCount=4] [size=536903978] [count=5460625] [cost=2.03476926s] [file=/data2/sort-dir/0a8aa2d4-6bf7-5d91-bf94-8637af809b77.sst/1a02f048-bcef-41ba-8b8e-94104bfbc108.sst] [2023/09/04 17:13:37.887 +08:00] [INFO] [engine.go:821] ["write data to local DB"] [size=536903978] [kvs=5460625] [files=1] [sstFileSize=52329973] [file=/data2/sort-dir/0a8aa2d4-6bf7-5d91-bf94-8637af809b77.sst/1a02f048-bcef-41ba-8b8e-94104bfbc108.sst] [firstKey=7480000000000000645F698000000000000001000380000000001338A1] [lastKey=7480000000000000645F698000000000000005016639326664373664FF3361306234303762FF6232633834313338FF6539653763336638FF0000000000000000F703B8000000001E5FEC] [2023/09/04 17:13:51.242 +08:00] [INFO] [import.go:1372] [progress] [total=30.0%] [tables="0/1 (0.0%)"] [chunks="0/1 (0.0%)"] [engines="0/2 (0.0%)"] [restore-bytes=2.668GiB/4.448GiB] [restore-rows=2483737/4140223(estimated)] [] ["encode speed(MiB/s)"=9.109192149209914] [state=preparing] [remaining=11m40s] [2023/09/04 17:13:51.242 +08:00] [INFO] [tikv_mode.go:60] ["switch tikv mode"] [mode=Import] [2023/09/04 17:15:41.933 +08:00] [INFO] [engine.go:1513] ["compact sst"] [fileCount=4] [size=536939892] [count=4628770] [cost=1.93038008s] [file=/data2/sort-dir/0a8aa2d4-6bf7-5d91-bf94-8637af809b77.sst/494870ae-540f-4566-8bc5-3bdc4947e242.sst] [2023/09/04 17:15:41.933 +08:00] [INFO] [engine.go:821] ["write data to local DB"] [size=536939892] [kvs=4628770] [files=1] [sstFileSize=56699934] [file=/data2/sort-dir/0a8aa2d4-6bf7-5d91-bf94-8637af809b77.sst/494870ae-540f-4566-8bc5-3bdc4947e242.sst] [firstKey=7480000000000000645F698000000000000001013064303932373165FF3761303534313265FF6232643663643338FF3065306362653466FF0000000000000000F70380000000002709AC] [lastKey=7480000000000000645F698000000000000005016639326664373664FF3361306234303762FF6232633834313338FF6539653763336638FF0000000000000000F703F8000000002A121D] [2023/09/04 17:17:00.381 +08:00] [INFO] [chunk_process.go:220] ["restore file completed"] [table=pm001.pm_budget_cost_list_middle] [engineNumber=0] [fileIndex=0] [path=pm001.pm_budget_cost_list_middle.csv:0] [readDur=1m16.6126661s] [encodeDur=5m59.66854178s] [deliverDur=1m4.93365423s] [checksum="{cksum=17961671975055942448,size=7937968099,kvs=23304018}"] [takeTime=8m9.11284148s] [] [2023/09/04 17:17:01.910 +08:00] [INFO] [table_import.go:816] ["encode kv data and write completed"] [table=pm001.pm_budget_cost_list_middle] [engineNumber=0] [read(bytes)=0] [written=7937968099] [takeTime=8m10.65560164s] [] [2023/09/04 17:17:01.910 +08:00] [INFO] [backend.go:305] ["engine close start"] [engineTag=pm001.pm_budget_cost_list_middle:0] [engineUUID=2b800536-e0e0-566a-bd51-d86778581ff0] [2023/09/04 17:17:18.174 +08:00] [INFO] [engine.go:1513] ["compact sst"] [fileCount=45] [size=5977744661] [count=3884003] [cost=16.26412594s] [file=/data2/sort-dir/2b800536-e0e0-566a-bd51-d86778581ff0.sst/656f010a-31e7-460c-af4d-e0dbb5a245e4.sst] [2023/09/04 17:17:18.175 +08:00] [INFO] [engine.go:821] ["write data to local DB"] [size=5977744661] [kvs=3884003] [files=1] [sstFileSize=1887508295] [file=/data2/sort-dir/2b800536-e0e0-566a-bd51-d86778581ff0.sst/656f010a-31e7-460c-af4d-e0dbb5a245e4.sst] [firstKey=7480000000000000645F728000000000000003] [lastKey=7480000000000000645F72F8000000003B43DE] [2023/09/04 17:17:18.176 +08:00] [INFO] [backend.go:307] ["engine close completed"] [engineTag=pm001.pm_budget_cost_list_middle:0] [engineUUID=2b800536-e0e0-566a-bd51-d86778581ff0] [takeTime=16.26613196s] [] [2023/09/04 17:17:19.469 +08:00] [INFO] [engine.go:1513] ["compact sst"] [fileCount=3] [size=349439590] [count=3032395] [cost=1.29226752s] [file=/data2/sort-dir/0a8aa2d4-6bf7-5d91-bf94-8637af809b77.sst/f009771a-3242-4672-86e1-5c7624902b5b.sst] [2023/09/04 17:17:19.469 +08:00] [INFO] [engine.go:821] ["write data to local DB"] [size=349439590] [kvs=3032395] [files=1] [sstFileSize=38906580] [file=/data2/sort-dir/0a8aa2d4-6bf7-5d91-bf94-8637af809b77.sst/f009771a-3242-4672-86e1-5c7624902b5b.sst] [firstKey=7480000000000000645F698000000000000001000380000000003973EF] [lastKey=7480000000000000645F698000000000000005016639326664373664FF3361306234303762FF6232633834313338FF6539653763336638FF0000000000000000F703F8000000003AEDFA] [2023/09/04 17:17:19.471 +08:00] [INFO] [table_import.go:497] ["restore engine completed"] [table=pm001.pm_budget_cost_list_middle] [engineNumber=0] [takeTime=8m28.21669612s] [] [2023/09/04 17:17:19.471 +08:00] [INFO] [table_import.go:1194] ["import and cleanup engine start"] [engineTag=pm001.pm_budget_cost_list_middle:0] [engineUUID=2b800536-e0e0-566a-bd51-d86778581ff0] [2023/09/04 17:17:19.471 +08:00] [INFO] [backend.go:347] ["import start"] [engineTag=pm001.pm_budget_cost_list_middle:0] [engineUUID=2b800536-e0e0-566a-bd51-d86778581ff0] [retryCnt=0] [2023/09/04 17:17:19.474 +08:00] [INFO] [local.go:1054] ["split engine key ranges"] [engine=2b800536-e0e0-566a-bd51-d86778581ff0] [totalSize=5977744661] [totalCount=3884003] [firstKey=7480000000000000645F728000000000000003] [lastKey=7480000000000000645F72F8000000003B43DE] [ranges=60] [2023/09/04 17:17:19.474 +08:00] [INFO] [local.go:1439] ["pause pd scheduler of table scope"] [2023/09/04 17:17:19.477 +08:00] [INFO] [local.go:1477] ["start import engine"] [uuid=2b800536-e0e0-566a-bd51-d86778581ff0] ["region ranges"=60] [count=3884003] [size=5977744661] [2023/09/04 17:17:19.477 +08:00] [INFO] [local.go:1078] ["import engine ranges"] [count=60] [2023/09/04 17:17:19.477 +08:00] [INFO] [local.go:1091] ["split and scatter ranges start"] [uuid=2b800536-e0e0-566a-bd51-d86778581ff0] [2023/09/04 17:17:19.477 +08:00] [INFO] [localhelper.go:165] ["split and scatter region"] [minKey=7480000000000000FF645F728000000000FF0000030000000000FA] [maxKey=7480000000000000FF645F72F800000000FF3B43DF0000000000FA] [retry=0] [2023/09/04 17:17:19.478 +08:00] [INFO] [localhelper.go:184] ["paginate scan regions"] [count=8] [start=7480000000000000FF645F728000000000FF0000030000000000FA] [end=7480000000000000FF645F72F800000000FF3B43DF0000000000FA] [2023/09/04 17:17:19.478 +08:00] [INFO] [localhelper.go:192] ["paginate scan region finished"] [minKey=7480000000000000FF645F728000000000FF0000030000000000FA] [maxKey=7480000000000000FF645F72F800000000FF3B43DF0000000000FA] [regions=8] [2023/09/04 17:17:19.488 +08:00] [WARN] [client.go:366] ["fail to split region"] [region="{ID=348,startKey=7480000000000000FF645F698000000000FF0000050000000000FA,endKey=7480000000000000FF645F729000000000FF0000000000000000FA,epoch=\"conf_ver:29 version:67 \",peers=\"id:349 store_id:10 ,id:351 store_id:12 ,id:412 store_id:8 \"}"] [regionErr="message:\"EpochNotMatch [region 348] 349 epoch changed conf_ver: 29 version: 68 != conf_ver: 29 version: 67, retry later\" epoch_not_match:<current_regions:<id:348 start_key:\"t\200\000\000\000\000\000\000\377d_r\200\000\000\000\000\377\000\000\003\000\000\000\000\000\372\" end_key:\"t\200\000\000\000\000\000\000\377d_r\220\000\000\000\000\377\000\000\000\000\000\000\000\000\372\" region_epoch: peers: peers: peers: > > "] [2023/09/04 17:17:19.489 +08:00] [WARN] [client.go:366] ["fail to split region"] [region="{ID=25,startKey=7480000000000000FF645F72F000000000FF0000000000000000FA,endKey=748000FFFFFFFFFFFFF900000000000000F8,epoch=\"conf_ver:29 version:62 \",peers=\"id:119 store_id:10 ,id:320 store_id:7 ,id:414 store_id:6 \"}"] [regionErr="message:\"EpochNotMatch [region 25] 119 epoch changed conf_ver: 29 version: 63 != conf_ver: 29 version: 62, retry later\" epoch_not_match:<current_regions:<id:25 start_key:\"t\200\000\000\000\000\000\000\377d_r\370\000\000\000\000\377;C\337\000\000\000\000\000\372\" end_key:\"t\200\000\377\377\377\377\377\377\371\000\000\000\000\000\000\000\370\" region_epoch: peers: peers: peers: > > "] [2023/09/04 17:17:19.489 +08:00] [WARN] [localhelper.go:307] ["split regions"] [error="batch split regions failed: split region failed: err=message:\"EpochNotMatch [region 348] 349 epoch changed conf_ver: 29 version: 68 != conf_ver: 29 version: 67, retry later\" epoch_not_match:<current_regions:<id:348 start_key:\"t\200\000\000\000\000\000\000\377d_r\200\000\000\000\000\377\000\000\003\000\000\000\000\000\372\" end_key:\"t\200\000\000\000\000\000\000\377d_r\220\000\000\000\000\377\000\000\000\000\000\000\000\000\372\" region_epoch: peers: peers: peers: > > : [BR:Restore:ErrRestoreSplitFailed]fail to split region"] ["retry time"=0] [region_id=348] [2023/09/04 17:17:19.489 +08:00] [WARN] [localhelper.go:307] ["split regions"] [error="batch split regions failed: split region failed: err=message:\"EpochNotMatch [region 25] 119 epoch changed conf_ver: 29 version: 63 != conf_ver: 29 version: 62, retry later\" epoch_not_match:<current_regions:<id:25 start_key:\"t\200\000\000\000\000\000\000\377d_r\370\000\000\000\000\377;C\337\000\000\000\000\000\372\" end_key:\"t\200\000\377\377\377\377\377\377\371\000\000\000\000\000\000\000\370\" region_epoch: peers: peers: peers: > > : [BR:Restore:ErrRestoreSplitFailed]fail to split region"] ["retry time"=0] [region_id=25] [2023/09/04 17:17:19.491 +08:00] [INFO] [client.go:496] ["checking whether need to scatter"] [store=12] [max-replica=3] [2023/09/04 17:17:19.494 +08:00] [WARN] [localhelper.go:448] ["scatter region failed"] [regionCount=7] [failedCount=5] [error="region 549 not found"] [errorVerbose="region 549 not found\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).BatchSplitRegions.func2\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:428\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:56\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).BatchSplitRegions\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:420\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).SplitAndScatterRegionByRanges.func3\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:293\ngolang.org/x/sync/errgroup.(Group).Go.func1\n\t/root/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_arm64.s:1172"] [2023/09/04 17:17:19.495 +08:00] [WARN] [localhelper.go:448] ["scatter region failed"] [regionCount=7] [failedCount=4] [error="region 561 not found"] [errorVerbose="region 561 not found\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).BatchSplitRegions.func2\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:428\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:56\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).BatchSplitRegions\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:420\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).SplitAndScatterRegionByRanges.func3\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:293\ngolang.org/x/sync/errgroup.(Group).Go.func1\n\t/root/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_arm64.s:1172"] [2023/09/04 17:17:19.496 +08:00] [WARN] [localhelper.go:448] ["scatter region failed"] [regionCount=8] [failedCount=5] [error="region 589 not found"] [errorVerbose="region 589 not found\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).BatchSplitRegions.func2\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:428\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:56\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).BatchSplitRegions\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:420\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).SplitAndScatterRegionByRanges.func3\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:293\ngolang.org/x/sync/errgroup.(Group).Go.func1\n\t/root/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_arm64.s:1172"] [2023/09/04 17:17:19.496 +08:00] [WARN] [localhelper.go:448] ["scatter region failed"] [regionCount=7] [failedCount=3] [error="region 517 not found"] [errorVerbose="region 517 not found\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).BatchSplitRegions.func2\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:428\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:56\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).BatchSplitRegions\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:420\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).SplitAndScatterRegionByRanges.func3\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:293\ngolang.org/x/sync/errgroup.(Group).Go.func1\n\t/root/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_arm64.s:1172"] [2023/09/04 17:17:19.498 +08:00] [WARN] [localhelper.go:448] ["scatter region failed"] [regionCount=7] [failedCount=1] [error="region 457 not found"] [errorVerbose="region 457 not found\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).BatchSplitRegions.func2\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:428\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:56\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).BatchSplitRegions\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:420\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).SplitAndScatterRegionByRanges.func3\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:293\ngolang.org/x/sync/errgroup.(Group).Go.func1\n\t/root/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_arm64.s:1172"] [2023/09/04 17:17:19.500 +08:00] [INFO] [localhelper.go:317] ["batch split region"] [region_id=364] [keys=8] [firstKey="dIAAAAAAAABkX3LAAAAAAAXxrA=="] [end="dIAAAAAAAABkX3LIAAAAADnXTg=="] [2023/09/04 17:17:19.510 +08:00] [INFO] [localhelper.go:317] ["batch split region"] [region_id=372] [keys=7] [firstKey="dIAAAAAAAABkX3LgAAAAAAjmRA=="] [end="dIAAAAAAAABkX3LoAAAAAC3yjQ=="] [2023/09/04 17:17:19.510 +08:00] [INFO] [localhelper.go:317] ["batch split region"] [region_id=360] [keys=7] [firstKey="dIAAAAAAAABkX3KwAAAAAA2QhQ=="] [end="dIAAAAAAAABkX3K4AAAAADGhsA=="] [2023/09/04 17:17:19.511 +08:00] [INFO] [localhelper.go:317] ["batch split region"] [region_id=352] [keys=7] [firstKey="dIAAAAAAAABkX3KQAAAAAArN3w=="] [end="dIAAAAAAAABkX3KYAAAAAC9xsg=="] [2023/09/04 17:17:19.512 +08:00] [INFO] [localhelper.go:317] ["batch split region"] [region_id=368] [keys=7] [firstKey="dIAAAAAAAABkX3LQAAAAABBgBw=="] [end="dIAAAAAAAABkX3LYAAAAADPf7g=="] [2023/09/04 17:17:19.512 +08:00] [INFO] [localhelper.go:317] ["batch split region"] [region_id=356] [keys=8] [firstKey="dIAAAAAAAABkX3KgAAAAAALw1g=="] [end="dIAAAAAAAABkX3KoAAAAADeffA=="] [2023/09/04 17:17:19.512 +08:00] [INFO] [localhelper.go:165] ["split and scatter region"] [minKey=7480000000000000FF645F728000000000FF0000030000000000FA] [maxKey=7480000000000000FF645F72F800000000FF3B43E00000000000FA] [retry=1] [2023/09/04 17:17:20.514 +08:00] [INFO] [localhelper.go:184] ["paginate scan regions"] [count=53] [start=7480000000000000FF645F728000000000FF0000030000000000FA] [end=7480000000000000FF645F72F800000000FF3B43E00000000000FA] [2023/09/04 17:17:20.514 +08:00] [INFO] [localhelper.go:192] ["paginate scan region finished"] [minKey=7480000000000000FF645F728000000000FF0000030000000000FA] [maxKey=7480000000000000FF645F72F800000000FF3B43E00000000000FA] [regions=53] [2023/09/04 17:17:20.531 +08:00] [INFO] [localhelper.go:317] ["batch split region"] [region_id=348] [keys=7] [firstKey="dIAAAAAAAABkX3KAAAAAABJMCg=="] [end="dIAAAAAAAABkX3KIAAAAADVlQg=="] [2023/09/04 17:17:20.533 +08:00] [WARN] [localhelper.go:448] ["scatter region failed"] [regionCount=8] [failedCount=1] [error="region 690 not found"] [errorVerbose="region 690 not found\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).BatchSplitRegions.func2\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:428\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:56\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).BatchSplitRegions\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:420\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).SplitAndScatterRegionByRanges.func3\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:293\ngolang.org/x/sync/errgroup.(Group).Go.func1\n\t/root/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_arm64.s:1172"] [2023/09/04 17:17:20.544 +08:00] [INFO] [localhelper.go:317] ["batch split region"] [region_id=425] [keys=8] [firstKey="dIAAAAAAAABkX3LwAAAAAADk7Q=="] [end="dIAAAAAAAABkX3L4AAAAADYhmQ=="] [2023/09/04 17:18:51.242 +08:00] [INFO] [import.go:1372] [progress] [total=50.0%] [tables="0/1 (0.0%)"] [chunks="1/1 (100.0%)"] [engines="0/2 (0.0%)"] [restore-bytes=4.448GiB/4.448GiB] [restore-rows=3884003/3884003(estimated)] [import-bytes=0B/7.393GiB(estimated)] ["encode speed(MiB/s)"=7.591783212050413] [state=importing] [remaining=10m0s] [2023/09/04 17:18:51.242 +08:00] [INFO] [tikv_mode.go:60] ["switch tikv mode"] [mode=Import] [2023/09/04 17:23:51.242 +08:00] [INFO] [import.go:1372] [progress] [total=50.0%] [tables="0/1 (0.0%)"] [chunks="1/1 (100.0%)"] [engines="0/2 (0.0%)"] [restore-bytes=4.448GiB/4.448GiB] [restore-rows=3884003/3884003(estimated)] [import-bytes=0B/7.393GiB(estimated)] ["encode speed(MiB/s)"=5.061122369140358] [state=importing] [remaining=15m0s] [2023/09/04 17:23:51.242 +08:00] [INFO] [tikv_mode.go:60] ["switch tikv mode"] [mode=Import] [2023/09/04 17:28:51.241 +08:00] [INFO] [import.go:1372] [progress] [total=50.0%] [tables="0/1 (0.0%)"] [chunks="1/1 (100.0%)"] [engines="0/2 (0.0%)"] [restore-bytes=4.448GiB/4.448GiB] [restore-rows=3884003/3884003(estimated)] [import-bytes=0B/7.393GiB(estimated)] ["encode speed(MiB/s)"=3.7958211315543746] [state=importing] [remaining=20m0s] [2023/09/04 17:28:51.242 +08:00] [INFO] [tikv_mode.go:60] ["switch tikv mode"] [mode=Import] [2023/09/04 17:33:51.241 +08:00] [INFO] [import.go:1372] [progress] [total=50.0%] [tables="0/1 (0.0%)"] [chunks="1/1 (100.0%)"] [engines="0/2 (0.0%)"] [restore-bytes=4.448GiB/4.448GiB] [restore-rows=3884003/3884003(estimated)] [import-bytes=0B/7.393GiB(estimated)] ["encode speed(MiB/s)"=3.0366455831382404] [state=importing] [remaining=25m0s] [2023/09/04 17:33:51.241 +08:00] [INFO] [tikv_mode.go:60] ["switch tikv mode"] [mode=Import] [2023/09/04 17:38:51.241 +08:00] [INFO] [import.go:1372] [progress] [total=50.0%] [tables="0/1 (0.0%)"] [chunks="1/1 (100.0%)"] [engines="0/2 (0.0%)"] [restore-bytes=4.448GiB/4.448GiB] [restore-rows=3884003/3884003(estimated)] [import-bytes=0B/7.393GiB(estimated)] ["encode speed(MiB/s)"=2.530531238586008] [state=importing] [remaining=30m0s] [2023/09/04 17:38:51.242 +08:00] [INFO] [tikv_mode.go:60] ["switch tikv mode"] [mode=Import] [2023/09/04 17:43:51.241 +08:00] [INFO] [import.go:1372] [progress] [total=50.0%] [tables="0/1 (0.0%)"] [chunks="1/1 (100.0%)"] [engines="0/2 (0.0%)"] [restore-bytes=4.448GiB/4.448GiB] [restore-rows=3884003/3884003(estimated)] [import-bytes=0B/7.393GiB(estimated)] ["encode speed(MiB/s)"=2.1690230146305596] [state=importing] [remaining=35m0s] [2023/09/04 17:43:51.241 +08:00] [INFO] [tikv_mode.go:60] ["switch tikv mode"] [mode=Import] [2023/09/04 17:48:51.242 +08:00] [INFO] [import.go:1372] [progress] [total=50.0%] [tables="0/1 (0.0%)"] [chunks="1/1 (100.0%)"] [engines="0/2 (0.0%)"] [restore-bytes=4.448GiB/4.448GiB] [restore-rows=3884003/3884003(estimated)] [import-bytes=0B/7.393GiB(estimated)] ["encode speed(MiB/s)"=1.8978918960338311] [state=importing] [remaining=40m0s] [2023/09/04 17:48:51.242 +08:00] [INFO] [tikv_mode.go:60] ["switch tikv mode"] [mode=Import] [2023/09/04 17:52:59.328 +08:00] [INFO] [main.go:51] ["got signal to exit"] [signal=hangup] [2023/09/04 17:52:59.328 +08:00] [WARN] [import.go:1243] ["stopping periodic actions"] [error="context canceled"] [2023/09/04 17:52:59.329 +08:00] [INFO] [localhelper.go:392] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=56] [regions=59] [take=35m38.78455842s] [error="wait for scatter region timeout, print the first unfinished region id:445 start_key:\"t\200\000\000\000\000\000\000\377d_r\300\000\000\000\000\3775\274\263\000\000\000\000\000\372\" end_key:\"t\200\000\000\000\000\000\000\377d_r\310\000\000\000\000\377\0135\237\000\000\000\000\000\372\" region_epoch: peers: peers: peers: "] [errorVerbose="wait for scatter region timeout, print the first unfinished region id:445 start_key:\"t\200\000\000\000\000\000\000\377d_r\300\000\000\000\000\3775\274\263\000\000\000\000\000\372\" end_key:\"t\200\000\000\000\000\000\000\377d_r\310\000\000\000\000\377\0135\237\000\000\000\000\000\372\" region_epoch: peers: peers: peers: \ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).waitForScatterRegions\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:509\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).SplitAndScatterRegionByRanges\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:386\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).SplitAndScatterRegionInBatches\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:126\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).prepareAndSendJob\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/local.go:1097\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).doImport\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/local.go:1576\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(Backend).ImportEngine\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/local.go:1482\ngithub.com/pingcap/tidb/br/pkg/lightning/backend.(ClosedEngine).Import\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/backend/backend.go:348\ngithub.com/pingcap/tidb/br/pkg/lightning/importer.(TableImporter).importKV\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/importer/table_import.go:1216\ngithub.com/pingcap/tidb/br/pkg/lightning/importer.(TableImporter).importEngine\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/importer/table_import.go:889\ngithub.com/pingcap/tidb/br/pkg/lightning/importer.(TableImporter).importEngines.func3\n\t/home/jenkins/workspace/build-common@2/go/src/github.com/pingcap/br/br/pkg/lightning/importer/table_import.go:502\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_arm64.s:1172"] [2023/09/04 17:52:59.329 +08:00] [INFO] [local.go:1105] ["split and scatter ranges completed"] [uuid=2b800536-e0e0-566a-bd51-d86778581ff0] [takeTime=35m39.85155238s] [] [2023/09/04 17:52:59.330 +08:00] [INFO] [import.go:1391] ["cancel periodic actions"] [do=true]

lance6716 commented 1 year ago

We can see that in 17:18 ~ 17:48, lightning seems blocking and we don't know why. Maybe we should capture the goroutine profile when problem occur.