pingcap / tiflash

The analytical engine for TiDB and TiDB Cloud. Try free: https://tidbcloud.com/free-trial
https://docs.pingcap.com/tidb/stable/tiflash-overview
Apache License 2.0

All CNs crash when running CH after restoring data #8781

Closed. Lily2025 closed this issue 8 months ago

Lily2025 commented 9 months ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

(1) Restore the tpcc database with br:

[2024/02/19 01:18:01.285 +08:00] [INFO] [run.go:123] ["Execute command"] [command="export AWS_ACCESS_KEY_ID=minioadmin;export AWS_SECRET_ACCESS_KEY=minioadmin;/br restore db --send-credentials-to-tikv=true --db tpcc --pd http://tc-pd.ha-test-serverless-htap-tps-6720101-1-25

(2) Check that the TiFlash replicas are available:

[2024/02/19 01:26:44.943 +08:00] [INFO] [db.go:103] ["select * from information_schema.tiflash_replica"]
[2024/02/19 01:26:44.946 +08:00] [INFO] [prepare_data.go:260] ["tiflash replica"="[{\"AVAILABLE\":\"1\",\"LOCATION_LABELS\":\"\",\"PROGRESS\":\"1\",\"REPLICA_COUNT\":\"2\",\"TABLE_ID\":\"2755\",\"TABLE_NAME\":\"nation\",\"TABLE_SCHEMA\":\"tpcc\"},{\"AVAILABLE\":\"1\",\"LOCATION_LABELS\":\"\",\"PROGRESS\":\"1\",\"REPLICA_COUNT\":\"2\",\"TABLE_ID\":\"2725\",\"TABLE_NAME\":\"warehouse\",\"TABLE_SCHEMA\":\"tpcc\"},{\"AVAILABLE\":\"1\",\"LOCATION_LABELS\":\"\",\"PROGRESS\":\"1\",\"REPLICA_COUNT\":\"2\",\"TABLE_ID\":\"2729\",\"TABLE_NAME\":\"customer\",\"TABLE_SCHEMA\":\"tpcc\"},{\"AVAILABLE\":\"1\",\"LOCATION_LABELS\":\"\",\"PROGRESS\":\"1\",\"REPLICA_COUNT\":\"2\",\"TABLE_ID\":\"2731\",\"TABLE_NAME\":\"history\",\"TABLE_SCHEMA\":\"tpcc\"},{\"AVAILABLE\":\"1\",\"LOCATION_LABELS\":\"\",\"PROGRESS\":\"1\",\"REPLICA_COUNT\":\"2\",\"TABLE_ID\":\"2759\",\"TABLE_NAME\":\"supplier\",\"TABLE_SCHEMA\":\"tpcc\"},{\"AVAILABLE\":\"1\",\"LOCATION_LABELS\":\"\",\"PROGRESS\":\"1\",\"REPLICA_COUNT\":\"2\",\"TABLE_ID\":\"2741\",\"TABLE_NAME\":\"item\",\"TABLE_SCHEMA\":\"tpcc\"},{\"AVAILABLE\":\"1\",\"LOCATION_LABELS\":\"\",\"PROGRESS\":\"1\",\"REPLICA_COUNT\":\"2\",\"TABLE_ID\":\"2757\",\"TABLE_NAME\":\"region\",\"TABLE_SCHEMA\":\"tpcc\"},{\"AVAILABLE\":\"1\",\"LOCATION_LABELS\":\"\",\"PROGRESS\":\"1\",\"REPLICA_COUNT\":\"2\",\"TABLE_ID\":\"2733\",\"TABLE_NAME\":\"new_order\",\"TABLE_SCHEMA\":\"tpcc\"},{\"AVAILABLE\":\"1\",\"LOCATION_LABELS\":\"\",\"PROGRESS\":\"1\",\"REPLICA_COUNT\":\"2\",\"TABLE_ID\":\"2737\",\"TABLE_NAME\":\"order_line\",\"TABLE_SCHEMA\":\"tpcc\"},{\"AVAILABLE\":\"1\",\"LOCATION_LABELS\":\"\",\"PROGRESS\":\"1\",\"REPLICA_COUNT\":\"2\",\"TABLE_ID\":\"2727\",\"TABLE_NAME\":\"district\",\"TABLE_SCHEMA\":\"tpcc\"},{\"AVAILABLE\":\"1\",\"LOCATION_LABELS\":\"\",\"PROGRESS\":\"1\",\"REPLICA_COUNT\":\"2\",\"TABLE_ID\":\"2739\",\"TABLE_NAME\":\"stock\",\"TABLE_SCHEMA\":\"tpcc\"},{\"AVAILABLE\":\"1\",\"LOCATION_LABELS\":\"\",\"PROGRESS\":\"1\",\"REPLICA_COUNT\":\"2\",\"TABLE_ID\":\"2735\",\"TABLE_NAME\":\"orders\",\"TABLE_SCHEMA\":\"tpcc\"}]"]

(3) Run the CH workload with go-tpc:

[2024/02/19 02:22:26.477 +08:00] [INFO] [cmd.go:150] ["Start remote command"] [cmd="go-tpc ch run -D tpcc --host tc-tidb.ha-test-serverless-htap-tps-6720101-1-251 -P4000 --warehouses 2000 -T 32 --acThreads 1 --queries q3 --ignore-error '2013,1213,1105,1205,8022,8028,9004,9007,1062' --time 36000m --user keyspace_a.root --password '' --interval '10s'"] [nodename=benchtoolset]
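
For anyone reproducing this, the replica rows logged in step 2 can be checked programmatically before the workload starts. The following is only a sketch in Python: the helper name `unready_tables` is hypothetical (it is not part of the test harness), and the sample data is two rows copied from the log above rather than the full result set.

```python
import json

# Two rows copied from the "tiflash replica" log entry in step 2.
# The real result set has one row per tpcc table.
rows_json = """[
  {"AVAILABLE": "1", "LOCATION_LABELS": "", "PROGRESS": "1", "REPLICA_COUNT": "2",
   "TABLE_ID": "2755", "TABLE_NAME": "nation", "TABLE_SCHEMA": "tpcc"},
  {"AVAILABLE": "1", "LOCATION_LABELS": "", "PROGRESS": "1", "REPLICA_COUNT": "2",
   "TABLE_ID": "2725", "TABLE_NAME": "warehouse", "TABLE_SCHEMA": "tpcc"}
]"""


def unready_tables(rows):
    """Return schema.table names whose TiFlash replica is not fully available.

    Hypothetical helper: a row counts as ready when AVAILABLE is "1"
    and PROGRESS has reached 1.
    """
    return [
        f'{row["TABLE_SCHEMA"]}.{row["TABLE_NAME"]}'
        for row in rows
        if row["AVAILABLE"] != "1" or float(row["PROGRESS"]) < 1.0
    ]


rows = json.loads(rows_json)
pending = unready_tables(rows)
print("all replicas ready" if not pending else f"not ready: {pending}")
```

In this report every table shows AVAILABLE=1 and PROGRESS=1, so the crash is not explained by replicas that were still syncing.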

2. What did you expect to see? (Required)

No crash.

3. What did you see instead (Required)

All CNs crash.

[2024/02/19 02:22:32.336 +08:00] [ERROR] [BaseDaemon.cpp:563] ["\n 0x76f27f1\tfaultSignalHandler(int, siginfo_t, void) [tiflash+124725233]\n \tlibs/libdaemon/src/BaseDaemon.cpp:214\n 0x7f531c1a9630\t [libpthread.so.0+63024]\n 0x8f7f4f9\tprometheus::Gauge::Set(double) [tiflash+150467833]\n \tcontrib/prometheus-cpp/core/src/gauge.cc:17\n 0x1e48b39\tDB::ResourceControlQueue<DB::MultiLevelFeedbackQueue >::updateStatistics(std::__1::unique_ptr<DB::Task, std::__1::default_delete > const&, DB::ExecTaskStatus, unsigned long) [tiflash+31755065]\n \tdbms/src/Flash/Pipeline/Schedule/TaskQueues/ResourceControlQueue.cpp:159\n 0x1e41a77\tDB::TaskThreadPool::loop(unsigned long) [tiflash+31726199]\n \tdbms/src/Flash/Pipeline/Schedule/ThreadPool/TaskThreadPool.cpp:59\n 0x1e42146\tvoid* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (DB::TaskThreadPool::)(unsigned long), DB::TaskThreadPool, unsigned long> >(void*) [tiflash+31727942]\n \t/usr/local/bin/../include/c++/v1/thread:291\n 0x7f531c1a1ea5\tstart_thread [libpthread.so.0+32421]"] [source=BaseDaemon] [thread_id=58]

[2024/02/19 02:22:34.430 +08:00] [DEBUG] [PageDirectory.cpp:2182] ["After MVCC gc in memory [lowest_seq=1] clean [invalid_snapshot_nums=0] [invalid_page_nums=0] [total_deref_counter=0] [all_del_entries=0]. Still exist [snapshot_nums=0], [page_nums=1]. Longest alive snapshot: [longest_alive_snapshot_time=0] [longest_alive_snapshot_seq=0] [stale_snapshot_nums=0]"] [source=uni_write] [thread_id=3]
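
The trace in the ERROR entry is logged as a single field, with line breaks and tabs written as literal \n and \t sequences, which makes the frames hard to read. A rough Python sketch for splitting such a field into frames is shown below. The function name `parse_frames` and the dictionary layout are illustrative assumptions, and the sample input is a shortened copy of the trace above.

```python
# Shortened copy of the escaped trace field from the ERROR entry above.
# Raw strings keep "\n" and "\t" as two literal characters each,
# the way they appear in the log.
raw = (
    r"\n 0x76f27f1\tfaultSignalHandler(int, siginfo_t, void) [tiflash+124725233]"
    r"\n \tlibs/libdaemon/src/BaseDaemon.cpp:214"
    r"\n 0x8f7f4f9\tprometheus::Gauge::Set(double) [tiflash+150467833]"
    r"\n \tcontrib/prometheus-cpp/core/src/gauge.cc:17"
)


def parse_frames(escaped):
    """Split an escaped trace into frames (hypothetical helper).

    A piece starting with an address opens a new frame; a piece starting
    with an escaped tab is the source location of the preceding frame.
    """
    frames = []
    for piece in escaped.split("\\n"):
        piece = piece.strip()
        if not piece:
            continue
        if piece.startswith("0x"):
            addr, _, symbol = piece.partition("\\t")
            frames.append({"addr": addr, "symbol": symbol.strip(), "source": None})
        elif piece.startswith("\\t") and frames:
            # Drop the two-character escaped tab prefix.
            frames[-1]["source"] = piece[2:]
    return frames


for frame in parse_frames(raw):
    print(frame["addr"], frame["symbol"], "at", frame["source"])
```

Read this way, the topmost application frame below the signal handler is prometheus::Gauge::Set, called from ResourceControlQueue::updateStatistics.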


4. What is your TiFlash version? (Required)

git hash: ecaf78a002b611b128a4683f3653ca240a022914

Lily2025 commented 9 months ago

/type bug
/severity major
/assign guo-shaoge

guo-shaoge commented 8 months ago

This has been fixed; see https://github.com/pingcap/tiflash/issues/8233