pingcap / tiflash

The analytical engine for TiDB and TiDB Cloud. Try free: https://tidbcloud.com/free-trial
https://docs.pingcap.com/tidb/stable/tiflash-overview
Apache License 2.0

Concurrency issue between rename partitioned table and `applyTable` #9233

Open JaySon-Huang opened 1 month ago

JaySon-Huang commented 1 month ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

2. What did you expect to see? (Required)

3. What did you see instead (Required)

https://ci.pingcap.net/blue/organizations/jenkins/tiflash-ghpr-integration-tests/detail/tiflash-ghpr-integration-tests/16561/pipeline/

fullstack-test2-logs.tar.gz

[2024-07-12T04:46:15.000Z] fullstack-test2/ddl/rename_table_across_databases.test: Running
[2024-07-12T04:46:33.020Z]   File: fullstack-test2/ddl/rename_table_across_databases.test
[2024-07-12T04:46:33.020Z]   Error line: 117
[2024-07-12T04:46:33.020Z]   Error: set session tidb_isolation_read_engines='tiflash'; select * from test_new.part4 order by id;
[2024-07-12T04:46:33.020Z]   Result:
[2024-07-12T04:46:33.020Z]     ERROR 1105 (HY000) at line 1: other error for mpp stream: Code: 107, e.displayText() = DB::Exception: Cannot open file /tmp/tiflash/data/db/metadata/db_708/t_713.sql, errno: 2, strerror: No such file or directory, e.what() = DB::Exception,
[2024-07-12T04:46:33.020Z]   Expected:
[2024-07-12T04:46:33.020Z]     +----+----------+------+
[2024-07-12T04:46:33.020Z]     | id | store_id | c1   |
[2024-07-12T04:46:33.020Z]     +----+----------+------+
[2024-07-12T04:46:33.020Z]     |  1 |        1 | NULL |
[2024-07-12T04:46:33.020Z]     |  2 |        2 | NULL |
[2024-07-12T04:46:33.020Z]     |  3 |        3 | NULL |
[2024-07-12T04:46:33.020Z]     | 11 |       11 | NULL |
[2024-07-12T04:46:33.020Z]     | 16 |       16 | NULL |
[2024-07-12T04:46:33.020Z]     +----+----------+------+

4. What is your TiFlash version? (Required)

release-7.5

JaySon-Huang commented 1 month ago

This is a concurrency issue that occurs when renaming a partitioned table across databases. TiFlash recovers on its own in subsequent queries, so it is marked as moderate.


Thread-A enters `TiDBSchemaSyncer::syncSchemaDiffs` and runs into `RenameTable`, which overwrites the id_mapping for table_id=710 (old_database_id=2, new_database_id=708).

[2024/07/12 12:46:31.169 +08:00] [INFO] [TiDBSchemaSyncer.cpp:261] ["Sync table schema begin, table_id=712"] [source="keyspace=4294967295"] [thread_id=788]
[2024/07/12 12:46:31.170 +08:00] [WARN] [SchemaBuilder.cpp:1638] ["table is not exist in TiKV, applyTable need retry, get_by_mvcc=false database_id=2 logical_table_id=710"] [source="keyspace=4294967295"] [thread_id=788]
[2024/07/12 12:46:31.170 +08:00] [WARN] [TiDBSchemaSyncer.cpp:274] ["Can not apply table schema because the table_id_map is not up-to-date, try to syncSchemas. physical_table_id=712 database_id=2 logical_table_id=710"] [source="keyspace=4294967295"] [thread_id=788]
[2024/07/12 12:46:31.170 +08:00] [INFO] [TiDBSchemaSyncer.cpp:96] ["Start to sync schemas. current version is: 916 and try to sync schema version to: 926"] [source="keyspace=4294967295"] [thread_id=788]
...
[2024/07/12 12:46:31.214 +08:00] [TRACE] [SchemaBuilder.cpp:262] ["applyDiff accept type=RenameTable"] [source="keyspace=4294967295"] [thread_id=788]
[2024/07/12 12:46:31.214 +08:00] [WARN] [TableIDMap.cpp:41] ["table_id to database_id is being overwrite, table_id=710 old_database_id=2 new_database_id=708"] [source="keyspace=4294967295"] [thread_id=788]

Thread-B cannot find the table's .sql file under database_id=708 and raises an error.

[2024/07/12 12:46:31.222 +08:00] [INFO] [TiDBSchemaSyncer.cpp:261] ["Sync table schema begin, table_id=713"] [source="keyspace=4294967295"] [thread_id=790]
[2024/07/12 12:46:31.228 +08:00] [INFO] [SchemaBuilder.cpp:1698] ["Alter table db_708.t_713 begin, database_id=708 table_id=713"] [source="keyspace=4294967295"] [thread_id=790]
[2024/07/12 12:46:31.233 +08:00] [ERROR] [MPPTask.cpp:644] ["task running meets error: Code: 107, e.displayText() = DB::Exception: Cannot open file /tmp/tiflash/data/db/metadata/db_708/t_713.sql, errno: 2, strerror: No such file or directory, e.what() = DB::Exception, Stack trace:\n\n\n       0x42d99be\tStackTrace::StackTrace() [tiflash+70097342]\n       0x42c8262\tDB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) [tiflash+70025826]\n       0x42fcc7a\tDB::ErrnoException::ErrnoException(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, int) [tiflash+70241402]\n       0x42f82fe\tDB::throwFromErrno(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, int) [tiflash+70222590]\n       0xc83433c\tDB::PosixRandomAccessFile::PosixRandomAccessFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, std::__1::shared_ptr<DB::ReadLimiter> const&, std::__1::shared_ptr<DB::FileSegment> const&) [tiflash+209929020]\n       0xc828beb\tDB::PosixRandomAccessFile* std::__1::construct_at<DB::PosixRandomAccessFile, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&, DB::PosixRandomAccessFile*>(DB::PosixRandomAccessFile*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&) [tiflash+209882091]\n       0xc82897b\tvoid std::__1::allocator_traits<std::__1::allocator<DB::PosixRandomAccessFile> >::construct<DB::PosixRandomAccessFile, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&, void, void>(std::__1::allocator<DB::PosixRandomAccessFile>&, DB::PosixRandomAccessFile*, std::__1::basic_string<char, 
std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&) [tiflash+209881467]\n       0xc82867e\tstd::__1::__shared_ptr_emplace<DB::PosixRandomAccessFile, std::__1::allocator<DB::PosixRandomAccessFile> >::__shared_ptr_emplace<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&>(std::__1::allocator<DB::PosixRandomAccessFile>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&) [tiflash+209880702]\n       0xc8284a4\tstd::__1::shared_ptr<DB::PosixRandomAccessFile> std::__1::allocate_shared<DB::PosixRandomAccessFile, std::__1::allocator<DB::PosixRandomAccessFile>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&, void>(std::__1::allocator<DB::PosixRandomAccessFile> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&) [tiflash+209880228]\n       0xc827a47\tstd::__1::shared_ptr<DB::PosixRandomAccessFile> std::__1::make_shared<DB::PosixRandomAccessFile, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&, void>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&) [tiflash+209877575]\n       0xc8238ef\tDB::FileProvider::newRandomAccessFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::EncryptionPath const&, std::__1::shared_ptr<DB::ReadLimiter> const&, int) const [tiflash+209860847]\n       0xc84fb5c\tDB::ReadBufferFromFileProvider::ReadBufferFromFileProvider(std::__1::shared_ptr<DB::FileProvider> const&, 
std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::EncryptionPath const&, unsigned long, std::__1::shared_ptr<DB::ReadLimiter> const&, int, char*, unsigned long) [tiflash+210041692]\n       0xc7ff9a9\tDB::DatabaseTiFlash::alterTable(DB::Context const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::ColumnsDescription const&, std::__1::function<void (DB::IAST&)> const&) [tiflash+209713577]\n       0xd63faef\tDB::updateDeltaMergeTableCreateStatement(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<DB::SortColumnDescription, std::__1::allocator<DB::SortColumnDescription> > const&, DB::ColumnsDescription const&, DB::OrderedNameSet const&, std::__1::optional<std::__1::reference_wrapper<TiDB::TableInfo const> >, unsigned long, DB::Context const&) [tiflash+224656111]\n       0xd640515\tDB::StorageDeltaMerge::alterSchemaChange(std::__1::shared_ptr<DB::RWLock::LockHolderImpl> const&, TiDB::TableInfo&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context const&) [tiflash+224658709]\n       0xde9d7b8\tDB::SchemaBuilder<DB::SchemaGetter, DB::SchemaNameMapper>::applyTable(long, long, long, bool) [tiflash+233428920]\n       0xde6fa32\tDB::TiDBSchemaSyncer<false, false>::trySyncTableSchema(DB::Context&, long, DB::SchemaGetter&, bool, char const*) [tiflash+233241138]\n       0xde6ecf9\tDB::TiDBSchemaSyncer<false, false>::syncTableSchema(DB::Context&, long) [tiflash+233237753]\n       0xc9a8858\tDB::TiDBSchemaSyncerManager::syncTableSchema(DB::Context&, unsigned int, long) [tiflash+211454040]\n       0xe316822\tDB::DAGStorageInterpreter::getAndLockStorages(long)::$_8::operator()(long) 
const [tiflash+238118946]\n       0xe30f884\tDB::DAGStorageInterpreter::getAndLockStorages(long) [tiflash+238090372]\n       0xe308487\tDB::DAGStorageInterpreter::prepare() [tiflash+238060679]\n       0xe309311\tDB::DAGStorageInterpreter::execute(DB::PipelineExecutorContext&, DB::PipelineExecGroupBuilder&) [tiflash+238064401]\n       0xe7152d1\tDB::PhysicalTableScan::buildPipeline(DB::PipelineBuilder&, DB::Context&, DB::PipelineExecutorContext&) [tiflash+242307793]\n       0xe64935b\tDB::PhysicalPlanNode::buildPipeline(DB::PipelineBuilder&, DB::Context&, DB::PipelineExecutorContext&) [tiflash+241472347]\n       0xe64935b\tDB::PhysicalPlanNode::buildPipeline(DB::PipelineBuilder&, DB::Context&, DB::PipelineExecutorContext&) [tiflash+241472347]\n       0xe640c12\tDB::PhysicalPlan::toPipeline(DB::PipelineExecutorContext&, DB::Context&) [tiflash+241437714]\n       0xe5b327d\tDB::PipelineExecutor::PipelineExecutor(std::__1::shared_ptr<MemoryTracker> const&, DB::AutoSpillTrigger*, std::__1::function<void (std::__1::shared_ptr<DB::OperatorSpillContext> const&)> const&, DB::Context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) [tiflash+240857725]\n       0xe23d0ef\tstd::__1::__unique_if<DB::PipelineExecutor>::__unique_single std::__1::make_unique<DB::PipelineExecutor, std::__1::shared_ptr<MemoryTracker>&, std::nullptr_t, std::nullptr_t, DB::Context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&>(std::__1::shared_ptr<MemoryTracker>&, std::nullptr_t&&, std::nullptr_t&&, DB::Context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) [tiflash+237228271]\n       0xe237e73\tDB::(anonymous namespace)::executeAsPipeline(DB::Context&, bool) [tiflash+237207155]\n       0xe23747b\tDB::queryExecute(DB::Context&, bool) [tiflash+237204603]\n       0xe4d8a18\tDB::MPPTask::preprocess() [tiflash+239962648]"] [source="MPP<gather_id:<gather_id:3, 
query_ts:1720759591160913143, local_query_id:163, server_id:1709, start_ts:451086802253250576, resource_group: default>,task_id:2>"] [thread_id=790]
...
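
To make the interleaving easier to see, the log lines quoted in this report can be merged and sorted by timestamp. The snippet below is only a sketch: the regular expression is written against the line layout visible in this report, the messages in `LOG_LINES` are shortened copies of the quoted lines, and `timeline` is a hypothetical helper, not part of TiFlash.

```python
import re
from datetime import datetime

# Shortened copies of log lines quoted in this report (messages abbreviated).
LOG_LINES = [
    '[2024/07/12 12:46:31.214 +08:00] [WARN] [TableIDMap.cpp:41] ["table_id to database_id is being overwrite, table_id=710 old_database_id=2 new_database_id=708"] [source="keyspace=4294967295"] [thread_id=788]',
    '[2024/07/12 12:46:31.228 +08:00] [INFO] [SchemaBuilder.cpp:1698] ["Alter table db_708.t_713 begin, database_id=708 table_id=713"] [source="keyspace=4294967295"] [thread_id=790]',
    '[2024/07/12 12:46:31.233 +08:00] [ERROR] [MPPTask.cpp:644] ["task running meets error: Code: 107, Cannot open file (message shortened)"] [thread_id=790]',
    '[2024/07/12 12:46:31.232 +08:00] [INFO] [SchemaBuilder.cpp:740] ["Rename table db_2.t_713 to db_708.t_713 begin, database_id=708 table_id=713"] [source="keyspace=4294967295"] [thread_id=788]',
    '[2024/07/12 12:46:31.237 +08:00] [INFO] [SchemaBuilder.cpp:763] ["Rename table db_2.t_713 to db_708.t_713 end, database_id=708 table_id=713"] [source="keyspace=4294967295"] [thread_id=788]',
]

# Assumed layout: [timestamp] [LEVEL] [file:line] ["message"] ... [thread_id=N]
LINE_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] \[(?P<src>[^\]]+)\] '
    r'\["(?P<msg>.*?)"\].*\[thread_id=(?P<tid>\d+)\]$'
)


def timeline(lines):
    """Parse log lines and return (timestamp, thread_id, level, message) sorted by time."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not follow the assumed layout
        ts = datetime.strptime(m["ts"], "%Y/%m/%d %H:%M:%S.%f %z")
        events.append((ts, int(m["tid"]), m["level"], m["msg"]))
    return sorted(events, key=lambda e: e[0])


if __name__ == "__main__":
    for ts, tid, level, msg in timeline(LOG_LINES):
        print(ts.strftime("%H:%M:%S.%f")[:-3], f"thread={tid}", level, msg[:70])
```

Sorted this way, the events alternate between thread 788 and thread 790: the id mapping is overwritten first, then the alter begins, then the rename begins, then the error is raised, and finally the rename ends.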

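Thread-A finishes renaming the partitioned table. The rename of t_713 begins at 12:46:31.232, after Thread-B started altering db_708.t_713 at 12:46:31.228, and ends at 12:46:31.237, after Thread-B's error at 12:46:31.233.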


[2024/07/12 12:46:31.232 +08:00] [INFO] [SchemaBuilder.cpp:740] ["Rename table db_2.t_713 (display name: t_713) to db_708.t_713 begin, database_id=708 table_id=713"] [source="keyspace=4294967295"] [thread_id=788]
[2024/07/12 12:46:31.237 +08:00] [INFO] [SchemaBuilder.cpp:763] ["Rename table db_2.t_713 (display name: t_713) to db_708.t_713 end, database_id=708 table_id=713"] [source="keyspace=4294967295"] [thread_id=788]
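
The sequence described above can be replayed deterministically in a small model. This is a hedged sketch of one reading of the logs, not TiFlash code: `FakeMetadataStore`, `metadata_path`, `alter_table`, and the two dictionaries are hypothetical stand-ins, and the assumption is that the database id for a physical partition is resolved through the logical table's mapping before the metadata file has been moved.

```python
class FakeMetadataStore:
    """In-memory stand-in for the metadata directory (hypothetical)."""

    def __init__(self, files):
        self.files = set(files)

    def rename(self, old, new):
        self.files.remove(old)
        self.files.add(new)

    def open(self, path):
        if path not in self.files:
            raise FileNotFoundError(path)
        return path


def metadata_path(database_id, table_id):
    # Mirrors the db_<id>/t_<id>.sql naming seen in the error message.
    return f"db_{database_id}/t_{table_id}.sql"


def alter_table(store, logical_to_database, physical_to_logical, physical_id):
    """Resolve the database through the id mapping, then open the .sql file."""
    database_id = logical_to_database[physical_to_logical[physical_id]]
    return store.open(metadata_path(database_id, physical_id))


def replay_interleaving():
    store = FakeMetadataStore({metadata_path(2, 713)})
    logical_to_database = {710: 2}    # logical table 710 lives in database 2
    physical_to_logical = {713: 710}  # partition 713 belongs to table 710
    outcomes = []

    # Thread-A: the RenameTable diff updates the id mapping first.
    logical_to_database[710] = 708

    # Thread-B: resolves database 708, but the file is still under db_2.
    try:
        outcomes.append("ok: " + alter_table(
            store, logical_to_database, physical_to_logical, 713))
    except FileNotFoundError as e:
        outcomes.append(f"error: {e.args[0]}")

    # Thread-A: only now is the partition's metadata file moved.
    store.rename(metadata_path(2, 713), metadata_path(708, 713))

    # A later query: the same lookup now succeeds.
    outcomes.append("ok: " + alter_table(
        store, logical_to_database, physical_to_logical, 713))
    return outcomes


if __name__ == "__main__":
    for outcome in replay_interleaving():
        print(outcome)
```

In this model the first lookup fails and the second succeeds, which matches the observation that TiFlash recovers in following queries. The model deliberately uses no threads so that the problematic ordering is reproduced on every run.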