pingcap / tiflash


Check index.has_value(): Can not find path for PageFile file_id=221_0 #9406

Closed CalvinNeo closed 2 weeks ago

CalvinNeo commented 3 weeks ago

Bug Report


[2024/05/23 14:30:47.741 +00:00] [ERROR] [Exception.cpp:96] ["Code: 49, e.displayText() = DB::Exception: Check index.has_value() failed: Can not find path for PageFile file_id=221_0, e.what() = DB::Exception, Stack trace:\n\n\n       0x1f5441e\tDB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) [tiflash+32850974]\n                \tdbms/src/Common/Exception.h:46\n       0x80f57dc\tDB::PSDiskDelegatorGlobalMulti::getPageFilePath(std::__1::pair<unsigned long, unsigned int> const&) const [tiflash+135223260]\n                \tdbms/src/Storages/PathPool.cpp:1133\n       0x1ec5daf\tDB::PS::V3::BlobStore<DB::PS::V3::universal::BlobStoreTrait>::getBlobFile(unsigned long) [tiflash+32267695]\n                \tdbms/src/Storages/Page/V3/BlobStore.cpp:1506\n       0x1ec6e0d\tDB::PS::V3::BlobStore<DB::PS::V3::universal::BlobStoreTrait>::read(DB::UniversalPageId const&, unsigned long, unsigned long, char*, unsigned long, std::__1::shared_ptr<DB::ReadLimiter> const&, bool) [tiflash+32271885]\n                \tdbms/src/Storages/Page/V3/BlobStore.cpp:1146\n       0x1ece821\tDB::PS::V3::BlobStore<DB::PS::V3::universal::BlobStoreTrait>::read(std::__1::pair<DB::UniversalPageId, DB::PS::V3::PageEntryV3> const&, std::__1::shared_ptr<DB::ReadLimiter> const&) [tiflash+32303137]\n                \tdbms/src/Storages/Page/V3/BlobStore.cpp:1099\n       0x877a57d\tDB::PS::V3::CPFilesWriter::writeEditsAndApplyCheckpointInfo(DB::PS::V3::PageEntriesEdit<DB::UniversalPageId>&, DB::PS::V3::CPFilesWriter::CompactOptions const&, bool) [tiflash+142058877]\n                \tdbms/src/Storages/Page/V3/CheckpointFile/CPFilesWriter.cpp:185\n       0x875f83e\tDB::UniversalPageStorage::dumpIncrementalCheckpoint(DB::UniversalPageStorage::DumpCheckpointOptions const&) [tiflash+141948990]\n                \tdbms/src/Storages/Page/V3/Universal/UniversalPageStorage.cpp:547\n       
0x876fcca\tstd::__1::__function::__func<DB::UniversalPageStorageService::create(DB::Context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::PSDiskDelegator>, DB::PageStorageConfig const&)::$_3, std::__1::allocator<DB::UniversalPageStorageService::create(DB::Context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::PSDiskDelegator>, DB::PageStorageConfig const&)::$_3>, bool ()>::operator()() [tiflash+142015690]\n                \t/usr/local/bin/../include/c++/v1/__functional/function.h:345\n       0x80cd67b\tvoid* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, DB::BackgroundProcessingPool::BackgroundProcessingPool(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >)::$_1> >(void*) [tiflash+135059067]\n                \t/usr/local/bin/../include/c++/v1/thread:291\n  0x7fef18dcaea5\tstart_thread [libpthread.so.0+32421]\n  0x7fef186d996d\t__clone [libc.so.6+1042797]"] [source="DB::PS::V3::CPDataDumpStats DB::PS::V3::CPFilesWriter::writeEditsAndApplyCheckpointInfo(universal::PageEntriesEdit &, const CPFilesWriter::CompactOptions &, bool)"] [thread_id=511]
[2024/05/23 14:30:48.021 +00:00] [ERROR] [Exception.cpp:96] ["Code: 49, e.displayText() = DB::Exception: Check index.has_value() failed: Can not find path for PageFile file_id=221_0, e.what() = DB::Exception, Stack trace:\n\n\n       0x1f5441e\tDB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) [tiflash+32850974]\n                \tdbms/src/Common/Exception.h:46\n       0x80f57dc\tDB::PSDiskDelegatorGlobalMulti::getPageFilePath(std::__1::pair<unsigned long, unsigned int> const&) const [tiflash+135223260]\n                \tdbms/src/Storages/PathPool.cpp:1133\n       0x1ec5daf\tDB::PS::V3::BlobStore<DB::PS::V3::universal::BlobStoreTrait>::getBlobFile(unsigned long) [tiflash+32267695]\n                \tdbms/src/Storages/Page/V3/BlobStore.cpp:1506\n       0x1ec6e0d\tDB::PS::V3::BlobStore<DB::PS::V3::universal::BlobStoreTrait>::read(DB::UniversalPageId const&, unsigned long, unsigned long, char*, unsigned long, std::__1::shared_ptr<DB::ReadLimiter> const&, bool) [tiflash+32271885]\n                \tdbms/src/Storages/Page/V3/BlobStore.cpp:1146\n       0x1ece821\tDB::PS::V3::BlobStore<DB::PS::V3::universal::BlobStoreTrait>::read(std::__1::pair<DB::UniversalPageId, DB::PS::V3::PageEntryV3> const&, std::__1::shared_ptr<DB::ReadLimiter> const&) [tiflash+32303137]\n                \tdbms/src/Storages/Page/V3/BlobStore.cpp:1099\n       0x877a57d\tDB::PS::V3::CPFilesWriter::writeEditsAndApplyCheckpointInfo(DB::PS::V3::PageEntriesEdit<DB::UniversalPageId>&, DB::PS::V3::CPFilesWriter::CompactOptions const&, bool) [tiflash+142058877]\n                \tdbms/src/Storages/Page/V3/CheckpointFile/CPFilesWriter.cpp:185\n       0x875f83e\tDB::UniversalPageStorage::dumpIncrementalCheckpoint(DB::UniversalPageStorage::DumpCheckpointOptions const&) [tiflash+141948990]\n                \tdbms/src/Storages/Page/V3/Universal/UniversalPageStorage.cpp:547\n       
0x876fcca\tstd::__1::__function::__func<DB::UniversalPageStorageService::create(DB::Context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::PSDiskDelegator>, DB::PageStorageConfig const&)::$_3, std::__1::allocator<DB::UniversalPageStorageService::create(DB::Context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::PSDiskDelegator>, DB::PageStorageConfig const&)::$_3>, bool ()>::operator()() [tiflash+142015690]\n                \t/usr/local/bin/../include/c++/v1/__functional/function.h:345\n       0x80cd67b\tvoid* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, DB::BackgroundProcessingPool::BackgroundProcessingPool(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >)::$_1> >(void*) [tiflash+135059067]\n                \t/usr/local/bin/../include/c++/v1/thread:291\n  0x7fef18dcaea5\tstart_thread [libpthread.so.0+32421]\n  0x7fef186d996d\t__clone [libc.so.6+1042797]"] [source="void DB::BackgroundProcessingPool::threadFunction(size_t)"] [thread_id=511]

1. Minimal reproduce step (Required)

2. What did you expect to see? (Required)

3. What did you see instead (Required)

4. What is your TiFlash version? (Required)

CalvinNeo commented 3 weeks ago

Thread B (background GC):
- Finds that blob_id=161 needs a full GC.
- Starts copying the data from blob_id=161 to blob_id=221.

Thread A (DumpIncrSnap):
- Acquires a snapshot, snap-A.
- Calls dumpIncrementalCheckpoint and gets edit_from_mem with page_id_1 -> e1{blob_id=161} [v1].
- Yields.

Thread B resumes:
- Finishes copying the data from blob_id=161 to blob_id=221; blob_id=161 becomes "ReadOnly".
- gcApply adds a new version, page_id_1 -> e1'{blob_id=221} [v1'], to the PageDirectory.
- On the next GC round, PageDirectory::gcInMemEntries removes [v1] but keeps [v1'] for page_id_1.
- blob_id=161 is "ReadOnly" and [v1] has been removed, so no other entries are left on blob_id=161 and the file is removed from disk.

Thread A resumes:
- Tries to read the page data through e1{blob_id=161}, but blob_id=161 has already been removed from disk.
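
The interleaving above can be replayed deterministically in a small single-threaded model. This is only a sketch under stated assumptions, not TiFlash code: `ToyBlobStore`, `ToyDirectory`, `fullGC`, and `threadAReadSucceeds` are hypothetical names invented for illustration, and the model collapses the copy, gcApply, gcInMemEntries, and file-removal steps into one function. It shows only that an entry copied out of the directory before GC can outlive the blob file it points to.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

using BlobId = std::uint64_t;
using PageId = std::uint64_t;

// Hypothetical, simplified page entry: it only remembers which blob holds the data.
struct Entry
{
    BlobId blob_id;
};

// Toy stand-in for the blob files on disk (blob id -> file content).
struct ToyBlobStore
{
    std::map<BlobId, std::string> files;

    // Returns nullopt when the blob file is gone, which models the
    // "Can not find path for PageFile" check failing.
    std::optional<std::string> read(const Entry & e) const
    {
        auto it = files.find(e.blob_id);
        if (it == files.end())
            return std::nullopt;
        return it->second;
    }
};

// Toy stand-in for the in-memory directory (page id -> version chain).
struct ToyDirectory
{
    std::map<PageId, std::vector<Entry>> versions;
};

// Thread B in the model: copy the data, add the new version, drop the old
// version, then delete the source file once no directory entry references it.
// Entries already copied out by other threads are not considered.
void fullGC(ToyBlobStore & store, ToyDirectory & dir, BlobId from, BlobId to)
{
    assert(store.files.count(from) == 1);
    store.files[to] = store.files.at(from); // copy blob data

    bool still_referenced = false;
    for (auto & kv : dir.versions)
    {
        auto & chain = kv.second;
        if (!chain.empty() && chain.back().blob_id == from)
        {
            chain.push_back(Entry{to});                  // add [v1']
            chain.erase(chain.begin(), chain.end() - 1); // drop [v1], keep [v1']
        }
        for (const auto & e : chain)
            if (e.blob_id == from)
                still_referenced = true;
    }
    if (!still_referenced)
        store.files.erase(from); // the file is removed from disk
}

// Thread A in the model: copy e1 out of the directory, optionally let the
// full GC run, then try to read through the copied entry.
bool threadAReadSucceeds(bool gc_runs_between)
{
    ToyBlobStore store;
    store.files[161] = "page-data";
    ToyDirectory dir;
    dir.versions[1] = {Entry{161}};

    Entry e1 = dir.versions.at(1).back(); // edit_from_mem: page_id_1 -> e1{blob_id=161}

    if (gc_runs_between)
        fullGC(store, dir, 161, 221); // Thread B runs while Thread A is paused

    return store.read(e1).has_value();
}
```

Calling `threadAReadSucceeds(false)` models Thread A reading before any GC and finds the data; calling `threadAReadSucceeds(true)` models the reported interleaving, where the read through the stale `e1` finds no file.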