Closed seiya-annie closed 1 year ago
TiFlash rely on "exact once" semantic of put keys. e.g. If we already proposed a Put with key of 7480000000000001FFC95F720380000000FF0000001903800000FF0000008DEE000000FCF9ECF9619EE3FFFB
, then a later Put with the same key will raise the above error.
The issue is occurred since a recent prepare flashback modification has introduced some retrying mechanism which breaks the "exact once" assumption. According to @JmPotato.
@lilinghai Is it possible we add check for the "exact once" semantic to tiflash check on tikv CI?
This is introduced and fixed by https://github.com/tikv/tikv/pull/13823, there is a corner case that will cause a key in DEFAULT_CF
to be written twice during the multiple flashback retries.
TiFlash rely on "exact once" semantic of put keys. e.g. If we already proposed a Put with key of
7480000000000001FFC95F720380000000FF0000001903800000FF0000008DEE000000FCF9ECF9619EE3FFFB
, then a later Put with the same key will raise the above error.The issue is occurred since a recent prepare flashback modification has introduced some retrying mechanism which breaks the "exact once" assumption. According to @JmPotato.
@lilinghai Is it possible we add check for the "exact once" semantic to tiflash check on tikv CI?
if it is a simple e2e test , we could add the test in tikv merge ci
if it is a simple e2e test , we could add the test in tikv merge ci
This case could be covered by the unit test in https://github.com/tikv/tikv/pull/13823.
load latest tikv patch, and run the testcase again, tiflash still can't start after restart during flashback. currently error message is :
[2022/11/29 23:40:23.354 +08:00] [ERROR] [Exception.cpp:89] ["Code: 49, e.displayText() = DB::Exception: cur_offset does not equal to offset, e.what() = DB::Exception, Stack trace:\n\n\n 0x674ab1b\tDB::DM::DeltaMergeStore::write(DB::Context const&, DB::Settings const&, DB::Block&) [tiflash+108309275]\n \tdbms/src/Storages/DeltaMerge/DeltaMergeStore.cpp:564\n 0x693a23f\tDB::writeRegionDataToStorage(DB::Context&, DB::RegionPtrWithBlock const&, std::__1::vector<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> >, std::__1::allocator<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> > > >&, std::__1::shared_ptr<DB::Logger> const&)::$_0::operator()(bool) const [tiflash+110338623]\n \tdbms/src/Storages/Transaction/PartitionStreams.cpp:148\n 0x6934fe6\tDB::writeRegionDataToStorage(DB::Context&, DB::RegionPtrWithBlock const&, std::__1::vector<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> >, std::__1::allocator<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> > > >&, std::__1::shared_ptr<DB::Logger> const&) [tiflash+110317542]\n \tdbms/src/Storages/Transaction/PartitionStreams.cpp:181\n 0x6934959\tDB::RegionTable::writeBlockByRegion(DB::Context&, DB::RegionPtrWithBlock const&, std::__1::vector<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> >, std::__1::allocator<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> > > >&, std::__1::shared_ptr<DB::Logger> const&, bool) [tiflash+110315865]\n \tdbms/src/Storages/Transaction/PartitionStreams.cpp:359\n 0x69593bd\tDB::Region::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, DB::TMTContext&) [tiflash+110465981]\n \tdbms/src/Storages/Transaction/Region.cpp:723\n 0x691ebfe\tDB::KVStore::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, unsigned long, DB::TMTContext&) [tiflash+110226430]\n \tdbms/src/Storages/Transaction/KVStore.cpp:293\n 0x693e145\tHandleWriteRaftCmd [tiflash+110354757]\n \tdbms/src/Storages/Transaction/ProxyFFI.cpp:95\n 0x7f523bcc9f2d\t_$LT$engine_store_ffi..observer..TiFlashObserver$u20$as$u20$raftstore..coprocessor..QueryObserver$GT$::post_exec_query::h1a8a961972b8ccf8 [[libtiflash_proxy.so](http://libtiflash_proxy.so/)+17366829]\n 0x7f523cb9700d\traftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::apply_raft_cmd::h0b08541bf9853547 [[libtiflash_proxy.so](http://libtiflash_proxy.so/)+32886797]\n 0x7f523cbae545\traftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::process_raft_cmd::ha4bde57f9f25a6ab [[libtiflash_proxy.so](http://libtiflash_proxy.so/)+32982341]\n 0x7f523cbb3a14\traftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::handle_raft_committed_entries::h88c3ac142c377f65 [[libtiflash_proxy.so](http://libtiflash_proxy.so/)+33004052]\n 0x7f523cb8900c\traftstore::store::fsm::apply::ApplyFsm$LT$EK$GT$::handle_apply::hf865ae2f9856a0b6 [[libtiflash_proxy.so](http://libtiflash_proxy.so/)+32829452]\n 0x7f523cb8c628\traftstore::store::fsm::apply::ApplyFsm$LT$EK$GT$::handle_tasks::h91ceadd162e4f65b [[libtiflash_proxy.so](http://libtiflash_proxy.so/)+32843304]\n 0x7f523c263f5c\t_$LT$raftstore..store..fsm..apply..ApplyPoller$LT$EK$GT$$u20$as$u20$batch_system..batch..PollHandler$LT$raftstore..store..fsm..apply..ApplyFsm$LT$EK$GT$$C$raftstore..store..fsm..apply..ControlFsm$GT$$GT$::handle_normal::hed686c0d358455a2 [[libtiflash_proxy.so](http://libtiflash_proxy.so/)+23240540]\n 0x7f523c203b03\tbatch_system::batch::Poller$LT$N$C$C$C$Handler$GT$::poll::h2c84599628ed3b21 [[libtiflash_proxy.so](http://libtiflash_proxy.so/)+22846211]\n 0x7f523c2b1ad2\tstd::sys_common::backtrace::__rust_begin_short_backtrace::h385c7c98d4548244 [[libtiflash_proxy.so](http://libtiflash_proxy.so/)+23558866]\n 0x7f523c2f4b4e\tcore::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h626760f3e1255bb8 [[libtiflash_proxy.so](http://libtiflash_proxy.so/)+23833422]\n 0x7f523d23cc75\tstd::sys::unix::thread::Thread::new::thread_start::hd2791a9cabec1fda [[libtiflash_proxy.so](http://libtiflash_proxy.so/)+39857269]\n \t/rustc/96ddd32c4bfb1d78f0cd03eb068b1710a8cebeef/library/std/src/sys/unix/[thread.rs:108](http://thread.rs:108/)\n 0x7f523aa25ea5\tstart_thread [libpthread.so.0+32421]\n 0x7f5239e2ab0d\tclone [libc.so.6+1043213]"] [source="DB::EngineStoreApplyRes DB::HandleWriteRaftCmd(const DB::EngineStoreServerWrap *, DB::WriteCmdsView, DB::RaftCmdHeader)"] [thread_id=74]
The following key will decoded into an empty pk, which will make TiFlash panic.
key 7480000000000001FFBD5F720000000000FAF9ECEFDC3207FFFC
val 4486809092ACFEC38906
See https://github.com/tikv/tikv/issues/13861 This is not a TiFlash issue
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
2. What did you expect to see? (Required)
flashback successfully tidb cluster work normal
3. What did you see instead (Required)
tiflash restart fail
4. What is your TiFlash version? (Required)