Closed CalvinNeo closed 1 month ago
[2024/08/15 22:43:54.212 +08:00] [FATAL] [Exception.cpp:106] ["Code: 49, e.displayText() = DB::Exception: range of region_id=144958 is overlapped with region_id=144187, state: state: Applying region { id: 144187 start_key: \"t\\200\\000\\000\\000\\000\\000\\000\\377\\352_r\\200\\000\\000\\000\\000\\377\\\\\\031\\220\\000\\000\\000\\000\\000\\372\" end_key: \"t\\200\\000\\000\\000\\000\\000\\000\\377\\352_r\\200\\000\\000\\000\\000\\377\\272\\0376\\000\\000\\000\\000\\000\\372\" region_epoch { conf_ver: 248 version: 158 } peers { id: 144188 store_id: 6 } peers { id: 144189 store_id: 1 } peers { id: 144190 store_id: 52 role: Learner } peers { id: 144191 store_id: 7 } peers { id: 144192 store_id: 27 role: Learner } peers { id: 144339 store_id: 5 role: Learner } }: (while applyPreHandledSnapshot region_id=144958), e.what() = DB::Exception, Stack trace:
0x561d3b0b1c41 DB::Exception::Exception<unsigned long&, unsigned long const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, unsigned long&, unsigned long const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&&) [tiflash+147840065]
dbms/src/Common/StackTrace.cpp:23
0x561d3b0afdb0 void DB::KVStore::checkAndApplyPreHandledSnapshot<DB::RegionPtrWithSnapshotFiles>(DB::RegionPtrWithSnapshotFiles const&, DB::TMTContext&) [tiflash+147832240]
dbms/src/Storages/KVStore/MultiRaft/ApplySnapshot.cpp:100
0x561d3b0ad814 void DB::KVStore::applyPreHandledSnapshot<DB::RegionPtrWithSnapshotFiles>(DB::RegionPtrWithSnapshotFiles const&, DB::TMTContext&) [tiflash+147822612]
dbms/src/Storages/KVStore/MultiRaft/ApplySnapshot.cpp:328
0x561d3b09ee5a ApplyPreHandledSnapshot [tiflash+147762778]
dbms/src/Storages/KVStore/FFI/ProxyFFI.cpp:701
0x7f787fd6d8be _$LT$engine_store_ffi..observer..TiFlashObserver$LT$T$C$ER$GT$$u20$as$u20$raftstore..coprocessor..ApplySnapshotObserver$GT$::post_apply_snapshot::h093fa1a866a85966 [libtiflash_proxy.so+29444286]
contrib/tiflash-proxy/proxy_components/engine_store_ffi/src/observer.rs:230
0x7f78810f2c8b raftstore::store::worker::region::Runner$LT$EK$C$R$C$T$GT$::handle_pending_applies::h1bbce7814430d99c [libtiflash_proxy.so+49912971]
contrib/tiflash-proxy/components/raftstore/src/store/worker/region.rs:865
0x7f78806683e9 yatp::task::future::RawTask$LT$F$GT$::poll::hc9b7d489fef265dd [libtiflash_proxy.so+38859753]
/workspace/.cargo/git/checkouts/yatp-e704b73c3ee279b6/793be4d/src/task/future.rs:59
0x7f78822cc23d _$LT$yatp..task..future..Runner$u20$as$u20$yatp..pool..runner..Runner$GT$::handle::h1b0bb4c7cb62c9f9 [libtiflash_proxy.so+68629053]
/workspace/.cargo/git/checkouts/yatp-e704b73c3ee279b6/793be4d/src/task/future.rs:387
0x7f787fee57d4 std::sys_common::backtrace::__rust_begin_short_backtrace::h7863e3b20fa0e349 [libtiflash_proxy.so+30984148]
/rustc/89e2160c4ca5808657ed55392620ed1dbbce78d1/library/std/src/sys_common/backtrace.rs:155
0x7f787ff47491 core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::hfe55563760275bda [libtiflash_proxy.so+31384721]
/rustc/89e2160c4ca5808657ed55392620ed1dbbce78d1/library/core/src/ops/function.rs:250
0x7f78818a78c6 std::sys::unix::thread::Thread::new::thread_start::hd01330a228cdd35d [libtiflash_proxy.so+57993414]
/rustc/89e2160c4ca5808657ed55392620ed1dbbce78d1/library/std/src/sys/unix/thread.rs:108
0x7f787de14c02 start_thread [libc.so.6+564226]
0x7f787de99c40 clone3 [libc.so.6+1109056]"] [source="void DB::ApplyPreHandledSnapshot(EngineStoreServerWrap *, RawVoidPtr, RawCppPtrType)"] [thread_id=509]
overlapped range. When applying region_id=144958, it meets overlapped range of region_id=144187
region_id=144187
start_key:
7480000000000000FFEA5F728000000000FF5C19900000000000FA
end_key:
7480000000000000FFEA5F728000000000FFBA1F360000000000FA
The information get from pd
{"id":144958,"start_key":"7480000000000000FFEA5F728000000000FF5C16FA0000000000FA","end_key":"7480000000000000FFEA5F728000000000FF5C17840000000000FA","epoch":{"conf_ver":253,"version":159},"peers":[{"role_name":"Voter","id":144959,"store_id":6},{"role_name":"Voter","id":144960,"store_id":1},{"role_name":"Voter","id":144962,"store_id":7},{"role_name":"Learner","is_learner":true,"id":144963,"store_id":27,"role":1},{"role_name":"Learner","is_learner":true,"id":145049,"store_id":5,"role":1}],"leader":{"role_name":"Voter","id":144960,"store_id":1},"cpu_usage":0,"written_bytes":0,"read_bytes":0,"written_keys":0,"read_keys":0,"approximate_size":201,"approximate_keys":704097}
{"id":144187,"start_key":"7480000000000000FFEA5F728000000000FF5C19900000000000FA","end_key":"7480000000000000FFEA5F728000000000FFBA1F360000000000FA","epoch":{"conf_ver":251,"version":158},"peers":[{"role_name":"Voter","id":144188,"store_id":6},{"role_name":"Voter","id":144189,"store_id":1},{"role_name":"Voter","id":144191,"store_id":7},{"role_name":"Learner","is_learner":true,"id":144192,"store_id":27,"role":1},{"role_name":"Learner","is_learner":true,"id":147295,"store_id":52,"role":1}],"leader":{"role_name":"Voter","id":144189,"store_id":1},"cpu_usage":0,"written_bytes":0,"read_bytes":0,"written_keys":0,"read_keys":0,"approximate_size":225,"approximate_keys":630194}
However the region of 144187, already has the correct range here
[2024/08/15 22:43:42.280 +08:00] [INFO] [SSTReader.h:205] ["Open sst file first /data0/proxy/snap/rev_144187_6_20482_write.sst, range=[7480000000000000FFEA5F728000000000FF5C19900000000000FA,7480000000000000FFEA5F728000000000FFBA1F360000000000FA] sst_tot=1 split_id=18446744073709551615 region_id=144187"] [source="keyspace=4294967295 table_id=234"] [thread_id=490]
[2024/08/15 22:44:23.105 +08:00] [INFO] [ApplySnapshot.cpp:320] ["Begin apply snapshot, new_region=[region_id=144187 applied_term=6 applied_index=20482]"] [thread_id=150]
[2024/08/15 22:44:23.110 +08:00] [INFO] [ApplySnapshot.cpp:198] ["clear old range before apply snapshot, region_id=144187 old_range=[-9223372036854775808,12197686) new_range=[6035856,12197686) keyspace=4294967295 table_id=234"] [thread_id=150]
[2024/08/15 22:44:23.122 +08:00] [INFO] [ApplySnapshot.cpp:333] ["Finish apply snapshot, cost=0.017s region_id=144187"] [thread_id=150]
ditto 144958
[2024/08/15 22:43:42.226 +08:00] [INFO] [SSTReader.h:205] ["Open sst file first /data0/proxy/snap/rev_144958_6_7709_write.sst, range=[7480000000000000FFEA5F728000000000FF5C16FA0000000000FA,7480000000000000FFEA5F728000000000FF5C17840000000000FA] sst_tot=1 split_id=18446744073709551615 region_id=144958"] [source="keyspace=4294967295 table_id=234"] [thread_id=525]
In prospect of proxy, both regions have correct(not overlapped) range when snapshot is applied Region 144958
[2024/08/15 22:43:36.156 +08:00] [INFO] [peer_storage.rs:731] ["apply snapshot with state ok"] [for_witness=false] [state="applied_index: 7709 truncated_state { index: 7709 term: 6 }"] [region="id: 144958 start_key: 7480000000000000FFEA5F728000000000FF5C16FA0000000000FA end_key: 7480000000000000FFEA5F728000000000FF5C17840000000000FA region_epoch { conf_ver: 252 version: 159 } peers { id: 144959 store_id: 6 } peers { id: 144960 store_id: 1 } peers { id: 144961 store_id: 52 role: Learner } peers { id: 144962 store_id: 7 } peers { id: 144963 store_id: 27 role: Learner } peers { id: 145049 store_id: 5 role: Learner }"] [peer_id=145049] [region_id=144958] [thread_id=113]
[2024/08/15 22:43:36.750 +08:00] [INFO] [peer.rs:5206] ["snapshot is persisted"] [destroy_regions="[]"] [region="id: 144958 start_key: 7480000000000000FFEA5F728000000000FF5C16FA0000000000FA end_key: 7480000000000000FFEA5F728000000000FF5C17840000000000FA region_epoch { conf_ver: 252 version: 159 } peers { id: 144959 store_id: 6 } peers { id: 144960 store_id: 1 } peers { id: 144961 store_id: 52 role: Learner } peers { id: 144962 store_id: 7 } peers { id: 144963 store_id: 27 role: Learner } peers { id: 145049 store_id: 5 role: Learner }"] [peer_id=145049] [region_id=144958] [thread_id=113]
-- panic --
[2024/08/15 22:44:23.627 +08:00] [INFO] [region.rs:659] ["begin apply snap data"] [peer_id=145049] [region_id=144958] [thread_id=90]
Region 144187
[2024/08/15 22:43:36.157 +08:00] [INFO] [peer_storage.rs:731] ["apply snapshot with state ok"] [for_witness=false] [state="applied_index: 20482 truncated_state { index: 20482 term: 6 }"] [region="id: 144187 start_key: 7480000000000000FFEA5F728000000000FF5C19900000000000FA end_key: 7480000000000000FFEA5F728000000000FFBA1F360000000000FA region_epoch { conf_ver: 248 version: 158 } peers { id: 144188 store_id: 6 } peers { id: 144189 store_id: 1 } peers { id: 144190 store_id: 52 role: Learner } peers { id: 144191 store_id: 7 } peers { id: 144192 store_id: 27 role: Learner } peers { id: 144339 store_id: 5 role: Learner }"] [peer_id=144339] [region_id=144187] [thread_id=114]
[2024/08/15 22:43:36.751 +08:00] [INFO] [peer.rs:5206] ["snapshot is persisted"] [destroy_regions="[]"] [region="id: 144187 start_key: 7480000000000000FFEA5F728000000000FF5C19900000000000FA end_key: 7480000000000000FFEA5F728000000000FFBA1F360000000000FA region_epoch { conf_ver: 248 version: 158 } peers { id: 144188 store_id: 6 } peers { id: 144189 store_id: 1 } peers { id: 144190 store_id: 52 role: Learner } peers { id: 144191 store_id: 7 } peers { id: 144192 store_id: 27 role: Learner } peers { id: 144339 store_id: 5 role: Learner }"] [peer_id=144339] [region_id=144187] [thread_id=114]
-- panic --
[2024/08/15 22:44:23.044 +08:00] [INFO] [region.rs:659] ["begin apply snap data"] [peer_id=144339] [region_id=144187] [thread_id=90]
Put these ranges in order, we get
7480000000000000FFEA5F728000000000FF5C16FA0000000000FA 144958
7480000000000000FFEA5F728000000000FF5C17840000000000FA 144958
7480000000000000FFEA5F728000000000FF5C19900000000000FA 144187
7480000000000000FFEA5F728000000000FF5C19900000000000FA O of 144187
7480000000000000FFEA5F728000000000FFBA1F360000000000FA O of 144187
7480000000000000FFEA5F728000000000FFBA1F360000000000FA 144187
It does not show any evidence of overlapping. So we can conculude that the reason here is that we use an obsolette range of region 144187.
Check region 144187 with tikv-ctl, it is also correct
./tikv-ctl --host tc-tiflash-0.tc-tiflash-peer.endless-htap-consistency-chaos-tps-7550816-1-914.svc:20170 raft region -r 144187
{
"region_infos": {
"144187": {
"region_id": 144187,
"region_local_state": {
"region": {
"id": 144187,
"state": "Tombstone",
"start_key": "7480000000000000FFEA5F728000000000FF5C19900000000000FA",
"end_key": "7480000000000000FFEA5F728000000000FFBA1F360000000000FA",
"region_epoch": {
"conf_ver": 251,
"version": 158
},
"peers": [
{
"id": 144188,
"store_id": 6,
"role": "Voter"
},
{
"id": 144189,
"store_id": 1,
"role": "Voter"
},
{
"id": 144191,
"store_id": 7,
"role": "Voter"
},
{
"id": 144192,
"store_id": 27,
"role": "Learner"
},
{
"id": 147295,
"store_id": 52,
"role": "Learner"
}
]
}
},
"raft_local_state": null,
"raft_apply_state": null
}
}
}
The reason:
region_id=144187 created by split @ 22:37 from region_id=140192 before apply snapshot
[2024/08/15 22:37:40.891 +08:00] [INFO] [apply.rs:1712] ["execute admin command"] [command="cmd_type: BatchSplit splits { requests { split_key: 7480000000000000FFEA5F728000000000FFBA1F360000000000FA new_region_id: 144187 new_peer_ids: 144188 new_peer_ids: 144189 new_peer_ids: 144190 new_peer_ids: 144191 new_peer_ids: 144192 new_peer_ids: 144193 } right_derive: true share_source_region_size: true }"] [index=8001] [term=6] [peer_id=144060] [region_id=140192] [thread_id=115]
[2024/08/15 22:37:40.898 +08:00] [WARN] [apply.rs:2718] ["new region from splitting already exists"] [peer_id=144060] [region_id=140192] [reason="state state: Tombstone region { id: 144187 peers { id: 144193 store_id: 5 role: Learner } } exist in kv engine"] [new_peer_id=144193] [new_region_id=144187] [thread_id=115]
[2024/08/15 22:37:40.898 +08:00] [INFO] [command.rs:236] ["observe admin command"] [region_epoch="conf_ver: 246 version: 155"] [command="cmd_type: BatchSplit splits { requests { split_key: 7480000000000000FFEA5F728000000000FFBA1F360000000000FA new_region_id: 144187 new_peer_ids: 144188 new_peer_ids: 144189 new_peer_ids: 144190 new_peer_ids: 144191 new_peer_ids: 144192 new_peer_ids: 144193 } right_derive: true share_source_region_size: true }"] [index=8001] [term=6] [peer_id=144060] [region_id=140192] [thread_id=115]
region_id=140192
[2024/08/15 22:37:15.267 +08:00] [INFO] [peer_storage.rs:648] ["begin to apply snapshot"] [peer_id=144060] [region_id=140192] [thread_id=113]
[2024/08/15 22:37:15.267 +08:00] [INFO] [peer_storage.rs:731] ["apply snapshot with state ok"] [for_witness=false] [state="applied_index: 5650 truncated_state { index: 5650 term: 6 }"] [region="id: 140192 start_key: 7480000000000000FFEA5F720000000000FA end_key: 7480000000000000FFEA5F729000000000FF6C57850000000000FA region_epoch { conf_ver: 246 version: 155 } peers { id: 140194 store_id: 6 } peers { id: 140195 store_id: 1 } peers { id: 140196 store_id: 52 role: Learner } peers { id: 141906 store_id: 7 } peers { id: 143447 store_id: 27 role: Learner } peers { id: 144060 store_id: 5 role: Learner }"] [peer_id=144060] [region_id=140192] [thread_id=113]
[2024/08/15 22:37:15.327 +08:00] [INFO] [peer.rs:5206] ["snapshot is persisted"] [destroy_regions="[]"] [region="id: 140192 start_key: 7480000000000000FFEA5F720000000000FA end_key: 7480000000000000FFEA5F729000000000FF6C57850000000000FA region_epoch { conf_ver: 246 version: 155 } peers { id: 140194 store_id: 6 } peers { id: 140195 store_id: 1 } peers { id: 140196 store_id: 52 role: Learner } peers { id: 141906 store_id: 7 } peers { id: 143447 store_id: 27 role: Learner } peers { id: 144060 store_id: 5 role: Learner }"] [peer_id=144060] [region_id=140192] [thread_id=113]
[2024/08/15 22:37:18.814 +08:00] [INFO] [snapshot.rs:112] ["pre apply snapshot"] [snapshot="Some(Snapshot { key: SnapKey { region_id: 140192, term: 6, idx: 5650 }, display_path: \"/data0/proxy/snap/rev_140192_6_5650_(default|lock|write).sst\" })"] [pending=5] [has_snap=true] [snap_key="SnapKey { region_id: 140192, term: 6, idx: 5650 }"] [region_id=140192] [peer_id=144060] [thread_id=90]
[2024/08/15 22:37:36.132 +08:00] [INFO] [region.rs:659] ["begin apply snap data"] [peer_id=144060] [region_id=140192] [thread_id=90]
[2024/08/15 22:37:36.292 +08:00] [INFO] [snapshot.rs:240] ["post apply snapshot"] [pending=6] [region="id: 140192 start_key: 7480000000000000FFEA5F720000000000FA end_key: 7480000000000000FFEA5F729000000000FF6C57850000000000FA region_epoch { conf_ver: 246 version: 155 } peers { id: 140194 store_id: 6 } peers { id: 140195 store_id: 1 } peers { id: 140196 store_id: 52 role: Learner } peers { id: 141906 store_id: 7 } peers { id: 143447 store_id: 27 role: Learner } peers { id: 144060 store_id: 5 role: Learner }"] [region_id=140192] [snap_key="SnapKey { region_id: 140192, term: 6, idx: 5650 }"] [peer_id=144060] [thread_id=90]
[2024/08/15 22:37:36.292 +08:00] [INFO] [snapshot.rs:275] ["get prehandled snapshot success"] [pending=6] [region_id=140192] [snap_key="SnapKey { region_id: 140192, term: 6, idx: 5650 }"] [peer_id=144060] [thread_id=90]
[2024/08/15 22:37:36.305 +08:00] [INFO] [snapshot.rs:308] ["apply snapshot finished"] [elapsed=12] [pending=5] [region="id: 140192 start_key: 7480000000000000FFEA5F720000000000FA end_key: 7480000000000000FFEA5F729000000000FF6C57850000000000FA region_epoch { conf_ver: 246 version: 155 } peers { id: 140194 store_id: 6 } peers { id: 140195 store_id: 1 } peers { id: 140196 store_id: 52 role: Learner } peers { id: 141906 store_id: 7 } peers { id: 143447 store_id: 27 role: Learner } peers { id: 144060 store_id: 5 role: Learner }"] [snap_key="SnapKey { region_id: 140192, term: 6, idx: 5650 }"] [peer_id=144060] [mode=normal] [thread_id=90]
[2024/08/15 22:37:36.305 +08:00] [INFO] [region.rs:714] ["apply new data"] [time_takes=172.388327ms] [region_id=140192] [thread_id=90]
[2024/08/15 22:37:40.891 +08:00] [INFO] [apply.rs:1712] ["execute admin command"] [command="cmd_type: BatchSplit splits { requests { split_key: 7480000000000000FFEA5F728000000000FFBA1F360000000000FA new_region_id: 144187 new_peer_ids: 144188 new_peer_ids: 144189 new_peer_ids: 144190 new_peer_ids: 144191 new_peer_ids: 144192 new_peer_ids: 144193 } right_derive: true share_source_region_size: true }"] [index=8001] [term=6] [peer_id=144060] [region_id=140192] [thread_id=115]
[2024/08/15 22:37:40.898 +08:00] [INFO] [apply.rs:2578] ["split region"] [keys="key 7480000000000000FFEA5F728000000000FFBA1F360000000000FA"] [region="id: 140192 start_key: 7480000000000000FFEA5F720000000000FA end_key: 7480000000000000FFEA5F729000000000FF6C57850000000000FA region_epoch { conf_ver: 246 version: 155 } peers { id: 140194 store_id: 6 } peers { id: 140195 store_id: 1 } peers { id: 140196 store_id: 52 role: Learner } peers { id: 141906 store_id: 7 } peers { id: 143447 store_id: 27 role: Learner } peers { id: 144060 store_id: 5 role: Learner }"] [peer_id=144060] [region_id=140192] [thread_id=115]
[2024/08/15 22:37:40.898 +08:00] [WARN] [apply.rs:2718] ["new region from splitting already exists"] [peer_id=144060] [region_id=140192] [reason="state state: Tombstone region { id: 144187 peers { id: 144193 store_id: 5 role: Learner } } exist in kv engine"] [new_peer_id=144193] [new_region_id=144187] [thread_id=115]
[2024/08/15 22:37:40.898 +08:00] [INFO] [command.rs:236] ["observe admin command"] [region_epoch="conf_ver: 246 version: 155"] [command="cmd_type: BatchSplit splits { requests { split_key: 7480000000000000FFEA5F728000000000FFBA1F360000000000FA new_region_id: 144187 new_peer_ids: 144188 new_peer_ids: 144189 new_peer_ids: 144190 new_peer_ids: 144191 new_peer_ids: 144192 new_peer_ids: 144193 } right_derive: true share_source_region_size: true }"] [index=8001] [term=6] [peer_id=144060] [region_id=140192] [thread_id=115]
[2024/08/15 22:37:40.909 +08:00] [INFO] [command.rs:312] ["should persist admin"] [state="applied_index: 8001 commit_index: 8002 commit_term: 6 truncated_state { index: 5650 term: 6 }"] [peer_id=144060] [region_id=140192] [thread_id=115]
[2024/08/15 22:37:41.205 +08:00] [INFO] [apply.rs:1712] ["execute admin command"] [command="cmd_type: ChangePeerV2 change_peer_v2 { changes { change_type: RemoveNode peer { id: 140196 store_id: 52 role: Learner } } }"] [index=8002] [term=6] [peer_id=144060] [region_id=140192] [thread_id=115]
[2024/08/15 22:37:41.205 +08:00] [INFO] [apply.rs:2309] ["exec ConfChangeV2"] [epoch="conf_ver: 246 version: 156"] [kind=Simple] [peer_id=144060] [region_id=140192] [thread_id=115]
[2024/08/15 22:37:41.206 +08:00] [INFO] [apply.rs:2490] ["conf change successfully"] [current_region="id: 140192 start_key: 7480000000000000FFEA5F728000000000FFBA1F360000000000FA end_key: 7480000000000000FFEA5F729000000000FF6C57850000000000FA region_epoch { conf_ver: 247 version: 156 } peers { id: 140194 store_id: 6 } peers { id: 140195 store_id: 1 } peers { id: 141906 store_id: 7 } peers { id: 143447 store_id: 27 role: Learner } peers { id: 144060 store_id: 5 role: Learner }"] [original_region="id: 140192 start_key: 7480000000000000FFEA5F728000000000FFBA1F360000000000FA end_key: 7480000000000000FFEA5F729000000000FF6C57850000000000FA region_epoch { conf_ver: 246 version: 156 } peers { id: 140194 store_id: 6 } peers { id: 140195 store_id: 1 } peers { id: 140196 store_id: 52 role: Learner } peers { id: 141906 store_id: 7 } peers { id: 143447 store_id: 27 role: Learner } peers { id: 144060 store_id: 5 role: Learner }"] [changes="[change_type: RemoveNode peer { id: 140196 store_id: 52 role: Learner }]"] [peer_id=144060] [region_id=140192] [thread_id=115]
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
2. What did you expect to see? (Required)
3. What did you see instead (Required)
4. What is your TiFlash version? (Required)
881a80de4d9000779f47f903538523aebe50beba