pingcap / tiflash

The analytical engine for TiDB and TiDB Cloud. Try free: https://tidbcloud.com/free-trial
https://docs.pingcap.com/tidb/stable/tiflash-overview
Apache License 2.0
941 stars 409 forks source link

[Dynamic Region] TiFlash can't catch up the latest raftlog for a long time #8008

Closed lilinghai closed 4 months ago

lilinghai commented 1 year ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

consistency test img_v2_462ac1c6-b98c-484f-a234-cd67c6a3559g

2. What did you expect to see? (Required)

3. What did you see instead (Required)

4. What is your TiFlash version? (Required)

master

CalvinNeo commented 1 year ago

We can find Ingest Segment on TiFlash's side is not costy.

[2023/08/22 14:22:58.750 +08:00] [INFO] [snapshot.rs:331] ["get prehandled snapshot success"] [pending=4] [region_id=5067] [snap_key="SnapKey { region_id: 5067, term: 8, idx: 899043 }"] [peer_id=5120]
[2023/08/22 14:22:58.866 +08:00] [INFO] [snapshot.rs:366] ["apply snapshot finished"] [elapsed=103535] [pending=3] [region="id: 5067 start_key: 7480000000000000FF645F728000000000FF73C9240000000000FA end_key: 7480000000000000FF645F728000000000FF8B00640000000000FA region_epoch { conf_ver: 70 version: 67 } peers { id: 5120 store_id: 21 role: Learner } peers { id: 5179 store_id: 18 } peers { id: 5497 store_id: 32 role: Learner } peers { id: 5617 store_id: 31 } peers { id: 6832 store_id: 1 } peers { id: 9095 store_id: 22 role: Learner }"] [snap_key="SnapKey { region_id: 5067, term: 8, idx: 899043 }"] [peer_id=5120] [mode=normal]
[2023/08/22 14:22:59.367 +08:00] [INFO] [snapshot.rs:331] ["get prehandled snapshot success"] [pending=4] [region_id=13656] [snap_key="SnapKey { region_id: 13656, term: 6, idx: 80344 }"] [peer_id=13658]
[2023/08/22 14:22:59.431 +08:00] [INFO] [snapshot.rs:366] ["apply snapshot finished"] [elapsed=63] [pending=3] [region="id: 13656 start_key: 7480000000000000FF645F728000000000FF1718280000000000FA end_key: 7480000000000000FF645F728000000000FF22B2120000000000FA region_epoch { conf_ver: 169 version: 64 } peers { id: 13657 store_id: 22 role: Learner } peers { id: 13658 store_id: 21 role: Learner } peers { id: 13659 store_id: 31 } peers { id: 13660 store_id: 1 } peers { id: 13661 store_id: 12 }"] [snap_key="SnapKey { region_id: 13656, term: 6, idx: 80344 }"] [peer_id=13658] [mode=normal]
[2023/08/22 14:23:06.485 +08:00] [INFO] [snapshot.rs:331] ["get prehandled snapshot success"] [pending=3] [region_id=806] [snap_key="SnapKey { region_id: 806, term: 7, idx: 911190 }"] [peer_id=5320]
[2023/08/22 14:23:06.605 +08:00] [INFO] [snapshot.rs:366] ["apply snapshot finished"] [elapsed=6672] [pending=2] [region="id: 806 start_key: 7480000000000000FF645F728000000000FF4563A60000000000FA end_key: 7480000000000000FF645F728000000000FF5CB80F0000000000FA region_epoch { conf_ver: 90 version: 65 } peers { id: 1358 store_id: 1 } peers { id: 5320 store_id: 21 role: Learner } peers { id: 5650 store_id: 32 role: Learner } peers { id: 5823 store_id: 18 } peers { id: 6103 store_id: 12 } peers { id: 9456 store_id: 22 role: Learner }"] [snap_key="SnapKey { region_id: 806, term: 7, idx: 911190 }"] [peer_id=5320] [mode=normal]
[2023/08/22 14:23:07.106 +08:00] [INFO] [snapshot.rs:331] ["get prehandled snapshot success"] [pending=2] [region_id=270] [snap_key="SnapKey { region_id: 270, term: 6, idx: 40618444 }"] [peer_id=9052]
[2023/08/22 14:23:07.466 +08:00] [INFO] [snapshot.rs:366] ["apply snapshot finished"] [elapsed=360] [pending=1] [region="id: 270 start_key: 7480000000000000FF645F728000000005FFF64F790000000000FA end_key: 7480000000000000FF6500000000000000F8 region_epoch { conf_ver: 24 version: 462 } peers { id: 272 store_id: 12 } peers { id: 274 store_id: 32 role: Learner } peers { id: 277 store_id: 22 role: Learner } peers { id: 278 store_id: 31 } peers { id: 304 store_id: 1 } peers { id: 9052 store_id: 21 role: Learner }"] [snap_key="SnapKey { region_id: 270, term: 6, idx: 40618444 }"] [peer_id=9052] [mode=normal]
[2023/08/22 14:24:02.936 +08:00] [INFO] [snapshot.rs:331] ["get prehandled snapshot success"] [pending=1] [region_id=1359] [snap_key="SnapKey { region_id: 1359, term: 7, idx: 900448 }"] [peer_id=5074]
[2023/08/22 14:24:03.023 +08:00] [INFO] [snapshot.rs:366] ["apply snapshot finished"] [elapsed=55055] [pending=0] [region="id: 1359 start_key: 7480000000000000FF645F728000000000FF5CB80F0000000000FA end_key: 7480000000000000FF645F728000000000FF73C9240000000000FA region_epoch { conf_ver: 120 version: 66 } peers { id: 1360 store_id: 12 } peers { id: 5074 store_id: 21 role: Learner } peers { id: 5411 store_id: 18 } peers { id: 6089 store_id: 1 } peers { id: 6394 store_id: 32 role: Learner } peers { id: 9160 store_id: 22 role: Learner }"] [snap_key="SnapKey { region_id: 1359, term: 7, idx: 900448 }"] [peer_id=5074] [mode=normal]
[2023/08/22 14:36:06.434 +08:00] [INFO] [snapshot.rs:331] ["get prehandled snapshot success"] [pending=1] [region_id=7028] [snap_key="SnapKey { region_id: 7028, term: 7, idx: 261033 }"] [peer_id=14658]
[2023/08/22 14:36:06.444 +08:00] [INFO] [snapshot.rs:366] ["apply snapshot finished"] [elapsed=12000] [pending=0] [region="id: 7028 start_key: 7480000000000000FF645F728000000003FF11A9EB0000000000FA end_key: 7480000000000000FF645F728000000003FF1850DC0000000000FA region_epoch { conf_ver: 34 version: 258 } peers { id: 7030 store_id: 32 role: Learner } peers { id: 7032 store_id: 31 } peers { id: 7033 store_id: 1 } peers { id: 8101 store_id: 18 } peers { id: 9099 store_id: 22 role: Learner } peers { id: 14658 store_id: 21 role: Learner }"] [snap_key="SnapKey { region_id: 7028, term: 7, idx: 261033 }"] [peer_id=14658] [mode=normal]
[2023/08/22 14:39:29.594 +08:00] [INFO] [snapshot.rs:331] ["get prehandled snapshot success"] [pending=1] [region_id=6508] [snap_key="SnapKey { region_id: 6508, term: 8, idx: 28744 }"] [peer_id=14685]
[2023/08/22 14:39:29.598 +08:00] [INFO] [snapshot.rs:366] ["apply snapshot finished"] [elapsed=1250] [pending=0] [region="id: 6508 start_key: 7480000000000000FF645F728000000002FF387D0A0000000000FA end_key: 7480000000000000FF645F728000000002FF3935710000000000FA region_epoch { conf_ver: 116 version: 200 } peers { id: 6510 store_id: 32 role: Learner } peers { id: 10440 store_id: 1 } peers { id: 12254 store_id: 22 role: Learner } peers { id: 14514 store_id: 18 } peers { id: 14570 store_id: 31 } peers { id: 14685 store_id: 21 role: Learner }"] [snap_key="SnapKey { region_id: 6508, term: 8, idx: 28744 }"] [peer_id=14685] [mode=normal]
[2023/08/22 14:39:55.155 +08:00] [INFO] [snapshot.rs:331] ["get prehandled snapshot success"] [pending=1] [region_id=5387] [snap_key="SnapKey { region_id: 5387, term: 7, idx: 44504 }"] [peer_id=14689]
[2023/08/22 14:39:55.158 +08:00] [INFO] [snapshot.rs:366] ["apply snapshot finished"] [elapsed=1511] [pending=0] [region="id: 5387 start_key: 7480000000000000FF645F728000000000FFE8BEB60000000000FA end_key: 7480000000000000FF645F728000000000FFE9DD840000000000FA region_epoch { conf_ver: 122 version: 94 } peers { id: 5796 store_id: 18 } peers { id: 12739 store_id: 31 } peers { id: 13117 store_id: 1 } peers { id: 13226 store_id: 22 role: Learner } peers { id: 13809 store_id: 32 role: Learner } peers { id: 14689 store_id: 21 role: Learner }"] [snap_key="SnapKey { region_id: 5387, term: 7, idx: 44504 }"] [peer_id=14689] [mode=normal]
[2023/08/22 14:40:30.081 +08:00] [INFO] [snapshot.rs:331] ["get prehandled snapshot success"] [pending=1] [region_id=7624] [snap_key="SnapKey { region_id: 7624, term: 7, idx: 261680 }"] [peer_id=14691]
[2023/08/22 14:40:30.092 +08:00] [INFO] [snapshot.rs:366] ["apply snapshot finished"] [elapsed=11887] [pending=0] [region="id: 7624 start_key: 7480000000000000FF645F728000000003FFFFCC200000000000FA end_key: 7480000000000000FF645F728000000004FF067A940000000000FA region_epoch { conf_ver: 34 version: 329 } peers { id: 7625 store_id: 12 } peers { id: 7626 store_id: 32 role: Learner } peers { id: 7629 store_id: 1 } peers { id: 8211 store_id: 18 } peers { id: 9347 store_id: 22 role: Learner } peers { id: 14691 store_id: 21 role: Learner }"] [snap_key="SnapKey { region_id: 7624, term: 7, idx: 261680 }"] [peer_id=14691] [mode=normal]
[2023/08/22 14:42:03.285 +08:00] [INFO] [snapshot.rs:331] ["get prehandled snapshot success"] [pending=1] [region_id=8136] [snap_key="SnapKey { region_id: 8136, term: 6, idx: 283547 }"] [peer_id=14701]
[2023/08/22 14:42:03.298 +08:00] [INFO] [snapshot.rs:366] ["apply snapshot finished"] [elapsed=12546] [pending=0] [region="id: 8136 start_key: 7480000000000000FF645F728000000005FF04375F0000000000FA end_key: 7480000000000000FF645F728000000005FF0B7BEC0000000000FA region_epoch { conf_ver: 40 version: 393 } peers { id: 8137 store_id: 12 } peers { id: 8139 store_id: 22 role: Learner } peers { id: 8997 store_id: 18 } peers { id: 9699 store_id: 31 } peers { id: 10538 store_id: 32 role: Learner } peers { id: 14701 store_id: 21 role: Learner }"] [snap_key="SnapKey { region_id: 8136, term: 6, idx: 283547 }"] [peer_id=14701] [mode=normal]

Snapshot at 80344

[2023/08/22 14:10:27.280 +08:00] [INFO] [tablet_snap.rs:499] ["begin to receive tablet snapshot files"] [temp_exists=false] [region_id=13656] [file=/data0/proxy/snap_v2/rev_13656_13658_6_80344.tmp]

[2023/08/22 14:20:43.690 +08:00] [INFO] [peer_storage.rs:605] ["begin to apply snapshot"] [peer_id=13658] [region_id=13656]

[2023/08/22 14:21:09.186 +08:00] [INFO] [snapshot.rs:115] ["pre apply snapshot"] [snapshot="Some(Snapshot { key: SnapKey { region_id: 13656, term: 6, idx: 80344 }, display_path: \"/data0/proxy/snap/rev_13656_6_80344_(default|lock|write).sst\" })"] [pending=4] [has_snap=true] [snap_key="SnapKey { region_id: 13656, term: 6, idx: 80344 }"] [region_id=13656] [peer_id=13658]
[2023/08/22 14:22:59.367 +08:00] [INFO] [region.rs:455] ["begin apply snap data"] [peer_id=13658] [region_id=13656]
[2023/08/22 14:22:59.367 +08:00] [INFO] [snapshot.rs:257] ["post apply snapshot"] [pending=4] [region="id: 13656 start_key: 7480000000000000FF645F728000000000FF1718280000000000FA end_key: 7480000000000000FF645F728000000000FF22B2120000000000FA region_epoch { conf_ver: 169 version: 64 } peers { id: 13657 store_id: 22 role: Learner } peers { id: 13658 store_id: 21 role: Learner } peers { id: 13659 store_id: 31 } peers { id: 13660 store_id: 1 } peers { id: 13661 store_id: 12 }"] [region_id=13656] [snap_key="SnapKey { region_id: 13656, term: 6, idx: 80344 }"] [peer_id=13658]
[2023/08/22 14:22:59.367 +08:00] [INFO] [snapshot.rs:331] ["get prehandled snapshot success"] [pending=4] [region_id=13656] [snap_key="SnapKey { region_id: 13656, term: 6, idx: 80344 }"] [peer_id=13658]
[2023/08/22 14:22:59.431 +08:00] [INFO] [snapshot.rs:366] ["apply snapshot finished"] [elapsed=63] [pending=3] [region="id: 13656 start_key: 7480000000000000FF645F728000000000FF1718280000000000FA end_key: 7480000000000000FF645F728000000000FF22B2120000000000FA region_epoch { conf_ver: 169 version: 64 } peers { id: 13657 store_id: 22 role: Learner } peers { id: 13658 store_id: 21 role: Learner } peers { id: 13659 store_id: 31 } peers { id: 13660 store_id: 1 } peers { id: 13661 store_id: 12 }"] [snap_key="SnapKey { region_id: 13656, term: 6, idx: 80344 }"] [peer_id=13658] [mode=normal]
[2023/08/22 14:22:59.431 +08:00] [INFO] [region.rs:503] ["apply new data"] [time_takes=63.902344ms] [region_id=13656]
[2023/08/22 14:23:06.786 +08:00] [INFO] [peer.rs:1312] ["deleting applied snap file"] [snap_file=13656_6_80344] [peer_id=13658] [region_id=13656]

TiKV 3

[2023/08/22 14:00:12.654 +08:00] [INFO] [tablet_snap.rs:763] ["begin to send snapshot file"] [snap_key=13656_13658_6_80344]
[2023/08/22 14:00:12.655 +08:00] [ERROR] [tablet_snap.rs:984] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=13656] [to_addr=tc-tiflash-1.tc-tiflash-peer.endless-htap-consistency-chaos-tps-2100561-1-885.svc:20170]
[2023/08/22 14:00:12.655 +08:00] [INFO] [snapshot.rs:248] ["report snapshot status"] [status=Failure] [to="id: 13658 store_id: 21 role: Learner"] [peer_id=13660] [region_id=13656]
[2023/08/22 14:00:14.656 +08:00] [INFO] [snapshot.rs:425] ["requesting snapshot"] [state="Sending(data: 0A6B08D86A121B7480000000000000FF645F728000000000FF1718280000000000FA1A1B7480000000000000FF645F728000000000FF22B2120000000000FA220508A90110402A0708D96A101618012A0708DA6A101518012A0508DB6A101F2A0508DC6A10012A0508DD6A100C20032A0438D8F304 metadata { conf_state { voters: 13659 voters: 13660 voters: 13661 learners: 13657 learners: 13658 } index: 80344 term: 6 })"] [request_peer=13658] [request_index=0] [peer_id=13660] [region_id=13656]

... retrying

[2023/08/22 14:10:25.277 +08:00] [INFO] [tablet_snap.rs:763] ["begin to send snapshot file"] [snap_key=13656_13658_6_80344]
[2023/08/22 14:10:25.278 +08:00] [ERROR] [tablet_snap.rs:984] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=13656] [to_addr=tc-tiflash-1.tc-tiflash-peer.endless-htap-consistency-chaos-tps-2100561-1-885.svc:20170]
[2023/08/22 14:10:25.278 +08:00] [INFO] [snapshot.rs:248] ["report snapshot status"] [status=Failure] [to="id: 13658 store_id: 21 role: Learner"] [peer_id=13660] [region_id=13656]
[2023/08/22 14:10:27.278 +08:00] [INFO] [snapshot.rs:425] ["requesting snapshot"] [state="Sending(data: 0A6B08D86A121B7480000000000000FF645F728000000000FF1718280000000000FA1A1B7480000000000000FF645F728000000000FF22B2120000000000FA220508A90110402A0708D96A101618012A0708DA6A101518012A0508DB6A101F2A0508DC6A10012A0508DD6A100C20032A0438D8F304 metadata { conf_state { voters: 13659 voters: 13660 voters: 13661 learners: 13657 learners: 13658 } index: 80344 term: 6 })"] [request_peer=13658] [request_index=0] [peer_id=13660] [region_id=13656]
[2023/08/22 14:10:27.279 +08:00] [INFO] [tablet_snap.rs:763] ["begin to send snapshot file"] [snap_key=13656_13658_6_80344]
[2023/08/22 14:20:42.693 +08:00] [INFO] [tablet_snap.rs:786] ["sent all snap file finish"] [snap_key=13656_13658_6_80344]
[2023/08/22 14:20:43.691 +08:00] [INFO] [tablet_snap.rs:974] ["sent snapshot"] [duration=616.411793133s] [size=3886526225] [snap_key=13656_13658_6_80344] [region_id=13656]
[2023/08/22 14:20:43.691 +08:00] [INFO] [snapshot.rs:248] ["report snapshot status"] [status=Finish] [to="id: 13658 store_id: 21 role: Learner"] [peer_id=13660] [region_id=13656]
[2023/08/22 14:20:45.298 +08:00] [INFO] [snapshot.rs:470] ["requesting new snapshot"] [request_peer=13657] [request_index=0] [peer_id=13660] [region_id=13656]
[2023/08/22 14:20:45.299 +08:00] [INFO] [read.rs:241] ["snapshot generated"] [total_keys=1423840] [total_size=3886504546] [for_balance=false] [key="TabletSnapKey { region_id: 13656, to_peer: 13657, term: 6, idx: 80344 }"] [elapsed=0.001491761] [region_id=13656]

CompactLog on TiKV3

[2023/08/22 13:42:25.961 +08:00] [INFO] [compact_log.rs:588] ["compact log"] [truncated="applied_index: 65179 truncated_state { index: 60165 term: 6 }"] [apply_trace="ApplyTrace { data_cfs: [Progress { flushed: 60328, last_modified: 65178 }, Progress { flushed: 60329, last_modified: 65179 }, Progress { flushed: 60330, last_modified: 65179 }], admin: Progress { flushed: 60328, last_modified: 5 }, persisted_applied: 60328, last_flush_trigger: 0, try_persist: false }"] [index=60166] [peer_id=13660] [region_id=13656]
[2023/08/22 13:45:32.027 +08:00] [INFO] [compact_log.rs:588] ["compact log"] [truncated="applied_index: 70247 truncated_state { index: 65205 term: 6 }"] [apply_trace="ApplyTrace { data_cfs: [Progress { flushed: 60328, last_modified: 70246 }, Progress { flushed: 60329, last_modified: 70247 }, Progress { flushed: 60330, last_modified: 70247 }], admin: Progress { flushed: 60328, last_modified: 5 }, persisted_applied: 60328, last_flush_trigger: 0, try_persist: false }"] [index=60328] [peer_id=13660] [region_id=13656]

[2023/08/22 14:00:11.387 +08:00] [INFO] [compact_log.rs:588] ["compact log"] [truncated="applied_index: 80344 truncated_state { index: 75246 term: 6 }"] [apply_trace="ApplyTrace { data_cfs: [Progress { flushed: 80344, last_modified: 80344 }, Progress { flushed: 80344, last_modified: 80344 }, Progress { flushed: 80344, last_modified: 80344 }], admin: Progress { flushed: 80344, last_modified: 5 }, persisted_applied: 80344, last_flush_trigger: 0, try_persist: false }"] [index=75247] [peer_id=13660] [region_id=13656]
CalvinNeo commented 1 year ago

Here is a snapshot before 80344

TiFlash-1

[2023/08/22 13:59:21.001 +08:00] [INFO] [raft_log.rs:627] ["log [committed=0, persisted=0, applied=0, unstable.offset=1, unstable.entries.len()=0] starts to restore snapshot [index: 60328, term: 6]"] [snapshot_term=6] [snapshot_index=60328] [log="committed=0, persisted=0, applied=0, unstable.offset=1, unstable.entries.len()=0"] [raft_id=13658] [region_id=13656]
[2023/08/22 13:59:21.001 +08:00] [INFO] [raft.rs:2663] ["switched to configuration"] [config="Configuration { voters: Configuration { incoming: Configuration { voters: {13660, 13661, 13659} }, outgoing: Configuration { voters: {} } }, learners: {13657, 13658}, learners_next: {}, auto_leave: false }"] [raft_id=13658] [region_id=13656]
[2023/08/22 13:59:21.001 +08:00] [INFO] [raft.rs:2647] ["restored snapshot"] [snapshot_term=6] [snapshot_index=60328] [last_term=6] [last_index=60328] [commit=60328] [raft_id=13658] [region_id=13656]
[2023/08/22 13:59:21.001 +08:00] [INFO] [raft.rs:2528] ["[commit: 60328, term: 6] restored snapshot [index: 60328, term: 6]"] [snapshot_term=6] [snapshot_index=60328] [commit=60328] [term=6] [raft_id=13658] [region_id=13656]
[2023/08/22 13:59:21.001 +08:00] [INFO] [peer_storage.rs:605] ["begin to apply snapshot"] [peer_id=13658] [region_id=13656]
[2023/08/22 13:59:21.001 +08:00] [INFO] [peer_storage.rs:688] ["apply snapshot with state ok"] [for_witness=false] [state="applied_index: 60328 truncated_state { index: 60328 term: 6 }"] [region="id: 13656 start_key: 7480000000000000FF645F728000000000FF1718280000000000FA end_key: 7480000000000000FF645F728000000000FF22B2120000000000FA region_epoch { conf_ver: 169 version: 64 } peers { id: 13657 store_id: 22 role: Learner } peers { id: 13658 store_id: 21 role: Learner } peers { id: 13659 store_id: 31 } peers { id: 13660 store_id: 1 } peers { id: 13661 store_id: 12 }"] [peer_id=13658] [region_id=13656]
[2023/08/22 13:59:21.001 +08:00] [INFO] [peer.rs:4863] ["snapshot is persisted"] [region="id: 13656 start_key: 7480000000000000FF645F728000000000FF1718280000000000FA end_key: 7480000000000000FF645F728000000000FF22B2120000000000FA region_epoch { conf_ver: 169 version: 64 } peers { id: 13657 store_id: 22 role: Learner } peers { id: 13658 store_id: 21 role: Learner } peers { id: 13659 store_id: 31 } peers { id: 13660 store_id: 1 } peers { id: 13661 store_id: 12 }"] [peer_id=13658] [region_id=13656]
[2023/08/22 13:59:22.389 +08:00] [INFO] [snapshot.rs:115] ["pre apply snapshot"] [snapshot="Some(Snapshot { key: SnapKey { region_id: 13656, term: 6, idx: 60328 }, display_path: \"/data0/proxy/snap/rev_13656_6_60328_(default|lock|write).sst\" })"] [pending=1] [has_snap=true] [snap_key="SnapKey { region_id: 13656, term: 6, idx: 60328 }"] [region_id=13656] [peer_id=13658]
[2023/08/22 13:59:48.079 +08:00] [INFO] [region.rs:455] ["begin apply snap data"] [peer_id=13658] [region_id=13656]
[2023/08/22 13:59:48.079 +08:00] [INFO] [snapshot.rs:257] ["post apply snapshot"] [pending=2] [region="id: 13656 start_key: 7480000000000000FF645F728000000000FF1718280000000000FA end_key: 7480000000000000FF645F728000000000FF22B2120000000000FA region_epoch { conf_ver: 169 version: 64 } peers { id: 13657 store_id: 22 role: Learner } peers { id: 13658 store_id: 21 role: Learner } peers { id: 13659 store_id: 31 } peers { id: 13660 store_id: 1 } peers { id: 13661 store_id: 12 }"] [region_id=13656] [snap_key="SnapKey { region_id: 13656, term: 6, idx: 60328 }"] [peer_id=13658]
[2023/08/22 14:00:11.265 +08:00] [INFO] [snapshot.rs:331] ["get prehandled snapshot success"] [pending=2] [region_id=13656] [snap_key="SnapKey { region_id: 13656, term: 6, idx: 60328 }"] [peer_id=13658]
[2023/08/22 14:00:11.314 +08:00] [INFO] [snapshot.rs:366] ["apply snapshot finished"] [elapsed=23234] [pending=1] [region="id: 13656 start_key: 7480000000000000FF645F728000000000FF1718280000000000FA end_key: 7480000000000000FF645F728000000000FF22B2120000000000FA region_epoch { conf_ver: 169 version: 64 } peers { id: 13657 store_id: 22 role: Learner } peers { id: 13658 store_id: 21 role: Learner } peers { id: 13659 store_id: 31 } peers { id: 13660 store_id: 1 } peers { id: 13661 store_id: 12 }"] [snap_key="SnapKey { region_id: 13656, term: 6, idx: 60328 }"] [peer_id=13658] [mode=normal]

Long before the snapshot is sent

[2023/08/22 13:39:23.500 +08:00] [INFO] [compact_log.rs:588] ["compact log"] [truncated="applied_index: 60329 truncated_state { index: 55152 term: 6 }"] [apply_trace="ApplyTrace { data_cfs: [Progress { flushed: 60328, last_modified: 60329 }, Progress { flushed: 31604, last_modified: 60328 }, Progress { flushed: 31610, last_modified: 60329 }], admin: Progress { flushed: 31604, last_modified: 5 }, persisted_applied: 31604, last_flush_trigger: 0, try_persist: false }"] [index=31604] [peer_id=13660] [region_id=13656]
[2023/08/22 13:39:23.505 +08:00] [INFO] [snapshot.rs:425] ["requesting snapshot"] [state="Generating { canceled: false, index: 60328 }"] [request_peer=13658] [request_index=0] [peer_id=13660] [region_id=13656]
[2023/08/22 13:39:23.515 +08:00] [INFO] [read.rs:241] ["snapshot generated"] [total_keys=1410709] [total_size=3838656940] [for_balance=false] [key="TabletSnapKey { region_id: 13656, to_peer: 13658, term: 6, idx: 60328 }"] [elapsed=0.128408028] [region_id=13656]
[2023/08/22 13:39:23.515 +08:00] [INFO] [apply_trace.rs:543] ["persisting admin flushed"] [flushed=60328] [tablet_index=5] [peer_id=13660] [region_id=13656]
[2023/08/22 13:39:23.515 +08:00] [INFO] [compact_log.rs:588] ["compact log"] [truncated="applied_index: 60331 truncated_state { index: 55152 term: 6 }"] [apply_trace="ApplyTrace { data_cfs: [Progress { flushed: 60328, last_modified: 60330 }, Progress { flushed: 60329, last_modified: 60331 }, Progress { flushed: 60330, last_modified: 60331 }], admin: Progress { flushed: 60328, last_modified: 5 }, persisted_applied: 60328, last_flush_trigger: 0, try_persist: false }"] [index=55153] [peer_id=13660] [region_id=13656]
[2023/08/22 13:39:23.515 +08:00] [INFO] [snapshot.rs:425] ["requesting snapshot"] [state="Generated(data: 0A6B08D86A121B7480000000000000FF645F728000000000FF1718280000000000FA1A1B7480000000000000FF645F728000000000FF22B2120000000000FA220508A90110402A0708D96A101618012A0708DA6A101518012A0508DB6A101F2A0508DC6A10012A0508DD6A100C20032A0438ABD703 metadata { conf_state { voters: 13659 voters: 13660 voters: 13661 learners: 13657 learners: 13658 } index: 60328 term: 6 })"] [request_peer=13658] [request_index=0] [peer_id=13660] [region_id=13656]

...

[2023/08/22 13:42:01.548 +08:00] [WARN] [tablet_snap.rs:937] ["Too many sending snapshot tasks, drop Send Snap[to: tc-tiflash-1.tc-tiflash-peer.endless-htap-consistency-chaos-tps-2100561-1-885.svc:20170, snap: region_id: 13656 from_peer { id: 13660 store_id: 1 } to_peer { id: 13658 store_id: 21 role: Learner } message { msg_type: MsgSnapshot to: 13658 from: 13660 term: 6 snapshot { data: 0A6B08D86A121B7480000000000000FF645F728000000000FF1718280000000000FA1A1B7480000000000000FF645F728000000000FF22B2120000000000FA220508A90110402A0708D96A101618012A0708DA6A101518012A0508DB6A101F2A0508DC6A10012A0508DD6A100C20032A0438ABD703 metadata { conf_state { voters: 13659 voters: 13660 voters: 13661 learners: 13657 learners: 13658 } index: 60328 term: 6 } } } region_epoch { conf_ver: 169 version: 64 }]"]
[2023/08/22 13:42:01.548 +08:00] [INFO] [snapshot.rs:248] ["report snapshot status"] [status=Failure] [to="id: 13658 store_id: 21 role: Learner"] [peer_id=13660] [region_id=13656]
[2023/08/22 13:42:03.590 +08:00] [INFO] [snapshot.rs:425] ["requesting snapshot"] [state="Sending(data: 0A6B08D86A121B7480000000000000FF645F728000000000FF1718280000000000FA1A1B7480000000000000FF645F728000000000FF22B2120000000000FA220508A90110402A0708D96A101618012A0708DA6A101518012A0508DB6A101F2A0508DC6A10012A0508DD6A100C20032A0438ABD703 metadata { conf_state { voters: 13659 voters: 13660 voters: 13661 learners: 13657 learners: 13658 } index: 60328 term: 6 })"] [request_peer=13658] [request_index=0] [peer_id=13660] [region_id=13656]
[2023/08/22 13:42:03.590 +08:00] [WARN] [tablet_snap.rs:937] ["Too many sending snapshot tasks, drop Send Snap[to: tc-tiflash-1.tc-tiflash-peer.endless-htap-consistency-chaos-tps-2100561-1-885.svc:20170, snap: region_id: 13656 from_peer { id: 13660 store_id: 1 } to_peer { id: 13658 store_id: 21 role: Learner } message { msg_type: MsgSnapshot to: 13658 from: 13660 term: 6 snapshot { data: 0A6B08D86A121B7480000000000000FF645F728000000000FF1718280000000000FA1A1B7480000000000000FF645F728000000000FF22B2120000000000FA220508A90110402A0708D96A101618012A0708DA6A101518012A0508DB6A101F2A0508DC6A10012A0508DD6A100C20032A0438ABD703 metadata { conf_state { voters: 13659 voters: 13660 voters: 13661 learners: 13657 learners: 13658 } index: 60328 term: 6 } } } region_epoch { conf_ver: 169 version: 64 }]"]
[2023/08/22 13:42:03.590 +08:00] [INFO] [snapshot.rs:248] ["report snapshot status"] [status=Failure] [to="id: 13658 store_id: 21 role: Learner"] [peer_id=13660] [region_id=13656]
[2023/08/22 13:42:05.654 +08:00] [INFO] [snapshot.rs:425] ["requesting snapshot"] [state="Sending(data: 0A6B08D86A121B7480000000000000FF645F728000000000FF1718280000000000FA1A1B7480000000000000FF645F728000000000FF22B2120000000000FA220508A90110402A0708D96A101618012A0708DA6A101518012A0508DB6A101F2A0508DC6A10012A0508DD6A100C20032A0438ABD703 metadata { conf_state { voters: 13659 voters: 13660 voters: 13661 learners: 13657 learners: 13658 } index: 60328 term: 6 })"] [request_peer=13658] [request_index=0] [peer_id=13660] [region_id=13656]
[2023/08/22 13:42:05.654 +08:00] [INFO] [tablet_snap.rs:763] ["begin to send snapshot file"] [snap_key=13656_13658_6_60328]
[2023/08/22 13:42:25.961 +08:00] [INFO] [compact_log.rs:588] ["compact log"] [truncated="applied_index: 65179 truncated_state { index: 60165 term: 6 }"] [apply_trace="ApplyTrace { data_cfs: [Progress { flushed: 60328, last_modified: 65178 }, Progress { flushed: 60329, last_modified: 65179 }, Progress { flushed: 60330, last_modified: 65179 }], admin: Progress { flushed: 60328, last_modified: 5 }, persisted_applied: 60328, last_flush_trigger: 0, try_persist: false }"] [index=60166] [peer_id=13660] [region_id=13656]
[2023/08/22 13:45:32.027 +08:00] [INFO] [compact_log.rs:588] ["compact log"] [truncated="applied_index: 70247 truncated_state { index: 65205 term: 6 }"] [apply_trace="ApplyTrace { data_cfs: [Progress { flushed: 60328, last_modified: 70246 }, Progress { flushed: 60329, last_modified: 70247 }, Progress { flushed: 60330, last_modified: 70247 }], admin: Progress { flushed: 60328, last_modified: 5 }, persisted_applied: 60328, last_flush_trigger: 0, try_persist: false }"] [index=60328] [peer_id=13660] [region_id=13656]
[2023/08/22 13:50:03.797 +08:00] [INFO] [size.rs:179] ["Run size checker"] [policy=Approximate] [threshold=16106127360] [size=4962303810] [region_id=13656]
[2023/08/22 13:59:17.147 +08:00] [INFO] [tablet_snap.rs:786] ["sent all snap file finish"] [snap_key=13656_13658_6_60328]
[2023/08/22 13:59:21.001 +08:00] [INFO] [tablet_snap.rs:974] ["sent snapshot"] [duration=1035.346658988s] [size=3832627981] [snap_key=13656_13658_6_60328] [region_id=13656]
[2023/08/22 13:59:21.001 +08:00] [INFO] [snapshot.rs:248] ["report snapshot status"] [status=Finish] [to="id: 13658 store_id: 21 role: Learner"] [peer_id=13660] [region_id=13656]
[2023/08/22 14:00:11.314 +08:00] [INFO] [snapshot.rs:470] ["requesting new snapshot"] [request_peer=13658] [request_index=0] [peer_id=13660] [region_id=13656]
CalvinNeo commented 1 year ago

I think this is more of a performance issue involving TiKV, this is not a critical issue that will block TiFlash GA

JaySon-Huang commented 4 months ago

Close as the "Dynamic Region" aka "Partitioned Raft KV" has been stopped for a long time