tikv / client-rust

Rust Client for TiKV.
Apache License 2.0
390 stars 131 forks source link

Get timestamp has no response #431

Closed pingyu closed 1 year ago

pingyu commented 1 year ago

During some long run stability tests, I found there are low chances that Transaction::scan request will hang. After add some tracing logs (see https://github.com/pingyu/client-rust/tree/trace-scan), it show that the hang position is at the get_timestamp of pd client.

The tracing logs of a normal request (some irrelevant spans are ignored):

2023-11-02T22:40:53.942345Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks: tikv_client::transaction::lock: 47: new
2023-11-02T22:40:53.942350Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks: tikv_client::transaction::lock: 47: enter
2023-11-02T22:40:53.942354Z DEBUG Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks: tikv_client::transaction::lock: 52: resolving locks
2023-11-02T22:40:53.942363Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks:TimestampOracle::get_timestamp: tikv_client::pd::timestamp: 67: new
2023-11-02T22:40:53.942367Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks:TimestampOracle::get_timestamp: tikv_client::pd::timestamp: 67: enter
2023-11-02T22:40:53.942373Z DEBUG Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks:TimestampOracle::get_timestamp: tikv_client::pd::timestamp: 69: getting current timestamp
2023-11-02T22:40:53.942383Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks:TimestampOracle::get_timestamp: tikv_client::pd::timestamp: 67: exit
2023-11-02T22:40:53.942388Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks: tikv_client::transaction::lock: 47: exit
2023-11-02T22:40:53.942392Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}: tikv_client::request::plan: 163: exit
2023-11-02T22:40:53.942395Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}: tikv_client::request::plan: 124: exit
2023/11/02 22:40:53.948 task.rs:2023-11-02T22:40:53.948116Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}: tikv_client::request::plan: 124: enter
2023-11-02T22:40:53.948143Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}: tikv_client::request::plan: 163: enter
2023-11-02T22:40:53.948163Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks: tikv_client::transaction::lock: 47: enter
2023-11-02T22:40:53.948172Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks:TimestampOracle::get_timestamp: tikv_client::pd::timestamp: 67: enter
2023-11-02T22:40:53.948180Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks:TimestampOracle::get_timestamp: tikv_client::pd::timestamp: 67: exit
2023-11-02T22:40:53.948184Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks:TimestampOracle::get_timestamp: tikv_client::pd::timestamp: 67: enter
2023-11-02T22:40:53.948188Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks:TimestampOracle::get_timestamp: tikv_client::pd::timestamp: 67: exit
2023-11-02T22:40:53.948193Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks:TimestampOracle::get_timestamp: tikv_client::pd::timestamp: 67: close time.busy=28.3µs time.idle=5.80ms
2023-11-02T22:40:53.948211Z DEBUG Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks: tikv_client::transaction::lock: 61: resolving locks: expired_locks 0, live_locks 1
2023-11-02T22:40:53.948217Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks: tikv_client::transaction::lock: 47: exit
2023-11-02T22:40:53.948222Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks: tikv_client::transaction::lock: 47: enter
2023-11-02T22:40:53.948226Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks: tikv_client::transaction::lock: 47: exit
2023-11-02T22:40:53.948230Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=6}:resolve_locks: tikv_client::transaction::lock: 47: close time.busy=97.3µs time.idle=5.79ms
2023-11-02T22:40:53.948239Z DEBUG Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}: tikv_client::request::plan: 619: ResolveLock::execute lock error retry (delay 64ms)

The abnormal request:

2023-11-02T22:40:54.032175Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=7}:resolve_locks: tikv_client::transaction::lock: 47: new
2023-11-02T22:40:54.032180Z INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=7}:resolve_locks: tikv_client::transaction::lock: 47: enter
2023-11-02T22:40:54.032185Z DEBUG Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=7}:resolve_locks: tikv_client::transaction::lock: 52: resolving locks
2023-11-02T22:40:54.032194Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:ResolveLock::execute::retry{retry_cnt=7}:resolve_locks: tikv_client::transaction::lock: 47: exit
2023-11-02T22:40:54.032199Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}:single_shard_handler{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}: tikv_client::request::plan: 163: exit
2023-11-02T22:40:54.032202Z  INFO Snapshot::scan:Transaction::scan:scan_inner{limit=3028 key_only=false reverse=false version=445373440315621376 range=BoundRange { from: Included(Key(78000001)), to: Excluded(Key(78000002)) }}:MergeResponse::execute:RetryableMultiRegion::execute:single_plan_handler:shard{region_store=RegionStore { region_id: 1555, region_version: 17, leader_store_id: 1003 }}: tikv_client::request::plan: 124: exit

(no log for region 1555 any more)

We can see that in the abnormal process, it did not enter the TimestampOracle::get_timestamp.