Pending async commit transactions don't necessarily block read

sticnarf commented 3 years ago

Now the min_commit_ts of async commit transactions cannot be advanced like the legacy transactions. So if our read timestamp is larger than the min_commit_ts of an async-commit lock, we can only wait until the lock expires and resolve the lock.

Actually, we can use the CheckSecondaryLocks API to achieve something like advancing min_commit_ts.

We can add caller_start_ts (like the one in the CheckTxnStatus API) to the request. TiKV uses this timestamp to update its max_ts.

message CheckSecondaryLocksRequest {
    ...

    // The start timestamp of the transaction which this request is part of.
    uint64 caller_start_ts = 4;
}

After the change, we don't need to wait until the TTL expires before checking secondary locks. If TTL is not expired, we set caller_start_ts. Otherwise, we keep caller_start_ts zero.

If caller_start_ts is zero, CheckSecondaryLocks writes rollbacks if the lock does not exist, which is the current behavior. If caller_start_ts exists, CheckSecondaryLocks needn't write anything if the lock does not exist.

After calling CheckSecondaryLocks with caller_start_ts, we can skip the locks of this transaction, just like we've advanced the min_commit_ts of the transaction: If a following prewrite hits the same TiKV, its min_commit_ts must be greater than the caller_start_ts due to the updated max_ts; if it sends to a different TiKV, the leader must have changed, so the max_ts should have been updated to a more recent timestamp from PD and thus the commit_ts should be also greater than the caller_start_ts.

However, we still need to weigh the benefit and the cost. It is heavy to check all secondaries, so maybe we should not do this so eagerly (maybe after several backoffs?). And the benefit is not so big. Async commit transactions are typically small. It is unusual that prewriting them takes a long time. So personally I don't think it's a task of high priority.

sticnarf commented 3 years ago

cc @nrc @youjiali1995 @MyonKeminta @cfzjywxk

cfzjywxk commented 3 years ago

We could do some benchmarks for the read-write conflict scenarios with different data distributions to show the impact. Currently I agree it's not urgent and we need to make the basic async-commit stable before introducing new improvements.

MyonKeminta commented 3 years ago

Did you mean that the transaction's lock can be skipped if we did CheckSecondaryLocks on one if its keys that's not yet locked? Sounds correct, but I'm afraid that if you do the CheckTxnStatus - CheckSecondaryLocks procedure even when the lock's TTL is not expired, it may cause too many these requests when there are many transaction conflicts and maybe results in lower performance.

sticnarf commented 3 years ago

Did you mean that the transaction's lock can be skipped if we did CheckSecondaryLocks on one if its keys that's not yet locked? Sounds correct, but I'm afraid that if you do the CheckTxnStatus - CheckSecondaryLocks procedure even when the lock's TTL is not expired, it may cause too many these requests when there are many transaction conflicts and maybe results in lower performance.

Yes. And it's why I suggest this only happens after several times of backoffs.

tikv / sig-transaction

Pending async commit transactions don't necessarily block read #101