Closed MyonKeminta closed 3 years ago
cc @coocood @sticnarf @cfzjywxk
Solution 2 looks safer.
When TiKV receives an async-commit or 1PC prewrite request, it calculates the max_commit_ts key by key. So within a single prewrite request, some keys may pass the commit_ts constraint check while others may not.
After a CommitTsTooLarge error is encountered, subsequent mutations will be prewritten using the normal 2PC way. Prior async-commit mutations are not amended for the sake of easy implementation. 1PC mutations will be normal locks in this case. The returned prewrite response should set min_commit_ts to 0 to indicate a fallback.
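The prewrite behaviour described above can be sketched roughly as follows. This is a toy model rather than TiKV code: the `Lock` and `PrewriteResponse` types, the `prewrite_async_commit` function, and the per-key candidate timestamps passed in are all hypothetical names introduced for illustration. It only shows that locks written before the error keep their async-commit flag, later ones become normal 2PC locks, and the response carries `min_commit_ts = 0`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Lock:
    key: str
    use_async_commit: bool
    min_commit_ts: int


@dataclass
class PrewriteResponse:
    min_commit_ts: int  # 0 signals a fallback to normal 2PC
    locks: List[Lock]


def prewrite_async_commit(
    mutations: List[Tuple[str, int]], max_commit_ts: int
) -> PrewriteResponse:
    """Toy model of an async-commit prewrite with fallback.

    `mutations` holds (key, candidate min_commit_ts) pairs; the candidate
    timestamp is an assumed input standing in for the per-key calculation.
    """
    locks: List[Lock] = []
    fallen_back = False
    response_ts = 0
    for key, candidate_ts in mutations:
        if not fallen_back and candidate_ts > max_commit_ts:
            # Corresponds to hitting CommitTsTooLarge on this key.
            fallen_back = True
        if fallen_back:
            # This and all subsequent mutations become normal 2PC locks.
            locks.append(Lock(key, use_async_commit=False, min_commit_ts=0))
        else:
            # Earlier async-commit locks are written and not amended later.
            locks.append(Lock(key, use_async_commit=True, min_commit_ts=candidate_ts))
            response_ts = max(response_ts, candidate_ts)
    return PrewriteResponse(0 if fallen_back else response_ts, locks)
```

For example, with `max_commit_ts=20` and candidates 10, 30, 12 for keys a, b, c, the sketch writes all three locks, marks only the first as async-commit, and returns 0 as `min_commit_ts`.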
So, after CommitTsTooLarge happens, all mutations should be successfully written as locks. But we don't guarantee that all of them satisfy use_async_commit = false. As long as one of the locks has use_async_commit = false, we know that this transaction falls back from async commit.
In order to resolve locks of a fallback transaction, we need a mechanism to roll back the primary lock. By default, CheckTxnStatus does not roll back an async-commit lock. We can add a new flag async_commit_fallback to indicate that the transaction has fallen back to normal 2PC, so it is safe to roll back the lock.
When TiDB prewrites using the async-commit or 1PC way and it receives 0 as min_commit_ts, it knows that a fallback has happened. It then uses the normal 2PC way to commit this transaction: commit the primary lock first, return success to the user, and commit the secondary locks asynchronously.
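On the TiDB side, the decision could look something like the sketch below. The `CommitMode` enum and the `decide_commit_mode` and `commit_steps` helpers are hypothetical and only encode the rule stated above: a `min_commit_ts` of 0 in a prewrite response means the transaction must be committed as normal 2PC. The step names for the async-commit branch are an assumption, since this thread does not spell them out.

```python
from enum import Enum
from typing import List


class CommitMode(Enum):
    ASYNC_COMMIT = "async_commit"
    TWO_PHASE = "two_phase"


def decide_commit_mode(prewrite_min_commit_ts: List[int]) -> CommitMode:
    """Pick the commit mode from the min_commit_ts of each prewrite response.

    A value of 0 is the fallback signal described in this thread.
    """
    if any(ts == 0 for ts in prewrite_min_commit_ts):
        return CommitMode.TWO_PHASE
    return CommitMode.ASYNC_COMMIT


def commit_steps(mode: CommitMode) -> List[str]:
    """Ordered steps for the chosen mode (names are illustrative only)."""
    if mode is CommitMode.TWO_PHASE:
        return [
            "commit_primary",
            "return_success_to_user",
            "commit_secondaries_async",
        ]
    # Assumed shape of the async-commit path, not taken from this thread.
    return ["return_success_to_user", "commit_all_keys_async"]
```

A single response with `min_commit_ts == 0` is enough to switch the whole transaction to the 2PC path in this sketch.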
Firstly, TiDB queries the primary lock for the transaction status using CheckTxnStatus as usual. If the primary lock has a use_async_commit flag, it checks all secondary locks. This operation will return the information of each lock or write rollback records if the lock does not exist.
If any rollback is written, the transaction is bound to fail. It has nothing to do with fallback.
If any returned lock's use_async_commit is false, it means it's a fallback transaction. We cannot resolve locks using the async-commit way. Then, we can set the new flag async_commit_fallback and do CheckTxnStatus again. This operation will roll back the primary lock if the primary lock still exists. And the following procedures are the same as resolving a normal 2PC transaction lock.
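The resolution procedure above can be summarised as a small decision function. This is a sketch under assumptions: `SecondaryLock`, `SecondaryCheckResult`, `ResolveAction`, and `next_resolve_action` are made-up names that model only the branching described in this comment, not the actual request or response types.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ResolveAction(Enum):
    RESOLVE_AS_2PC = "resolve_as_2pc"
    ROLLBACK = "rollback"
    RECHECK_WITH_FALLBACK_FLAG = "recheck_with_async_commit_fallback"
    RESOLVE_AS_ASYNC_COMMIT = "resolve_as_async_commit"


@dataclass
class SecondaryLock:
    key: str
    use_async_commit: bool


@dataclass
class SecondaryCheckResult:
    # Hypothetical summary of checking all secondary locks.
    locks: List[SecondaryLock] = field(default_factory=list)
    rollback_written: bool = False


def next_resolve_action(
    primary_use_async_commit: bool, secondaries: SecondaryCheckResult
) -> ResolveAction:
    """Decide the next step after CheckTxnStatus on the primary lock."""
    if not primary_use_async_commit:
        # Ordinary 2PC lock: nothing special to do.
        return ResolveAction.RESOLVE_AS_2PC
    if secondaries.rollback_written:
        # A missing secondary lock was rolled back: the transaction fails,
        # regardless of whether it fell back.
        return ResolveAction.ROLLBACK
    if any(not lock.use_async_commit for lock in secondaries.locks):
        # Fallback transaction: call CheckTxnStatus again with the
        # async_commit_fallback flag, then continue as normal 2PC.
        return ResolveAction.RECHECK_WITH_FALLBACK_FLAG
    return ResolveAction.RESOLVE_AS_ASYNC_COMMIT
```

Checking for a written rollback before looking at the `use_async_commit` flags mirrors the order in the text: a rollback decides the outcome on its own.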
cc @nrc @youjiali1995 @MyonKeminta
If any min_commit_ts is 0, it means it's a fallback transaction.
I think it's better to use the use_async_commit field, which already seems to be included in the returned message of check_secondary_locks. I'm kind of afraid that min_commit_ts in secondary locks may not always be the flag of whether async commit is used in the future.
Fixed.
@sticnarf can we close this issue now?
Ah, yes. We can close it.
As a solution to the schema version check issue (https://github.com/tikv/sig-transaction/issues/51), we added a max_commit_ts limit to async commit's prewrite requests. When the calculated min_commit_ts exceeds the max_commit_ts, the CommitTsTooLarge error will be thrown. We need to find a proper way to handle the CommitTsTooLarge error. Otherwise, when the load is high, the failure rate of async commit might be significant.
Solution 1:
When TiDB receives CommitTsTooLarge error, check the schema version again.
Solution 2:
In solution 1, if the load is high enough, the transaction is still likely to fail after the retry. Another choice is to fall back to a non-async-commit transaction when the CommitTsTooLarge error occurs. This might be more complicated to implement than solution 1. If we always rewrite the primary lock to a non-async-commit lock first when falling back, the implementation might be easier. We should confirm the correctness first before adopting this way.