Closed zyguan closed 2 years ago
https://github.com/tikv/tikv/issues/11148 discussed optimizing optimistic deadlocks, but the final solution, https://github.com/tikv/client-go/pull/367, introduced this bug. Before the optimization, the lock had to be resolved first, so a false write conflict would not be reported.
The idea behind that optimization is that a transaction must end in a write conflict if it encounters a lock whose start TS is larger than its own. But we overlooked the case where the transaction has already been committed. This can happen when the response to the first prewrite is lost and the retried prewrite then encounters the newer lock. In this case, the user may receive a write conflict error even though the transaction is actually committed.
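To make the overlooked case concrete, here is a minimal sketch of the decision in Go. It is a toy model, not the client-go code: `Outcome`, `buggyDecision`, `guardedDecision`, and the `isRetry` flag are all hypothetical names, and the guard shown is only one possible way to avoid the false error (falling back to the pre-optimization behavior on a retry), not necessarily the fix that was merged.

```go
package main

import "fmt"

// Outcome is a hypothetical type for what the client does after a
// prewrite runs into another transaction's lock.
type Outcome int

const (
	ResolveLockFirst Outcome = iota // pre-optimization behavior
	WriteConflict                   // fail fast with a write conflict
)

func (o Outcome) String() string {
	if o == WriteConflict {
		return "WriteConflict"
	}
	return "ResolveLockFirst"
}

// buggyDecision models the shortcut described above: a lock with a larger
// start TS is treated as a certain write conflict, regardless of whether
// this prewrite is a retry of one whose response was lost.
func buggyDecision(txnStartTS, lockStartTS uint64) Outcome {
	if lockStartTS > txnStartTS {
		return WriteConflict
	}
	return ResolveLockFirst
}

// guardedDecision is an assumed guard, for illustration only: when the
// prewrite is a retry, the earlier attempt may already have succeeded, so
// the shortcut is skipped and the lock is resolved first.
func guardedDecision(txnStartTS, lockStartTS uint64, isRetry bool) Outcome {
	if lockStartTS > txnStartTS && !isRetry {
		return WriteConflict
	}
	return ResolveLockFirst
}

func main() {
	// Retried prewrite (start TS 10) meets a newer lock (start TS 20).
	fmt.Println("shortcut:", buggyDecision(10, 20))
	fmt.Println("guarded: ", guardedDecision(10, 20, true))
}
```

In this model the shortcut reports a write conflict for the retried prewrite, which is the false error seen in the test, while the guarded version defers to lock resolution.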
TiDB 5.0.6, 5.4.0 and 6.0.0 are affected. The unreleased 5.2 release should be blocked.
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
Not a minimal one; the issue occurred when we ran the Jepsen append test.
2. What did you expect to see? (Required)
The test passed normally.
3. What did you see instead (Required)
An anomaly was detected. It seems some transactions were actually committed, but TiDB returned a write conflict (and notified users that they could try again later) rather than OK or undetermined.
The full log can be accessed here
4. What is your TiDB version? (Required)
master