zodb / relstorage

A backend for ZODB that stores pickles in a relational database.

Change locking order to eliminate deadlocks #469

Closed jamadden closed 3 years ago

jamadden commented 3 years ago

The locking algorithm in RelStorage goes in two steps.

First, we attempt to take shared locks on everything we did readCurrent on. If we get all the locks and there are no mismatches (that is, no object has changed), we proceed to step two. If there are mismatches, we return early and vote.py eventually raises VoteReadConflictError with details about the first such mismatch found. At some point, hopefully soon after this, the transaction is aborted and the shared locks get released. (MySQL already releases the locks before returning from the database, but PostgreSQL has to be modified to do so.)

These locks are taken in NOWAIT mode. If some other transaction has an exclusive lock on a row --- meaning it's in the middle of changing it --- this transaction gets aborted by the database and we raise UnableToLockRowsToReadCurrentError (usually, but not always, because the classification is heuristic-based).

The second step is to take exclusive locks on the objects being modified. Each individual row may wait up to commit-lock-timeout to get locked.

This step either succeeds, or times out and raises UnableToLockRowsToModifyError, or deadlocks with other transactions. PostgreSQL only periodically checks for deadlocks because "The check for deadlock is relatively expensive, so the server doesn't run it every time it waits for a lock. The default is one second (1s), which is probably about the smallest value you would want in practice. On a heavily loaded server you might want to raise it."
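To make the two steps concrete, here is a minimal, runnable sketch of the algorithm as described, against a toy in-memory lock table. The names (`LockTable`, `vote_old`) and the dictionaries standing in for row locks are illustrative only, not RelStorage's actual adapter API:

```python
class VoteReadConflictError(Exception):
    """An object we did readCurrent on was changed by a committed transaction."""

class UnableToLockRowsToReadCurrentError(Exception):
    """A shared NOWAIT lock could not be taken."""

class LockTable:
    """Toy stand-in for database row locks; not RelStorage's real adapter layer."""
    def __init__(self):
        self.exclusive = {}    # oid -> txn currently holding an exclusive lock
        self.current_tid = {}  # oid -> tid of the last committed change

def vote_old(locks, txn, read_current, modified):
    # Step 1: shared NOWAIT locks on everything we did readCurrent on.
    for oid, seen_tid in read_current.items():
        holder = locks.exclusive.get(oid)
        if holder is not None and holder != txn:
            # NOWAIT semantics: someone is mid-change, so fail immediately.
            raise UnableToLockRowsToReadCurrentError(oid)
        if locks.current_tid.get(oid) != seen_tid:
            # Mismatch: the object changed since we read it.
            raise VoteReadConflictError(oid)
    # Step 2: exclusive locks on the objects being modified. A real
    # database would block here for up to commit-lock-timeout per row.
    for oid in modified:
        holder = locks.exclusive.get(oid)
        if holder is not None and holder != txn:
            raise TimeoutError("UnableToLockRowsToModifyError: %s" % oid)
        locks.exclusive[oid] = txn
```

In this toy model the step-1 checks cannot block, which mirrors the NOWAIT behaviour; the time-based heuristic error classification is not modeled.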

So there are three failure times here:

1) Immediately, if a shared lock is unavailable.

2) After at least commit-lock-timeout, if an exclusive lock is unavailable. This usually results in UnableToLockRowsToModifyError.

3) After ~1 second, if a deadlock occurs. This usually results in UnableToLockRowsToReadCurrentError (which is usually wrong).

Timeout (1), the immediate one, is good: the transaction is doomed and there is nothing the application can do to fix it other than trying again from the beginning. We haven't waited to get exclusive locks, or worse, actually run _p_resolveConflict methods, only to discover that we're doomed.

Timeout (2) is not great for a few reasons. It's hard to bound, it's unfixable without retrying, and it isn't supposed to happen in a well-running system.

Timeout (3) is also bad. First, this short duration is what sometimes leads to the wrong, weird error message (the heuristic is time-based). Second, all deadlocks involve some combination of shared locks and exclusive locks, and at least two transactions. Any given transaction may or may not have conflicts with one of the other transactions, and even if it does, it may or may not be able to resolve them (which should be cheaper than retrying the entire request). But because of this error, at least one of them never gets the chance. (It's easy to show that for only two conflicting transactions, one was doomed and one could have succeeded, but once more transactions are involved, it's hard to be sure who is doomed and who isn't.)
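The two-transaction deadlock shape can be sketched as a tiny waits-for graph check. This is a hypothetical illustration of the concept (real databases track this internally), not RelStorage or PostgreSQL code:

```python
def wait_for_edges(txns):
    """txns maps a transaction name to (shared_oids_held, exclusive_oids_wanted).
    An exclusive request must wait for every other transaction holding that
    oid in shared mode, producing a waits-for edge."""
    edges = set()
    for waiter, (_, wanted) in txns.items():
        for oid in wanted:
            for holder, (held, _) in txns.items():
                if holder != waiter and oid in held:
                    edges.add((waiter, holder))
    return edges

def has_cycle(edges):
    # Depth-first search for a cycle in the waits-for graph.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    def visit(node, on_path):
        if node in on_path:
            return True
        on_path = on_path | {node}
        return any(visit(nxt, on_path) for nxt in graph.get(node, ()))
    return any(visit(node, frozenset()) for node in graph)

# T1 did readCurrent(A) (holds A shared) and wants to modify B;
# T2 did readCurrent(B) and wants to modify A. Neither exclusive
# request can ever proceed: a cycle, i.e. a deadlock.
deadlocked = {"T1": ({"A"}, {"B"}), "T2": ({"B"}, {"A"})}
```

With this shape, the database's deadlock detector must kill one of the two transactions, and it has no idea which one might have been able to resolve its conflicts.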

Ok, so what can we do about any of this?

I propose adding an additional step to the locking algorithm and re-ordering the existing two. The new algorithm would be:

1) Just look for, but do not take, shared locks on the readCurrent objects. If there are mismatches, return immediately.

2) Take the exclusive locks.

3) Repeat step 1, but this time actually take the shared NOWAIT locks.

By taking the NOWAIT locks after the exclusive locks, we should essentially eliminate the deadlock scenario, AFAICS. And by running the additional query up front in step 1, we mitigate how much extra waiting we have to do on doomed transactions.
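As a sketch of what the proposed ordering could look like, again against a toy in-memory lock table rather than any real database (`LockTable` and `vote_new` are illustrative names, not RelStorage's API):

```python
class VoteReadConflictError(Exception):
    """An object we did readCurrent on was changed by a committed transaction."""

class UnableToLockRowsToReadCurrentError(Exception):
    """A shared NOWAIT lock could not be taken."""

class LockTable:
    """Toy stand-in for database row locks; not RelStorage's real adapter layer."""
    def __init__(self):
        self.exclusive = {}    # oid -> txn currently holding an exclusive lock
        self.current_tid = {}  # oid -> tid of the last committed change

def vote_new(locks, txn, read_current, modified):
    # Step 1: look only; take no locks. A change that already committed
    # dooms us now, before we spend time waiting on exclusive locks.
    for oid, seen_tid in read_current.items():
        if locks.current_tid.get(oid) != seen_tid:
            raise VoteReadConflictError(oid)
    # Step 2: exclusive locks first. Every transaction now acquires its
    # strongest locks before its shared ones, so the mixed
    # shared/exclusive wait cycle can no longer form.
    for oid in modified:
        holder = locks.exclusive.get(oid)
        if holder is not None and holder != txn:
            raise TimeoutError("UnableToLockRowsToModifyError: %s" % oid)
        locks.exclusive[oid] = txn
    # Step 3: the authoritative check: shared NOWAIT locks plus the
    # readCurrent mismatch test.
    for oid, seen_tid in read_current.items():
        holder = locks.exclusive.get(oid)
        if holder is not None and holder != txn:
            raise UnableToLockRowsToReadCurrentError(oid)
        if locks.current_tid.get(oid) != seen_tid:
            raise VoteReadConflictError(oid)
```

The point of step 1 is visible even in the toy model: a stale readCurrent fails before any lock is taken, so a doomed transaction holds nothing while it dies.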

I say mitigate because it obviously only works if the modifying transaction had committed by the time we look. In one large test environment stress-testing a real application, I found 800 retries involving a VoteReadConflictError --- times we know for sure that the modifying transaction had committed when we looked --- but 9,232 retries for UnableToLockRowsToReadCurrentError. Unfortunately, we can't say for sure how many of those were really deadlock errors, but we can say that 1,303 of those deadlocks bubbled up and caused the entire request to fail (after several retries), whereas only 228 "real" readCurrent lock failures bubbled up. So I think we can safely guess that most of those retries were due to deadlocks.

This should keep most of the good properties of (1), and eliminate the bad properties of (3). There's possibly nothing I can do about bounding or other bad properties of (2).

The hope is that by eliminating the immediate deadlock detection that dooms transactions, transactions that otherwise get retried instead get to go on to conflict resolution. There's a small risk that for certain highly concurrent and contended workloads, this may drop write throughput; we'll have to run large stress tests to be sure.

The zodbshootout benchmark suite, including one I added just for this, doesn't report any statistically significant differences in the old vs new behaviour (e.g., adding that extra query isn't expensive).

This reverts #314, but a lot of things have changed since that went in and we have a lot more experience with row-level locking now.