qertoip / transaction_retry

Retries database transactions on deadlock and transaction serialization errors. Supports MySQL, PostgreSQL, and SQLite.
MIT License

fuzz factor on delay #2

Closed: michaelmwu closed this 6 years ago

michaelmwu commented 11 years ago

Hey,

I don't know if this is the best way to do this, but I've been hitting issues in a heavily trafficked app that launches possibly conflicting transactions at about the same time.

If two conflicting transactions start at the same time and back off on the same schedule, they are likely to conflict again. I added some fuzz to prevent this from happening, but it might be better done with a customizable function that returns the delay given how many retries have happened so far, which could default to an array lookup.
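A minimal sketch of that proposal (the constant and method names here are illustrative, not part of the gem): a delay function indexed by retry count, with a random fuzz term added so two transactions that conflicted once don't wake up on the same schedule again.

```ruby
# Hypothetical sketch of a pluggable, fuzzed retry delay.
# RETRY_DELAYS and retry_delay_for are illustrative names only.
RETRY_DELAYS = [0.0, 0.5, 1.0, 2.0].freeze  # seconds, indexed by retry number

def retry_delay_for(retries, fuzz: 0.5)
  base = RETRY_DELAYS[retries] || RETRY_DELAYS.last
  # Add up to fuzz * base of random extra delay to desynchronize
  # transactions that are retrying on the same schedule.
  base + rand * fuzz * base
end
```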

lastobelus commented 11 years ago

+1

This was the first thing I thought of when I was looking at the gem README, so I came to see if anyone had posted an issue about it, and sure enough...

You gotta randomize retry times!

michaelmwu commented 11 years ago

It also lets you specify which errors to retry on in addition to the isolation conflict, as well as max_retries per call:

transaction(:retry_on => [ActiveRecord::RecordNotUnique], :max_retries => 2)

carsonreinke commented 10 years ago

If this is MySQL and you hit a deadlock, not both connections receive an error; instead one is selected for rollback. If you look at the InnoDB engine status (SHOW ENGINE INNODB STATUS;), it will state something like: "*** WE ROLL BACK TRANSACTION (1)".

joevandyk commented 9 years ago

@qertoip can this get merged?

carsonreinke commented 9 years ago

I still don't see how this could ever happen; the fuzz concept does not make sense.

MySQL says "rolls back a transaction or transactions to break the deadlock" (http://dev.mysql.com/doc/refman/5.6/en/innodb-deadlock-detection.html) and "the lock request for the other client can be granted" (http://dev.mysql.com/doc/refman/5.6/en/innodb-lock-modes.html)

PostgreSQL says "resolves them by aborting one of the transactions involved, allowing the other(s) to complete" (http://www.postgresql.org/docs/9.3/static/explicit-locking.html)

michaelmwu commented 9 years ago

Actually, both transactions can conflict and both be rolled back.

carsonreinke commented 9 years ago

When? I have not been able to find anything that states this or shows that it could happen.

aquach commented 7 years ago

This can happen when you have more than 2 transactions fighting for the same lock. Consider the case where 5 transactions all take the same read lock, then all attempt to upgrade to a write lock. Each transaction conflicts with each other transaction, and MySQL will roll back four of them and allow one to proceed.

Under the current scheme, the four will retry immediately, and the same issue will present itself: MySQL will roll back three and allow one to proceed. Then the three will sleep 1 second and try again, and so on. The issue is that even though transactions are sleeping to wait for contention to pass, they're sleeping in lockstep with each other, which means that when they try again, regardless of how much time they've slept, they'll conflict with each other again. In a degenerate case with 10 transactions, you'd have 9 rounds of sleeping in order to get all 10 through, not to mention 55 attempts overall (10 + 9 + 8 + ... + 1) in order to get the 10 transactions committed.

A fuzz factor can mitigate this by causing each transaction to start at a slightly different time, so that hopefully they won't conflict as much with each other.
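The arithmetic above can be sanity-checked with a few lines of Ruby modeling the lockstep case: each round every remaining transaction attempts, exactly one commits, and the rest sleep and retry together (illustrative model, not the gem's code).

```ruby
# Lockstep model: n transactions contend, one wins per round, the
# losers all sleep and retry on the same schedule.
def lockstep_cost(n)
  attempts = (1..n).sum    # n + (n - 1) + ... + 1 total attempts
  sleeping_rounds = n - 1  # every round except the last is followed by a sleep
  [attempts, sleeping_rounds]
end
```

For n = 10 this reproduces the 55 attempts and 9 sleeping rounds described above.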

carsonreinke commented 7 years ago

@aquach Yes, I see your point; I guess I was just assuming the 2-transaction scenario.

lastobelus commented 7 years ago

@carsonreinke it's statistically likely that if you have enough traffic to actually need to deal with the 2-transaction case, you will also be experiencing the more-than-2-transaction case.

You actually need both fuzz and backoff, not just fuzz, to minimize wait time when there is a lot of contention.

lastobelus commented 7 years ago

It's really easy to code up a demo of this, and if you do you will find that tuning fuzz/backoff so that work is distributed equally among your threads, with a minimum of idle time, is a bit of an art. If you just sleep num_retries * factor, it's easy for "harmonics" to arise, where 1 or 2 threads do most of the actual work and the rest are mostly blocking. If backoff is not steep enough, high contention can cause churn, where all the threads waste a lot of time asking for and being denied locks. If backoff is too steep, it's easy under high contention to get into a situation where workers get "stuck": they never win, and as time goes on their chance of winning keeps decreasing.
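One common scheme that addresses all three failure modes above is capped exponential backoff with full jitter: the sleep is drawn uniformly from zero up to a growing-but-capped ceiling, so workers spread out instead of waking in lockstep, while the cap keeps unlucky workers from falling ever further behind. A sketch under those assumptions (not the gem's implementation; names and parameters are illustrative):

```ruby
# Capped exponential backoff with "full jitter": sleep is uniform in
# [0, min(cap, base * 2**retries)). The random draw breaks lockstep
# harmonics; the exponential growth eases churn under contention; the
# cap prevents workers from getting "stuck" with ever-longer waits.
def backoff_with_full_jitter(retries, base: 0.1, cap: 5.0)
  ceiling = [cap, base * (2**retries)].min
  rand * ceiling
end
```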

lastobelus commented 7 years ago

The good news is that if you tune for a particular number of workers & degree of contention to work, it will also work reasonably well for any lesser amount of contention.

carsonreinke commented 7 years ago

@lastobelus I would love to see an example of this actually happening, if you have one.

lastobelus commented 7 years ago

Not for a database, but I have one for the similar situation of a threaded worker queue consuming a third-party API that uses the bucket model for throttling.

joevandyk commented 6 years ago

@carsonreinke any chance this could get merged and released?

carsonreinke commented 6 years ago

@joevandyk I wish I could, but I don't have push access; that would have to be @qertoip.