rails / solid_queue

Database-backed Active Job backend
MIT License
1.66k stars 90 forks source link

Fixes #224. I believe this is a Postgres only issue #231

Open hms opened 1 month ago

hms commented 1 month ago

Per all of my earlier discussion, Postgres transactions become invalid upon any database contraint invalidating an operation and make all follow on database interactions invalid until a rollback is executed.

Semephore#attempt_creation uses a rather common database pattern to rely on database locks to reliably syncrhronize based on uniqueness of values. The catch is that Postgresql requires a little more handholding to use this pattern. By wrapping just the Semaphore.create statement in a SAVEPOINT (a pretend nested transaction), we have what appears to be a transparent operation that works across all of the other supported databases without any known consequences AND allows Postgres to continue to work as expected per the existing code.

This change works across the Solid supported databases (at least based on the tests) and I believe the extra overhead for the non-postgres databases is small enough that special casing the code or resorting to the complexity of dependency injection is just not worth it.

I added two tests to the concurrency_controls_test.rb. One to show the error is real and is easy to reproduce. The other to show the fix fixes it.

hms commented 1 month ago

@rosa

I'm pausing on this for just a little as I did a little more digging (probably should have checked before coding a fix....) and there are some significant concerns around adding sub-transactions / savepoint for high volume deployments, especially those with replication / failover.

While my proposed solution is database agnostic (ignoring the Postgres performance issues referenced above), the tests cannot be. Both SqLite and mySQL work without a testable failure with either my PR or the original SolidQueue implementation.

There look to be some non-nested transaction / savepoint implementations, but they all require different syntax for each of the 3 supported databases. I'm open to investigating those alternatives, but it would help to understand if the SolidQueue team would be open to a PR based on database specific syntax and how to organize the code for any PRs with this approach.

Open issues that I could use some direction on:

Thanks,

Hal

hms commented 3 weeks ago

@rosa

I've revamped this PR to avoid nested transaction.

Postgres, unlike mysql and sqlite, invalidates the current transaction on a database contraint violation such as an attempt to insert a duplicate key into a unique index.

SolidQueue uses a pretty standard design pattern of application level
conflict detection and resolution via error handling as part of it's
concurrency management in Semaphore::Proxy#attempt create (via create! and rescue).

For Postgres based implementations when:
* enqueing a job inside a transaction
* under load such that the race condition triggerng the simultanious conflicted insert

the transaction would become unrecoverable at the application level.

There are two simple paths to address this issue in Postgres:
* Nested transactions
* Insert using 'on conflict do nothing'

Nested transaction are the easiest to implement and are portable across all 3
of the Rails standard databases. Unfortunately, nested transactions have a
known and very signifant performance problem, especially for databases under
load and/or replicated databases with long running queries.

Each of the big three database inplements the ANSI standard Insert on conflict do nothing
in a very different way.

Because of this, this PR only replaces the current implementation of
application level conflict handling for Postgres.  It takes advantage
of the Rails standard ActiveRecord::Persistence::Insert method supporting on
conflict semantics.