kvesteri / sqlalchemy-continuum

Versioning extension for SQLAlchemy.
BSD 3-Clause "New" or "Revised" License
568 stars 128 forks source link

Fix deadlock in mysql on concurrent inserts #315

Closed mfulgo closed 1 year ago

mfulgo commented 1 year ago

When inserting many versioned objects in MySQL concurrently (from separate threads), deadlocks would frequently occur and the inserts would fail. Updating to use the newer ORM-style query while applying the validity strategy avoids the problem.

Fixes #172

mfulgo commented 1 year ago

@marksteward - Anything I can do to help get this merged and a bug-fix version published?

marksteward commented 1 year ago

Thanks for the reminder! It's not clear to me why the ORM changes would affect deadlock. Can you provide a rationale, or the SQL in question?

mfulgo commented 1 year ago

I can take a deeper look, but I couldn't figure out an easy way to print the update query in the previous form. The updated SQL (from the test) comes out to be

UPDATE article_version SET end_transaction_id=:end_transaction_id
WHERE article_version.transaction_id = (
  SELECT max_1 FROM (
    SELECT max(article_version_1.transaction_id) AS max_1
    FROM article_version AS article_version_1
    WHERE
      article_version_1.transaction_id < :transaction_id_1 AND
      article_version_1.id = :id_1
    ) AS subquery
  ) AND article_version.id = :id_2

From the behavior and deadlock report, it was first getting a shared lock (for a select) and then attempting to obtain an exclusive lock (for an update or the insert), which was when the deadlock was happening. The updated code either skips the shared lock or doesn't hold onto it past the update.

The test in this PR fails with the deadlock error without the fix and then passes after. So, I'm pretty confident it resolves the issue. (I did only test it against MySQL, but I don't see anything language specific about the query.)

marksteward commented 1 year ago

Thanks, will have a dig. I'd missed your comments on #172.

mfulgo commented 1 year ago

Thanks!

I just did a server-side dump of the queries from the test runs before and after the code change. (The second log is truncated for brevity but continues on similarly.) I haven't had a chance to analyze it for why it's behaving differently, but a second set of eyes is welcome.

mfulgo commented 1 year ago

It looks like the secret is flipping the order of the INSERT and UPDATE calls to the version table. (I'm not sure why this happened from the code change.)

This fails.

INSERT INTO transaction (issued_at, remote_addr) VALUES ('2022-10-21 19:33:41.289363', NULL)
INSERT INTO article (name) VALUES ('Article 1-0001')
SELECT transaction_changes.transaction_id AS transaction_changes_transaction_id, transaction_changes.entity_name AS transaction_changes_entity_name FROM transaction_changes WHERE transaction_changes.transaction_id = 438 AND transaction_changes.entity_name = 'Article'
-- here
UPDATE article_version SET end_transaction_id=438 WHERE article_version.transaction_id = (SELECT max_1 FROM (SELECT max(article_version_1.transaction_id) AS max_1 FROM article_version AS article_version_1 WHERE article_version_1.transaction_id < 438 AND article_version_1.id = 231) AS subquery) AND article_version.id = 231
INSERT INTO article_version (id, name, transaction_id, end_transaction_id, operation_type) VALUES (231, 'Article 1-0001', 438, NULL, 0)
-- to here
INSERT INTO transaction_changes (transaction_id, entity_name) VALUES (438, 'Article')
SELECT transaction_changes.transaction_id AS transaction_changes_transaction_id, transaction_changes.entity_name AS transaction_changes_entity_name FROM transaction_changes WHERE transaction_changes.transaction_id = 438 AND transaction_changes.entity_name = 'Article'
COMMIT

While this works.

INSERT INTO transaction (issued_at, remote_addr) VALUES ('2022-10-21 19:41:39.198466', NULL)
INSERT INTO article (name) VALUES ('Article 2-0001')
SELECT transaction_changes.transaction_id AS transaction_changes_transaction_id, transaction_changes.entity_name AS transaction_changes_entity_name FROM transaction_changes WHERE transaction_changes.transaction_id = 1 AND transaction_changes.entity_name = 'Article'
-- here
INSERT INTO article_version (id, name, transaction_id, end_transaction_id, operation_type) VALUES (1, 'Article 2-0001', 1, NULL, 0)
UPDATE article_version SET end_transaction_id=1 WHERE article_version.transaction_id = (SELECT max_1 FROM (SELECT max(article_version_1.transaction_id) AS max_1 FROM article_version AS article_version_1 WHERE article_version_1.transaction_id < 1 AND article_version_1.id = 1) AS subquery) AND article_version.id = 1
-- to here
INSERT INTO transaction_changes (transaction_id, entity_name) VALUES (1, 'Article')
SELECT transaction_changes.transaction_id AS transaction_changes_transaction_id, transaction_changes.entity_name AS transaction_changes_entity_name FROM transaction_changes WHERE transaction_changes.transaction_id = 1 AND transaction_changes.entity_name = 'Article'
COMMIT

I think what's happening here is that the subquery, in the first sequence, grabs a shared lock on article_version first, then tries an exclusive lock for the insert which may fail/deadlock if another connection has grabbed a shared lock on article_version.

In the second case, the insert grabs an exclusive lock on article_version first, then it doesn't need to worry about the shared lock for the update. The other connections have to wait in line to get their own exclusive lock on article_version.

marksteward commented 1 year ago

Ahh, probably an implicit flush then. I might make that explicit (or add an explicit lock).

I've pushed a fix that should make the tests work. I've got some other stuff on this evening but I've also got a docker-compose.yml that should help with local testing.

mfulgo commented 1 year ago

Do you need me to rebase or anything to pick up the fixed tests? Looks like I can't re-run them since I'm not a collaborator.

mfulgo commented 1 year ago

Hi @marksteward - Sorry to keep poking this PR, but I'm hoping to get this fix into some production code soon without needing to do a fork or vendor a copy of the lib... anything I can do to help?

marksteward commented 1 year ago

Hi, sorry for the delay, it's on my list for tomorrow evening.

If you could rebase to include the latest commit on master the tests should run.

mfulgo commented 1 year ago

No problem! Thanks! (Rebased.)

mfulgo commented 1 year ago

I'm figuring out failing tests and hope to have a fix up later today.

My plan for the new multi-threaded test is to skip them for sqlite because that's a fairly non-standard use case.

mfulgo commented 1 year ago

Fixed, and tests for mysql, postgres, and sqlite are passing locally.

mfulgo commented 1 year ago

Try try again... Tested and passing locally with SQAlchemy 1.3 and 1.4 and MySQL, Postgres, and SQLite. 😌

mfulgo commented 1 year ago

Hi @marksteward - Any chance of getting this merged and released this week? Thanks!

mfulgo commented 1 year ago

Any chance of a release with this sometime soon? Thanks, @marksteward!

(Just pinging to keep this on the radar. I'd be happy to run with this myself, given access.)

marksteward commented 1 year ago

Hi, sorry. Not ignoring this but been really busy lately. I'll try to release at the weekend.

mfulgo commented 1 year ago

No worries; I understand how it goes. =) Thanks!

mfulgo commented 1 year ago

@marksteward - I noticed there was a patch release that happened in Jan. Any chance this could get included in the next one? (I'd love to remove our dependency on the fork of this.)

marksteward commented 1 year ago

Sorry for the lack of update here. I've tested this on 1.4 and 2.0, and it still deadlocks (it doesn't work on 1.3, but that's fine).

I'm not likely to fix it tonight but I'm going to add your test to the sqla-2.0 branch and see if I can fix the deadlock reliably.

bruceadams commented 1 year ago

it still deadlocks

Wow! I'm surprised! We're using this fix in production and it resolved the specific deadlock problem we were hitting (we could hit that deadlock very consistently, there was nothing ambiguous about it).

marksteward commented 1 year ago

Hmm, and your tests passed too, so it must have worked at some point! I'm running my tests in mariadb, maybe there's a difference.

Yep, still passing on the sqla-2.0 branch:

https://github.com/kvesteri/sqlalchemy-continuum/actions/runs/5227136249/jobs/9438476712

But it's deadlocking inside docker compose even with mysql. It might only get stuck in cleanup though:

+-----+-----------------+------------------+------+---------+------+---------------------------------+--------------------------------+
| Id  | User            | Host             | db   | Command | Time | State                           | Info                           |
+-----+-----------------+------------------+------+---------+------+---------------------------------+--------------------------------+
|   5 | event_scheduler | localhost        | NULL | Daemon  |  313 | Waiting on empty queue          | NULL                           |
| 481 | root            | 172.22.0.5:37854 | main | Sleep   |  153 |                                 | NULL                           |
| 535 | root            | 172.22.0.5:47508 | main | Sleep   |  138 |                                 | NULL                           |
| 536 | root            | 172.22.0.5:47522 | main | Sleep   |  137 |                                 | NULL                           |
| 537 | root            | 172.22.0.5:47524 | main | Sleep   |  136 |                                 | NULL                           |
| 538 | root            | 172.22.0.5:47526 | main | Sleep   |  137 |                                 | NULL                           |
| 539 | root            | 172.22.0.5:47548 | main | Sleep   |  137 |                                 | NULL                           |
| 540 | root            | 172.22.0.5:47554 | main | Query   |  136 | Waiting for table metadata lock | DROP TABLE transaction_changes |
| 541 | root            | localhost        | main | Query   |    0 | init                            | show processlist               |
+-----+-----------------+------------------+------+---------+------+---------------------------------+--------------------------------+

Nope that's just it trying to clean up after existing early when deadlock is detected. With mariadb, it's deadlocking on the commit:

tests/test_validity_strategy_multithreaded.py:69: in _insert_articles
    session.commit()
...
/usr/local/lib/python3.9/site-packages/pymysql/cursors.py:325: in _query
    conn.query(q)
...
/usr/local/lib/python3.9/site-packages/pymysql/connections.py:1157: in read
    first_packet = self.connection._read_packet()
/usr/local/lib/python3.9/site-packages/pymysql/connections.py:729: in _read_packet
    packet.raise_for_error()
...
E       sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction')
E       [SQL: UPDATE article_version SET end_transaction_id=%(end_transaction_id)s WHERE article_version.transaction_id = (SELECT max_1
E       FROM (SELECT max(article_version_1.transaction_id) AS max_1
E       FROM article_version AS article_version_1
E       WHERE article_version_1.transaction_id < %(transaction_id_1)s AND article_version_1.id = %(id_1)s) AS subquery) AND article_version.id = %(id_2)s]
E       [parameters: {'end_transaction_id': 17, 'transaction_id_1': 17, 'id_1': 10, 'id_2': 10}]
E       (Background on this error at: https://sqlalche.me/e/20/e3q8)

/usr/local/lib/python3.9/site-packages/pymysql/err.py:143: OperationalError
marksteward commented 1 year ago

Right, I've done the necessary digging and it seems the issue was largely due to MySQL's tendency to take shared next-key locks that can't be upgraded to exclusive ones. Here's an example in MariaDB:

MariaDB [main]> show engine innodb status;
...
*** (1) TRANSACTION:
TRANSACTION 142665, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 7 lock struct(s), heap size 1128, 4 row lock(s), undo log entries 2
MariaDB thread id 240, OS thread handle 140366108550720, query id 55306 172.22.0.5 root Update
INSERT INTO article_version (id, name, transaction_id, end_transaction_id, operation_type) VALUES (56, 'Article 0-0028', 110, NULL, 0)
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 10111 page no 3 n bits 112 index PRIMARY of table `main`.`article_version` trx id 142665 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** CONFLICTING WITH:
RECORD LOCKS space id 10111 page no 3 n bits 112 index PRIMARY of table `main`.`article_version` trx id 142664 lock mode S
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;
...
RECORD LOCKS space id 10111 page no 3 n bits 112 index PRIMARY of table `main`.`article_version` trx id 142664 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

RECORD LOCKS space id 10111 page no 3 n bits 112 index PRIMARY of table `main`.`article_version` trx id 142665 lock mode S
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;
...
RECORD LOCKS space id 10111 page no 3 n bits 112 index PRIMARY of table `main`.`article_version` trx id 142665 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;
...

*** (2) TRANSACTION:
TRANSACTION 142664, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 7 lock struct(s), heap size 1128, 4 row lock(s), undo log entries 2
MariaDB thread id 241, OS thread handle 140366108857920, query id 55305 172.22.0.5 root Update
INSERT INTO article_version (id, name, transaction_id, end_transaction_id, operation_type) VALUES (55, 'Article 1-0028', 109, NULL, 0)
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 10111 page no 3 n bits 112 index PRIMARY of table `main`.`article_version` trx id 142664 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** CONFLICTING WITH:
RECORD LOCKS space id 10111 page no 3 n bits 112 index PRIMARY of table `main`.`article_version` trx id 142664 lock mode S
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;
...
RECORD LOCKS space id 10111 page no 3 n bits 112 index PRIMARY of table `main`.`article_version` trx id 142664 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

RECORD LOCKS space id 10111 page no 3 n bits 112 index PRIMARY of table `main`.`article_version` trx id 142665 lock mode S
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;
...
RECORD LOCKS space id 10111 page no 3 n bits 112 index PRIMARY of table `main`.`article_version` trx id 142665 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

or

*** (1) TRANSACTION:
TRANSACTION 142873, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 8 lock struct(s), heap size 1128, 6 row lock(s), undo log entries 3
MariaDB thread id 254, OS thread handle 140366108857920, query id 56247 172.22.0.5 root Update
INSERT INTO article_version (id, name, transaction_id, end_transaction_id, operation_type) VALUES (99, 'Article 1-0010.2', 194, NULL, 1)
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 10111 page no 3 n bits 192 index PRIMARY of table `main`.`article_version` trx id 142873 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** CONFLICTING WITH:
RECORD LOCKS space id 10111 page no 3 n bits 192 index PRIMARY of table `main`.`article_version` trx id 142873 lock mode S
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;
...
RECORD LOCKS space id 10111 page no 3 n bits 192 index PRIMARY of table `main`.`article_version` trx id 142873 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;
...

*** (2) TRANSACTION:
TRANSACTION 142874, ACTIVE 0 sec starting index read
mysql tables in use 2, locked 2
LOCK WAIT 5 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 2
MariaDB thread id 253, OS thread handle 140365579372096, query id 56246 172.22.0.5 root Optimizing
UPDATE article_version SET end_transaction_id=195 WHERE article_version.transaction_id = (SELECT max_1
FROM (SELECT max(article_version_1.transaction_id) AS max_1
FROM article_version AS article_version_1
WHERE article_version_1.transaction_id < 195 AND article_version_1.id = 100) AS subquery) AND article_version.id = 100
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 10111 page no 3 n bits 192 index PRIMARY of table `main`.`article_version` trx id 142874 lock mode S waiting
Record lock, heap no 190 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 4; hex 80000063; asc    c;;
 1: len 8; hex 80000000000000c0; asc         ;;
 2: len 6; hex 000000022e19; asc     . ;;
 3: len 7; hex 27000003cd18be; asc '      ;;
 4: len 14; hex 41727469636c6520312d30303130; asc Article 1-0010;;
 5: len 8; hex 80000000000000c2; asc         ;;
 6: len 2; hex 8000; asc   ;;

*** CONFLICTING WITH:
RECORD LOCKS space id 10111 page no 3 n bits 192 index PRIMARY of table `main`.`article_version` trx id 142873 lock mode S
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;
...
RECORD LOCKS space id 10111 page no 3 n bits 192 index PRIMARY of table `main`.`article_version` trx id 142873 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

This is a well-known issue, and your change that moves the update after the insert does mitigate it a bit. However, it doesn't prevent deadlock entirely, although for some reason MariaDB is more liable to hit it than MySQL:

*** (1) TRANSACTION:
TRANSACTION 147292, ACTIVE 0 sec starting index read
mysql tables in use 2, locked 2
LOCK WAIT 6 lock struct(s), heap size 1128, 3 row lock(s), undo log entries 3
MariaDB thread id 286, OS thread handle 140365773510208, query id 74958 172.22.0.5 root Updating
UPDATE article_version SET end_transaction_id=1 WHERE article_version.transaction_id = (SELECT max_1
FROM (SELECT max(article_version_1.transaction_id) AS max_1
FROM article_version AS article_version_1
WHERE article_version_1.transaction_id < 1 AND article_version_1.id = 1) AS subquery) AND article_version.id = 1
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 10160 page no 3 n bits 8 index PRIMARY of table `main`.`article_version` trx id 147292 lock_mode X waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 6; hex 000000023f5c; asc     ?\;;
 3: len 7; hex fd0000038d1a18; asc        ;;
 4: len 14; hex 41727469636c6520302d30303031; asc Article 0-0001;;
 5: SQL NULL;
 6: len 2; hex 8000; asc   ;;

*** CONFLICTING WITH:
RECORD LOCKS space id 10160 page no 3 n bits 8 index PRIMARY of table `main`.`article_version` trx id 147292 lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 6; hex 000000023f5c; asc     ?\;;
 3: len 7; hex fd0000038d1a18; asc        ;;
 4: len 14; hex 41727469636c6520302d30303031; asc Article 0-0001;;
 5: SQL NULL;
 6: len 2; hex 8000; asc   ;;

*** (2) TRANSACTION:
TRANSACTION 147294, ACTIVE 0 sec starting index read
mysql tables in use 2, locked 2
LOCK WAIT 5 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 3
MariaDB thread id 287, OS thread handle 140366108550720, query id 74957 172.22.0.5 root Optimizing
UPDATE article_version SET end_transaction_id=2 WHERE article_version.transaction_id = (SELECT max_1
FROM (SELECT max(article_version_1.transaction_id) AS max_1
FROM article_version AS article_version_1
WHERE article_version_1.transaction_id < 2 AND article_version_1.id = 2) AS subquery) AND article_version.id = 2
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 10160 page no 3 n bits 8 index PRIMARY of table `main`.`article_version` trx id 147294 lock mode S waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 6; hex 000000023f5c; asc     ?\;;
 3: len 7; hex fd0000038d1a18; asc        ;;
 4: len 14; hex 41727469636c6520302d30303031; asc Article 0-0001;;
 5: SQL NULL;
 6: len 2; hex 8000; asc   ;;

*** CONFLICTING WITH:
RECORD LOCKS space id 10160 page no 3 n bits 8 index PRIMARY of table `main`.`article_version` trx id 147292 lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 6; hex 000000023f5c; asc     ?\;;
 3: len 7; hex fd0000038d1a18; asc        ;;
 4: len 14; hex 41727469636c6520302d30303031; asc Article 0-0001;;
 5: SQL NULL;
 6: len 2; hex 8000; asc   ;;

After thinking about this a bit I've added a new test that updates two articles, which reliably deadlocks in MySQL:

*** (1) TRANSACTION:
TRANSACTION 58310, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 7 lock struct(s), heap size 1128, 3 row lock(s), undo log entries 4
MySQL thread id 78, OS thread handle 139816757827328, query id 91474 172.22.0.5 root update
INSERT INTO article_version (id, name, transaction_id, end_transaction_id, operation_type) VALUES (5, 'Article 2 Thread-6-0001', 2, NULL, 0)

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 1800 page no 4 n bits 72 index PRIMARY of table `main`.`article_version` trx id 58310 lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 4; hex 80000003; asc     ;;
 1: len 8; hex 8000000000000002; asc         ;;
 2: len 6; hex 00000000e3c6; asc       ;;
 3: len 7; hex 8200000105013b; asc       ;;;
 4: len 23; hex 41727469636c652031205468726561642d362d30303031; asc Article 1 Thread-6-0001;;
 5: SQL NULL;
 6: len 2; hex 8000; asc   ;;

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1800 page no 4 n bits 72 index PRIMARY of table `main`.`article_version` trx id 58310 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 5 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 4; hex 80000006; asc     ;;
 1: len 8; hex 8000000000000004; asc         ;;
 2: len 6; hex 00000000e3cd; asc       ;;
 3: len 7; hex 810000010a013b; asc       ;;;
 4: len 23; hex 41727469636c652031205468726561642d382d30303031; asc Article 1 Thread-8-0001;;
 5: SQL NULL;
 6: len 2; hex 8000; asc   ;;

*** (2) TRANSACTION:
TRANSACTION 58317, ACTIVE 0 sec starting index read
mysql tables in use 2, locked 2
LOCK WAIT 7 lock struct(s), heap size 1128, 3 row lock(s), undo log entries 4
MySQL thread id 80, OS thread handle 139816421340928, query id 91473 172.22.0.5 root optimizing
UPDATE article_version SET end_transaction_id=4 WHERE article_version.transaction_id = (SELECT max_1
FROM (SELECT max(article_version_1.transaction_id) AS max_1
FROM article_version AS article_version_1
WHERE article_version_1.transaction_id < 4 AND article_version_1.id = 6) AS subquery) AND article_version.id = 6

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1800 page no 4 n bits 72 index PRIMARY of table `main`.`article_version` trx id 58317 lock mode S locks gap before rec
Record lock, heap no 5 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 4; hex 80000006; asc     ;;
 1: len 8; hex 8000000000000004; asc         ;;
 2: len 6; hex 00000000e3cd; asc       ;;
 3: len 7; hex 810000010a013b; asc       ;;;
 4: len 23; hex 41727469636c652031205468726561642d382d30303031; asc Article 1 Thread-8-0001;;
 5: SQL NULL;
 6: len 2; hex 8000; asc   ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1800 page no 4 n bits 72 index PRIMARY of table `main`.`article_version` trx id 58317 lock mode S waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 4; hex 80000003; asc     ;;
 1: len 8; hex 8000000000000002; asc         ;;
 2: len 6; hex 00000000e3c6; asc       ;;
 3: len 7; hex 8200000105013b; asc       ;;;
 4: len 23; hex 41727469636c652031205468726561642d362d30303031; asc Article 1 Thread-6-0001;;
 5: SQL NULL;
 6: len 2; hex 8000; asc   ;;

To fix this, I've switched from using an immediate update query to leaning on the ORM. I'm much happier with this fix as it lets SQLAlchemy sort out the lock hierarchy, and removes the only query like this in the codebase.

marksteward commented 1 year ago

@mfulgo thank you massively for writing the test, it was a huge timesaver. Sorry for throwing your fix away but hopefully this one will work better!

Would you be able to give the sqla-2.0 branch a test? It should support 1.4 with future=True or False.

marksteward commented 1 year ago

Fixed by 813445c

mfulgo commented 11 months ago

@marksteward - We were finally able to get around to updating to and testing with v1.4.0. Success! No deadlocks. Thanks!