preaction / Minion-Backend-mysql

MySQL backend for the 🐙 Minion job runner
Other
7 stars 14 forks source link

InnoDB deadlock between cuncurrent lock/unlock in Minion #38

Open andrii-suse opened 2 years ago

andrii-suse commented 2 years ago

I am seeing this deadlock frequently (some lines are removed):

*** (1) TRANSACTION:
TRANSACTION 24717135, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 2 row lock(s)
MySQL thread id 140107, OS thread handle 139826256254720, query id 19501875 **** Updating
DELETE FROM minion_locks
      WHERE expires > NOW() AND name = 'mirror_check_from_stat' ORDER BY EXPIRES
      LIMIT 1

*** (2) TRANSACTION:
TRANSACTION 24717134, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1128, 15 row lock(s), undo log entries 1
MySQL thread id 140106, OS thread handle 139826252568320, query id 19501880 ***** Update
INSERT INTO minion_locks (name, expires) VALUES ( NAME_CONST('$1',_utf8mb4'folder_sync/****' COLLATE 'utf8mb4_general_ci'),  NAME_CONST('new_expires',TIMESTAMP'2022-05-03 09:06:28'))
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 30 page no 3 n bits 112 index PRIMARY of table `minion_locks` trx id 24717134 lock_mode X

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 30 page no 4 n bits 368 index name of table `minion_locks` trx id 24717134 lock_mode X locks gap before rec insert intention waiting

Note that lock names are different in both transactions.

After some research I concluded that such behavior is a flaw in InnoDB and described in https://bugs.mysql.com/bug.php?id=25847 So, switching isolation level to READ-COMMITTED should be proper workaround for my case.

But I think the lib should remove DELETE command from stored function minion_lock(). Because currently the locks from DELETE are held longer than needed (i.e. until INSERT is committed). Instead, call the DELETE statement explicitly from perl code.

andrii-suse commented 1 year ago

Unfortunately READ-COMMITTED doesn't help in all cases and I am getting another deadlock between concurrent lock() commands:

2023-04-21 11:12:50 0x7fd7af5f1700
*** (1) TRANSACTION:
TRANSACTION 7392137242, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 2 row lock(s)
MariaDB thread id 3308224, OS thread handle 140564336940800, query id 908519597 192.168.47.15 mirrorcache Updating
DELETE FROM minion_locks WHERE expires < NOW()
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 30 page no 3 n bits 136 index PRIMARY of table `mirrorcache`.`minion_locks` trx id 7392137242 lock_mode X locks rec but not gap waiting
Record lock, heap no 15 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 8; hex 0000000003971bf6; asc         ;;
 1: len 6; hex 000000000000; asc       ;;
 2: len 7; hex 80000000000000; asc        ;;
 3: len 21; hex 737461745f6167675f7363686564756c655f646179; asc stat_agg_schedule_day;;
 4: len 4; hex 64427305; asc dBs ;;

*** CONFLICTING WITH:
RECORD LOCKS space id 30 page no 3 n bits 136 index PRIMARY of table `mirrorcache`.`minion_locks` trx id 7392137238 lock_mode X locks rec but not gap
Record lock, heap no 15 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 8; hex 0000000003971bf6; asc         ;;
 1: len 6; hex 000000000000; asc       ;;
 2: len 7; hex 80000000000000; asc        ;;
 3: len 21; hex 737461745f6167675f7363686564756c655f646179; asc stat_agg_schedule_day;;
 4: len 4; hex 64427305; asc dBs ;;

*** (2) TRANSACTION:
TRANSACTION 7392137238, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1128, 3 row lock(s)
MariaDB thread id 3308195, OS thread handle 140563342145280, query id 908519594 192.168.47.15 mirrorcache Sending data
SELECT minion_lock('mirror_scan_schedule_from_path_errors', '86400', '1')
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 30 page no 3 n bits 136 index PRIMARY of table `mirrorcache`.`minion_locks` trx id 7392137238 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 8; hex 000000000395137b; asc        {;;
 1: len 6; hex 000000000000; asc       ;;
 2: len 7; hex 80000000000000; asc        ;;
 3: len 30; hex 6d6972726f725f7363616e5f7363686564756c655f66726f6d5f70617468; asc mirror_scan_schedule_from_path; (total 37 bytes);
 4: len 4; hex 644306dd; asc dC  ;;

*** CONFLICTING WITH:
RECORD LOCKS space id 30 page no 3 n bits 136 index PRIMARY of table `mirrorcache`.`minion_locks` trx id 7392137242 lock_mode X locks rec but not gap

So I've submitted #40

andrii-suse commented 1 year ago

Please also refer about behavior of autocommit in stored functions: https://jira.mariadb.org/browse/MDEV-31108