znuny / Znuny

Znuny/Znuny LTS is a fork of the ((OTRS)) Community Edition, one of the most flexible web-based ticketing systems used for Customer Service, Help Desk, IT Service Management.
https://www.znuny.org
GNU General Public License v3.0
343 stars 83 forks source link

MariaDB deadlock handling added #414

Open pboguslawski opened 1 year ago

pboguslawski commented 1 year ago

Proposed change

Znuny (checked in v6) occasionally throws MariaDB deadlock errors to its log (the more concurrency in DB, i.e. more application servers, the more such errors in time). More deadlocks were noticed after upgrading to Debian 11 (MariaDB 10.5).

Such deadlocks even if rare (i.e. a few per day) may introduce problems; for example...

DBD::mysql::db do failed: Deadlock found when trying to get lock; try restarting transaction at /opt/otrs/Kernel/System/DB.pm line 471.
ERROR: OTRS-otrs.Daemon.pl - Daemon Kernel::System::Daemon::DaemonModules::SchedulerCronTaskManager-4 Perl: 5.32.1 OS: linux Time: Thu Dec 1 12:05:19 2022

 Message: Deadlock found when trying to get lock; try restarting transaction, SQL: '
            UPDATE scheduler_task
            SET lock_key = 0
            WHERE lock_key = 1 AND id = ?'

 Traceback (4000847):
   Module: Kernel::System::Daemon::SchedulerDB::TaskAdd Line: 172
   Module: Kernel::System::Daemon::SchedulerDB::RecurrentTaskExecute Line: 2151
   Module: Kernel::System::Daemon::SchedulerDB::CronTaskToExecute Line: 1453
   Module: Kernel::System::Daemon::DaemonModules::SchedulerCronTaskManager::Run Line: 103
   Module: (eval) Line: 347
   Module: main::Start Line: 347
   Module: /opt/otrs/bin/otrs.Daemon.pl Line: 163

...blocks executions of MailQueueSend task (outbound e-mails sending) for longer time, even over 30mins /until Daemon calls TaskCleanup() which removes broken (NULL in lock_update_time) task from scheduler_task table/. Details reported by MariaDB show deadlock on index scheduler_task_lock_key_id:

[...]
  Type: InnoDB
[...]
------------------------
LATEST DETECTED DEADLOCK
------------------------
2022-12-05 02:29:06 0x7f454c2bd700
*** (1) TRANSACTION:
TRANSACTION 458003, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1128, 4 row lock(s), undo log entries 1
MySQL thread id 1132, OS thread handle 139935620974336, query id 33477 myhost [ip] its Updating
UPDATE scheduler_task
            SET lock_key = '100100000838', lock_time = '2022-12-05 01:29:06', lock_update_time = '2022-12-05 01:29:06'
            WHERE lock_key = 0 AND id = '6657'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2143 page no 6 n bits 72 index scheduler_task_lock_key_id of table `its`.`scheduler_task` trx id 458003 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) TRANSACTION:
TRANSACTION 458004, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1128, 4 row lock(s), undo log entries 1
MySQL thread id 684, OS thread handle 139935607412480, query id 33478 myhost [ip] its Updating
UPDATE scheduler_task
            SET lock_key = 0
            WHERE lock_key = 1 AND id = '6658'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 2143 page no 6 n bits 72 index scheduler_task_lock_key_id of table `its`.`scheduler_task` trx id 458004 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 8; hex 8000000000000001; asc         ;;
 1: len 8; hex 8000000000001a02; asc         ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2143 page no 6 n bits 72 index scheduler_task_lock_key_id of table `its`.`scheduler_task` trx id 458004 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 8; hex 8000000000000001; asc         ;;
 1: len 8; hex 8000000000001a02; asc         ;;

*** WE ROLL BACK TRANSACTION (2)
[...]

Similar problem was noticed (even more often) on scheduler_recurrent_task_lock_key_id index; in this case error sounds like

DBD::mysql::db do failed: Deadlock found when trying to get lock; try restarting transaction at /opt/otrs/Kernel/System/DB.pm line 471.
ERROR: OTRS-otrs.Daemon.pl - Daemon Kernel::System::Daemon::DaemonModules::SchedulerCronTaskManager-4 Perl: 5.32.1 OS: linux Time: Mon Dec 5 11:40:02 2022

 Message: Deadlock found when trying to get lock; try restarting transaction, SQL: '
                UPDATE scheduler_recurrent_task
                SET lock_key = 0, lock_time = NULL, last_execution_time = ?, last_worker_task_id = ?,
                    change_time = '2022-12-05 11:40:02'
                WHERE lock_key = ? AND id = ?'

 Traceback (1220524):
   Module: Kernel::System::Daemon::SchedulerDB::RecurrentTaskExecute Line: 2154
   Module: Kernel::System::Daemon::SchedulerDB::CronTaskToExecute Line: 1453
   Module: Kernel::System::Daemon::DaemonModules::SchedulerCronTaskManager::Run Line: 103
   Module: (eval) Line: 347
   Module: main::Start Line: 347
   Module: /opt/otrs/bin/otrs.Daemon.pl Line: 163

Znuny should be prepared to re-issue a transaction if it fails due to deadlock.

This mod causes Znuny to retry MariaDB queries send using Do() up to 3 times after MariaDB/MySQL deadlock is detected.

Only 1213 (ER_LOCK_DEADLOCK, Deadlock found when trying to get lock; try restarting transaction) MariaDB/MySQL errors are handled as deadlocks now; other MariaDB/MySQL error codes and/or other database server deadlocks may be added in similar way in the future if required.

Type of change

Additional information

Replaces: https://github.com/znuny/Znuny/pull/322 Author-Change-Id: IB#1129362

Checklist