matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0
19.91k stars 2.65k forks source link

[Bug] Deadlock on option table using log importer #21414

Open lato333 opened 1 year ago

lato333 commented 1 year ago

What happened?

Hello, I am experiencing the closed issue (#15545) - Unfortunately, I could not reopen it.

Details see https://github.com/matomo-org/matomo/issues/15545#issuecomment-1767844456

Thanks!

What should happen?

Import without deadlocks.

How can this be reproduced?

Using the following command on large logfiles: /usr/bin/python ./misc/log-analytics/import_logs.py --recorders 5 --url=<url> --enable-static --enable-http-redirects --idsite=<id> --token-auth <token> <filename>

Matomo Version

Matomo 3

Matomo Patch or Minor Version

4.15.1

PHP Version

PHP 8.1.2-1ubuntu2.14 (cli) (built: Aug 18 2023 11:41:11) (NTS)

Server Operating System

Ubuntu 22.04

What browsers are you seeing the problem on?

Not applicable (e.g. an API call etc.)

Computer Operating System

-

Relevant log output

No response

Validations

bx80 commented 1 year ago

Thanks for the detailed report @lato333.

There have been recent optimizations to the archiving process made in Matomo 5 which could potentially help with this issue.

Do the deadlocks still happen if the import command is run with --recorders 1?

lato333 commented 1 year ago

Thanks for the fast reply, I will give it a try.

lato333 commented 10 months ago

Unfortunately, the same issue occurs with Matomo 5.

Setting the recorders to 1 would probably resolve the deadlocks, but with that option set, I will run into time problems.

Thanks!

sgiehl commented 10 months ago

@lato333 Is your setup maybe a very old one and your database tables are still using MyISAM storage engine? If so, updating to InnoDB could help maybe.

lato333 commented 10 months ago

@sgiehl Unfortunately not, all tables are stored using the InnoDB storage engine.

sgiehl commented 10 months ago

Hm. Are you importing data that is older than yesterday?

lato333 commented 10 months ago

It is a bit hard to reproduce, but it also occurs on the current daily rotated logfiles. (access_log.1), so no, the data is not older than yesterday. But it often occurs on big files (>3GB).

sgiehl commented 10 months ago

Guess it might be hard to identify the root cause of this. I would assume it might be the amount of requests being performed in parallel. Maybe using the QueuedTracking plugin could solve the problem: https://plugins.matomo.org/QueuedTracking But that might be more a workaround rather than a real problem fix.

lato333 commented 10 months ago

Unfortunately, QueuedTracking did not work with the log importing, as I get tracking errors. I think it is also not recommended to use it when importing logs (https://github.com/matomo-org/matomo-log-analytics/issues/240)

Nevertheless, thanks for the help.

bx80 commented 10 months ago

Long shot, but do you have any scheduled archiving process (ie. the Matomo cronjob) that could be starting up during the import and causing the deadlocks? If so it might be worth temporarily disabling it during an import run to see if that makes a difference.

lato333 commented 10 months ago

No, each log (for different sites) is imported sequentially with multiple recorders. Afterwards, when all logs are imported, the archiving process is called.

Pseudocode:

import_logs.py --recorders 5 site1_access_log.1 --idsite=1
import_logs.py --recorders 5 site2_access_log.1 --idsite=2
...
console core:archive

No cronjob is defined and the archiving via browser is disabled.

Regular deletion of old raw data is turned on, maybe this could be an issue?

bx80 commented 10 months ago

If there is no cronjob configured and browser archiving is disabled then the deletion of old raw data will only run when archiving is called manually (scheduled tasks run at the end of the archiving process).

It looks like the deadlock is occurring when a temporary option is written to remember archives that need to be invalidated later on in the process. There is a source code comment which explains the contention risk and that it should be mitigated by suffixing the process id to the temporary option name.

From the debug information you provided, the problem insert statement used the option name: 6vS3_report_to_invalidate_14_2023-10-16_446883 which both includes a random character prefix at the start and what looks like a PID suffix.

Each recorder will be adding many visits and as part of adding each visit an option is inserted, since the process id will always be the same for each recorder then the only protection against collision will be the random string prefix.

It seems unlikely, but perhaps with enough records being processed a four byte random prefix is insufficient and could repeat within the same process, the random string currently has 61 possible characters and four bytes giving 1,345,841 possible combinations.

If you are comfortable making a very minor change to code, then you could try changing the random string length from 4 bytes to 8 bytes here and see if that improves things.

// old
$id = Common::getRandomString(4, 'abcdefghijklmnoprstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ') . '_';

// new
$id = Common::getRandomString(8, 'abcdefghijklmnoprstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ') . '_';
lato333 commented 10 months ago

Thanks for the detailed analysis and report, this makes sense to me. I will give it a try!

lato333 commented 9 months ago

Unfortunately, the issue remains with the recommended change.

| InnoDB |      |
=====================================
2024-01-24 12:21:08 0x7fa7ee792640 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 19 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 0 srv_active, 0 srv_shutdown, 171605 srv_idle
srv_master_thread log flush and writes: 171556
----------
SEMAPHORES
----------
------------------------
LATEST DETECTED DEADLOCK
------------------------
2024-01-24 08:00:06 0x7fc8ce2d1640
*** (1) TRANSACTION:
TRANSACTION 4217788301, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 6 lock struct(s), heap size 1128, 3 row lock(s), undo log entries 2
MariaDB thread id 152464, OS thread handle 140500429248064, query id 119329683 localhost 127.0.0.1 matomo Update
INSERT IGNORE INTO `matomo_option` (option_name, option_value, autoload) VALUES ('a9zKuCYR_report_to_invalidate_18_2024-01-23_691695', '1', '0')
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1074179 page no 171 n bits 208 index PRIMARY of table `matomo`.`matomo_option` trx id 4217788301 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 100 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 30; hex 6148753743496e6b5f7265706f72745f746f5f696e76616c69646174655f; asc aHu7CInk_report_to_invalidate_; (total 50 bytes);
 1: len 6; hex 0000fb66578a; asc    fW ;;
 2: len 7; hex c400360c6c03c0; asc   6 l  ;;
 3: len 1; hex 31; asc 1;;
 4: len 1; hex 80; asc  ;;

*** CONFLICTING WITH:
RECORD LOCKS space id 1074179 page no 171 n bits 208 index PRIMARY of table `matomo`.`matomo_option` trx id 4217788301 lock_mode X locks gap before rec
Record lock, heap no 100 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 30; hex 6148753743496e6b5f7265706f72745f746f5f696e76616c69646174655f; asc aHu7CInk_report_to_invalidate_; (total 50 bytes);
 1: len 6; hex 0000fb66578a; asc    fW ;;
 2: len 7; hex c400360c6c03c0; asc   6 l  ;;
 3: len 1; hex 31; asc 1;;
 4: len 1; hex 80; asc  ;;

RECORD LOCKS space id 1074179 page no 171 n bits 208 index PRIMARY of table `matomo`.`matomo_option` trx id 4217788301 lock_mode X locks gap before rec insert intention
Record lock, heap no 100 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 30; hex 6148753743496e6b5f7265706f72745f746f5f696e76616c69646174655f; asc aHu7CInk_report_to_invalidate_; (total 50 bytes);
 1: len 6; hex 0000fb66578a; asc    fW ;;
 2: len 7; hex c400360c6c03c0; asc   6 l  ;;
 3: len 1; hex 31; asc 1;;
 4: len 1; hex 80; asc  ;;

RECORD LOCKS space id 1074179 page no 171 n bits 208 index PRIMARY of table `matomo`.`matomo_option` trx id 4217788306 lock_mode X locks gap before rec
Record lock, heap no 100 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 30; hex 6148753743496e6b5f7265706f72745f746f5f696e76616c69646174655f; asc aHu7CInk_report_to_invalidate_; (total 50 bytes);
 1: len 6; hex 0000fb66578a; asc    fW ;;
 2: len 7; hex c400360c6c03c0; asc   6 l  ;;
 3: len 1; hex 31; asc 1;;
 4: len 1; hex 80; asc  ;;

*** (2) TRANSACTION:
TRANSACTION 4217788306, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 2
MariaDB thread id 152462, OS thread handle 140500430169664, query id 119330416 localhost 127.0.0.1 matomo Update
INSERT IGNORE INTO `matomo_option` (option_name, option_value, autoload) VALUES ('4g8H3ZUV_report_to_invalidate_18_2024-01-23_699547', '1', '0')
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1074179 page no 171 n bits 208 index PRIMARY of table `matomo`.`matomo_option` trx id 4217788306 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 100 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 30; hex 6148753743496e6b5f7265706f72745f746f5f696e76616c69646174655f; asc aHu7CInk_report_to_invalidate_; (total 50 bytes);
 1: len 6; hex 0000fb66578a; asc    fW ;;
 2: len 7; hex c400360c6c03c0; asc   6 l  ;;
 3: len 1; hex 31; asc 1;;
 4: len 1; hex 80; asc  ;;

*** (2) TRANSACTION:
TRANSACTION 4217788306, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 2
MariaDB thread id 152462, OS thread handle 140500430169664, query id 119330416 localhost 127.0.0.1 matomo Update
INSERT IGNORE INTO `matomo_option` (option_name, option_value, autoload) VALUES ('4g8H3ZUV_report_to_invalidate_18_2024-01-23_699547', '1', '0')
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1074179 page no 171 n bits 208 index PRIMARY of table `matomo`.`matomo_option` trx id 4217788306 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 100 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 30; hex 6148753743496e6b5f7265706f72745f746f5f696e76616c69646174655f; asc aHu7CInk_report_to_invalidate_; (total 50 bytes);
 1: len 6; hex 0000fb66578a; asc    fW ;;
 2: len 7; hex c400360c6c03c0; asc   6 l  ;;
 3: len 1; hex 31; asc 1;;
 4: len 1; hex 80; asc  ;;

*** CONFLICTING WITH:
RECORD LOCKS space id 1074179 page no 171 n bits 208 index PRIMARY of table `matomo`.`matomo_option` trx id 4217788301 lock_mode X locks gap before rec
Record lock, heap no 100 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 30; hex 6148753743496e6b5f7265706f72745f746f5f696e76616c69646174655f; asc aHu7CInk_report_to_invalidate_; (total 50 bytes);
 1: len 6; hex 0000fb66578a; asc    fW ;;
 2: len 7; hex c400360c6c03c0; asc   6 l  ;;
 3: len 1; hex 31; asc 1;;
 4: len 1; hex 80; asc  ;;

RECORD LOCKS space id 1074179 page no 171 n bits 208 index PRIMARY of table `matomo`.`matomo_option` trx id 4217788301 lock_mode X locks gap before rec insert intention
Record lock, heap no 100 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 30; hex 6148753743496e6b5f7265706f72745f746f5f696e76616c69646174655f; asc aHu7CInk_report_to_invalidate_; (total 50 bytes);
 1: len 6; hex 0000fb66578a; asc    fW ;;
 2: len 7; hex c400360c6c03c0; asc   6 l  ;;
 3: len 1; hex 31; asc 1;;
 4: len 1; hex 80; asc  ;;

RECORD LOCKS space id 1074179 page no 171 n bits 208 index PRIMARY of table `matomo`.`matomo_option` trx id 4217788306 lock_mode X locks gap before rec
Record lock, heap no 100 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 30; hex 6148753743496e6b5f7265706f72745f746f5f696e76616c69646174655f; asc aHu7CInk_report_to_invalidate_; (total 50 bytes);
 1: len 6; hex 0000fb66578a; asc    fW ;;
 2: len 7; hex c400360c6c03c0; asc   6 l  ;;
 3: len 1; hex 31; asc 1;;
 4: len 1; hex 80; asc  ;;
bx80 commented 9 months ago

Thanks for getting back with the outcome @lato333. unfortunate that change didn't fix it, but every bit of additional information is helpful :+1: I'm not sure there's much more we can do at the moment without scheduling some time to investigate and recreate the issue locally. So I'll assign this ticket to our product team for further prioritization.