matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0
19.83k stars 2.64k forks source link

Archive job is causing Deadlock errors for Web UI. Version 2.11.1 #7337

Closed esapozhnikov-wish closed 9 years ago

esapozhnikov-wish commented 9 years ago

Running Piwik version 2.11.1

During archive job runs from cron web interface becomes inaccessible.

Database is hosted in AWS RDS: MySQL 5.6.19a vCPU Number of virtual cores 4 vCPU MemoryMemory 15 GiB IOPS: 3000

During the archive runs we are seeing very high IOPS usage:

screen shot 2015-03-02 at 8 19 13 pm

Is there a way to throttle the archive job to not use up all of DB IOPS.

mattab commented 9 years ago

Hi @esapozhnikov can you paste the error messages that you may get?

(I think we could implement a "throttler" in the core:archive cron script, for example we could sleep for N milliseconds between each archiving climulti:request call)

esapozhnikov-wish commented 9 years ago

Here is a sample form httpd error log.

Mon Mar 02 07:25:20 2015] [error] [client 52.1.156.157] Error in Piwik (tracker): Error query: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction In query: UPDATE piwik_log_visit SET idvisitor = ?, visit_total_time = ?, visit_last_action_time = ?, visit_exit_idaction_url = ?, visit_total_actions = visit_total_actions + 1 , custom_var_k1 = ?, custom_var_v1 = ? WHERE idsite = ? AND idvisit = ? Parameters: array ( 0 => '\xd1\x1a\x8axd\x87]\x81', 1 => 42179, 2 => '2015-02-22 19:43:17', 3 => 503938, 4 => 'Not-Bot', 5 => 'ELB-HealthChecker/1.0', 6 => 4, 7 => 1159767, )

mattab commented 9 years ago

I think this issue is the same as #6398 but i'll leave it open for now

medic123de commented 9 years ago

@esapozhnikov : do you use misc/log-analytics/import_logs.py ? @mattab, the error message posted is not related to the outage of the frontend.

esapozhnikov-wish commented 9 years ago

@medic123de Yes we do.

medic123de commented 9 years ago

@esapozhnikov OK. This errormessage is related to #6398 , and eventually fixed with medic123de/piwik:1213-fix-innodb-transaction-deadlock ... Tests are still running but seems fine so far.

do you have other errors which are related to the posted issue?

esapozhnikov-wish commented 9 years ago

The way we are currently seeing the issue is that the Front end hosts get dropped from our Load balancer as they start failing to respond to the HTTPS health check during the archive job.

Main issue we are running into is that during the archive process the tracking pixel for live tracking become unavailable.

Currently I am testing the Redis Queued Tracking Plugin ins our staging environment to see if it will be able to keep up for live tracking during the High DB usage windows caused by the archive job.

Also orginally we were runing into the issue of getting deadlocks when running the import_logs.py script using more than one recorder. Our work around for this issue was to configure piwik to use MySQLi instead of PDO_PHP to connect to the db and this has allowed us to run 8 recorders at a time with no deadlock errors during the import jobs.

We are currently only seeing deadlocks during the archive job.

medic123de commented 9 years ago

From what i can see, the Archive job may worsen the situation due to longer lasting Transactions caused by a busy Database, which in turn aborts transactions then. But the archiving does not use transactions (yet). So from my point of view, this is only a side effect and worses #6398 only.

Therefore, we should see #6398 and this ticket as "non-related"

mattab commented 9 years ago

@medic123de I think you're right, so I'm closing it as duplicate of #6398 - if after we fix #6398 and anyone still has an issue, we could re-open this or create new issue. thanks