matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0
19.66k stars 2.62k forks source link

Research if Mysql Partitioning can be used for better performance? #6715

Open mattab opened 9 years ago

mattab commented 9 years ago

The goal of this issue is to research whether Mysql partitioning could be used in Piwik to provide better performance.

A few years ago this was not an option because they didn't support partitioning by datetime column. Now I see that they added the feature in Mysql 5.6. It could be helpful for partitioning tracker log_* tables by monthly partition. Sub-partitioning sounds also interesting, so maybe we could sub-partition log data by day.

toredash commented 5 years ago

Hope this information is useful; Running Matomo with AWS RDS Serverless is for us a cost-effective way to handle our dev-requirements for Matomo. We did encounter an issue where partitioning the tables would have helped our usecase.

In some rare cases, our serverless DB ran out of locale storage: SQLSTATE[HY000]: General error: 3 Error writing file '/rdsdbdata/tmp/MYB7XJVs' (Errcode: 28 - No space left on device). This if of course hard to fix when it is serverless :)

AWS Support gave us two options to solve this:

We opted for option 1 for now, option 2 would be nice to have upstream, and we might implement it but we would rather work on a as-close-to-upstream config as possible.

I'm not sure how option 2 this would impact write or read performance, since it probably depends on the storage backend and database engine. If the DB could read from multiple partitions when doing large queries I would imagine that would help the overall performance, given the storage backend is supportive of it.

tsteur commented 5 years ago

You could try to partition based on log_visit.visit_last_action_time and log_link_visit_action.server_time for example but we haven't run any tests yet. Maybe you could ask in the forum or check if other people already talked about it: https://forum.matomo.org

tsteur commented 1 year ago

Good post around this: http://mysql.rjweb.org/doc.php/partitionmaint