matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

Transitions can be slow even when an index is added #14172

Open mikkeschiren opened 5 years ago

mikkeschiren commented 5 years ago

Even after applying the indexes that make the “Transitions” feature faster on high-traffic websites, Transitions can sometimes be slow to load.

I have not looked deeply into this, but is it possible to make Transitions use archived data instead of the logs? If it is not easily done, what would need to change to make this possible?

mikkeschiren commented 5 years ago

Use case: We have instances with a lot of logs, and we need to clean up the logs very often. We get all the other reports from archived data, but Transitions goes directly to the log tables, so after we clean up old logs we do not get any data for Transitions.

tsteur commented 5 years ago

It's not really planned currently, but it may be good to do at some point, especially now that the report is more exposed and could cause a lot more performance issues, as it's known to be potentially slow with lots of logs.

mikkeschiren commented 5 years ago

Ok, I will try to look into this in the near future.

mfb commented 5 years ago

We'd love to be able to generate Transitions reports from archived records. Our privacy policy requires that we archive/aggregate logs on a weekly basis, but we want to be able to analyze transitions from the past month or so.

mattab commented 5 years ago

We've discussed it internally and we are a bit worried about archiving the Transitions data, because it represents a lot of data to aggregate plus a lot of slow-running SQL queries to get this data daily. For each Page URL and Page title we'd need to store the previous 10-30 pages/events/referrers and the next 10-30. So that's a lot of string data (URLs) to store for each URL/page title on the site.
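To make the storage concern concrete, here is a minimal sketch of what such an aggregation could look like. It is not Matomo's archiving code: the input shape (one list of page URLs per visit, in viewing order), the function name `aggregate_transitions`, and the `top_n` cut-off are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def aggregate_transitions(visits, top_n=10):
    """Count previous and following pages for every page.

    visits: iterable of lists of page URLs, each list in viewing order
            (hypothetical input shape, not a Matomo data structure).
    top_n:  how many previous/following pages to keep per page.
    """
    previous = defaultdict(Counter)   # page -> pages seen just before it
    following = defaultdict(Counter)  # page -> pages seen just after it

    for pages in visits:
        # Walk consecutive pairs of pageviews within one visit.
        for before, after in zip(pages, pages[1:]):
            following[before][after] += 1
            previous[after][before] += 1

    all_pages = set(previous) | set(following)
    # Keep only the top_n entries in each direction for every page.
    return {
        page: {
            "previous": previous[page].most_common(top_n),
            "following": following[page].most_common(top_n),
        }
        for page in all_pages
    }
```

Even with the cut-off, the result holds up to `2 * top_n` URL strings for every page on the site, and it would have to be recomputed for each archiving period, which is the cost described above.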

Instead we have another idea: Maybe it would be possible for you to keep old RAW logs for a longer time, but make sure that the RAW logs you keep in Matomo are fully anonymised.

Maybe we could build an easier feature for "Full anonymisation" of the data, which would fully remove any potential personal data but still leave the actual pageview transitions data, so we could still process and report on Transitions using RAW data?
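As a rough sketch of that idea, assuming raw rows are plain dictionaries: the field names `visit_id`, `sequence` and `page_url` and the function `anonymise_rows` are hypothetical and do not correspond to Matomo's log tables or API. The point is only that the page order within a visit is enough to compute transitions.

```python
def anonymise_rows(rows):
    """Keep only what a transitions report needs from raw pageview rows.

    rows: iterable of dicts with at least the hypothetical keys
          "visit_id", "sequence" and "page_url"; any other keys
          (IP address, user ID, ...) are dropped.
    """
    key_map = {}  # original visit identifier -> opaque counter
    cleaned = []
    for row in rows:
        # Replace the original identifier with an opaque per-visit number
        # so that pageviews of the same visit can still be linked.
        visit_key = key_map.setdefault(row["visit_id"], len(key_map) + 1)
        cleaned.append({
            "visit_key": visit_key,
            "sequence": row["sequence"],
            "page_url": row["page_url"],
        })
    return cleaned
```

Note that page URLs can themselves carry personal data, for example in query parameters, so a real implementation would likely need to clean those as well.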

This was discussed in #12737 Enable "Super Privacy" mode to not track any personal data, aka "I do not want to be bothered with GDPR"

And also more partially in other issues: