matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps, visualise it, and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on GitHub? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

Show Warning if Deleting Old Data Does Not Work #18839

Open mritzmann opened 2 years ago

mritzmann commented 2 years ago

I have a Matomo installation that became slower and slower over time. The cause turned out to be that the table matomo_log_visit contained several years of raw data and had grown to several GB, which noticeably slowed reads of that table. In the settings, Regularly delete old raw data is set to 30 days, and a cron job is set up.
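
For anyone hitting the same slowdown, the raw data table sizes can be checked directly in MySQL. A quick sketch; the matomo schema name and the matomo_ table prefix are just what my installation uses:

$ mysql -u matomo -p -e "
    SELECT table_name,
           ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
    FROM information_schema.tables
    WHERE table_schema = 'matomo' AND table_name LIKE 'matomo_log_%'
    ORDER BY size_mb DESC;"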

It turned out that the cron job ran out of memory every time during Tasks.deleteLogData.

Matomo displays a notification in the backend when archiving via cron job fails. However, when a scheduled task fails, this is not shown anywhere. My wish would be for Matomo to proactively display failed tasks and notify the user that they have not completed. The ScheduledReports plugin does not show the error either.
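
In the meantime, the only place I found to see when tasks are scheduled to run is the scheduler timetable stored in the option table. A quick look at my own database; the option name is taken from my installation and may differ between versions:

$ mysql -u matomo -p matomo -e "
    SELECT option_value FROM matomo_option
    WHERE option_name = 'TaskScheduler.timetable';"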

Summary

Your Environment

MatomoForumNotifications commented 2 years ago

This issue has been mentioned on Matomo forums. There might be relevant details there:

https://forum.matomo.org/t/high-traffic-sites-and-custom-period-date-ranges-selection/44717/2

sgiehl commented 2 years ago

@mritzmann Thanks for creating the issue. What is your memory limit set to? Just wondering if your limit is quite small or if the memory consumption of our script is very high in that case.

mritzmann commented 2 years ago

Hello @sgiehl, thank you very much for your reply.

Which size is your memory limit set to?

The PHP memory limit was set to 4G.

$ cat php.ini | grep memory_limit
memory_limit = 4G

$ php -r 'echo ini_get("memory_limit");'      
4G
sgiehl commented 2 years ago

Ok. That should indeed be enough to clean up the data. I guess we need to investigate how our code to remove old visits currently works. Maybe it tries to query all data before removing it, or has a memory leak we should fix. ping @tsteur

tsteur commented 2 years ago

Had a quick look at the code: the delete log data task should use very little memory, as we delete at most 2000 visits at a time and shouldn't hold much in memory. On our own instance we have never seen any memory issues there.
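
For context, the idea is a loop of bounded deletes, so memory stays flat no matter how large the table is. A simplified sketch of that pattern, not the actual Matomo code; the table, column, and retention window here are illustrative:

while :; do
    rows=$(mysql -N matomo -e "DELETE FROM matomo_log_visit
        WHERE visit_last_action_time < NOW() - INTERVAL 30 DAY LIMIT 2000;
      SELECT ROW_COUNT();")
    [ "$rows" -eq 0 ] && break  # stop once no old visits remain
done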

This may also trigger the log_actions cleanup though. @mritzmann could you add the entry below to your config/config.ini.php and see if it still happens? This should delay the log_action cleanup by many days, effectively disabling it:

[Deletelogs]
delete_logs_unused_actions_schedule_lowest_interval = 9000000
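
Once added, you can trigger the scheduled tasks manually from the Matomo root to check whether the out-of-memory error comes back (assuming the standard console entry point and that the --force flag is available in your version):

$ ./console core:run-scheduled-tasks --force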
mritzmann commented 2 years ago

I am sorry, but I can no longer reproduce the problem, as I have since deleted several GB of raw data (to solve the problem described above). The cron job is currently running successfully for me. Therefore, it probably doesn't make sense for me to check settings like delete_logs_unused_actions_schedule_lowest_interval.

But I still think some kind of monitoring check (for example, a notification in the web UI) would be valuable. This is less a bug and more a feature request.
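
Until something like that exists in the UI, a workaround is to make the cron job itself alert on failure, e.g. a thin wrapper that mails the output on a non-zero exit code. A rough sketch; the paths, the mail command, and the recipient are placeholders for whatever the host provides:

#!/bin/sh
# Wrapper for the Matomo cron: capture output and alert when the run fails.
OUTPUT=$(/usr/bin/php /var/www/matomo/console core:archive 2>&1)
if [ $? -ne 0 ]; then
    echo "$OUTPUT" | mail -s "Matomo cron failed" admin@example.com
fi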

randy-innocraft commented 2 months ago

Hi @mritzmann, thank you for bringing this to our attention and for your valuable input. Your suggestion sounds like a worthwhile enhancement to our product. We will forward it to our Product team for review and future consideration. If you have any additional details or questions, please feel free to share them here.