matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

Upper tracking request limit for a pageview to prevent flooding server/db #19132

Open samjf opened 2 years ago

samjf commented 2 years ago

Summary

Oftentimes we come across a tracking bug or bots that continually cause tracking requests to be sent. This can flood the server and slow down the DB. I would expect some realistic upper limit in the client JS that stops a single visitor from generating too many requests per page view.

It may be wise to implement an upper limit in the client JS so that such bugs or bots can't cause this issue so easily. For example, the limit could be 5000 requests within a virtual page view (virtual page view meaning the counter is reset as soon as someone calls trackPageView).

MichaelRoosz commented 2 years ago

To protect your SQL database and insert tracking data into it at a constant rate, you may use https://plugins.matomo.org/QueuedTracking (requires a Redis server).

To block bots and set some limits you may use https://plugins.matomo.org/TrackingSpamPrevention

sgiehl commented 2 years ago

@MichaelRoosz I guess the issue was more about the client side: the JavaScript tracker would count the requests already sent on a given page and, once that count reaches a limit, drop any further tracking requests. That way they would not even reach the server.
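To make the idea concrete, here is a minimal sketch of such a per-page counter. It is only an illustration: `PageViewRequestLimiter`, `tryAcquire`, and `resetForNewPageView` are hypothetical names, not part of Matomo's tracker, and the default of 5000 is simply the figure suggested earlier in this issue.

```javascript
// Hypothetical sketch: none of these names exist in Matomo's tracker API.
class PageViewRequestLimiter {
  constructor(maxRequestsPerPageView = 5000) {
    this.maxRequestsPerPageView = maxRequestsPerPageView;
    this.sentCount = 0;    // requests allowed since the last reset
    this.droppedCount = 0; // requests refused since the limiter was created
  }

  // Call when a (virtual) page view starts, e.g. alongside trackPageView.
  resetForNewPageView() {
    this.sentCount = 0;
  }

  // Returns true if the request may be sent, false if it should be dropped.
  tryAcquire() {
    if (this.sentCount >= this.maxRequestsPerPageView) {
      this.droppedCount += 1;
      return false;
    }
    this.sentCount += 1;
    return true;
  }
}

// Usage: with a limit of 3, the fourth request on the same page is dropped.
const demoLimiter = new PageViewRequestLimiter(3);
const demoResults = [1, 2, 3, 4].map(() => demoLimiter.tryAcquire());
console.log(demoResults, demoLimiter.droppedCount);
```

A tracker would call `tryAcquire()` right before sending each request and skip the send when it returns false, so dropped requests never leave the browser.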

samjf commented 2 years ago

@sgiehl Yes, this was what I was imagining.

An example of what I'm trying to avoid is something like 7000 form submissions in a single pageview by a single visitor. That doesn't seem like realistic usage of valid tracking.

I wonder whether implementing this could cause trouble for SPA sites? Though I would suspect that they track virtual pageviews, which could reset the limit.
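As a rough illustration of the SPA case, the sketch below wraps a tracker-like object so that every `trackPageView` call starts a fresh budget. `limitTracker` and the fake tracker are hypothetical, written only for this example. The real tracker would need the check inside its own request-sending code, and whether the page view itself should count toward the limit is an assumption made here.

```javascript
// Hypothetical sketch: `limitTracker` is not a Matomo API.
function limitTracker(tracker, maxRequestsPerPageView) {
  let sent = 0;
  return {
    trackPageView(...args) {
      // A new (virtual) page view resets the counter; the page view
      // request itself is counted as the first request (an assumption).
      sent = 1;
      tracker.trackPageView(...args);
    },
    trackEvent(...args) {
      if (sent >= maxRequestsPerPageView) {
        return false; // over the limit: drop without sending
      }
      sent += 1;
      tracker.trackEvent(...args);
      return true;
    },
  };
}

// Fake tracker that only records what would have been sent.
const sentRequests = [];
const fakeTracker = {
  trackPageView: (title) => sentRequests.push(`pageview:${title}`),
  trackEvent: (category, action) => sentRequests.push(`event:${category}/${action}`),
};

const limited = limitTracker(fakeTracker, 3);
limited.trackPageView("/home");
for (let i = 0; i < 5; i++) {
  limited.trackEvent("form", "submit"); // only the first two get through
}
limited.trackPageView("/cart");         // SPA route change resets the budget
limited.trackEvent("form", "submit");   // allowed again
console.log(sentRequests.length);
```

Under these assumptions, an SPA that calls `trackPageView` on each route change would only hit the limit if a single view misbehaves.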