matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0
19.92k stars 2.66k forks source link

Performance Problem with matomo_session Table #22767

Open enual opened 4 days ago

enual commented 4 days ago

A user reported that they are using Matomo with the Premium plugins and looking at their DB performance stats. They have noticed one issue that seems to be caused by the matomo_session table.

It seem to be deleting the ongoing session with the query:

DELETE FROM matomo_session WHERE modified + lifetime < ?

The problem is that these fields are not indexed and the use of a calculated query (the use of sum) means they would not be used anyway. And this causes a full table scan relatively regularly and that makes it appear in the list of most expensive queries: image (21)

They also mentioned that it causes the insert cost to be non-negligible (lock contention with the ongoing delete?).

Additional feedback:

_It possibly would be an idea to change the "modified" to an "expired" one. That way you could still have a changeable "lifetime", have the "expiry" pre-calculated and updating it would just be a story of setting it to "now+lifetime"...

Then you would just have to add an index for the "expiry" field. In that scenario the delete cost would come to almost nothing. And the insert cost would not change that much either (there would be the index cost but on the other hand there would be less waiting for lock)._

They are using the Queued Plugin, meaning these effects are not visible from the clients but the DB strain is still there.

gcompienne-cafex commented 3 days ago

Another point I have just realised (but not verified yet), is that this could perhaps explain why sometimes the Redis queue grows very quickly and adding workers seems to have little effect (at least until the tracking events spike calms down):

So perhaps another reason why the platform will likely benefit from seeing that query optimised.

sgiehl commented 3 days ago

Adding an index could possibly speed up queries on that table. But to be honest, I'm wondering what is happening on your installation, that there are so many records in that table 🤔
The session table stores data for sessions logged in to matomo. So this should be unrelated to tracking and even if that table might be locked by delete queries for any reason, this should not affect tracking or queuedtracking. Are you having anonymous access to your instance enabled?

gcompienne-cafex commented 3 days ago

The session table stores data for sessions logged in to matomo

Do you mean the session table is only used when a user accesses the matomo web-ui?

I am asking because the majority of our users are authenticated and we do provide it to the analytics via "setUserId". So I wonder if that could cause the creation of the sessions 🤔

Are you having anonymous access to your instance enabled?

Do you mean "is there anonymous access to the Matomo web-ui"? The answer is no. But we have many virtual websites and many of the authenticated users can access the stats of the virtual websites that have been assigned to them.

sgiehl commented 2 days ago

Do you mean the session table is only used when a user accesses the matomo web-ui?

Yes, exactly. Tracking should not cause any records in the session table.

I am asking because the majority of our users are authenticated and we do provide it to the analytics via "setUserId". So I wonder if that could cause the creation of the sessions 🤔

No. setUserId will only have a affect on how something is tracked, but won't start a session in Matomo.

Do you mean "is there anonymous access to the Matomo web-ui"? The answer is no. But we have many virtual websites and many of the authenticated users can access the stats of the virtual websites that have been assigned to them.

I meant the feature to enable anonymous access to a Matomo site. That allows anyone to view statistics without being logged in. It's disabled by default.

How many concurrent users does your Matomo instance have? And how many records are stored in the session table?

gcompienne-cafex commented 2 days ago

The matomo_session table currently contains 713,192 records. From the looks of it, I would say they mostly get deleted after 14 days by the delete query (lifetime 1209600).

This is on a test system (staging), on which 22 Matomo users are currently defined. Now some of the tests on that system are driven by EndTest, so I am starting to wonder if one of the tests is currently just loading the login page to test that the system is up and that could perhaps trigger the creation of the session.

Still, this is a bit concerning for us as the plan is to have many virtual websites (tracked by the Matomo instance) and allow many (authenticated) users (per virtual website) to check their virtual website stats.

It is good to know that this should not affect the tracking or queuedtracking at all, but the potential of the matomo-session table causing DB overload is still a concern (could easily be turned in to a DoS amplifier I would suspect).