matomo-org / plugin-QueuedTracking

Scale your large traffic Matomo service by queuing tracking requests (in Redis or MySQL) for better performance.
https://matomo.org
GNU General Public License v3.0

Some issues seen during import of Apache logs #137

Open sgoeting opened 4 years ago

sgoeting commented 4 years ago

Hey,

I don't know if I'm in the right place, but in my eyes this is at the core of what we observed.

Lately we've been doing a lot of log imports into Matomo. We use the import_logs.py tool available in [matomo-path]/misc/log-analytics.

Initially we used the following command: ./misc/log-analytics/import_logs.py --url=[matomo host] --recorders=8 --replay-tracking [path-to-logfile]

As long as the log files were not too large, say less than 1 GB, the process went quite smoothly and we did not encounter any real problems. But the Apache log files keep getting bigger, and as they grew, we noticed that our QT processing (a cron job scheduled every minute) had increasing difficulty working through the queues. The QT processing queues grew (a lot). Normally, QT processing gets a grip on the situation, and the queues decrease and return to their normal state.
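The normal pattern described here (queues grow while the import outpaces the per-minute cron job, then drain once it stops) can be illustrated with a toy backlog model. This is a minimal sketch under stated assumptions: the rates are made up, `simulate_backlog` is a hypothetical helper, and it does not model QueuedTracking's real workers, so it cannot explain the delayed decrease reported next.

```python
def simulate_backlog(arrivals_per_minute, capacity_per_minute, start_backlog=0):
    """Return the queue length at the end of each minute.

    Toy model (an assumption, not QueuedTracking's actual scheduler):
    each minute the importer adds arrivals_per_minute[i] requests, then
    the cron-driven worker removes up to capacity_per_minute of them.
    """
    backlog = start_backlog
    history = []
    for arrivals in arrivals_per_minute:
        backlog += arrivals
        backlog -= min(backlog, capacity_per_minute)  # worker drains what it can
        history.append(backlog)
    return history


# Hypothetical rates: the import pushes 1500 requests/min for 3 minutes,
# then stops; the worker can drain 1000 requests/min.
print(simulate_backlog([1500, 1500, 1500, 0, 0], 1000))  # [500, 1000, 1500, 500, 0]
```

As long as arrivals exceed the drain rate the backlog keeps climbing; once the import stops it falls back to zero within a few cron runs.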

In the situation described above, QT processing still does that, but we also noticed some strange behavior. When the import had finished, we expected the queues to decrease soon after QT processing kicked in, but they didn't. The queues continued to grow and grow, and only after a few hours did they decrease. Something else also caught our attention: with the QT monitor running alongside, we saw that the queues grew while the memory footprint became smaller; see the attached screenshot.

[Screenshot: QT monitor during the import (Image Pasted at 2020-7-7 14-27)]

But now our questions:

  1. What is the relationship between QT processing and the log import? We considered them separate, unrelated processes.
  2. How is it possible that the QT queues grow while, at the same time, memory usage decreases?

Oh, for the record, we're now using a slightly different import command:

./misc/log-analytics/import_logs.py --url=[matomo host] --recorders=8 --replay-tracking --request-suffix=queuedtracking=0 [path-to-logfile]

Along with this command, we disabled the per-minute QT cron job and enabled per-request QT processing with a batch size of 1.
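The intent of `--request-suffix=queuedtracking=0` is to have every request the importer sends bypass the queue. The routing decision can be sketched as below; this is a hypothetical helper, not the plugin's code, and it assumes QueuedTracking skips the queue for any tracking request that carries `queuedtracking=0`. The host name is a placeholder.

```python
from urllib.parse import parse_qs, urlparse


def route_tracking_request(url):
    """Decide where a tracking request goes (hypothetical helper).

    Assumption: QueuedTracking bypasses the queue when the request
    carries queuedtracking=0; all other requests are queued.
    """
    params = parse_qs(urlparse(url).query)
    if params.get("queuedtracking") == ["0"]:
        return "direct"  # tracked immediately, no queue insert/delete
    return "queue"       # stored in Redis/MySQL until a worker processes it


print(route_tracking_request("https://matomo.example/matomo.php?idsite=1&rec=1"))
print(route_tracking_request("https://matomo.example/matomo.php?idsite=1&rec=1&queuedtracking=0"))
```

Under that assumption, imported log lines are tracked directly while regular visitor traffic still goes through the queue and is processed per request.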

This command and configuration do not show the side effect of growing QT processing queues. Can you confirm this observation?

tsteur commented 4 years ago

Are you using queued tracking for all your sites? We usually don't recommend using queued tracking together with the log importer, as there's not really any benefit to it. Instead, you're adding a lot of overhead.

If you do use them together, we usually recommend setting the "Number of requests to process" to 1: https://plugins.matomo.org/QueuedTracking#faq

Eventually we will bypass queued tracking entirely when the log importer is used: https://github.com/matomo-org/matomo-log-analytics/issues/240

sgoeting commented 4 years ago

Yes, we do use QueuedTracking for all our sites. I am curious, why is there no benefit in tracking them together?

As described, we ended up setting the "Number of requests to process" to 1 and using the per-request setting ("Process during tracking request") instead of our preferred QT cron job, because stopping QueuedTracking completely, or using the QT cron job in any form, showed the behavior I mentioned before. See the screenshot (I added it where it belongs). How should we interpret that behavior? In fact, that is our real question. Is it normal, or ...?

Just to be sure: we, a large government service, use version 3.3.5 of the QueuedTracking plugin.

tsteur commented 4 years ago

> I am curious, why is there no benefit in tracking them together?

That's because the log importer already does the queuing here. What basically happens is that one queue sends requests to another queue, and only then are the tracking requests inserted. For every tracking request, you get an additional insert, a delete, and potentially a few select database statements that are not needed. For this reason, we will eventually ignore queued tracking entirely when the log analytics importer is used.
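To put a rough number on the overhead described above, here is a minimal sketch that counts only the extra statements the queue adds. `queue_overhead_statements` is a hypothetical helper; the one-insert/one-delete cost follows the comment, while the number of selects per request is an assumption, since the comment only says there may be a few.

```python
def queue_overhead_statements(num_requests, selects_per_request=1):
    """Extra database statements caused by routing requests through the queue.

    Illustrative only: each queued request costs one insert (enqueue)
    and one delete (dequeue), plus selects_per_request selects; the
    default of 1 select is an assumption.
    """
    return num_requests * (1 + 1 + selects_per_request)


# A hypothetical import of one million log lines:
print(queue_overhead_statements(1_000_000))  # 3000000
```

These statements come on top of the normal tracking work, which is why double-queuing imported logs adds load without speeding anything up.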