matomo-org / plugin-QueuedTracking

Scale your large traffic Matomo service by queuing tracking requests (in Redis or MySQL) for better performance.
https://matomo.org
GNU General Public License v3.0

QueuedTracking lock stays locked for a long time #145

Closed danielsss closed 3 years ago

danielsss commented 3 years ago

I have multiple Matomo containers behind a load balancer.

Each Matomo container uses its own individual Redis instance as the request cache, but I've noticed that one of the workers has been locked for a long time. The same situation also occurs on other server nodes.

/var/www/html # ./console queuedtracking:lock-status
"QueuedTrackingLock2" is locked for 29955 ms
Set option --unlock=QueuedTrackingLock2 to unlock the queue.

"QueuedTrackingLock0" is locked for 57902 ms
Set option --unlock=QueuedTrackingLock0 to unlock the queue.

"QueuedTrackingLock1" is locked for 31219609 ms
Set option --unlock=QueuedTrackingLock1 to unlock the queue.

/var/www/html #

31219609 ms / 1000 / 60 / 60 ≈ 8.67 hours

/var/www/html # ./console queuedtracking:monitor
Queue is enabled
The command ./console queuedtracking:process has to be executed to process request sets within queue
Up to 4 workers will be used
Processor will start once there are at least 30 request sets in the queue
418940 (36486+382345+46+63) request sets left in queue. 868.42M used memory (869.23M peak). 4 workers active.    
/var/www/html #

The cron job runs every 30 seconds:

docker exec matomo sh -c "/var/www/html/console queuedtracking:process --queue-id=0" >> archive-report.log
docker exec matomo sh -c "/var/www/html/console queuedtracking:process --queue-id=1" >> archive-report.log
docker exec matomo sh -c "/var/www/html/console queuedtracking:process --queue-id=2" >> archive-report.log
docker exec matomo sh -c "/var/www/html/console queuedtracking:process --queue-id=3" >> archive-report.log
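
Note: standard cron only schedules at minute granularity, so a 30-second period is usually achieved with two entries per minute, the second delayed by sleep 30. A minimal sketch for one queue (paths taken from the commands above; the actual crontab used here is not shown), repeated analogously for the other queue IDs:

* * * * * docker exec matomo sh -c "/var/www/html/console queuedtracking:process --queue-id=0" >> archive-report.log
* * * * * sleep 30; docker exec matomo sh -c "/var/www/html/console queuedtracking:process --queue-id=0" >> archive-report.log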

I've disabled "Process during tracking request" and set "Number of queue workers" to 4.

I honestly have no idea how to deal with this. Please help!

danielsss commented 3 years ago

I also tried to delete the lock and process the requests manually, but that failed; the queue gets locked again.

./console queuedtracking:lock-status --unlock=QueuedTrackingLock1

./console queuedtracking:process --queue-id=1
danielsss commented 3 years ago

I tried to process the blocked queue multiple times and got the results below.

Executed commands:

./console queuedtracking:lock-status --unlock=QueuedTrackingLock1
./console queuedtracking:process --queue-id="1" -vvv

Result - 1:

[2021-01-14 11:06:29] piwik.ERROR: Fatal error encountered: /var/www/html/plugins/Monolog/Processor/ClassNameProcessor.php(73): Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes)  on /var/www/html/plugins/Monolog/Processor/ClassNameProcessor.php(73)  {"exception":{"type":1,"message":"Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes)","file":"/var/www/html/plugins/Monolog/Processor/ClassNameProcessor.php","line":73,"backtrace":" on /var/www/html/plugins/Monolog/Processor/ClassNameProcessor.php(73)\n"},"ignoreInScreenWriter":true} {"class":"Piwik\\FrontController","request_id":5269}

Result - 2:

[2021-01-14 11:16:00] piwik.DEBUG: Executing Piwik\Plugins\CustomDimensions\Tracker\CustomDimensionsRequestProcessor::processRequestParams()... [] {"class":"QueuedTracking","request_id":5775}

I didn't see anything like This worker finished queue processing with *req/s (* requests in * seconds).

tsteur commented 3 years ago

@danielsss it might help to increase the memory limit from 128MB to say 256 or 512MB.

Usually, the locks are released automatically, but if there's some failure like hitting the memory limit, the lock might not be released automatically. The locks usually last around 20-120 seconds I think (I haven't checked), but they can be held longer if you for example use bulk requests, e.g. by replaying requests from Log Analytics (in which case it would not be recommended to use queued tracking anyway).

I suggest you increase the memory limit in your PHP settings and check if there's still an issue.
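
A sketch of one way to raise the limit only for the queue worker run (the 512M value is illustrative; this assumes the console script is invoked with the same PHP CLI binary):

php -d memory_limit=512M /var/www/html/console queuedtracking:process --queue-id=1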

danielsss commented 3 years ago

@tsteur Thanks for your help.

Actually, bulk requests are what I use, and I can try splitting the bulk requests into single requests. (In my case, the client can't reach the internet sometimes.)

I'm wondering about the fatal error, because the memory limit should not be 128M: I already set the memory limit to 2048M (checked via the Matomo system check).

I still have a question: each Matomo server has its own individual Redis instance as the request cache (that is, I have multiple Redis nodes and they are not in a Sentinel or master/slave setup). Will this cause potential problems for Queued Tracking?

tsteur commented 3 years ago

Actually, bulk requests are what I use, and I can try splitting the bulk requests into single requests.

This should result in shorter lock times. Ideally, though, it would be great to fix the root cause, as long lock times shouldn't be an issue when the locks are released properly. It sounded to me like there is a memory issue, but maybe not. I'd suggest checking again, because sometimes e.g. a different limit is configured for web and CLI, or multiple PHP ini files exist, etc. The Matomo system check was probably run in the UI (web). On the CLI, where you execute the queued tracking, you can check using e.g. php -i | grep memory (on Linux).
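
For example (assuming a typical Linux setup where the web and CLI PHP may load different ini files):

php -i | grep memory_limit   # memory limit the CLI PHP binary will actually use
php --ini                    # ini files loaded by the CLI (may differ from the web SAPI)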

I still have a question: each Matomo server has its own individual Redis instance as the request cache (that is, I have multiple Redis nodes and they are not in a Sentinel or master/slave setup). Will this cause potential problems for Queued Tracking?

Yes, I think this can cause issues. I suppose you would need a queuedtracking:process running on every server, but the biggest problem might be that the requests are potentially not replayed in the right order. It is a bit difficult to explain, but say a visitor's requests end up split across the different Redis instances:

If, for example, the queuedtracking:process on server B with Redis B is faster, then e.g. a wrong "entry page" might be reported, as AFAIK Matomo would think the user entered on the page with URL /test2.

danielsss commented 3 years ago

@tsteur I appreciate your explanation.

As you suggested, I'm now using one Redis as a shared cache and everything is going well for now.

tsteur commented 3 years ago

@danielsss great to hear everything is going well for now. So can we close the issue for now?

danielsss commented 3 years ago

@tsteur I found something that might not be an issue, but it causes a large queue.

Example:

/var/www/html # ./console queuedtracking:monitor
Queue is enabled
The command ./console queuedtracking:process has to be executed to process request sets within queue
Up to 8 workers will be used
Processor will start once there are at least 15 request sets in the queue
180977 (180060+12+195+158+120+231+156+45) request sets left in queue. 380.96M used memory (381.71M peak). 3 workers active

The --queue-id=0 queue is not blocked, but queuedtracking:process takes at least ~2000 seconds for ~10000+ requests. I've also decreased the "Number of requests that are processed in one batch" setting from 15 to 1, and it then takes ~500 seconds for 5000+ requests.

This worker finished queue processing with 11.66req/s (5010 requests in 429.61 seconds)

So, I don't know why this situation only occurs on queue-id=0. (Maybe the issue is caused by my requests. Anyway, I'll keep an eye on it.)

Do you have any experience dealing with this?

tsteur commented 3 years ago

Processing only one request set at a time can be faster, especially if you have a lot of bulk requests in the queue (AFAIK each bulk request is counted as only one request set, even though one bulk request might contain many requests). This could also explain why it takes a while to process these requests.

Without knowing all the details, and without inspecting all the requests in the queue, it's unfortunately hard to come to any conclusions. Do a lot of the requests have the same userId or IP address? That would explain why many requests, or why some specific requests that cause trouble or take longer, go into queue ID 0.

danielsss commented 3 years ago

@tsteur The problem has been solved; you can close this issue now.

I've checked the tracking requests and indeed a lot of them have the same userId or IP address.

I've also made some adjustments so that the requests going into queue ID 0 no longer take so long to process.

tsteur commented 3 years ago

Great to hear @danielsss thanks for letting us know 👍