Open mattab opened 4 years ago
Hello, can multiple workers process the same queue-id? We have a situation where we originally had 4 workers processing 4 queues, but due to slowness in our setup those workers were not fast enough to process the queues, and now we have a ton of requests pending to be processed. Can we use multiple workers to process these specific 4 queues somehow? Thanks
+1
Forgot to mention, we are using Matomo 3.14.1
Hi @okossuth @danielsss
Multiple workers work on the same queue automatically if you don't set the queue-id option. However, they don't work on it at the very same time in parallel; they work on it one after another. It's not possible for multiple workers to work on the very same queue in parallel (only one after another), as otherwise the tracked data could end up wrong, and random visits with e.g. 0 actions could be created.
Just a suggestion: if we used LPOP, or better BLPOP, that would eliminate potential race conditions, allow use of only one shared queue, and allow unlimited workers to process the same queue with no need for complicated locking. It would also scale to any level. We had our workers stop for a day, and now we have a 60GB queue that we are trying to catch up with, but it's taking forever as only one worker can process each queue.
The main downside is that if the processing of the popped data fails, there are no retries. However, I don't think that's a big deal, and even if it is, you can work around it by adding the data back to the beginning of the list, or into a failed queue.
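For what it's worth, the atomic-pop pattern suggested above can be sketched in a few lines. This is a minimal simulation using Python's thread-safe `queue.Queue` as a stand-in for a single shared Redis list consumed with LPOP/BLPOP; it deliberately ignores Matomo's per-visit ordering constraint and is not part of the plugin:

```python
import queue
import threading

def drain(shared, processed, lock):
    """Worker loop: atomically pop items until the queue is empty."""
    while True:
        try:
            item = shared.get_nowait()  # atomic pop, like Redis LPOP
        except queue.Empty:
            return
        with lock:
            processed.append(item)

# Fill one shared queue with 1000 pending "tracking requests".
shared = queue.Queue()
for i in range(1000):
    shared.put(i)

processed = []
lock = threading.Lock()

# Any number of workers can drain the same queue without extra locking,
# because each pop is atomic: no item is ever handed to two workers.
workers = [threading.Thread(target=drain, args=(shared, processed, lock))
           for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()

assert sorted(processed) == list(range(1000))
```

Every item is processed exactly once regardless of the worker count, which is why no per-queue lock is needed in this model.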
Thanks @uglyrobot. The problem is less about Redis and more about Matomo and how it tracks data. There's a related issue in core, e.g. https://github.com/matomo-org/matomo/issues/6415. Basically, if two workers were to work on the same queue, and one worker processed the second tracking request of a visit slightly faster than another worker processed the first tracking request, Matomo could store wrong data in its database and sometimes even create multiple visits.
I seem to be having an issue with the following command, which is stopping me from executing this correctly:
./console queuedtracking:process --queue-id=X
When running ./console queuedtracking:process --queue-id=0 specifically for queue-id=0, it doesn't work; I get this error:
ERROR [2020-07-06 09:10:58] 4700 Uncaught exception: C:\inetpub\wwwroot\vendor\symfony\console\Symfony\Component\Console\Input\ArgvInput.php(242): The "--queue-id" option requires a value.
It works fine for "./console queuedtracking:process --queue-id=1".
Is this a known issue, or am I doing something incorrectly?
@StevieKay90 could you send us the output of your system check? See https://matomo.org/faq/troubleshooting/how-do-i-find-and-copy-the-system-check-in-matomo-on-premise/ The output should be anonymised automatically.
Hi Thomas, I've just found out that if you run ./console queuedtracking:process --queue-id=00 it works. Good help from the community!
One thing which is vexing me, though, is why queue 0 seems to be the most full; it's not evenly distributing the load. The other queues have just a handful of requests in them, but queue 0 has over 200. Is there a way to stop this?
Thanks for this. I still can't reproduce it just yet. @sgiehl any chance you have a Windows machine running Matomo and can try to reproduce this? I'm wondering if it's maybe Windows related.
@tsteur don't have a matomo running directly on windows. But I could check if my Windows VM where I had set this up once is still running. But I guess it's already outdated and I would need to set it up again. Let me know if it's important enough to spend time on it.
Hi all @tsteur @sgiehl thanks for taking a look into this
As you can see, it's quickly becoming a big problem here for me; I'm going to have to stop queued tracking.
This has happened since the upgrade; previously I hadn't run into this issue. Any interim advice would be great.
@StevieKay90 could you remove the queue-id parameter? Then the requests in the first queue should get processed.
@tsteur I have done; I'm not using the command line at all now, I'm using the "Process during tracking request" option. It just seems to heave the vast majority of requests into one queue, and as it's one worker at a time, it can't handle all the requests in the queue.
@StevieKay90 it will likely catch up and process these requests. If it otherwise consistently pushes more requests into the first queue, that might be because a lot of the requests are coming from the same IP address, or a lot of them use the same visitorId or userId (if the userId feature is used). It's possible that the visits in queue 0 simply weren't processed in the past because of the error you were getting.
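To illustrate why requests from the same visitor pile up in the same queue: requests are assigned to a queue by hashing an identifier (such as the visitor ID) modulo the number of queues, so all requests of one visitor land in the same queue and keep their order. The specific hash below (`crc32`) is only an assumption for illustration; the plugin's actual implementation may differ:

```python
from zlib import crc32

NUM_QUEUES = 4  # matching the 4-queue setup discussed in this thread

def queue_id(visitor_id: str) -> int:
    # Hash the visitor ID so every request of the same visitor always
    # lands in the same queue, preserving per-visit ordering.
    # Illustrative only; the plugin's real hash function may differ.
    return crc32(visitor_id.encode()) % NUM_QUEUES

# The same visitor always maps to the same queue:
assert queue_id("abc123") == queue_id("abc123")
```

A consequence of this design is that a handful of very active visitor IDs (or bulk imports reusing one ID) can make one queue much fuller than the others.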
BTW, you could maybe also try --queue-id="0"; not sure if that makes a difference on Windows.
@tsteur the command --queue-id=00 seems to work on Windows to process queue 0. However, the problem I'm now suffering from is way deeper (I thought this was the issue, like you, but now I don't think it is). Previously, not stating an ID did actually process queue 0; it's just that a) queue 0 seemed to be much bigger, and b) write speed got really slow as the Redis DB grew, taking something like 2 minutes for 500 records. I've got pretty high-spec servers, so that was surprising. It could never clear it all, and it reached massive levels until Redis choked. So now I'm wondering: was it an error in the upgrade or a software config thing?
@tsteur OK, I've done some research and have some very interesting findings!
Forcing queue ID 0: This worker finished queue processing with 3.2 req/s (150 requests in 46.91 seconds)
Forcing queue ID 1: This worker finished queue processing with 39.01 req/s (125 requests in 3.20 seconds)
Forcing queue ID 2: This worker finished queue processing with 42.12 req/s (150 requests in 3.56 seconds)
Forcing queue ID 3: This worker finished queue processing with 38.92 req/s (125 requests in 3.21 seconds)
Forcing queue ID 4: This worker finished queue processing with 44.05 req/s (100 requests in 2.27 seconds)
Forcing queue ID 5: This worker finished queue processing with 39.85 req/s (125 requests in 3.14 seconds)
So it's not that more requests are being routed to queue ID 0; it's just that the processing time of this specific queue is incredibly slow in comparison to the others!
UPDATE
I now opted for 16 workers, as I figured that the relative speed of the other 15 would counterbalance that of the slow-moving queue 0.
However, now queue 0 is performing a lot better (figuratively speaking, at about 12-20 req/s), but queue number 6 is now the naughty boy! There was nothing especially wrong in the verbose process output when I processed this queue manually, other than the fact that it was slow and I could read most of the lines as they went by, when normally it's just a black and white fuzzy blur.
@StevieKay90 any chance you're using our log analytics, for example, to track / import data? That would explain why more requests go into the first queue and why it's slower, since every request might consist of multiple tracking requests. Or, in case you do custom tracking with bulk tracking requests, that would explain it too.
That another queue might now have more entries would likely be expected if you're not using the regular JS tracker. It would be great to know how you track the data @StevieKay90
Thanks for the response thomas. All data is from the regular JS tracker.
It looks like I'm going to have to return to Matomo 3 to check whether it was the upgrade which changed the queued tracker process.
Currently, with QT switched on, I eventually get a pool of data in a queue which can't be cleared fast enough, and without QT I get a lot of strain on the DB server.
Let us know how you go with the downgrade to Matomo 3. Generally, there wasn't really any change though in queued tracking so I don't think it would make a difference. Be interesting to see though.
@tsteur out of interest, is QueuedTracking compatible with PHP 8?
AFAIK it should be @StevieKay90
Hi,
we are using QueuedTracking on 3 frontend servers, each with 24 cores, and a backend DB + Redis server with 128 cores and 1 TB RAM. We are tracking a single website with billions of monthly pageviews. The DB has little workload; Redis uses ~24% of a CPU core.
Having 16 queues, 10 requests per batch, processing 6 queues on the 1st frontend and 5 queues on the second and third frontends, each queue processor is hitting ~80% CPU, but the frontend servers still have spare CPU power. Is it possible to increase the number of queues beyond 16 to get even more performance? We have already written start scripts for the queue processors so that they immediately restart after reaching NumberOfMaxBatchesToProcess and do not wait for cron to restart them for the remaining seconds until the full minute.
Do you have any other advice to increase QueuedTracking capacity here?
Hi @bitactive. I'm sorry you're experiencing issues. Sadly, 16 is currently the maximum number of queues supported. You could try adjusting the number of requests processed in each batch. I believe that the default is 25. Any other recommendations @AltamashShaikh ?
@bitactive We would recommend increasing the number of requests per batch here.
@snake14 @AltamashShaikh Increased the number of requests from 10 per batch to 25 per batch. Now each of the 16 workers is at ~80% CPU, and total throughput (processed requests per second) increased by ~15%. Still not able to process the queue in realtime during peak hours with 16 workers, each at 80% CPU on 3.8 GHz cores.
What are further possible steps to increase efficiency, e.g. by an additional 100%? We track one big website and have nearly unlimited resources for this (machines / CPU cores / memory).
@bitactive What if you change the number of requests to 50?
@snake14 @AltamashShaikh Changing requests per batch to 50 gives another 10-15% throughput increase. Will try 100 soon as traffic increases.
In the meantime, I have another question about this configuration.
If I wanted to add a second big project to this Matomo instance, is it possible to configure it so that, for example, Matomo project #1 will use Redis queue 0 and Matomo project #2 will use Redis queue 1, and then run 16 workers for queue 0 and 16 workers for queue 1?
As far as I know, different Matomo projects can be processed independently, so it should be possible to direct requests from one project to one Redis queue and from the second project to another Redis queue, and then process them independently by another 16 workers?
Hi @bitactive. I'm glad that helped. As far as I can tell, each Matomo instance would need a separate Redis database. Can you confirm, @AltamashShaikh?
@bitactive You can specify the database if you want to use the same Redis for 2 instances
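To illustrate the idea of sharing one Redis server between two instances (the host, port, and database numbers below are placeholders, not real config from this thread):

```
# Hypothetical example: two Matomo instances sharing one Redis server,
# isolated from each other by Redis database index (configurable in
# each instance's QueuedTracking plugin settings)
Instance A: Redis host 127.0.0.1, port 6379, database 0
Instance B: Redis host 127.0.0.1, port 6379, database 1
```

Each Redis database has its own keyspace, so the two instances' queues cannot collide.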
Here are some notes I wrote earlier; I thought it might be useful to put them in the FAQ, maybe?
How do I set up QueuedTracking on multiple tracking servers?
Say you have 4 frontend servers and 8 queues configured. Then on each of your 4 frontend servers, you need to run:
./console queuedtracking:process --queue-id=X
Where X is the queue ID. Each server handles 2 queues, so the 4 servers handle the 8 queues.
Queue ID starts at 0.
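As a sketch of how this could be wired up (the install path, cron schedule, and user are assumptions; adjust them to your installation), frontend server 1 handling queues 0 and 1 might use cron entries like:

```
# /etc/cron.d/matomo-queuedtracking on frontend server 1 (queues 0 and 1);
# servers 2-4 would use queue IDs 2-3, 4-5 and 6-7 respectively.
* * * * * www-data /path/to/matomo/console queuedtracking:process --queue-id=0 >/dev/null 2>&1
* * * * * www-data /path/to/matomo/console queuedtracking:process --queue-id=1 >/dev/null 2>&1
```

If a worker for a given queue is still running, the next invocation for that queue simply won't process it in parallel, per the locking behaviour described earlier in this thread.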
Notes:
Run ./console queuedtracking:monitor to track the state of the queues.