matomo-org / plugin-QueuedTracking

Scale your large traffic Matomo service by queuing tracking requests (in Redis or MySQL) for better performance.
https://matomo.org
GNU General Public License v3.0
82 stars · 35 forks

High Traffic Handling Issues with Queue Processing #252

Open newjar opened 1 month ago

newjar commented 1 month ago

Hello, we are facing issues with Matomo during high traffic when processing data from the redis-queue into the database. We are currently using PHP v8.3.8 and Matomo v5.1.0. On average, our website receives about 33 million pageviews per day, which amounts to approximately 1 billion pageviews per month. During peak seasons, this can surge to 6 million hits per minute, which means up to 100K hits per second at peak times.

We are using a total of 5 machines for Matomo, including the database.

We have already tuned PHP, Redis, and the Database according to Matomo’s recommendations:

  1. https://matomo.org/faq/new-to-piwik/faq_134/
  2. https://matomo.org/faq/on-premise/how-to-configure-matomo-for-speed
  3. https://matomo.org/faq/new-to-piwik/faq_137

For the Queued Tracking plugin, we have disabled the "process during tracking request" option, set the total number of workers to 16, and set the batch size to 50. For queuedtracking:process, we created a cron job for each queue ID (16 jobs in total).

In an experiment with 500K requests in the queue (16 workers), queuedtracking:process was only able to process about 100K every 2 minutes from Redis. This is still far from our expectations, as we hoped to process at least 5 times that amount. Moreover, with a load test of 15,000 users per second, queuedtracking:process couldn't keep up: the data in Redis continued to grow, leading to a maxmemory error on our Redis server.
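To put the gap in numbers, here is a quick back-of-envelope calculation (a minimal sketch; both rates are taken from the figures reported above):

```python
# Back-of-envelope on the queue throughput gap, using the figures above.

drain_rate = 100_000 / 120        # ~833 req/s processed (100K every 2 minutes)
peak_ingress = 15_000             # load test: 15,000 users per second

backlog_growth = peak_ingress - drain_rate   # net queue growth per second
print(f"drain rate:     {drain_rate:.0f} req/s")
print(f"backlog growth: {backlog_growth:.0f} req/s at peak")
# At roughly 14K unprocessed requests/s, the Redis queue keeps growing
# until it hits the configured maxmemory limit.
```
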

FYI: We have spent $2804/year on purchasing additional plugins for Matomo.

Is there anyone here who could help us out? If more information is needed for analysis, please feel free to ask.

Thank you for your help.

snake14 commented 1 month ago

Hi @newjar. Thank you for taking the time to create this issue, and I'm sorry to hear you're running into it. Have you tried lowering the batch size and increasing the frequency at which the cron runs queuedtracking:process?

Any other recommendations @AltamashShaikh ?

newjar commented 1 month ago

> Have you tried lowering the batch size and increasing the frequency in which the cron runs queuedtracking:process?

Yes, we have tried reducing the batch size to 25, and the cron job runs every minute. However, it still couldn't keep up with the load test of 15,000 users per second.

AltamashShaikh commented 1 month ago

@newjar Do you know how many requests are being processed with a batch size of 25?

newjar commented 1 month ago

> Do you know how many requests are being processed with a batch size of 25?

Here is the data for the batch size of 25:

```
$ php console queuedtracking:process
Starting to process request sets, this can take a while
This worker finished queue processing with 173.58req/s (12525 requests in 72.16 seconds)
```

newjar commented 1 month ago

Is it possible for us to create 2 queued tracking in the same Redis but in different databases? If so, is there a possibility that the received data may become invalid?

Example: Matomo instance-A in redis database 0 and Matomo instance-B in redis database 1

haristku commented 1 month ago

> Do you know how many requests are being processed with a batch size of 25?
>
> Here is the data for the batch size of 25:
>
> This worker finished queue processing with 173.58req/s (12525 requests in 72.16 seconds)

@AltamashShaikh is it normal to get around 173 RPS per run on a 72-CPU box?

Has anyone ever benchmarked queuedtracking:process RPS?

AltamashShaikh commented 1 month ago

@haristku We did performance testing initially; you can see the results in our FAQ:

> How fast are the requests inserted from Redis to the Database?
>
> This very much depends on your setup and hardware. With fast CPUs you can achieve up to 250req/s with 1 worker, 400req/s with 2 workers, and 1500req/s with 8 workers (tested on an AWS c3.x2large instance).

> How should the redis server be configured?
>
> Make sure to have enough memory to save all tracking requests in the queue. One tracking request in the queue takes about 2KB; 20,000 tracking requests take about 50MB. All tracking requests of all websites are stored in the same queue. There should be only one Redis server to make sure the data will be replayed in the same order as they were recorded. If you want to configure Redis HA (High Availability), it is possible to use Redis Sentinel (see further down). We currently write into the Redis default database by default, but you can configure it to use a different one.
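Given the FAQ's per-request figure, the memory needed for a given backlog is easy to estimate. A minimal Python sketch (the ~2KB figure is the one quoted from the FAQ; everything else is illustrative):

```python
# Estimate Redis memory needed for a queue backlog, using the FAQ's
# ~2KB-per-queued-tracking-request figure.
BYTES_PER_REQUEST = 2 * 1024

def queue_memory_mb(num_requests: int) -> float:
    """Approximate queue size in MB for a given number of queued requests."""
    return num_requests * BYTES_PER_REQUEST / (1024 ** 2)

# The 500K-request experiment reported earlier in this thread needs
# roughly 1GB of Redis memory just for the queue:
print(f"{queue_memory_mb(500_000):.0f} MB")
```
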

AltamashShaikh commented 1 month ago

> Is it possible for us to create 2 queued tracking in the same Redis but in different databases? If so, is there a possibility that the received data may become invalid?
>
> Example: Matomo instance-A in redis database 0 and Matomo instance-B in redis database 1

@newjar Yes, you can specify the Redis database separately for each Matomo instance:

(screenshot: the plugin's Redis database setting)

AltamashShaikh commented 1 month ago

@newjar @haristku You can also check this link; it states that with a higher number of requests and workers, you should lower the number of requests per batch to reduce the wait time.

haristku commented 1 month ago

Hi @AltamashShaikh, thanks for the reply.

My personal record was 3200 RPS with 16 workers running a batch size of 100 on a 72-CPU, 64 GB RAM box. According to this link:

> With fast CPUs you can achieve up to 250req/s with 1 worker, 400req/s with 2 workers and 1500req/s with 8 workers

That means I have matched the numbers stated there, but I need to reach 100,000 RPS at any cost.

I have tried various batch sizes (5, 10, 25, 50, 100, 500, 1000), but the results are similar: I cannot exceed 3200 RPS. And what makes me pull out my remaining hair is that this link recommends increasing the number:

> @bitactive We would recommend to increase the no of requests here

So should I increase or decrease it?

When I got 3200 RPS, the CPU and RAM utilization was still very low on each box (85% idle), meaning there were still a lot of unused resources; I want at least 70% of the resources to be used efficiently. I've tried various methods and tunings to exceed 3200 RPS, but unfortunately didn't succeed.

Another question: I did a little profiling with Xdebug and got this (cachegrind screenshot):

A total of 18.63% of the time is spent in Piwik\Common::printDebug, and I didn't turn on logging or debugging in my config.ini.php. Why does it keep calling printDebug?

If I modify the code to add an early return; there, the whole queuedtracking:process run is slightly faster.

Do you have any other recommendations on how to reach 100,000 RPS with queuedtracking:process?

Thank you for your help.

AltamashShaikh commented 1 month ago

@haristku I'll ask internally whether we have had a customer using QueuedTracking in real time with this big a QPS.

Also, are these requests from a single website or multiple websites?

haristku commented 1 month ago

Currently we are still in the stress-test stage, which uses a single website, but later, in the live-production stage, there will be several websites.

We use Locust to generate the requests.

atom-box commented 1 month ago

I'm with Matomo. Replying to @AltamashShaikh (internal question, above): The highest rate I remember seeing in queued tracking is 5000 hits per second at peak.

bitactive commented 1 week ago

Our top rate is approximately 269 RPS * 16 workers, resulting in a total of 4300 RPS. This is on a machine with 128 high-clock cores, 1TB of RAM, and both Redis and MySQL running on the same machine to reduce RTT for Redis and MySQL requests. The issue is that at peak CPU usage, this maxes out the 16 Matomo worker cores, and we are unable to set more than 16 workers. Based on our code review, this limitation is imposed by using the first hex character of the visitor ID to determine the queue ID. Perhaps we could use two characters of the visitor ID to allow for a maximum of 256 workers? This seems like a fairly straightforward change. What do you think, @AltamashShaikh?
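A quick Python sketch of the mapping being described (the plugin itself is PHP; this only illustrates why one hex character of the visitor ID caps the plugin at 16 usable queues, and how two characters would lift that to 256):

```python
import hashlib

# Queue-ID mapping from the visitor ID: one hex char gives at most 16
# distinct queues; two hex chars give up to 256 (the proposed change).
def queue_id_one_char(visitor_id: str, num_queues: int) -> int:
    return int(visitor_id[0], 16) % num_queues

def queue_id_two_chars(visitor_id: str, num_queues: int) -> int:
    return int(visitor_id[:2], 16) % num_queues

# Simulate random 16-character hex visitor IDs.
ids = [hashlib.md5(str(i).encode()).hexdigest()[:16] for i in range(10_000)]

used_one = {queue_id_one_char(v, 32) for v in ids}
used_two = {queue_id_two_chars(v, 32) for v in ids}
# With 32 configured queues, the one-char scheme only ever reaches 16 of
# them, while the two-char scheme reaches all 32.
print(len(used_one), len(used_two))
```
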

AltamashShaikh commented 1 week ago

@bitactive Can you test the following ?

Replace this function with the code below:

```php
protected function getQueueIdForVisitor($visitorId)
{
    $id = ord($visitorId);

    return $id % $this->numQueuesAvailable;
}
```

Replace this line with the line below:

```php
$field->validators[] = new NumberRange(1, 256);
```

Now set numQueueWorkers to 256, then test and see if it helps.

theredcat commented 1 week ago

FYI, I've come across this problem with remote databases.

My setup is 3 Matomo frontend servers, each with its own local Redis and workers, and a single central database.

Frontend 03 is in the same datacenter as the database (0.1ms ping) and reaches about 150 req/sec with one worker and one queue. Frontend 02 is NOT in the same datacenter as the database (10ms ping) and peaks at only 15 req/sec.

bitactive commented 1 week ago

From our experience, RTT between workers, Redis, and the database can significantly impact RPS. This is why we use multiple frontend nodes but run workers, the database, and Redis on a single large machine (with a separate database slave only for backup purposes).

If you're unable to saturate 16 workers with ~100% CPU, the bottleneck is likely due to RTT and/or Redis/DB latency, and adding more workers may not help.

We will test the code changes suggested by AltamashShaikh and provide an update later today.
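@theredcat's numbers two comments up fit a simple latency model: if a worker issues its queries synchronously, each round trip to the database caps throughput at roughly 1/RTT. A rough sketch of that bound (the interpretation in the comments is my reading, not a measurement):

```python
# Upper bound on per-worker throughput when each request requires at least
# one synchronous round trip to the database: throughput <= 1 / RTT.
def max_rps(rtt_ms: float) -> float:
    return 1000.0 / rtt_ms

print(f"0.1 ms RTT: <= {max_rps(0.1):.0f} req/s per worker")  # same-DC case
print(f"10 ms RTT:  <= {max_rps(10):.0f} req/s per worker")   # remote case
# The remote frontend's observed 15 req/s against a 100 req/s ceiling
# suggests several round trips (plus query time) per request; the same-DC
# frontend's 150 req/s is far below its 10,000 req/s ceiling, so there the
# bottleneck is CPU/query time rather than network latency.
```
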

haristku commented 1 week ago

I managed to raise the worker limit up to 4096 using this code: (reference)

```php
protected function getQueueIdForVisitor($visitorId)
{
    $visitorId = strtolower(substr($visitorId, 0, 3));
    if (ctype_xdigit($visitorId) === true) {
        $id = hexdec($visitorId);
    } else {
        $pos1 = ord($visitorId);
        $pos2 = isset($visitorId[1]) ? ord($visitorId[1]) : $pos1;
        $pos3 = isset($visitorId[2]) ? ord($visitorId[2]) : $pos2;
        $id = $pos1 + $pos2 + $pos3;
    }

    return $id % $this->numQueuesAvailable;
}
```

I'm not using it in production yet (no lifeguards in attendance; swim at your own risk).

bitactive commented 1 week ago

@AltamashShaikh I think that your solution with

```php
protected function getQueueIdForVisitor($visitorId)
{
    $id = ord($visitorId);

    return $id % $this->numQueuesAvailable;
}
```

will not work, because ord() returns the ASCII code of the first character, which can be 0-255; but since the first character is hex, we still only get 16 unique values. We should use the first two characters to get 256 combinations. @haristku's solution looks better. However, 4096 workers are unnecessary, as we will hit other bottlenecks before reaching that number of workers.

EDIT: We conducted testing with 32 queues. As expected, using a simple ord() resulted in only 16 queues being saturated, while the other 16 remained almost empty. @haristku's solution resulted in all 32 queues being evenly saturated, with 200 RPS per queue, leading to a total of approximately 6400 RPS, which is a record for us.

I think that this change should be incorporated into the main code.

AltamashShaikh commented 1 week ago

@bitactive We marked this issue "For Prioritisation" for our product team to review and schedule. Meanwhile, we encourage you to send a PR; we can help if you run into any issues, and we are happy to merge it if everything looks good.

AltamashShaikh commented 6 days ago

@bitactive I tweaked and tested @haristku's code, and at a glance it looks good: https://3v4l.org/qt1mt

Instead of taking the first 3 characters, I took the last 3 characters, and it distributes the load evenly.
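A Python sketch of that tweak (assuming 16-character hex visitor IDs; the plugin code itself is PHP): hexdec() of the last three characters yields up to 4096 distinct values, and the modulo spreads them evenly across the configured queues.

```python
import hashlib
from collections import Counter

# Tweaked mapping: hexdec() of the LAST three hex characters of the
# visitor ID, modulo the number of queues (up to 4096 distinct values).
def queue_id(visitor_id: str, num_queues: int) -> int:
    return int(visitor_id[-3:], 16) % num_queues

# Simulate random 16-character hex visitor IDs and count per-queue load.
ids = [hashlib.md5(str(i).encode()).hexdigest()[:16] for i in range(100_000)]
counts = Counter(queue_id(v, 32) for v in ids)

print(len(counts))   # all 32 queues are in use
# A busiest/quietest-queue ratio close to 1.0 indicates an even spread.
print(round(max(counts.values()) / min(counts.values()), 2))
```
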

bitactive commented 6 days ago

@AltamashShaikh In a real-world scenario, @haristku's code works quite well when we have a random distribution of visitor IDs. However, your approach is better and handles edge cases like the one from your example. I have checked your code in production and it works great. Good job!

It is also worth noting that on servers with very high traffic, changing the connection method in the connect function from connect to pconnect for Redis on the frontend can significantly reduce system load by minimizing syscalls. It would be useful to have a configuration option for choosing whether the connection is persistent or not.

haristku commented 6 days ago

With up to 4096 workers available, using queuedtracking:monitor would make my eyes crinkle a bit, so I modified it to look like this:

(screenshots: the modified, paginated queuedtracking:monitor output)

```
Key: ,=first page, .=last page, 0-9=move to page section,
     LEFT=prev page, RIGHT=next page, UP=next 10 pages, DOWN=prev 10 pages, q=quit

  -p, --perpage=PERPAGE   Number of queue workers displayed per page. [default: 16]
```


The diff/sec column shows how many requests come in and go out per second. GREEN means "tracked in < tracked processed"; RED means "tracked in > tracked processed". This helps me read the request-processing ratio: if red appears often, it tells me to increase the number of workers.

And to prevent that bottleneck, I modified a few files to make Redis Cluster available: Exercise - Creating a Redis Cluster

(screenshot: Redis Cluster configuration)

I also made some modifications to queuedtracking:process and switched from cron to a supervisor-based setup, with an extra parameter -c to define how many cycles queuedtracking:process should loop before exiting; the supervisor then auto-respawns the process once it finishes its cycles. -c 0 means an infinite loop.

```
$ php console --help queuedtracking:process

  -c, --cycle=CYCLE   The process will automatically loop for "n" cycle(s); set "0" for infinite. [default: 1]
  -s, --sleep=SLEEP   Take a nap for "n" second(s) before recycling; minimum is 1 second. [default: 1]
```

With this modification I expect queuedtracking:process to sustain a very high RPS.

I'd be happy to open a PR if you let me.

AltamashShaikh commented 3 days ago

@haristku You are welcome to create a PR :+1: