matomo-org / matomo


Archiving Memory exhausted error - Archiving memory leaks #766

Closed: anonymous-matomo-user closed this issue 11 years ago

anonymous-matomo-user commented 15 years ago

Users get "memory exhausted" errors when running the Piwik archiving process. There are two distinct use cases:

The error looks like:

Fatal error: Allowed memory size of 67108864 bytes exhausted 
(tried to allocate 73 bytes) in
 /var/www/web5/html/piwik/core/DataTable.php on line 939

-> We highly suggest that users experiencing this issue upgrade to PHP 5.3, which includes improvements to memory management.

Ideas to solve this issue (least time consuming first):


#This query uses the "counter" to truncate the rows after X rows.
#In real life, we would truncate after X = 500 or 5000.
SELECT SUM(count), CASE WHEN counter = 15 THEN "Others" ELSE hour END AS label
FROM (
    #This query adds the "counter" to each row after it was sorted by visits desc
    SELECT count, hour,
           CASE WHEN @counter = 15 THEN 15 ELSE @counter := @counter + 1 END AS counter
    FROM (
        #This query selects the count for each hour of the day
        SELECT COUNT(*) AS count,
               HOUR(visit_last_action_time) AS hour
        FROM `piwik_log_visit`
        WHERE 1
        GROUP BY hour
        ORDER BY count DESC
    ) T1,
    ( SELECT @counter := 0 ) T2
) T3
GROUP BY counter
ORDER BY counter ASC

Here I only sum the number of visits, but we could automatically rewrite all aggregate queries in ArchiveProcessing/Day.php using this mechanism?

Output:


SUM(count)  label
113     5
108     19
94  2
93  20
90  21
87  22
80  4
79  9
77  8
67  6
65  1
65  3
61  11
60  10
467     Others

Obviously truncating the "Hourly" report is a bad example, but replace it with Page URLs to get the picture :) This will save a lot of bandwidth and CPU processing, especially for websites that have more than 500 unique Page URLs per day, or external keywords, etc.
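To make the idea concrete, here is a rough, hypothetical PHP sketch (not actual Piwik code) of how an aggregate query could be wrapped with the counter/"Others" truncation before being sent to MySQL, along the lines suggested above for ArchiveProcessing/Day.php. The helper name and the page-URL table/column names are illustrative assumptions only:

<?php
// Hypothetical helper: wrap an aggregate query so that MySQL keeps only the
// top $limit rows and folds everything else into a single "Others" row.
function wrapQueryWithTruncation($innerQuery, $labelColumn, $valueColumn, $limit = 500)
{
    return "SELECT SUM($valueColumn),
                   CASE WHEN counter = $limit THEN 'Others' ELSE $labelColumn END AS label
            FROM (
                SELECT $valueColumn, $labelColumn,
                       CASE WHEN @counter = $limit THEN $limit ELSE @counter := @counter + 1 END AS counter
                FROM ( $innerQuery ) T1,
                     ( SELECT @counter := 0 ) T2
            ) T3
            GROUP BY counter
            ORDER BY counter ASC";
}

// Example: truncate a page-URL report after the top 1000 URLs
// (table and column names are only for illustration).
$inner = "SELECT idaction_url AS label, COUNT(*) AS nb_hits
          FROM piwik_log_link_visit_action
          GROUP BY idaction_url
          ORDER BY nb_hits DESC";
echo wrapQueryWithTruncation($inner, 'label', 'nb_hits', 1000);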

anonymous-matomo-user commented 12 years ago

we ran into the same issue:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 2 bytes) in /wwwroot/piwik/core/DataTable.php on line 1022

(and no, I won't increase the memory limit)

It can be triggered by increasing the "date range": "week" works, but "month" triggers the error.

The error message is shown for "Last visits graph", "List of external Websites" and "Visits by server time" on the dashboard; most other pages don't work either.

IMO the suggested solution "3) Process aggregation in MySQL" is fine, as only the big numbers are interesting when analyzing stats.

micw commented 12 years ago

A good approach I used a while ago for solving such a problem is the following; I don't know if it will also be useful for Piwik: instead of building the whole result set in memory, the rows are read from the database one at a time and passed to a set of listeners, each of which only keeps its own small set of aggregate counters.

So, for example, a "visits by hour of day" listener would contain 24 counters. For each row passed to this listener, the counter corresponding to the hour of its visit would be increased by 1. At runtime, memory would only be consumed for one row and the 24 counters.
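A minimal PHP sketch of this listener idea follows; it is hypothetical code, not anything in Piwik, and the interface/class names, the column name, and the unbuffered PDO query in the usage comments are all assumptions:

<?php
// Rows are streamed one at a time and handed to small aggregators,
// so memory stays roughly constant regardless of the number of rows.
interface RowListener
{
    public function processRow(array $row);
    public function getResult();
}

class VisitsByHourListener implements RowListener
{
    private $counters;

    public function __construct()
    {
        $this->counters = array_fill(0, 24, 0);   // one counter per hour of day
    }

    public function processRow(array $row)
    {
        $hour = (int) date('G', strtotime($row['visit_last_action_time']));
        $this->counters[$hour]++;
    }

    public function getResult()
    {
        return $this->counters;
    }
}

// Usage sketch: stream rows with an unbuffered query and feed every listener;
// only one row is held in memory at a time.
// $pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
// $stmt = $pdo->query("SELECT visit_last_action_time FROM piwik_log_visit");
// $listeners = array(new VisitsByHourListener());
// while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
//     foreach ($listeners as $listener) {
//         $listener->processRow($row);
//     }
// }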

anonymous-matomo-user commented 12 years ago

Archiving period = week for idsite = 2...

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 87 bytes) in /home/site/piwik/core/DataTable/Row.php on line 380

Archiving period = month for idsite = 2...

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 3145728 bytes) in /home/site/piwik/core/Archive/Single.php on line 501

Archiving period = year for idsite = 2...

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 3145728 bytes) in /home/site/piwik/core/DataTable.php on line 1022

I've increased my PHP memory limit to 2 gigs, but this is on shared hosting, so I'm not sure if that is having an effect.
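One quick way to see whether the larger limit is actually in effect (a generic PHP check, not a Piwik feature) is to print the limit and peak usage from the same environment the archiving runs in:

<?php
// Prints the memory limit PHP is really running with, plus peak usage so far;
// some shared hosts silently ignore memory_limit overrides.
echo 'memory_limit = ' . ini_get('memory_limit') . "\n";
echo 'peak usage   = ' . round(memory_get_peak_usage(true) / 1048576) . " MB\n";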

anonymous-matomo-user commented 12 years ago

It is impossible to archive pages with 20,000 unique IDs and 5M visits/day. The PHP script needs more than 16 GB of memory.

I tried to have a look at this issue myself, but as I'm not a PHP developer (just Perl/Bash/C/C++), I don't get your code. I have also searched for documentation on what exactly the archiving process does. If you could provide such a doc, maybe we could contribute our SQL know-how and provide an alternative way of archiving.

As the processed data is from one of our customers, I cannot provide data samples.

mattab commented 12 years ago

Thanks for the report. 20,000 unique Ids, do you mean 20k unique URLs?

If so, we have an idea of how to solve this problem nicely; see the proposal in #766. Using some SQL magic, we would do the truncation in MySQL and then return only the top 1000 pages (for example; customizable in the config file) to PHP. Right now we return ALL data to PHP, which then does the truncating, which is highly memory inefficient.

Finally, if you have some time and are willing to help, it would be very interesting to install the PHP profiler XHProf from Facebook on your server and give me access to it; it might help find other problems and fixes that could help your case. Thanks

Dupes #766

mattab commented 12 years ago

I got confused here, meant to comment in #3066

anonymous-matomo-user commented 12 years ago

Replying to matt:

> Thanks for the report. 20,000 unique Ids, do you mean 20k unique URLs?

Right, it's a portal for a huge computer magazine.

> If so, we have an idea on how to solve this problem nicely, see the proposal in #766 using some SQL magic, we would do the truncation in MySQL and then return the top 1000 pages (for example, customizable in config file), to PHP.

As I mentioned, we don't get the code. In this ticket there is just one example of how it could work, but I don't see how to implement it on our side. We could provide performant SQL statements if we knew exactly what the current code does.

> Finally, if you have some time and are willing to help, it would be very interesting to install the PHP profiler XHProf from Facebook on your server and give me access to it, it might help finding out other problems and fixes that could help your case. Thanks

Sorry, but we are an enterprise datacenter, so there is no way to provide 3rd-party developers access to the system. But we have asked our customer whether we can forward some example logfiles with faked IPs.

anonymous-matomo-user commented 12 years ago

No success on the customer front; we cannot provide sample data. The workaround in the issue description is not usable for us, as we really want to replace AWStats and need the full data, not only the top X.

tbe

anonymous-matomo-user commented 12 years ago

We're also having major memory issues with Piwik; we're unable to run an archive process at all. I feel like I've been pretty thorough in investigating most of the suggestions, but we still don't have a workable solution. We have implemented the Piwik front end on a medium Amazon EC2 instance with nginx in place of Apache for performance, and the database is on a large Amazon RDS instance. We can see the real-time data coming in correctly, so we know data is being collected.

Our site is in online publishing; we have around 350,000 unique URLs and we're looking at 20-25M page views per month. We have a requirement for accessing detailed analytics data from our system via APIs, so every unique URL's page views are important, but for reporting through Piwik that is far less important to us, so your 3rd proposal above could suit us (as I said, provided the raw page views for all URLs were still available via the API).

We are in the evaluation stages of finding the right analytics fit for our needs, so we are open to experimenting and losing or tainting the current piwik data while we work out a method to make it work. We are also open to the idea above of installing XHProf if that would help give insight into causes and possible solutions.

I'm going to make a forum post as well in case someone in the community can provide some direction for us.

Thanks, Rafe

mattab commented 12 years ago

@Rafe and others, if you hit a wall with Piwik and are stuck, please consider contacting Piwik Professional Services. We help power users run Piwik with 1M requests per day and more, on dedicated servers. Please get in touch: http://piwik.org/consulting/

timo-bes commented 12 years ago

I am going to build a class that allows "3) Process aggregation in MySQL" from the ticket description. I created a separate ticket #3330 where I will commit code and where the implementation can be discussed.

diosmosis commented 12 years ago

(In [6966]) Refs #3330, #766 add integration test for blob limiting behavior present in certain plugins. Also, fix three bugs:

diosmosis commented 12 years ago

(In [6976]) Refs #3330, #766 improved BlobReportLimitingTest.

diosmosis commented 12 years ago

(In [6980]) Refs #3330, #766, #3227 use RankingQuery and truncate tables as they are created in Actions plugin. Also modified phpunit integration testing mechanism so all API calls are tested and outputted before a test case throws.

diosmosis commented 12 years ago

(In [7080]) Refs #3330, #766 refactor truncate-as-you-go approach used in Actions. Added maximumAllowedRows member and walkPath method to DataTable and some other utility methods to DataTable_Row types.

diosmosis commented 12 years ago

(In [7083]) Refs #3330, #766 remove unnecessary methods and rename one function for clarity.

mattab commented 11 years ago

Research work around this is finished; we have identified the two main tasks remaining to fix the archiving memory errors:

If you can sponsor these optimization improvements, or need any other tuning done, please contact us: http://piwik.org/consulting/