matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

Report archives have tripled in size since update to 2.10 #7181

Closed skyhawk669 closed 8 years ago

skyhawk669 commented 9 years ago

Since upgrading to 2.10 the archive blobs and archive numbers tables have tripled in size (blob tables usually between 30-40MB, now they are 140-500MB).

The archiving process is set in cron to run every hour and nothing else has changed in the system beyond upgrading from 2.9 to 2.10 (no drastic change in amount of visitors, or to the structure of the site).

The tables are reduced in size a bit by running core:run-scheduled-tasks --force, but they're still quite a bit bigger than they used to be.

More background info in the following thread: http://forum.piwik.org/read.php?2,123852

Running Piwik on: Red Hat 5, Apache 2.4, PHP 5.4.35, MySQL 5.0.45

tassoman commented 9 years ago

@patriiiiiiiiiick what's your configuration? In the earlier posts users reported PHP 5.4 and PHP 5.3. Unfortunately both are deprecated, and PHP 5.5 is the current stable version.

This issue sadly prevents us from upgrading beyond Piwik 2.9.1. Our DB is about 60 GB and takes up half of the DB's VM, so we have no choice until this bug is closed.

The only thing we could try is to snapshot the testing environment, upgrade it to the latest stable (2.13.?), then run some tracking on it. Unfortunately we can't simulate the heavy traffic of our production websites, so I'm not confident this would give conclusive results. Moreover, I would need to ask the SysAdmins and Security Officers for permission to duplicate the test VM and let you in.

patriiiiiiiiiick commented 9 years ago

@tassoman This runs on PHP 5.5. @diosmosis Last time I tried to run the purge old data instruction, it hung on the January table for two days without visible effect. I haven't tried again since the last drop of archive tables.

mattab commented 9 years ago

This bug should be fixed in 2.14.1-b1 - if you still have any issue with this version or newer, please comment here

patriiiiiiiiiick commented 9 years ago

This seems to be mostly solved for past months but not for the current one where the blob table remains abnormally large.

mattab commented 9 years ago

It may not be fully fixed so re-opening. This was reported in http://forum.piwik.org/read.php?2,128325 - (I've also removed the issue from the 2.14.1 changelog where it was announced fixed.)

patriiiiiiiiiick commented 9 years ago

Indeed.

archive_blob_2015_08      6.4 G     295 M     5,161,333
archive_blob_2015_07      21 G      736 M     11,117,685
archive_blob_2015_06      445.9 M   21.6 M    499,029
archive_blob_2015_05      340 M     13.6 M    142,096
archive_blob_2015_04      384 M     44.7 M    505,544
archive_blob_2015_03      411 M     55.7 M    1,463,385
archive_blob_2015_02      389 M     52.8 M    1,027,133
archive_blob_2015_01      27.1 G    336 M     2,732,667
archive_numeric_2015_08   364 M     332.8 M   2,395,281
archive_numeric_2015_07   919.9 M   825.5 M   4,891,328
archive_numeric_2015_06   104.5 M   103.9 M   205,967
archive_numeric_2015_05   22.6 M    16.1 M    55,698
archive_numeric_2015_04   9.5 M     16.1 M    84,590
archive_numeric_2015_03   8.5 M     16.1 M    93,888
archive_numeric_2015_02   8.5 M     16.1 M    80,488
archive_numeric_2015_01   306.9 M   233.6 M   1,040,788

I'll have the following run to free up some space:

console core:purge-old-archive-data 2015-07-01 --include-year-archives --force-optimize-tables

Please note this is not documented in http://piwik.org/docs/setup-auto-archiving/#help-for-corearchive-command.

patriiiiiiiiiick commented 9 years ago

Here are my numbers after a “purge all”:

archive_blob_2015_08      628 M     35.7 M    943,370
archive_blob_2015_07      952 M     48.7 M    762,198
archive_blob_2015_06      317.8 M   12.6 M    117,246
archive_blob_2015_05      265.8 M   13.6 M    215,000
archive_blob_2015_04      380 M     44.7 M    825,327
archive_blob_2015_03      410 M     55.7 M    1,080,096
archive_blob_2015_02      388 M     52.8 M    1,159,006
archive_blob_2015_01      4.6 G     83.9 M    1,858,964

archive_numeric_2015_08   28.6 M    45.3 M    275,553
archive_numeric_2015_07   26.6 M    53.3 M    340,280
archive_numeric_2015_06   8.5 M     18.1 M    97,546
archive_numeric_2015_05   8.5 M     18.1 M    94,479
archive_numeric_2015_04   7.5 M     16.1 M    88,844
archive_numeric_2015_03   7.5 M     16.1 M    81,660
archive_numeric_2015_02   7.5 M     16.1 M    87,024
archive_numeric_2015_01   41.6 M    76.3 M    472,563

There is clearly an asymmetry between July-August and the rest of the year but I can't tell whether the old ones didn't run completely when they had been re-archived or if July and August still contain too many rows. To be continued...
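To put a number on that asymmetry, here is a minimal sketch that compares the first size column of a few blob tables before and after the purge. The figures are copied from the two listings above; the assumption that "G" means 1024 "M" is mine, since the thread does not say which unit base the tool reports.

```python
# First size column of selected blob tables, copied from the listings above.
# Assumption: "G" = 1024 "M" (the unit base is not stated in the thread).
before = {"2015_08": "6.4 G", "2015_07": "21 G", "2015_06": "445.9 M", "2015_01": "27.1 G"}
after = {"2015_08": "628 M", "2015_07": "952 M", "2015_06": "317.8 M", "2015_01": "4.6 G"}


def to_megabytes(size: str) -> float:
    """Convert a string such as '6.4 G' or '628 M' to megabytes."""
    value, unit = size.split()
    factor = {"M": 1.0, "G": 1024.0}[unit]
    return float(value) * factor


def shrink_factor(month: str) -> float:
    """How many times smaller the blob table of a month is after the purge."""
    return to_megabytes(before[month]) / to_megabytes(after[month])


for month in before:
    print(f"archive_blob_{month}: {shrink_factor(month):.1f}x smaller after the purge")
```

Under that unit assumption, July and August shrink by an order of magnitude or more, while June barely changes, which is consistent with the bloat being concentrated in the most recent months.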

mattab commented 9 years ago

Hi everyone,

We believe we may have finally found the issue causing this bug of archive tables becoming too big.

I'm re-opening while we wait for confirmation from you that this issue has been fixed.

It would be great if you could test that this bug is also fixed for you. We've released 2.15.0-b2 which you can install easily (see instructions: http://piwik.org/faq/how-to-update/faq_159/)

We're waiting for your feedback :+1:

MaxWinterstein commented 9 years ago

I just want to leave a comment here. I updated my installation to the latest beta, then ran an archiving and an optimize on all tables. My installation shrank from 80 GB down to 10 GB, of which 6 GB are from piwik.piwik_log_link_visit_action, which cannot be optimized:

mysql> optimize table piwik_log_link_visit_action;
+-----------------------------------+----------+----------+------------------------------------------------------------------------------+
| Table                             | Op       | Msg_type | Msg_text                                                                     |
+-----------------------------------+----------+----------+------------------------------------------------------------------------------+
| piwik.piwik_log_link_visit_action | optimize | note     | Table does not support optimize, doing recreate + analyze instead            |
| piwik.piwik_log_link_visit_action | optimize | error    | Incorrect key file for table 'piwik_log_link_visit_action'; try to repair it |
| piwik.piwik_log_link_visit_action | optimize | status   | Operation failed                                                             |
+-----------------------------------+----------+----------+------------------------------------------------------------------------------+

bartek85 commented 8 years ago

I had a 650 GB archive_blob_2015_01 table on 2.12.0. I will let you know how it looks with 2.14.3, and then with the beta, after the archiving finishes.

gaumondp commented 8 years ago

Remember that January is also the table holding the yearly reports, so it's "normal" that it's way bigger (8X ?) than other months, AFAIK.

bartek85 commented 8 years ago

Another piwik server:

piwik_archive_blob_2015_01 - 428GB
piwik_archive_blob_2015_02 - 26.1GB
piwik_archive_blob_2015_03 - 20.2GB
piwik_archive_blob_2015_04 - 46.3GB

so it's way bigger (10-20 times)
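As a quick check of that estimate, this small sketch divides the January size by each of the other months. The sizes are the ones quoted in the comment above; nothing else is assumed.

```python
# Table sizes in GB, copied from the comment above.
sizes_gb = {
    "piwik_archive_blob_2015_01": 428.0,
    "piwik_archive_blob_2015_02": 26.1,
    "piwik_archive_blob_2015_03": 20.2,
    "piwik_archive_blob_2015_04": 46.3,
}

january = sizes_gb["piwik_archive_blob_2015_01"]

# Ratio of the January table to every other monthly table.
ratios = {
    name: january / size
    for name, size in sizes_gb.items()
    if not name.endswith("_01")
}

for name, ratio in ratios.items():
    print(f"January is {ratio:.1f}x the size of {name}")
```

The ratios land between roughly 9 and 21, so January here is well beyond the "8X" rule of thumb mentioned earlier in the thread.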

ThaDafinser commented 8 years ago

In my case it's solved, and it even got better with the new CLI command.

mattab commented 8 years ago

Hi everyone, could you confirm whether this bug is fixed for you after upgrading to 2.15.0? If it is not fixed, please let us know: we need to make sure this issue is really solved. Thanks!

patriiiiiiiiiick commented 8 years ago

It seems mostly solved. I wonder to what extent it is normal to still have invalidated archives for the month of October, now that it is the 18th of November.

Please look at the attached results of

./console diagnostics:analyze-archive-table 2015_09

where the September one was run on the 5th of October (invalidated archives present) and today (none present).

analyze-archive-table_2015_10.txt analyze-archive-table_2015_09.txt

The table sizes compare as follows:

archive_blob_2015_11      733 M     40.1 M    554,460
archive_blob_2015_10      1.2 G     34.7 M    338,403
archive_blob_2015_09      250.8 M   13.6 M    70,623
archive_numeric_2015_11   49.1 M    44.2 M    226,694
archive_numeric_2015_10   21.6 M    43.2 M    235,579
archive_numeric_2015_09   7.5 M     17.1 M    91,433

where I ran an optimize table until the one of October some time in the past. I have probably added some new sites lately but none that big.

Did we make sure a table optimisation is run when needed?

mattab commented 8 years ago

Hi everyone,

We haven't heard that the issue is still active, so we are closing it. Please leave a comment if you find anything interesting or experience any issue. Thanks!

gaumondp commented 8 years ago

I updated to 2.15.0 a few weeks ago and the problem seems solved.

piwik_archive_blob_2015_month are now around 400 MB instead of 6 GB (except for January but that's normal, I know) !

Thanks.

ravenxxxv commented 8 years ago

I've updated to 2.16.0 and although it shrank the tables considerably they still appear to be bloated with duplicate entries.

before 2.16.0 console core:archive and optimize:

130G piwik_archive_blob_2016_01.ibd
31G piwik_archive_blob_2016_02.ibd
68G piwik_archive_blob_2016_03.ibd

after 2.16.0 core:archive and optimize:

2.9G piwik_archive_blob_2016_01.ibd
1.3G piwik_archive_blob_2016_02.ibd
22G piwik_archive_blob_2016_03.ibd

According to the comments in this thread, I ran the following to see the number of duplicate entries in the Mar 2016 blob table.

SELECT idsite, date1, date2, period, name, COUNT(*) as count FROM piwik_archive_blob_2016_03 GROUP BY idsite, date1, date2, period, name HAVING count > 1;

This resulted in 727180 rows
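For readers who want to see what that query counts, here is a minimal sketch of the same GROUP BY / HAVING pattern run against a toy table. It uses Python's built-in sqlite3 as a stand-in for MySQL; the table name archive_blob_demo, its reduced column set, and the sample rows are all hypothetical and only illustrate the shape of the check, not real Piwik data.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL archive table.
conn = sqlite3.connect(":memory:")

# Hypothetical, simplified table: only the columns used by the duplicate check.
conn.execute(
    "CREATE TABLE archive_blob_demo ("
    "idarchive INTEGER, idsite INTEGER, date1 TEXT, date2 TEXT, "
    "period INTEGER, name TEXT)"
)

# Made-up sample rows: two groups are stored more than once.
rows = [
    (1, 1, "2016-03-01", "2016-03-01", 1, "report_a"),
    (2, 1, "2016-03-01", "2016-03-01", 1, "report_a"),  # same key as idarchive 1
    (3, 1, "2016-03-02", "2016-03-02", 1, "report_a"),  # unique
    (4, 2, "2016-03-01", "2016-03-01", 1, "report_b"),
    (5, 2, "2016-03-01", "2016-03-01", 1, "report_b"),  # same key as idarchive 4
    (6, 2, "2016-03-01", "2016-03-01", 1, "report_b"),  # same key as idarchive 4
]
conn.executemany("INSERT INTO archive_blob_demo VALUES (?, ?, ?, ?, ?, ?)", rows)

# Same grouping as the query above: one result row per key that occurs more than once.
duplicates = conn.execute(
    "SELECT idsite, date1, date2, period, name, COUNT(*) AS n "
    "FROM archive_blob_demo "
    "GROUP BY idsite, date1, date2, period, name "
    "HAVING COUNT(*) > 1 "
    "ORDER BY idsite"
).fetchall()

for row in duplicates:
    print(row)
```

Note that each result row stands for one duplicated key, not one surplus record, so the number of redundant rows in a real table can be larger than the number of rows the query returns.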

I ran the following diagnostic as well

./console diagnostics:analyze-archive-table 2016_03

Here's the summary

Total # Archives: 40728
Total # Invalidated Archives: 1154
Total # Temporary Archives: 0m
Total # Error Archives: 0
Total # Segment Archives: 20433

Please let me know if you need any further information.