matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

Report "Event Categories" summarize "others" with only 1 unique Visitor #20397

Open OlliWu opened 1 year ago

OlliWu commented 1 year ago

I’m facing the following issue with one of our sites. The report “Behavior -> Events -> Event Categories” always groups visits under “Others”, even with an extremely high value for “datatable_archiving_maximum_rows”. According to the report, these “Others” visits come from only 1 unique visitor, which seems wrong to me.

[Screenshot: event_categories_others]

Expected Behavior

With datatable_archiving_maximum_rows_* = 100,000,000, the report "Event Categories" should not group any visits under "Others", as described here: https://matomo.org/faq/how-to/faq_54/
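The rule from the FAQ can be illustrated with a small sketch: once a report has more rows than the configured maximum, the surplus rows are summed into a single “Others” row, so a maximum larger than the number of distinct labels should produce no “Others” row at all. The helper name `truncate_with_others`, ranking by visits, and keeping exactly `max_rows` rows before appending “Others” are assumptions for illustration, not Matomo's actual code.

```python
from typing import NamedTuple


class Row(NamedTuple):
    label: str
    visits: int


def truncate_with_others(rows: list[Row], max_rows: int) -> list[Row]:
    """Hypothetical helper: keep the max_rows busiest rows, sum the rest into 'Others'."""
    # Assumption: rows are ranked by visits, highest first (ties keep input order).
    ranked = sorted(rows, key=lambda r: r.visits, reverse=True)
    if len(ranked) <= max_rows:
        return ranked  # everything fits, so no "Others" row is created
    kept, rest = ranked[:max_rows], ranked[max_rows:]
    return kept + [Row("Others", sum(r.visits for r in rest))]
```

With `max_rows = 2`, four labels become the two busiest ones plus one “Others” row; with `max_rows = 100_000_000` the same four labels come back unchanged, which is the expected behavior described above.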

Current Behavior

The report “Behavior -> Events -> Event Categories” always groups visits under “Others”, and these visits come from only one unique visitor.

[Screenshot: event_categories_others_2]

Steps to Reproduce (for Bugs)

  1. Track lots of Event Actions
  2. Run Archiver
  3. View Report “Behavior -> Events -> Event Categories”

Context

As you can see in the following screenshot, the site tracks a large number of event names. The “Event Names” report itself can’t even be displayed (I guess because there’s no paging). I don’t know whether this is part of the problem.

[Screenshot: event_actions]

I started by raising datatable_archiving_maximum_rows from 100k to 1 million, and from there to 10 million and then 100 million rows. At each step, I invalidated the report data and ran the archiver. The numbers (visits, events, etc.) changed every time, but “Others” only decreased by a couple of thousand visits.

These are the settings I used to produce the report above:

datatable_archiving_maximum_rows_custom_dimensions = 100000000
datatable_archiving_maximum_rows_subtable_custom_dimensions = 100000000
datatable_archiving_maximum_rows_actions = 100000000
datatable_archiving_maximum_rows_subtable_actions = 100000000
datatable_archiving_maximum_rows_events = 100000000
datatable_archiving_maximum_rows_subtable_events = 100000000
datatable_archiving_maximum_rows_custom_variables = 100000000
datatable_archiving_maximum_rows_subtable_custom_variables = 100000000
archiving_ranking_query_row_limit = 0

Your Environment

The Matomo installation handles multiple sites, tracking approximately 1 million visits, 4.5 million pageviews and 8 million actions per day.

I can’t figure it out and really don’t know which metric causes the issue. I also don’t know whether the SELECT statement simply fetches too many rows or whether it’s some kind of bug. I hope you can give me a hint where to look or what to do. Just let me know if you need more info. Thanks, Olli

bx80 commented 1 year ago

Thanks for the detailed report on this @OlliWu :+1: With maximum archiving rows settings of 100,000,000 you shouldn't be seeing any data grouped under "Others" so this looks like a bug. This doesn't seem to happen on smaller datasets, so it could be related to the number of rows.

I'll assign the issue for prioritization.

mattab commented 1 year ago

@OlliWu Q: we think it could be a bug caused by setting archiving_ranking_query_row_limit = 0. Could you instead change it to `archiving_ranking_query_row_limit = 1000000`, then invalidate the reports again (or wait for new data to be processed with this setting), and check whether it works better?

Whether it works better or not, please let us know so we can confirm that we've found the problem. Thanks!

OlliWu commented 1 year ago

I did so many tests that I was pretty sure I had tested something like that too. But it turned out I had not. :-) I invalidated two different days and re-ran archiving for each of them with the following parameters:

datatable_archiving_maximum_rows_custom_dimensions = 100000
datatable_archiving_maximum_rows_subtable_custom_dimensions = 100000
datatable_archiving_maximum_rows_actions = 1000000
datatable_archiving_maximum_rows_subtable_actions = 1000000
datatable_archiving_maximum_rows_events = 1000000
datatable_archiving_maximum_rows_subtable_events = 1000000
archiving_ranking_query_row_limit = 1000000

The same result on both days: No Others. 👍

Here are the Screenshots.

BEFORE: [Screenshot: 20230308_id35_before]

AFTER: [Screenshot: 20230308_id35_after]

As far as I can tell, the Numbers are looking valid too.
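The working configuration above keeps archiving_ranking_query_row_limit non-zero and no lower than the largest datatable_archiving_maximum_rows_* value. A hypothetical helper, `check_archiving_limits`, can flag a config `[General]` section that departs from that pattern. The "ranking limit at least as large as the largest row limit" rule is an assumption inferred from this thread, not a documented Matomo requirement.

```python
import configparser

RANKING_LIMIT_KEY = "archiving_ranking_query_row_limit"
MAX_ROWS_PREFIX = "datatable_archiving_maximum_rows"


def check_archiving_limits(ini_text: str) -> list[str]:
    """Hypothetical check: return warnings about archiving row settings in [General].

    Assumed heuristic (from this issue, not documented): the ranking query limit
    should be non-zero and >= the largest datatable_archiving_maximum_rows_* value.
    """
    # strict=False tolerates repeated keys such as trusted_hosts[];
    # interpolation=None keeps '%' characters in unrelated settings literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("General"):
        return []
    general = parser["General"]

    def as_int(raw: str) -> int:
        # Values may be written with or without surrounding double quotes.
        return int(raw.strip().strip('"'))

    row_limits = [as_int(v) for k, v in general.items() if k.startswith(MAX_ROWS_PREFIX)]
    warnings: list[str] = []
    if RANKING_LIMIT_KEY not in general:
        return warnings  # unset here, so the default from global.ini.php applies

    ranking_limit = as_int(general[RANKING_LIMIT_KEY])
    if ranking_limit == 0:
        warnings.append(f"{RANKING_LIMIT_KEY} = 0, the value suspected of causing 'Others' in this issue")
    elif row_limits and ranking_limit < max(row_limits):
        warnings.append(
            f"{RANKING_LIMIT_KEY} = {ranking_limit} is below the largest "
            f"{MAX_ROWS_PREFIX}_* value ({max(row_limits)})"
        )
    return warnings
```

Run against the original settings from the issue body, it reports the `= 0` value; run against the settings from this comment, it returns an empty list.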

mattab commented 1 year ago

Thanks for the update @OlliWu. Does this mean that, from your perspective, there is a bug when archiving_ranking_query_row_limit = 0? If so, it would be great if you could create a new bug report for it (or we can do it once you confirm).

OlliWu commented 1 year ago

@mattab I really don't know for sure. Maybe I just misread the documentation. First of all, if I want my data not to be grouped under "Others", I change the values according to this FAQ: https://matomo.org/faq/how-to/faq_54/ If there are still "Others" after that, I look at the descriptions of the various config options, and there's "archiving_ranking_query_row_limit", which is described as "maximum number of rows to fetch from the database when archiving. if set to 0, no limit is used." Maybe it's just me, but I thought that if I set it to 0, all the data needed to create a detailed report would be fetched. But that's clearly not the case.

So maybe archiving_ranking_query_row_limit works as expected and only the description is misleading? For me, it's totally fine to use a high value, e.g. 1 million, instead of "0".
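To make the documented meaning concrete, here is a minimal sketch of a SQL-level ranking query: the top `row_limit` labels are kept, everything else is grouped into one “Others” row, and `0` skips grouping entirely, as the quoted description says. The `events(label, visits)` table and the function `ranking_query` are hypothetical, and SQLite window functions are used for brevity. Matomo's actual `RankingQuery` class builds its own MySQL query, so this is not its code, only an illustration of one plausible place where an “Others” row can appear before any datatable_archiving_maximum_rows_* limit is applied.

```python
import sqlite3


def ranking_query(conn: sqlite3.Connection, row_limit: int) -> list[tuple[str, int]]:
    """Return (label, visits) rows ranked by visits.

    row_limit > 0: keep the top row_limit labels, group the rest into 'Others'.
    row_limit == 0: documented meaning "no limit", so every label is returned.
    """
    if row_limit == 0:
        return conn.execute(
            """
            SELECT label, SUM(visits) AS label_visits
            FROM events
            GROUP BY label
            ORDER BY label_visits DESC, label
            """
        ).fetchall()

    return conn.execute(
        """
        WITH totals AS (
            SELECT label, SUM(visits) AS label_visits FROM events GROUP BY label
        ),
        ranked AS (
            SELECT label, label_visits,
                   ROW_NUMBER() OVER (ORDER BY label_visits DESC, label) AS rn
            FROM totals
        ),
        bucketed AS (
            -- Labels ranked beyond the limit collapse into a single 'Others' bucket.
            SELECT CASE WHEN rn <= ? THEN label ELSE 'Others' END AS bucket,
                   label_visits, rn
            FROM ranked
        )
        SELECT bucket, SUM(label_visits) AS visits
        FROM bucketed
        GROUP BY bucket
        ORDER BY MIN(rn)
        """,
        (row_limit,),
    ).fetchall()
```

With a limit of 2, four labels come back as the two busiest plus an “Others” row; with 0 (or any limit larger than the number of labels) each label is returned on its own row.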