matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

Filtered / missing datas in reports #21035

Closed tisonv closed 1 year ago

tisonv commented 1 year ago

Context

How has this issue affected you? Reports are wrong / missing.
https://forum.matomo.org/t/archiving-ignores-filters-url-patterns/52218/1
https://forum.matomo.org/t/missing-data-in-behaviours-and-page-titles/52188/1

Expected Behavior

The search on the 'Pages' report should return all results. The 'Page titles' report should return all visits. The numbers should match.

Current Behavior

Only URLs matching certain patterns are taken into account. No setting explains why, and no log explains the behavior.

Possible Solution

Provide an easy way to see / query the blob values in matomo_archive_blob_*? Add more logging to the archiving process? The data exists in matomo_log_link_visit_action and matomo_log_action.

Steps to Reproduce (for Bugs)

Both in our staging and production environments,

Your Environment

sgiehl commented 1 year ago

Hi @tisonv. Thanks for creating this issue. The pages reports are "truncated" to a certain number of rows. By default this should be 500 for the first level and 100 for all levels below. So if you are tracking a lot of different URLs, some URLs with a low number of visits can be aggregated into the "Others" row. This also makes it impossible to find those records using the table search.
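
To illustrate the truncation described above, here is a minimal Python sketch of the idea, not Matomo's actual implementation. The names `Row`, `truncate_report` and `search` are hypothetical, and the model is simplified to a single flat level: it keeps the top `max_rows` rows and folds the remainder into one "Others" row. Totals are preserved, but a label search can no longer find the folded rows.

```python
from collections import namedtuple

# Hypothetical report row: a page label and its hit count.
Row = namedtuple("Row", ["label", "hits"])

def truncate_report(rows, max_rows):
    """Keep the max_rows busiest rows; sum all remaining rows into 'Others'."""
    ranked = sorted(rows, key=lambda r: r.hits, reverse=True)
    if len(ranked) <= max_rows:
        return ranked
    kept, folded = ranked[:max_rows], ranked[max_rows:]
    others = Row("Others", sum(r.hits for r in folded))
    return kept + [others]

def search(report, needle):
    """Naive table search: match the needle against row labels only."""
    return [r for r in report if needle in r.label]

# 600 distinct pages, each with a different hit count (1000 down to 401).
rows = [Row(f"/page/{i}", hits=1000 - i) for i in range(600)]
report = truncate_report(rows, max_rows=500)

found = search(report, "/page/499")    # within the top 500: still a row
missing = search(report, "/page/550")  # folded into "Others": not found
```

In this sketch `/page/550` still contributes to the report total, but only through the "Others" row, which is why a search for it returns nothing.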

In general, when comparing data in Matomo against data in the database, please be aware that all data in the database is stored in UTC. So depending on the timezone you have configured, you might need to adjust the selected timeframe accordingly.
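
As a rough sketch of the timeframe adjustment mentioned above, the following Python snippet converts one calendar day in the site's timezone into the UTC bounds to use when querying timestamp columns such as `server_time`. The helper name `local_day_to_utc_range` and the fixed UTC+2 offset are assumptions for illustration. A real site in a region with daylight saving time would use a named zone (for example via `zoneinfo`) instead of a fixed offset.

```python
from datetime import date, datetime, time, timedelta, timezone

def local_day_to_utc_range(day, site_tz):
    """Return the [start, end) UTC datetimes covering one local calendar day."""
    start_local = datetime.combine(day, time.min, tzinfo=site_tz)
    end_local = start_local + timedelta(days=1)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)

# Assumption: the website is configured with a fixed UTC+2 timezone.
site_tz = timezone(timedelta(hours=2))
start_utc, end_utc = local_day_to_utc_range(date(2023, 7, 10), site_tz)

# The local day 2023-07-10 starts at 22:00 UTC on the previous day.
print(start_utc.isoformat(), end_utc.isoformat())
```

A database query for that day would then filter on timestamps greater than or equal to `start_utc` and strictly less than `end_utc`.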

tisonv commented 1 year ago

Hi @sgiehl! Thanks a lot for your reply.

I do not think this applies to my problem. I can query pages with 1 or 2 page views without any problem, while some with 500+ are missing. I would not mind a slight gap between the UI and the database, but this one is too large to ignore.

I do not see the "Others" row you're talking about.

sgiehl commented 1 year ago

It's really hard to dig into such problems as the cause can be anywhere. I don't know what exactly you are tracking nor how the tracking is implemented. Without an exact way to reproduce this, we likely won't be able to investigate and possibly fix it. We have dozens of automated tests that check whether the tracked data is correctly aggregated into reports, and there is currently no known problem around this topic.

tisonv commented 1 year ago

I understand. There is nothing unusual about the setup: it is a basic internal site, with no filters or anything similar. The data ends up in matomo_log_link_visit_action but is missing from the reports.

Is there a way to get a highly verbose log while the reports are computed, to see what is being aggregated and what is not?

peterbo commented 1 year ago

@tisonv try -vvv as a parameter for archiving. This is the most verbose mode. See the "help output" on https://matomo.org/faq/on-premise/how-to-set-up-auto-archiving-of-your-reports/

tisonv commented 1 year ago

@peterbo -vvv is unhelpful as it prints only the calls to the Matomo API, not the actual archiving work.

I tried deleting the faulty actions from the database to see if they would appear when new calls are made, but to no avail (the actions are still logged in the database).

I connected the website to a new Matomo site (same Matomo instance, but a new site with a new ID). It seems to work, but I'll know in a couple of days when more data is available.

tisonv commented 1 year ago

> Hi @tisonv. Thanks for creating this issue. The pages reports are "truncated" to a certain number of rows. By default this should be 500 for the first level and 100 for all levels below. So if you are tracking a lot of different URLs, some URLs with a low number of visits can be aggregated into the "Others" row. This also makes it impossible to find those records using the table search.

Hi @sgiehl, you were indeed right.
I never found the 'Others' row before today. After raising the limit to 5,000, I was able to find all my missing statistics.
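
For readers looking for the setting involved, the row limits for the pages reports are normally overridden in `config/config.ini.php`. The fragment below is a sketch rather than the exact change made in this thread: the first value mirrors the 5,000 mentioned above, while the subtable value is an assumption, since the thread does not say whether that limit was raised as well. Reports that were already archived usually keep their old truncation until they are re-archived.

```ini
[General]
; Maximum rows kept at the first level of the Actions (pages) reports (default 500).
datatable_archiving_maximum_rows_actions = 5000
; Maximum rows kept in each sub-level (default 100); this value is an assumption.
datatable_archiving_maximum_rows_subtable_actions = 5000
```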

I find this rather counter-intuitive when Matomo is used for internal websites as opposed to news websites. Our intranet has 10,000+ pages, and the need for statistics is the same for every page at all times, whether the page was viewed once or 1000+ times, and even if it is 5 years old.

Maybe it would be worth mentioning in the "Intranet Website" documentation.

Thanks for your help !

sgiehl commented 1 year ago

Thanks for the hint. Might indeed be worth mentioning it there. @Stan-vw Shall we create an internal issue to cross link https://matomo.org/faq/how-to/faq_54/#faq_54 on the intranet documentation?

Stan-vw commented 1 year ago

We can make an internal issue, that's all good @sgiehl. It's not 100% clear to me what the relation between the two is, though. Can someone help me better understand the issue?

sgiehl commented 1 year ago

It might not be directly related. Maybe it's also a general thing that people might not expect. Maybe we could also display that somewhere in Matomo itself, so people are aware that reports might be "truncated" after a certain number of records and that all remaining ones are grouped into an "Others" row.

Stan-vw commented 1 year ago

I'm not sure how people are missing that the bottom row literally says "Other". It'd be good to understand this better if that's a real problem that people are having (it's also pretty common practice in analytics tools, since otherwise tables/charts become unwieldy and/or unreadable).

Having had another look at it, I think I understand why you suggested the crosslink. I've added a section at the bottom of https://matomo.org/faq/how-to/faq_19/ that crosslinks to https://matomo.org/faq/how-to/faq_54/#faq_54

tisonv commented 1 year ago

As a new user, I missed the 'Other' row because I never unfold the whole "Pages" report. I was expecting it to list every page ever viewed in the selected time range (which can be a lot), so I didn't bother. I just go there and click on the magnifier to find the pages I need by their IDs.

Even if they're grouped under "Other" for display, I would have expected them to be searchable, or to at least get a message saying "it exists but it's insignificant. Go to faq_54 if you still want to see them".

Stan-vw commented 1 year ago

Thanks for sharing that, I'll have to have another look at it at some point. I've added it to our list of UX elements to look at, and will keep my eyes open to see if this is a common issue.