vkovalcik opened this issue 10 months ago
This issue has been mentioned on the Matomo forums. There might be relevant details there:
https://forum.matomo.org/t/incorrect-aggregated-info-for-imported-data/53897/5
Keen to hear if more people have this issue. It sounds like it might be a result of the very specific setup. If more people have this problem, it can help us understand it and prioritise it accordingly.
I completely understand. I am waiting for the final release of Matomo 5 to upgrade and see whether the bug happens to vanish :) If not, I will try to dig a bit deeper to see if I can find something interesting.
EDIT: I have moved the bug described in this comment to a separate issue: https://github.com/matomo-org/matomo/issues/21808
OLD: After a lot of digging in the code and messing with phpMyAdmin, I think I found the underlying issue:
In some cases, core:invalidate-report-data also invalidates day data for which there are no logs. (Furthermore, that data is subsequently deleted, probably during some maintenance operation... fortunately, I have backups.)
What exactly happens:
During invalidation, ArchiveInvalidator::findOlderDateWithLogs() checks whether log data actually exist for the archive, but the check only uses the number of days from the "Delete logs when older than..." option. If the user tries to invalidate archives older than that number of days, the invalidation doesn't proceed. However, since I have this option set to essentially infinity, the check always passes and the archives are happily invalidated even when there are no matching logs.
I think the earliest date actually present in the logs should be used instead.
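To make the difference between the two checks concrete, here is a minimal Python sketch. It is not Matomo's code: the function names `can_invalidate_by_retention` and `can_invalidate_by_log_range` are hypothetical, and `None` is assumed to stand for "never delete logs". The first function models the retention-based check as described above; the second models the proposal of comparing against the earliest date that actually has raw logs.

```python
from datetime import date, timedelta
from typing import Optional


def can_invalidate_by_retention(archive_date: date, today: date,
                                delete_logs_older_than_days: Optional[int]) -> bool:
    """Hypothetical model of the described check: only the retention setting is used.

    None models "never delete logs", i.e. effectively infinite retention.
    """
    if delete_logs_older_than_days is None:
        return True  # no date is ever considered too old, so every date passes
    oldest_kept = today - timedelta(days=delete_logs_older_than_days)
    return archive_date >= oldest_kept


def can_invalidate_by_log_range(archive_date: date,
                                earliest_log_date: Optional[date]) -> bool:
    """Hypothetical model of the proposal: compare against the earliest raw log date."""
    if earliest_log_date is None:
        return False  # no raw logs at all, so nothing could be re-archived
    return archive_date >= earliest_log_date


today = date(2023, 11, 1)
imported_day = date(2010, 5, 3)  # a day that exists only as an imported archive
first_log = date(2022, 1, 15)    # earliest raw visit log in this made-up setup

print(can_invalidate_by_retention(imported_day, today, None))  # True
print(can_invalidate_by_log_range(imported_day, first_log))    # False
```

With infinite retention, the first check accepts the imported day even though no logs exist to rebuild it, which matches the behaviour reported here; the second check rejects it.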
As a workaround, I will use core:invalidate-report-data with --periods=week,month,year. The InvalidateReports plugin has no such option, so it is pretty dangerous.
And there is also ANOTHER bug/weird behaviour:
Through a sequence of actions, I ended up with a table that sometimes contains the same "archiveid" for different combinations of siteid+date1+date2+period, which I believe shouldn't be possible.
(Again, I expect that when I now invalidate weeks, the invalidation will match this archiveid, also mark some day archives as invalidated, and then delete them.)
I did this:
So far, my guess is that something is wrong with Sequence and the way a new archiveid is obtained, but I wasn't able to understand the inner workings.
I can send you the table as it was before the attempt to invalidate and archive it. I would rather not share it publicly, so I'd prefer to send it privately, if that is an option.
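One way to spot the duplicated archiveid situation described above is to count how many distinct site/date/period combinations each archive id is used for. The sketch below runs against a small in-memory SQLite table; the table name `archive_numeric` and its columns follow the wording of this thread and are assumptions (in a real Matomo archive_numeric_* table the columns are, as far as I know, named idarchive and idsite instead, and the rows here are made up).

```python
import sqlite3


def find_shared_archive_ids(conn: sqlite3.Connection):
    """Return (archiveid, combos) for ids used by more than one site/date/period tuple."""
    query = """
        SELECT archiveid, COUNT(*) AS combos
        FROM (
            -- one row per distinct combination, ignoring metric names
            SELECT DISTINCT archiveid, siteid, date1, date2, period
            FROM archive_numeric
        )
        GROUP BY archiveid
        HAVING COUNT(*) > 1
        ORDER BY archiveid
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE archive_numeric (
        archiveid INTEGER, siteid INTEGER, date1 TEXT, date2 TEXT,
        period INTEGER, name TEXT, value REAL)"""
)
rows = [
    # Two metrics of the same archive share one id: this is normal.
    (1, 1, "2023-01-02", "2023-01-02", 1, "nb_visits", 10),
    (1, 1, "2023-01-02", "2023-01-02", 1, "nb_actions", 25),
    (2, 1, "2023-01-02", "2023-01-08", 2, "nb_visits", 70),
    # A copied-over row that reuses id 2 for a different site and date: the anomaly.
    (2, 5, "2010-03-01", "2010-03-01", 1, "nb_visits", 3),
]
conn.executemany("INSERT INTO archive_numeric VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

print(find_shared_archive_ids(conn))  # [(2, 2)]
```

The DISTINCT subquery matters: an archive legitimately stores many metric rows under one id, so only distinct site/date/period tuples should be counted.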
I have moved the FIRST bug to a separate issue: https://github.com/matomo-org/matomo/issues/21808
As noted in the previous comment, there is probably at least one other bug, unrelated to that one.
Sorry, misclick; I didn't mean to close it.
I suspect I might know the cause of the wrong archiveid numbers in the archives:
It was caused by merging archives from one Matomo installation into another by JUST copying the archive tables to the other database (while adjusting siteid). The problem is that I didn't copy the contents of the _sequence table, which is undocumented but seems to be critical for the archives. It probably stores the last used archiveid for each archive table, and new archiveids are generated from it. If this table is missing or contains wrong (too low) numbers, I guess archiveid numbering starts from 0/1 again and can mistakenly reuse numbers that are already taken.
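The suspected mechanism can be illustrated with a toy counter. This is a hypothetical model, not Matomo's Sequence class: `ArchiveIdSequence` simply remembers the last issued id and hands out the next one, which is what the _sequence table appears to do based on the observations above. It shows that a counter left below the highest id already in the table produces collisions, while a counter set to that maximum does not.

```python
from typing import Iterable, List, Set


class ArchiveIdSequence:
    """Toy model (hypothetical): stores the last issued archive id and returns the next one."""

    def __init__(self, last_value: int = 0) -> None:
        self.last_value = last_value

    def next_id(self) -> int:
        self.last_value += 1
        return self.last_value


def issue_new_ids(seq: ArchiveIdSequence, count: int) -> List[int]:
    """Request `count` fresh ids, as new archiving runs would."""
    return [seq.next_id() for _ in range(count)]


def collisions(existing: Iterable[int], new: Iterable[int]) -> Set[int]:
    """Ids that were handed out again although they are already in the table."""
    return set(existing) & set(new)


# Archive ids copied over from the other installation (made-up values).
copied_ids = [1, 2, 3, 4, 5]

# Sequence row not copied: the counter effectively restarts at 0.
stale = ArchiveIdSequence(last_value=0)
print(sorted(collisions(copied_ids, issue_new_ids(stale, 3))))  # [1, 2, 3]

# Counter raised to the highest id already present in the table.
fixed = ArchiveIdSequence(last_value=max(copied_ids))
print(sorted(collisions(copied_ids, issue_new_ids(fixed, 3))))  # []
```

Under this model, the fix after a manual merge would be to raise the stored counter to at least the highest archive id present in the merged table.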
My proposal is to:
This is probably not the end of my journey :) There might be at least one more issue concerning the blobs. But to investigate it I need more information, so for now I have posted a question on the forum: https://forum.matomo.org/t/merging-archive-data-what-exactly-is-in-blobs/54981
What happened?
After importing data from Google Analytics, even though everything was invalidated, core:archive seems to ignore the actual day data and creates week archives with 0 visits (the same happens for month and year archives). Not ALL week archives are 0, though: there is usually one week per month with some data in it (I guess copied from just a single day or two).
When I view this data in the web UI and select a custom date range of up to six days, everything seems to work and the total stats are correct.
When I select 7 days (even if the range runs from Wednesday to the following Tuesday), it shows "0" for most stats.
See the difference in the stats below the charts. The working one:
And the wrong one, with the same days plus one more:
What should happen?
The database should contain correct week stats, and the UI should show correct stats. (I am not sure whether these are two separate issues; perhaps not.)
How can this be reproduced?
This will be tough :/ I did it like this:
I imported Google Analytics data from 2007 to 2023 on a different computer (with the Matomo and DB settings copied), under a different siteid. I then migrated the data from this "testing" DB to the DB with my live stats and changed the siteid in phpMyAdmin, so there are some possible points of failure, but it seems to work at the day level.
Then I invalidated all the reports using the InvalidateReports plugin (which, according to the docs, seems to keep day data when there are no logs for the respective days).
Then I ran core:archive, which printed a lot of “Archiving week XY: 0 total visits” lines, even though those totals are incorrect.
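A rough way to list the affected weeks is to add up the day-level visit counts per Monday-to-Sunday week and flag weeks where the days have visits but the week archive reports 0. This is a hypothetical helper, not part of Matomo; it assumes you have already exported day and week visit counts (for example from the archive tables or the API) into plain dictionaries, and it only makes sense for a simple additive metric like visits, not for unique visitors.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

Week = Tuple[int, int]  # (ISO year, ISO week number); ISO weeks run Monday to Sunday


def weekly_totals(day_visits: Dict[date, int]) -> Dict[Week, int]:
    """Sum day-level visit counts into ISO weeks."""
    totals: Dict[Week, int] = defaultdict(int)
    for day, visits in day_visits.items():
        iso_year, iso_week, _ = day.isocalendar()
        totals[(iso_year, iso_week)] += visits
    return dict(totals)


def suspicious_weeks(day_visits: Dict[date, int],
                     week_visits: Dict[Week, int]) -> List[Week]:
    """Weeks whose days have visits but whose week archive reports 0 (or is missing)."""
    expected = weekly_totals(day_visits)
    return sorted(w for w, v in expected.items() if v > 0 and week_visits.get(w, 0) == 0)


# Made-up numbers: 2010-03-01 is a Monday in ISO week 9 of 2010.
day_visits = {date(2010, 3, 1): 12, date(2010, 3, 2): 8, date(2010, 3, 8): 5}
week_visits = {(2010, 9): 0, (2010, 10): 5}

print(suspicious_weeks(day_visits, week_visits))  # [(2010, 9)]
```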
Matomo Version
Matomo 4
Matomo Patch or Minor Version
4.15.1
PHP Version
8.1.18
Server Operating System
Debian GNU/Linux 9
What browsers are you seeing the problem on?
Firefox
Computer Operating System
Windows 10