Closed: patrick-stickler-csc-fi closed this issue 11 months ago.
Hi @patrick-stickler-csc-fi, thanks for reporting this, it does seem odd.
Regarding questions 1&2:
I tried reproducing this on matomo.cloud and wasn't able to, so my guess right now is that the data for some or all of these segments might be out of date (there were some bugs in 4.2.1 around this logic). Can you try invalidating the segments for the date range and requesting them again? To do so you'll have to use the core:invalidate-report-data command (there is also an InvalidateReports plugin that lets you invalidate through the UI, but it will not let you supply a segment that wasn't created in the UI). The segment parameter has to be encoded properly to work: it should have exactly the same encoding that is sent in the API request URL.
Note: you can also add the segments to the UI via the API using the SegmentEditor.add method (mentioning this in case it is useful to you).
Regarding question 3:
Why are the pageTitle values in the JSON output double URL encoded? And why is there URL encoding at all, since they are valid JSON string value characters?
This is how Matomo allows operator characters to be placed in segment condition values. If for some reason your page title had an =, ^, or @ in it, simply using pageTitle==my=pageTitle^with@symbols would not be parsed correctly. The value is encoded twice because that was added to the code a long time ago and hasn't been changed since. There's an issue for this here: https://github.com/matomo-org/matomo/issues/17050
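To make the two encoding layers concrete, here is a short Python sketch (my own illustration, not Matomo code): the condition value is percent-encoded once so its operator characters don't confuse the segment parser, and the whole segment parameter is encoded again when placed in the request URL.

```python
from urllib.parse import quote, unquote

# A page title that contains the segment operator characters =, ^ and @.
title = "my=pageTitle^with@symbols"

# First layer: protect operator characters inside the segment expression,
# so the parser sees one condition value instead of extra operators.
segment = "pageTitle==" + quote(title, safe="")
assert segment == "pageTitle==my%3DpageTitle%5Ewith%40symbols"

# Second layer: the whole segment parameter is URL-encoded again for the
# API request URL, so '%' itself becomes '%25'.
url_param = quote(segment, safe="")
print(url_param)  # pageTitle%3D%3Dmy%253DpageTitle%255Ewith%2540symbols

# Decoding twice recovers the original condition.
assert unquote(unquote(url_param)) == "pageTitle==" + title
```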
Thank you for getting back to us. I will try your suggestions and let you know whether they resolve the issues.
Regards,
Patrick
From: "dizzy" @.> To: "matomo-org/matomo" @.> Cc: "patrick stickler" @.>, "Mention" @.> Sent: Thursday, 6 May, 2021 02:18:39 Subject: Re: [matomo-org/matomo] Inconsistent results from Reporting API when defining segments using pageTitle =^ (starts with) (#17507)
Hi [ https://github.com/patrick-stickler-csc-fi | @patrick-stickler-csc-fi ] , thanks for reporting this, it does seem odd.
Regarding questions 1&2:
I tried reproducing this on matomo.cloud and wasn't able to. So my guess right now is that the data for some or all of these segments might be out of date (there were some bugs in 4.2.1 around this logic). Can you try invalidating the segments for the date range and requesting them again? To do so you'll have to use the core:invalidate-report-data command (there is also an InvalidateReports plugin that lets you invalidate through the UI, but this will not let you supply a custom segment). The segment parameter has to be encoded properly to work, it should have the same exact encoding that is sent in the API request URL.
Regarding question 3:
Why are the pageTitle values in the JSON output double URL encoded? And why is there URL encoding at all, since they are valid JSON string value characters? This is how Matomo allows operator characters to be placed in segment condition values. If for some reason, your page title had an = or ^ or @ in it, simply using @.*** would not be parsed correctly. The value is encoded twice, because that was added to the code along time ago and hasn't been changed since. There's an issue for this here: [ https://github.com/matomo-org/matomo/issues/17050 | #17050 ]
— You are receiving this because you were mentioned. Reply to this email directly, [ https://github.com/matomo-org/matomo/issues/17507#issuecomment-833109363 | view it on GitHub ] , or [ https://github.com/notifications/unsubscribe-auth/AGYS3P6AFNAF2QJJ6FBF45TTMHG47ANCNFSM43WZJV4Q | unsubscribe ] .
-- Patrick Stickler Senior Software Specialist, Research Data Services CSC - IT Center for Science Ltd., PO Box 405, 02101 Espoo, FINLAND +358 50 381 8615, @.***
Hello,
Is there a way to use core:invalidate-report-data to invalidate all report data for all dates and all segments for a particular site ID?
Or do I need to execute core:invalidate-report-data for each and every segment that I have used in the past via the analytics API?
Also, when I use the --cascade option with -vvv and specify a --dates parameter that is just the year, I get no definitive indication that all days have been invalidated. The verbose output is confusing. Why are only 4 dates listed?
/var/www/html/matomo/console core:invalidate-report-data --dates=2021 --sites=14 --cascade -vvv
Invalidating day periods in 2021 [segment = ]... Success. The following dates were invalidated successfully: 2021-05-10
Invalidating week periods in 2021 [segment = ]... Success. The following dates were invalidated successfully: 2021-05-10
Invalidating month periods in 2021 [segment = ]... Success. The following dates were invalidated successfully: 2021-05-01
Invalidating year periods in 2021 [segment = ]... Success. The following dates were invalidated successfully: 2021-01-01
Thanks,
Patrick
After running core:invalidate-report-data as noted below, we still see inexplicable behavior, such as the following, when trying to match against the page title exactly versus as a prefix. The exact match fails, but the prefix match returns results that include events that should have matched exactly:
curl -k "https://localhost/index.php?token_auth=*&format=json&date=2021-05-10&period=day&idSite=14&module=API&method=Actions.getPageTitles&segment=pageTitle==PRODUCTION%20%2F%20SSO%20%2F%20LOGIN%20%2F%20IDA&filter_limit=-1&filter_sort_column=nb_hits&filter_sort_order=desc"
[]#
curl -k "https://localhost/index.php?token_auth=*&format=json&date=2021-05-10&period=day&idSite=14&module=API&method=Actions.getPageTitles&segment=pageTitle=^PRODUCTION%20%2F%20SSO%20%2F%20LOGIN%20%2F%20IDA&filter_limit=-1&filter_sort_column=nb_hits&filter_sort_order=desc”
[{"label":" PRODUCTION \/ SSO \/ LOGIN \/ IDA","nb_visits":7,"nb_uniq_visitors":7,"nb_hits":9,"sum_time_spent":12,"avg_page_load_time":0,"avg_time_on_page":1,"bounce_rate":"0%","exit_rate":"0%","segment":"pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA"},{"label":" PRODUCTION \/ SSO \/ LOGIN \/ IDA \/ SUCCESS","nb_visits":5,"nb_uniq_visitors":1,"nb_hits":7,"sum_time_spent":391,"entry_nb_uniq_visitors":"1","entry_nb_visits":"4","entry_nb_actions":"7","entry_sum_visit_length":"674","entry_bounce_count":"2","exit_nb_uniq_visitors":"1","exit_nb_visits":"3","avg_page_load_time":0,"avg_time_on_page":56,"bounce_rate":"50%","exit_rate":"60%","segment":"pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA%2B%252F%2BSUCCESS"},{"label":" PRODUCTION \/ SSO \/ LOGIN \/ IDA \/ HAKA","nb_visits":2,"nb_uniq_visitors":2,"nb_hits":3,"sum_time_spent":10,"avg_page_load_time":0,"avg_time_on_page":3,"bounce_rate":"0%","exit_rate":"0%","segment":"pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA%2B%252F%2BHAKA"},{"label":" PRODUCTION \/ SSO \/ LOGIN \/ IDA \/ CSCID","nb_visits":1,"nb_uniq_visitors":1,"nb_hits":1,"sum_time_spent":0,"avg_page_load_time":0,"avg_time_on_page":0,"bounce_rate":"0%","exit_rate":"0%","segment":"pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA%2B%252F%2BCSCID"}]#
Note that the encoding of the segments matches the Matomo dashboard UI behavior precisely.
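As a sanity check on that encoding (my own sketch, not Matomo code): the double-encoded "segment" strings in the JSON output above decode back to the plain page titles after two rounds of URL decoding, with '+' standing for a space in the inner layer.

```python
from urllib.parse import unquote, unquote_plus

# A "segment" value as returned in the JSON output above.
encoded = "pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA"

# Outer layer: %2B -> '+', %252F -> '%2F'.
once = unquote(encoded)
assert once == "pageTitle==PRODUCTION+%2F+SSO+%2F+LOGIN+%2F+IDA"

# Inner layer: '+' -> ' ', %2F -> '/'.
twice = unquote_plus(once)
assert twice == "pageTitle==PRODUCTION / SSO / LOGIN / IDA"
```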
Also, the exact same inexplicable behavior is shown by the dashboard UI when the exact same segments are defined via the UI.
Any further advice?
Thanks,
Patrick
Hi @patrick-stickler-csc-fi:
Is there a way to use core:invalidate-report-data to invalidate all report data for all dates and all segments for a particular site ID? Or do I need to execute core:invalidate-report-data for each and every segment that I have used in the past via the analytics API?
Unfortunately not at the moment. You could specify a large enough date range (see the next part of this reply), but currently there's no way to specify all segments. And even if there were, it would only be possible to invalidate segments that were explicitly created in the UI or via the SegmentEditor API, since otherwise they are not stored in the segment table and we wouldn't know what they are.
Also, when I use the --cascade option with -vvv and specify a --dates parameter that is just the year, I get no definitive indication that all days have been invalidated. The verbose output is confusing. Why are only 4 dates listed?
/var/www/html/matomo/console core:invalidate-report-data --dates=2021 --sites=14 --cascade -vvv
The date in this command is incorrect: it should be a fully specified date in YYYY-MM-DD format, or a date range like YYYY-MM-DD,YYYY-MM-DD.
To invalidate everything within say 2019-2021 for a specific segment, you could use the command:
./console core:invalidate-report-data --dates=2019-01-01,today --sites=14 --segment=...
--cascade is for cascading downwards (i.e., invalidating the days within a week). The command automatically invalidates the periods above (i.e., the month containing a week).
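One way to produce the "same encoding as the API request URL" for the --segment value is to percent-encode only the condition value. This is a sketch of my own (encode_segment is a hypothetical helper, not part of Matomo):

```python
from urllib.parse import quote

def encode_segment(dimension: str, operator: str, value: str) -> str:
    """Build a segment expression with the condition value percent-encoded,
    matching the encoding used in the API request URL."""
    return dimension + operator + quote(value, safe="")

seg = encode_segment("pageTitle", "==", "PRODUCTION / SSO / LOGIN / IDA")
print(seg)
# pageTitle==PRODUCTION%20%2F%20SSO%20%2F%20LOGIN%20%2F%20IDA
```

The result can then be passed as, for example, --segment='pageTitle==PRODUCTION%20%2F%20SSO%20%2F%20LOGIN%20%2F%20IDA' on the core:invalidate-report-data command line, matching the segment strings used in the curl requests in this thread.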
Note: if you are using browser-triggered archiving, you can also just delete the archives for a site, at which point the archives will be created again. Before simply doing that, it might be useful to verify that the reason the data is inaccurate is out-of-date data not being re-archived, and not some other bug. Invalidating would be an easy way to do that. Or you could delete the archive data for a single day and a single site and see if the result from the API changes. (Don't delete archive data if you have log purging enabled, as then you wouldn't be able to recompute the archive data.)
To delete archive data manually, first identify the table the data belongs to. This is based on the start date of the period of the archive; for example, for a week starting on April 29th, 2021, the data would be in the archive_*_2021_04 tables. Then, to delete data for a single period, run:
DELETE FROM archive_blob_YYYY_MM WHERE idsite = <idSite> AND date1 = <start of date range> AND date2 = <end of date range> AND period = <period type integer id>
DELETE FROM archive_numeric_YYYY_MM WHERE idsite = <idSite> AND date1 = <start of date range> AND date2 = <end of date range> AND period = <period type integer id>
Where idSite is the site ID, date1 is the start date of the period, date2 the end date, and period the period ID (1 = day, 2 = week, 3 = month, 4 = year, 5 = range).
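To illustrate the table-naming and period-ID convention described above, here is a small sketch of my own (not Matomo code): the archive table name comes from the year and month of the period's start date, and the period type is a small integer.

```python
from datetime import date

# Period type IDs as described above.
PERIODS = {"day": 1, "week": 2, "month": 3, "year": 4, "range": 5}

def archive_table(kind: str, start: date) -> str:
    """Archive table for a period starting on `start`.
    kind is 'blob' or 'numeric'."""
    return f"archive_{kind}_{start.year}_{start.month:02d}"

# A week starting on April 29th, 2021 lives in the April tables,
# even though the week ends in May.
start = date(2021, 4, 29)
print(archive_table("blob", start), PERIODS["week"])
# archive_blob_2021_04 2
```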
To delete data for an entire site, run the following queries (again, I wouldn't do this until the actual cause of the issue is found, especially since I don't know exactly how your Matomo is configured):
DELETE FROM archive_blob_YYYY_MM WHERE idsite = ?
DELETE FROM archive_numeric_YYYY_MM WHERE idsite = ?
Thank you again for the detailed guidance. I executed the following sequence of queries/commands, and as shown, the console command has no effect. Am I still doing something incorrectly, or shouldn't the console command have resulted in the initial query finding the events that the final query does?
curl -k "https://localhost/index.php?token_auth=*&format=json&date=2021-05-03&period=day&idSite=14&module=API&method=Actions.getPageTitles&segment=pageTitle==PRODUCTION%20%2F%20SSO%20%2F%20LOGIN%20%2F%20IDA&filter_limit=-1&filter_sort_column=nb_hits&filter_sort_order=desc"
[]#
./console core:invalidate-report-data --dates=2021-05-03 --sites=14 --segment=pageTitle==PRODUCTION%20%2F%20SSO%20%2F%20LOGIN%20%2F%20IDA -vvv
Invalidating day periods in 2021-05-03 [segment = pageTitle==PRODUCTION / SSO / LOGIN / IDA]... Success. The following dates were invalidated successfully: 2021-05-03
Invalidating week periods in 2021-05-03 [segment = pageTitle==PRODUCTION / SSO / LOGIN / IDA]... Success. The following dates were invalidated successfully: 2021-05-03
Invalidating month periods in 2021-05-03 [segment = pageTitle==PRODUCTION / SSO / LOGIN / IDA]... Success. The following dates were invalidated successfully: 2021-05-01
Invalidating year periods in 2021-05-03 [segment = pageTitle==PRODUCTION / SSO / LOGIN / IDA]... Success. The following dates were invalidated successfully: 2021-01-01
./console core:archive --force-all-websites --force-idsites=14
INFO [2021-05-11 05:48:22] 20875 ---------------------------
INFO [2021-05-11 05:48:22] 20875 INIT
INFO [2021-05-11 05:48:22] 20875 Running Matomo 4.2.1 as Super User
INFO [2021-05-11 05:48:22] 20875 ---------------------------
INFO [2021-05-11 05:48:22] 20875 NOTES
INFO [2021-05-11 05:48:22] 20875 - If you execute this script at least once per hour (or more often) in a crontab, you may disable 'Browser trigger archiving' in Matomo UI > Settings > General Settings.
INFO [2021-05-11 05:48:22] 20875 See the doc at: https://matomo.org/docs/setup-auto-archiving/
INFO [2021-05-11 05:48:22] 20875 - Async process archiving supported, using CliMulti.
INFO [2021-05-11 05:48:22] 20875 - Reports for today will be processed at most every 900 seconds. You can change this value in Matomo UI > Settings > General Settings.
INFO [2021-05-11 05:48:22] 20875 - Archiving was last executed without error 45s ago
INFO [2021-05-11 05:48:22] 20875 - Will process 1 websites (--force-idsites)
INFO [2021-05-11 05:48:22] 20875 - Will process all 1 websites
INFO [2021-05-11 05:48:22] 20875 ---------------------------
INFO [2021-05-11 05:48:22] 20875 START
INFO [2021-05-11 05:48:22] 20875 Starting Matomo reports archiving...
INFO [2021-05-11 05:48:22] 20875 Start processing archives for site 14.
INFO [2021-05-11 05:48:22] 20875 Will invalidate archived reports for today in site ID = 14's timezone (2021-05-11 00:00:00).
INFO [2021-05-11 05:48:22] 20875 Will invalidate archived reports for yesterday in site ID = 14's timezone (2021-05-10 00:00:00).
INFO [2021-05-11 05:48:22] 20875 Finished archiving for site 14, 0 API requests, Time elapsed: 0.502s [1 / 1 done]
INFO [2021-05-11 05:48:22] 20875 Done archiving!
INFO [2021-05-11 05:48:22] 20875 ---------------------------
INFO [2021-05-11 05:48:22] 20875 SUMMARY
INFO [2021-05-11 05:48:22] 20875 Processed 0 archives.
INFO [2021-05-11 05:48:22] 20875 Total API requests: 0
INFO [2021-05-11 05:48:22] 20875 done: 0 req, 561 ms, no error
INFO [2021-05-11 05:48:22] 20875 Time elapsed: 0.562s
INFO [2021-05-11 05:48:22] 20875 ---------------------------
INFO [2021-05-11 05:48:22] 20875 SCHEDULED TASKS
INFO [2021-05-11 05:48:22] 20875 Starting Scheduled tasks...
INFO [2021-05-11 05:48:22] 20875 done
INFO [2021-05-11 05:48:22] 20875 ---------------------------
curl -k "https://localhost/index.php?token_auth=*&format=json&date=2021-05-03&period=day&idSite=14&module=API&method=Actions.getPageTitles&segment=pageTitle==PRODUCTION%20%2F%20SSO%20%2F%20LOGIN%20%2F%20IDA&filter_limit=-1&filter_sort_column=nb_hits&filter_sort_order=desc"
[]#
curl -k "https://localhost/index.php?token_auth=*&format=json&date=2021-05-03&period=day&idSite=14&module=API&method=Actions.getPageTitles&segment=pageTitle=^PRODUCTION%20%2F%20SSO%20%2F%20LOGIN%20%2F%20IDA&filter_limit=-1&filter_sort_column=nb_hits&filter_sort_order=desc"
[{"label":" PRODUCTION \/ SSO \/ LOGIN \/ IDA","nb_visits":15,"nb_uniq_visitors":15,"nb_hits":22,"sum_time_spent":72,"avg_page_load_time":0,"avg_time_on_page":3,"bounce_rate":"0%","exit_rate":"0%","segment":"pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA"},{"label":" PRODUCTION \/ SSO \/ LOGIN \/ IDA \/ SUCCESS","nb_visits":6,"nb_uniq_visitors":1,"nb_hits":14,"sum_time_spent":2866,"entry_nb_uniq_visitors":"1","entry_nb_visits":"6","entry_nb_actions":"20","entry_sum_visit_length":"7953","entry_bounce_count":"3","exit_nb_uniq_visitors":"1","exit_nb_visits":"5","avg_page_load_time":0,"avg_time_on_page":205,"bounce_rate":"50%","exit_rate":"83%","segment":"pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA%2B%252F%2BSUCCESS"},{"label":" PRODUCTION \/ SSO \/ LOGIN \/ IDA \/ HAKA","nb_visits":8,"nb_uniq_visitors":8,"nb_hits":10,"sum_time_spent":7,"avg_page_load_time":0,"avg_time_on_page":1,"bounce_rate":"0%","exit_rate":"0%","segment":"pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA%2B%252F%2BHAKA"},{"label":" PRODUCTION \/ SSO \/ LOGIN \/ IDA \/ CSCID","nb_visits":3,"nb_uniq_visitors":3,"nb_hits":4,"sum_time_spent":15,"avg_page_load_time":0,"avg_time_on_page":4,"bounce_rate":"0%","exit_rate":"0%","segment":"pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA%2B%252F%2BCSCID"},{"label":" PRODUCTION \/ SSO \/ LOGIN \/ IDA \/ ERROR \/ NO_IDA_PROJECTS","nb_visits":2,"nb_uniq_visitors":2,"nb_hits":2,"sum_time_spent":0,"avg_page_load_time":0,"avg_time_on_page":0,"bounce_rate":"0%","exit_rate":"0%","segment":"pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA%2B%252F%2BERROR%2B%252F%2BNO_IDA_PROJECTS"}]#
After the console command, and if Matomo was working correctly, I would expect the first query to return the following:
[{"label":" PRODUCTION \/ SSO \/ LOGIN \/ IDA","nb_visits":15,"nb_uniq_visitors":15,"nb_hits":22,"sum_time_spent":72,"avg_page_load_time":0,"avg_time_on_page":3,"bounce_rate":"0%","exit_rate":"0%","segment":"pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA"},{"label":" PRODUCTION \/ SSO \/ LOGIN \/ IDA \/ SUCCESS","nb_visits":6,"nb_uniq_visitors":1,"nb_hits":14,"sum_time_spent":2866,"entry_nb_uniq_visitors":"1","entry_nb_visits":"6","entry_nb_actions":"20","entry_sum_visit_length":"7953","entry_bounce_count":"3","exit_nb_uniq_visitors":"1","exit_nb_visits":"5","avg_page_load_time":0,"avg_time_on_page":205,"bounce_rate":"50%","exit_rate":"83%","segment":"pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA%2B%252F%2BSUCCESS”}]
When an entirely new, never-before-used segment is specified in a query to the Reporting API, is that segment, or its results, in any way stored by Matomo for the optimization of future queries?
Is it possible that segments defined via the dashboard/UI might be affecting direct API queries specifying those same segments?
What in Matomo might be causing the initial query to fail, when the data clearly exists in the database and is found by other queries? Or am I still doing something incorrectly in my queries?
Thanks,
Patrick
I don’t know if it matters, but I will note that we have configured Matomo to treat segments as filters per the legacy behavior, by defining the following in config.ini.php:
[General] enable_segments_cache = 0
On 11. May 2021, at 8.56, Patrick Stickler @.***> wrote:
Thank you again for the detailed guidance. I executed the following sequence of queries/commands, and as shown, the console command has no affect. Am I still doing something incorrectly, or shouldn’t the console command have resulted in the initial query finding the events that the final query does?
curl -k "https://localhost/index.php?token_auth=*&format=json&date=2021-05-03&period=day&idSite=14&module=API&method=Actions.getPageTitles&segment=pageTitle==PRODUCTION%20%2F%20SSO%20%2F%20LOGIN%20%2F%20IDA https://localhost/index.php?token_auth=*&format=json&date=2021-05-03&period=day&idSite=14&module=API&method=Actions.getPageTitles&segment=pageTitle==PRODUCTION%20%2F%20SSO%20%2F%20LOGIN%20%2F%20IDA&filter_limit=-1&filter_sort_column=nb_hits&filter_sort_order=desc”
[]#
./console core:invalidate-report-data --dates=2021-05-03 --sites=14 --segment=pageTitle==PRODUCTION%20%2F%20SSO%20%2F%20LOGIN%20%2F%20IDA -vvvInvalidating day periods in 2021-05-03 [segment = pageTitle==PRODUCTION / SSO / LOGIN / IDA]... Success. The following dates were invalidated successfully: 2021-05-03 Invalidating week periods in 2021-05-03 [segment = pageTitle==PRODUCTION / SSO / LOGIN / IDA]... Success. The following dates were invalidated successfully: 2021-05-03 Invalidating month periods in 2021-05-03 [segment = pageTitle==PRODUCTION / SSO / LOGIN / IDA]... Success. The following dates were invalidated successfully: 2021-05-01 Invalidating year periods in 2021-05-03 [segment = pageTitle==PRODUCTION / SSO / LOGIN / IDA]... Success. The following dates were invalidated successfully: 2021-01-01
./console core:archive --force-all-websites --force-idsites=14
INFO [2021-05-11 05:48:22] 20875 --------------------------- INFO [2021-05-11 05:48:22] 20875 INIT INFO [2021-05-11 05:48:22] 20875 Running Matomo 4.2.1 as Super User INFO [2021-05-11 05:48:22] 20875 --------------------------- INFO [2021-05-11 05:48:22] 20875 NOTES INFO [2021-05-11 05:48:22] 20875 - If you execute this script at least once per hour (or more often) in a crontab, you may disable 'Browser trigger archiving' in Matomo UI > Settings > General Settings. INFO [2021-05-11 05:48:22] 20875 See the doc at: https://matomo.org/docs/setup-auto-archiving/ https://matomo.org/docs/setup-auto-archiving/ INFO [2021-05-11 05:48:22] 20875 - Async process archiving supported, using CliMulti. INFO [2021-05-11 05:48:22] 20875 - Reports for today will be processed at most every 900 seconds. You can change this value in Matomo UI > Settings > General Settings. INFO [2021-05-11 05:48:22] 20875 - Archiving was last executed without error 45s ago INFO [2021-05-11 05:48:22] 20875 - Will process 1 websites (--force-idsites) INFO [2021-05-11 05:48:22] 20875 - Will process all 1 websites INFO [2021-05-11 05:48:22] 20875 --------------------------- INFO [2021-05-11 05:48:22] 20875 START INFO [2021-05-11 05:48:22] 20875 Starting Matomo reports archiving... INFO [2021-05-11 05:48:22] 20875 Start processing archives for site 14. INFO [2021-05-11 05:48:22] 20875 Will invalidate archived reports for today in site ID = 14's timezone (2021-05-11 00:00:00). INFO [2021-05-11 05:48:22] 20875 Will invalidate archived reports for yesterday in site ID = 14's timezone (2021-05-10 00:00:00). INFO [2021-05-11 05:48:22] 20875 Finished archiving for site 14, 0 API requests, Time elapsed: 0.502s [1 / 1 done] INFO [2021-05-11 05:48:22] 20875 Done archiving! INFO [2021-05-11 05:48:22] 20875 --------------------------- INFO [2021-05-11 05:48:22] 20875 SUMMARY INFO [2021-05-11 05:48:22] 20875 Processed 0 archives. 
INFO [2021-05-11 05:48:22] 20875 Total API requests: 0 INFO [2021-05-11 05:48:22] 20875 done: 0 req, 561 ms, no error INFO [2021-05-11 05:48:22] 20875 Time elapsed: 0.562s INFO [2021-05-11 05:48:22] 20875 --------------------------- INFO [2021-05-11 05:48:22] 20875 SCHEDULED TASKS INFO [2021-05-11 05:48:22] 20875 Starting Scheduled tasks... INFO [2021-05-11 05:48:22] 20875 done INFO [2021-05-11 05:48:22] 20875 —————————————
curl -k "https://localhost/index.php?token_auth=*&format=json&date=2021-05-03&period=day&idSite=14&module=API&method=Actions.getPageTitles&segment=pageTitle==PRODUCTION%20%2F%20SSO%20%2F%20LOGIN%20%2F%20IDA https://localhost/index.php?token_auth=*&format=json&date=2021-05-03&period=day&idSite=14&module=API&method=Actions.getPageTitles&segment=pageTitle==PRODUCTION%20%2F%20SSO%20%2F%20LOGIN%20%2F%20IDA&filter_limit=-1&filter_sort_column=nb_hits&filter_sort_order=desc”
[]#
curl -k "https://localhost/index.php?token_auth=*b&format=json&date=2021-05-03&period=day&idSite=14&module=API&method=Actions.getPageTitles&segment=pageTitle=^PRODUCTION%20%2F%20SSO%20%2F%20LOGIN%20%2F%20IDA https://localhost/index.php?token_auth=*b&format=json&date=2021-05-03&period=day&idSite=14&module=API&method=Actions.getPageTitles&segment=pageTitle=^PRODUCTION%20%2F%20SSO%20%2F%20LOGIN%20%2F%20IDA&filter_limit=-1&filter_sort_column=nb_hits&filter_sort_order=desc”
[{"label":" PRODUCTION \/ SSO \/ LOGIN \/ IDA","nb_visits":15,"nb_uniq_visitors":15,"nb_hits":22,"sum_time_spent":72,"avg_page_load_time":0,"avg_time_on_page":3,"bounce_rate":"0%","exit_rate":"0%","segment":"pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA"},{"label":" PRODUCTION \/ SSO \/ LOGIN \/ IDA \/ SUCCESS","nb_visits":6,"nb_uniq_visitors":1,"nb_hits":14,"sum_time_spent":2866,"entry_nb_uniq_visitors":"1","entry_nb_visits":"6","entry_nb_actions":"20","entry_sum_visit_length":"7953","entry_bounce_count":"3","exit_nb_uniq_visitors":"1","exit_nb_visits":"5","avg_page_load_time":0,"avg_time_on_page":205,"bounce_rate":"50%","exit_rate":"83%","segment":"pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA%2B%252F%2BSUCCESS"},{"label":" PRODUCTION \/ SSO \/ LOGIN \/ IDA \/ HAKA","nb_visits":8,"nb_uniq_visitors":8,"nb_hits":10,"sum_time_spent":7,"avg_page_load_time":0,"avg_time_on_page":1,"bounce_rate":"0%","exit_rate":"0%","segment":"pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA%2B%252F%2BHAKA"},{"label":" PRODUCTION \/ SSO \/ LOGIN \/ IDA \/ CSCID","nb_visits":3,"nb_uniq_visitors":3,"nb_hits":4,"sum_time_spent":15,"avg_page_load_time":0,"avg_time_on_page":4,"bounce_rate":"0%","exit_rate":"0%","segment":"pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA%2B%252F%2BCSCID"},{"label":" PRODUCTION \/ SSO \/ LOGIN \/ IDA \/ ERROR \/ NO_IDA_PROJECTS","nb_visits":2,"nb_uniq_visitors":2,"nb_hits":2,"sum_time_spent":0,"avg_page_load_time":0,"avg_time_on_page":0,"bounce_rate":"0%","exit_rate":"0%","segment":"pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA%2B%252F%2BERROR%2B%252F%2BNO_IDA_PROJECTS"}]#
After the console command, and if Matomo was working correctly, I would expect the first query to return the following:
[{"label":" PRODUCTION \/ SSO \/ LOGIN \/ IDA","nb_visits":15,"nb_uniq_visitors":15,"nb_hits":22,"sum_time_spent":72,"avg_page_load_time":0,"avg_time_on_page":3,"bounce_rate":"0%","exit_rate":"0%","segment":"pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA"},{"label":" PRODUCTION \/ SSO \/ LOGIN \/ IDA \/ SUCCESS","nb_visits":6,"nb_uniq_visitors":1,"nb_hits":14,"sum_time_spent":2866,"entry_nb_uniq_visitors":"1","entry_nb_visits":"6","entry_nb_actions":"20","entry_sum_visit_length":"7953","entry_bounce_count":"3","exit_nb_uniq_visitors":"1","exit_nb_visits":"5","avg_page_load_time":0,"avg_time_on_page":205,"bounce_rate":"50%","exit_rate":"83%","segment":"pageTitle==PRODUCTION%2B%252F%2BSSO%2B%252F%2BLOGIN%2B%252F%2BIDA%2B%252F%2BSUCCESS”}]
When an entirely new segment, never before used, is specified in a query to the Reporting API, is that segment, and its results, in any way stored by Matomo for the optimization of future queries?
Is it possible that segments defined via the dashboard/UI might be affecting direct API queries specifying those same segments?
What in Matomo might be causing the initial query to fail, when clearly the data exists in the database, and is found by other queries? Or am I still doing something incorrectly in my queries?
Thanks,
Patrick
On 11. May 2021, at 0.57, dizzy wrote:
Hi @patrick-stickler-csc-fi https://github.com/patrick-stickler-csc-fi:
Is there a way to use core:invalidate-report-data to invalidate all report data for all dates and all segments for a particular site ID? Or do I need to execute core:invalidate-report-data for each and every segment that I have used in the past via the analytics API?
Unfortunately not at the moment. You could specify a large enough date range (see below in the next part of this reply), but currently there's no way to specify all segments. And if there were, it would only be possible to invalidate segments that are specifically created in the UI or via the SegmentEditor API, since otherwise they would not be stored in the segment table, and we wouldn't know what they are.
Also, when I use the --cascade option with -vvv and specify a --dates parameter that is just the year, I get no definitive indication that all days have been invalidated. The verbose output is confusing. Why are only 4 dates listed:
/var/www/html/matomo/console core:invalidate-report-data --dates=2021 --sites=14 --cascade -vvv
The date in this command is incorrect; it should be a fully specified date in YYYY-MM-DD format or a date range like YYYY-MM-DD,YYYY-MM-DD.
To invalidate everything within say 2019-2021 for a specific segment, you could use the command:
./console core:invalidate-report-data --dates=2019-01-01,today --sites=14 --segment=...
--cascade is for cascading downwards (ie, invalidating days in the week). The command automatically invalidates periods above (ie, the month containing a week).
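Since the segment parameter has to carry the same encoding sent in the API request URL, one way to avoid shell quoting mistakes is to build the command programmatically. A small sketch (the helper name and structure are mine, not Matomo's; the encoding mirrors the form encoding seen in the API responses in this thread):

```python
from urllib.parse import quote_plus

def build_invalidate_command(site_id, date_range, page_title):
    """Build a core:invalidate-report-data invocation for a
    pageTitle== segment (illustrative helper, not part of Matomo)."""
    # Matomo stores the condition value form-encoded, so encode it once
    # here; the full segment string is then passed as a single argument.
    segment = "pageTitle==" + quote_plus(page_title)
    return [
        "./console", "core:invalidate-report-data",
        "--dates=" + date_range,
        "--sites=" + str(site_id),
        "--segment=" + segment,
    ]

cmd = build_invalidate_command(14, "2019-01-01,today",
                               "PRODUCTION / SSO / LOGIN / IDA")
```

Passing `cmd` to `subprocess.run()` executes it as an argument list, sidestepping the shell's own interpretation of `%`, `=`, and spaces.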
Note: If you are using browser triggered archiving, you can also just delete the archives for a site, at which point the archives would be created again. Before simply doing that, it might be useful to verify that the reason the data is inaccurate is old data not being rearchived, and not some other bug. Invalidating would be an easy way to do that. Or you could delete the archive data for a single day and single site and see if the result from the API changes. (Don't delete archive data if you have log purging enabled, as then you wouldn't be able to recompute the archive data.)
To delete archive data manually:
1. Identify the table the data belongs to. This is based on the start date of the period of the archive. So if it's for a week starting on April 29th, 2021, the data would be in the archive_*_2021_04 tables.
2. To delete data for a single period, run the queries:
DELETE FROM archive_blob_YYYY_MM WHERE idsite = ? AND date1 = ? AND date2 = ? AND period = ?
DELETE FROM archive_numeric_YYYY_MM WHERE idsite = ? AND date1 = ? AND date2 = ? AND period = ?
Where idsite is the site ID, date1 is the start date of the period, date2 the end date, and period the period ID (1 = day, 2 = week, 3 = month, 4 = year, 5 = range).
To delete data for an entire site, run the following queries (again I wouldn't do this until the actual cause of the issue is found, especially since I don't know exactly how your matomo is configured):
DELETE FROM archive_blob_YYYY_MM WHERE idsite = ?
DELETE FROM archive_numeric_YYYY_MM WHERE idsite = ?
-- Patrick Stickler Senior Software Specialist, Research Data Services CSC - IT Center for Science Ltd., PO Box 405, 02101 Espoo, FINLAND +358 50 381 8615
On 6. May 2021, at 2.18, dizzy wrote:
I tried reproducing this on matomo.cloud and wasn't able to.
Is matomo.cloud running 4.2.1? The same latest version that is available for download? Or is it in any way different from the publicly released 4.2.1 code base?
So my guess right now is that the data for some or all of these segments might be out of date (there were some bugs in 4.2.1 around this logic).
Do you have any links to the issues or discussion pertaining to the known bugs you refer to? So I can review their nature, and what is being done to resolve them?
Thank you.
Is matomo.cloud running 4.2.1? The same latest version that is available for download? Or is it in any way different from the publicly released 4.2.1 code base?
@patrick-stickler-csc-fi Matomo cloud is running 4.2.1, but we have already applied some of the PRs that are part of the upcoming 4.3.0
@patrick-stickler-csc-fi more answers:
When an entirely new segment, never before used, is specified in a query to the Reporting API, is that segment, and its results, in any way stored by Matomo for the optimization of future queries?
There are two types of data in Matomo, log data and archive data. Log data is the raw visit data, each action a user takes. The archive data is the result of aggregating that data; it's the report data cached in the archive tables.
So the answer is yes, that is one half of Matomo's core report generation mechanism.
Is it possible that segments defined via the dashboard/UI might be affecting direct API queries specifying those same segments?
No, it shouldn't be. The segment definitions are hashed via md5 and identified that way when looking in the DB, so it's possible a collision might cause two segments to have the same hash, but that is very unlikely.
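As a sketch of that lookup (illustrative only, not Matomo's actual code): archive rows for a segment are keyed by a hash of the segment definition, which is consistent with the done... archive names in this thread ending in 32 hex characters.

```python
import hashlib

def segment_hash(definition: str) -> str:
    # Simplified illustration: an MD5 hex digest of the segment
    # definition string. The exact input Matomo hashes may include
    # more state than the raw definition.
    return hashlib.md5(definition.encode("utf-8")).hexdigest()

# Two definitions differing only in the operator hash differently,
# so their archives are stored and looked up independently.
a = segment_hash("pageTitle==PRODUCTION%20%2F%20SSO")
b = segment_hash("pageTitle=^PRODUCTION%20%2F%20SSO")
```

A collision would require two distinct definitions to share a 128-bit digest, which is why it is described above as very unlikely.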
What in Matomo might be causing the initial query to fail, when clearly the data exists in the database, and is found by other queries? Or am I still doing something incorrectly in my queries?
The result of this test confirms that at least one issue is that archives are not being rearchived. Since it's a custom segment archiving should be triggered, but instead of either showing the old data or showing different data, it shows nothing. There are quite a few fixes around this logic in 4.3, so it might be better to wait until 4.3 is released (which should be relatively soon).
To continue diagnosing, you could run the following query:
SELECT * FROM archive_numeric_2021_05 WHERE idsite = 14 AND date1 = '2021-05-03' and date2 = '2021-05-03' and period = 1 AND name LIKE 'done7fc20d1d1a85a360f07b17433450892f%';
This should provide archive status rows for the first query that is returning [] for you.
Also, could you try running the same curl as before with the following query parameter: &segment=pageTitle%3D%3DPRODUCTION%252520%25252F%252520SSO%252520%25252F%252520LOGIN%252520%25252F%252520IDA ?
Do you have any links to the issues or discussion pertaining to the known bugs you refer to? So I can review their nature, and what is being done to resolve them?
There were several fixes around the archiving logic in 4.3. These specific fixes are likely relevant to this problem:
On 11. May 2021, at 20.58, dizzy wrote:
To continue diagnosing, you could run the following query:
SELECT * FROM archive_numeric_2021_05 WHERE idsite = 14 AND date1 = '2021-05-03' and date2 = '2021-05-03' and period = 1 AND name LIKE 'done7fc20d1d1a85a360f07b17433450892f%';
This should provide archive status rows for the first query that is returning [] for you.
MariaDB [matomo]> SELECT * FROM matomo_archive_numeric_2021_05 WHERE idsite = 14 AND date1 = '2021-05-03' and date2 = '2021-05-03' and period = 1 AND name LIKE 'done7fc20d1d1a85a360f07b17433450892f%';
Empty set (0.00 sec)
Also, could you try running the same curl as before with the following query parameter: &segment=pageTitle%3D%3DPRODUCTION%252520%25252F%252520SSO%252520%25252F%252520LOGIN%252520%25252F%252520IDA?
[]
So, no results for either.
@patrick-stickler-csc-fi 4.3.0 should be released relatively soon, it will hopefully solve these issues for you. This issue would be rather hard to debug through email so I think it'd be better to wait for now.
We will wait for v4.3. Thank you for your help.
Patrick
On 13. May 2021, at 1.46, dizzy wrote:
@patrick-stickler-csc-fi https://github.com/patrick-stickler-csc-fi 4.3.0 should be released relatively soon, it will hopefully solve these issues for you. This issue would be rather hard to debug through email so I think it'd be better to wait for now.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/matomo-org/matomo/issues/17507#issuecomment-840142094, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGYS3P4TL4BRTX4NZ4L7HJ3TNMAOFANCNFSM43WZJV4Q.
Hi @patrick-stickler-csc-fi, were you able to update to 4.3 and see if this fixed the issues you were experiencing?
I have now updated our Matomo instance to 4.3.1 and will test if it resolves the issues.
Should the update affect both past and newly aggregated data, or only newly aggregated data?
Patrick
On 26. May 2021, at 23.58, dizzy wrote:
Hi @patrick-stickler-csc-fi https://github.com/patrick-stickler-csc-fi, were you able to update to 4.3 and see if this fixed the issues you were experiencing?
@patrick-stickler-csc-fi only newly aggregated data. You may need to invalidate old data to initiate aggregation again.
OK, I have a script that invalidates old data per all of the segmentations we use, for all date periods we have aggregated data. I will run that, and then start regenerating reports, and hopefully the issues will be resolved (at least for new data from now onward, which would be sufficient).
Thanks.
Patrick
On 27. May 2021, at 8.53, dizzy wrote:
@patrick-stickler-csc-fi https://github.com/patrick-stickler-csc-fi only newly aggregated data. You may need to invalidate old data to initiate aggregation again.
Thanks for contributing to this issue. As it has been a few months since the last activity and we believe this is likely not an issue anymore, we will now close this. If that's not the case, please do feel free to either reopen this issue or open a new one. We will gladly take a look again!
We are encountering inexplicable inconsistencies in the Matomo API responses when segmenting data using pageTitle prefixes.
Please see the query examples below and their Matomo responses, and the subsequent questions.
Notes:
1 - We do not use the Matomo JavaScript tracker at all, but push all event data directly to the Matomo Tracking API from custom application code, explicitly specifying the title via the action_name= URL parameter.
2 - None of the segments used in the queries below are pre-defined via the Matomo web UI. It is understood/presumed that this is not necessary: if a segment is not cached, it will be created in order to produce the appropriate query response.
3 - In the examples below, the query URLs have been sanitized and the JSON response from Matomo has been pruned to only show the most relevant fields, which provide the particular values needed by our application. All queries reference the same site ID.
Query examples and responses:
1 - Segment: pageTitle == "PRODUCTION / IDA / FILES / FREEZE / FILE":
2 - Segment: pageTitle =^ "PRODUCTION / IDA / FILES / FREEZE / FILE":
3 - Segment: pageTitle =^ "PRODUCTION / IDA / FILES / FREEZE /":
4 - Segment: pageTitle =^ "PRODUCTION / IDA / FILES / FREEZE ":
5 - Segment: pageTitle =^ "PRODUCTION / IDA / FILES / FREEZE":
6 - Segment: pageTitle =^ "PRODUCTION / IDA / FILES / FREEZ":
Questions:
1 - Why does the first == (equals) query not produce the same results as the second =^ (starts with) query, when the two are identical except for the pageTitle comparison operator? Surely it is always true that a string starts with itself.
2 - Why do the last 4 =^ (starts with) queries not report the same results, when they should match the same set of event page titles, since the event page titles all start with all of the variant prefix strings?
3 - Why are the pageTitle values in the JSON output double URL encoded? And why is there URL encoding at all, since they are valid JSON string value characters? I would expect the JSON output to be either:
or
Expected Behavior
Page titles are matched correctly and consistently per the specified =^ starts with prefix string.
Current Behavior
See above.
Your Environment
Matomo version: 4.2.1
MySQL version: 5.5.68-MariaDB
PHP version: 7.3.19
utf8mb4 used throughout
Active Plugins:
Actions (Core) CustomJsTracker (Core) Dashboard (Core) DevicePlugins (Core) DevicesDetection (Core) Diagnostics (Core) Events (Core) Feedback (Core) ForceSSL (v4.0.1) GeoIp2 (Core) Goals (Core) Heartbeat (Core) ImageGraph (Core) Insights (Core) Live (Core) Login (Core) Marketplace (Core) Monolog (Core) Overlay (Core) PagePerformance (Core) PrivacyManager (Core) Resolution (Core) ScheduledReports (Core) SegmentEditor (Core) Tour (Core) Transitions (Core) TwoFactorAuth (Core) UserCountry (Core) UserCountryMap (Core) UserId (Core) UserLanguage (Core) VisitFrequency (Core) VisitTime (Core) VisitorInterest (Core) VisitsSummary (Core) Widgetize (Core)