Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
Alternative implementation of #22571 using WITH ROLLUP
To avoid running two queries for custom dimension reports, this PR explores the option of using rollup queries.
Fixture changes and updated result expectations have been copied from #22571. The failing entry in the test_reportLimitingdimension_2_rankingQuery__CustomDimensions.getCustomDimension_day.xml fixture stems from a new Others row being added by the implementation of this PR. Needs to be verified if the changes are actually correct or a bad side effect.
As WITH ROLLUP uses NULL for the summary rows, we need to prevent those values changing the results. If we cannot COALESCE those values we need find a way to update the counter expressions to handle them.
Followup tasks
Verify the query results are actually correct.
Add more tests and further improve test data.
Replace the subquery sorting with a sorted rollup if the database engine supports it.
The current way of SELECT FROM ( SELECT WITH ROLLUP ) ORDER BY should be compatible with all supported database engines. Using SELECT WITH ROLLUP ORDER BY without a subquery could remove the added filesort from the rollup queries, improving the overall performance.
Performance test against real data to ensure the performance does not degrade too much.
This should not only look at the SQL query performance, but also the results being created. In the example queries each distinct custom_dimension_1 value will create a separate rollup row, in addition to the ones already present. This could lead to an increased data table size we should be aware of before committing to a full rollup implementation.
Local testing
Compared the current queries with the new implementation, both WITH and WITHOUT the COALESCE addition, as my local dataset contained several entries with idaction_url = NULL that changed the results.
The duplicate url = NULL rows are creating some problems with the data table. They are messing with the ranking counter and are not merged into the __mtm_ranking_query_others__ row.
The query is only listed to highlight the problems of potential NULL values in a grouping column.
[ ] Potential edge cases thought about (behavior of the code with strange input, with strange internal state or possible interactions with other Matomo subsystems)
[ ] Usability review done (is anything maybe unclear or think about anything that would cause people to reach out to support)
Description
Alternative implementation of #22571 using
WITH ROLLUP
To avoid running two queries for custom dimension reports, this PR explores the option of using rollup queries.
Fixture changes and updated result expectations have been copied from #22571. The failing entry in the
test_reportLimitingdimension_2_rankingQuery__CustomDimensions.getCustomDimension_day.xml
fixture stems from a newOthers
row being added by the implementation of this PR. Needs to be verified if the changes are actually correct or a bad side effect.As
WITH ROLLUP
usesNULL
for the summary rows, we need to prevent those values changing the results. If we cannotCOALESCE
those values we need find a way to update the counter expressions to handle them.Followup tasks
SELECT FROM ( SELECT WITH ROLLUP ) ORDER BY
should be compatible with all supported database engines. UsingSELECT WITH ROLLUP ORDER BY
without a subquery could remove the addedfilesort
from the rollup queries, improving the overall performance.custom_dimension_1
value will create a separate rollup row, in addition to the ones already present. This could lead to an increased data table size we should be aware of before committing to a full rollup implementation.Local testing
Compared the current queries with the new implementation, both WITH and WITHOUT the
COALESCE
addition, as my local dataset contained several entries withidaction_url = NULL
that changed the results.Current state
SQL Query EXPLAIN Query Result Data
ROLLUP (with duplicate NULLs)
SQL Query EXPLAIN Query Result Data
The duplicate
url = NULL
rows are creating some problems with the data table. They are messing with the ranking counter and are not merged into the__mtm_ranking_query_others__
row.The query is only listed to highlight the problems of potential
NULL
values in a grouping column.ROLLUP with COALESCE
SQL Query EXPLAIN Query Result Data
Review