matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0
19.94k stars 2.66k forks source link

Actions/Pages: Unique pageviews shouldn't be summed up in hierarchical view #10127

Open adaqus opened 8 years ago

adaqus commented 8 years ago

This is a db query used when creating Actions/Pages report:

SELECT
            /* Actions */
            log_action.name,
            log_action.type,
            log_action.idaction,
            log_action.url_prefix,
            count(DISTINCT log_link_visit_action.idvisit)   AS `2`,
            count(DISTINCT log_link_visit_action.idvisitor) AS `1`,
            count(*)                                        AS `12`,
            sum(
                CASE WHEN custom_float IS NULL
                  THEN 0
                ELSE custom_float
                END
            ) / 1000                                        AS `30`,
            sum(
                CASE WHEN custom_float IS NULL
                  THEN 0
                ELSE 1
                END
            )                                               AS `31`,
            min(custom_float) / 1000                        AS `32`,
            max(custom_float) / 1000                        AS `33`,
            CASE WHEN (MAX(log_link_visit_action.custom_var_v5) = 0
                       AND log_link_visit_action.custom_var_k5 = '_pk_scount')
              THEN 1
            ELSE 0 END
                                                            AS `28`,
            SUM(CASE WHEN log_action_name_ref.type = 8
              THEN 1
                ELSE 0 END)
                                                            AS `29`
          FROM
            piwik_log_link_visit_action AS log_link_visit_action
            LEFT JOIN piwik_log_action AS log_action
              ON log_link_visit_action.idaction_url = log_action.idaction
            LEFT JOIN piwik_log_action AS log_action_name_ref
              ON log_link_visit_action.idaction_name_ref =
                 log_action_name_ref.idaction
          WHERE
            log_link_visit_action.server_time >= ?
            AND log_link_visit_action.server_time <= ?
            AND log_link_visit_action.idsite = ?
            AND log_link_visit_action.idaction_url IS NOT NULL AND
            log_link_visit_action.idaction_event_category IS NULL
          GROUP BY
            log_action.idaction
          ORDER BY
            `12` DESC, name ASC

In this query for each action type (group by log_action.idaction) number of unique visits is counted (count(DISTINCT log_link_visit_action.idvisit) AS 2). So when we have a visit with pageviews on /category1/sub1 and /category1/sub2, for each pagieview we have one unique visit (unique pageview). Category1 also has 1 unique visit, but it will be shown it has 2.

actions-pages

Values underlined (and each expandable element) are sums of unique pageviews from subelements, so some visits are counted more than once.

This will probably also apply to other reports from Actions group.

tsteur commented 8 years ago

I think this is kind of expected behaviour currently which doesn't mean that it is good or correct. ping @mattab

ghost commented 8 years ago

@mattab Can you let us know, if this behavior has to be changed in the future or will be the same in upcoming Piwik versions and no changes are planned? Many thanks in advance.

mattab commented 8 years ago

It's not correct to sum the number of unique pageviews and still call it a number of unique pageviews... so ideally we would change it in the future :+1:

ghost commented 8 years ago

@mattab We are ready to work on this topic and prepare PR, but first we have to know what actually we are going to do. Change they way how it's being calculated ? Do not display any value in that column for expandable nodes, or change the column label?

ghost commented 8 years ago

@mattab Can you please let me know what is your expected way of solving the issue? Change they way how it's being calculated ? Do not display any value in that column for expandable nodes, or change the column label?

We are ready to prepare adequate PR, but we need to know what changes will you approve and merge.

mattab commented 8 years ago

Hi @tomasztomik - It would be useful to solve this problem :+1: Here is how it could be solved:

I think this would solve the issue nicely. Maybe you have some feedback or suggestion?

danielwalo commented 7 years ago

@mattab I got sent here by a support rep. Just adding my 2 cents:

If I look at the aggregated row /category1/, I would like to see the number of visits that hit at least one of the subpages within /category1/. The sum of "Unique pageviews" for all subpages isn't a useful number.