matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

Time spent on page calculation is buggy #9198

Open tsteur opened 9 years ago

tsteur commented 9 years ago
  1. Avg time spent on a page is calculated by dividing the sum of all time_spent_ref_action values by the number of visits, nb_visits. Not all visits have a time_spent_ref_action, though. Instead, sum_time_spent should be divided by something like nb_hits_with_time_spent.
  2. In the tracker we calculate time_spent_ref_action wrongly. It is calculated as visit_last_action_time - currentTimestamp, but visit_last_action_time is updated on any tracking call, meaning on any hit and not only on pageviews.

To make it a bit clearer, let's say there are the following tracking calls:

The time spent for the first pageview is calculated as the time difference between that pageview and the event, not between the two pageviews. This means that in many common scenarios, where a pageview is followed by an event, site search, content impression, and so on, the time spent information is not accurate.
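
A minimal sketch of the effect described above, assuming a hypothetical visit with a pageview, an event 10 seconds later, and a second pageview after 60 seconds. The hit list, the function names, and the timings are made up for illustration and are not taken from the Matomo tracker code.

```python
from collections import defaultdict

# Hypothetical hits of a single visit: (seconds since visit start, type, name).
HITS = [
    (0, "pageview", "/home"),
    (10, "event", "video-play"),
    (60, "pageview", "/pricing"),
]


def time_on_page_any_hit(hits):
    """Behaviour described above: the next hit of any type ends the pageview."""
    spent = defaultdict(int)
    for (t_prev, kind, name), (t_next, _, _) in zip(hits, hits[1:]):
        if kind == "pageview":
            spent[name] += t_next - t_prev
    return dict(spent)


def time_on_page_next_pageview(hits):
    """Expected behaviour: only the next pageview ends the pageview."""
    pageviews = [h for h in hits if h[1] == "pageview"]
    spent = defaultdict(int)
    for (t_prev, _, name), (t_next, _, _) in zip(pageviews, pageviews[1:]):
        spent[name] += t_next - t_prev
    return dict(spent)


print(time_on_page_any_hit(HITS))        # {'/home': 10}
print(time_on_page_next_pageview(HITS))  # {'/home': 60}
```

In this made-up visit the first pageview is credited with 10 seconds instead of 60, because the event closes it early.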

hpvd commented 9 years ago

Ooh, good find. That's quite important!

SR-mkuhn commented 9 years ago

It seems like we are experiencing this too: http://forum.piwik.org/t/update-to-2-15-changed-average-visit-duration-and-time-on-page/16744


tsteur commented 9 years ago

I think the issue I mentioned here has existed for a long time, not only since the last update. From which Piwik version did you update? I presume the problem you are describing might actually be a different one.

mattab commented 9 years ago

Marking this issue as a duplicate of #9199, which was renamed to include this bug in its scope.

Edit: re-opened this issue as it may be easier to fix this one rather than #9199

See also: https://github.com/matomo-org/matomo/issues/9539

SR-mkuhn commented 8 years ago

@tsteur: from 2.14.3 to 2.15.0. And this is just one of 800 sites tracked in one Piwik instance (it affects the other 799 too).

SR-mkuhn commented 8 years ago

The main question is: which count is correct?

tsteur commented 8 years ago

@SR-mkuhn this particular issue has been buggy for quite a while, not only since the last update, I think.

tsteur commented 8 years ago

@SR-mkuhn maybe create a new issue for your problem and describe it there

sebastianpiskorski commented 8 years ago

I've also encountered this issue recently, and I've found that there is a problem with the metric calculation:

Time spent on site is defined as sum_time_spent and calculated as SUM(): https://github.com/piwik/plugin-CustomDimensions/blob/master/Archiver.php#L164

The SUM() function in SQL databases skips records containing NULL values. Later, the average time spent on page is calculated by dividing this sum by the number of hits, nb_hits ( https://github.com/piwik/piwik/blob/master/plugins/Actions/Columns/Metrics/AverageTimeOnPage.php#L39 ), which is calculated as COUNT(*) ( https://github.com/piwik/piwik/blob/master/plugins/Actions/Archiver.php#L359 ).

The problem is that COUNT(*) counts all rows, even those containing a NULL value, so the resulting average is not a true average. A solution would be to use SUM(COALESCE(sum_time_spent, 0)), which treats NULL values as 0, or to introduce nb_hits_with_time_spent as COUNT(sum_time_spent) and divide by that, as @tsteur said.
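
To illustrate the NULL handling described above, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for the real database. The `hits` table, its columns, and the four sample rows are hypothetical and not Matomo's schema; the point is only how SUM(), COUNT(*), and COUNT(column) treat NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: two hits with a recorded time, two hits with NULL.
conn.execute("CREATE TABLE hits (idaction INTEGER, time_spent INTEGER)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?)",
    [(1, 30), (1, 90), (1, None), (1, None)],
)

row = conn.execute(
    """
    SELECT SUM(time_spent),               -- NULL rows are skipped
           SUM(COALESCE(time_spent, 0)),  -- NULL rows add 0
           COUNT(*),                      -- counts every row
           COUNT(time_spent)              -- counts only non-NULL rows
    FROM hits
    """
).fetchone()
sum_time, sum_coalesced, nb_hits, nb_hits_with_time_spent = row

avg_over_all_hits = sum_time / nb_hits                    # 120 / 4
avg_over_timed_hits = sum_time / nb_hits_with_time_spent  # 120 / 2

print(row)                  # (120, 120, 4, 2)
print(avg_over_all_hits)    # 30.0
print(avg_over_timed_hits)  # 60.0
```

In this sample the COALESCE variant yields the same total as the plain SUM(), so the two averages differ only through the denominator: dividing by COUNT(time_spent) gives 60 seconds, while dividing by COUNT(*) gives 30.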

petecroaker commented 7 years ago

I’ve just encountered the same issue. Looking at the Visitor Profile, I can see that for a page with lots of tracking events, the pageview gets a minimal time, whereas the events are given the times between each interaction. As a result, a user could be interacting with a page for a minute or more, triggering numerous events, and the page dwell time would still be close to zero. Surely this is a major bug. It means that any page with subsequent events will have an incorrect dwell time.

mattab commented 7 years ago

I agree this bug is quite major, as it causes the time on page to be wrong for any page that tracks events.

hpvd commented 7 years ago

> it is causing the Time on page to be wrong, for any page tracking events.

Possibly this is closely related to: we have no "time on url": Piwik handels events as leaving page (at least in visitor log) #11546

mgloss commented 5 years ago

Has this already been solved? We had Piwik 2.something and are now upgrading to Matomo 3.7, and I am wondering whether it will be correct there. I have checked the previous data in the database. You can easily see the problem when you filter for one specific idvisit in piwik_log_link_action_table: every event closes the time on the pageview. Especially if you are using events like formSeen, bannerImpression, etc., you will see that this is not correct. The time spent attributed to some events also looks odd. Thank you also for pointing me to any other related issues.

tsteur commented 5 years ago

As the issue is still open, I don't think anything has been solved here yet, AFAIK. @mattab it might indeed be quite important to fix the time on page.

hatdio commented 5 years ago

I also encountered this issue of getting a wrong time on page. Is anyone planning to work on this? I know it's in the backlog, but it's the older of the only two issues labeled as Major + Bug.

rennyeb commented 2 years ago

I'm being hit by this problem, too - my "AVG. TIME ON PAGE" numbers are coming out as near-zero due to events on the page.

How do we get this bug prioritised for fixing, please?

In case it's helpful to anyone, in my local Matomo deployment I unashamedly hacked my ./plugins/Actions/Archiver.php file and commented-out the line which restricts by getWhereClauseActionIsNotEvent:

    /**
     * Time per action
     */
    protected function archiveDayActionsTime($rankingQueryLimit)
    {
        $rankingQuery = false;
        if ($rankingQueryLimit > 0) {
            $rankingQuery = new RankingQuery($rankingQueryLimit);
            $rankingQuery->addLabelColumn('idaction');
            $rankingQuery->addColumn(PiwikMetrics::INDEX_PAGE_SUM_TIME_SPENT, 'sum');
            $rankingQuery->partitionResultIntoMultipleGroups('type', array_keys($this->actionsTablesByType));

            $extraSelects = "log_action.type, log_action.name, count(*) as `" . PiwikMetrics::INDEX_PAGE_NB_HITS . "`,";
            $from = array(
                "log_link_visit_action",
                array(
                    "table"  => "log_action",
                    "joinOn" => "log_link_visit_action.%s = log_action.idaction"
                )
            );
            $orderBy = "`" . PiwikMetrics::INDEX_PAGE_NB_HITS . "` DESC, log_action.name ASC";
        } else {
            $extraSelects = false;
            $from = "log_link_visit_action";
            $orderBy = false;
        }

        $select = "log_link_visit_action.%s as idaction, $extraSelects
                sum(log_link_visit_action.time_spent_ref_action) as `" . PiwikMetrics::INDEX_PAGE_SUM_TIME_SPENT . "`";

        $where = $this->getLogAggregator()->getWhereStatement('log_link_visit_action', 'server_time');
        $where .= " AND log_link_visit_action.time_spent_ref_action > 0
                 AND log_link_visit_action.%s > 0"
//            . $this->getWhereClauseActionIsNotEvent() //include time spent in events as well
;

Informally, this worked for my use case. I haven't given any thought to whether this is a robust solution.

desertking commented 1 year ago

How can this issue still be open after 7 years? I wondered where the big gap between the GA3 (Universal) data and this data comes from, and found out that this has been discussed a few times. Is there any other workaround in the code to ignore the users that spent 0 time in the "avg time on page" row?

9joshua commented 6 months ago

This is still an issue: multiple users are currently experiencing it and see a significant difference between GA and Matomo time spent on page.

atom-box commented 4 months ago

This has been carefully investigated by one of our heavy users. They tracked down six real visits and found a difference of 6 to 1.

According to Behavior >> Pages: 9 seconds average time.
But if you do a manual calculation from Visitors >> Visits log: 54 seconds average time.


atom-box commented 4 months ago

A different user has reported that Behavior >> Pages report is inconsistent before/after applying a segment. They sent screenshots at https://github.com/matomo-org/matomo/issues/4719#issuecomment-2135980876

atom-box commented 4 months ago

I reproduced this error.

Time on page is inaccurate in Behavior >> Pages.
Time on page is accurate in Visitors >> Visits Log.


ronak-innocraft commented 4 months ago

We have reviewed this bug in our new triaging process, and it has turned out to be a higher priority than the other bugs we have triaged so far. We aim to plan a fix for this in Q3 and will post here when we have an update to share.

nick-myers-dt commented 1 week ago

I hope you're doing well. I’m one of the customers impacted by the issue outlined in ticket #9198 regarding inaccuracies in the "time spent on page" calculations. I noticed that the issue is no longer associated with any upcoming release.

Could you please provide an update on whether this is still being actively worked on and when we might expect a resolution? Accurate time-on-page metrics are critical for our reporting, and I’d like to plan accordingly based on your timeline.

Thank you in advance for your time and attention to this. I look forward to hearing from you.

ronak-innocraft commented 1 week ago

@nick-myers-dt This is certainly something we want to fix, and it is a priority for us to look at. We are currently working out the best way to implement it. I believe an error somewhere in our automation removed the version from this issue. We do adjust our plans as priorities shift, but we certainly aim to get this into one of the upcoming minor releases.