matomo-org / matomo


Heart beat should attribute spent time to the correct action #9539

Open tsteur opened 8 years ago

tsteur commented 8 years ago

So far, whenever the heart beat feature sends a ping request, we increase log_visit.visit_total_time (calculated as current timestamp - timestamp of first action). The last action itself remains unchanged.
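
A minimal sketch of the behaviour described above, not Matomo's actual tracker code. The `Visit` class and `handle_ping` function are hypothetical names used only for illustration; only the calculation (current timestamp minus timestamp of first action) comes from the description.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    # Unix timestamp of the first action in the visit.
    first_action_time: int
    # Stands in for log_visit.visit_total_time (hypothetical model).
    visit_total_time: int = 0


def handle_ping(visit: Visit, now: int) -> None:
    """Extend the visit's total time; no individual action is updated."""
    visit.visit_total_time = now - visit.first_action_time


visit = Visit(first_action_time=1_000)
handle_ping(visit, now=1_045)
print(visit.visit_total_time)  # 45
```

Note that nothing in this sketch records which action the extra time belongs to, which is the gap this issue is about.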

This means that if, e.g., the last action was a content impression, an event, or anything else, we attribute the most recent time spent to that action instead of to the last pageview. While this might be expected in a few cases, it surely will not always be.

Also, say you open two pages of the same website in separate tabs: both pages will send ping requests. No matter which page you are on, the pings will always be attributed to the page (or action) that was opened last.

Solution: Generate a random string in piwik.js that identifies a specific action. E.g. piwik.js would generate 545jfj3M343 and send this random string with any action that it creates. The random string is different for each action. To attribute a ping to the correct action in the database, we would send ping=1 along with this generated random ID. This way we can attribute the time spent to exactly the right page (or event or ...).
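
The proposal could be sketched roughly as follows. This is a Python model of the idea, not piwik.js or the tracker itself: `new_action_token`, `Action`, `VisitLog`, `track_action` and `track_ping` are all hypothetical names, and the 11-character alphanumeric token only mirrors the shape of the example string above.

```python
import secrets
import string
from dataclasses import dataclass, field

ALPHABET = string.ascii_letters + string.digits


def new_action_token(length: int = 11) -> str:
    """Random per-action identifier (length is an assumption)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


@dataclass
class Action:
    name: str
    started_at: int
    time_spent: int = 0


@dataclass
class VisitLog:
    # Maps the random token sent by the client to the action it created.
    actions: dict = field(default_factory=dict)

    def track_action(self, token: str, name: str, now: int) -> None:
        self.actions[token] = Action(name=name, started_at=now)

    def track_ping(self, token: str, now: int) -> bool:
        """Attribute a ping to the action that carries the same token."""
        action = self.actions.get(token)
        if action is None:
            return False  # unknown token: nothing to attribute
        action.time_spent = max(action.time_spent, now - action.started_at)
        return True


# Two tabs of the same site, each with its own token.
log = VisitLog()
tab_a = new_action_token()
tab_b = new_action_token()
log.track_action(tab_a, "/pricing", now=100)
log.track_action(tab_b, "/docs", now=110)

# The ping comes from the first tab, although the second was opened last.
log.track_ping(tab_a, now=160)
time_on_pricing = log.actions[tab_a].time_spent  # 60
time_on_docs = log.actions[tab_b].time_spent     # 0
```

Because the ping carries the token of the action that produced it, the time lands on "/pricing" rather than on whichever action happened to be recorded last.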

Downsides: We need a new column in the database and a new index. We could instead store the mapping from random string to action ID temporarily in Redis if queued tracking is used or Redis is otherwise available, but not everyone would benefit from that.
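
To illustrate the temporary mapping idea, here is a small in-memory stand-in for an expiring key-value store. It does not use Redis or any Matomo API; `ExpiringTokenMap`, its methods, and the 30-minute lifetime are assumptions made for the example.

```python
from typing import Dict, Optional, Tuple


class ExpiringTokenMap:
    """Maps a random action token to an action ID for a limited time."""

    def __init__(self, ttl_seconds: int) -> None:
        self.ttl_seconds = ttl_seconds
        # token -> (action ID, expiry timestamp)
        self._entries: Dict[str, Tuple[int, int]] = {}

    def remember(self, token: str, action_id: int, now: int) -> None:
        self._entries[token] = (action_id, now + self.ttl_seconds)

    def resolve(self, token: str, now: int) -> Optional[int]:
        entry = self._entries.get(token)
        if entry is None:
            return None
        action_id, expires_at = entry
        if now >= expires_at:
            # Expired: drop the entry so the map does not grow forever.
            del self._entries[token]
            return None
        return action_id


tokens = ExpiringTokenMap(ttl_seconds=1800)  # assumed 30-minute lifetime
tokens.remember("545jfj3M343", action_id=42, now=0)
fresh = tokens.resolve("545jfj3M343", now=900)   # still within the lifetime
stale = tokens.resolve("545jfj3M343", now=1800)  # lifetime has elapsed
```

With a mapping like this, a ping only needs to be matched while its page is plausibly still open, so no permanent column or index would be required for the token itself.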

Another solution would be to return the idlink_va of the action via the tracker API, but this would leak sensitive information, so it is not actually a solution. I am only mentioning it in case someone thinks "Why don't you just return the ID of the action?". This way, one could find out exactly how many actions are generated on a certain website, etc.

So far, the only metric heart beat is useful for is the total time spent; we cannot really say on which action the time was spent. Maybe we should remove the "time spent on action" from the last action in the visitor log until we have fixed this issue.

As a result of this, heart beat must also make sure to update the time spent on the last page. Otherwise we still have pages with 0s.

hpvd commented 8 years ago

Also say you open two pages of the same website in a new tab, both pages will send ping events. No matter on which page you are the pings will be always attributed to the page (or action) that was opened last.

When testing https://github.com/piwik/piwik/issues/9504, we had the same experience. You are proposing:

Solution: Generate a random string in piwik.js that defines a specific action. Eg piwik.js would generate 545jfj3M343 and send such a random string with any action that it creates. The random string is different for each action.

Hmm, not quite sure about this. Is an ID per action needed, or could each "tracking signal", no matter which kind (ping, event, ...), additionally carry a PAGE identifier ID that tells us where it comes from?

tsteur commented 8 years ago

in addition a PAGE identify-ID to get where it comes from

That's pretty much what I meant :)

hpvd commented 8 years ago

:-) sorry I didn't get it...