matomo-org / matomo


Server-side conversion attribution across visits #12273

Open laszlovl opened 6 years ago

laszlovl commented 6 years ago

Currently, conversions are attributed (to a referrer or campaign) in two ways:

  1. piwik.js stores the campaign name and keyword in a cookie and passes them as _rcn and _rck on subsequent visits
  2. The server extracts referrer information from the current visit

This limits how accurately conversions can be attributed. If a user converts in a second or later visit (where the URL referrer/campaign information is no longer available) and the cookie is unavailable, the attribution for that conversion is lost. For example:

  1. Thanks to campaign X, a user initially visits the site on their cellphone
  2. By setting a user-id, the user is tracked across multiple devices
  3. One day later, the user visits on their tablet and converts

The conversion is now attributed to "direct entry" instead of campaign X.

The same thing happens:

Other limitations are:

I propose handling cross-visit goal attribution on the server side instead. When a conversion is created, instead of fetching dimensions from the current visit only, we should look through all of the visitor's visits and use the last (or first) visit with non-empty attributes. setConversionAttributionFirstReferrer would move from piwik.js to a server-side configuration option.
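To make the proposal concrete, here is a minimal Python sketch of the selection step. It is not Matomo code: the `Visit` record, `has_attribution`, `pick_attribution_visit`, and the `use_first` flag (standing in for a server-side equivalent of setConversionAttributionFirstReferrer) are hypothetical names chosen for illustration. Only the campaign_* field names are taken from the query in this issue, and "non-empty" is assumed to mean "not NULL", as in that query.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Campaign columns checked for attribution (names taken from the query in this issue).
CAMPAIGN_FIELDS = (
    "campaign_name", "campaign_keyword", "campaign_source",
    "campaign_medium", "campaign_content", "campaign_id",
)


@dataclass
class Visit:
    """Hypothetical, simplified stand-in for one row of the visit log."""
    idvisit: int
    campaign_name: Optional[str] = None
    campaign_keyword: Optional[str] = None
    campaign_source: Optional[str] = None
    campaign_medium: Optional[str] = None
    campaign_content: Optional[str] = None
    campaign_id: Optional[str] = None


def has_attribution(visit: Visit) -> bool:
    """True if at least one campaign field is set (assumption: set means not None)."""
    return any(getattr(visit, field) is not None for field in CAMPAIGN_FIELDS)


def pick_attribution_visit(
    visits: Sequence[Visit], use_first: bool = False
) -> Optional[Visit]:
    """Return the first or last visit (by idvisit) that carries campaign data."""
    candidates = [v for v in visits if has_attribution(v)]
    if not candidates:
        return None  # nothing to attribute to: treat as direct entry
    chooser = min if use_first else max
    return chooser(candidates, key=lambda v: v.idvisit)


# The cross-device scenario from this issue: campaign X on the phone,
# then a conversion during a later tablet visit without campaign data.
visits = [
    Visit(idvisit=1, campaign_name="X"),  # phone visit via campaign X
    Visit(idvisit=2),                     # tablet visit, no referrer info
]
source = pick_attribution_visit(visits)
print(source.campaign_name if source else "direct entry")  # prints "X"
```

With `use_first=True` the same function would return the earliest visit that carries campaign data, which corresponds to first-referrer attribution.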

As far as I can see this would solve all problems, without significant performance issues:

explain select * from piwik_log_visit where idvisitor='abc' and (campaign_content is not null or campaign_id is not null or campaign_keyword is not null or campaign_medium is not null or campaign_name is not null or campaign_source is not null) order by idvisit desc limit 1;
+----+-------------+-----------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table           | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-----------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | piwik_log_visit | NULL       | index | NULL          | PRIMARY | 8       | NULL |    1 |    16.67 | Using where |
+----+-------------+-----------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

Perhaps the only caveat would be installations where the raw visitor logs are deleted after X days, so attribution wouldn't be possible if the conversion happens more than X days (180 by default) after the initial visit. I don't think that's an issue, but if it is, we could keep the cookie attribution information as a fallback.
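The fallback order described above could look roughly like this. It is a sketch under assumptions: `resolve_campaign` and the `Campaign` tuple are hypothetical names, and the `rcn`/`rck` arguments simply stand for the cookie values that piwik.js sends as _rcn and _rck.

```python
from typing import Optional, Tuple

# (campaign name, campaign keyword); hypothetical shape for illustration.
Campaign = Tuple[Optional[str], Optional[str]]


def resolve_campaign(
    from_visit_log: Optional[Campaign],
    rcn: Optional[str] = None,
    rck: Optional[str] = None,
) -> Optional[Campaign]:
    """Prefer attribution found in the raw visit log, else fall back to the cookie."""
    if from_visit_log is not None:
        return from_visit_log
    if rcn is not None or rck is not None:
        # Raw logs may already be purged, but the cookie still carries the campaign.
        return (rcn, rck)
    return None  # neither source has data: direct entry


# Raw logs purged, cookie still present:
print(resolve_campaign(None, rcn="X", rck="shoes"))  # prints ('X', 'shoes')
```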

laszlovl commented 6 years ago

I created a proof of concept here: https://github.com/piwik/plugin-MarketingCampaignsReporting/compare/master...laszlovl:attribute-across-visits