matomo-org / matomo

https://matomo.org/
GNU General Public License v3.0

Google ads keyword match modifiers issue with (+ character) #13467

Open kumar-ebalnasral opened 6 years ago

kumar-ebalnasral commented 6 years ago

I’m using Matomo 3.6.0. I described the issue in detail on the Matomo forum: Matomo removes the broad match modifier (plus sign) from keywords and replaces it with a space character. The source of the issue is the urldecode function. As the urlencode documentation says:

Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent ( % ) sign followed by two hex digits and spaces encoded as plus ( + ) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the RFC 3986 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.

So urldecode treats a plus sign in a string as an encoded space character and decodes it accordingly. In this case the result is incorrect and undesirable, because +women’s +hats and women’s hats are different search keywords. Preserving keyword modifiers is necessary for analytics purposes.
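A minimal sketch of the difference, written in Python rather than PHP so that it runs standalone. It assumes that Python's `urllib.parse.unquote_plus` behaves like PHP's `urldecode` for this purpose (plus signs become spaces) and that `urllib.parse.unquote` behaves like `rawurldecode` (plus signs are left alone). It is an illustration of the decoding rule, not Matomo's code.

```python
from urllib.parse import unquote, unquote_plus

# Keyword as defined in Google Ads, with broad match modifiers.
keyword = "+women's +hats"

# Form-style decoding (analogous to PHP urldecode): '+' is read as a space.
form_decoded = unquote_plus(keyword)

# RFC 3986-style decoding (analogous to PHP rawurldecode): '+' is kept.
raw_decoded = unquote(keyword)

print(repr(form_decoded.strip()))
print(repr(raw_decoded))
```

The form-style result no longer contains any plus sign, so it cannot be told apart from a keyword that never had modifiers.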

Is it safe to replace all urlencode/urldecode functions with their raw- analogues?

P.S. Match modifiers docs: https://support.google.com/google-ads/answer/2497836?hl=en

kumar-ebalnasral commented 6 years ago

@tsteur @mattab @sgiehl Is there a chance to get feedback from the Matomo team?

mattab commented 6 years ago

Is it safe to replace all urlencode/urldecode functions with their raw- analogues?

Hi there, personally I don't know if it's safe or what it would change. So I suggest you create a branch with your changes, test them, and then open a pull request with your proposed changes if they work? See some pointers in https://developer.matomo.org/guides/contributing-to-piwik-core

bastienito commented 4 years ago

Hi Kumar, Mathieu,

I have the same problem as Kumar. Not only with broad match but with all the keyword match modifiers that you see here: https://support.google.com/google-ads/answer/7478529?hl=en&visit_id=637316911389777150-3799549718&rd=1

The symbols are: +, " and []

Why is it important: because without these symbols, the data reported in the keywords report in Matomo is wrong.

Let me take an example with two keywords, +women +hats and [women hats]:

+women +hats has 100 visits, 1 conversion and costs 100$, so the cost per conversion is 100$ and the conversion rate is 1%.

[women hats] has 20 visits, 2 conversions and costs 10$, so the cost per conversion is 5$ and the conversion rate is 10%.

In Matomo, all the data is grouped under one keyword, women hats, which doesn't exist in Google Ads. The grouped data is wrong because these are two different keywords, not one. The symbols can make a big difference here.
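To make the effect concrete, here is a small worked calculation using the visits, conversions and costs from the example. The `metrics` helper is hypothetical and only does the arithmetic; it shows what a report would display once both keywords collapse into a single "women hats" row.

```python
# Figures from the example: (visits, conversions, cost in $).
keywords = {
    "+women +hats": (100, 1, 100.0),
    "[women hats]": (20, 2, 10.0),
}


def metrics(visits, conversions, cost):
    """Return (cost per conversion, conversion rate in percent)."""
    return cost / conversions, 100.0 * conversions / visits


for name, row in keywords.items():
    cpa, rate = metrics(*row)
    print(f"{name}: {cpa:.2f}$ per conversion, {rate:.1f}% conversion rate")

# Column-wise totals: what remains once both keywords are merged.
merged = tuple(sum(column) for column in zip(*keywords.values()))
merged_cpa, merged_rate = metrics(*merged)
print(f"women hats: {merged_cpa:.2f}$ per conversion, {merged_rate:.1f}% conversion rate")
```

The merged row describes neither keyword: its cost per conversion and conversion rate fall between the two real values and hide that one keyword performs far better than the other.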

Moreover, the second problem arises when we want to reconcile the data between Matomo and Google Ads. Matomo lets us see the number of visits, the number of conversions and the conversion rate, but for keywords that don't really exist in Google Ads (as the symbols are stripped).

But there is no cost information. For that, we have to retrieve the data from google ads. This allows us to calculate the cost per acquisition and the total cost per keyword for example. The cost per acquisition is a main metric for ads analytics. It's often the clients' goal.

But to reconcile the data between Matomo and Google Ads, we need a key to match the two data sources; here, this is the keyword. The problem is that the keywords from Google Ads (with match modifiers) are not the keywords in Matomo (without match modifiers). So we can't reconcile the data, and it's impossible to calculate the cost per conversion for a specific keyword.
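The matching problem can be sketched as a simple join on the keyword. The data below is hypothetical (it reuses the numbers from the earlier example), and `cost_per_conversion` is an illustrative helper, not part of Matomo or the Google Ads API.

```python
# Hypothetical rows: Google Ads cost keyed by the real keyword,
# Matomo conversions keyed by the keyword after the modifiers were lost.
ads_cost = {"+women +hats": 100.0, "[women hats]": 10.0}
matomo_conversions = {"women hats": 3}


def cost_per_conversion(cost_by_keyword, conversions_by_keyword):
    """Join both sources on the keyword; return (matched results, unmatched keywords)."""
    joined, unmatched = {}, []
    for keyword, cost in cost_by_keyword.items():
        conversions = conversions_by_keyword.get(keyword)
        if conversions:
            joined[keyword] = cost / conversions
        else:
            unmatched.append(keyword)
    return joined, unmatched


joined, unmatched = cost_per_conversion(ads_cost, matomo_conversions)
print(joined, unmatched)

# If the modifiers survived tracking, the same join would succeed.
matomo_with_modifiers = {"+women +hats": 1, "[women hats]": 2}
fixed_joined, fixed_unmatched = cost_per_conversion(ads_cost, matomo_with_modifiers)
print(fixed_joined, fixed_unmatched)
```

With stripped keywords nothing matches, so no per-keyword cost can be computed; with the modifiers preserved, every Google Ads keyword finds its Matomo counterpart.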

I think it's very important to keep the exact form of Google Ads keywords (and also Bing Ads) in order to have accurate data in Matomo reports and to be able to create other reports by mixing the data from the two sources.

The solution proposed by Kumar, using rawurldecode, seems to solve the problem.

Available for more information if needed.

sgiehl commented 4 years ago

The keywords reported in Matomo are actually not that trustworthy, as most search engines no longer provide keywords in the referrer URL. So only a small percentage will have a keyword set at all... If you want to correlate data between Matomo and Google Ads, you might want to have a look at https://plugins.matomo.org/PaidAdvertisingPerformance.

bastienito commented 4 years ago

Hi Stefan, Thanks for your reply.

About the paid ads plugin, I asked for a Google Ads API token and the Google API team told me they can't give me one in order to use it with Matomo analytics.

Their reply: "Dear API Applicant, Thank you for submitting an application to the AdWords API. Unfortunately, the application you submitted was rejected due to the following:

I wrote this to the support team and I had this reply from Jason from the Matomo support team:

"We released the paid advertising plugin, a while ago. Google has since provided some changes that need to be made to the plugin for them to allow it to work with their service. These changes are in the process of being completed. I can not give you a time frame of when this will be, we expect several weeks. I will keep this email and update you as soon as I have any further information."

Because a Google API token is required to get accurate keywords, Matomo is dependent on Google on this point.

However, we could have the same accuracy for the keywords by using the pk_kwd if the match modifiers were not stripped. In this case, it's not Google dependent. Even without Google API token, we could have accurate data. It seems to me better for Matomo.

I don't know if the paid ads plugin works for Microsoft Ads, but with the pk_kwd, you will also provide accurate data from Bing Ads.

sgiehl commented 4 years ago

I know about the problems regarding receiving an API token. Not sure why they accepted some, while they decline others 🤷 But as the support mentioned. We are in contact with Google and will hopefully find a solution for this.

The plugin does not yet work with Microsoft Ads.

Regarding the keywords within Matomo. Are you using the MarketingCampaignsReporting plugin and referring to the campaign keyword reports? 🤔

bastienito commented 4 years ago

I use for example the report in Goals > Resume > Goals by referal > Keywords in campaign in the dashboard or custom reports.

But I also retrieve them via the MarketingCampaignsReporting.getKeyword via the Matomo API in a Google Sheet (for reporting).

In the Google Sheet, I also import the Google Ads keywords data. That's why I would love to be able to reconcile the two keyword datatables to get the cost per conversion, cost per click and total cost per keyword. But to do that, I need to have the same keywords in the two datatables.

Bonus: with the right keywords in Matomo, it becomes possible to automatically reconcile the data between Matomo and Google Ads directly in Google DataStudio for reporting, without having to do it in an intermediary Google Sheet.

tsteur commented 4 years ago

FYI re the token: They probably misunderstood and thought you're using a third party tool and not that it's basically your own tool (in which case they would have approved it). We're working on recommendations on what answers to use when applying for a token so it's more clear for them. If you haven't done yet, be good to get in touch with our support.

bastienito commented 4 years ago

Hi Thomas,

Thanks for your feedback. I will request the token again with your explanations.

bastienito commented 4 years ago

Hi,

About the paid ads plugin, I am not sure about one thing: it will import the data from Google Ads, and this data will be accessible in the Acquisition > Google Ads report. OK, but if, for example, I click on the "keywords" tab, I will see the number of visits, cost per click, ... but will I see the number of conversions for a specific keyword and the cost per conversion?

I think not, because the conversions are bound to the pk_kwd, from which the match modifiers have been stripped. So even Matomo can't match the data from Google Ads with the data it has for the keywords. So the data imported with the Paid Ads plugin will stay separate from the data collected by Matomo. Am I wrong?

Other example: in the Goals > Campaign Keywords report, will I see the data imported from Google Ads through the paid ads plugin or the data retrieved by the Matomo tag or a mix of the two datasets ?

If it works like that, it's useless for what I need.

(I don't have conversions data in Google Ads because of the GDPR. That's why I use Matomo to try to have accurate data and to mix data between tools to have the entire view for these data).

tsteur commented 4 years ago

@bastienito do you mind emailing our support about this? The email is shop at innocraft.com . Here we usually only handle feature requests and bugs for the Matomo core and while we aim to always respond to comments and new issues this is not always guaranteed here. And by contacting our support they will be able to get the right person to respond to you. Thanks for understanding.

mattab commented 4 years ago

@tsteur re-opening because the issue is in core, not in the paid plugin. The issue is that it seems we remove some characters like + from keywords when passed in the campaign tracking parameter, &pk_kwd=+test for example. I'm not sure if we can fix it and what it would look like (without regressions) but it's good to keep it open as a known limitation.
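As an illustration of why a literal + in &pk_kwd=+test is ambiguous, the sketch below uses Python's standard query-string parser. This is not Matomo's parser; it only applies the same form-encoding convention, and whether a %2B survives in Matomo also depends on any later decoding step.

```python
from urllib.parse import parse_qs, quote

# A literal '+' in a query string means "space" under form encoding.
literal = parse_qs("pk_kwd=+test")
print(literal["pk_kwd"])

# Percent-encoding the plus as %2B keeps it through one round of decoding.
encoded = parse_qs("pk_kwd=%2Btest")
print(encoded["pk_kwd"])

# quote() with safe="" produces that encoded form from the raw keyword.
encoded_keyword = quote("+test", safe="")
print(encoded_keyword)
```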

bastienito commented 4 years ago

Thanks Matthieu for reopening the ticket.

I want to be clear because I am not sure we see this problem in the same way: Matomo is actually reporting wrong data for ad campaigns.

It sums the data of two different keywords into one which doesn't exist. What I can see in the campaign keywords report is wrong.

I wanted to actually use the cloud version for all of our clients who are going to move from Google Analytics to Matomo because of the GDPR, but with this problem, I am not sure. Maybe we will be forced to use the on premise version and change the core as Kumar suggested in his ticket on the forum (https://forum.matomo.org/t/google-keyword-match-modifiers-issue/29767).

mattab commented 4 years ago

For reference here is what Kumar posted:

The source of issue is urldecode function:

log_visit.campaign_keyword

MarketingCampaignsReporting/Campaign/CampaignDetector.php:83
        $valueFromRequest = trim(urldecode($valueFromRequest));

log_visit.referer_keyword

Referrers/Columns/Base.php:244
        protected function detectCampaignFromString($string) {...

Menu "Acquisition -> Campaigns", Campaign Keywords widget

DataTable/Filter/SafeDecodeLabel.php:42
        $raw = urldecode($value);

It's intended behaviour, as the documentation says:

Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent ( % ) sign followed by two hex digits and spaces encoded as plus ( + ) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the RFC 3986 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.

So urldecode treats a plus sign as a space and decodes it accordingly. But in this case the result is incorrect and undesirable, because +women’s +hats and women’s hats are different search keywords.
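One way the plus could be lost even when the tracking URL encodes it as %2B is a second decoding pass on a value that was already decoded. The sketch below assumes that this is what happens before a call such as trim(urldecode($valueFromRequest)); that assumption is not confirmed in this thread. Python's `unquote_plus` and `unquote` stand in for PHP's `urldecode` and `rawurldecode`.

```python
from urllib.parse import unquote, unquote_plus

# Keyword as it might appear in a tracking URL, with '+' sent as %2B.
in_url = "%2Bwomen%20%2Bhats"

# First decoding pass (assumed to happen when the request is parsed).
once = unquote_plus(in_url)

# Second pass, analogous to trim(urldecode($valueFromRequest)).
twice_form = unquote_plus(once).strip()

# Same second pass with a raw-style decoder that leaves '+' alone.
twice_raw = unquote(once).strip()

print(repr(once), repr(twice_form), repr(twice_raw))
```

Under that assumption, only the raw-style second pass returns the keyword that Google Ads knows.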

Chardonneaur commented 4 years ago

Hi there, I am confirming what @bastienito is describing. Within Google Ads one can get the following match type value:

sgiehl commented 4 years ago

Note: The issue might not be caused by the urldecode alone. To get the parameters from the URL UrlHelper::getParameterFromQueryString() is used, which itself applies a sanitizeInputValue on all url values given.

bastienito commented 4 years ago

To add to what Ronan said: it's not only the + modifier but all the match modifiers: phrase is " and exact is [ and ].

The only kind of keyword where there is no modifier is the broad match.

The ", [ and ] are also encoded in the URL, like the +. The symbols have to be in the reports. Edit: urldecode only changes + into a space, so it will be OK for the ", [ and ].
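A quick check of that last point, again using Python's `unquote_plus` as a stand-in for PHP's `urldecode`. It only models the decoding step; any other sanitizing applied to request values is not covered here.

```python
from urllib.parse import unquote_plus

keywords = ['+women +hats', '"women hats"', '[women hats]']

# Run each already-decoded keyword through the form-style decoder once more.
results = {keyword: unquote_plus(keyword) for keyword in keywords}

for keyword, decoded in results.items():
    status = "unchanged" if decoded == keyword else "altered"
    print(f"{keyword!r} -> {decoded!r} ({status})")
```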

mattab commented 4 years ago

fyi similar feedback received today:

We use 3 different keyword types to bid on: broad, phrase and exact match: +keyword, "keyword", [keyword]

Would your dynamic keyword insertion in the tracking link display the correct keyword type (broad, phrase or exact match) in the tracking? Or will it only display the keyword as: keyword
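If the symbols were preserved in the tracked keyword, the match type could be recovered from the notation alone. The helper below is a hypothetical sketch based only on the three notations mentioned in this thread; `match_type` is not a Matomo or Google Ads function.

```python
def match_type(keyword: str) -> str:
    """Guess the match type from the keyword's notation (hypothetical helper)."""
    text = keyword.strip()
    if len(text) >= 2 and text.startswith("[") and text.endswith("]"):
        return "exact"
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return "phrase"
    if any(word.startswith("+") for word in text.split()):
        return "broad match modifier"
    return "broad"


for kw in ["+keyword", '"keyword"', "[keyword]", "keyword"]:
    print(kw, "->", match_type(kw))
```

Once the symbols are stripped, every variant falls into the last branch, which is the loss of information described above.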