theiconic / php-ga-measurement-protocol

Send data to Google Analytics from the server using PHP. Implements the GA Measurement Protocol.
MIT License

Filtering bots? #20

Open khalwat opened 8 years ago

khalwat commented 8 years ago

First of all, fantastic library, thank you for creating it!

My question is about filtering bots. I assumed that if I set the user agent with setUserAgentOverride(), Google would filter out bots on the Analytics reporting side based on that user agent, but it appears not to.

Is bot filtering left to the user to handle before sending any hits to GA through the Measurement Protocol?

khalwat commented 8 years ago

For now I'm using it in conjunction with this lib:

https://github.com/JayBizzle/Crawler-Detect

...in case it helps anyone.

jorgeborges commented 8 years ago

Do you mean automatically detecting whether a bot is generating the traffic, so that the library would not send any hits?

If that is the case, I think it goes beyond the scope of this library.

Maybe I could explore the possibility of building plugins for this, so that when you enable a plugin it does the filtering before passing the data to the main lib. That logic would live in another repo, though: the sole responsibility of php-ga-measurement-protocol is sending data to GA, not detecting bots.

I'll leave this issue open to explore the possibility later.

mildlygeeky commented 8 years ago

Silly question, Jorge — is the plugin intended to be used to send general web traffic hit data to GA (i.e., doing it server-side rather than with JS)?

jorgeborges commented 8 years ago

@mildlygeeky The main purpose of this library is to send data to GA from the server side, yes.

The plugin I mentioned above would detect when web traffic is being generated by a bot (via the user agent, I suppose), then it would make the decision to filter the data or use the main lib (this repo) to send it to GA.

Hope that was clear enough?
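As a rough illustration of the plugin idea described above, here is a minimal sketch of a wrapper that checks the user agent and either drops the hit or hands it to a sender. Everything in it is hypothetical: `BotFilteringSender`, the regex-based detector, and the array-collecting sender are stand-ins and not part of php-ga-measurement-protocol. In practice the detector could be something like Crawler-Detect's `isCrawler()`, and the sender a closure around this library's `Analytics` object.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical wrapper (not part of php-ga-measurement-protocol): decides
 * whether a hit is forwarded to the sender or dropped as bot traffic.
 */
final class BotFilteringSender
{
    /** @var callable(string): bool Returns true when the user agent looks like a bot. */
    private $isBot;

    /** @var callable(array): void Sends the hit, e.g. via the main library. */
    private $send;

    public function __construct(callable $isBot, callable $send)
    {
        $this->isBot = $isBot;
        $this->send = $send;
    }

    /** Returns true if the hit was forwarded, false if it was filtered out. */
    public function sendHit(string $userAgent, array $hit): bool
    {
        if (($this->isBot)($userAgent)) {
            return false;
        }
        ($this->send)($hit);
        return true;
    }
}

// Naive stand-in detector (an assumption for this sketch); a dedicated
// crawler-detection library would be far more thorough.
$isBot = static function (string $userAgent): bool {
    return $userAgent === ''
        || preg_match('/bot|crawl|spider|slurp/i', $userAgent) === 1;
};

// Stand-in sender that only records hits instead of contacting GA.
$sent = [];
$send = static function (array $hit) use (&$sent): void {
    $sent[] = $hit;
};

$sender = new BotFilteringSender($isBot, $send);
$sender->sendHit('Mozilla/5.0 (compatible; Googlebot/2.1)', ['dp' => '/home']);
$sender->sendHit('Mozilla/5.0 (Windows NT 10.0) Firefox/120.0', ['dp' => '/home']);
```

Injecting both the detector and the sender as callables keeps the filtering logic separate from the code that talks to GA, which matches the idea of keeping bot detection out of the main repo.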

mildlygeeky commented 8 years ago

Totally — it's just that (through no fault of your own) Google sadly doesn't apply the same bot filtering to data passed in by the backend Measurement Protocol that they do to data passed in via the JS embed.

I like the idea of a plugin architecture (and @khalwat has indeed implemented bot filtering in the code he has written for a CMS we both use that employs your code), but anyone looking to use this plugin for general web traffic will have large issues without some sort of bot filtering in place.

You can pretty clearly see what happened when I switched from the normal Google JS embed to the Measurement Protocol below — the massive jump in "traffic" was with no bot filtering, and the smaller jump was with a good bit of bot filtering (but still not as much as Google themselves provide). I've since gone back to Google's JS embed.

Again — I think this is ultimately Google's issue and not yours — they should be consistent in how they record / filter data, no matter how it is coming in.

[Image: stats]

jorgeborges commented 8 years ago

@mildlygeeky I see. In our case we mainly use it to send data to GA that we cannot send via regular JS tracking (such as product returns that happen via an administrative system on the backend), or important but infrequent transactions, such as purchases, where you don't want to miss sending the hit. So I wasn't aware of this problem.

I guess that unless we figure out a way to filter most bot data, I would not use this method to record regular web traffic such as all page views.

I could also try to contact Google, as our company has premium GA support. I'm not sure they would be willing to help us with this, but you never know.

khalwat commented 8 years ago

@jorgeborges It's not that big of a deal, because I'm using the library mentioned above to filter out bots, and it works pretty well. I guess Google assumes that if you're sending data to it via the GMP, you know what you're doing.

Whereas with the JavaScript, it assumes it's embedded in a webpage, and thus applies the requisite filtering.

symbios-zi commented 7 years ago

@jorgeborges I have configured this package to send order data to GA. The problem is that all orders arrive without a traffic source. As I understand it, I have to use the regular JS.

One question: is it possible to send all the order data server-side and also send a hit via regular JS with no additional data, just the order ID? Will the two be merged in GA?
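One commonly suggested approach to the missing traffic source (not confirmed anywhere in this thread) is to reuse the browser's client ID for the server-side hit, so that GA can associate the order with the visitor the JS snippet already tracked. The sketch below assumes a Universal Analytics `_ga` cookie of the form `GA1.2.1234567890.1500000000`. `clientIdFromGaCookie` is a hypothetical helper, not part of this library; its result would be passed to the library's `setClientId()`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: extracts the client ID from a `_ga` cookie value
 * such as "GA1.2.1234567890.1500000000" (assumed format).
 * Returns null when the value does not look like a `_ga` cookie.
 */
function clientIdFromGaCookie(string $cookieValue): ?string
{
    // The client ID is the last two dot-separated numeric fields.
    if (preg_match('/^GA\d+\.\d+\.(\d+\.\d+)$/', $cookieValue, $matches) !== 1) {
        return null;
    }
    return $matches[1];
}

// Read the cookie from the current request; null means no usable client ID,
// in which case the caller has to fall back to its own identifier.
$clientId = clientIdFromGaCookie($_COOKIE['_ga'] ?? '');
```

Whether GA then attributes the order to the original traffic source depends on how GA processes the hit, so this is worth verifying in a test property first.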