snowplow-referer-parser / referer-parser

Library for extracting marketing attribution data from referrer URLs
http://snowplowanalytics.com

Handling Android app referrers like "com.google.android.googlequicksearchbox"? #131

Open kingo55 opened 8 years ago

kingo55 commented 8 years ago

I've been seeing a lot of traffic recently from "com.google.android.googlequicksearchbox". I suspect it's users that have searched from the home screen in Android.

Should we classify this under google / search?

alexanderdean commented 8 years ago

Sounds like google / search to me...
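If google / search is the right bucket, one way to express it is a lookup keyed on the Android package name, which sits where a hostname normally would in an android-app:// URI. This is a minimal sketch, not referer-parser's API: the table name `ANDROID_APP_REFERERS` and the function `classify_android_referer` are hypothetical, and the single entry is only the one proposed in this thread.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical lookup table: Android package name -> (source, medium).
# Only the quick search box is proposed in this thread; this is NOT
# taken from referer-parser's own data files.
ANDROID_APP_REFERERS = {
    "com.google.android.googlequicksearchbox": ("Google", "search"),
}


def classify_android_referer(url: str) -> Optional[Tuple[str, str]]:
    """Return (source, medium) for a known android-app:// referrer, else None."""
    parts = urlsplit(url)
    if parts.scheme != "android-app":
        return None  # leave http/https referrers to the normal parser
    # In an android-app URI the package name occupies the netloc position.
    return ANDROID_APP_REFERERS.get(parts.netloc)


print(classify_android_referer(
    "android-app://com.google.android.googlequicksearchbox/https/www.google.com"))
# → ('Google', 'search')
```

Unknown packages fall through to `None`, so the caller can decide separately whether to treat them as "unknown" or try a social lookup.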

kingo55 commented 8 years ago

Actually - there's a whole bunch of them:

-- Referrers that use the android-app:// scheme over a three-month window
SELECT page_referrer AS page_referrer,
       COUNT(*) AS count
FROM atomic.events
WHERE collector_tstamp >= '2016-04-18 08:17:01.000000'
  AND collector_tstamp <= '2016-07-17 08:17:01.000000'
  AND refr_urlscheme = 'android-app'  -- Snowplow's parsed referrer URL scheme
GROUP BY page_referrer
ORDER BY count DESC LIMIT 5000

alexanderdean commented 8 years ago

Hey @kingo55 did you mean to share the result of the query?

kingo55 commented 8 years ago

Hi @alexanderdean - my bad... here you go:

+----------------------------------------------------------------------------------+------+
|page_referrer                                                                     |count |
+----------------------------------------------------------------------------------+------+
|android-app://com.google.android.googlequicksearchbox                             |433817|
|android-app://com.google.android.googlequicksearchbox/https/www.google.com        |31848 |
|android-app://com.google.android.apps.genie.geniewidget                           |7470  |
|android-app://com.google.android.apps.plus/https/plus.url.google.com/mobileapp    |446   |
|android-app://com.google.android.googlequicksearchbox/googlequicksearchbox/suggest|334   |
|android-app://org.telegram.messenger                                              |175   |
|android-app://com.pinterest                                                       |165   |
|android-app://com.laurencedawson.reddit_sync.pro                                  |46    |
|android-app://com.laurencedawson.reddit_sync                                      |39    |
|android-app://arun.com.chromer                                                    |32    |
|android-app://com.Slack                                                           |21    |
|android-app://com.noinnion.android.greader.reader                                 |21    |
|android-app://org.telegram.plus                                                   |17    |
|android-app://com.andrewshu.android.reddit                                        |15    |
|android-app://com.paladin.auto.car.news.reviews                                   |10    |
|android-app://com.noinnion.android.greader.readerpro                              |9     |
|android-app://com.laurencedawson.reddit_sync.dev                                  |7     |
|android-app://com.hanista.mobogram                                                |4     |
|android-app://com.google.android.apps.social.spaces                               |4     |
|android-app://com.linkedin.android                                                |3     |
|android-app://com.levelup.palabre                                                 |2     |
|android-app://com.tumblr                                                          |2     |
|android-app://com.andrewshu.android.redditdonation                                |2     |
|android-app://com.innologica.inoreader                                            |2     |
|android-app://ir.felegram                                                         |1     |
+----------------------------------------------------------------------------------+------+

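Several rows in the table above are variants of the same app (for example three different googlequicksearchbox referrers), so it can help to roll the counts up by package before deciding how to classify them. A minimal sketch, assuming rows arrive as `(page_referrer, count)` pairs from the query above; `counts_by_package` is a hypothetical helper, and the sample rows are copied from the table.

```python
from collections import Counter
from urllib.parse import urlsplit


def counts_by_package(rows):
    """Sum referrer counts per Android package name.

    `rows` is an iterable of (page_referrer, count) pairs. Variants such as
    .../https/www.google.com and .../googlequicksearchbox/suggest collapse
    onto the package that precedes them.
    """
    totals = Counter()
    for referrer, count in rows:
        parts = urlsplit(referrer)
        if parts.scheme == "android-app":
            totals[parts.netloc] += count  # netloc holds the package name
    return totals


# A few rows copied from the table above.
rows = [
    ("android-app://com.google.android.googlequicksearchbox", 433817),
    ("android-app://com.google.android.googlequicksearchbox/https/www.google.com", 31848),
    ("android-app://com.google.android.googlequicksearchbox/googlequicksearchbox/suggest", 334),
    ("android-app://org.telegram.messenger", 175),
    ("android-app://org.telegram.plus", 17),
]
for package, total in counts_by_package(rows).most_common():
    print(package, total)
# com.google.android.googlequicksearchbox 465999
# org.telegram.messenger 175
# org.telegram.plus 17
```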
alexanderdean commented 8 years ago

Whoa! Quite a list. My question is, how do we know all of these are using Google search?

kingo55 commented 8 years ago

Sorry @alexanderdean I changed the title on you.

When I discovered the other referers, I figured we might need to take a step back. I just don't know enough about what these are yet to recommend how we should classify or approach them. Many look like social referers from other apps.

But they carry no further detail about the referrer, i.e. no search query.

It's also unclear how the apps are generating/passing the referrer. They only started appearing in March or May in our dataset.
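Part of the "how" is visible in the string itself. Android documents the android-app URI (see `Intent.URI_ANDROID_APP_SCHEME`) as `android-app://{package_id}[/{scheme}[/{host}[/{path}]]]`, so the package always comes first and a page URL is only recoverable when the app appends a scheme and host. A minimal decoding sketch assuming that layout; `AndroidAppReferrer` and `parse_android_app_referrer` are hypothetical names, and the sketch ignores the optional `#Intent;...` fragment.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class AndroidAppReferrer(NamedTuple):
    package: str
    embedded_url: Optional[str]  # e.g. "https://www.google.com", or None


def parse_android_app_referrer(url: str) -> Optional[AndroidAppReferrer]:
    """Decode android-app://{package}[/{scheme}[/{host}[/{path}]]].

    Returns None for anything that is not an android-app URI.
    """
    parts = urlsplit(url)
    if parts.scheme != "android-app" or not parts.netloc:
        return None
    # Path segments after the package: [scheme, host, *path]
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2:
        # Package only (or a scheme without a host): no page URL to recover.
        return AndroidAppReferrer(parts.netloc, None)
    scheme, host, *rest = segments
    embedded = f"{scheme}://{host}"
    if rest:
        embedded += "/" + "/".join(rest)
    if parts.query:
        embedded += "?" + parts.query  # a search term would have to live here
    return AndroidAppReferrer(parts.netloc, embedded)


print(parse_android_app_referrer(
    "android-app://com.google.android.googlequicksearchbox/googlequicksearchbox/suggest"))
# AndroidAppReferrer(package='com.google.android.googlequicksearchbox', embedded_url='googlequicksearchbox://suggest')
```

Note that the embedded scheme is not always http(s): `googlequicksearchbox/suggest` decodes to an app-specific scheme. None of the referrers posted in this thread contain a query string, which is consistent with there being no search term to extract.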

alexanderdean commented 8 years ago

Agree, I have made the title even more conditional to reflect the uncertainty.

ghost commented 8 years ago

We're also getting a lot of Android referrers right now, and we really need to know whether each referrer comes from a search engine or not:

>>> logs.filter(logs.referrer.uri.substr(0, 14) == "android-app://").select(logs.referrer.uri.alias("uri")).groupBy("uri").agg({"uri": "count"}).show(truncate=False)
+----------------------------------------------------------------------------------+----------+
|uri                                                                               |count(uri)|
+----------------------------------------------------------------------------------+----------+
|android-app://com.stickypassword.android                                          |1         |
|android-app://com.google.android.apps.plus/https/plus.url.google.com/mobileapp    |10        |
|android-app://com.google.android.googlequicksearchbox                             |399125    |
|android-app://com.google.android.googlequicksearchbox/https/www.google.com        |14231     |
|android-app://com.Slack                                                           |15        |
|android-app://com.google.android.googlequicksearchbox/googlequicksearchbox/suggest|50        |
|android-app://de.idealo.android                                                   |38        |
|android-app://com.android.chrome                                                  |16        |
|android-app://com.twitter.android                                                 |3         |
|android-app://org.telegram.messenger                                              |97        |
|android-app://com.twitpane.premium                                                |2         |
|android-app://com.google.android.apps.social.spaces                               |8         |
|android-app://com.guardian                                                        |2         |
|android-app://com.linkedin.android                                                |1         |
|android-app://org.telegram.plus                                                   |3         |
|android-app://com.pinterest                                                       |4648      |
|android-app://com.google.android.apps.genie.geniewidget                           |10        |
+----------------------------------------------------------------------------------+----------+
chepchepcirkus commented 7 years ago

Hi everybody, I'm also looking for information about this kind of URL (android-app://com.*). My web application throws an exception while handling this referer URL. Do you have any fresh news about it?

Kind regards

scottbullard commented 7 years ago

@chepchepcirkus, same here. All of a sudden we're receiving a spike in these android-app scheme referrals, resulting in: RefererParser::InvalidUriError: Only HTTP and HTTPS schemes are supported -- android-app

Submitted a PR: https://github.com/snowplow/referer-parser/pull/145
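Independently of how the PR resolves it, a caller can avoid the exception by checking the scheme before handing the URL to the parser. A minimal Python sketch; `parse` is a placeholder for whichever referer-parser entry point you call (not a real function name), and the accepted schemes mirror the error message quoted above.

```python
from typing import Callable, Optional, TypeVar
from urllib.parse import urlsplit

T = TypeVar("T")

# Schemes the parser accepts, per the InvalidUriError message above.
WEB_SCHEMES = {"http", "https"}


def parse_if_web(url: str, parse: Callable[[str], T]) -> Optional[T]:
    """Call `parse` only for http/https URLs; return None for anything else.

    `parse` is a stand-in for the real parser call (hypothetical here).
    """
    if urlsplit(url).scheme not in WEB_SCHEMES:
        return None  # e.g. android-app://..., handled separately or ignored
    return parse(url)
```

Returning `None` keeps the request alive; whether android-app referrers should then be classified (for example by package name) or just dropped is the open question in this thread.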

garywoodfine commented 6 years ago

I have seen a wave of this too, and while investigating I'm starting to suspect it is some type of referer spam. It seems to give the title "Wordpress Android App", but the referer has this type of link:

http://android-app//com.google.android.googlequicksearchbox/https/www.google.com

I strongly suspect it's junk.
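The link above differs from every referrer posted earlier in the thread: it is an http URL whose host is literally `android-app`, rather than an android-app:// URI. A rough heuristic sketch for flagging that shape; `looks_like_malformed_android_referrer` is a hypothetical name, and a match only suggests spam, it does not prove it.

```python
from urllib.parse import urlsplit


def looks_like_malformed_android_referrer(url: str) -> bool:
    """Heuristic: flag http(s) URLs whose host is literally 'android-app'.

    Genuine app referrers seen in this thread use the android-app:// scheme
    itself, so http://android-app//... only copies the shape of one.
    """
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and parts.hostname == "android-app"
```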