matomo-org / device-detector

The Universal Device Detection library will parse any User Agent and detect the browser, operating system, device used (desktop, tablet, mobile, tv, cars, console, etc.), brand and model.
http://devicedetector.net
GNU Lesser General Public License v3.0
2.9k stars, 472 forks

Bot not detected - WhatsApp #5463

Open jLynx opened 8 years ago

jLynx commented 8 years ago

The bot that is not being detected is:

WhatsApp/2.12.15/i

sgiehl commented 8 years ago

Not sure if that should be detected as a bot. WhatsApp is a mobile app, and it should already be detected as one. I guess the user agent is used when fetching page content to preview links within a chat, so it's more or less triggered by the user and not an "automatic" bot.

jLynx commented 8 years ago

Even though it is triggered by the user, it is still a bot: when a user clicks a link, the WhatsApp bot visits the page first and only then does the user visit it. I guess it just collects metadata for its database. I would still call it a bot, since the initial view isn't coming from the real user.

thE-iNviNciblE commented 7 years ago

It would be nice to add this "bot".

ghost commented 7 years ago

Indeed it is a bot.

When you type (or paste) a URL within a WhatsApp conversation, the WhatsApp server loads the URL and parses the Open Graph meta tags. More specifically, it loads the image referenced by the og:image meta tag, which is usually an image relevant to the URL.

It is common to see two hits, like this:

"GET / HTTP/1.1" 206 11133 "-" "WhatsApp/2.16.16/i"
"GET /images/logo.png HTTP/1.1" 206 18093 "-" "WhatsApp/2.16.16/i"

The first loads the URL as it was typed or pasted in the WhatsApp conversation; the second loads the image referenced by the og:image meta tag, in this case a logo.
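To make the two-hit pattern concrete, here is a minimal Python sketch (Python is an arbitrary choice here; device-detector itself is a PHP library) that pulls the user agent out of log lines like the ones above and checks it against a hypothetical pattern covering the WhatsApp variants quoted in this thread. The names WHATSAPP_UA and whatsapp_version are made up for illustration; this is not device-detector's own regex or API.

```python
import re

# Hypothetical pattern covering the WhatsApp user agents quoted in this thread:
# "WhatsApp/2.12.15/i", "WhatsApp/2.16.16/i" (with a "/i" suffix) and
# "WhatsApp/2.19.229 A" (with a " A" suffix). NOT device-detector's regex.
WHATSAPP_UA = re.compile(r"^WhatsApp/(\d+(?:\.\d+)*)(?:/i| A)$")

# All double-quoted fields of a log line; the user agent is the last one.
QUOTED = re.compile(r'"([^"]*)"')


def whatsapp_version(log_line):
    """Return the WhatsApp version if the line's user agent matches, else None."""
    fields = QUOTED.findall(log_line)
    if not fields:
        return None
    match = WHATSAPP_UA.match(fields[-1])
    return match.group(1) if match else None


lines = [
    '"GET / HTTP/1.1" 206 11133 "-" "WhatsApp/2.16.16/i"',
    '"GET /images/logo.png HTTP/1.1" 206 18093 "-" "WhatsApp/2.16.16/i"',
]
for line in lines:
    print(whatsapp_version(line))  # prints 2.16.16 for both lines
```

Both hits (the page and its og:image) resolve to the same client version, which is what makes them easy to attribute to a single link preview.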

whimsicaldreamer commented 7 years ago

Is there a fix for this yet?

jLynx commented 6 years ago

Any update on this issue?

Ethreal commented 5 years ago

👍 Any updates? It's a snippet (preview) request, so it's straightforward to treat it as a ‘bot’.

Findus23 commented 5 years ago

I have now tested this myself, and indeed, while a URL is being typed, WhatsApp fetches the page to show the preview (even before the message is sent).

However, the IP address isn't that of the WhatsApp servers but that of the phone, so the app itself fetches the meta tags.

198.51.100.0 - - [13/Sep/2019:17:24:29 +0200] "GET / HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"
198.51.100.0 - - [13/Sep/2019:17:24:32 +0200] "GET / HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"
198.51.100.0 - - [13/Sep/2019:17:24:32 +0200] "GET /te HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"
198.51.100.0 - - [13/Sep/2019:17:24:33 +0200] "GET /ted HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"
198.51.100.0 - - [13/Sep/2019:17:24:33 +0200] "GET /tedt HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"
198.51.100.0 - - [13/Sep/2019:17:24:34 +0200] "GET /tes HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"
198.51.100.0 - - [13/Sep/2019:17:24:35 +0200] "GET /test HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"
198.51.100.0 - - [13/Sep/2019:17:24:53 +0200] "GET /t HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"
198.51.100.0 - - [13/Sep/2019:17:24:53 +0200] "GET /hg HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"
198.51.100.0 - - [13/Sep/2019:17:24:53 +0200] "GET /hgf HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"
198.51.100.0 - - [13/Sep/2019:17:24:54 +0200] "GET /hgff HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"
198.51.100.0 - - [13/Sep/2019:17:24:54 +0200] "GET /hgfff HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"
198.51.100.0 - - [13/Sep/2019:17:24:54 +0200] "GET /hgffff HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"
198.51.100.0 - - [13/Sep/2019:17:24:54 +0200] "GET /hgfffff HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"
198.51.100.0 - - [13/Sep/2019:17:24:54 +0200] "GET /hgffffff HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"
198.51.100.0 - - [13/Sep/2019:17:24:54 +0200] "GET /hgfffffff HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"
198.51.100.0 - - [13/Sep/2019:17:24:55 +0200] "GET /hgffffffff HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"
198.51.100.0 - - [13/Sep/2019:17:25:05 +0200] "GET / HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"
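
The log above shows roughly one request per keystroke, arriving in bursts while the URL is typed. Below is a minimal Python sketch that parses such lines and groups the WhatsApp requests into bursts. It assumes the Combined Log Format shown, a single client (same IP and user agent), and a hypothetical 5-second pause as the burst boundary; the helper names parse and bursts are illustrative, not part of any library.

```python
import re
from datetime import datetime, timedelta

# Combined Log Format: ip ident user [time] "request" status size "referrer" "user agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"'
)


def parse(line):
    """Parse one access-log line into a dict, or return None if it doesn't match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


def bursts(entries, gap=timedelta(seconds=5)):
    """Group time-ordered entries; a pause longer than `gap` starts a new burst.

    Assumes all entries come from one client (same IP and user agent).
    """
    groups = []
    for entry in entries:
        if groups and entry["time"] - groups[-1][-1]["time"] <= gap:
            groups[-1].append(entry)
        else:
            groups.append([entry])
    return groups


# A subset of the log lines above.
log = [
    '198.51.100.0 - - [13/Sep/2019:17:24:29 +0200] "GET / HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"',
    '198.51.100.0 - - [13/Sep/2019:17:24:32 +0200] "GET / HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"',
    '198.51.100.0 - - [13/Sep/2019:17:24:32 +0200] "GET /te HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"',
    '198.51.100.0 - - [13/Sep/2019:17:24:33 +0200] "GET /ted HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"',
    '198.51.100.0 - - [13/Sep/2019:17:24:53 +0200] "GET /t HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"',
    '198.51.100.0 - - [13/Sep/2019:17:24:53 +0200] "GET /hg HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"',
    '198.51.100.0 - - [13/Sep/2019:17:25:05 +0200] "GET / HTTP/1.1" 200 1036 "-" "WhatsApp/2.19.229 A"',
]
entries = [e for e in map(parse, log) if e and e["ua"].startswith("WhatsApp/")]
for group in bursts(entries):
    print(len(group), [e["path"] for e in group])
# prints:
# 4 ['/', '/', '/te', '/ted']
# 2 ['/t', '/hg']
# 1 ['/']
```

Each burst corresponds to one URL being typed, so a reporting tool that counted these as user page views would inflate traffic for every partial path.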

thE-iNviNciblE commented 4 years ago

It seems this hasn't been added yet. Why?

You could count this as social media traffic...

sgiehl commented 4 years ago

It should be detected as a mobile app. See https://github.com/matomo-org/device-detector/blob/master/regexes/client/mobile_apps.yml#L82-L85

thE-iNviNciblE commented 4 years ago

If I test this with web.whatsapp.com and add a link from the shop, I can't see this request. Maybe the snippet generator doesn't execute the page's JavaScript, so it can't be tracked.

Maybe the WhatsApp "bot" just grabs the HTML/text source. I can see the call in my web server's access.log.

liviuconcioiu commented 3 years ago

This issue should be closed.

Findus23 commented 3 years ago

Just to make this issue clear, @liviuconcioiu, why do you think this should be closed?

One could argue about whether the app on the phone should count as a bot, but since it makes automated requests that the user doesn't initiate, I think excluding it and detecting it as a bot instead of an app wouldn't be unreasonable.

sanchezzzhak commented 3 years ago

If we treat user agents matching ^WhatsApp/\d+([\d+\.]+) A$ as bots, we may break the page preview functionality in the WhatsApp app, since some pages do not show their content to bots.
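A quick way to see what the proposed pattern covers is to run it against the user agents quoted earlier in this thread. The sketch below uses Python's re module as a stand-in (device-detector is PHP and may wrap its patterns differently before matching), and TIGHTER is a hypothetical alternative, not a proposal from the thread. Note that inside a character class "+" is a literal plus sign, so [\d+\.] also accepts "+".

```python
import re

# Pattern exactly as proposed in the comment above.
PROPOSED = re.compile(r"^WhatsApp/\d+([\d+\.]+) A$")
# Hypothetical tighter variant: dotted numeric version only.
TIGHTER = re.compile(r"^WhatsApp/(\d+(?:\.\d+)+) A$")

samples = [
    "WhatsApp/2.19.229 A",  # from the access log above
    "WhatsApp/2.16.16/i",   # from the two-hit log excerpt
    "WhatsApp/2.12.15/i",   # from the original report
    "WhatsApp/2+ A",        # contrived: matches only because "+" is literal in [...]
]
for ua in samples:
    print(f"{ua!r}: proposed={bool(PROPOSED.match(ua))} tighter={bool(TIGHTER.match(ua))}")
# prints:
# 'WhatsApp/2.19.229 A': proposed=True tighter=True
# 'WhatsApp/2.16.16/i': proposed=False tighter=False
# 'WhatsApp/2.12.15/i': proposed=False tighter=False
# 'WhatsApp/2+ A': proposed=True tighter=False
```

As written, the pattern only covers the " A" variant; the "/i" user agents from the earlier comments would not be classified as bots by it.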

Similar functionality exists in other applications, such as SkypePreview and Telegram.

If SkypePreview is treated as a bot, then this user agent should also be considered a bot, based on the following precedent:

- 
  user_agent: Mozilla/5.0 (Windows NT 6.1; WOW64) SkypeUriPreview Preview/0.5
  bot:
    name: Skype URI Preview
    category: Service Agent
    url: ""
    producer:
      name: Skype Communications S.à.r.l.
      url: https://www.skype.com