lennerd / vipx-bot-detect

A bot detector written in PHP
MIT License

Missing bots #10

Status: Closed (smilesrg closed this issue 9 years ago)

smilesrg commented 9 years ago

List of bots that are missing (detected by posting a link via Twitter):

There are also these request headers and IP addresses when posting a link via Twitter:

This is probably UnwindFetchor http://ru.myip.ms/info/whois/54.241.198.78

User-Agent: Google-HTTP-Java-Client/1.17.0-rc (gzip)
Via: 1.1 cache-child3a.prod.gnip.com (squid/3.1.19), 1.1 cache-parent1a.prod.gnip.com (squid/3.1.19)
X-Forwarded-For: 216.46.175.35, 10.160.107.8

IP Address: 54.241.198.78

This is actually ButterflyBot

User-Agent: Mozilla/5.0 ()
IP Addresses: 74.112.131.244, 74.112.131.245
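
To make the two cases above concrete, here is a rough sketch of how such requests could be matched. It is only an illustration in Python, not the library's API: `KNOWN_BOTS` and `detect_bot` are made-up names, and the bot names are my guesses from above. Because both user agents are very generic, the sketch requires the IP address to match as well.

```python
from ipaddress import ip_address

# Hypothetical table built from the headers above; the bot names are guesses.
KNOWN_BOTS = [
    {
        "name": "UnwindFetchor",
        "agent": "Google-HTTP-Java-Client",
        "ips": {ip_address("54.241.198.78")},
    },
    {
        "name": "ButterflyBot",
        "agent": "Mozilla/5.0 ()",
        "ips": {ip_address("74.112.131.244"), ip_address("74.112.131.245")},
    },
]


def detect_bot(user_agent, ip):
    """Return the bot name if both the agent substring and the IP match, else None."""
    addr = ip_address(ip)
    for bot in KNOWN_BOTS:
        if bot["agent"] in user_agent and addr in bot["ips"]:
            return bot["name"]
    return None
```

Matching on the user agent alone would flag every client that uses the Google HTTP Java client, which is why the IP check is part of the condition here.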

lennerd commented 9 years ago

Thank you for pointing out the missing bots. PRs are very welcome for bots where we know the correct metadata. If you have problems picking the right category, leave a comment here so we can discuss it. (I definitely need to add a section to the README file for these kinds of PRs.)

There are also these request headers when posting a link via Twitter:

I do not quite get it. Where do you get this header from? After a redirect from a Twitter post? I think these kinds of bots are not real bots, but simply track your movement towards other pages.

The special use case of empty user agents is difficult to handle. Where did you come across this one?
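
Just to illustrate what "empty" could mean here, a minimal sketch in Python (not code from this library; `is_effectively_empty` is a hypothetical helper). It assumes a user agent counts as empty when nothing is left after removing the generic `Mozilla/x.y` token and an empty `()` comment:

```python
import re


def is_effectively_empty(user_agent):
    """Return True if the user agent carries no identifying information."""
    if user_agent is None:
        return True
    # Drop the generic "Mozilla/x.y" token that nearly every client sends.
    rest = re.sub(r"Mozilla/\d+(\.\d+)*", "", user_agent)
    # Drop an empty "()" comment such as the one in "Mozilla/5.0 ()".
    rest = re.sub(r"\(\s*\)", "", rest)
    return rest.strip() == ""
```

Whether such a request should then be reported as a bot is a separate decision; the sketch only recognises the pattern.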

smilesrg commented 9 years ago

I do not quite get it. Where do you get this header from? After a redirect from a Twitter post?

  1. Posted a link to Twitter
  2. Dumped request headers and IP addresses.
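
Step 2 can be sketched roughly like this. It is a minimal Python WSGI example and not the exact code I used; `dump_request` and `app` are illustrative names:

```python
def dump_request(environ):
    """Return the client IP and the request headers as printable lines."""
    lines = ["IP Address: " + environ.get("REMOTE_ADDR", "unknown")]
    for key in sorted(environ):
        if key.startswith("HTTP_"):
            # WSGI exposes headers as HTTP_USER_AGENT etc.; restore the usual spelling.
            name = key[5:].replace("_", "-").title()
            lines.append(f"{name}: {environ[key]}")
    return lines


def app(environ, start_response):
    """WSGI app that prints every incoming request and answers with 200."""
    print("\n".join(dump_request(environ)), flush=True)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]
```

To try it, serve `app` with any WSGI server, for example `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library (port 8000 is an arbitrary choice), then post the public URL to Twitter and watch the output.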

smilesrg commented 9 years ago

@lennerd I have a question: where should FlipboardProxy be placed? I've put it under crawlers.

I've created PR #11

BTW, you can use the bot list from here, maybe import it somehow: https://github.com/podigee/device_detector/blob/develop/regexes/bots.yml

Currently I'm using both piwik/device-detector and your library to detect bots :-)

smilesrg commented 9 years ago

@lennerd can you please merge #11?

smilesrg commented 9 years ago

@lennerd any news? :-) ping

lennerd commented 9 years ago

@lennerd I have a question: where should FlipboardProxy be placed? I've put it under crawlers.

From the Flipboard website:

Flipboard uses a proxy service to fetch, validate, and prepare certain elements of websites for presentation through the Flipboard Application.

Sounds like a crawler to me.

BTW, you can use the bot list from here, maybe import it somehow: https://github.com/podigee/device_detector/blob/develop/regexes/bots.yml

Back when I developed this library, I didn't find a good library or archive with some kind of public API. Times have changed. I will definitely look into this one. Thank you for the hint.

Currently I'm using both piwik/device-detector and your library to detect bots :-)

Nice to know that my library comes in handy. :wink: