kevinelliott / agent_orange

Parse and process User Agents like a secret one

Bot check failing on many bots due to only checking the comment #35

Open: msaspence opened this issue 11 years ago

msaspence commented 11 years ago

Loads of bots are being missed because the bot check only inspects `content[:comment]`.

Some that are coming through for us without being caught include:

- facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
- compatible; Googlebot/2.1; +http://www.google.com/bot.html
- compatible; YandexBot/3.0; +http://yandex.com/bots
- compatible; PaperLiBot/2.1; http://support.paper.li/entries/20023257-what-is-paper-li
- compatible; AhrefsBot/4.0; +http://ahrefs.com/robot/
- Twitterbot/1.0
- LinkedInBot/1.0 (compatible; Mozilla/5.0; Jakarta Commons-HttpClient/3.1 +http://www.linkedin.com)
- http://showyou.com/crawler
- compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm
- compatible; TweetmemeBot/3.0; +http://tweetmeme.com/
- +http://search.msn.com/msnbot.htm

Is there a good reason why only the comment section is checked? User agents starting with "compatible;" don't seem to be parsed at all.
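To illustrate the failure mode, here is a minimal sketch that assumes the parser treats the first parenthesised group as the "comment" and runs a bot pattern against that group only. `BOT_PATTERN`, `comment_of`, `bot_by_comment?` and `bot_by_full_string?` are hypothetical names for illustration, not agent_orange's actual code. Strings such as `compatible; Googlebot/2.1; ...` or `Twitterbot/1.0` have no parentheses at all, so a comment-only check never sees the bot name, while LinkedInBot puts its name outside the parentheses.

```ruby
# Hypothetical sketch, NOT agent_orange's real implementation.
BOT_PATTERN = /bot|crawler|spider|externalhit/i

# Assumed behaviour: only the first "(...)" group counts as the comment.
def comment_of(user_agent)
  match = user_agent.match(/\(([^)]*)\)/)
  match ? match[1] : ""
end

# Comment-only check: misses bots whose name sits outside the parentheses.
def bot_by_comment?(user_agent)
  comment_of(user_agent).match?(BOT_PATTERN)
end

# Whole-string check: looks at the entire user agent.
def bot_by_full_string?(user_agent)
  user_agent.match?(BOT_PATTERN)
end

SAMPLES = [
  "compatible; Googlebot/2.1; +http://www.google.com/bot.html",
  "Twitterbot/1.0",
  "LinkedInBot/1.0 (compatible; Mozilla/5.0; Jakarta Commons-HttpClient/3.1 +http://www.linkedin.com)"
].freeze

SAMPLES.each do |ua|
  # Columns: comment-only result, whole-string result, user agent.
  puts format("%-5s %-5s %s", bot_by_comment?(ua), bot_by_full_string?(ua), ua)
end
```

A broad substring pattern like `/bot/i` over the whole string catches all three samples, but it can also flag non-bot user agents that merely contain those letters, so it trades misses for possible false positives.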

kevinelliott commented 11 years ago

Thanks, I'll look into it.

(Sent by email in reply to msaspence's report of Jan 18, 2013, 4:59 AM.)

be9 commented 11 years ago

Implemented a name-based check in PR #38.
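The details of PR #38 aren't shown here, so the following is only a sketch of what a name-based check could look like: match a list of known bot names against the whole user agent rather than the comment. `KNOWN_BOT_NAMES`, `KNOWN_BOT_REGEXP` and `known_bot?` are hypothetical, and the list is assembled from the examples reported in this issue.

```ruby
# Hypothetical name-based bot check; not the code from PR #38.
# Names taken from the user agents reported in this issue.
KNOWN_BOT_NAMES = %w[
  Googlebot bingbot msnbot YandexBot AhrefsBot PaperLiBot
  TweetmemeBot Twitterbot LinkedInBot facebookexternalhit
  showyou.com/crawler
].freeze

# One case-insensitive alternation; Regexp.escape makes "." match literally.
KNOWN_BOT_REGEXP = Regexp.new(
  KNOWN_BOT_NAMES.map { |name| Regexp.escape(name) }.join("|"),
  Regexp::IGNORECASE
)

# Checks the entire user agent string, not just the parenthesised comment.
def known_bot?(user_agent)
  return false if user_agent.nil? || user_agent.empty?

  KNOWN_BOT_REGEXP.match?(user_agent)
end

puts known_bot?("compatible; YandexBot/3.0; +http://yandex.com/bots")
```

An explicit list avoids the false positives of a bare `/bot/` substring match, at the cost of having to be kept up to date as new crawlers appear.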