mssola / user_agent

This project has been moved; check the README.md file!
https://github.com/mssola/useragent
MIT License

Bots not marked as bots #18

Open: grotos opened this issue 9 years ago

grotos commented 9 years ago

Here is a list of User-Agent strings that are not marked as bots even though they actually are bots:

"ADmantX Platform Semantic Analyzer - ADmantX Inc. - www.admantx.com - support@admantx.com"
"Apache-HttpClient/4.2.3 (java 1.5)"
"Apache-HttpClient/4.3 (java 1.5)"
"Apache-HttpClient/4.3.3 (java 1.5)"
"Application"
"CATExplorador/1.0beta (sistemes at domini dot cat; http://domini.cat/catexplorador.html)"
"COMODOSpider/Nutch-1.2"
"Comodo Spider 1.2"
"Comodo-Webinspector-Crawler 2.1"
"Faraday v0.8.9"
"GigablastOpenSource/1.0"
"GoogleBot 1.0"
"Google_Analytics_Snippet_Validator"
"HTTPClient/1.0 (2.3.4.1, ruby 1.9.3 (2013-06-27))"
"HTTPClient/1.0 (2.4.0, ruby 1.9.3 (2013-06-27))"
"Java/1.6.0_29"
"Java/1.6.0_45"
"Java/1.7.0_09"
"Java/1.7.0_21"
"Java/1.7.0_40"
"Java/1.7.0_60-ea"
"Java/1.7.0_65"
"Mozilla/2.0 (compatible; crw)"
"Mozilla/3.0 (compatible; Indy Library)"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.2)"
"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDR; .NET4.0C; .NET4.0E; .NET CLR 1.1.4322; Tablet PC 2.0); 360Spider"
"Mozilla/4.0 (compatible; Netcraft Web Server Survey)"
"Mozilla/4.0 (compatible; Synapse)"
"Mozilla/4.0 (compatible; Win32; WinHttp.WinHttpRequest.5)"
"Mozilla/4.0 (compatible; http://search.thunderstone.com/texis/websearch/about.html)"
"Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)"
"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36 AlexaToolbar/alxg-3.1"
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1; 360Spider"
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1; 360Spider(compatible; HaosouSpider; http://www.haosou.com/help/help_3_2.html)"
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b"
"Mozilla/5.0 (Windows NT 6.1; Win64; x64) KomodiaBot/1.0"
"Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google (+https://developers.google.com/+/web/snippet/)"
"Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon"
"Mozilla/5.0 (Windows NT 6.2; WOW64) Runet-Research-Crawler (itrack.ru/research/cmsrate; rating@itrack.ru)"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en; rv:1.9.0.13) Gecko/2009073022 Firefox/3.5.2 (.NET CLR 3.5.30729) Survey/2.3 (fr.wsdata.com)"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en; rv:1.9.0.13) Gecko/2009073022 Firefox/3.5.2 (.NET CLR 3.5.30729) SurveyBot/2.3 (DomainTools)"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; )  Firefox/1.5.0.11; 360Spider"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11)  Firefox/1.5.0.11; 360Spider"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11) Gecko/20070312 Firefox/1.5.0.11; 360Spider"
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.11 (KHTML, like Gecko) DumpRenderTree/0.0.0.0 Safari/536.11"
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/27.0.1453 Safari/537.36"
"Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20100101 Firefox/21.0 WordPress.com mShots"
"Mozilla/5.0 (compatible; Google-Site-Verification/1.0)"
"Mozilla/5.0 (compatible; IstellaBot/1.18.81 +http://www.tiscali.it/)"
"Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 5.1) (http://name911.com)"
"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0); 360Spider"
"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0); 360Spider(compatible; HaosouSpider; http://www.haosou.com/help/help_3_2.html)"
"Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0; +info@netcraft.com)"
"Mozilla/5.0 (compatible; Owler/0.4; +; )"
"Mozilla/5.0 (compatible; PageAnalyzer/1.1;)"
"Mozilla/5.0 (compatible; XML Sitemaps Generator; http://www.xml-sitemaps.com) Gecko XML-Sitemaps/1.0"
"Mozilla/5.0 (compatible; archive.org_bot +http://www.archive.org/details/archive.org_bot)"
"Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
"Mozilla/5.0(compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)"
"Mozilla/5.0(compatible;Sosospider/2.0;+http://help.soso.com/webspider.htm)"
"Porkbun/Mustache (Website Analysis; http://porkbun.com; tech@porkbun.com)"
"PycURL/7.23.1"
"Python-urllib/1.17"
"Python-urllib/2.6"
"Python-urllib/2.7"
"Python-urllib/3.4"
"Robosourcer/1.0"
"Ruby"
"Sosospider+(+http://help.soso.com/webspider.htm)"
"W3C_Validator/1.3 http://validator.w3.org/services"
"WebTarantula.com Crawler"
"Wget/1.12 (linux-gnu)"
"Wget/1.13.4 (linux-gnu)"
"WhatWeb/0.4.8-dev"
"Who.is Bot"
"WinInet Test"
"YisouSpider"
"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.13.1.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2"
"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
"curl/7.35.0"
"ip-web-crawler.com"
"panscient.com"
"python-requests/1.1.0 CPython/2.7.4 Linux/3.8.0-19-generic"
"python-requests/1.2.0 CPython/2.7.4 Linux/3.8.0-33-generic"
"python-requests/2.2.1 CPython/2.7.6 Linux/3.13.0-24-generic"
"spotinfluence/Nutch-1.4 (Spot Influence crawler; http://spotinfluence.com; hello at spotinfluence dot com)"
"visaduhoc.info Crawler"
"wsr-agent/1.0"
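
Many of the strings above share obvious markers: crawler vocabulary (bot, spider, crawler) or the name of an HTTP client library or command-line tool (curl, Wget, Python-urllib, python-requests, Java, Apache-HttpClient). As a stopgap until the library's detection improves, a caller could run a case-insensitive substring check alongside Bot(). The following is a minimal sketch under that assumption; looksLikeBot and botMarkers are hypothetical names, and the marker list is hand-picked from this report rather than taken from the library.

```go
package main

import (
	"fmt"
	"strings"
)

// botMarkers is a hypothetical, hand-picked list of lowercase substrings
// taken from the user agents reported in this issue. It is not the
// library's own list.
var botMarkers = []string{
	// Generic crawler vocabulary.
	"bot", "spider", "crawler",
	// Command-line tools and HTTP client libraries.
	// "curl/" also covers PycURL/, "httpclient/" also covers Apache-HttpClient/.
	"curl/", "wget/", "python-urllib/", "python-requests/",
	"httpclient/", "java/", "faraday",
	// Services that identify themselves without the words above.
	"bingpreview", "google web preview", "google favicon",
	"netcraft", "archive.org", "w3c_validator",
}

// looksLikeBot reports whether ua contains any of botMarkers, ignoring case.
func looksLikeBot(ua string) bool {
	lower := strings.ToLower(ua)
	for _, m := range botMarkers {
		if strings.Contains(lower, m) {
			return true
		}
	}
	return false
}

func main() {
	for _, ua := range []string{
		"Wget/1.13.4 (linux-gnu)",
		"Mozilla/5.0 (compatible; IstellaBot/1.18.81 +http://www.tiscali.it/)",
		"Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0",
	} {
		fmt.Printf("bot=%-5v %s\n", looksLikeBot(ua), ua)
	}
}
```

Substring matching is crude: a short marker like "bot" can also match unrelated tokens, and entries such as "Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)" or the MSIE 6.0 string above carry no marker at all, so they cannot be caught this way.
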

crackcomm commented 8 years ago

Some more data from me: https://gist.github.com/crackcomm/40bad73724f14369b602 (the second revision was made after #26).

vodolaz095 commented 8 years ago

+1

blixt commented 7 years ago

This one also appears to fail, possibly because it has more than one section (so the bot check doesn't match) and because it uses HTTPS (the site regexp only seems to match http://...).

Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)
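
If the diagnosis above is right, one fix is a URL pattern that accepts both schemes and is searched across the whole string rather than within a single section. Below is a minimal sketch; botURLPattern and botURL are hypothetical names, and the library's actual regexp is not reproduced here.

```go
package main

import (
	"fmt"
	"regexp"
)

// botURLPattern is a hypothetical pattern, not the library's own regexp.
// It accepts both http:// and https://, with or without the leading "+"
// that many crawlers use, and the capture group excludes that "+".
var botURLPattern = regexp.MustCompile(`\+?(https?://[^\s;)]+)`)

// botURL searches the whole user agent (not just one section) and
// returns the first info URL it finds, if any.
func botURL(ua string) (string, bool) {
	m := botURLPattern.FindStringSubmatch(ua)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	url, ok := botURL("Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)")
	fmt.Println(url, ok) // prints "https://api.slack.com/robots true"
}
```

Treat a matched URL as a hint rather than proof that the client is a bot; it would most likely be combined with other checks.
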

Here are a couple more user agents that I consider bots, in addition to those already caught by Bot():

AppEngine-Google; (+http://code.google.com/appengine; appid: s~something) Slack-ImgProxy (+https://api.slack.com/robots)
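
Until such strings are recognized upstream, a caller could wrap the library's verdict with its own list of known markers. The sketch below assumes the mssola/user_agent package exposes New(ua) returning a value with a Bot() method, as referenced in the comment above; to keep the example self-contained, the library check is passed in as a function. isBot and extraBotMarkers are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// extraBotMarkers lists substrings from the user agents reported in this
// thread that Bot() reportedly misses. Hypothetical and hand-maintained.
var extraBotMarkers = []string{
	"Slackbot",
	"Slack-ImgProxy",
	"AppEngine-Google",
}

// isBot treats ua as a bot if the library's own check says so, or if it
// contains one of extraBotMarkers (case-sensitive). libraryBot is injected
// so this sketch compiles without the dependency; with mssola/user_agent it
// could be func(s string) bool { return user_agent.New(s).Bot() }.
func isBot(ua string, libraryBot func(string) bool) bool {
	if libraryBot(ua) {
		return true
	}
	for _, m := range extraBotMarkers {
		if strings.Contains(ua, m) {
			return true
		}
	}
	return false
}

func main() {
	// Stand-in for the library that never reports a bot.
	neverBot := func(string) bool { return false }
	ua := "AppEngine-Google; (+http://code.google.com/appengine; appid: s~something) Slack-ImgProxy (+https://api.slack.com/robots)"
	fmt.Println(isBot(ua, neverBot)) // prints "true"
}
```

Injecting the library check keeps the extra list separate from upstream logic, so entries can be dropped once the library starts detecting them itself.
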