monperrus / crawler-user-agents

Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders. pull-request welcome :star:
MIT License

Raw list of user-agents #260

Closed mirabellette closed 2 years ago

mirabellette commented 4 years ago

Hello,

I have access to a large database, and I compiled a list of 71 user agents that are used by bots and are not listed in this repository. I don't have time to make a proper pull request, but if someone else does, they can make one from the list at this link.

https://privatebin.mirabellette.eu/?2caed8cd254e76a5#mDZoPge1GFGzHvEANyi3WFOkiBuADw6ZytFsa/dkG78=

or below

'Mozilla/5.0 (compatible; MJ12bot/v1.4.7; http://mj12bot.com/)'
'Mozilla/5.0 (compatible; SemrushBot/1.2~bl; +http://www.semrush.com/bot.html)'
'Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)'
'Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)'
'Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)'
'Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)'
'BUbiNG (+http://law.di.unimi.it/BUbiNG.html)'
'Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)'
'Wget/1.18 (linux-gnu)'
'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
'Mozilla/5.0 (compatible; Cliqzbot/2.0; +http://cliqz.com/company/cliqzbot)'
'Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.semrush.com/bot.html)'
'Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)'
'CCBot/2.0 (http://commoncrawl.org/faq/)'
'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)'
'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
'Mozilla/5.0 (compatible; SemrushBot/3~bl; +http://www.semrush.com/bot.html)'
'Mozilla/5.0 (compatible; Findxbot/1.0; +http://www.findxbot.com)'
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)'
'ZoominfoBot (zoominfobot at zoominfo dot com)'
'Mozilla/5.0 (compatible; Discordbot/2.0; +https://discordapp.com)'
'Mozilla/5.0 (compatible; YaK/1.0; http://linkfluence.com/; bot@linkfluence.com)'
'Mozilla/5.0 (compatible; SemrushBot/6~bl; +http://www.semrush.com/bot.html)'
'TelegramBot (like TwitterBot)'
'Mozilla/5.0 (compatible; bnf.fr_bot; +http://www.bnf.fr/fr/outils/a.dl_web_capture_robot.html)'
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 facebookexternalhit/1.1 Facebot Twitterbot/1.0'
'Mozilla/5.0 (compatible; archive.org_bot +http://archive.org/details/archive.org_bot)'
'Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)'
'Googlebot/2.1 (+http://www.googlebot.com/bot.html)'
'yacybot (-global; amd64 Linux 4.4.0-116-generic; java 1.8.0_151; GMT/en) http://yacy.net/bot.html'
'MauiBot (crawler.feedback+wc@gmail.com)'
'MauiBot (crawler.feedback+dc@gmail.com)'
'Mozilla/5.0 (compatible; SemrushBot/1.0~bm; +http://www.semrush.com/bot.html)'
'yacybot (-global; amd64 Linux 4.4.0-127-generic; java 1.8.0_151; GMT/en) http://yacy.net/bot.html'
'Mozilla/5.0 (Linux; Android 7.0; M bot 60 Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.68 Mobile Safari/537.36'
'Mozilla/5.0 (compatible; AhrefsBot/5.2; News; +http://ahrefs.com/robot/)'
'yacybot (-global; amd64 Linux 4.4.0-128-generic; java 1.8.0_151; GMT/en) http://yacy.net/bot.html'
'Mozilla/5.0 (Linux; Android 7.0; M bot 60 Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Mobile Safari/537.36'
'yacybot (-global; amd64 Linux 4.4.0-128-generic; java 1.8.0_171; GMT/en) http://yacy.net/bot.html'
'CCBot/2.0 (https://commoncrawl.org/faq/)'
'yacybot (-global; amd64 Linux 4.4.0-131-generic; java 1.8.0_171; GMT/en) http://yacy.net/bot.html'
'Mozilla/5.0 (Linux; Android 7.0; CUBOT MAGIC Build/NRD90M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/68.0.3440.91 Mobile Safari/537.36 UCBrowser/11.5.2.1188 (UCMini) Mobile'
'Mozilla/5.0 (Linux; Android 7.0; CUBOT MAGIC Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.91 Mobile Safari/537.36'
'Mozilla/5.0 (compatible; SEOkicks; +https://www.seokicks.de/robot.html)'
'Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)'
'yacybot (-global; amd64 Linux 4.4.0-131-generic; java 1.8.0_181; GMT/en) http://yacy.net/bot.html'
'Mozilla/5.0 (Linux; Android 6.0; IDbot553 Build/MRA58K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.85 Mobile Safari/537.36'
'yacybot (/global; amd64 Linux 4.4.0-1031-aws; java 1.8.0_191-heroku; Etc/en) http://yacy.net/bot.html'
'yacybot (/global; amd64 Linux 4.15.18; java 1.8.0_192; Europe/de) http://yacy.net/bot.html'
'yacybot (-global; amd64 Linux 4.4.0-141-generic; java 1.8.0_191; Europe/en) http://yacy.net/bot.html'
'yacybot (-global; amd64 Linux 4.4.0-142-generic; java 1.8.0_191; Europe/en) http://yacy.net/bot.html'
'Mozilla/5.0 (compatible; Qwantify/Bleriot/1.1; +https://help.qwant.com/bot)'
'yacybot (-global; amd64 Linux 4.4.0-143-generic; java 1.8.0_191; Europe/en) http://yacy.net/bot.html'
'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Safari/537.36'
'Mozilla/5.0 (compatible; Snapbot/1.0; +http://www.snapchat.com)'
'MBCrawler/1.0 (https://monitorbacklinks.com/robot)'
'yacybot (-global; amd64 Linux 4.4.0-145-generic; java 1.8.0_191; Europe/en) http://yacy.net/bot.html'
'Mozilla/5.0 (compatible; Go-http-client/1.1; +centurybot9@gmail.com)'
'Mozilla/5.0 (compatible; FemtosearchBot/1.0; http://femtosearch.com)'
'yacybot (/global; amd64 Linux 5.0.6-gnu-1; java 11.0.3; UTC/en) http://yacy.net/bot.html'
'yacybot (-global; amd64 Linux 4.4.0-150-generic; java 1.8.0_212; Europe/en) http://yacy.net/bot.html'
'Mozilla/5.0 (compatible; SemrushBot/3~bl; +http://www.semrush.com/bot.html) AppEngine-Google; (+http://code.google.com/appengine; appid: s~tutorialses-hrd)'
'Linguee Bot (http://www.linguee.com/bot; bot@linguee.com)'
'\'Mozilla/5.0 (compatible; DuckDuckBot-Https/1.1; https://duckduckgo.com/duckduckbot)\''
'Mozilla/5.0 (compatible; SemrushBot/6~bl; +http://www.semrush.com/bot.html) AppEngine-Google; (+http://code.google.com/appengine; appid: s~tutorialses-hrd)'
'Mozilla/5.0 (compatible; bnf.fr_bot; +https://www.bnf.fr/fr/capture-de-votre-site-web-par-le-robot-de-la-bnf)'
'Keybot Translation-Search-Machine'
'Gigabot (1.1 1.2)'
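
For anyone turning this raw list into a pull request, a first pass could be to drop duplicates and any user agent that an existing pattern already matches. The following is only a rough sketch under stated assumptions: `find_unlisted` is a hypothetical helper, and `known_patterns` is a made-up stand-in for the regular expressions kept in this repository, not its actual contents.

```python
import re


def find_unlisted(user_agents, patterns):
    """Return user agents that none of the regex patterns match.

    Duplicates and blank entries are skipped; input order is preserved.
    """
    compiled = [re.compile(p) for p in patterns]
    seen = set()
    unlisted = []
    for ua in user_agents:
        ua = ua.strip()
        if not ua or ua in seen:
            continue
        seen.add(ua)
        # A user agent counts as "listed" if any pattern matches anywhere in it.
        if not any(rx.search(ua) for rx in compiled):
            unlisted.append(ua)
    return unlisted


# Hypothetical patterns, for illustration only (not the repository's real list).
known_patterns = [r"Googlebot", r"bingbot", r"AhrefsBot"]

# A few entries taken from the list above, with one deliberate duplicate.
sample = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Wget/1.18 (linux-gnu)",
    "Keybot Translation-Search-Machine",
    "Wget/1.18 (linux-gnu)",
]

print(find_unlisted(sample, known_patterns))
```

Entries that survive this filter are only candidates: several strings in the list look like ordinary phone browsers whose device name happens to contain "bot", so each one would still need a manual check before being added.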

monperrus commented 4 years ago

thanks