matomo-org / matomo


Bots/spiders detected as Not-Bot #3720

Closed anonymous-matomo-user closed 11 years ago

anonymous-matomo-user commented 11 years ago

The following strings are bots/spiders that are being registered in the Not-Bots section when using log import. I am using Piwik 1.10.1.

Baiduspider/2.0
Baiduspider-image
Ezooms/1.0; ezooms.bot@gmail.com
Sosospider/2.0;
JikeSpider

anonymous-matomo-user commented 11 years ago

Added a few more.

news bot /2.1
Blekkobot
ScoutJet

mattab commented 11 years ago

Surprising, because 'spider' is already in the array of user agent keywords that are classified as bots.

anonymous-matomo-user commented 11 years ago

Here is a list of the strings as seen in the log files. I had to remove the 'http:' part of the URLs in order to paste this, because some kind of anti-spam setting was rejecting the links.

Piwik: Baiduspider/2.0 Log: "Mozilla/5.0 (compatible; Baiduspider/2.0; +//www.baidu.com/search/spider.html)"

Piwik: Baiduspider-image Log: "//image.baidu.com/i?ct=503316480&z=0&tn=baiduimagedetail" "Baiduspider-image+(+//www.baidu.com/search/spider.htm)"

Piwik: Ezooms/1.0; ezooms.bot@ Log: "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"

Piwik: Sosospider/2.0; Log: "Mozilla/5.0(compatible; Sosospider/2.0; +//help.soso.com/webspider.htm)"

Piwik: JikeSpider Log: "Mozilla/5.0 (compatible; JikeSpider; +//shoulu.jike.com/spider.html)"

Piwik: news bot /2.1 Log: "Mozilla/5.0 (compatible; news bot /2.1)"

Piwik: Blekkobot Log: "Mozilla/5.0 (compatible; Blekkobot; ScoutJet; +//blekko.com/about/blekkobot)"

Piwik: ScoutJet Log: "Mozilla/5.0 (compatible; Blekkobot; ScoutJet; +//blekko.com/about/blekkobot)"

The 'Blekkobot' and 'ScoutJet' bots appear to be the same in the logs (both names occur in a single user-agent string), but they are detected separately in Piwik's log import.
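As a rough illustration of keyword-based bot detection, here is a minimal Python sketch that checks the user-agent strings listed above against a short keyword list. The list `BOT_KEYWORDS` and the function `is_bot` are hypothetical names made up for this example, and the case-insensitive substring match is an assumption; this is not Piwik's actual code or its actual list.

```python
# Hypothetical keyword list for illustration only; the real list used by
# Piwik's log importer may contain different entries.
BOT_KEYWORDS = ("bot", "spider")


def is_bot(user_agent, keywords=BOT_KEYWORDS):
    """Return True if any keyword occurs in the user agent (case-insensitive)."""
    ua = user_agent.lower()
    return any(keyword in ua for keyword in keywords)


# User-agent strings as quoted in this thread (with 'http:' already removed).
USER_AGENTS = [
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +//www.baidu.com/search/spider.html)",
    "Baiduspider-image+(+//www.baidu.com/search/spider.htm)",
    "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)",
    "Mozilla/5.0(compatible; Sosospider/2.0; +//help.soso.com/webspider.htm)",
    "Mozilla/5.0 (compatible; JikeSpider; +//shoulu.jike.com/spider.html)",
    "Mozilla/5.0 (compatible; news bot /2.1)",
    "Mozilla/5.0 (compatible; Blekkobot; ScoutJet; +//blekko.com/about/blekkobot)",
]

if __name__ == "__main__":
    for ua in USER_AGENTS:
        print(is_bot(ua), ua)
```

Under these assumptions every string above matches either 'bot' or 'spider', which would be consistent with the import script on the web servers being an older copy. The match is deliberately case-insensitive: a case-sensitive comparison against 'spider' would miss 'JikeSpider'.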

anonymous-matomo-user commented 11 years ago

Concerning the 'spider' keyword: I upgraded the Piwik system that the customers see to 1.10.1, but I was not sure whether the log analytics copies that exist on the web servers to do the import had been updated as well. I have updated those today to be sure, and will report back after our next import.

Thank you

mattab commented 11 years ago

Haven't heard any feedback, so I assume it works fine.