matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

Plugin: BotTracker to track bot-actions #2391

Closed Thomas--F closed 10 years ago

Thomas--F commented 13 years ago

BotTracker Plugin

* When installed, the plugin will detect configured bots & webspiders and exclude them from the visitor-log
* All bot-hits are counted and for every bot the last visit is logged
* Some well-known bots are pre-configured

How to install?

* Download Piwik BotTracker Plugin
* Unzip the plugin and copy the extracted directory "BotTracker" in the directory piwik/plugins/
* Configure the Bot-List by editing the MySQL-Table *piwik_bot_db*.

Author

* Thomas Fasselt

Any help is welcome

Changelog

* version 0.10: first public version

Feedback

Please leave a comment if you have any feedback, suggestion, or bug report.

Keywords: bot, spider, search, engine, agent, third-party-plugin

anonymous-matomo-user commented 11 years ago

Attachment: Hungarian translation hu.php

Thomas--F commented 11 years ago

Attachment: v 0.31: fixed static calls in API BotTracker.2.zip

hpvd commented 10 years ago

Attachment: 2013-10-26_09h33_33.png

Thomas--F commented 10 years ago

Attachment: v0.32 - last version for Piwik 1.x BotTracker_alt.zip

hpvd commented 10 years ago

Attachment: 2013-12-31_14h02_00.png

hpvd commented 10 years ago

Attachment: 2013-12-31_15h08_57.png

Thomas--F commented 10 years ago

Attachment: v 0.43: adding extra Statistics BotTracker.zip

Thomas--F commented 13 years ago

The plugin comes with a widget to show the data from the bot_db table. I've tested the plugin, but there may still be bugs in it. I don't recommend using it in a production environment.

ToDos:

Oh, and there is one big limitation: Most bots don't use JavaScript. So if you use only the Piwik-JS-API (default), you don't get any results. I use the PHP-API, so every bot hits my visitor-log. That's why I wrote this plugin.

anonymous-matomo-user commented 13 years ago

Nice, this is exactly the module I was looking for, but after copying it I get this error.

Unable to load plugin 'BotTracker' because '/home/**/public_html/plugins/BotTracker/BotTracker.php' couldn't be found. You can manually uninstall the plugin by removing the line Plugins[] = BotTracker from the Piwik config file.

The file is there! Any ideas?

Thomas--F commented 13 years ago

Hmmm.... sounds strange. I installed the plugin about 30 times during development and test.

First check whether the folder name looks exactly as shown in the message (e.g. "BotTracker" versus "bottracker"), and then check the folder and file permissions. Is read access restricted?

anonymous-matomo-user commented 13 years ago

We are using Piwik 1.4 and getting the following error when activating the plugin:

Fatal error: Call to undefined method Piwik_BotTracker::LogToFile() in example.com/analytics/piwik/plugins/BotTracker/BotTracker.php on line 41

anonymous-matomo-user commented 13 years ago

Replying to jekko:

We are using Piwik 1.4 and getting the following error when activating the plugin:

Fatal error: Call to undefined method Piwik_BotTracker::logToFile() in example.com/analytics/piwik/plugins/BotTracker/BotTracker.php on line 41

Thomas--F commented 13 years ago

Hi jekko,

try the new version (v0.12). I wrote the LogToFile function for some debug logging, and in v0.10 I deleted the function but not all of the calls.

btw: I tested the plugin with Piwik 1.3 and 1.4

anonymous-matomo-user commented 13 years ago

thomas its working, ty

Thomas--F commented 13 years ago

Changelog

* version 0.15: improved widget and a new "Bot Tracker" entry in the visitors menu

Thomas--F commented 13 years ago

Changelog

anonymous-matomo-user commented 13 years ago

Hello,

I'm running piwik 1.4 with BotTracker 0.16. My Piwik Installation monitors multiple sites.

The plugin has been installed and activated, and the widget has been added. The sites contain the JavaScript code. I can't see any bot access being counted. After some days I used Google Webmaster Tools to try to force an "access like a bot" request. Still no count.

Questions: Is there an alternative method to simulate an access and verify the installation? Should bots that run JS be caught by the plugin? Might the problem be caused by tracking multiple sites on one installation?

Any hint very much appreciated.

Best regards, sun

anonymous-matomo-user commented 13 years ago

did you read this?

Oh, and there is one big limitation: Most bots don't use JavaScript. So if you use only the Piwik-JS-API (default), you don't get any results. I use the PHP-API, so every bot hits my visitor-log. That's why I wrote this plugin.

Thomas--F commented 13 years ago

Hi sun,

first of all: the plugin is not able to track multiple sites separately. But I will put that on my todo-list. The results are currently the sum of all sites.

To test the plugin I use Firefox with a plugin called "User Agent R G".

But remember: the plugin will only catch non-JS bots if you use the PHP Tracking API! Most bots don't use JS, so don't expect many results when you only use the standard tracking code!
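For readers who prefer a scripted check over a browser add-on, the following is a minimal sketch of sending one request with a bot-like User-Agent. It assumes the tracker is reachable at a URL of the same form as the image-tracking link quoted later in this thread (`piwik.php?idsite=…&rec=1`); the host name and both function names are hypothetical and not part of the plugin.

```php
<?php
// Hypothetical helper names; the tracker URL in the usage note is a placeholder.

/**
 * Build an HTTP stream context that sends the given User-Agent header.
 */
function buildBotContext($userAgent)
{
    return stream_context_create(array(
        'http' => array(
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'timeout'    => 5,
        ),
    ));
}

/**
 * Request the tracker URL while presenting a bot User-Agent.
 * Returns true if the server answered, false otherwise.
 */
function simulateBotHit($trackerUrl, $userAgent)
{
    $response = @file_get_contents($trackerUrl, false, buildBotContext($userAgent));
    return $response !== false;
}

// Usage (commented out because it needs a reachable Piwik installation):
// $ua = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
// simulateBotHit('http://piwik.example.org/piwik.php?idsite=1&rec=1', $ua);
```

After such a request, the hit counter for the matching bot should increase if the keyword is configured for that site.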

Thomas--F commented 13 years ago

Dev-Status: I am currently trying to improve the configuration of the plugin:

In addition to that I will change the database so the plugin can track multiple sites separately.


I've been a professional programmer for more than 16 years now, but mainly on the mainframe. In recent years I have also written some Java applications, but PHP is only a hobby. So don't expect a fast solution here. I try to learn the API by looking at the examples and the source, and by doing some trial-and-error debugging.

If someone wants to jump in and offer help... the door is wide open! Just leave me a message.

And the last point: I try to track even this site by using the Image-Tracking. [[Image(http://piwik.rwk-kempen-krefeld.de/piwik.php?idsite=2&rec=1)]] Is this going to work...?

Thomas--F commented 13 years ago

Changelog

* version 0.18: Multi-Site

I had to add a column to the database, so if you install the new version, it will drop the old bot table and create a new one. Then the script will insert the bot list for all sites where the user is admin. To run the install script you have to follow these steps:

anonymous-matomo-user commented 13 years ago

Hi Thomas,

I looked further into my issue of not getting any count or timestamp. First of all, I updated to 0.18. Second, please forgive me, but I don't have a clue about PHP.

That said, I modified BotTracker.php to give me some more log data in the function checkBot.

[2011/05/10 21:42:05] user Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
[2011/05/10 21:42:05] SiteID:3
[2011/05/10 21:42:05] Row:43
[2011/05/10 21:42:05] Row[botId]:4

I can see that the bot access gets caught and that the proper row gets detected (43). Googlebot has ID 43 in my database. I can also see that code execution enters the if statement, where I print the value of $row['botId'], which in the example above is 4. So the query updating the database gets executed. But rather than updating row 43, it updates row 4, which is the wrong site as well as the wrong bot.

Shouldn't the if statement in checkBot look more like this:

        if ($row > 0 ){
            $query = "UPDATE `".Piwik_Common::prefixTable('bot_db')."` 
                      SET botCount = botCount + 1
                        , botLastVisit = CURRENT_TIMESTAMP()
                      WHERE botId = ".$row." ";

            Piwik_Query($query);

            $exclude =& $notification->getNotificationObject();
            $exclude = true;
        }

I replaced $row['botId'] with just $row. At least on my installation it now updates the correct row.

Could you please verify this and let me know what you think?

Best regards,

sun

Thomas--F commented 13 years ago

Hi sun,

I get the variable $row from the function Piwik_FetchOne:

        $row = Piwik_FetchOne("SELECT botId FROM ".Piwik_Common::prefixTable('bot_db')."
                               WHERE botActive = 1 
                               AND   idSite = ".$idSite."
                               AND   LOCATE(botKeyword,'".$ua."') >0
                           LIMIT 1");

This function returns a labeled array with all selected columns. In your case it should be array( 'botId' => 43)

Because of this I'm surprised that you get a positive result when you only use $row.

How do you print the log-data? What PHP-Version do you use?

Best regards, Thomas
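The `LOCATE(botKeyword, ...) > 0` condition in the query above is a substring test: a bot matches when its keyword occurs anywhere in the user agent string. The same idea can be sketched in plain PHP against an in-memory list. This is only an illustration, not the plugin's code; the function name and the `$bots` array layout are assumptions modelled on the column names in the query.

```php
<?php
/**
 * Return the botId of the first active bot whose keyword occurs in the
 * user agent string, or null if none matches.
 * $bots is a hypothetical in-memory copy of the bot table rows.
 */
function findBotId(array $bots, $userAgent)
{
    foreach ($bots as $bot) {
        // Skip inactive bots and empty keywords (an empty needle is invalid).
        if (!$bot['botActive'] || $bot['botKeyword'] === '') {
            continue;
        }
        // stripos() is used so that the match ignores case.
        if (stripos($userAgent, $bot['botKeyword']) !== false) {
            return (int) $bot['botId'];
        }
    }
    return null;
}

// Example rows, using the ids mentioned in this thread.
$bots = array(
    array('botId' => 41, 'botKeyword' => 'msnbot',    'botActive' => 1),
    array('botId' => 43, 'botKeyword' => 'Googlebot', 'botActive' => 1),
);
$ua = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
$matchedId = findBotId($bots, $ua);
```

Matching in PHP also keeps the raw user agent out of the SQL string, which the concatenated query does not.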

anonymous-matomo-user commented 13 years ago

Hi Thomas,

it's PHP 5.2.12 and MYSQL 5.1

The 2 lines

     Piwik_BotTracker::logToFile('Row:'.$row);
     Piwik_BotTracker::logToFile('Row[botId]:'.$row['botId']);

just before the if statement generate

[2011/05/11 17:16:11] Row:43
[2011/05/11 17:16:11] Row[botId]:4

With your code this results in row 4 being updated; with my code, row 43 gets updated.

sun

anonymous-matomo-user commented 13 years ago

One more. Above was a googlebot access. With MSNBOT I get

[2011/05/11 17:36:13] Row:41
[2011/05/11 17:36:13] Row[botId]:4

So with my code it counts for row 41, which is MSN in my database. With yours, it also ends up in row 4.

sun

Thomas--F commented 13 years ago

Hi sun,

I found the error.

The function Piwik_FetchOne does not return an array; it returns the value itself. Because $row is a plain value and not an array with a "botId" key, the access $row['botId'] returns only its first character: 4 instead of 41.

I will fix and test it. Thank you.
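The effect described above can be reproduced in isolation. The sketch below assumes the scalar result arrives as the string `'43'`; it uses an explicit integer cast to show what old PHP versions (such as the 5.2 mentioned in this thread) did implicitly with the non-numeric offset `'botId'`. Newer PHP versions flag such an access with a warning or an error instead of silently reading offset 0. `extractBotId()` is a hypothetical helper, not part of the plugin.

```php
<?php
// A scalar result, as a fetch-one style query would return it (assumed to be a string).
$row = '43';

// Old PHP versions converted a non-numeric offset such as 'botId' to the
// integer 0, so $row['botId'] read the same character as $row[0].
$offset    = (int) 'botId';  // 0
$firstChar = $row[$offset];  // '4'

/**
 * Defensive variant (hypothetical helper): accept either a scalar or an
 * array result and always return the full id as an integer.
 */
function extractBotId($result)
{
    if (is_array($result)) {
        return isset($result['botId']) ? (int) $result['botId'] : 0;
    }
    return (int) $result;
}
```

This also explains why both Googlebot (43) and MSN (41) ended up counted under row 4.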

anonymous-matomo-user commented 13 years ago

Yep. That works for me as well. Thanks!

robocoder commented 13 years ago

Thomas: I'm glad to see you're continuing to develop this oft-requested feature.

A couple of comments:

Thomas--F commented 13 years ago

Hi vipsoft,

thanks for the tips. As you can see I already updated the plugin.

Will the update scripts run automatically when you update the plugin? Will all scripts run in the right order if I have skipped a version?

And I have some questions concerning the usage of Trac:

Thomas--F commented 13 years ago

Oh, and by the way: Can you delete the last row (last point) in comment:13?

Thomas--F commented 13 years ago

Changelog 0.21

I have written and tested 2 update scripts: v0.18 and v0.21. It's a great feature of Piwik. Thanks a lot to vipsoft for the hint.

robocoder commented 13 years ago

In checkBot():

Thomas--F commented 13 years ago

The UPDATE uses only the botId because it's the unique index of the table. So if I found a qualified row while using the idSite, I can update the table with just using the botId.

Do you have a description of the tracker cache and how to use it? The table is very small and my sites don't get many hits, so performance tuning is not my top priority. But I will compare both ways if I can implement the cache.

robocoder commented 13 years ago

The tracker cache consists of files in tmp/cache/tracker that are automatically loaded with each tracker request. You can see it being used by SitesManager.php in recordWebSiteDataInCache().

Please also take a look at Matt's ideas in #653
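The general idea behind such a cache, storing precomputed data in a file that is loaded on each request instead of querying the database, can be sketched generically. The code below is not Piwik's tracker cache API; the function names, the file location, and the array layout are all assumptions chosen for illustration.

```php
<?php
/**
 * Write the keyword list to a PHP file that returns it as an array.
 * (Hypothetical helper; a real plugin would use Piwik's own cache hooks.)
 */
function writeBotCache($file, array $keywordsById)
{
    $code = '<?php return ' . var_export($keywordsById, true) . ';';
    return file_put_contents($file, $code, LOCK_EX) !== false;
}

/**
 * Load the cached list; an empty array means "no cache yet".
 */
function readBotCache($file)
{
    if (!is_file($file)) {
        return array();
    }
    $data = include $file;
    return is_array($data) ? $data : array();
}
```

A caller would rebuild the file whenever the bot list changes and fall back to the database when `readBotCache()` returns an empty array.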

Thomas--F commented 13 years ago

There are some points to think about:


I wish to make some of these features flexible (e.g. how many user agents should be stored or how much time should pass between 2 hits for logging). Should I use global variables in BotTracker.php or are there any plans for a config dialog for plugins?

Thomas--F commented 13 years ago

Changelog 0.22

I've tested a lot, but I'm sure there are bugs left. Please report anything, that doesn't work as designed.

Thomas--F commented 13 years ago

To solve some of Matt's "additional features" I have to create a new table that logs every hit of a bot. What information do you think should be stored in that table?

Is there more to think about?

anonymous-matomo-user commented 13 years ago

Got an error message regarding

duplicate "idsite" column name

when installing on a fresh 1.4 install.

Fix: remove the 0.18.php file from the updates folder (was trying to alter table and add the "idsite" column when said column was already created during installation process).

Thomas--F commented 13 years ago

@nslyv

I could not reproduce your error. I reinstalled Piwik on a fresh server and tried to install BotTracker several times without any issues.

But the last two weeks were full of work so I hadn't much time to continue on BotTracker. I hope the next weeks will be better....

anonymous-matomo-user commented 13 years ago

Hello, Thomas!

I'm lost. I downloaded the plugin, unzipped and uploaded to server. There is no install file, and I don't know where to look for the database. Any pointers would be appreciated. Best regards

Thomas--F commented 13 years ago

Hi marketeer,

first you have to upload the plugin to the folder www.(myServer).xyz/(myPiwikFolder)/plugins/BotTracker

Then log in as Super-Admin and click on "Settings" and then on the "Plugins" tab. There you should now see an entry "BotTracker" with status "Inactive" and the possible action "Activate"....

anonymous-matomo-user commented 13 years ago

Replying to ThomasF:

Hi marketeer,

first you have to upload the plugin to the folder www.(myServer).xyz/(myPiwikFolder)/plugins/BotTracker

Then log in as Super-Admin and click on "Settings" and then on the "Plugins" tab. There you should now see an entry "BotTracker" with status "Inactive" and the possible action "Activate"....

Hello, Thomas!

Thank you very much for your prompt reaction.

Yes, I followed the instructions, but after activating the plug-in I received this error message: "SQLSTATE[42S21]: Column already exists: 1060 Duplicate column name 'idsite'".

Another result is that I can no longer log in to my Piwik installation.

Is this because I have multiple sites listed?

Do I need to re-install?

Thomas--F commented 13 years ago

Hi marketeer,

that's strange. It's the same error that nslyv already posted. But I wasn't able to reproduce it.

As a workaround, just delete the file 0.18.php in the updates folder.

Thomas--F commented 13 years ago

In reaction to the "Duplicate column" error I deleted the 0.18.php update script.
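An alternative to deleting an update script is to make the schema change safe to repeat, so that it does nothing when the column already exists. The following is a minimal sketch of that idea as a pure function; the function name and the column definition are assumptions, and the list of existing columns would have to come from the database (for example from a `SHOW COLUMNS` query) in a real update script.

```php
<?php
/**
 * Return the ALTER TABLE statement needed to add $column, or null when
 * the column is already present. $existingColumns is passed in as a
 * plain array here; a real script would read it from the database.
 */
function buildAddColumnSql($table, array $existingColumns, $column, $definition)
{
    foreach ($existingColumns as $existing) {
        // MySQL column names are case-insensitive.
        if (strcasecmp($existing, $column) === 0) {
            return null;
        }
    }
    return sprintf('ALTER TABLE `%s` ADD COLUMN `%s` %s', $table, $column, $definition);
}

// On a fresh install the column already exists, so nothing needs to run.
$sql = buildAddColumnSql('piwik_bot_db', array('botId', 'idSite'), 'idsite', 'INT NOT NULL');
```

With a guard like this, running the 0.18 update on a table created by a newer installer would skip the statement instead of failing with "Duplicate column name".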

anonymous-matomo-user commented 13 years ago

Replying to ThomasF:

In reaction to the "Duplicate column" error I deleted the 0.18.php update script.

Hello, Thomas!

Thank you for your attention to this matter.

Yes, deleting 0.18.php puts me back on track.

Now I need to figure out the api-php for a joomla installation (into template or module).

Best regards

anonymous-matomo-user commented 13 years ago

Hi, Thomas!

My logfiles show several IPs that access my website at a rate of about 2500 hits per hour each! I would like to track these using your plug-in and the IP addresses. Can this be done or configured?

Thomas--F commented 13 years ago

Hi marketeer,

the plugin works only on keywords in the User Agent, not on IPs. In your server access log you should see the User Agent of these visitors. If they are specific enough, you can use the plugin to count their hits.

For full tracking (logging each visit in a separate table) you have to wait for the next version I'm currently working on.

To use the PHP-API in Joomla, just copy "PiwikTracker.php" to your template folder and enter the following code in the index.php of your template: (I put it in the "footer_r")

    <!-- Piwik -->
    <?php
    // -- Piwik Tracking API init --
    require_once "PiwikTracker.php";
    $pageTitle = $this->getTitle();

    $piwikTracker = new PiwikTracker(1); // 1 = idSite
    PiwikTracker::$URL = 'http://yourwebsite.org/piwik/';

    // setTokenAuth() is a method, not a property; setIp() needs a valid token
    $piwikTracker->setTokenAuth('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx');
    $piwikTracker->setUrl(JURI::current());
    $piwikTracker->setIp($_SERVER['REMOTE_ADDR']);
    // Sends Tracker request via http
    $piwikTracker->doTrackPageView($pageTitle);

    ?>
    <!-- End Piwik Tracking Code -->

anonymous-matomo-user commented 13 years ago

ThomasF, you write that your tracker is intended for use with the PHP-API. I'm trying to track visits/bots with the help of Piwik's GIF image. However, with your plugin activated, I get the following error output when loading the GIF file:

Warning: fopen(tmp/logs/log.txt) [function.fopen]: failed to open stream: No such file or directory in /public_html/piwik/plugins/BotTracker/BotTracker.php on line 157
Warning: fwrite(): supplied argument is not a valid stream resource in /public_html/piwik/plugins/BotTracker/BotTracker.php on line 166
Warning: fclose(): supplied argument is not a valid stream resource in /public_html/piwik/plugins/BotTracker/BotTracker.php on line 168

Any way around this?

Thomas--F commented 13 years ago

Hi opensourcer,

I used a log file to track all user agents and find new bots. I hadn't planned to include this in the public version, so I removed the function in v0.23. This new version should work with the GIF image.

But I found another problem I couldn't fix so far: the pie chart is showing the wrong entries. If I use the bar chart, everything is ok. All values and descriptors are correct, but when I switch to the pie chart, the biggest entry shows something like "bot x 1% (2 hits)" instead of "bot y 90% (345 hits)". Can anyone confirm the problem?

robocoder commented 13 years ago

The pie chart problem is a Piwik bug. It's fixed in trunk.