matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

Plugin Internal search tracking - search analytics reports #5469

Closed mattab closed 11 years ago

mattab commented 16 years ago

Piwik now has internal search keywords tracking. Awesome!

Functionality:

anonymous-matomo-user commented 16 years ago

I'm subscribing, as this is an important feature we already have in the Drupal module for Google Analytics, and it is currently commented out in the Piwik module.

mattab commented 16 years ago

Google is good at analysing search quality and search in general. We can get ideas from how Google Analytics does internal search tracking.

Site search http://www.google.com/support/analytics/bin/topic.py?topic=12626

General http://www.google.com/support/analytics/bin/answer.py?answer=75961

http://www.google.com/support/googleanalytics/bin/static.py?page=troubleshooter.cs&problem=gatsc&selected=a10h1_a10h1t3_&ctx=gatsc_a10h1_a10h1t3__77234&aw_referral=

Google Analytics uses the following formulas to calculate the metrics used in internal site search reports:

* Visits with Search = The number of visits that used your site's search function at least once.
* Percentage of visits that used internal search = Visits with Search / Total Visits
* Total Unique Searches = The total number of times your site search was used. This excludes multiple searches on the same keyword during the same visit.
* Results Pageviews / Search = Pageviews of search result pages / Total Unique Searches
* Search Exits = The number of searches a visitor made immediately before leaving the site.
* Percentage of Search Exits = Search Exits / Visits with Search
* Search Refinements = The number of times a visitor searched again immediately after performing a search.
* Percentage Search Refinements = The percentage of searches that resulted in a search refinement. Calculated as Search Refinements / Pageviews of search result pages.
* Time after Search = The average amount of time visitors spend on your site after performing a search. This is calculated as Sum of all "search_duration" across all searches / ("search_transitions" + 1)
* Search Depth = The average number of pages visitors viewed after performing a search. This is calculated as Sum of all "search_depth" across all searches / ("search_transitions" + 1)

Example Calculations

This section describes a visitor's experience with your website's search engine and explains how Google Analytics calculates the resulting data. The visitor progresses through three different pages when interacting with your website's search engine:

* Search Page - Page on site where the visitor enters terms for a web search
* Search Results Page -  Results page that is returned on a search engine query
* Results Pageview - The page viewed after a click on a results page

Assuming your website received three visits from visitors that navigate as described...:

* Visit 1: (time between "camera" term search page and "black camera" term search page is 30 seconds; and "black camera" search to site exit is 60 seconds)
      o Search Term Page (term "camera") > 
      o Results Page >  
      o view Results Pageview > 
      o view Results Pageview > 
      o Search Term Page (term "black camera") > 
      o Search Results Page > 
      o view Results Pageview > 
      o view Results Pageview > 
      o view Results Pageview >
      o Site exit

* Visit 2 : (time between "computer" term search page to site exit is 15 seconds)
      o Search Page (term "computer") > 
      o Results Page >
      o Site exit
* Visit 3: 
      o No Search

...The following metrics can now be calculated:

* % Visits used internal search = 2 visits that used site search (Visit 1 & Visit 2) / 3 total visits = 66.7%
* Visits with Search = 2 (Visit 1 & Visit 2)
* Total Unique Searches = 3 ("camera", "black camera", "computer")
* Results Pageviews / Search = (2 + 3) / 3 = 1.67
* Search Exits = 1 (Visit 2)
* % Search Exits = 1 (Visit 2) / 2 (Visit 1 & Visit 2) = 50% 
* Refinement = 1 (Visit 1 - "black camera")
* % Refinement = 1 (Visit 1 - "black camera") / 3 = 33.3%
* Time after Search = (30 seconds + 60 seconds + 15 seconds) / (1 + 1) [Visit 1 & Visit 2] = 52.5 sec
* Search Depth = (2 ["camera"] + 3 ["black camera"] + 0 ["computer"]) / (1 [Visit 1] + 1 [Visit 2]) = 2.5

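
The worked example above can be reproduced with a short Python sketch. This is only an illustration of the arithmetic, not how Google Analytics or Piwik computes these reports. The `Search` and `Visit` classes and the `search_metrics` function are hypothetical names, and two simplifications are assumed: a refinement is any search after the first one in a visit, and a search exit is a visit whose last search led to no results pageviews. The denominator for time and depth is one per searching visit, as in the example.

```python
from dataclasses import dataclass, field


@dataclass
class Search:
    term: str
    results_pageviews: int  # pages viewed after clicking a search result
    seconds_after: int      # time until the next search or the site exit


@dataclass
class Visit:
    searches: list = field(default_factory=list)


def search_metrics(visits):
    """Compute the site search metrics used in the worked example."""
    with_search = [v for v in visits if v.searches]
    if not with_search:
        return {"visits_with_search": 0}
    searches = [s for v in with_search for s in v.searches]
    n = len(with_search)
    # Repeated searches for the same term within one visit count once.
    unique = sum(len({s.term for s in v.searches}) for v in with_search)
    results_pageviews = sum(s.results_pageviews for s in searches)
    # Assumption: every search after the first in a visit is a refinement.
    refinements = sum(len(v.searches) - 1 for v in with_search)
    # Assumption: a search exit is a visit that ends on a results page.
    search_exits = sum(
        1 for v in with_search if v.searches[-1].results_pageviews == 0
    )
    return {
        "visits_with_search": n,
        "pct_visits_with_search": n / len(visits),
        "total_unique_searches": unique,
        "results_pageviews_per_search": results_pageviews / unique,
        "search_exits": search_exits,
        "pct_search_exits": search_exits / n,
        "refinements": refinements,
        # One search results page is shown per search.
        "pct_refinements": refinements / len(searches),
        "time_after_search": sum(s.seconds_after for s in searches) / n,
        "search_depth": results_pageviews / n,
    }


visits = [
    Visit([Search("camera", 2, 30), Search("black camera", 3, 60)]),
    Visit([Search("computer", 0, 15)]),
    Visit([]),  # Visit 3: no search
]
metrics = search_metrics(visits)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```
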
anonymous-matomo-user commented 15 years ago

Any progress with this plugin? Is there a dev version of it somewhere? Are there any other plugins like this?

mattab commented 15 years ago

see http://www.slideshare.net/markohurst/marko-hurst-site-search-analytics-e-metrics-madrid

anonymous-matomo-user commented 14 years ago

Hi,

have there been any new developments on this issue?

We'd love to see an internal search feature in Piwik as soon as possible because in our opinion it's one of the most important features that are still missing in comparison to commercial tools like Google Analytics.

I'm working for a small web agency and we would like to start implementing this feature, or offer our help if there's already someone working on it. :-)

Benjamin

robocoder commented 14 years ago

Benjamin: afaik, no one is working on it; so, feel free to implement and share.

timo-bes commented 14 years ago

Hey guys,

I gave this plugin a try (it's my first Piwik plugin, so every suggestion is welcome).

The menu items are registered, I extended the site table by columns for url and search parameter name. The settings area is working.

Now, the logs have to be analyzed. As far as I can see, there are no existing API methods that would provide adequate functionality. This whole DataTable business seems to be pretty complex (but cool!), so I'd really appreciate it if somebody would help me get started with this (documentation, tips or code).

What I have done so far can be found on github:

http://github.com/BeezyT/piwik-sitesearch

I know, Piwik uses SVN, but github can be accessed via SVN as well:

svn checkout https://svn.github.com/BeezyT/piwik-sitesearch

Thanks for your help,

Timo

timo-bes commented 14 years ago

I just pushed the first version of the keyword analysis to github (including screenshots).

timo-bes commented 14 years ago

The plugin is making good progress...

Have a look at the github wiki for up to date information:

http://wiki.github.com/BeezyT/piwik-sitesearch/

halfdan commented 14 years ago

Looks quite good. However, I saw you're using mysql_real_escape_string and probably other mysql_* specific functions (e.g. http://github.com/BeezyT/piwik-sitesearch/blob/master/SiteSearch.php#L117). You should use the second argument of Piwik_Query to work with parameterised queries (Piwik_Query($sqlQuery, $parameters)) and therefore allow other database backends (in future).
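
To illustrate the general idea of parameterised queries, here is a minimal sketch in Python using the standard `sqlite3` module rather than Piwik's own query helpers. The table `log_search`, its columns, and the function `count_searches` are made up for this example. The point is that values are passed separately from the SQL text, so no backend-specific escaping function is needed.

```python
import sqlite3

# In-memory database with a hypothetical search log table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_search (idsite INTEGER, search_term TEXT)")
conn.executemany(
    "INSERT INTO log_search VALUES (?, ?)",
    [(1, "camera"), (1, "o'reilly books"), (2, "camera")],
)


def count_searches(conn, id_site, term):
    """Count searches for a term; values are bound to the ? placeholders."""
    sql = "SELECT COUNT(*) FROM log_search WHERE idsite = ? AND search_term = ?"
    return conn.execute(sql, (id_site, term)).fetchone()[0]


# The quote in the term needs no manual escaping.
print(count_searches(conn, 1, "o'reilly books"))
# Input that tries to alter the query is treated as a plain string value.
print(count_searches(conn, 1, "x' OR '1'='1"))
```

Because the driver handles quoting, the same calling code can in principle work against a different database backend, which is the motivation given above.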

timo-bes commented 14 years ago

Thanks for your feedback (also the other open issues on github)!

I see why I shouldn't use the mysql functions. I replaced the parameters in a query with ? and passed the second argument to Piwik_FetchAll. Now the query is not working anymore, and I can't find a way to debug it.

How can I find out, what the query looks like when it is executed?

mattab commented 14 years ago

It looks like a very interesting start!

Would you be interested in having such a plugin included in Piwik core? If so, we would need to review the schema updates (to process the metrics above: visits per search, total searches, search exits, etc.).

Tracker:

Archiving: the plugin doesn't do archiving currently. I understand you pointed out that the existing code was not reusable. That is because you are computing "new" metrics in the Piwik world :-) but technically your code should archive data using the same mechanism used, for example, in Visits by Server Time. You can then look up the enrich*() query methods in ArchiveProcessing to see what you would base your query on.

Your integration of Search Results using custom data is very cool!! First cool use case of this function. And your code looks really good. This would be amazing to have in Piwik core for sure :)

timo-bes commented 14 years ago

Thanks for the feedback, matt.

I know that performance is a huge issue at the moment. To be honest, I haven't cared much about it yet, since it's still more of a proof of concept. At the moment, I'm adding a search refinements feature, which is the last of the must-haves. This is probably the most performance-critical part; I think we have to extend the schema a little more to make it efficient.

Archiving would be great, I read a lot of code, but I still don't get how it's done :-(. Some sort of documentation about that would be great, but I guess, the target group isn't that big... So if you (or anybody else) have the time, feel free to fork the project and get the archiving process started. I'm very open to collaboration!

What does including the plugin in Piwik core mean? That it comes with Piwik by default? That would be great, but I'd like to keep working on it (at least as much as I have time for it). Can we find a solution for a common version control? I'd be happy to stick with github, but if you guys have a better suggestion, I'm open...

mattab commented 14 years ago

GitHub is perfect for now, until the code is ready and committed to SVN trunk. Then you could get SVN commit access and be part of the team, if that interests you :)

Before that, it would need to be in line with other plugins in terms of performance and vision. Yours is a great and promising start.

Regarding Archiving, the big idea is to query the logs GROUPED BY a given entity (e.g. keyword), and then request common stats for all keywords (visits, pages, avg time on site, bounce count, etc.). The helpers in ArchiveProcessing/* are doing this. Check out the enrich* methods in particular. You can of course write SQL directly in your archiving module, and you can then create DataTables from the results. The advantage is that you can just sum them automatically when archiving weeks and months (which are sums of days). So reusing these classes makes the code smaller.
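
The grouping-then-summing idea can be sketched in a few lines of Python. This is a conceptual illustration only, not Piwik's ArchiveProcessing code: the row format, the metric names `nb_visits` and `nb_hits`, and the functions `archive_day` and `sum_tables` are assumptions made for the example. Plain dictionaries stand in for DataTables.

```python
from collections import defaultdict


def archive_day(log_rows):
    """Group one day's raw log rows by keyword and compute per-keyword metrics."""
    grouped = defaultdict(lambda: {"visits": set(), "hits": 0})
    for row in log_rows:
        entry = grouped[row["keyword"]]
        entry["visits"].add(row["idvisit"])
        entry["hits"] += 1
    return {
        keyword: {"nb_visits": len(e["visits"]), "nb_hits": e["hits"]}
        for keyword, e in grouped.items()
    }


def sum_tables(tables):
    """Build a week or month table by summing the daily tables row by row."""
    total = defaultdict(lambda: defaultdict(int))
    for table in tables:
        for keyword, metrics in table.items():
            for name, value in metrics.items():
                total[keyword][name] += value
    return {keyword: dict(metrics) for keyword, metrics in total.items()}


monday = archive_day([
    {"idvisit": 1, "keyword": "camera"},
    {"idvisit": 1, "keyword": "camera"},
    {"idvisit": 2, "keyword": "computer"},
])
tuesday = archive_day([{"idvisit": 3, "keyword": "camera"}])
week = sum_tables([monday, tuesday])
print(week)
```

The daily step is the only one that touches the raw logs; longer periods are derived from the already archived daily tables, which is why summable metrics keep the archiving code small.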

Let us know how it goes. good luck!

timo-bes commented 14 years ago

I started implementing the archiving process, and I'm not sure what the best solution is. What I did:

I only implemented this for the keyword overview and for the day archive. Before I go on, please have a look and tell me whether that's what you had in mind...

mattab commented 14 years ago

Quick question: is the code working in its current state?

The concept of archiving in Piwik is explained briefly in: http://dev.piwik.org/trac/wiki/DatabaseSchema#Archiveddata

Idea is:

Let me know if you need specific guidance.

timo-bes commented 14 years ago

Thanks for the comment. I had most of that figured out by now, but still it's good to know, that there is documentation ;-)

I have some specific questions:

When the user clicks a keyword, the plugin shows statistics for that keyword only (following pages, previous pages, evolution, search refinements).

  1. Should I store the DataTable for each keyword in an individual archive record? (At the moment, the plugin is doing that, there can be quite a lot of keywords, but I don't see a more efficient way.)
  2. Should I trigger archiving the DataTables related to only one keyword before the keyword is clicked, meaning when the general DataTables are archived? (At the moment the plugin is doing that, and it's not performing well. The alternative would be to trigger the archiving process only when the user clicks a keyword. But would we then have to handle the cronjob archiving separately, and include archiving the keyword-DataTables in it?)

I haven't worked on the plugin for a few days now, and I won't have much time for the next 4 weeks or so, but after that, I'm planning to finish the first beta version within a few weeks.

mattab commented 14 years ago

  1. What data set do you store on a per keyword basis? If it is small, i.e. a few metrics, it is best to store the metrics for all keywords in the same datatable. The table should also be truncated after 1,000 keywords to keep it manageable / fast to load in memory (see http://piwik.org/faq/how-to/#faq_54 )
  2. When archiving, the plugin should archive data for all keywords at once. Most data sets will never be displayed/used, but in Piwik we pre-process all reports. Your archiving process will also be triggered by the cron task.

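
As a rough illustration of truncating a keyword table, here is a small Python sketch. It assumes that rows beyond the limit are folded into a single summary row labelled "Others"; that behaviour, the row format, and the function name `truncate_table` are assumptions for this example and are not taken from the comment above.

```python
def truncate_table(rows, limit=1000, sort_by="nb_visits"):
    """Keep the top `limit` rows and fold the rest into one summary row."""
    ordered = sorted(rows, key=lambda r: r[sort_by], reverse=True)
    kept, rest = ordered[:limit], ordered[limit:]
    if rest:
        # Assumption: the remaining rows are summed into an "Others" row.
        others = {"label": "Others"}
        for key in rest[0]:
            if key != "label":
                others[key] = sum(r[key] for r in rest)
        kept.append(others)
    return kept


rows = [{"label": f"kw{i}", "nb_visits": i} for i in range(1, 6)]
truncated = truncate_table(rows, limit=2)
print(truncated)
```

With a limit of 2, the two most visited keywords are kept and the other three are summed, so the totals of the table stay the same while its size is bounded.
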
timo-bes commented 14 years ago

The metrics stored on a per keyword basis are:

At the moment, only the first two metrics are archived, and it still takes a long time to complete, when there are many keywords.

Have a look at http://github.com/BeezyT/piwik-sitesearch/blob/master/Archive.php (method archiveDay). The main performance issue is, that I have to analyze the actions (not only the visits) a lot - for every keyword.

Here are some more specific questions:

Thanks for your help, matt! I really appreciate it.

mattab commented 14 years ago

EZdesign, sorry for the delay. Have you made further progress?

timo-bes commented 14 years ago

Thanks for the feedback, it helped getting the additional archiving time for my test database down from 100 to 5 seconds ;-)

I'll let you know, when I have some more specific questions.

timo-bes commented 14 years ago

The plugin is making great progress, everything uses archiving and seems to work now. You could say, that we have reached the first beta version. If you have any bug reports, please create issues on github.

There is one problem I'm having with the evolution graph, that I can't figure out. Have a look at this screenshot: http://github.com/downloads/BeezyT/piwik-sitesearch/Percentage.png

The axis is not scaled properly...

The Controller method is called searchPercentage, the API method is getSearchPercentageEvolution.

Did anybody have this problem before? Is it a bug or am I doing something wrong??

Btw, if you had the plugin installed previously and want to update to the latest version, remove the schema changes from piwik_site and piwik_log_action by hand, run the install method again and then check "analyze urls now" in the settings.

robocoder commented 14 years ago

Please test against trunk. In fixing #1562 (displaying goal conversion rates, i.e., percentages), we've made some changes to the visualization code.

timo-bes commented 14 years ago

The ticket is about exactly the same problem, but using trunk didn't help.

robocoder commented 14 years ago

When you use ColumnCallbackAddColumnPercentage, the result is a localized number with a '%'. This locale-specific format works well when displayed in the table, but it's a string, not a number. When the Visualization code goes to find the max value, PHP's max() function does a string comparison, so "13.5%" is "bigger" than "100%".

We also run into an issue with locales. Consider 3/4. In "en_US.UTF-8", this would become "0.75%". In "de_DE.UTF-8", this becomes "0,75%". Casting to (float) isn't locale-aware.

Can you use ColumnCallbackReplace with Piwik::getPercentageSafe?

re: search_percentage. core/ViewDataTable/GenerateGraphData/ChartEvolution.php will guess the unit from the column name. We can add _percentage to the list, or you can use _rate (e.g., search_rate), or you can explicitly set the Y-axis unit ('%').
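
The string-versus-number pitfall described above is not specific to PHP. The following Python sketch shows the same effect and one way around it: keep percentages as plain numbers for computing the axis maximum and only format them for display. The helpers `percentage` and `format_percentage` are hypothetical and are not Piwik functions.

```python
def percentage(part, total, precision=1):
    """Return a plain float percentage, or 0.0 when the total is zero."""
    if total == 0:
        return 0.0
    return round(100.0 * part / total, precision)


def format_percentage(value, decimal_separator="."):
    """Format a numeric percentage for display only."""
    return f"{value}".replace(".", decimal_separator) + "%"


# Formatted strings are compared character by character, so "13.5%" wins.
as_strings = ["13.5%", "100%"]
print(max(as_strings))

# Plain numbers compare as expected.
as_numbers = [percentage(27, 200), percentage(5, 5)]
print(max(as_numbers))

# Locale-style formatting is applied at the very end, for display.
print(format_percentage(percentage(27, 200), decimal_separator=","))
```

Keeping the stored column numeric also sidesteps the decimal separator problem, because nothing has to parse a localized string back into a number.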

timo-bes commented 14 years ago

You hit the nail on the head with that response! Works fine now.

I just released v0.1.2: it includes the fix and some widgets I added yesterday. If you want to test / use the plugin, I recommend using only the commits tagged with a version number. They should be more or less stable. If you don't want to use git or svn to access github, there are tgz/zip archives of the releases in the downloads section on github.

Looking forward to your feedback...

robocoder commented 14 years ago

Timo: your code is missing a license statement.

timo-bes commented 14 years ago

Hey guys, I was just checking out repopular (http://repopular.com/) and what do I see on the first page? My plugin!

Thanks for the publicity, are you using it?

@vipsoft: what license do you recommend? What do I need to pick, so you can add it to the core when the time is right?

robocoder commented 14 years ago

Must have been my tweet.

The license is up to you. For inclusion with Piwik core, we require that it be GPL v3 compatible, e.g., GPL v3, BSD, MIT, or LGPL v3. Affero GPL v3 isn't strictly compatible, but is also allowed.

anonymous-matomo-user commented 14 years ago

Are you able to add this site search plugin to the latest code?

robocoder commented 14 years ago

Sorry for the delay. I'm going to try and squeeze in a review this week.

robocoder commented 14 years ago

Ok. That was a pleasant code read. Only a few issues to address/discuss with Timo and Matt:

robocoder commented 14 years ago

re: logResults:

timo-bes commented 14 years ago

Thanks for the review.

I had planned to remove the logging and the dev folder from the plugin and move them to a separate plugin, that I use for development. If you want to include this functionality in the core, that's fine with me as well.

I also have some questions regarding the tracker cache (Matt's comment:20):

Thanks for your help.

robocoder commented 14 years ago

The tracker cache is a set of files in tmp/cache/tracker that reduce the number of SQL queries made by the tracker.

In plugins/SitesManager/SitesManager.php, recordWebsiteDataInCache() hooks on "Common.fetchWebsiteAttributes" to cache site data. The site search url and search parameter could also be saved this way.

In API.php, any update of the site table is followed by a call to Piwik_Common::regenerateWebsiteCacheAttributes().

brb

robocoder commented 14 years ago

Last part: logResults would call Piwik_Common::getCacheWebsiteAttributes( $idSite ) to access the tracker cache (which may already be loaded at this point), thus avoiding a SELECT during tracking.
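
The caching pattern described in the last few comments can be sketched generically in Python. This is not Piwik's tracker cache: the class `TrackerCache`, the JSON file layout, and the attribute names `sitesearch_url` and `sitesearch_parameter` are invented for the illustration. It only shows the idea of reading site attributes from a file during tracking and regenerating the file whenever the site settings change.

```python
import json
import os
import tempfile


class TrackerCache:
    """File-backed cache of per-site attributes (hypothetical sketch)."""

    def __init__(self, directory, load_from_db):
        self.directory = directory
        self.load_from_db = load_from_db  # callable: id_site -> dict

    def _path(self, id_site):
        return os.path.join(self.directory, f"site_{id_site}.json")

    def get(self, id_site):
        """Return cached attributes, querying the database only on a miss."""
        path = self._path(id_site)
        if os.path.exists(path):
            with open(path) as handle:
                return json.load(handle)
        return self.regenerate(id_site)

    def regenerate(self, id_site):
        """Rebuild the cache file, e.g. after the site settings were updated."""
        attributes = self.load_from_db(id_site)
        with open(self._path(id_site), "w") as handle:
            json.dump(attributes, handle)
        return attributes


db_calls = []


def fake_db(id_site):
    # Stands in for a SELECT on the site table.
    db_calls.append(id_site)
    return {"sitesearch_url": "/search.html", "sitesearch_parameter": "q"}


cache = TrackerCache(tempfile.mkdtemp(), fake_db)
first = cache.get(1)
second = cache.get(1)
print(len(db_calls), second)
```

Only the first lookup reaches the (fake) database; later tracking requests are served from the file until `regenerate` is called again.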

anonymous-matomo-user commented 13 years ago

Hi!

The archiving job fails in Piwik 1.2 with an SQL error when viewing today. The following patch fixes this.

--- plugins/SiteSearch/Archive.php.orig 2011-03-03 14:45:44.000000000 +0100
+++ plugins/SiteSearch/Archive.php      2011-03-03 14:46:05.000000000 +0100
@@ -403,7 +403,7 @@
                                visit_action.idaction_url_ref != 0 AND
                                action_set.search_term IS NOT NULL AND
                            action_get.search_term IS NULL AND
-                               (visit.visit_server_date BETWEEN :startDate AND :endDate)
+                               (visit_action.server_time BETWEEN :startDate AND :endDate)
                        GROUP BY
                                search.id,
                                action_get.idaction

Regards

Marco

timo-bes commented 13 years ago

Thanks Marco, that was spot on! I also added a check for the Piwik version, because the old query breaks the new version and the new query breaks the old version... (See Github)

timo-bes commented 13 years ago

I just released a new version with numerous improvements, including tracker cache. Please read the release notes in the README, otherwise the plugin won't work anymore.

anonymous-matomo-user commented 13 years ago

After upgrading to Piwik 1.3 the Site Search 'add on' Internal Search Evolution has stopped working. Percentage of internal search users and Percentage of users are still working fine. It (ISE) was working yesterday (showing data) but now we just get a flat line, even if looking back at previous data

[http://forum.piwik.org/read.php?2,75306]

timo-bes commented 13 years ago

Thanks jekko for submitting the report. The latest commit at github will fix your problems.

In 1.3, the constructor signature of Piwik_DataTable_Filter_ReplaceColumnNames has been changed and that broke the search evolution chart. This has happened a couple of times now, that a new Piwik release changes vital things like the database schema or core signatures - without any chance for me to have a trial run before the release. After the new release is out, bug reports come in, and I have to take the blame for writing an incompatible plugin. Am I the only plugin developer or is there something I don't know about (like a developer release before the public release)? This has to happen to other people as well, so there has to be something, right?

Further, 1.3 introduced the custom date range. Is there any documentation on how that works? Previously, I was relying on Piwik_Controller::$date. Someone added the comment "null if the requested date is a range", but it doesn't say what to do when it's null. Where do I get the date? How does archiving date ranges work?

mattab commented 13 years ago

EZdesign, I hear your complaint. We did two weeks of beta testing and advertised it on the blog, Twitter and Facebook, but maybe you missed the announcement. Maybe we should have some kind of list for all beta testers (and plugin developers, etc.)?

It happens often to you with Search Tracking because you are building one of the most advanced Piwik plugins, so it is most likely to break when we change the core API. It is part of our goals to keep the API as stable as possible, but sometimes there is no choice, as we are still evolving fast.

On this note, we should integrate Search Tracking in core... it is a very useful plugin. However I think it should be improved performance wise, and maybe feature set. If you have time and interest for this, maybe we can work together? (also, a sponsor bounty would be possible for such work, if we include in core)

Date Range: if you use standard archivePeriod hooks, piwik will handle date range automatically (it sums daily periods for ranges, like it sums daily periods for weeks and months). If you have to do manual coding for period=range there is probably something wrong, or something that could be improved.

timo-bes commented 13 years ago

Thanks for the quick answer, matt.

Good to know, that there is a beta testing phase... Btw, it was not announced on the blog, otherwise I most likely would have read it. A mailing list for beta testers / developers would be great (or something else that creates some kind of push notification).

Including the plugin in the core sounds good to me. I'd be interested in working together on this. And if it's sponsored, making time for further development would be easier, of course ;-)

Can we maybe talk on skype about this?

Date Range: The overall date management of the plugin definitely can be improved, but I can't find a clean way to do so. This could be one of the first things we would improve together.

mattab commented 13 years ago

1.3-rc1 was announced on the blog: http://piwik.org/blog/2011/04/new-piwik-mobile-app-released-also-piwik-1-3rc1-available-for-early-adopters/

Sure, we can talk on Skype. My Skype is my first name dot last name. Cheers

timo-bes commented 13 years ago

Oops, I missed that behind the Piwik Mobile headline. My bad.

anonymous-matomo-user commented 13 years ago

@EZdesign: Great Plugin. Thanks for your time developing this. I would love to see that in core.

I'm trying to use the SiteSearch Plugin (0.1.7 with Piwik 1.3) to track a TYPO3 page with the "indexed_search" module activated. To date the SiteSearch widgets don't show any data, even though the searches are correctly recorded by Piwik (they show up correctly under actions/pages). The table "piwik_log_sitesearch" is empty.

By default indexed_search uses the parameter "tx_indexedsearch[sword]" to pass the search word. The search URL looks like this

http://www.example.org/search.html?tx_indexedsearch[sword]=bags&tx_indexedsearch[sections]=0&tx_indexedsearch[submit_button]=Search&search=Search

In the SiteSearch configuration I use "/search.html" as search URL and "tx_indexedsearch[sword]" as search parameter. Could the square brackets in the search parameter be responsible for the trouble?
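
Whether the brackets are the cause depends on how the plugin matches the parameter, which is not shown here. The Python sketch below only illustrates two generic ways a bracketed parameter name can fail to match: the brackets may arrive percent-encoded (`%5B`, `%5D`), and an unescaped name used inside a regular expression turns `[sword]` into a character class. The function `extract_search_term` is a hypothetical helper, not plugin code.

```python
import re
from urllib.parse import parse_qs, urlsplit

PARAMETER = "tx_indexedsearch[sword]"
raw = "http://www.example.org/search.html?tx_indexedsearch[sword]=bags&search=Search"
encoded = "http://www.example.org/search.html?tx_indexedsearch%5Bsword%5D=bags&search=Search"


def extract_search_term(url, search_path, parameter):
    """Return the decoded value of `parameter` if the URL path matches."""
    parts = urlsplit(url)
    if parts.path != search_path:
        return None
    # parse_qs decodes percent-encoding in both names and values.
    values = parse_qs(parts.query).get(parameter)
    return values[0] if values else None


# Decoding the query string first handles both spellings.
print(extract_search_term(raw, "/search.html", PARAMETER))
print(extract_search_term(encoded, "/search.html", PARAMETER))

# Pitfall 1: a literal substring test misses the percent-encoded form.
print(PARAMETER + "=" in encoded)

# Pitfall 2: unescaped brackets in a regex are read as a character class.
print(re.search(PARAMETER + r"=(\w+)", raw))
print(re.search(re.escape(PARAMETER) + r"=(\w+)", raw).group(1))
```

If either pitfall applies, decoding the query string before the lookup, or escaping the configured name before building a pattern, would be the usual remedy.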

P.S. There is a typo in the German localisation of "SiteSearch_TableNoData". It should read "Verfügung", I think ;)

P.P.S Version number in "SiteSearch.php" doesn't correspond to the one in github

anonymous-matomo-user commented 13 years ago

Thank you for this great plugin. I have only one suggestion: we have more than one internal search form on our site. That wouldn't be a problem if I were able to use regular expressions to define my search site. I'd love to see this feature in a future release.

mattab commented 13 years ago

An excellent article about how to use Site Search feature: http://www.cxfocus.com/index.php/google-analytics-tips/google-analytics-site-search-report/

Maybe we could somehow integrate "Analysis tips" in the UI and display in the UI the main questions raised in this article, to help users of the feature to find out interesting facts from the data.

anonymous-matomo-user commented 12 years ago

Looking at the GitHub repository of this plugin, it no longer seems to be actively developed (no update for about 1 year, https://github.com/BeezyT/piwik-sitesearch), so I am a little hesitant to install it on our site (due to foreseeable problems with new Piwik releases in the future).

It would be nice if this could become part of a more actively maintained structure (e.g. like the Anonymous IP plugin, or maybe even part of Piwik core).

timo-bes commented 12 years ago

Thanks for your interest in the plugin, jens.

There are plans to integrate the plugin in Piwik core. It's not certain yet, but they might be realized very soon as part of a sponsored project. If we integrate it in core, most of it will be overhauled (especially the backend), which means you would have to set it up again and the reports would start from scratch as well.

If you want to analyze your internal search now, go ahead and use the plugin from github. It works well for many users. For a more performant core version, keep an eye on this ticket.