mattab closed this issue 11 years ago.
I'm subscribing, as this is an important feature that we already have in the Drupal module for Google Analytics and that is currently commented out in the Piwik module.
Google is good at analysing search quality and search in general. We can get ideas from how Google Analytics does internal search tracking.
Site search http://www.google.com/support/analytics/bin/topic.py?topic=12626
General http://www.google.com/support/analytics/bin/answer.py?answer=75961
Google Analytics uses the following formulas to calculate the metrics used in internal site search reports:
* Visits with Search = The number of visits that used your site's search function at least once.
* Percentage of visits that used internal search = Visits with Search / Total Visits
* Total Unique Searches = The total number of times your site search was used. This excludes multiple searches on the same keyword during the same visit.
* Results Pageviews / Search = Pageviews of search result pages / Total Unique Searches
* Search Exits = The number of searches a visitor made immediately before leaving the site.
* Percentage of Search Exits = Search Exits / Visits with Search
* Search Refinements = The number of times a visitor searched again immediately after performing a search.
* Percentage Search Refinements = The percentage of searches that resulted in a search refinement. Calculated as Search Refinements / Pageviews of search result pages.
* Time after Search = The average amount of time visitors spend on your site after performing a search. This is calculated as Sum of all "search_duration" across all searches / ("search_transitions" + 1)
* Search Depth = The average number of pages visitors viewed after performing a search. This is calculated as Sum of all "search_depth" across all searches / ("search_transitions" + 1)
Example Calculations
This section describes a visitor's experience with your website's search engine and explains how Google Analytics calculates the resulting data. The visitor progresses through three different pages when interacting with your website's search engine:
* Search Page - Page on site where the visitor enters terms for a web search
* Search Results Page - Results page that is returned on a search engine query
* Results Pageview - The page viewed after a click on a results page
Assuming your website received three visits from visitors that navigate as described...:
* Visit 1: (time between "camera" term search page and "black camera" term search page is 30 seconds; and "black camera" search to site exit is 60 seconds)
o Search Term Page (term "camera") >
o Results Page >
o view Results Pageview >
o view Results Pageview >
o Search Term Page (term "black camera") >
o Search Results Page >
o view Results Pageview >
o view Results Pageview >
o view Results Pageview >
o Site exit
* Visit 2 : (time between "computer" term search page to site exit is 15 seconds)
o Search Page (term "computer") >
o Results Page >
o Site exit
* Visit 3:
o No Search
...The following metrics can now be calculated:
* % Visits used internal search = 2 visits that used site search (Visit 1 & Visit 2) / 3 total visits = 66.7%
* Visits with Search = 2 (Visit 1 & Visit 2)
* Total Unique Searches = 3 ("camera", "black camera", "computer")
* Results Pageviews / Search = (2 + 3) / 3 = 1.67
* Search Exits = 1 (Visit 2)
* % Search Exits = 1 (Visit 2) / 2 (Visit 1 & Visit 2) = 50%
* Refinement = 1 (Visit 1 - "black camera")
* % Refinement = 1 (Visit 1 - "black camera") / 3 = 33.3%
* Time after Search = (30 seconds + 60 seconds + 15 seconds) / (1 [Visit 1] + 1 [Visit 2]) = 52.5 sec
* Search Depth = (2 ["camera"] + 3 ["black camera"] + 0 ["computer"]) / (1 [Visit 1] + 1 [Visit 2]) = 2.5
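The worked example can be checked with a small script. This is a minimal Python sketch, not Piwik or Google Analytics code: the `Search`/`Visit` classes and `site_search_metrics` are hypothetical. It assumes each search produces exactly one search results page, counts every search after the first one in a visit as a refinement (which is how the example counts "black camera"), and divides Time after Search and Search Depth by the number of visits with search, as the example does.

```python
from dataclasses import dataclass, field


@dataclass
class Search:
    term: str
    results_pageviews: int  # result pages viewed after this search
    seconds_after: int      # time until the next search or the site exit


@dataclass
class Visit:
    searches: list = field(default_factory=list)


def site_search_metrics(visits):
    searched = [v for v in visits if v.searches]
    all_searches = [s for v in searched for s in v.searches]
    # Repeated searches for the same term within one visit count once.
    unique_searches = sum(len({s.term for s in v.searches}) for v in searched)
    results_pages = len(all_searches)  # assumption: one results page per search
    result_pageviews = sum(s.results_pageviews for s in all_searches)
    # Search exit: the visit ended on the results page of its last search.
    exits = sum(1 for v in searched if v.searches[-1].results_pageviews == 0)
    # Assumption: every search after the first one in a visit is a refinement.
    refinements = sum(len(v.searches) - 1 for v in searched)
    return {
        "visits_with_search": len(searched),
        "pct_visits_with_search": 100 * len(searched) / len(visits),
        "total_unique_searches": unique_searches,
        "results_pageviews_per_search": result_pageviews / unique_searches,
        "search_exits": exits,
        "pct_search_exits": 100 * exits / len(searched),
        "search_refinements": refinements,
        "pct_search_refinements": 100 * refinements / results_pages,
        "time_after_search": sum(s.seconds_after for s in all_searches) / len(searched),
        "search_depth": result_pageviews / len(searched),
    }


visits = [
    Visit([Search("camera", 2, 30), Search("black camera", 3, 60)]),  # Visit 1
    Visit([Search("computer", 0, 15)]),                               # Visit 2
    Visit(),                                                          # Visit 3: no search
]
metrics = site_search_metrics(visits)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}" if isinstance(value, float) else f"{name}: {value}")
```

The sketch reproduces the numbers in the list (66.7%, 1.67, 50%, 33.3%, 52.5 sec, 2.5). Note that the example's denominators for Time after Search and Search Depth are visits with search, which is why the sketch uses them rather than the "search_transitions + 1" wording of the definitions.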
Any progress with this plugin? Is there a dev version of it somewhere? Are there any other plugins like this?
Hi,
have there been any new developments on this issue?
We'd love to see an internal search feature in Piwik as soon as possible because in our opinion it's one of the most important features that are still missing in comparison to commercial tools like Google Analytics.
I'm working for a small web agency and we would like to start implementing this feature, or offer our help if there's already someone working on it. :-)
Benjamin
Benjamin: afaik, no one is working on it; so, feel free to implement and share.
Hey guys,
I gave this plugin a try (it's my first Piwik plugin, so every suggestion is welcome).
The menu items are registered, and I extended the site table with columns for the search URL and the search parameter name. The settings area is working.
Now the logs have to be analyzed. As far as I can see, there are no existing API methods that would provide adequate functionality. This whole DataTable business seems to be pretty complex (but cool!), so I'd really appreciate it if somebody could help me get started with this (documentation, tips or code).
What I have done so far can be found on github:
http://github.com/BeezyT/piwik-sitesearch
I know Piwik uses SVN, but GitHub can be accessed via SVN as well:
svn checkout https://svn.github.com/BeezyT/piwik-sitesearch
Thanks for your help,
Timo
I just pushed the first version of the keyword analysis to github (including screenshots).
The plugin is making good progress...
Have a look at the github wiki for up to date information:
Looks quite good. However, I saw you're using mysql_real_escape_string and probably other mysql_* specific functions (e.g. http://github.com/BeezyT/piwik-sitesearch/blob/master/SiteSearch.php#L117). You should use the second argument of Piwik_Query to work with parameterised queries (Piwik_Query($sqlQuery, $parameters)) and thereby allow other database backends (in future).
Thanks for your feedback (also the other open issues on github)!
I see why I shouldn't use the mysql functions. I replaced the parameters in a query with ? and passed the second argument to Piwik_FetchAll. Now the query isn't working anymore, and I can't find a way to debug it.
How can I find out what the query looks like when it is executed?
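One way to debug a parameterised query is to log a rendered copy of it next to the real call, and to check that the number of `?` placeholders matches the number of parameters (a mismatch is a common reason such a query stops working). This is a minimal Python sketch using the standard `sqlite3` module, not Piwik's database layer: `interpolate_for_log` is a hypothetical helper, and the `site` table and its columns are made up for the demo.

```python
import sqlite3


def interpolate_for_log(sql, params):
    """Render '?' placeholders with literal values, for logging only.

    Never execute the returned string; pass sql and params to the driver.
    Naive: a '?' inside a quoted SQL literal would be miscounted.
    """
    pieces = sql.split("?")
    if len(pieces) - 1 != len(params):
        raise ValueError(f"{len(pieces) - 1} placeholders but {len(params)} parameters")
    rendered = [pieces[0]]
    for value, tail in zip(params, pieces[1:]):
        if value is None:
            literal = "NULL"
        elif isinstance(value, str):
            literal = "'" + value.replace("'", "''") + "'"  # SQL-style quote escaping
        else:
            literal = str(value)
        rendered.append(literal + tail)
    return "".join(rendered)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site (idsite INTEGER, search_url TEXT, search_param TEXT)")
conn.execute("INSERT INTO site VALUES (?, ?, ?)", (1, "/search.html", "q"))

sql = "SELECT search_url FROM site WHERE idsite = ? AND search_param = ?"
params = (1, "q")
print(interpolate_for_log(sql, params))  # what is about to run, for the log
rows = conn.execute(sql, params).fetchall()  # the driver binds the values itself
print(rows)
```

The rendered string is only for reading; the driver still receives the SQL and the parameters separately, which is what keeps the query portable and safe.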
It looks like a very interesting start!
Are you interested in having such a plugin included in Piwik core? If so, we would need to review the schema updates (to process the metrics above: visits per search, total searches, search exits, etc.).
Tracker:
Archiving: the plugin doesn't do archiving currently; I understand you pointed out that the code was not reusable. That's because you are computing "new" metrics in the Piwik world :-), but technically your code should archive data using the same mechanism used, for example, by the Visits by Server Time report. You can then look up the enrich*() queries in ArchiveProcessing to see what you would base your query on.
Your integration of Search Results using custom data is very cool!! First cool use case of this function. And your code looks really good. This would be amazing to have in Piwik core for sure :)
Thanks for the feedback, matt.
I know that performance is a huge issue at the moment. To be honest, I haven't cared much about it yet, since it's still more of a proof of concept. At the moment, I'm adding a search refinements feature, which is the last of the must-haves. This is probably the most performance-critical part; I think we have to extend the schema a little more to make it efficient.
Archiving would be great. I have read a lot of code, but I still don't get how it's done :-(. Some documentation about that would be great, but I guess the target group isn't that big... So if you (or anybody else) have the time, feel free to fork the project and get the archiving process started. I'm very open to collaboration!
What does including the plugin in Piwik core mean? That it comes with Piwik by default? That would be great, but I'd like to keep working on it (at least as much as I have time for). Can we find a solution for common version control? I'd be happy to stick with GitHub, but if you guys have a better suggestion, I'm open...
GitHub is perfect for now, until the code is ready and committed to SVN trunk. Then you could get SVN commit access and be part of the team if it interests you :)
Before that, it would need to be in line with other plugins in terms of performance and vision. Yours is a great start, so it is promising.
Regarding archiving, the big idea is to query the logs GROUPED BY a given entity (e.g. keyword), and then request common stats for all keywords (visits, pages, avg time on site, bounce count, etc.). The helpers in ArchiveProcessing/* do this; check out the enrich* methods in particular. You can of course write SQL directly in your archiving module and then create DataTables from the results. The advantage is that DataTables are summed automatically when archiving weeks and months (which are sums of days), so reusing these classes makes the code smaller.
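The day/period split described above can be sketched in a few lines. This is a minimal Python illustration of the idea, not Piwik's ArchiveProcessing API: `archive_day` and `archive_period` are hypothetical, the log rows are plain dicts with assumed `keyword` and `idvisit` fields, and the per-keyword "table" is just a dict of counters.

```python
from collections import Counter, defaultdict


def archive_day(log_rows):
    """Aggregate one day of action logs per keyword (like GROUP BY keyword)."""
    tables = defaultdict(Counter)
    seen_visits = defaultdict(set)
    for row in log_rows:
        keyword = row["keyword"]
        stats = tables[keyword]
        stats["pageviews"] += 1
        if row["idvisit"] not in seen_visits[keyword]:  # count each visit once per day
            seen_visits[keyword].add(row["idvisit"])
            stats["visits"] += 1
    return {keyword: dict(stats) for keyword, stats in tables.items()}


def archive_period(day_tables):
    """Week, month or range tables are the per-keyword sums of day tables."""
    total = defaultdict(Counter)
    for day in day_tables:
        for keyword, stats in day.items():
            total[keyword].update(stats)  # Counter.update adds the counts
    return {keyword: dict(stats) for keyword, stats in total.items()}


day1 = archive_day([
    {"keyword": "camera", "idvisit": 1},
    {"keyword": "camera", "idvisit": 1},
    {"keyword": "computer", "idvisit": 2},
])
day2 = archive_day([{"keyword": "camera", "idvisit": 3}])
week = archive_period([day1, day2])
print(week)
```

Only additive columns can be summed this way, so averages (e.g. time after search) are best archived as a sum plus a count and divided when the report is displayed.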
Let us know how it goes. good luck!
I started implementing the archiving process, and I'm not sure what the best solution is. What I did:
I only implemented this for the keyword overview and for the day archive. Before I go on, please have a look and tell me whether that's what you had in mind...
Quick question: is the code working in its current state?
The concept of archiving in Piwik is explained briefly in: http://dev.piwik.org/trac/wiki/DatabaseSchema#Archiveddata
Idea is:
Let me know if you need specific guidance.
Thanks for the comment. I had most of that figured out by now, but it's still good to know that there is documentation ;-)
I have some specific questions:
When the user clicks a keyword, the plugin shows statistics for that keyword only (following pages, previous pages, evolution, search refinements).
I haven't worked on the plugin for a few days now, and I won't have much time for the next 4 weeks or so, but after that, I'm planning to finish the first beta version within a few weeks.
The metrics stored on a per keyword basis are:
At the moment, only the first two metrics are archived, and it still takes a long time to complete when there are many keywords.
Have a look at http://github.com/BeezyT/piwik-sitesearch/blob/master/Archive.php (method archiveDay). The main performance issue is that I have to analyze the actions (not only the visits) extensively, for every keyword.
Here are some more specific questions:
Thanks for your help, matt! I really appreciate it.
EZdesign, sorry for the delay. Have you made further progress?
Thanks for the feedback; it helped bring the additional archiving time for my test database down from 100 to 5 seconds ;-)
I'll let you know when I have some more specific questions.
The plugin is making great progress: everything uses archiving and seems to work now. You could say that we have reached the first beta version. If you have any bug reports, please create issues on GitHub.
There is one problem with the evolution graph that I can't figure out. Have a look at this screenshot: http://github.com/downloads/BeezyT/piwik-sitesearch/Percentage.png
The axis is not scaled properly...
The Controller method is called searchPercentage, the API method is getSearchPercentageEvolution.
Did anybody have this problem before? Is it a bug or am I doing something wrong??
Btw, if you had the plugin installed previously and want to update to the latest version, remove the schema changes from piwik_site and piwik_log_action by hand, run the install method again and then check "analyze urls now" in the settings.
Please test against trunk. In fixing #1562 (displaying goal conversion rates, i.e., percentages), we've made some changes to the visualization code.
The ticket is about exactly the same problem, but using trunk didn't help.
When you use ColumnCallbackAddColumnPercentage, the result is a localized number with a '%'. This locale-specific format works well when displayed in the table, but it's a string, not a number. When the visualization code looks for the max value, PHP's max() function does a string comparison, so "13.5%" is "bigger" than "100%".
We also run into an issue with locales. Consider the ratio 3/4 = 0.75: in "en_US.UTF-8" it is formatted as "0.75", but in "de_DE.UTF-8" it becomes "0,75". Casting to (float) isn't locale-aware.
Can you use ColumnCallbackReplace with Piwik::getPercentageSafe?
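Both pitfalls are easy to reproduce outside PHP. This Python sketch only mirrors the behaviour described above (Python's `max()` also compares strings lexicographically, and `float()` rejects a decimal comma, whereas PHP's cast silently stops parsing). `format_percent` and its `decimal_sep` argument are hypothetical stand-ins for locale-aware formatting; the idea, in the spirit of the ColumnCallbackReplace / Piwik::getPercentageSafe suggestion, is to keep the column numeric and format it only for display.

```python
def format_percent(ratio, decimal_sep="."):
    """Format a ratio for display only, e.g. 0.0075 -> '0.75%'."""
    return f"{ratio * 100:.2f}".replace(".", decimal_sep) + "%"


formatted = ["13.5%", "100%"]
print(max(formatted))  # string comparison: '13.5%' beats '100%' ('3' > '0')

ratios = [0.135, 1.0]  # keep the column numeric for the graph
print(max(ratios))     # 1.0, so the axis can scale to 100%

try:
    float("0,75")      # German-style decimal comma
except ValueError as err:
    print("not a number:", err)

print(format_percent(0.0075), format_percent(0.0075, decimal_sep=","))
```

Formatting at the last moment keeps sorting, max() and axis scaling working on numbers, while the table can still show the localized string.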
re: search_percentage. core/ViewDataTable/GenerateGraphData/ChartEvolution.php guesses the unit from the column name. We can add _percentage to the list, or you can use _rate (e.g., search_rate), or you can explicitly set the Y-axis unit ('%').
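The unit-guessing rule can be illustrated with a tiny lookup. This is a hypothetical Python sketch of the behaviour described in the comment, not the code in ChartEvolution.php: the suffix table contains only the two suffixes mentioned above, and `guess_y_axis_unit` is a made-up name.

```python
# Hypothetical suffix table: only the suffixes mentioned in the comment above.
UNIT_BY_SUFFIX = {"_rate": "%", "_percentage": "%"}


def guess_y_axis_unit(column, explicit_unit=None):
    """Return the Y-axis unit: an explicit unit wins, else guess from the suffix."""
    if explicit_unit is not None:
        return explicit_unit
    for suffix, unit in UNIT_BY_SUFFIX.items():
        if column.endswith(suffix):
            return unit
    return ""  # plain counts get no unit


for column in ["search_rate", "search_percentage", "nb_visits"]:
    print(column, repr(guess_y_axis_unit(column)))
```

Setting the unit explicitly avoids depending on naming conventions at all, which is the most robust of the three options.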
You hit the nail on the head with that response! Works fine now.
I just released v0.1.2: it includes the fix and some widgets I added yesterday. If you want to test / use the plugin, I recommend using only the commits tagged with a version number. They should be more or less stable. If you don't want to use git or svn to access github, there are tgz/zip archives of the releases in the downloads section on github.
Looking forward to your feedback...
Timo: your code is missing a license statement.
Hey guys, I was just checking out repopular (http://repopular.com/) and what do I see on the first page? My plugin!
Thanks for the publicity, are you using it?
@vipsoft: what license do you recommend? What do I need to pick, so you can add it to the core when the time is right?
Must have been my tweet.
The license is up to you. For inclusion with Piwik core, we require that it be GPL v3 compatible, e.g., GPL v3, BSD, MIT, or LGPL v3. Affero GPL v3 isn't strictly compatible, but is also allowed.
Are you able to add this site search plugin to the latest code?
Sorry for the delay. I'm going to try and squeeze in a review this week.
Ok. That was a pleasant code read. Only a few issues to address/discuss with Timo and Matt:
re: logResults:
Matt's comment:20
The search URL and search parameter should be stored in the Tracker cache files (tmp/cache/tracker/): check out the hook 'Common.fetchWebsiteAttributes' and how it can be used. The goal is to make fewer requests at Tracker time.
Thanks for the review.
I had planned to remove the logging and the dev folder from the plugin and move them to a separate plugin that I use for development. If you want to include this functionality in the core, that's fine with me as well.
I also have some questions regarding the tracker cache (Matt's comment:20):
Thanks for your help.
The tracker cache consists of files in tmp/cache/tracker that reduce the number of SQL queries made by the tracker.
In plugins/SitesManager/SitesManager.php, recordWebsiteDataInCache() hooks on "Common.fetchWebsiteAttributes" to cache site data. The site search url and search parameter could also be saved this way.
In API.php, any update of the site table is followed by a call to Piwik_Common::regenerateWebsiteCacheAttributes().
brb
Last part: logResults would call Piwik_Common::getCacheWebsiteAttributes( $idSite ) to access the tracker cache (which may already be loaded at this point), thus avoiding a SELECT during tracking.
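The flow described above (a hook fills per-site attributes, the cache is regenerated after every site update, and the tracker only reads the cache) can be sketched as follows. This is a hypothetical Python model, not Piwik code: the function names only echo recordWebsiteDataInCache(), regenerateWebsiteCacheAttributes() and getCacheWebsiteAttributes(), the `sitesearch_url`/`sitesearch_param` column names and the JSON file format are assumptions, and `db_queries` just counts simulated SELECTs.

```python
import json
import os
import tempfile
from urllib.parse import parse_qs, urlsplit

CACHE_DIR = tempfile.mkdtemp()  # stands in for tmp/cache/tracker/
SITE_TABLE = {1: {"sitesearch_url": "/search.html", "sitesearch_param": "q"}}
db_queries = 0
attribute_hooks = []  # plugins register here, like 'Common.fetchWebsiteAttributes'


def site_search_hook(id_site, attributes):
    global db_queries
    db_queries += 1  # simulated SELECT, only when the cache is (re)generated
    attributes.update(SITE_TABLE[id_site])


attribute_hooks.append(site_search_hook)


def regenerate_website_cache(id_site):
    """Call after every update of the site table."""
    attributes = {}
    for hook in attribute_hooks:
        hook(id_site, attributes)
    with open(os.path.join(CACHE_DIR, f"{id_site}.json"), "w") as f:
        json.dump(attributes, f)


def get_cached_website_attributes(id_site):
    with open(os.path.join(CACHE_DIR, f"{id_site}.json")) as f:
        return json.load(f)


def log_results(id_site, url):
    """Tracker-time lookup: reads the cache file, never the database."""
    attrs = get_cached_website_attributes(id_site)
    parts = urlsplit(url)
    if parts.path != attrs["sitesearch_url"]:
        return None
    return parse_qs(parts.query).get(attrs["sitesearch_param"], [None])[0]


regenerate_website_cache(1)
for url in ["http://example.org/search.html?q=camera", "http://example.org/about.html"]:
    print(log_results(1, url))
print("SELECTs:", db_queries)
```

However many requests are tracked, the database is queried only when the cache is regenerated; a real tracker would additionally keep the loaded attributes in memory for the duration of the request.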
Hi!
The archiving job fails in Piwik 1.2 with an SQL error when viewing today; the following patch fixes this.
--- plugins/SiteSearch/Archive.php.orig 2011-03-03 14:45:44.000000000 +0100
+++ plugins/SiteSearch/Archive.php 2011-03-03 14:46:05.000000000 +0100
@@ -403,7 +403,7 @@
visit_action.idaction_url_ref != 0 AND
action_set.search_term IS NOT NULL AND
action_get.search_term IS NULL AND
- (visit.visit_server_date BETWEEN :startDate AND :endDate)
+ (visit_action.server_time BETWEEN :startDate AND :endDate)
GROUP BY
search.id,
action_get.idaction
Regards
Marco
Thanks Marco, that was spot on! I also added a check for the Piwik version, because the old query breaks the new version and the new query breaks the old version... (see GitHub)
I just released a new version with numerous improvements, including the tracker cache. Please read the release notes in the README, otherwise the plugin won't work anymore.
After upgrading to Piwik 1.3, the Internal Search Evolution report of the Site Search add-on has stopped working. Percentage of internal search users and Percentage of users are still working fine. It (ISE) was working yesterday (showing data), but now we just get a flat line, even when looking back at previous data.
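A version check like the one mentioned above has to compare version numbers numerically, not as strings. This is a hypothetical Python sketch, not the plugin's code: the threshold of 1.2 is an assumption based on Marco's report that the old query fails in Piwik 1.2, and the two conditions are the ones from his patch. In PHP itself, version_compare() performs this kind of comparison.

```python
def parse_version(version):
    """'1.3-rc1' -> (1, 3); tuples compare numerically, so 1.10 > 1.2."""
    core = version.split("-")[0]  # drop pre-release suffixes such as '-rc1'
    return tuple(int(part) for part in core.split("."))


# Assumption: 1.2 is the first release that needs the new condition.
NEW_QUERY_SINCE = (1, 2)


def visit_date_condition(piwik_version):
    if parse_version(piwik_version) >= NEW_QUERY_SINCE:
        return "visit_action.server_time BETWEEN :startDate AND :endDate"
    return "visit.visit_server_date BETWEEN :startDate AND :endDate"


for v in ["1.1", "1.2", "1.3-rc1", "1.10"]:
    print(v, "->", visit_date_condition(v))
print("1.10" >= "1.2")  # False: plain string comparison gets this wrong
```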
Thanks jekko for submitting the report. The latest commit at github will fix your problems.
In 1.3, the constructor signature of Piwik_DataTable_Filter_ReplaceColumnNames was changed, and that broke the search evolution chart. It has happened a couple of times now that a new Piwik release changes vital things like the database schema or core signatures, without any chance for me to do a trial run before the release. After the new release is out, bug reports come in, and I have to take the blame for writing an incompatible plugin. Am I the only plugin developer, or is there something I don't know about (like a developer release before the public release)? This must happen to other people as well, so there has to be something, right?
Further, 1.3 introduced custom date ranges. Is there any documentation on how they work? Previously, I was relying on Piwik_Controller::$date. Someone added the comment "null if the requested date is a range", but it doesn't say what to do when it's null. Where do I get the date? How does archiving date ranges work?
EZdesign, I hear your complaint. We ran a two-week beta test and advertised it in a blog post and on Twitter and Facebook, but maybe you missed the announcement. Maybe we should have some kind of mailing list for all beta testers (and plugin developers, etc.)?
It happens often with Search Tracking because you are building one of the most advanced Piwik plugins, so it is likely to break when we change the core API. It is one of our goals to keep the API as stable as possible, but sometimes there is no choice, as we are still evolving fast.
On this note, we should integrate Search Tracking in core... it is a very useful plugin. However, I think it should be improved performance-wise, and maybe feature-wise. If you have time and interest for this, maybe we can work together? (A sponsor bounty would also be possible for such work, if we include it in core.)
Date Range: if you use the standard archivePeriod hooks, Piwik will handle date ranges automatically (it sums daily periods for ranges, just as it sums daily periods for weeks and months). If you have to do manual coding for period=range, there is probably something wrong, or something that could be improved.
Thanks for the quick answer, matt.
Good to know that there is a beta testing phase... Btw, it was not announced on the blog, otherwise I most likely would have read it. A mailing list for beta testers / developers would be great (or something else that provides some kind of push notification).
Including the plugin in the core sounds good to me. I'd be interested in working together on this. And if it's sponsored, making time for further development would be easier, of course ;-)
Can we maybe talk on skype about this?
Date Range: the overall date management of the plugin can definitely be improved, but I can't find a clean way to do it. This could be one of the first things we improve together.
1.3-rc1 was announced on the blog: http://piwik.org/blog/2011/04/new-piwik-mobile-app-released-also-piwik-1-3rc1-available-for-early-adopters/
Sure, we can talk on Skype; my Skype name is my first name dot last name. Cheers
Oops, I missed that behind the Piwik Mobile headline. My bad.
@EZdesign: Great Plugin. Thanks for your time developing this. I would love to see that in core.
I'm trying to use the SiteSearch plugin (0.1.7 with Piwik 1.3) to track a TYPO3 site with the "indexed_search" module activated. So far, the SiteSearch widgets don't show any data, even though the searches are correctly recorded by Piwik (they show up correctly under Actions/Pages). The table "piwik_log_sitesearch" is empty.
By default indexed_search uses the parameter "tx_indexedsearch[sword]" to pass the search word. The search URL looks like this
http://www.example.org/search.html?tx_indexedsearch[sword]=bags&tx_indexedsearch[sections]=0&tx_indexedsearch[submit_button]=Search&search=Search
In the SiteSearch configuration I use "/search.html" as search URL and "tx_indexedsearch[sword]" as search parameter. Could the square brackets in the search parameter be responsible for the trouble?
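Square brackets are a plausible culprit, although the thread does not confirm it. Browsers may send them percent-encoded (%5B/%5D), and PHP's own query parsing (parse_str, $_GET) turns tx_indexedsearch[sword] into a nested array rather than a flat key, so a plugin that does a literal string match, or looks the flat name up in PHP's parsed array, could miss the term. This Python sketch (hypothetical `extract_search_term`, not the plugin's code) shows a lookup that treats the bracketed name as one flat parameter and copes with the encoded form.

```python
from urllib.parse import parse_qs, urlsplit


def extract_search_term(url, search_path, param):
    """Return the search term if url is the search page, else None.

    parse_qs decodes both names and values, so 'tx_indexedsearch%5Bsword%5D'
    and 'tx_indexedsearch[sword]' end up as the same flat parameter name.
    """
    parts = urlsplit(url)
    if parts.path != search_path:
        return None
    values = parse_qs(parts.query).get(param)
    return values[0] if values else None


raw = ("http://www.example.org/search.html?tx_indexedsearch[sword]=bags"
       "&tx_indexedsearch[sections]=0")
encoded = "http://www.example.org/search.html?tx_indexedsearch%5Bsword%5D=bags"

print(extract_search_term(raw, "/search.html", "tx_indexedsearch[sword]"))
print(extract_search_term(encoded, "/search.html", "tx_indexedsearch[sword]"))
print("tx_indexedsearch[sword]=" in encoded)  # a literal match misses the encoded form
```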
P.S. There is a typo in the German localisation of "SiteSearch_TableNoData". It should read "Verfügung", I think ;)
P.P.S. The version number in "SiteSearch.php" doesn't match the one on GitHub.
Thank you for this great plugin. I have only one suggestion: we have more than one internal search form on our site. That wouldn't be a problem if I were able to use regular expressions to define my search pages. I'd love to see this feature in a future release.
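Matching search pages with a regular expression instead of one fixed URL could look like this. This is a hypothetical Python sketch of the requested feature, not something the plugin provides: the pattern and the paths are made-up examples, and `fullmatch` is used so that longer paths containing a search URL do not match by accident.

```python
import re

# Hypothetical configuration: one pattern covering several search forms.
SEARCH_PAGE_PATTERN = re.compile(r"/(search\.html|shop/find|faq/search)")


def is_search_page(path, pattern=SEARCH_PAGE_PATTERN):
    """fullmatch, so '/search.html.bak' or '/x/search.html' do not match."""
    return pattern.fullmatch(path) is not None


for path in ["/search.html", "/shop/find", "/about.html", "/search.html.bak"]:
    print(path, is_search_page(path))
```

Each search form may also use its own parameter name, so a real setting would probably pair every pattern with a parameter.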
An excellent article about how to use Site Search feature: http://www.cxfocus.com/index.php/google-analytics-tips/google-analytics-site-search-report/
Maybe we could integrate "Analysis tips" into the UI, displaying the main questions raised in this article to help users of the feature find interesting facts in the data.
Looking at the GitHub repository of this plugin, it no longer seems to be actively developed (no update for about a year, https://github.com/BeezyT/piwik-sitesearch), so I am a little hesitant to install it on our site (due to foreseeable problems with future Piwik releases).
It would be nice if this could become part of a more actively maintained structure (e.g. like the Anonymous IP plugin, or maybe even Piwik core).
Thanks for your interest in the plugin, jens.
There are plans to integrate the plugin in Piwik core. It's not certain yet, but they might be realized very soon as part of a sponsored project. If we integrate it in core, most of it will be overhauled (especially the backend), which means you would have to set it up again, and the reports would start from scratch as well.
If you want to analyze your internal search now, go ahead and use the plugin from github. It works well for many users. For a more performant core version, keep an eye on this ticket.
Piwik now has internal search keywords tracking. Awesome!
Functionality: