wpsharks / wp-kb-articles

KB Articles for WordPress; adds a new Custom Post Type.
http://wpkbarticles.com/

Search to be Dead-On-Balls Accurate! #101

Closed jaswrks closed 9 years ago

jaswrks commented 9 years ago

Next Actions

None. All of these solutions would leave the search engine UI and the underlying result listing functionality the same. The only change would be in the underlying index and in the API calls used to retrieve results, depending on which search engine you have selected in the WPKBA configuration options.


Possible UI Enhancements

jaswrks commented 9 years ago

Note: Google Custom Search is limited to 100 queries per day. You then pay $5 for every 1000 queries. WPKBA can implement caching to reduce cost, but Google CSE and ElasticSearch are both going to have some overhead.

raamdev commented 9 years ago

One way to improve this mode would be to force a phrase match by default.

What if we did two searches: a regular search (e.g., membership options page, the current behavior) and a phrase-match search (e.g., "membership options page") and then merged the results of both searches, ensuring that the phrase-matched results always appeared first (at the top)?

I'm guessing this is really expensive performance-wise, but it seems like that would have a much better effect on the results than choosing one method over the other. We could also cache the results for certain terms to improve performance, only clearing that cache when a new KB Article is published.
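A minimal sketch of the merge step described above, assuming each search has already produced a plain list of post IDs. The function names `kb_merge_search_results()` and `kb_search_cache_key()` are hypothetical and are not part of WPKBA; the cache key only illustrates how per-term caching might be keyed.

```php
<?php
/**
 * Merge two result lists so phrase-matched IDs come first, without duplicates.
 * Both arguments are plain arrays of post IDs (hypothetical inputs).
 */
function kb_merge_search_results(array $phrase_ids, array $regular_ids): array
{
    // array_unique() keeps the first occurrence, so phrase matches stay on top.
    return array_values(array_unique(array_merge($phrase_ids, $regular_ids)));
}

/**
 * Build a cache key for a search term (hypothetical helper). Cached entries
 * would be cleared whenever a new KB Article is published.
 */
function kb_search_cache_key(string $query): string
{
    return 'kb_search_' . md5(strtolower(trim($query)));
}

$phrase_ids  = [12, 7];        // e.g., matches for "membership options page"
$regular_ids = [7, 3, 12, 44]; // e.g., matches for membership options page
print_r(kb_merge_search_results($phrase_ids, $regular_ids)); // IDs 12, 7, 3, 44
```

The merge itself is cheap; the cost raised in this comment comes from running two queries per search, which is what the cache would reduce.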

Note: Google Custom Search is limited to 100 queries per day. You then pay $5 for every 1000 queries. WPKBA can implement caching to reduce cost, but Google CSE and ElasticSearch are both going to have some overhead.

For that reason, I would be hesitant to go that route. Such an option for search may be a great Pro feature to help market the Pro version, but I'd say the majority of site owners are not going to like the idea of X number of searches (something they can't really control) generating an incremental cost. It also seems like it gives up a lot of control over what we (WPKBA developers) can do about tweaking search functionality.

We could also consider adding a secondary MySQL table that is specifically for searching KB articles.

That sounds like the second best option to me (second to improving how current search works using what we already have available). I like that idea more than integrating with an outside service because it means we can really tweak and optimize things exactly how they need to be optimized--i.e., we'd have full control over the entire search engine. That would allow us to extend it, add features going forward, etc.


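Here's what I think the plan should be (in order of priority):

1. Work to improve current search results, even if only marginally, by tweaking how searches currently work using what WordPress provides us with.
2. At a future date (i.e., after a public release, maybe a few months from now), revisit the idea of adding a second WP KB Articles Database Table, one that has fulltext indexes.
3. At a future date (i.e., after a public release, and after we revisit adding a second database table), revisit the idea of adding integration for Google CSE and ElasticSearch.

Thoughts?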

raamdev commented 9 years ago

Another thing I just realized that would greatly improve search:

We tag KB Articles with things like Membership Options Page (MOP). If someone searches for membership options page, we should be able to see that there's already a tag that starts with that phrase and give those KB Articles the highest priority in the search results.

I realize this may require running more than one query, but it seems like an easy way to match searches with very relevant results. If any search term is a partial or exact match for an existing tag, all KB Articles with that tag should be considered most relevant. (If there are KB Articles with only that tag, those would be of even higher relevance, but that's taking it a step further...)
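One way the tag idea could be sketched, assuming the candidate articles and their tag names are already loaded into arrays. This version only implements the "tag starts with the search phrase" case, compared case-insensitively; the function names are hypothetical and not part of WPKBA.

```php
<?php
/**
 * Return the tag names that start with (or equal) the search query.
 * Comparison is case-insensitive; $tags is a list of tag names.
 */
function kb_tags_matching_query(string $query, array $tags): array
{
    $needle = strtolower(trim($query));
    if ($needle === '') {
        return [];
    }
    return array_values(array_filter($tags, function ($tag) use ($needle) {
        return strpos(strtolower($tag), $needle) === 0;
    }));
}

/**
 * Move articles that carry a matching tag to the front, keeping the original
 * order within each group. $articles maps post ID => list of tag names.
 */
function kb_boost_by_tag(string $query, array $articles, array $all_tags): array
{
    $matching = kb_tags_matching_query($query, $all_tags);
    $boosted  = [];
    $rest     = [];
    foreach ($articles as $id => $tags) {
        if (array_intersect($tags, $matching)) {
            $boosted[] = $id;
        } else {
            $rest[] = $id;
        }
    }
    return array_merge($boosted, $rest);
}

$all_tags = ['Membership Options Page (MOP)', 'Clearing the Cache'];
$articles = [
    10 => ['Clearing the Cache'],
    20 => ['Membership Options Page (MOP)'],
    30 => [],
];
print_r(kb_boost_by_tag('membership options page', $articles, $all_tags)); // IDs 20, 10, 30
```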

jaswrks commented 9 years ago

@raamdev writes...

What if we did two searches: a regular search (e.g., membership options page, the current behavior) and a phrase-match search (e.g., "membership options page") and then merged the results of both searches, ensuring that the phrase-matched results always appeared first (at the top)?

I'm guessing this is really expensive performance-wise

That's a good idea. This could be a good short-term solution, but yes, definitely expensive. It will not scale well at all; i.e., more than 200 articles and this is going to get very slow and could even get WPKBA flagged by some hosting companies.

@raamdev writes...

Here's what I think the plan should be (in order of priority):

1. Work to improve current search results, even if only marginally, by tweaking how searches currently work using what WordPress provides us with.
2. At a future date (i.e., after a public release, maybe a few months from now), revisit the idea of adding a second WP KB Articles Database Table, one that has fulltext indexes.
3. At a future date (i.e., after a public release, and after we revisit adding a second database table), revisit the idea of adding integration for Google CSE and ElasticSearch.

Thoughts?

I think what we have already is just hacked together, so adding another double query hack on top of this seems wrong to me. Hackety hacks, because we are being forced to work within an established set of rules presented by the WordPress core Post Type API, which is known to suck at searching anything; i.e., it does not support FULLTEXT indexes.

I have worked extensively with FULLTEXT indexes in the past (all the way back to 2004), so I'm happy to add this in. I see this taking less time than continuing with hacks.

@raamdev writes...

I like that idea more than integrating with an outside service

I also agree that relying on outside services adds an unnecessary dependency for a majority of WordPress sites. However, some people (like me) will not trust anything but Google or something like Elasticsearch; i.e., a true search engine that learns from each search. So for that reason, I think Elasticsearch would be a great option. I'd like to explore adding this to our stack at some point in the future, and then we can introduce the feature in WPKBA also. Does that work well for just anyone though? Nope, I'd say no.

For now, adding a second search index (i.e., another DB table that supports MySQL FULLTEXT indexes), one that is specifically tuned for WPKBA, seems like the best short-term option to me.

@raamdev writes...

we should be able to see that there's already a tag that starts with that phrase and give those KB Articles the highest priority in the search results.

Cool idea. That can be integrated with MySQL FULLTEXT indexes and the sorting algorithm could consider this also. We can assign points to the title, points to the body, points to tags, and then sort by overall relevance, popularity, date, comment count, and hit count.

raamdev commented 9 years ago

more than 200 articles and this is going to get very slow and could even get WPKBA flagged by some hosting companies.

Ah, I had not even thought about getting flagged by hosting companies--that's a great point. I agree, we shouldn't go that route.

I think what we have already is just hacked together, so adding another double query hack on top of this seems wrong to me. Hackety hacks [...] I have worked extensively with FULLTEXT indexes in the past (all the way back to 2004), so I'm happy to add this in. I see this taking less time than continuing with hacks.

Great! If you feel comfortable adding a separate database table for this, then that would definitely be the best next step.

I will not trust anything but Google or something like Elasticsearch; i.e., a true search engine that learns from each search.

Ah, good point. I had not even thought about how those services can learn from each search to make future searches even better. That's definitely a feature worth exploring down the road.

That can be integrated with MySQL FULLTEXT indexes and the sorting algorithm could consider this also. We can assign points to the title, points to the body, points to tags, and then sort by overall relevance, popularity, date, comment count, and hit count.

Perfect!

jaswrks commented 9 years ago

Adding the following table structure.

CREATE TABLE IF NOT EXISTS `wp_kb_articles_index` (
  `ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `post_id` bigint(20) unsigned NOT NULL DEFAULT '0',
  `post_title` text COLLATE utf8_unicode_ci NOT NULL,
  `post_content` longtext COLLATE utf8_unicode_ci NOT NULL,
  `post_tags` text COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`ID`),
  UNIQUE KEY `unique_post_id` (`post_id`),
  FULLTEXT KEY `ft_searchable` (`post_title`,`post_content`,`post_tags`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

jaswrks commented 9 years ago

The latest release of WPKBA includes enhanced search functionality with MySQL fulltext indexes. There is also a new JSON API that allows for interaction with the new search engine. Actually, this API can be used for listing articles in various ways; i.e., with or without a search query.

s2Member.com and ZenCache.com have been updated.


Quick API Example

We will need to document this in full, but here's just some quick info to help us use this in tools like Alfred.

Example endpoint: http://zencache.com/?wp_kb_articles[query_api]
Search example: http://zencache.com/?wp_kb_articles[query_api][q]=clearing+the+cache
Category example: http://zencache.com/?wp_kb_articles[query_api][category]=questions

Other arguments that you can pass:

wp_kb_articles[query_api][tag]=[comma delimited tag slugs or IDs]
wp_kb_articles[query_api][category]=[comma delimited category slugs or IDs]
wp_kb_articles[query_api][author]=[comma delimited author usernames or IDs]
wp_kb_articles[query_api][q]=[search query]

wp_kb_articles[query_api][per_page]=[default is 25]
wp_kb_articles[query_api][page]=[page to show]

wp_kb_articles[query_api][trending_days]=[defaults to 7, impacts trending views only]
wp_kb_articles[query_api][snippet_length]=[defaults to 100, applicable for search queries only]
wp_kb_articles[query_api][expand]=[defaults to 1, set to 0 to reduce details provided in the response]

Quick PHP Example

<?php
$args     = array(
  'q'        => 'clearing cache',
  'per_page' => 2,
);
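// Site that hosts the KB; the query string is appended below.
$endpoint = 'http://zencache.com/?';
// Nest the arguments under wp_kb_articles[query_api], as the API expects.
$args     = array('wp_kb_articles' => array('query_api' => $args));
$response = file_get_contents($endpoint.http_build_query($args, '', '&'));
// Decode the JSON body into stdClass objects.
$response = json_decode($response);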
print_r($response);

Output Response

stdClass Object
(
    [results] => Array
        (
            [0] => stdClass Object
                (
                    [id] => 1054
                    [title] => Clearing the Cache Dynamically
                    [url] => http://zencache.com/kb-article/clearing-the-cache-dynamically/
                    [author] => Raam Dev
                    [time] => 1426203558
                    [visits] => 56
                    [hearts] => 1
                    [relevance] => 5.5
                    [snippet] => ZenCache automatically handles updating the cache in many scenarios, such as when you publish a new post
                    [tags] => Array
                        (
                            [0] => Clearing the Cache
                        )

                )

            [1] => stdClass Object
                (
                    [id] => 871
                    [title] => What are the Clear Cache and Wipe Cache Routines?
                    [url] => http://zencache.com/kb-article/what-are-the-clear-cache-and-wipe-cache-routines/
                    [author] => Raam Dev
                    [time] => 1423931315
                    [visits] => 49
                    [hearts] => 0
                    [relevance] => 4
                    [snippet] => ZenCache occasionally needs to automatically clear or wipe the cache when certain events on your site oc
                    [tags] => Array
                        (
                            [0] => Clearing the Cache
                        )

                )

        )

)

Supported Search Syntax

WPKBA uses MySQL fulltext search in boolean mode. The most commonly used boolean operators are + (must contain), - (must not contain), and "phrase matches" in double quotes. Example: cache -clearing finds results that contain the word cache but do not contain the word clearing.

WPKBA supports the full set of boolean operators listed here: https://dev.mysql.com/doc/refman/5.5/en/fulltext-boolean.html
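As a rough illustration of the operators above, a client could prepare the q argument before sending it. These two helpers are hypothetical and are not part of WPKBA; they only build strings that follow MySQL boolean-mode syntax.

```php
<?php
/** Wrap a query in double quotes so boolean mode treats it as one phrase. */
function kb_phrase_query(string $q): string
{
    // Drop any embedded double quotes so the phrase stays well-formed.
    return '"' . str_replace('"', '', trim($q)) . '"';
}

/** Prefix each word with + so that every word must be present. */
function kb_require_all_words(string $q): string
{
    $words = preg_split('/\s+/', trim($q), -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_map(function ($w) {
        // Strip any leading +/- the caller typed before adding our own +.
        return '+' . ltrim($w, '+-');
    }, $words));
}

echo kb_phrase_query('membership options page'), "\n"; // "membership options page"
echo kb_require_all_words('clear wipe cache'), "\n";   // +clear +wipe +cache
```

Either string can then be passed as wp_kb_articles[query_api][q].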

jaswrks commented 9 years ago

There is also an orderby arg that I forgot to include in my previous comment. The orderby arg is an array that maps each column to a sort direction; the columns are applied in the order in which they are listed.

Example: Default Sort Order

<?php
$args     = array(
  'q'        => 'clearing cache',
  'per_page' => 2,
  'orderby'  => array(
    // This is the default ordered list of orderby columns.
    // If you want to alter the sort order, you can adjust this list.
    'relevance'     => 'DESC', // By search relevance.
    'popularity'    => 'DESC', // By article popularity/hearts.
    'visits'        => 'DESC', // By total unique visitors.
    'comment_count' => 'DESC', // By article comment count.
    'date'          => 'DESC', // By article date.
  ),
);
$endpoint = 'http://zencache.com/?';
$args     = array('wp_kb_articles' => array('query_api' => $args));
$response = file_get_contents($endpoint.http_build_query($args, '', '&'));
$response = json_decode($response);
print_r($response);
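
How WPKBA turns the orderby arg into SQL is not shown in this thread. The sketch below is one way it might be done, assuming the five column names from the default list are the only valid ones; `kb_order_by_clause()` is a hypothetical name. Unknown columns are skipped and any direction other than ASC falls back to DESC, so request input never reaches the SQL string unchecked.

```php
<?php
/**
 * Turn an orderby array (column => direction) into an ORDER BY clause.
 * Only whitelisted columns and the directions ASC/DESC are accepted.
 */
function kb_order_by_clause(array $orderby): string
{
    // Assumption: these are the only sortable columns (taken from the default list).
    $allowed = ['relevance', 'popularity', 'visits', 'comment_count', 'date'];
    $parts   = [];
    foreach ($orderby as $column => $direction) {
        if (!in_array($column, $allowed, true)) {
            continue; // Ignore anything that is not a known column.
        }
        $direction = strtoupper((string) $direction) === 'ASC' ? 'ASC' : 'DESC';
        $parts[]   = "`{$column}` {$direction}";
    }
    return $parts ? 'ORDER BY ' . implode(', ', $parts) : '';
}

echo kb_order_by_clause(['date' => 'DESC', 'relevance' => 'DESC']), "\n";
// ORDER BY `date` DESC, `relevance` DESC
```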

jaswrks commented 9 years ago

Fulltext Scoring

We apply the following scores to each of these columns in our fulltext index.

= Total relevance (i.e., what we sort by).

Note that MATCH indicates a default score that is returned by MySQL. We multiply this default score by constants (e.g., 1.5, 1.0, 0.5) in order to artificially manipulate the final score, which gives us a more accurate set of results.
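A sketch of how such a weighted score could be expressed in SQL, built here as a string from PHP. Two things are assumptions: the mapping of constants to columns in the usage line is illustrative only, since this comment does not say which column receives which constant, and matching each column on its own would need a FULLTEXT index per column, which the table structure posted earlier does not show. `kb_relevance_sql()` is a hypothetical name.

```php
<?php
/**
 * Build a SQL expression that sums per-column MATCH scores, each multiplied
 * by a weight. $weights maps column name => multiplier. $placeholder is the
 * bound-parameter marker for the search query (e.g., %s for $wpdb->prepare()).
 */
function kb_relevance_sql(array $weights, string $placeholder = '%s'): string
{
    $terms = [];
    foreach ($weights as $column => $weight) {
        $terms[] = sprintf(
            '(MATCH(`%s`) AGAINST(%s IN BOOLEAN MODE) * %.1F)',
            $column,
            $placeholder,
            $weight
        );
    }
    return implode(' + ', $terms);
}

// Illustrative weights only; the actual column-to-constant mapping is not given here.
echo kb_relevance_sql(
    ['post_title' => 1.5, 'post_content' => 1.0, 'post_tags' => 0.5]
), "\n";
```

The resulting expression could be selected under an alias and then used as the first sort column, which matches the "total relevance" that results are sorted by.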

raamdev commented 9 years ago

Wooooohooooo! Great work! This really enhances WP KB Articles in a HUGE way. :-)

I'll make a TODO to write a KB Article to document this new functionality.