pat / riddle

Ruby Client API for Sphinx
MIT License

Riddle gives broken results, while SphinxQL gives correct results for the same query #50

Closed: barttenbrinke closed this issue 12 years ago

barttenbrinke commented 12 years ago

My entire endeavor can be read here: http://sphinxsearch.com/forum/view.html?id=9422, but the gist is this:

If I do this:

require 'riddle'

client = Riddle::Client.new
client.match_mode = :extended2
client.rank_mode = :bm25
client.group_by = 'group_by_id'
client.group_function = :attr
client.index_weights = {"content_stemmer" => 7, "content_metaphone" => 10,"content_soundex" => 3}
client.field_weights = {'title' => 10, 'artist' => 10, 'description' => 5, 'search_words'=> 5}
client.sort_mode = :extended
client.sort_by = '@weight DESC'
results = client.query("test")

It does nothing with the query. Instead, I get the top 20 of all my documents, sorted by document id. However, if I do the same in SphinxQL:

SELECT content_id,class_id,group_by_id
FROM content_stemmer,content_metaphone,content_soundex
WHERE MATCH('test')
GROUP BY group_by_id 
ORDER BY @weight DESC
OPTION field_weights=(title=10, artist=10, description=5, search_words=5),
    index_weights=(content_stemmer=7,content_soundex=3,content_metaphone=10), ranker=bm25;

I get correct and nicely weighted results. Either I am using Riddle incorrectly or I hit a bug.

pat commented 12 years ago

Hi Bart

Just to confirm - as there's been a bit of back and forth on the Sphinx forum discussion - is the behaviour still not as you expect? I'm fairly sure Riddle is constructing the query correctly for Sphinx (my test suite actually compares itself against the PHP API that Sphinx is distributed with). I'm certainly not spotting anything wrong with how you're using it, though...

barttenbrinke commented 12 years ago

It is still wrong. From SphinxQL I get documents whose order changes if I change the index weights and/or field weights (this is what I expect).

If I run the same query through Riddle, I get a result list that is "sort of" relevant (the word I am searching for, or a word sounding like it, is in there), but all index and field weights are ignored and the results are just sorted by document id DESC. Changing the index or field weights does not change this.

I could email you an example database if that helps.

barttenbrinke commented 12 years ago

I replaced Riddle with the Mysql2 gem and now it all works as expected. This is too bad; I really liked the Riddle DSL, and now I'm just creatively concatenating strings.
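If you do end up building SphinxQL by hand, the weight hashes from the Riddle setup can at least be turned into the `OPTION` clause mechanically instead of by ad-hoc concatenation. The helpers below (`weights_list` and `sphinxql_options`) are hypothetical and are not part of Riddle or the Mysql2 gem. They assume field names, index names and the ranker are plain identifiers and reject anything else, since those values are interpolated into the SQL verbatim. The `MATCH()` search text is not handled here and would still need proper escaping.

```ruby
# Hypothetical helpers (not part of Riddle or Mysql2) that format weight hashes
# as a SphinxQL OPTION clause, following the syntax of the query quoted earlier.

IDENTIFIER = /\A\w+\z/

# Turns {"title" => 10, "artist" => 5} into "(title=10, artist=5)".
def weights_list(weights)
  pairs = weights.map do |name, weight|
    # Names are interpolated verbatim, so only allow word characters.
    unless name.to_s =~ IDENTIFIER
      raise ArgumentError, "unsafe identifier: #{name.inspect}"
    end
    # Weights are integers in Sphinx; Integer() raises on values like "abc" or nil.
    "#{name}=#{Integer(weight)}"
  end
  "(#{pairs.join(', ')})"
end

# Builds the OPTION clause; empty hashes and a nil ranker are left out.
def sphinxql_options(field_weights: {}, index_weights: {}, ranker: nil)
  parts = []
  parts << "field_weights=#{weights_list(field_weights)}" unless field_weights.empty?
  parts << "index_weights=#{weights_list(index_weights)}" unless index_weights.empty?
  if ranker
    raise ArgumentError, "unsafe ranker: #{ranker.inspect}" unless ranker.to_s =~ IDENTIFIER
    parts << "ranker=#{ranker}"
  end
  parts.empty? ? "" : "OPTION #{parts.join(', ')}"
end

options = sphinxql_options(
  field_weights: { 'title' => 10, 'artist' => 10, 'description' => 5, 'search_words' => 5 },
  index_weights: { 'content_stemmer' => 7, 'content_metaphone' => 10, 'content_soundex' => 3 },
  ranker: :bm25
)
puts options
# OPTION field_weights=(title=10, artist=10, description=5, search_words=5), index_weights=(content_stemmer=7, content_metaphone=10, content_soundex=3), ranker=bm25
```

Keeping the weights in hashes means the same values can feed either Riddle's `field_weights`/`index_weights` or the SphinxQL string.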

pat commented 12 years ago

Ah, that's certainly not ideal. If you can send me an example database to query against, that'd be fantastic - pat at freelancing-gods dot com.

pat commented 12 years ago

Hi Bart - just figured it out - the key was in the documentation of Sphinx!

The issue is that, with the Sphinx API protocol (as opposed to SphinxQL), when you're grouping results, the standard sorting applies within each group of documents, i.e. it determines which document from each group is returned. To affect the overall order of the results, you need to use the groupsort value, which in Riddle is group_clause. Adding this one line made the Riddle query match the SphinxQL query:

client.group_clause   = '@weight DESC'
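To make the two sort stages concrete, here is a small pure-Ruby model. It is not Riddle or Sphinx code: `Match`, `group_matches` and the sample rows are hypothetical. Each group's representative is picked by weight (the role `sort_by = '@weight DESC'` plays once grouping is on), and the groups are then ordered either by the group attribute, mirroring the `@group desc` default of the groupsort argument in Sphinx's bundled PHP API, or by weight, which is what `group_clause = '@weight DESC'` asks for. In SphinxQL the same two stages correspond roughly to `WITHIN GROUP ORDER BY` and `ORDER BY`.

```ruby
# Hypothetical in-memory model of Sphinx API grouping; not Riddle or Sphinx code.
Match = Struct.new(:id, :group_by_id, :weight)

# Stage 1: keep the highest-weighted row of each group (the role played by
#          sort_mode/sort_by = '@weight DESC' when grouping is on).
# Stage 2: order the surviving rows (the role of groupsort, i.e. group_clause).
def group_matches(matches, group_sort:)
  best = matches.group_by(&:group_by_id).map { |_group, rows| rows.max_by(&:weight) }

  case group_sort
  when :group_desc  then best.sort_by { |m| -m.group_by_id } # like "@group desc"
  when :weight_desc then best.sort_by { |m| -m.weight }      # like "@weight DESC"
  else raise ArgumentError, "unknown group_sort: #{group_sort.inspect}"
  end
end

# Sample rows (id, group_by_id, weight); all values are made up.
matches = [
  Match.new(1, 10, 5),
  Match.new(2, 10, 9),
  Match.new(3, 20, 2),
  Match.new(4, 30, 7),
  Match.new(5, 30, 1)
]

p group_matches(matches, group_sort: :group_desc).map(&:id)  # => [4, 3, 2]
p group_matches(matches, group_sort: :weight_desc).map(&:id) # => [2, 4, 3]
```

With the group-attribute order, changing the weights can alter which row represents a group but never the order in which the groups come back, which is consistent with the weights appearing to be ignored.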

Thanks for your patience, and for the data set to test against, it helped enormously.

barttenbrinke commented 12 years ago

Wow, that's really impressive! Wouldn't it be a smart move to make this Riddle's default? I can't think of any real-life situation where you would not want to sort by @weight DESC by default.

pat commented 12 years ago

Agreed, just pushed a commit that does that.

barttenbrinke commented 12 years ago

\0/

pat commented 12 years ago

Closing this issue, but if anything else crops up, re-open or log another :)

barttenbrinke commented 12 years ago

I wrote a blog post about it here: http://www.retrosync.com/blog/2012/05/15/Indexing-Redis-with-Sphinx-and-Ruby/