pat / thinking-sphinx

Sphinx/Manticore plugin for ActiveRecord/Rails
http://freelancing-gods.com/thinking-sphinx
MIT License

Question about user_weight #1244

Open lesliev opened 1 year ago

lesliev commented 1 year ago

In our project we are not setting user_weight manually anywhere that I can see. But if I run this query, the weights are mostly 1s and 2s: select *, weight() from article_core where match('beatl') OPTION ranker=expr('sum(user_weight)');

Then if I run this I get 2's and 3's: select *, weight() from person_core where match('beatl') OPTION ranker=expr('sum(user_weight)')

I think this explains a mysterious bias where person records are being ranked quite a lot higher than article records.

Does ThinkingSphinx set user_weight for each document somehow? I've searched the whole project for user_weight and I don't see it being set anywhere.

I assume the default ranker is being used to calculate the final weight: SPH_RANK_PROXIMITY_BM25 = sum(lcs*user_weight)*1000+bm25
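To make the arithmetic in that formula concrete, here is a rough Ruby sketch. It is not Sphinx or Thinking Sphinx code: the `proximity_bm25` helper, the hash layout, and the bm25 value of 500 are all made up for illustration, and it assumes lcs is 1 in every matched field for a single-keyword query.

```ruby
# Illustrative only: mirrors sum(lcs*user_weight)*1000+bm25 in plain Ruby.
# `fields` lists the matched fields of one document (hypothetical layout).
def proximity_bm25(fields, bm25)
  fields.sum { |f| f[:lcs] * f[:user_weight] } * 1000 + bm25
end

# Assumed values: lcs of 1 per matched field, default user_weight of 1,
# and an arbitrary bm25 of 500.
two_fields   = Array.new(2) { { lcs: 1, user_weight: 1 } }
three_fields = Array.new(3) { { lcs: 1, user_weight: 1 } }

puts proximity_bm25(two_fields, 500)   # 2 * 1000 + 500
puts proximity_bm25(three_fields, 500) # 3 * 1000 + 500
```

Under these assumptions each additional matched field adds 1000 to the final weight, which would be enough to outrank any difference in bm25.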

lesliev commented 1 year ago

I've read some more and Sphinx docs refer to user_weight as the "user field weight". Perhaps it's referring to the field weights that are configured using set_property here: https://freelancing-gods.com/thinking-sphinx/v5/searching.html#fieldweights

But I don't see this being set in any of my index files; the only property I'm setting is set_property delta: true. So why are the user_weight values different for each document?

mysql> select weight() from article_core where match('beatl') OPTION ranker=expr('sum(user_weight)');
+----------+
| weight() |
+----------+
|        2 |
|        2 |
|        2 |
|        2 |
|        1 |
|        1 |
|        1 |
|        1 |
|        1 |
|        1 |
|        1 |
|        1 |
|        1 |
|        1 |
|        1 |
|        1 |
|        1 |
|        1 |
|        1 |
|        1 |
+----------+
20 rows in set (0.00 sec)

mysql> select weight() from person_core where match('beatl') OPTION ranker=expr('sum(user_weight)');
+----------+
| weight() |
+----------+
|        3 |
|        3 |
|        3 |
|        2 |
|        2 |
|        2 |
|        2 |
|        2 |
|        2 |
|        2 |
|        2 |
|        2 |
|        2 |
|        2 |
|        2 |
|        2 |
|        2 |
|        2 |
|        2 |
|        2 |
+----------+
20 rows in set (0.01 sec)

From: http://sphinxsearch.com/docs/current.html#sphinxql-select

user_weight (integer), the user specified per-field weight (refer to SetFieldWeights() in SphinxAPI and OPTION field_weights in SphinxQL respectively). The weights default to 1 if not specified explicitly.
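As far as I understand the expression ranker, sum() aggregates its argument over the fields that matched the query. If that is right, then with every weight left at the default of 1, sum(user_weight) would simply be the number of fields in which the keyword matched. A minimal Ruby sketch of that reading follows; the `sum_user_weight` helper and the field names are hypothetical and not taken from my indices.

```ruby
# Weights default to 1 when not set, per the docs quoted above.
DEFAULT_FIELD_WEIGHT = 1

# Mimics ranker=expr('sum(user_weight)') under the assumption that sum()
# runs over matched fields only. `field_weights` stands in for
# OPTION field_weights and is empty here, as in my setup.
def sum_user_weight(matched_fields, field_weights = {})
  matched_fields.sum { |field| field_weights.fetch(field, DEFAULT_FIELD_WEIGHT) }
end

# Hypothetical field names, for illustration only.
article = sum_user_weight(%w[title body])
person  = sum_user_weight(%w[first_name last_name bio])

puts "article: #{article}, person: #{person}"
```

If person_core has more indexed fields containing the keyword than article_core, that alone could produce the 2s and 3s versus 1s and 2s shown in the output above.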