vespa-engine / vespa

AI + Data, online. https://vespa.ai
Apache License 2.0

Increasing weight is not affecting the order of docs returned #25286

Closed: 107dipan closed this issue 1 year ago

107dipan commented 1 year ago

Describe the bug: We are trying to perform a search with a weight added to one of the search clauses:

select * from sources * where (boost_s contains ({weight:500} "BoostTerm") OR table contains "newTableName" )

To Reproduce: Steps to reproduce the behavior:

  1. Perform a search query with a weight added to a term.
  2. We are using the default ranking, i.e. nativeRank.
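A minimal Python sketch of these steps, assuming the standard query API parameters `yql`, `ranking` (the rank profile name; `default` is the profile used when none is specified) and `traceLevel`. The helper `build_query_params` is hypothetical: it only builds the request parameters for a sweep of weights, so send them with any HTTP client and compare the `relevance` of each hit across weights.

```python
from urllib.parse import urlencode


def build_query_params(weight: int, trace_level: int = 3) -> dict:
    """Build query parameters for the weighted OR query (hypothetical helper).

    Only the boosted term carries a weight annotation; the second
    clause keeps the default term weight.
    """
    yql = (
        "select * from sources * where "
        f'(boost_s contains ({{weight:{weight}}} "BoostTerm") '
        'OR table contains "newTableName")'
    )
    return {
        "yql": yql,
        "ranking": "default",  # default rank profile, which uses nativeRank
        "traceLevel": trace_level,
    }


# Sweep a few weights so the returned relevance values can be compared.
for w in (1, 100, 500):
    print(urlencode(build_query_params(w)))
```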

YQL query from the traceLevel logs:

select * from sources * where (boost_s contains ({stem: false, normalizeCase: false, id: 1, weight: 500}"boostterm") OR table contains ({origin: {original: "gsam_clientdoc", offset: 0, length: 14}, id: 2}phrase("gsam", "clientdoc"))) timeout 499

Expected behavior: After adding the weight, we would expect the document matching the weighted clause to be ranked higher, but that is not the case.

Environment:

Infrastructure: Kubernetes
Version: Vespa 8.80.20

jobergum commented 1 year ago

It depends on the rank expression used in the rank profile. The order does not necessarily change, but the score will.
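To illustrate how a weight can change scores without changing the order, here is a toy model in Python. It is an assumption for illustration only: each document scores the sum of the weights of the query terms it matches. Vespa's nativeRank is computed differently, so real values will differ. Documents A, B and C are hypothetical.

```python
def toy_score(matched_terms: set[str], weights: dict[str, float]) -> float:
    """Toy model: sum the weights of the query terms a document matches."""
    return sum(weights[t] for t in matched_terms)


# Hypothetical documents and the query terms each one matches.
docs = {
    "A": {"BoostTerm", "newTableName"},  # matches both clauses
    "B": {"BoostTerm"},                  # matches only the weighted clause
    "C": {"newTableName"},               # matches only the unweighted clause
}


def rank(boost_weight: float) -> list[tuple[str, float]]:
    """Score every document and sort by descending toy score."""
    weights = {"BoostTerm": float(boost_weight), "newTableName": 1.0}
    scored = [(doc, toy_score(terms, weights)) for doc, terms in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


print(rank(100))  # [('A', 101.0), ('B', 100.0), ('C', 1.0)]
print(rank(500))  # [('A', 501.0), ('B', 500.0), ('C', 1.0)]
```

Raising the weight from 100 to 500 moves the scores of A and B, but the ranking stays A, B, C because the same documents still match the same terms.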

107dipan commented 1 year ago

We have not added any rank expression or rank profile to our application yet.

jobergum commented 1 year ago

Then you will be using nativeRank, and as I said, a weight boost does not necessarily change the order.

107dipan commented 1 year ago

If I understand correctly, even if the order does not change, the relevance score of the document should change. I tried increasing and decreasing the weight, but the relevance of the document for this query still does not change.
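To check whether the relevance of a particular document actually changes, one can compare two responses hit by hit. This sketch assumes the JSON result format, where hits are listed under `root.children` with `id` and `relevance` keys. The helper names and the sample responses are hypothetical, and the numbers are placeholders, not measured results.

```python
def relevance_by_id(response: dict) -> dict[str, float]:
    """Map each hit's id to its relevance in a JSON query response."""
    children = response.get("root", {}).get("children", [])
    return {hit["id"]: hit["relevance"] for hit in children}


def relevance_changes(before: dict, after: dict) -> dict[str, tuple[float, float]]:
    """Return hits present in both responses whose relevance differs."""
    old, new = relevance_by_id(before), relevance_by_id(after)
    return {
        hit_id: (old[hit_id], new[hit_id])
        for hit_id in old.keys() & new.keys()
        if old[hit_id] != new[hit_id]
    }


# Placeholder responses (made-up values) in the assumed result shape.
before = {"root": {"children": [
    {"id": "id:ns:doc::1", "relevance": 0.20},
    {"id": "id:ns:doc::2", "relevance": 0.10},
]}}
after = {"root": {"children": [
    {"id": "id:ns:doc::1", "relevance": 0.35},
    {"id": "id:ns:doc::2", "relevance": 0.10},
]}}
print(relevance_changes(before, after))  # {'id:ns:doc::1': (0.2, 0.35)}
```

An empty result for two different weights would confirm that the relevance of the matched hits really did not move.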

jobergum commented 1 year ago

This is standard functionality. Here is an example of it:

select id,title,abstract from sources * where title contains ({weight:100} "covid-19") OR title contains "sars-cov-2" 

I'm limiting retrieval to a specific id using the recall parameter, where id is a field in the schema that uniquely identifies the hit.
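A sketch of pinning the example query to a single document with the recall parameter. The `+id:<value>` form is an assumption based on the simple query syntax, with `id` being the unique schema field mentioned above; the helper `pinned_query` and the id value `123` are hypothetical.

```python
from urllib.parse import urlencode


def pinned_query(yql: str, doc_id: str) -> str:
    """Encode a query whose results are restricted to one document (hypothetical helper)."""
    params = {
        "yql": yql,
        # Assumed simple-query syntax: "+" marks the id term as required.
        "recall": f"+id:{doc_id}",
    }
    return urlencode(params)


yql = ('select id,title,abstract from sources * where '
       'title contains ({weight:100} "covid-19") OR title contains "sars-cov-2"')
print(pinned_query(yql, "123"))
```

Because recall only restricts which documents are retrieved, running the pinned query with different weights makes any change in that one hit's relevance easy to spot.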