vespa-engine / vespa

AI + Data, online. https://vespa.ai
Apache License 2.0

Calculate nearest neighbor search rank features on-demand #22016

Closed jobergum closed 2 years ago

jobergum commented 2 years ago

If you retrieve using a hybrid OR of weakAnd and nearestNeighbor, like this:

OR 
 (NEAREST_NEIGHBOR {field=doc_vector,queryTensorName=query_vector,approximate=true,targetHits=10})
 (WEAKAND(10) text:term1 text:term2)

The rank features associated with nearestNeighbor, such as rawScore(doc_vector) or closeness(field, doc_vector), are 0 for hits retrieved only by the weakAnd query operator. A ranking expression like this

rank-profile hybrid {
  first-phase {
     expression: closeness(field, doc_vector) + bm25(text)
  }
}

will compute bm25(text) regardless of which operator brought the hit into ranking, but closeness(field, doc_vector) is non-zero only for the ten hits retrieved by the nearestNeighbor operator.

You can work around this by computing the distance with an explicit tensor expression instead of relying on the closeness/rawScore features.
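
To make the workaround concrete, here is a minimal Python sketch of the arithmetic such an explicit expression would perform for every hit, whichever operator retrieved it. It assumes a Euclidean distance metric and the closeness = 1 / (1 + distance) relationship; the function names are hypothetical and this is not Vespa ranking expression syntax.

```python
import math


def euclidean_distance(query_vector, doc_vector):
    # Assumption: the field uses a Euclidean distance metric.
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query_vector, doc_vector)))


def closeness(query_vector, doc_vector):
    # Assumption: closeness is defined as 1 / (1 + distance).
    return 1.0 / (1.0 + euclidean_distance(query_vector, doc_vector))


def first_phase(query_vector, doc_vector, bm25_text):
    # Mirrors "closeness(field, doc_vector) + bm25(text)", but the closeness
    # term is computed directly from the vectors, so it is never silently 0
    # for a hit that only the weakAnd operator retrieved.
    return closeness(query_vector, doc_vector) + bm25_text
```

The trade-off is that the distance is computed for every hit exposed to first-phase ranking, not only for those the nearestNeighbor operator retrieved.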

geirst commented 2 years ago

This is fixed in 8.15.51 (#23437). The rank features closeness(field,my_doc_vector), closeness(label,my_query_term), distance(field,my_doc_vector) and distance(label,my_query_term) now calculate the raw score (closeness / distance) on demand for all documents exposed to ranking that did not have this calculated as part of matching.
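
The on-demand behaviour described above can be sketched roughly as follows. This is an illustration of the idea only, not the actual Vespa implementation: the `Hit` type, the `matched_distance` field, and the Euclidean metric are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Hit:
    doc_vector: Sequence[float]
    # Hypothetical field: set only when nearestNeighbor computed the
    # distance during matching; None for hits retrieved by weakAnd only.
    matched_distance: Optional[float] = None


def distance_on_demand(hit, query_vector):
    # Reuse the value from matching when it exists.
    if hit.matched_distance is not None:
        return hit.matched_distance
    # Otherwise compute it at ranking time (Euclidean metric assumed).
    return math.dist(query_vector, hit.doc_vector)


def closeness_on_demand(hit, query_vector):
    # Assumption: closeness = 1 / (1 + distance).
    return 1.0 / (1.0 + distance_on_demand(hit, query_vector))
```

With this logic, a hit retrieved only by weakAnd gets the same closeness value it would have had if nearestNeighbor had retrieved it.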

System test added in https://github.com/vespa-engine/system-test/pull/2480.

Preparations in: #23277, #23298, #23344, #23362, #23383, #23397.