vespa-engine / vespa

AI + Data, online. https://vespa.ai
Apache License 2.0

model.filter/filter is ignored with yql while recall works as expected #28769

Open jobergum opened 9 months ago

jobergum commented 9 months ago

Using recall in combination with YQL

Using recall

This is identical to filter, except that recall terms are not exposed to the ranking framework and thus not ranked. As such, one cannot use unprefixed terms; each term must be either positive (+) or negative (-).
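The prefix rule quoted above can be checked on the client before sending a request. The following is only a rough sketch: `unprefixed_terms` is a hypothetical helper, and splitting with `shlex` merely approximates how Vespa tokenizes the recall string (it keeps quoted phrases together but is not Vespa's parser).

```python
import shlex


def unprefixed_terms(recall: str) -> list[str]:
    """Return the terms that lack a leading '+' or '-'.

    Hypothetical client-side check, not part of Vespa. shlex.split keeps a
    quoted phrase such as text:"a b" as a single token (quotes removed).
    """
    return [t for t in shlex.split(recall) if not t.startswith(("+", "-"))]


# The recall string used in this issue has no unprefixed terms.
recall = '+text:ARIMA +text:"novel coronavirus illness"'
problems = unprefixed_terms(recall)
```

An empty list means every term is marked positive or negative. Note that `shlex.split` raises `ValueError` on an unbalanced quote, which this sketch does not handle.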

{'yql': 'select id,title,text from sources * where {targetHits:10}nearestNeighbor(embedding, q)', 'query': 'how does the coronavirus respond to changes in the weather', 'ranking.profile': 'dense', 'presentation.format.tensors': 'short-value', 'hits': 3, 'language': 'en', 'timeout': '15s', 'presentation.timing': 'true', 'input.query(q)': 'embed(bge, "Represent this sentence for searching relevant passages: how does the coronavirus respond to changes in the weather")', 'recall': '+text:ARIMA +text:"novel coronavirus illness"', 'tracelevel': 3}

This gives the following correct query tree:

{'message': 'sc0.num0 search to dispatch: query=[AND NEAREST_NEIGHBOR {field=embedding,queryTensorName=q,hnsw.exploreAdditionalHits=0,distanceThreshold=Infinity,approximate=true,targetHits=10} |text:arima |text:"novel coronaviru illness"] timeout=14977ms offset=0 hits=3 rankprofile[dense]

This is the expected query tree: the recall terms are applied as filter terms (not highlighted), and ranking is disabled for them.

Using model.filter in combination with YQL

{'yql': 'select id,title,text from sources * where {targetHits:10}nearestNeighbor(embedding, q)', 'query': 'how does the coronavirus respond to changes in the weather', 'ranking.profile': 'dense', 'presentation.format.tensors': 'short-value', 'hits': 3, 'language': 'en', 'timeout': '15s', 'presentation.timing': 'true', 'input.query(q)': 'embed(bge, "Represent this sentence for searching relevant passages: how does the coronavirus respond to changes in the weather")', 'filter': '+text:ARIMA +text:"novel coronavirus illness"', 'tracelevel': 3}

This gives the following incorrect query tree (the filter is silently dropped):

sc0.num0 search to dispatch: query=[NEAREST_NEIGHBOR {field=embedding,queryTensorName=q,hnsw.exploreAdditionalHits=0,distanceThreshold=Infinity,approximate=true,targetHits=10}] timeout=14960ms offset=0 hits=3 rankprofile[dense]
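
To make the comparison easier to repeat, here is a minimal sketch that builds both request bodies from one base request, so they differ only in the parameter name, and checks a dispatch trace line for the filter terms. `build_request` and `filter_terms_dispatched` are hypothetical helper names, the base request is trimmed to the fields that matter for this issue, and the trace check is a plain string match based on the two trace lines shown above, not a parser for the trace format.

```python
# Base request shared by both variants (trimmed from the requests in this issue).
BASE_REQUEST = {
    "yql": "select id,title,text from sources * where "
           "{targetHits:10}nearestNeighbor(embedding, q)",
    "query": "how does the coronavirus respond to changes in the weather",
    "ranking.profile": "dense",
    "hits": 3,
    "input.query(q)": 'embed(bge, "Represent this sentence for searching relevant '
                      'passages: how does the coronavirus respond to changes in the weather")',
    "tracelevel": 3,
}

TERMS = '+text:ARIMA +text:"novel coronavirus illness"'


def build_request(param: str) -> dict:
    """Return the base request with TERMS passed as 'recall' or 'filter'."""
    if param not in ("recall", "filter"):
        raise ValueError(f"unsupported parameter: {param}")
    return {**BASE_REQUEST, param: TERMS}


def filter_terms_dispatched(trace_message: str) -> bool:
    """True if the dispatched query tree ANDs in '|text:' filter terms."""
    return "query=[AND " in trace_message and "|text:" in trace_message


recall_request = build_request("recall")
filter_request = build_request("filter")
```

Sending the bodies is left out because the issue does not state the endpoint. Applied to the two trace lines above, `filter_terms_dispatched` returns `True` for the recall trace and `False` for the filter trace.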

jobergum commented 7 months ago

Any updates on this @bjorncs ?

bjorncs commented 7 months ago

No.