opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

k-NN query rescore support for native engines #1984

Closed jmazanec15 closed 1 month ago

jmazanec15 commented 1 month ago

Description

Implements rescoring. It uses the rescoring context in the query builder to execute a two-phase search: it first oversamples the ANN index, then reduces the results down to the oversample factor, and finally rescores the remaining candidates and returns them.
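The two-phase flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `RescoreSketch`, its method names, and the plain doc-id lists are hypothetical, and it assumes the first pass keeps `ceil(k * oversampleFactor)` candidates before the exact rescoring pass trims them to `k`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of oversample-then-rescore; not the plugin's actual classes.
final class RescoreSketch {

    /**
     * Phase 1 ranks all docs by an approximate score and keeps an oversampled
     * candidate set; phase 2 re-ranks only those candidates by an exact score
     * and returns the top k doc ids (higher score is better).
     */
    static List<Integer> search(List<Integer> docIds,
                                ToDoubleFunction<Integer> approxScore,
                                ToDoubleFunction<Integer> exactScore,
                                int k,
                                double oversampleFactor) {
        // Assumption: the first pass retrieves ceil(k * oversampleFactor) candidates.
        int firstPassK = (int) Math.ceil(k * oversampleFactor);
        List<Integer> candidates = topN(docIds, approxScore, firstPassK);
        return topN(candidates, exactScore, k);
    }

    /** Returns up to n ids, ordered by descending score. */
    static List<Integer> topN(List<Integer> ids, ToDoubleFunction<Integer> scorer, int n) {
        List<Integer> sorted = new ArrayList<>(ids);
        sorted.sort(Comparator.comparingDouble(scorer).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }
}
```

With an oversample factor of 1.0 the second pass can only reorder what the approximate pass already found; larger factors give the exact scorer a chance to recover documents that the approximate scores ranked too low.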

Shared code for exact search was pulled out into an exact searcher class. Rescore search functionality is included in a class called RescoreSearcher. A few sanity integration tests have been added alongside the existing ones; I intend to add more once the quantization framework is integrated.

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

jmazanec15 commented 1 month ago

Overall, I think there is potential to merge firstPass to reduce repeated search logic. Here is the pseudocode; see if it works out.

@shatejas I didn't want to have to modify the k parameter in the k-NN query, so I kept them separate.

shatejas commented 1 month ago

Can't you create a local k parameter inside either KNNWeight or the native query? That way you don't manipulate the one in the query.
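As a rough illustration of that suggestion (all names here are hypothetical stand-ins, not the plugin's real `KNNWeight` or query classes), the oversampled k can be derived as a local value so that the query's own k is never modified:

```java
// Hypothetical sketch: derive a local first-pass k without touching the query.
final class LocalKSketch {

    /** Stand-in for the k-NN query; k is set once and never changed. */
    static final class Query {
        private final int k;

        Query(int k) {
            this.k = k;
        }

        int getK() {
            return k;
        }
    }

    /** Computes the first-pass k locally; the query's k is only read. */
    static int firstPassK(Query query, float oversampleFactor) {
        return (int) Math.ceil(query.getK() * oversampleFactor);
    }
}
```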

jmazanec15 commented 1 month ago

The searchLeaf method seems to have different implementations for different scenarios (such as with or without rescoring), and they are spread across several places: one in KNNWeight and another in exact search, which internally calls only the exact search class's searchLeaf. We should consider unifying this experience, either with a strategy pattern or a functional interface, and have one class with a searchLeaf method that we can look into.

@Vikasht34 I think the main goal of using KNNWeight calls in NativeKNNVectorQuery was to avoid too much change around that. I think we can change this as well, but I will do it in a future review so that I can move on to working on mode and compression. Does that sound good?
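For reference, the strategy / functional-interface idea raised in this thread might look something like the sketch below. It is only one possible shape, under stated assumptions: `LeafSearchStrategy`, `scoring`, and `rescoring` are hypothetical names, and a segment is modeled as a plain list of doc ids rather than a Lucene leaf context.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of a single searchLeaf entry point with pluggable strategies.
final class SearchLeafSketch {

    /** One searchLeaf signature shared by every search scenario. */
    @FunctionalInterface
    interface LeafSearchStrategy {
        List<Integer> searchLeaf(List<Integer> segmentDocIds, int k);
    }

    /** Strategy that ranks the segment's docs with the given scorer (higher is better). */
    static LeafSearchStrategy scoring(ToDoubleFunction<Integer> scorer) {
        return (docIds, k) -> {
            List<Integer> sorted = new ArrayList<>(docIds);
            sorted.sort(Comparator.comparingDouble(scorer).reversed());
            return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
        };
    }

    /** Strategy that oversamples with firstPass, then re-ranks those candidates with secondPass. */
    static LeafSearchStrategy rescoring(LeafSearchStrategy firstPass,
                                        LeafSearchStrategy secondPass,
                                        int oversample) {
        return (docIds, k) ->
            secondPass.searchLeaf(firstPass.searchLeaf(docIds, k * oversample), k);
    }
}
```

In this shape, callers only ever see `searchLeaf`; whether a search is approximate, exact, or rescored is decided by which strategy is composed, not by which class the method lives in.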