opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

Split segment by search type #2273

Open VijayanB opened 1 week ago

VijayanB commented 1 week ago

Description

For exact search, quantization is not required during rescoring with oversampling. However, to avoid normalizing scores between segments that use approximate search and segments that use exact search, we will first identify the segments that need approximate search and perform oversampling on them. At the end, after rescoring, we will add the scores from the segments that use exact search.
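As a rough illustration of the flow described above (not the plugin's actual code), the split can be sketched in Python. Everything here is an assumption made for the example: the `Segment` class, the `use_exact` flag, the toy 1-bit sign quantization, the `1 / (1 + d^2)` score, and the default `oversample_factor`. Doc ids are assumed to be unique across segments.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Segment:
    vectors: dict     # doc id -> full-precision vector (list of floats)
    use_exact: bool   # hypothetical flag: True when the segment is searched exactly


def full_score(query, vector):
    # Score from squared L2 distance on full-precision values; higher is better.
    dist_sq = sum((q - v) ** 2 for q, v in zip(query, vector))
    return 1.0 / (1.0 + dist_sq)


def quantize(vector):
    # Toy 1-bit quantization (illustrative only): keep the sign of each dimension.
    return [1 if x > 0 else 0 for x in vector]


def approx_candidates(query, segment, n):
    # First pass on quantized values: rank by Hamming distance, keep n doc ids.
    q_bits = quantize(query)

    def hamming(doc_id):
        return sum(a != b for a, b in zip(q_bits, quantize(segment.vectors[doc_id])))

    return heapq.nsmallest(n, segment.vectors, key=hamming)


def search(query, segments, k, oversample_factor=2):
    # Step 1: split segments by search type.
    approx = [s for s in segments if not s.use_exact]
    exact = [s for s in segments if s.use_exact]

    # Step 2: oversample on approximate segments, then rescore the candidates
    # with full-precision vectors and keep the best k.
    results = []
    for seg in approx:
        for doc_id in approx_candidates(query, seg, k * oversample_factor):
            results.append((full_score(query, seg.vectors[doc_id]), doc_id))
    results = heapq.nlargest(k, results)

    # Step 3: only after rescoring, add scores from exact-search segments.
    # These never touch quantized values, so no oversampling is needed.
    for seg in exact:
        for doc_id, vector in seg.vectors.items():
            results.append((full_score(query, vector), doc_id))

    # Both groups now carry full-precision scores, so they merge directly.
    return heapq.nlargest(k, results)
```

Because both groups end up with scores computed from full-precision vectors, the final merge needs no normalization between them.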

Related Issues

#2215

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

jmazanec15 commented 3 days ago

Should we skip quantization on indexing if we are not using it here then?

navneet1v commented 3 days ago

+1. I think it is a valid point, and we should do that too. But @VijayanB, there were a couple more ideas we were discussing on how to fix this issue. Did you give those some thought, such as why we should split the segments by search type?

heemin32 commented 12 hours ago

Another option could be to read the quantized values directly from the native engine file, as discussed here: https://github.com/opensearch-project/k-NN/issues/2266. This approach would also address cases where the search results are fewer than k, and we fall back to an exact search.