opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

[Bug-Fix] Fix efficient filtering of vector search when quantization is used #2076

Closed navneet1v closed 2 months ago

navneet1v commented 2 months ago

Description


Issue

When efficient filtering is used, the k-NN plugin decides at the per-segment level whether to do a filtered ANN search or an exact search. When quantization is present, the first-pass search happens on the quantized vectors. Currently, only ANN search runs on quantized vectors; exact search does not. This can lead to different first-pass scores within a shard if its segments take different routes for the filtered search.
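The score mismatch described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the 1-bit sign quantizer, the Hamming and L2 scoring functions, and all names are hypothetical and are not the plugin's actual code. It only shows that the same query/vector pair gets a different score depending on whether a segment scores quantized or full-precision vectors.

```python
# Illustrative only: the 1-bit sign quantizer and both scoring functions are
# assumptions for this sketch, not the k-NN plugin's implementation.

def quantize(vector):
    """Hypothetical 1-bit quantizer: 1 where the component is positive, else 0."""
    return [1 if x > 0 else 0 for x in vector]


def hamming_score(a_bits, b_bits):
    """Score on quantized vectors: 1 / (1 + Hamming distance)."""
    distance = sum(x != y for x, y in zip(a_bits, b_bits))
    return 1.0 / (1.0 + distance)


def l2_score(a, b):
    """Score on full-precision vectors: 1 / (1 + squared L2 distance)."""
    distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + distance)


query = [0.9, -0.2, 0.4, -0.7]
doc = [0.5, -0.1, -0.3, -0.6]

# A segment that takes the ANN route scores against quantized vectors.
score_ann_route = hamming_score(quantize(query), quantize(doc))

# A segment that takes the exact route scored full-precision vectors before the fix.
score_exact_route_before = l2_score(query, doc)

# With quantized exact search, the exact route uses the same scoring space as ANN.
score_exact_route_after = hamming_score(quantize(query), quantize(doc))
```

In this toy setup the two routes disagree before the change and agree after it, which is the consistency the PR is after when first-pass results from several segments of a shard are merged.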

What does this PR do?

This PR adds the capability to do a quantized exact search when exact search happens during filtered vector search. This is achieved by passing the segment-level quantization information to the KNNIterators, which iterate over the vectors. It also adds the capability to specify whether to search on quantized or non-quantized vectors.
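A minimal sketch of that idea follows, written in Python rather than the plugin's Java for brevity. The class and parameter names (`SegmentQuantization`, `FilteredExactIterator`, `use_quantized`), the sign quantizer, and the scoring functions are all hypothetical; the sketch also assumes the query is quantized once up front. It is not the actual KNNIterator API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class SegmentQuantization:
    """Hypothetical stand-in for segment-level quantization information."""
    quantize: Callable[[Sequence[float]], List[int]]


def sign_quantize(vector):
    """Assumed 1-bit quantizer: 1 where the component is positive, else 0."""
    return [1 if x > 0 else 0 for x in vector]


def hamming_score(a_bits, b_bits):
    return 1.0 / (1.0 + sum(x != y for x, y in zip(a_bits, b_bits)))


def l2_score(a, b):
    return 1.0 / (1.0 + sum((x - y) ** 2 for x, y in zip(a, b)))


class FilteredExactIterator:
    """Scores every filtered doc id of one segment (exact search)."""

    def __init__(self, filtered_doc_ids, vectors, query,
                 quantization=None, use_quantized=True):
        self.filtered_doc_ids = filtered_doc_ids
        self.vectors = vectors
        # The flag lets the caller choose quantized or full-precision scoring.
        self.quantization = quantization if use_quantized else None
        # Assumption: the query is quantized once, not once per document.
        if self.quantization is not None:
            self.query = self.quantization.quantize(query)
        else:
            self.query = query

    def __iter__(self):
        for doc_id in self.filtered_doc_ids:
            stored = self.vectors[doc_id]
            if self.quantization is not None:
                # Stored vectors are full precision, so quantize on the fly.
                bits = self.quantization.quantize(stored)
                yield doc_id, hamming_score(self.query, bits)
            else:
                yield doc_id, l2_score(self.query, stored)


vectors = {
    0: [0.5, -0.1, -0.3, -0.6],
    1: [-0.8, 0.3, 0.2, 0.1],
    2: [0.7, -0.4, 0.6, -0.2],
}
query = [0.9, -0.2, 0.4, -0.7]
info = SegmentQuantization(quantize=sign_quantize)

# Only doc ids 0 and 2 pass the filter in this toy segment.
quantized_results = list(FilteredExactIterator([0, 2], vectors, query, info))
full_results = list(
    FilteredExactIterator([0, 2], vectors, query, info, use_quantized=False)
)
```

Passing the quantization information into the iterator keeps the per-segment decision (ANN or exact) separate from the choice of scoring space, so both routes can produce comparable first-pass scores.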

Testing

Performed manual testing for filtered vector search. More ITs and UTs for the change will be added. Raising the PR now to get a first pass on the code.

Related Issue

https://github.com/opensearch-project/k-NN/issues/1949

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

navneet1v commented 2 months ago

Are there any latency considerations, given that each vector from the flat values will be quantized for efficient filters? It might be worth calling out whether some impact is expected and whether it can be considered when optimizing efficient filters.

Because the quantized vectors are not stored in the segment, we have to quantize on the fly, so there will be a latency impact. However, we don't have a baseline for this: QF was added together with on-disk support, so this is the first time this flow exists. For all the older cases there should ideally be no latency impact, at least from this change.