navneet1v opened 1 month ago
Maybe related to this. We ran some latency benchmarks on the different compression levels for mode=on_disk on 2.18. Upon inspecting the segments, we saw multiple ones with fewer than 15k documents. Setting index.knn.advanced.approximate_threshold=0 for >=8x compression, we recover the expected latencies.
@frejonb thanks for providing the details. @VijayanB is looking into the issue.
Opportunity
Currently in the plugin, whenever we do a disk-based vector search we first do an ANN search on the quantized index and then rescore using the full-precision vectors. ref: https://github.com/opensearch-project/k-NN/blob/7cf45c8cfd4219912d67becb731f8a7af61410c0/src/main/java/org/opensearch/knn/index/query/nativelib/NativeEngineKnnVectorQuery.java#L64-L75
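To make the discussion concrete, here is a minimal, self-contained sketch of that two-phase flow (ANN over 1-bit quantized vectors scored with hamming distance, then rescoring the top candidates with the full-precision vectors). All names and the in-memory data are illustrative, not the plugin's actual classes:

```java
import java.util.*;
import java.util.stream.*;

public class TwoPhaseSketch {
    // 1-bit quantization: keep only the sign of each dimension.
    static boolean[] quantize(float[] v) {
        boolean[] bits = new boolean[v.length];
        for (int i = 0; i < v.length; i++) bits[i] = v[i] > 0f;
        return bits;
    }

    static int hamming(boolean[] a, boolean[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) if (a[i] != b[i]) d++;
        return d;
    }

    static float l2(float[] a, float[] b) {
        float s = 0f;
        for (int i = 0; i < a.length; i++) { float diff = a[i] - b[i]; s += diff * diff; }
        return s;
    }

    // Phase 1: rank all docs by hamming distance on the quantized vectors.
    // Phase 2: re-read the full-precision vectors of the top
    // k * rescoreFactor candidates and rescore with exact L2 distance.
    static List<Integer> searchTopK(float[] query, float[][] docs, int k, int rescoreFactor) {
        boolean[] qBits = quantize(query);
        Integer[] ids = new Integer[docs.length];
        for (int i = 0; i < docs.length; i++) ids[i] = i;
        Arrays.sort(ids, Comparator.comparingInt(i -> hamming(qBits, quantize(docs[i]))));
        return Arrays.stream(ids).limit((long) k * rescoreFactor)
            .sorted(Comparator.comparingDouble(i -> l2(query, docs[i])))
            .limit(k).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        float[][] docs = { {1f, 1f}, {-1f, -1f}, {0.9f, 1.1f}, {5f, 5f} };
        System.out.println(searchTopK(new float[]{1f, 1f}, docs, 2, 2));
    }
}
```

In the real plugin the phase-2 reads come from disk, which is exactly the cost the rest of this issue is about.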
But this flow of first doing an ANN search does not always hold when efficient filters are present. With efficient filters we first check whether exact search can be done; if it can, we do exact search rather than ANN search. ref: https://github.com/opensearch-project/k-NN/blob/7cf45c8cfd4219912d67becb731f8a7af61410c0/src/main/java/org/opensearch/knn/index/query/KNNWeight.java#L132-L156
For a disk-based index, this exact search during filtering first fetches the full-precision vectors from disk and does binary quantization on them. ref: https://github.com/opensearch-project/k-NN/blob/7cf45c8cfd4219912d67becb731f8a7af61410c0/src/main/java/org/opensearch/knn/index/query/iterators/VectorIdsKNNIterator.java#L91-L101 The same vectors are read again when we do the rescoring. Hence I think we can remove this double reading of vectors: during the first pass itself we should calculate the correct (full-precision) scores, since we already have the vectors with us.
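A hedged sketch of the proposal, with a simulated vector store that counts reads so the saving is visible (names are illustrative, not the actual VectorIdsKNNIterator API). The full-precision vector is already in memory for quantization, so the exact score can be computed in the same pass and the second read skipped:

```java
import java.util.*;

public class SinglePassSketch {
    // Simulated on-disk vector store that counts how often it is read.
    static class VectorReader {
        final float[][] storage;
        int reads = 0;
        VectorReader(float[][] storage) { this.storage = storage; }
        float[] read(int docId) { reads++; return storage[docId]; }
    }

    static float l2(float[] a, float[] b) {
        float s = 0f;
        for (int i = 0; i < a.length; i++) { float d = a[i] - b[i]; s += d * d; }
        return s;
    }

    // Current flow: pass 1 reads each filtered vector to quantize and
    // score it; pass 2 (rescoring) reads the same vector a second time.
    static Map<Integer, Float> currentFlow(VectorReader reader, float[] query, int[] filteredIds) {
        for (int id : filteredIds) { reader.read(id); /* quantize + hamming score */ }
        Map<Integer, Float> exact = new HashMap<>();
        for (int id : filteredIds) exact.put(id, l2(query, reader.read(id))); // second read
        return exact;
    }

    // Proposed flow: compute the exact score while the full-precision
    // vector is already in memory; no second read needed.
    static Map<Integer, Float> proposedFlow(VectorReader reader, float[] query, int[] filteredIds) {
        Map<Integer, Float> exact = new HashMap<>();
        for (int id : filteredIds) exact.put(id, l2(query, reader.read(id)));
        return exact;
    }

    public static void main(String[] args) {
        float[][] vecs = { {1f, 0f}, {0f, 1f}, {1f, 1f} };
        VectorReader a = new VectorReader(vecs);
        VectorReader b = new VectorReader(vecs);
        Map<Integer, Float> s1 = currentFlow(a, new float[]{1f, 0f}, new int[]{0, 2});
        Map<Integer, Float> s2 = proposedFlow(b, new float[]{1f, 0f}, new int[]{0, 2});
        System.out.println("reads current=" + a.reads + " proposed=" + b.reads
            + " sameScores=" + s1.equals(s2));
    }
}
```

The scores are identical either way; only the number of reads changes, which is the point of the proposal.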
We would have to make some changes in the reduceToTopK function, because with the above change the scores of documents from different segments would not use the same space_type. But I think that is still an easy problem to solve: we can just skip rescoring the docs from segments where hamming distance was not used.
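One possible shape of that merge step, sketched under the assumption that each candidate is tagged with whether its score is already exact: candidates from segments that did filtered exact search keep their full-precision scores, while hamming-scored candidates are rescored before the global top-k is taken. All names here are hypothetical, not the plugin's reduceToTopK signature:

```java
import java.util.*;
import java.util.stream.*;

public class MixedSpaceMergeSketch {
    // A per-segment candidate: docId, score, and whether that score is
    // already exact (full precision) or approximate (hamming based).
    record Candidate(int docId, float score, boolean exact) {}

    // Rescore only the approximate candidates, then take the global top-k.
    // Scores are distance-style here, so lower is better.
    static List<Integer> reduceToTopK(List<Candidate> perSegment, int k,
                                      java.util.function.IntToDoubleFunction exactScore) {
        return perSegment.stream()
            .map(c -> c.exact() ? c
                    : new Candidate(c.docId(), (float) exactScore.applyAsDouble(c.docId()), true))
            .sorted(Comparator.comparingDouble(Candidate::score))
            .limit(k)
            .map(Candidate::docId)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // docId -> true exact distance, used to rescore approximate candidates
        float[] exact = {0.1f, 0.5f, 0.2f, 0.9f};
        List<Candidate> candidates = List.of(
            new Candidate(0, 0.1f, true),  // segment A: filtered exact search, already exact
            new Candidate(1, 3f, false),   // segment B: hamming score, needs rescoring
            new Candidate(2, 1f, false),
            new Candidate(3, 0.9f, true));
        System.out.println(reduceToTopK(candidates, 2, d -> exact[d]));
    }
}
```

The key invariant is that everything is in the same score space before the final sort, which is what makes skipping the already-exact docs safe.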
We would need to run some benchmarks to see the performance difference, but I think in resource-constrained environments this can really help reduce disk reads, though not necessarily the distance computations.