opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

[FEATURE] Integrating live document information during search in the faiss engine #2275

Closed heemin32 closed 5 days ago

heemin32 commented 6 days ago

In the Faiss engine, when no filtering clause is applied, the search initially considers all available documents in a segment, and deleted documents are filtered out afterward.

This approach hurts both recall and latency when a segment contains many deleted documents. For ANN search, recall decreases because deleted documents can occupy slots among the top k candidates, so fewer than k live results remain once they are filtered out. For exact search, latency increases because distances are also computed for deleted documents.

This applies only to the Faiss engine. The Lucene engine already uses live document information during search.
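
To make the effect concrete, here is a minimal Python sketch of the two strategies using a brute-force search over a toy segment. It is only an illustration under stated assumptions: the function names, the `live` list standing in for the segment's live-docs bitset, and the sample vectors are all hypothetical, and none of this is the plugin's or Faiss's actual code.

```python
import heapq
import math


def top_k_post_filter(query, vectors, live, k):
    """Score every document, take the top k, then drop deleted ones."""
    scored = [(math.dist(query, vec), doc_id) for doc_id, vec in enumerate(vectors)]
    computed = len(scored)  # distances are computed for deleted docs too
    top = heapq.nsmallest(k, scored)
    hits = [doc_id for _, doc_id in top if live[doc_id]]
    return hits, computed


def top_k_live_aware(query, vectors, live, k):
    """Skip deleted documents before scoring, then take the top k."""
    scored = [
        (math.dist(query, vec), doc_id)
        for doc_id, vec in enumerate(vectors)
        if live[doc_id]
    ]
    computed = len(scored)  # only live docs are scored
    top = heapq.nsmallest(k, scored)
    return [doc_id for _, doc_id in top], computed


# Hypothetical segment: six 1-D vectors, the two closest ones are deleted.
vectors = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
live = [False, False, True, True, True, True]
query = [0.0]

post_hits, post_computed = top_k_post_filter(query, vectors, live, k=3)
live_hits, live_computed = top_k_live_aware(query, vectors, live, k=3)
```

In this toy case the post-filter variant returns a single live hit after computing six distances, while the live-aware variant returns three live hits after computing four. A real ANN index traverses a graph or inverted lists rather than scanning every vector, so the numbers differ, but the two costs are the same in kind.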

navneet1v commented 5 days ago

@heemin32 This same feature is discussed in this GitHub issue: https://github.com/opensearch-project/k-NN/issues/1491. Let's track this feature in one place only.