opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0
155 stars 115 forks source link

Interrupt condition for FAISS efficient filtering logic #1898

Open neo-anderson opened 3 months ago

neo-anderson commented 3 months ago

I've been reviewing the efficient filtering logic of FAISS HNSW as described in the OpenSearch documentation: https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/

Observations and Questions

  1. Redundant Condition Check The logic includes a condition R < K and P >= K. However, if I understand correctly, P should always be greater than K when the flow reaches this point. This makes the P >= K check redundant.

  2. Potential Performance Impact If the R < K check is performed after conducting an ANN search on the entire graph, it could lead to frequent "full" graph searches. This is because, as per the flow chart, the condition is evaluated only after the search is complete.

Proposed Improvement

This could potentially reduce unnecessary computations and improve overall efficiency.

neo-anderson commented 3 months ago

Not sure if this qualifies as a bug. GitHub is not letting me remove the label. Sorry

navneet1v commented 3 months ago

The logic includes a condition R < K and P >= K. However, if I understand correctly, P should always be greater than K when the flow reaches this point. This makes the P >= K check redundant.

Yes the check is redundant. But its a integer comparison, so shouldn't harm the latency. But I do agree it can be removed. @neo-anderson do you want to fix this condition.

Not sure if this qualifies as a bug. GitHub is not letting me remove the label. Sorry

Yes its not a bug. But I have removed the bug label.

Potential Performance Impact If the R < K check is performed after conducting an ANN search on the entire graph, it could lead to frequent "full" graph searches. This is because, as per the flow chart, the condition is evaluated only after the search is complete.

Yes I agree this is a performance improvement. I did consider this during the implementation but Faiss doesn't have this capability. So we need to build it in FAISS. @neo-anderson do you have any thoughts on how it can be added in FAISS.