Closed alexworkorb closed 1 year ago
@alexworkorb can you please add these details.
Once i have this info I can provide more details
hi @navneet1v, please see below
@alexworkorb can you please add these details.
Total number of documents in the index. 884,228 documents in the index:
Total number of shards and segments. 34 segments according to the output of _cat/segments. There is only 1 shard
Hi @alexworkorb Thanks for providing the details. I am able to better understand what is happening here. So, the case you are going into is, when the filters results in very less number of documents as compared to documents present in the segment. In that case what happens is during graph traversal search is not able to reach all the documents. I am currently working on the solution to fix this issue, and we are targeting 2.10 release for that. I will add the RFC soon here.
To get around this issue here are some suggestions:
Please check this flow chart to understand more on how filters work in Faiss: https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#faiss-k-nn-filter-implementation
Thank you @navneet1v , for now we have set K to a large number as the work around.
@alexworkorb is there a reason you cannot use Lucene? like is your dimensions > 1024?
@alexworkorb is there a reason you cannot use Lucene? like is your dimensions > 1024?
yes @navneet1v , our vector is of dimension 1536 (openAI's ada-002) thanks.
@alexworkorb thanks for providing the info.
I have removed the bug tag and added Features tag to this issue. Will add the proposal to solve this case for Faiss Filtered Search
@alexworkorb Added the proposal here: https://github.com/opensearch-project/k-NN/issues/1049
Feature is merged and released with 2.10 . Resolving the issue
What is the bug? Efficient K-NN filtering with faiss engine returns less than K number of closest neighbors.
We are using the faiss engine with hnsw to perform approximate K-NN search with filter, OpenSearch 2.9 docker build is used. We put the filtering clause inside the K-NN query as instructed here ( https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/ ). For the particular filter we tested on, there should be 1157 documents matching the filter. When we query the K-NN filter with K=20, and size=20, OpenSearch only returns 4 K-NN results.
How can one reproduce the bug? Please see our index mapping info here: https://gist.githubusercontent.com/alexworkorb/5faaa08806012c85abfcdf1cf0b51ad8/raw/420b6f4543619690390eb7e4a71979831d8ee1ca/opensearch%25202.9%2520info
What is the expected behavior? Efficient K-NN filtering with faiss engine should return K closest neighbors if there are K documents satisfying the filter.
What is your host/environment?
Do you have any screenshots? Query showing there are 1157 documents matching the filter:
Query showing K-NN query only returns 4 neighbours (with the same filter):
But when we set K to 10,000 , we can get 1152 nearest neighbours (but is still less than 1157)
Do you have any additional context? We are primarily following OpenSearch 2.9's documentation on faiss efficient pre-filtering: https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#using-a-faiss-efficient-filter