opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

Add Support for Multi Values in innerHit for Nested k-NN Fields in Lucene and FAISS #2283

Open heemin32 opened 1 day ago

heemin32 commented 1 day ago

Description

This PR introduces support for returning all nested fields with their scores inside innerHit for nested k-NN fields, applicable to both Lucene and FAISS engines.

The implementation executes the search request across all segments and collects the results at the shard level, similar to the approach used in disk-based k-NN searches. After reducing the results to the top k, we retrieve all sibling documents associated with these results. Using the IDs of the retrieved sibling documents as filtered document IDs, we then run a second, exact search restricted to those IDs, so that every sibling document receives a score.
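To make the two-phase flow above easier to follow, here is a minimal, self-contained sketch in plain Java. It is not the code in this PR: `NestedInnerHitSketch`, `Child`, `Scored`, and the scoring function are hypothetical stand-ins, and a brute-force scan per segment takes the place of the real approximate search in Lucene or FAISS.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class NestedInnerHitSketch {
    // Hypothetical stand-in for one nested (child) document.
    static final class Child {
        final int docId; final int parentId; final float[] vector;
        Child(int docId, int parentId, float[] vector) {
            this.docId = docId; this.parentId = parentId; this.vector = vector;
        }
    }

    static final class Scored {
        final int docId; final int parentId; final float score;
        Scored(int docId, int parentId, float score) {
            this.docId = docId; this.parentId = parentId; this.score = score;
        }
    }

    static final Comparator<Scored> BY_SCORE_DESC = (a, b) -> Float.compare(b.score, a.score);

    // Assumed scoring: 1 / (1 + squared L2 distance); higher is better.
    static float score(float[] query, float[] vector) {
        float dist = 0f;
        for (int i = 0; i < query.length; i++) {
            float diff = query[i] - vector[i];
            dist += diff * diff;
        }
        return 1f / (1f + dist);
    }

    // Phase 1: search every segment (brute force here), keep the best child
    // per parent, then reduce the collected results to the top k at shard level.
    static List<Scored> topKAcrossSegments(List<List<Child>> segments, float[] query, int k) {
        List<Scored> collected = new ArrayList<>();
        for (List<Child> segment : segments) {
            Map<Integer, Scored> bestPerParent = new HashMap<>();
            for (Child c : segment) {
                Scored s = new Scored(c.docId, c.parentId, score(query, c.vector));
                bestPerParent.merge(c.parentId, s, (a, b) -> a.score >= b.score ? a : b);
            }
            collected.addAll(bestPerParent.values());
        }
        collected.sort(BY_SCORE_DESC);
        return collected.subList(0, Math.min(k, collected.size()));
    }

    // Phase 2: gather every sibling of the top-k hits and score each one exactly.
    static Map<Integer, List<Scored>> innerHits(List<List<Child>> segments, float[] query, int k) {
        Set<Integer> topParents = new HashSet<>();
        for (Scored s : topKAcrossSegments(segments, query, k)) {
            topParents.add(s.parentId);
        }
        Map<Integer, List<Scored>> hitsByParent = new TreeMap<>();
        for (List<Child> segment : segments) {
            for (Child c : segment) {
                if (topParents.contains(c.parentId)) { // sibling IDs act as the filter
                    hitsByParent.computeIfAbsent(c.parentId, p -> new ArrayList<>())
                                .add(new Scored(c.docId, c.parentId, score(query, c.vector)));
                }
            }
        }
        for (List<Scored> hits : hitsByParent.values()) {
            hits.sort(BY_SCORE_DESC);
        }
        return hitsByParent;
    }
}
```

In this sketch, a parent with several nested vectors contributes only its best child to the top-k reduction, yet all of its children come back with scores once the parent makes the cut.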

Here are additional explanations for the changes made:

  1. Added JsonPath 2.8.0 as a dependency for integration testing only. Version 2.9.0 has a dependency conflict with SLF4J.
  2. Adopted a composite approach in NestedKnnVectorInnerHitQuery.java to enable code reuse between byte vectors and float vectors.
  3. Replaced the use of BitSet with DocIdSetIterator for filteredDocId to eliminate the overhead of converting from an iterator to a BitSet and back to an iterator.
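As a rough illustration of the third point, the sketch below contrasts the two ways of consuming a set of filtered document IDs. It uses plain Java only: `DocIds` is a hypothetical stand-in that mimics the `nextDoc()` / `NO_MORE_DOCS` contract of Lucene's `DocIdSetIterator`, and `java.util.BitSet` stands in for Lucene's own `BitSet`. None of these names come from the PR.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class FilteredDocIdSketch {
    // Sentinel returned once the iterator is exhausted (same idea as Lucene's NO_MORE_DOCS).
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Hypothetical stand-in for an iterator over ascending document IDs.
    interface DocIds {
        int nextDoc();
    }

    static DocIds fromSorted(int[] sortedIds) {
        return new DocIds() {
            private int pos = 0;
            @Override
            public int nextDoc() {
                return pos < sortedIds.length ? sortedIds[pos++] : NO_MORE_DOCS;
            }
        };
    }

    // Round trip: drain the iterator into a BitSet, then walk the set bits again.
    static List<Integer> visitViaBitSet(DocIds ids, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc = ids.nextDoc(); doc != NO_MORE_DOCS; doc = ids.nextDoc()) {
            bits.set(doc);
        }
        List<Integer> visited = new ArrayList<>();
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            visited.add(doc); // exact scoring would happen here
        }
        return visited;
    }

    // Direct: consume the iterator once, with no intermediate BitSet.
    static List<Integer> visitDirect(DocIds ids) {
        List<Integer> visited = new ArrayList<>();
        for (int doc = ids.nextDoc(); doc != NO_MORE_DOCS; doc = ids.nextDoc()) {
            visited.add(doc); // exact scoring would happen here
        }
        return visited;
    }
}
```

Both methods visit the same documents in the same order; the direct version simply skips the allocation and the second pass.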

Related Issues

Resolves #2249

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.