Description of changes:
This PR refactors the interface of the current faiss-support branch to support several additional features, including:
IVF index type - a cell-probe-based method that lets a user reduce the search space using k-means clustering. It takes "ncentroids" and "nprobes" as parameters.
Product quantization - a method that encodes vectors to reduce their size. It takes "code_size" as a parameter.
Composite indices - the ability to combine different faiss features into a single index.
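To make the IVF cell-probe idea concrete, here is a minimal pure-Python sketch. It is not the plugin's or faiss's implementation: the function names are hypothetical, the k-means initialization is a simple deterministic pick chosen for reproducibility, and vectors are plain lists. It only illustrates how "ncentroids" controls the number of cells and "nprobes" controls how many cells a query visits.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, v, n=1):
    """Indices of the n centroids closest to v, closest first."""
    order = sorted(range(len(centroids)),
                   key=lambda i: squared_distance(centroids[i], v))
    return order[:n]


def train_centroids(vectors, ncentroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) with evenly spaced initial picks.

    Assumption: this initialization is for illustration only.
    """
    centroids = [list(vectors[i * len(vectors) // ncentroids])
                 for i in range(ncentroids)]
    for _ in range(iterations):
        cells = [[] for _ in centroids]
        for v in vectors:
            cells[nearest(centroids, v)[0]].append(v)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous centroid
                centroids[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return centroids


def build_cells(vectors, centroids):
    """Assign each vector id to the cell of its nearest centroid."""
    cells = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        cells[nearest(centroids, v)[0]].append(idx)
    return cells


def search(query, vectors, centroids, cells, k, nprobes):
    """Scan only the nprobes closest cells and return the k nearest ids."""
    candidates = [idx
                  for c in nearest(centroids, query, nprobes)
                  for idx in cells[c]]
    candidates.sort(key=lambda idx: squared_distance(vectors[idx], query))
    return candidates[:k]
```

With nprobes equal to ncentroids the search scans every cell and matches a brute-force scan; smaller values trade recall for a smaller search space.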
Supporting these additional features required changing a lot of code:
Because we use faiss's index factory, only some of the parameters can be configured through the index factory description string. To support additional parameters (for example, ef_construction for HNSW), this PR adds the ability to pass an extra parameter map to the JNI layer, where it is parsed.
Because IVF and PQ require training, this PR implements a training approach in the JNI save index function in which a subset of the data to be indexed is used for training. This is inherently inefficient because each segment's index must be trained before data can be added to it. In the future, we will introduce a training API that trains before indexing to work around this.
Several other minor changes make the refactor cleaner and easier.
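The save-index flow described in the items above can be sketched roughly as follows. This is a toy model in Python, not the JNI code from this PR: ToyIndex, split_parameters, save_index, and max_training_points are all hypothetical names, and the mapping of "code_size" onto the number after "PQ" in the factory description is an assumption. The sketch shows parameters being split between the factory description and an extra map, and the index being trained on a subset before any data is added.

```python
import random


class ToyIndex:
    """Stand-in for a native faiss index; not the real faiss API."""

    def __init__(self, description):
        self.description = description
        self.extra = {}
        self.is_trained = False
        self.trained_on = 0
        self.vectors = []

    def train(self, sample):
        if not sample:
            raise ValueError("training needs at least one vector")
        self.trained_on = len(sample)
        self.is_trained = True

    def add(self, vectors):
        if not self.is_trained:
            raise RuntimeError("index must be trained before adding data")
        self.vectors.extend(vectors)


def split_parameters(params):
    """Return (factory description, extra parameter map)."""
    extra = dict(params)  # copy so the caller's map is not modified
    parts = []
    if "ncentroids" in extra:
        parts.append(f"IVF{extra.pop('ncentroids')}")
    # Assumption: code_size maps directly onto the number after "PQ".
    if "code_size" in extra:
        parts.append(f"PQ{extra.pop('code_size')}")
    else:
        parts.append("Flat")
    return ",".join(parts), extra


def save_index(vectors, params, max_training_points=1000, seed=0):
    """Build one segment's index: train on a subset, then add everything."""
    description, extra = split_parameters(params)
    index = ToyIndex(description)
    index.extra = extra  # e.g. nprobes or ef_construction, set after creation
    if len(vectors) > max_training_points:
        sample = random.Random(seed).sample(vectors, max_training_points)
    else:
        sample = list(vectors)
    index.train(sample)  # repeated for every segment, hence the inefficiency
    index.add(vectors)
    return index
```

Because save_index trains from scratch on every call, each segment pays the training cost again, which is the inefficiency a separate training step would remove.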
Testing
For testing, this PR focuses on adding tests that exercise the interface rather than end-to-end tests of each JNI library's functionality, because that functionality will change in the future. Right now, it is just a placeholder to get the interface working. That said, the following test refactoring was done:
Added unit tests for the faiss interface
Refactored old tests so that the gradle build passes
Future Development
Introduce a training API
Add additional end-to-end tests
Investigate storing data exclusively with faiss (as opposed to storing vectors in doc values in Lucene)
Notes
We are in the process of migrating from ODFE to OpenSearch, which includes porting the faiss-support branch to OpenSearch. Because porting requires significant refactoring, we will merge this PR first and then port the faiss-support branch to OpenSearch.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Issue #, if available:
225
The interface looks like:
The main logic where the interface has been refactored can be found in: