opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0
156 stars, 123 forks

Encapsulate dimension, vector data type validation/processing inside Library #1957

Closed: jmazanec15 closed this 3 months ago

jmazanec15 commented 3 months ago

Description

As part of #1779, we need the ability to take a user config for a float-based space type (e.g., l2) and configure the quantization framework to go from float to bit and build a faiss binary index. In the current implementation, data type validation lives outside the KNNLibrary/engine abstractions. This makes it difficult to do the complex configuration that the quantization framework needs for index building and search. The main theme of this PR is to move library-specific configuration into the library.

This PR does the following:

  1. Adds KNNMethodConfigContext, which is passed to KNNLibrary for validation as well as processing. The KNNMethodConfigContext object holds information beyond what the user passes to configure the method. This allows the KNNLibrary-specific components to do more complex validation/processing logic, such as configuring a binary index or figuring out how to validate/process incoming vectors. This wasn't added to KNNMethodContext because it would bloat that class further. It is somewhat of an extension to the VectorSpaceInfo abstraction.
  2. For per-dimension processing/validation (e.g., faiss fp16), we moved the different validators/processors to the Library so we don't have to extract encoder information from the mapping.
  3. Dimension checks and the version created were added to KNNMethodConfigContext.
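
The shape of the change described in the list above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: every class, method, and field name here (ConfigContextSketch, MethodConfigContext, FaissLikeLibrary, and so on) is hypothetical. The dimension limit and the multiple-of-8 rule for binary vectors are assumptions made for the example, and the plain integer versionCreated is a stand-in for however the plugin represents the index-created version. The idea it shows is that the library receives a context object carrying dimension, vector data type, and version, and decides for itself how to validate the method and each vector component.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; these names do not mirror the k-NN plugin's real classes. */
final class ConfigContextSketch {

    enum VectorDataType { FLOAT, BYTE, BINARY }

    /** Carries information that is not part of the user-supplied method definition. */
    static final class MethodConfigContext {
        final int dimension;
        final VectorDataType vectorDataType;
        final int versionCreated; // assumed stand-in for the index-created version

        MethodConfigContext(int dimension, VectorDataType vectorDataType, int versionCreated) {
            this.dimension = dimension;
            this.vectorDataType = vectorDataType;
            this.versionCreated = versionCreated;
        }
    }

    /** Checks a single vector component. */
    interface PerDimensionValidator {
        boolean isValid(float value);
    }

    /** The library owns both method validation and per-dimension validation. */
    interface Library {
        List<String> validateMethod(String encoderName, MethodConfigContext context);

        PerDimensionValidator perDimensionValidator(String encoderName, MethodConfigContext context);
    }

    static final class FaissLikeLibrary implements Library {
        static final float FP16_MAX = 65504.0f;          // largest finite half-precision value
        static final int ASSUMED_MAX_DIMENSION = 16_000; // assumption for this sketch

        @Override
        public List<String> validateMethod(String encoderName, MethodConfigContext context) {
            List<String> errors = new ArrayList<>();
            if (context.dimension < 1 || context.dimension > ASSUMED_MAX_DIMENSION) {
                errors.add("dimension must be in [1, " + ASSUMED_MAX_DIMENSION + "]");
            }
            // Assumed rule: binary vectors are packed 8 bits per byte.
            if (context.vectorDataType == VectorDataType.BINARY && context.dimension % 8 != 0) {
                errors.add("binary vectors need a dimension that is a multiple of 8");
            }
            if ("fp16".equals(encoderName) && context.vectorDataType != VectorDataType.FLOAT) {
                errors.add("the fp16 encoder only applies to float vectors");
            }
            return errors;
        }

        @Override
        public PerDimensionValidator perDimensionValidator(String encoderName, MethodConfigContext context) {
            if ("fp16".equals(encoderName)) {
                // Each component must fit in half precision; NaN fails both comparisons.
                return value -> value >= -FP16_MAX && value <= FP16_MAX;
            }
            return value -> !Float.isNaN(value) && !Float.isInfinite(value);
        }
    }

    public static void main(String[] args) {
        Library library = new FaissLikeLibrary();
        MethodConfigContext context = new MethodConfigContext(128, VectorDataType.FLOAT, 1);

        System.out.println("method errors: " + library.validateMethod("fp16", context));

        PerDimensionValidator validator = library.perDimensionValidator("fp16", context);
        System.out.println("1.5 valid: " + validator.isValid(1.5f));
        System.out.println("70000 valid: " + validator.isValid(70000.0f));
    }
}
```

With this arrangement the caller never inspects the encoder in the mapping; it hands the library the context and uses whatever validator comes back.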

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

jmazanec15 commented 3 months ago

@navneet1v ah I forgot about that - good catch. Let me fix

navneet1v commented 3 months ago

@jmazanec15 seems like a build failure. Can you check what's happening?

jmazanec15 commented 3 months ago

@navneet1v I don't see anything interesting in the logs and am not able to reproduce it locally. I'm guessing it's unrelated/flaky. Captured the logs: https://gist.github.com/jmazanec15/ec985b4aeeae3325d5159974cfb375ab. I'll retry.

opensearch-trigger-bot[bot] commented 3 months ago

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1957-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f42e86eaaa8d4c7b6e0da30326a1215312560b0a
# Push it to GitHub
git push --set-upstream origin backport/backport-1957-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1957-to-2.x.