opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0
156 stars 123 forks source link

Quantization Framework Implementation with 1bit and MultiBit Binary Quantizer #1929

Closed Vikasht34 closed 3 months ago

Vikasht34 commented 3 months ago

Description

This pull request introduces a new quantization framework, including the MultiBitScalarQuantizer and OneBitScalarQuantizer, to provide efficient quantization methods for vectors in OpenSearch.

Key features:

OneBitScalarQuantizer

MultiBitScalarQuantizer

Related Issues

Resolves #1889

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

ryanbogan commented 3 months ago

Looks like DCO and spotless are failing.

Running ./gradlew spotlessApply and pushing the commit will fix the spotless part.

For DCO, every commit needs to be signed off with your name and email, which can be done using -s when committing if you have the git configure set up. This specific commit can be fixed by following the instructions found here: https://github.com/opensearch-project/k-NN/pull/1929/checks?check_run_id=28295803029

Vikasht34 commented 3 months ago

Took the first pass. Need to still look at a few more details and unit testing

A few overall things

  • Statics: QuantizationFactory is static. unit testing with static gets much harder as it affects multiple components and gets harder to mock. Is there a way we can avoid it and initialize things like registrar in KNNPlugin.createComponents
  • Enums: I see values which are not yet supported by framework. Enums are a good way to validate whats supported. Consider only having the values which are currently supported by framework and add it as and when the support is added

For Point 2 , My vision was to give overall picture , but I don't mind removing it , the ones which is nit getting used. For Point 1 :- I would be prefer Keeping registration close to Framework , Let Client not worry on registry. I would not prefer registration via Create Components as I wanted to have lazy and and not all the clients would be Quantization.

shatejas commented 3 months ago

Point 1 :- I would be prefer Keeping registration close to Framework , Let Client not worry on registry. I would not prefer registration via Create Components as I wanted to have lazy and and not all the clients would be Quantization.

Fair, my main concern is with static factory is the unit testing, I would still want to see how this is plugged in. If this is not plugged in as a member variable I see unit tests turning more into a component tests since this piece will be used in the active path of both indexing and search.