opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0
156 stars 123 forks

Add CompressionLevel Calculation for PQ #2200

Closed jmazanec15 closed 1 month ago

jmazanec15 commented 1 month ago

Description

Currently, for product quantization, we set the calculated compression level to NOT_CONFIGURED. The main issue with this is that if a user sets up a disk-based index with PQ, no re-scoring will happen by default.

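This change adds the calculation so that the proper re-scoring will happen. The formula is fairly straightforward: actual compression = (d * 32) / (m * code_size), where d is the vector dimension (32 bits per float component), m is the number of PQ subvectors, and code_size is the number of bits per subvector code. Then, we round to the nearest compression level (because we only support discrete compression levels).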
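As a rough illustration of the calculation described above (not the plugin's actual implementation, which is Java), here is a minimal Python sketch. The function name `pq_compression_level`, the level set `ASSUMED_LEVELS`, and the tie-breaking rule are assumptions made for the example; the description only states that levels are discrete and top out at 32x.

```python
# Hypothetical set of discrete levels. The PR text only confirms that levels
# are discrete and that 32x is the highest one; the rest is assumed.
ASSUMED_LEVELS = (1, 2, 4, 8, 16, 32)
FLOAT32_BITS = 32  # bits used by one uncompressed float32 component


def pq_compression_level(d, m, code_size, levels=ASSUMED_LEVELS):
    """Return the discrete level nearest to (d * 32) / (m * code_size).

    d         -- vector dimension
    m         -- number of PQ subvectors
    code_size -- bits per subvector code
    """
    if d <= 0 or m <= 0 or code_size <= 0:
        raise ValueError("d, m and code_size must be positive")
    actual = (d * FLOAT32_BITS) / (m * code_size)
    # Pick the closest level by absolute distance. Ties go to the lower level
    # (an arbitrary choice for this sketch). Anything above the highest level
    # lands on that level, which matches the 32x ceiling noted in the PR.
    return min(levels, key=lambda level: (abs(level - actual), level))


if __name__ == "__main__":
    print(pq_compression_level(d=128, m=64, code_size=8))  # actual 8x
    print(pq_compression_level(d=128, m=16, code_size=8))  # actual 32x
    print(pq_compression_level(d=128, m=8, code_size=8))   # actual 64x, reported as 32
```

With these assumed levels, a configuration whose real compression is 64x is reported as 32x, which is the limitation discussed in this PR.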

One small issue with this is that if PQ is configured to have compression > 32x, the value will be capped at 32x. Functionally, the only issue will be that we may not be as aggressive on oversampling for on-disk mode.

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

navneet1v commented 1 month ago

> One small issue with this is that if PQ is configured to have compression > 32x, the value will be capped at 32x. Functionally, the only issue will be that we may not be as aggressive on oversampling for on-disk mode.

Should we allow more compression levels?