opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

Re-Call Issue Fix with Binary Quantized Vectors #2071

Closed Vikasht34 closed 2 weeks ago

Vikasht34 commented 2 weeks ago

Description

This PR fixes a critical issue found during benchmarking, where recall unexpectedly dropped below 1. The root cause was traced to two problems in the quantization and vector-handling process:

  1. Bit Packing of Quantized Vector: The quantized vector was not correctly updated during the bit-packing process. Instead, the same vector values were being reused, resulting in incorrect quantization of subsequent vectors.
  2. Vector Transfer to JNI: When transferring vectors to the JNI layer, vectors were passed by reference, causing all the vectors in the VectorTransfer list to reference the same object. This led to unintended behavior where all vectors became identical, severely affecting the recall accuracy.
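
To illustrate the second problem, here is a minimal, self-contained Java sketch of how adding a reused buffer to a list by reference makes every entry identical, and how copying per vector avoids it. It is not the plugin's code: the class name, method names, and the "value greater than threshold sets the bit" rule are assumptions chosen for illustration, and the JNI layer is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from the k-NN plugin.
final class QuantizationAliasingSketch {

    // Packs one bit per dimension into a fresh, zeroed buffer.
    // Assumed rule: the bit is 1 when the value exceeds the threshold.
    static byte[] packOneBit(float[] vector, float threshold) {
        byte[] packed = new byte[(vector.length + 7) / 8];
        for (int i = 0; i < vector.length; i++) {
            if (vector[i] > threshold) {
                packed[i >> 3] |= (byte) (1 << (7 - (i & 7))); // most significant bit first
            }
        }
        return packed;
    }

    // Buggy pattern: one shared buffer is overwritten for each vector and the
    // same reference is added every time, so all list entries alias one object.
    static List<byte[]> collectByReference(float[][] vectors, float threshold) {
        byte[] shared = new byte[(vectors[0].length + 7) / 8];
        List<byte[]> out = new ArrayList<>();
        for (float[] v : vectors) {
            byte[] packed = packOneBit(v, threshold);
            System.arraycopy(packed, 0, shared, 0, shared.length);
            out.add(shared);
        }
        return out;
    }

    // Fixed pattern: each list entry gets its own copy of the buffer contents.
    static List<byte[]> collectByCopy(float[][] vectors, float threshold) {
        byte[] shared = new byte[(vectors[0].length + 7) / 8];
        List<byte[]> out = new ArrayList<>();
        for (float[] v : vectors) {
            byte[] packed = packOneBit(v, threshold);
            System.arraycopy(packed, 0, shared, 0, shared.length);
            out.add(Arrays.copyOf(shared, shared.length));
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] vectors = {
            { 1f, -1f, 1f, -1f, 1f, -1f, 1f, -1f },
            { -1f, 1f, -1f, 1f, -1f, 1f, -1f, 1f }
        };
        List<byte[]> aliased = collectByReference(vectors, 0f);
        List<byte[]> copied = collectByCopy(vectors, 0f);
        System.out.println("by reference, same object: " + (aliased.get(0) == aliased.get(1)));
        System.out.println("by copy, contents differ:  " + !Arrays.equals(copied.get(0), copied.get(1)));
    }
}
```

With aliasing, every stored vector ends up holding the bits of the last vector processed, so distance computations against the index no longer reflect the original data, which is consistent with the recall drop described above.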

Testing

Benchmarked with the NQ data set. Results are at s3://disk-based-ann-bq/NQ-1M-768/. Recall for one-bit quantization with 2x oversampling is 0.94.

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.