opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

Bounds the transferLimit in OffheapVectorTransfer #2070

Closed shatejas closed 2 weeks ago

shatejas commented 2 weeks ago

The ArrayList buffer was preallocated with a large capacity regardless of the number of vectors to transfer, which wasted heap. This change takes the total number of vectors into account when bounding the transfer limit.
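A minimal sketch of the idea, not the code in this PR: the buffer capacity is the smaller of the memory-derived limit and the number of vectors that actually need to be transferred. The class name, method names, and the 10 MiB streaming limit below are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TransferLimitSketch {

    // Hypothetical helper: how many vectors fit into one transfer batch
    // for a given memory budget. Always returns at least 1.
    public static int computeTransferLimit(long streamingMemoryLimitBytes, int bytesPerVector) {
        long perBatch = streamingMemoryLimitBytes / bytesPerVector;
        return (int) Math.max(1L, Math.min(Integer.MAX_VALUE, perBatch));
    }

    // Bound the limit by the number of vectors that will actually be transferred,
    // so the buffer is never preallocated larger than it can ever be filled.
    public static int boundedTransferLimit(long streamingMemoryLimitBytes, int bytesPerVector, int totalVectors) {
        int limit = computeTransferLimit(streamingMemoryLimitBytes, bytesPerVector);
        return Math.max(1, Math.min(limit, totalVectors));
    }

    public static void main(String[] args) {
        long memoryLimit = 10L * 1024 * 1024; // assumed 10 MiB budget
        int bytesPerVector = 16;              // e.g. a 128-bit binary vector
        int totalVectors = 1000;

        int uncapped = computeTransferLimit(memoryLimit, bytesPerVector);
        int capped = boundedTransferLimit(memoryLimit, bytesPerVector, totalVectors);

        // The buffer is sized for the capped value rather than the uncapped one.
        List<byte[]> buffer = new ArrayList<>(capped);

        System.out.println("uncapped=" + uncapped + " capped=" + capped); // uncapped=655360 capped=1000
    }
}
```

With these assumed numbers, the uncapped buffer would reserve room for 655360 references even though only 1000 vectors exist.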

Description

Attached below is the heap usage over the duration of the BinaryIndexIT.testFaissHnswBinary_when1000Data_thenCreateIngestQueryWorks test:

transferLimitNotCapped.log: the current code in main. The max heap usage is 1.8 GB.

transferLimitCapped.log: the code in this PR. The max heap usage is 1.2 GB.

NoPreallocation.log: the ArrayList is created without using transferLimit as its initial capacity. The max heap usage is 1.1 GB, which is comparable to the code in this PR.

Capping the transfer limit was chosen over dropping the preallocation in order to avoid the fragmentation caused by repeatedly resizing the ArrayList.
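To give a rough sense of the resizing cost, the sketch below counts how many times a backing array would be reallocated while growing to hold a given number of elements. It assumes a default capacity of 10 and growth by half the current capacity, which is how OpenJDK's ArrayList behaves; other JVM libraries may differ. The class and method names are hypothetical.

```java
public class ResizeCountSketch {

    // Counts reallocations needed to reach at least `elements` capacity,
    // assuming growth of oldCapacity + (oldCapacity >> 1) starting from `initialCapacity`.
    public static int countResizes(int initialCapacity, int elements) {
        int capacity = Math.max(1, initialCapacity);
        int resizes = 0;
        while (capacity < elements) {
            // Grow by 50%, and by at least one slot so tiny capacities still make progress.
            capacity = Math.max(capacity + 1, capacity + (capacity >> 1));
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        // Without preallocation: default capacity of 10 growing to hold 1000 vectors.
        System.out.println(countResizes(10, 1000));   // 12
        // With a capacity capped at the total vector count: no resizing at all.
        System.out.println(countResizes(1000, 1000)); // 0
    }
}
```

Each resize copies the existing references into a new array and leaves the old one for the garbage collector, which is the churn the capped preallocation avoids.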

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.