opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

[Enhancement] Remove multiple vectors references during flush #2207

Open navneet1v opened 1 month ago


Description

With the upgrade to Lucene 9.12, Lucene now exposes FlatVectorsFormat as a KnnVectorsFormat. As part of this change, FlatFieldVectorsWriter exposes its DocsWithFieldSet and the vectors that were added to it, so both are available during flush.
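To make the exposed state concrete, here is a minimal sketch of a flat writer that buffers each vector together with its doc ID and exposes both read-only. `FlatFieldWriterSketch`, `getVectors` and `getDocIds` are hypothetical, simplified stand-ins, not the Lucene API. In particular, Lucene tracks doc IDs in a `DocsWithFieldSet` rather than a `List<Integer>`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical, simplified stand-in for a flat field vectors writer (not Lucene's class).
 * It buffers every added vector with its doc ID and exposes both lists read-only,
 * so another writer can consume them at flush time.
 */
class FlatFieldWriterSketch {
    private final List<float[]> vectors = new ArrayList<>();
    private final List<Integer> docIds = new ArrayList<>();

    /** Buffers one vector; doc IDs must arrive in strictly increasing order (one value per doc). */
    void addValue(int docId, float[] vector) {
        if (!docIds.isEmpty() && docId <= docIds.get(docIds.size() - 1)) {
            throw new IllegalArgumentException("doc IDs must be strictly increasing");
        }
        docIds.add(docId);
        vectors.add(vector);
    }

    /** Buffered vectors, in the order they were added. */
    List<float[]> getVectors() {
        return Collections.unmodifiableList(vectors);
    }

    /** Doc IDs that have a value for this field, aligned index-by-index with getVectors(). */
    List<Integer> getDocIds() {
        return Collections.unmodifiableList(docIds);
    }
}
```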

Currently, NativeEngineFieldVectorsWriter and FlatFieldVectorsWriter both hold references to the same vectors and doc IDs, which is redundant. We can remove the vector and doc ID references from NativeEngineFieldVectorsWriter entirely and read them from FlatFieldVectorsWriter during flush instead. This simplifies the code and also frees up some heap memory.
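A minimal sketch of the proposed change, assuming simplified stand-ins for both writers: the native writer forwards each value to the flat writer and reads vectors and doc IDs back from it at flush, instead of keeping its own copies. `FlatWriterStub` and `NativeFieldWriterSketch` are hypothetical names, not classes from the k-NN plugin, and the actual native index build is omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the flat writer: the single owner of buffered vectors and doc IDs. */
class FlatWriterStub {
    private final List<float[]> vectors = new ArrayList<>();
    private final List<Integer> docIds = new ArrayList<>();

    void addValue(int docId, float[] vector) {
        docIds.add(docId);
        vectors.add(vector);
    }

    List<float[]> getVectors() { return Collections.unmodifiableList(vectors); }

    List<Integer> getDocIds() { return Collections.unmodifiableList(docIds); }
}

/**
 * Hypothetical native-engine field writer after the proposed change: it keeps no
 * vector map or doc ID set of its own and delegates all buffering to the flat writer.
 */
class NativeFieldWriterSketch {
    private final FlatWriterStub flatWriter = new FlatWriterStub();

    void addValue(int docId, float[] vector) {
        // Only the flat writer buffers the value; no second reference is kept here.
        flatWriter.addValue(docId, vector);
    }

    /**
     * At flush, reads vectors and doc IDs back from the flat writer and returns
     * how many vectors would be handed to the native index build.
     */
    int flush() {
        List<float[]> vectors = flatWriter.getVectors();
        List<Integer> docIds = flatWriter.getDocIds();
        if (vectors.size() != docIds.size()) {
            throw new IllegalStateException("vectors and doc IDs are out of sync");
        }
        // A real implementation would build the native engine index from these here (omitted).
        return vectors.size();
    }
}
```

With the flat writer as the single owner of the buffered data, the native writer has no bookkeeping of its own to keep in sync with it.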