opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0
154 stars 113 forks source link

[k-NN] Avoid additional copy to stream during binary doc values deserialization #1736

Closed jmazanec15 closed 2 months ago

jmazanec15 commented 3 months ago

Description

Related #1709

During deserialization, we have an intermediate step where we copy a bytesref to a bytesstream. This can be removed as it adds some overhead and does add any benefit. With #1087, this will be no longer necessary. But until then, it could add some benefit.

From the experiments in #1709 , the diff with and without the optimization is:

Results - w/o change

Run # p50 latency (ms) p90 latency (ms) p99 latency (ms) Recall
1 674 684 692 0.99998
2 674 684 692 0.99998

Results - w/ change

Run # p50 latency (ms) p90 latency (ms) p99 latency (ms) Recall
1 568 584 594 0.99998
2 568 584 596 0.99998

PoC commit: https://github.com/jmazanec15/k-NN-1/commit/7984618de9369228af388148d023475b0c58c061.

navneet1v commented 3 months ago

This intermediate step of copy byteref to bytestream is also present when we read vectors during merge. Hence marking this issue as indexing-improvements too.

navneet1v commented 2 months ago

@jmazanec15 can we close this GH issue as the feature is added in k-NN plugin,