opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

Index Initialization Alloc Method #1933

Closed MrFlap closed 1 month ago

MrFlap commented 1 month ago

Previously, the iterative index insertion feature only allocated memory for HNSW indices. New functionality for other indices needs to be implemented.

Description

This branch adds a method to allocate an index and adds logic for HNSWSQ indices. The following benchmark was run on a 4 GB Docker container for COHERE with 1M vectors on HNSWSQ.

Run: graph_default_osb_sqtest-15

| Metric | Task | Value | Unit |
|---|---|---|---|
| Min Throughput | custom-vector-bulk | 754.72 | docs/s |
| Mean Throughput | custom-vector-bulk | 1420.79 | docs/s |
| Median Throughput | custom-vector-bulk | 1194.48 | docs/s |
| Max Throughput | custom-vector-bulk | 3388.12 | docs/s |
| 50th percentile latency | custom-vector-bulk | 221.291 | ms |
| 90th percentile latency | custom-vector-bulk | 470.813 | ms |
| 99th percentile latency | custom-vector-bulk | 23489.8 | ms |
| 99.9th percentile latency | custom-vector-bulk | 69936.4 | ms |
| 99.99th percentile latency | custom-vector-bulk | 83420.9 | ms |
| 100th percentile latency | custom-vector-bulk | 112154 | ms |
| 50th percentile service time | custom-vector-bulk | 221.291 | ms |
| 90th percentile service time | custom-vector-bulk | 470.813 | ms |
| 99th percentile service time | custom-vector-bulk | 23489.8 | ms |
| 99.9th percentile service time | custom-vector-bulk | 69936.4 | ms |
| 99.99th percentile service time | custom-vector-bulk | 83420.9 | ms |
| 100th percentile service time | custom-vector-bulk | 112154 | ms |
| error rate | custom-vector-bulk | 0 | % |
| Min Throughput | force-merge-segments | 0 | ops/s |
| Mean Throughput | force-merge-segments | 0 | ops/s |
| Median Throughput | force-merge-segments | 0 | ops/s |
| Max Throughput | force-merge-segments | 0 | ops/s |
| 100th percentile latency | force-merge-segments | 4.24338e+06 | ms |
| 100th percentile service time | force-merge-segments | 4.24338e+06 | ms |
| error rate | force-merge-segments | 0 | % |
| Min Throughput | warmup-indices | 0.26 | ops/s |
| Mean Throughput | warmup-indices | 0.26 | ops/s |
| Median Throughput | warmup-indices | 0.26 | ops/s |
| Max Throughput | warmup-indices | 0.26 | ops/s |
| 100th percentile latency | warmup-indices | 3912.27 | ms |
| 100th percentile service time | warmup-indices | 3912.27 | ms |
| error rate | warmup-indices | 0 | % |
| Min Throughput | prod-queries | 15.96 | ops/s |
| Mean Throughput | prod-queries | 131.02 | ops/s |
| Median Throughput | prod-queries | 138.18 | ops/s |
| Max Throughput | prod-queries | 144.59 | ops/s |
| 50th percentile latency | prod-queries | 4.74785 | ms |
| 90th percentile latency | prod-queries | 5.54795 | ms |
| 99th percentile latency | prod-queries | 6.62209 | ms |
| 99.9th percentile latency | prod-queries | 10.8428 | ms |
| 99.99th percentile latency | prod-queries | 20.4215 | ms |
| 100th percentile latency | prod-queries | 417.638 | ms |
| 50th percentile service time | prod-queries | 4.74785 | ms |
| 90th percentile service time | prod-queries | 5.54795 | ms |
| 99th percentile service time | prod-queries | 6.62209 | ms |
| 99.9th percentile service time | prod-queries | 10.8428 | ms |
| 99.99th percentile service time | prod-queries | 20.4215 | ms |
| 100th percentile service time | prod-queries | 417.638 | ms |
| error rate | prod-queries | 0 | % |
| Mean recall@k | prod-queries | 0.91 | |
| Mean recall@1 | prod-queries | 0.98 | |
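
For readers unfamiliar with the change, the idea of allocating an index by type before inserting vectors in batches can be pictured with a toy sketch. This is not the plugin's code: `ToyIndex`, `ALLOCATORS`, and `allocate_index` are hypothetical names, and the per-component sizes (4 bytes for HNSW, 2 bytes for HNSWSQ) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyIndex:
    """Hypothetical stand-in for a native index; not the plugin's real type."""
    kind: str
    dim: int
    bytes_per_component: int  # assumed storage size of one vector component
    capacity: int = 0         # number of vectors reserved up front
    vectors: List[List[float]] = field(default_factory=list)

    def reserved_bytes(self) -> int:
        # Rough size of the vector storage reserved at allocation time.
        return self.capacity * self.dim * self.bytes_per_component

    def add_batch(self, batch: List[List[float]]) -> None:
        # Iterative insertion: each call appends one batch to the same index.
        if len(self.vectors) + len(batch) > self.capacity:
            raise MemoryError("batch exceeds the reserved capacity")
        if any(len(v) != self.dim for v in batch):
            raise ValueError("vector dimension mismatch")
        self.vectors.extend(batch)


def _alloc_hnsw(dim: int, expected: int) -> ToyIndex:
    # Assumption: uncompressed 32-bit float components.
    return ToyIndex("HNSW", dim, 4, expected)


def _alloc_hnswsq(dim: int, expected: int) -> ToyIndex:
    # Assumption: a 16-bit scalar quantizer, so 2 bytes per component.
    return ToyIndex("HNSWSQ", dim, 2, expected)


# One allocator per supported index type; new types register here.
ALLOCATORS: Dict[str, Callable[[int, int], ToyIndex]] = {
    "HNSW": _alloc_hnsw,
    "HNSWSQ": _alloc_hnswsq,
}


def allocate_index(kind: str, dim: int, expected_vectors: int) -> ToyIndex:
    """Pick the allocator for the index type and reserve space up front."""
    try:
        allocator = ALLOCATORS[kind]
    except KeyError:
        raise ValueError(f"no allocator registered for index type {kind!r}") from None
    return allocator(dim, expected_vectors)


# Usage: allocate once, then insert in batches of two vectors.
index = allocate_index("HNSWSQ", dim=4, expected_vectors=6)
for start in range(0, 6, 2):
    index.add_batch([[float(start + i)] * 4 for i in range(2)])
```

Keeping the allocators in a lookup table means an unsupported index type fails at allocation time rather than partway through insertion.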

Related Issues

https://github.com/opensearch-project/k-NN/issues/1600

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

navneet1v commented 1 month ago

Please add an entry to the changelog.

navneet1v commented 1 month ago

Overall, the code looks good to me. I will approve the PR once all CI checks are successful.

MrFlap commented 1 month ago

This backwards compatibility test keeps failing. Don't know what it is

navneet1v commented 1 month ago

> This backwards compatibility test keeps failing. Don't know what it is

This is because 2.16 has moved from SNAPSHOT to release, so a fix is needed in the main branch. It's a known issue during releases, so we can ignore that check for now.

PR: https://github.com/opensearch-project/k-NN/pull/1940

Let's ensure all the build tasks are successful.

navneet1v commented 1 month ago

@MrFlap with the updated code, please re-run the benchmarks and paste the results. Once we have validated the benchmarks with the new code, I will merge the changes.

navneet1v commented 1 month ago

Merged the code, as the benchmarks have been updated with the new code.