opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

Integrates KNN plugin with ConcurrentSearchRequestDecider interface #2111

Closed shatejas closed 1 week ago

shatejas commented 1 week ago

This allows k-NN queries to run concurrently when index.search.concurrent_segment_search.mode or search.concurrent_segment_search.mode is set to auto. Without this change, auto mode falls back to its default behavior, which is non-concurrent search.
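As a minimal sketch of how the two settings named above might be switched to auto through the settings APIs: HOST, INDEX, the APPLY guard, and the helper names cluster_body and index_body are hypothetical placeholders for illustration and are not part of this PR.

```shell
#!/usr/bin/env bash
# Assumption: HOST and INDEX are placeholders; adjust them for your cluster.
HOST="${HOST:-http://localhost:9200}"
INDEX="${INDEX:-my-knn-index}"

# Build the JSON body for the cluster-wide setting (hypothetical helper).
cluster_body() {
  printf '{"persistent":{"search.concurrent_segment_search.mode":"%s"}}' "$1"
}

# Build the JSON body for the per-index setting (hypothetical helper).
index_body() {
  printf '{"index.search.concurrent_segment_search.mode":"%s"}' "$1"
}

# Send the requests only when APPLY=1, so sourcing this file has no side effects.
if [ "${APPLY:-0}" = "1" ]; then
  curl -s -X PUT "$HOST/_cluster/settings" \
    -H 'Content-Type: application/json' -d "$(cluster_body auto)"
  curl -s -X PUT "$HOST/$INDEX/_settings" \
    -H 'Content-Type: application/json' -d "$(index_body auto)"
fi
```

Setting only one of the two is enough to try this out; the index-level setting applies to a single index, while the cluster-level one applies cluster-wide.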

Description

More details in https://github.com/opensearch-project/OpenSearch/issues/15259

Testing

Functional

Performance

Baseline (no settings update)

50th percentile latency,prod-queries,58.86101468404134,ms
90th percentile latency,prod-queries,85.32014465332031,ms
99th percentile latency,prod-queries,95.10733413696289,ms

Concurrent_segment_search.mode: auto

50th percentile latency,prod-queries,45.04893729613377,ms
90th percentile latency,prod-queries,48.14952836545217,ms
99th percentile latency,prod-queries,50.41835594177246,ms
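To put the two runs side by side, the relative latency reduction can be worked out from the figures reported above. This is a rough sketch: pct_reduction is a hypothetical helper, and the inputs are the single-run numbers from this comment, so the result is indicative only.

```shell
#!/usr/bin/env bash
# Percent latency reduction relative to the baseline, to one decimal place.
# Usage: pct_reduction <baseline_ms> <new_ms>
pct_reduction() {
  awk -v b="$1" -v n="$2" 'BEGIN { printf "%.1f", (b - n) / b * 100 }'
}

# Percentile, baseline latency (ms), latency with mode=auto (ms), as reported above.
for row in \
  "p50 58.86101468404134 45.04893729613377" \
  "p90 85.32014465332031 48.14952836545217" \
  "p99 95.10733413696289 50.41835594177246"
do
  set -- $row
  printf '%s: %s%% lower\n' "$1" "$(pct_reduction "$2" "$3")"
done
```

On these figures the reduction is about 23.5% at p50, 43.6% at p90, and 47.0% at p99, so the gain is largest in the tail.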

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

navneet1v commented 1 week ago

@shatejas, for completeness of the perf runs, please add the details below:

  1. How many shards were used?
  2. What was the dataset used?
  3. What was the machine used?

shatejas commented 1 week ago

  • How many shards were used?

1 shard

  • What was the dataset used?

cohere 1 million

  • What was the machine used?

This was run on Docker with this configuration: JVM=36g, CPU_COUNT=8, MEM_SIZE=48g

navneet1v commented 1 week ago

JVM=36g

Curious on why JVM was 36g?

shatejas commented 1 week ago

JVM=36g

Curious on why JVM was 36g?

I didn't think memory would make a difference, since (as of 2.17) there is no change in the number of threads allocated by OS when the concurrent segment search code path is used, so I just gave it sufficient memory.

opensearch-trigger-bot[bot] commented 1 week ago

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-2111-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 0421cdc907b43e4a930bd5a51454e5efea8413b6
# Push it to GitHub
git push --set-upstream origin backport/backport-2111-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-2111-to-2.x.

opensearch-trigger-bot[bot] commented 1 week ago

The backport to 2.17 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.17 2.17
# Navigate to the new working tree
cd .worktrees/backport-2.17
# Create a new branch
git switch --create backport/backport-2111-to-2.17
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 0421cdc907b43e4a930bd5a51454e5efea8413b6
# Push it to GitHub
git push --set-upstream origin backport/backport-2111-to-2.17
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.17

Then, create a pull request where the base branch is 2.17 and the compare/head branch is backport/backport-2111-to-2.17.