opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

[FEATURE]: Supporting Batch Query/array of vectors in K-NN Query #796

Open navneet1v opened 1 year ago

navneet1v commented 1 year ago

Is your feature request related to a problem? Currently, a k-NN search accepts only a single query vector. However, there are use cases where customers want to send multiple vectors in a single query to perform the k-NN search.

What solution would you like? Allow an array of vectors as input to the k-NN query, and then run the search for all input vectors in a single, optimized query.

We want to explore how the different k-NN engines can support this batch query. Having a thread pool per request in the k-NN plugin is also a possible solution.
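To make the thread-pool idea concrete, here is a minimal Python sketch of fanning out one search per input vector on a per-request pool. It is only an illustration of the pattern, not how the plugin would implement it: the in-memory `INDEX`, the brute-force `knn_search` stand-in, and `batch_knn_search` are all hypothetical names made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor
import math

# Toy in-memory "index" (doc id -> vector). Illustrative data only.
INDEX = {
    "a": [0.0, 0.0],
    "b": [1.0, 0.0],
    "c": [0.0, 2.0],
    "d": [3.0, 3.0],
}

def knn_search(query, k):
    """Hypothetical stand-in for a single-vector k-NN search (brute force, L2)."""
    scored = sorted((math.dist(query, vec), doc_id) for doc_id, vec in INDEX.items())
    return [doc_id for _, doc_id in scored[:k]]

def batch_knn_search(queries, k, max_workers=4):
    """Run one k-NN search per query vector on a pool created for this request."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input vectors.
        return list(pool.map(lambda q: knn_search(q, k), queries))

results = batch_knn_search([[0.0, 0.1], [2.9, 3.0]], k=2)
```

Each entry of `results` holds the neighbors for the vector at the same position in the input array, which leaves open the question of how those per-vector lists get merged.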

If the native engines can support batch queries, then we should lean towards that. For Faiss, we can support batching using the GPU: https://github.com/facebookresearch/faiss/wiki/Faiss-on-the-GPU#performance

What alternatives have you considered? Customers can currently use msearch or a should clause to do this, but the should clause suffers from higher latency because it performs the sub-queries sequentially. Msearch does parallelize, but only at the very start of the OpenSearch search request, and it has limits on the number of search queries that can be run. It also suffers performance losses.
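For reference, the two workarounds mentioned above could look roughly like this. The sketch only builds the request bodies in Python and sends nothing; the field name `my_vector` and the index name `my-index` are placeholders I made up, and each `knn` clause still carries exactly one vector.

```python
import json

FIELD = "my_vector"       # hypothetical knn_vector field name
INDEX_NAME = "my-index"   # hypothetical index name

def knn_clause(vector, k):
    """A single-vector k-NN query clause."""
    return {"knn": {FIELD: {"vector": vector, "k": k}}}

def should_query(vectors, k):
    """One search request with one knn clause per vector under bool/should."""
    return {
        "size": k,
        "query": {"bool": {"should": [knn_clause(v, k) for v in vectors]}},
    }

def msearch_body(vectors, k):
    """NDJSON body for _msearch: a header line plus a query line per vector."""
    lines = []
    for v in vectors:
        lines.append(json.dumps({"index": INDEX_NAME}))
        lines.append(json.dumps({"size": k, "query": knn_clause(v, k)}))
    # The _msearch body must end with a newline.
    return "\n".join(lines) + "\n"

vectors = [[0.1, 0.2], [0.3, 0.4]]
print(json.dumps(should_query(vectors, k=5), indent=2))
print(msearch_body(vectors, k=5))
```

With the should variant, the per-clause scores of a matching document are added together by the bool query, while msearch returns one independent result set per vector.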

Open Question

  1. Once we get the documents for each vector in the array provided in the query, how do we want to combine the results and scores? Should we return the max score, or should customers be able to customize the scoring and combination?
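To illustrate what is being asked in the open question, here is a small Python sketch that merges per-vector hits under two candidate strategies, max and sum. The `combine` function, its `mode` parameter, and the sample scores are hypothetical and only meant to show how the choice changes the final ranking.

```python
def combine(per_vector_hits, mode="max"):
    """Merge hits from several query vectors into one ranked list.

    per_vector_hits: list of {doc_id: score} dicts, one per query vector.
    mode: "max" keeps the best score per document, "sum" adds them up.
    """
    if mode not in ("max", "sum"):
        raise ValueError(f"unsupported mode: {mode}")
    combined = {}
    for hits in per_vector_hits:
        for doc_id, score in hits.items():
            if doc_id not in combined:
                combined[doc_id] = score
            elif mode == "max":
                combined[doc_id] = max(combined[doc_id], score)
            else:
                combined[doc_id] += score
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))

# Made-up scores: doc "b" matches both query vectors.
hits_for_v1 = {"a": 0.9, "b": 0.5}
hits_for_v2 = {"b": 0.75, "c": 0.25}

by_max = combine([hits_for_v1, hits_for_v2], mode="max")
by_sum = combine([hits_for_v1, hits_for_v2], mode="sum")
```

In this sample, "a" ranks first under max while "b" ranks first under sum, which is why a customer-configurable combination may be worth considering.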

Use cases:

  1. Community request: https://github.com/opensearch-project/k-NN/issues/794

vamshin commented 1 year ago

Please +1 if you are looking for this feature, to help us prioritize.

ankitas3 commented 1 year ago

@navneet1v Can we also consider having a boost option for k-NN queries in this case? We have a use case where we would like to boost a few of the k-NN queries sent in the array. Also, this feature is fairly crucial for us. When can we expect it to be released?

aishwaryabajaj-54 commented 1 year ago

Hi @navneet1v, can you please help us prioritize this feature?

navneet1v commented 1 year ago

Hi @aishwaryabajaj-54, adding @vamshin to prioritize this issue. He should be better able to help you here.