opensearch-project / opensearch-benchmark

OpenSearch Benchmark - a community driven, open source project to run performance tests for OpenSearch
https://opensearch.org/docs/latest/benchmark/
Apache License 2.0
108 stars 75 forks source link

Update Vectorsearch Core Operations: Add script score query for exact nearest neighbor search #516

Open jmazanec15 opened 5 months ago

jmazanec15 commented 5 months ago

Description

For vector search, right now, we benchmark ANN methods that approximate nearest neighbor search. In addition to this, some users want to run exact k-NN search that returns the exact nearest neighbors per query. To do this, users can use the following query:

GET my-knn-index-1/_search
{
 "size": 4,
 "query": {
   "script_score": {
     "query": {
       "match_all": {}
     },
     "script": {
       "source": "knn_score",
       "lang": "knn",
       "params": {
         "field": "my_vector2",
         "query_value": [2.0, 3.0, 5.0, 6.0],
         "space_type": "l2"
       }
     }
   }
 }
}

docs: https://opensearch.org/docs/latest/search-plugins/knn/knn-score-script/

I want to add this query as another option for benchmarking purposes.

VijayanB commented 5 months ago

@jmazanec15 You can use generic search operation to perform above query. However, this will not give you recall for script. Is recall required for part of your results? IIRC, search operation can support fixed query, it can't support dynamic values. @gkamat @IanHoang Can you correct me if i am wrong? Thanks

IanHoang commented 5 months ago

@jmazanec15 @VijayanB Are you requesting that this query be added to the vectorsearch workload specifically? If so, please make create an issue in the OSB workloads repository instead and feel free to link the contents there. EDIT: Reopened as these would apply to already built-in OSB core operations that were added a while back.

Search operations can also support dynamic values, through the help of custom param sources. To do this, you'll need to do the following:

  1. Incorporate this query as an operation in operations/default.json. Instead of adding the body field, add a field called param-source with your custom param source name, let's say "knn-score-script-custom-params".
  2. Define a function in workloads.py and register that method as the param source for "knn-score-script-custom-params".

To see examples of these steps, see the geonames workload.