opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosystem
Apache License 2.0

Query Benchmarks for Neural Search #34

Open navneet1v opened 2 years ago

navneet1v commented 2 years ago

Description

Perform benchmarks for queries issued via the new "neural" query type.

Benchmarking Search API

This will provide insights into the performance of the new query type (neural) that we are adding to OpenSearch. We will use OpenSearch Benchmark to perform this.
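
For context, a neural query is issued through the standard search API. The sketch below shows what a single benchmark query might look like from a Python client; the index name, vector field, and model ID are placeholders that would come from the benchmark setup, not values defined in this issue.

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Placeholder index, field, and model ID -- substitute values from the
# benchmark cluster. The "neural" clause embeds query_text with the
# deployed model identified by model_id and runs a k-NN search against
# the given knn_vector field.
body = {
    "size": 10,
    "query": {
        "neural": {
            "passage_embedding": {
                "query_text": "example benchmark query",
                "model_id": "<deployed-model-id>",
                "k": 10,
            }
        }
    },
}
response = client.search(index="benchmark-index", body=body)
print(response["took"], "ms,", response["hits"]["total"]["value"], "hits")
```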

Metrics Identified:

  1. Total latency (P90, P99), CPU utilization, and memory usage (both heap and non-heap) for performing Y queries on the cluster (a sketch of how the latency percentiles are derived follows this list).
  2. Average latency and memory usage (both heap and non-heap) for performing a single query.
  3. Performance of the predict API during query execution (we are mainly interested in latency).
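
OpenSearch Benchmark reports these percentiles itself; purely to make metric 1 concrete, here is a minimal client-side sketch of how P90/P99 latency over Y queries would be derived. `measure_latencies` and `run_query` are hypothetical names, not part of any tooling.

```python
import time
import numpy as np

def measure_latencies(run_query, num_queries):
    """Time num_queries calls of run_query; return avg/P90/P99 in ms.

    run_query is any zero-argument callable that issues one search request.
    """
    latencies_ms = []
    for _ in range(num_queries):
        start = time.perf_counter()
        run_query()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    p90, p99 = np.percentile(latencies_ms, [90, 99])
    return {"avg_ms": float(np.mean(latencies_ms)),
            "p90_ms": float(p90), "p99_ms": float(p99)}
```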

Relevance of Search

This benchmark will corroborate the results we obtained from experiments that combined the BM-25 and k-NN scores. It will also provide insight into whether we need to boost scores for one query type, and when to boost them.
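
For illustration only, the kind of score combination being evaluated can be sketched as min-max normalization of each result list followed by a weighted blend. This is an assumed scheme, not the plugin's implementation, and `combine_scores` is a hypothetical helper.

```python
def combine_scores(bm25_scores, knn_scores, neural_boost=0.5):
    """Blend BM-25 and k-NN scores after min-max normalizing each list.

    bm25_scores / knn_scores: dicts mapping doc_id -> raw score.
    neural_boost: weight on the k-NN score; (1 - neural_boost) on BM-25.
    """
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero on constant scores
        return {doc: (s - lo) / span for doc, s in scores.items()}

    bm25_n, knn_n = normalize(bm25_scores), normalize(knn_scores)
    return {
        doc: (1 - neural_boost) * bm25_n.get(doc, 0.0)
             + neural_boost * knn_n.get(doc, 0.0)
        for doc in set(bm25_n) | set(knn_n)
    }
```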

Metrics Identified:

  1. Measure the metrics (NDCG, Mean Average Precision, relevancy, etc.) that were captured during the science experiments; these are defined in the appendix below.

Appendix

Science Experiment Metrics

  1. Normalized Discounted Cumulative Gain: measure of ranking quality that factors in ordering.
  2. Mean Average Precision: measure that summarizes the precision/recall curve.
  3. Precision: what proportion of positive identifications was actually correct?
  4. Recall: what proportion of actual positives was identified correctly?
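
To make these definitions concrete, here is a self-contained sketch of the standard textbook formulas (not the experiment code). Mean Average Precision is then the mean of `average_precision` over the query set.

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the ranking divided by DCG of the ideal ordering.

    relevances: graded relevance of each retrieved doc, in ranked order.
    """
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels))
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0

def average_precision(retrieved, relevant):
    """AP: mean of precision@i over each rank i that holds a relevant doc."""
    hits, total = 0, 0.0
    for i, doc in enumerate(retrieved):
        if doc in relevant:
            hits += 1
            total += hits / (i + 1)
    return total / len(relevant) if relevant else 0.0

def precision_recall(retrieved, relevant):
    """Precision and recall over sets of retrieved / relevant doc IDs."""
    hits = len(set(retrieved) & set(relevant))
    return (hits / len(retrieved) if retrieved else 0.0,
            hits / len(relevant) if relevant else 0.0)
```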
br3no commented 1 year ago

This would be a great step toward "productifying" this plugin. Is there already an idea about what criteria the benchmarking corpora should fulfill? E.g. size, domain, nature of queries (keywords, passages, questions, etc.). One good starting point would be the BEIR dataset. And what about the models used? It probably makes sense to have a fixed baseline model and to make it easy to extend the benchmarking by adding new models to it.

navneet1v commented 1 year ago

@br3no We have used the BEIR datasets and one specific model only. The details you have requested are not yet reflected on the issue. I will try to update them ASAP for more visibility.

navneet1v commented 1 year ago

Added the initial Commit here: https://github.com/navneet1v/neural-search/tree/perf-testing/benchmarks/osb

navneet1v commented 1 year ago

I have opened a GitHub issue to add the benchmark workload to the OpenSearch Benchmark workloads repo. I have also started work on it.