opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosystem
Apache License 2.0

[RFC] Hybrid Search Optimizer #934

Open wrigleyDan opened 3 weeks ago

wrigleyDan commented 3 weeks ago

Is your feature request related to a problem? Please describe

Introduction

Hybrid search combines keyword and neural search to improve search relevance, and this combination shows promising results across industries and in benchmarks. In OpenSearch, hybrid search linearly combines keyword search (e.g. match queries) with neural search (queries transformed into vector embeddings by machine learning models). The combination is configured in a search pipeline, which defines the post-processing of the keyword and neural result sets: the scores of each are normalized and then combined with one of the currently three available techniques (arithmetic, harmonic, or geometric mean). This search pipeline configuration lets OpenSearch users define how to normalize the scores and how to weigh the result sets.

Problem Statement

There is no “one size fits all” solution: the best configuration depends on a plethora of factors related to a given search application’s data, users, and domain. Using judgments, the hybrid search optimizer plugin aims to identify the ideal parameter set for each application.

Describe the solution you'd like

Proposal

Develop tooling that can utilize any kind of judgments (implicit or explicit) together with a query set to find the best set of parameters to apply for hybrid search.

We treat the problem stated above as a parameter optimization problem. With this tooling we want to provide the ability to run a grid search over a set of parameter values. For OpenSearch installations where testing all parameter combinations is not feasible, we provide a Bayesian optimization approach (ch. 6.4 of Experimentation for Engineers).

Since not every optimization is a meaningful one, we implement support for deciding whether an observed increase in a search metric is significant: for every evaluation, a t-score and a p-value are calculated and stored together with the results to enable efficient decision making.
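As a sketch of what this check could look like (the paired-test choice, the metric, and all variable names below are illustrative assumptions, not the plugin's final implementation):

# Minimal sketch of the significance check, assuming per-query metric
# values (e.g. NDCG@10) for a baseline pipeline and a candidate pipeline.
from scipy import stats

baseline_ndcg = [0.41, 0.38, 0.55, 0.47, 0.62]   # hypothetical per-query scores
candidate_ndcg = [0.45, 0.44, 0.58, 0.47, 0.66]  # same queries, new parameters

# Paired t-test: each query is evaluated under both configurations.
t_score, p_value = stats.ttest_rel(candidate_ndcg, baseline_ndcg)

# Stored alongside the evaluation results so users can judge whether the
# observed improvement is statistically significant.
print(f"t = {t_score:.3f}, p = {p_value:.3f}")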

With this tooling we want to make the experimentation around identifying the best hybrid search parameters an automated and regular process for continuous search result quality improvement.

We rely on the OpenSearch plugin User Behavior Insights and the new Search Quality Evaluation Framework to generate the necessary implicit judgments based on user behavior data. Of course, users are welcome to provide their own judgments, which gives mature search teams that already have an implicit-judgment process in place the flexibility to keep using it.

Hybrid Search Optimization as a Parameter Optimization Challenge

There is a finite number of parameters available for retrieving high-quality search results:

Normalization techniques: min_max and l2
Combination techniques: arithmetic, harmonic, geometric
Weights: values in the range of 0.0 to 1.0

Example search pipeline definition (see OpenSearch documentation):

PUT /_search/pipeline/nlp-search-pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [
              0.3,
              0.7
            ]
          }
        }
      }
    }
  ]
}
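For context, such a pipeline is applied at query time by referencing it in a hybrid query. A request along these lines (the index name, field names, and model_id are placeholders) would be scored with the weights above: 0.3 for the match sub-query and 0.7 for the neural sub-query, in order.

GET /my-index/_search?search_pipeline=nlp-search-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "text_field": {
              "query": "wild west"
            }
          }
        },
        {
          "neural": {
            "embedding_field": {
              "query_text": "wild west",
              "model_id": "<model_id>",
              "k": 10
            }
          }
        }
      ]
    }
  }
}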

This leads to a finite number of parameter combinations. Obviously, search platforms differ widely, so even a small number of parameter sets may be too expensive to run against a large query set or over a large index. Therefore we provide two options to explore different parameter sets: grid search and Bayesian optimization.
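To make the size of the grid concrete: with 2 normalization techniques, 3 combination techniques, and weights sampled in 0.1 steps (11 values), a full grid search would evaluate 2 × 3 × 11 = 66 pipeline configurations against the entire query set; a finer weight grid grows this number linearly.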

Grid Search

The grid search approach tries out all possible parameter combinations to identify the best set.

Example:

Evaluating the normalization techniques min_max and l2 together with the combination techniques arithmetic and harmonic would mean testing four combinations: (min_max, arithmetic), (min_max, harmonic), (l2, arithmetic), and (l2, harmonic).

Every combination is tested and validated with a query set and a set of judgments. The hybrid search optimizer’s objective is to find the parameter combination that maximizes a specific search metric (e.g. DCG, NDCG, AP, etc.). The user chooses the search metric they want to optimize for when creating the grid search optimization job.
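A minimal sketch of this loop, assuming opensearch-py and a judged query set; the index name, field names, model id, data, and helper names below are hypothetical placeholders rather than the plugin's actual interface:

from itertools import product
from math import log2

from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Hypothetical parameter grids (see the parameter list above).
NORMALIZATION = ["min_max", "l2"]
COMBINATION = ["arithmetic_mean", "harmonic_mean", "geometric_mean"]
WEIGHTS = [round(w / 10, 1) for w in range(11)]  # 0.0, 0.1, ..., 1.0

# Hypothetical query set and judgments ((query, doc_id) -> relevance grade).
query_set = ["wild west", "red shoes"]
judgments = {("wild west", "doc-1"): 3, ("red shoes", "doc-7"): 2}

def put_pipeline(name, norm, comb, keyword_weight):
    """Create or overwrite a temporary hybrid search pipeline."""
    client.transport.perform_request(
        "PUT", f"/_search/pipeline/{name}",
        body={
            "description": "hybrid search optimizer trial",
            "phase_results_processors": [{
                "normalization-processor": {
                    "normalization": {"technique": norm},
                    "combination": {
                        "technique": comb,
                        "parameters": {
                            "weights": [keyword_weight, round(1.0 - keyword_weight, 1)],
                        },
                    },
                }
            }],
        },
    )

def dcg_at_10(query, pipeline):
    """Run one hybrid query through the trial pipeline and compute DCG@10."""
    hits = client.search(
        index="my-index",
        params={"search_pipeline": pipeline},
        body={"size": 10, "query": {"hybrid": {"queries": [
            {"match": {"text_field": {"query": query}}},
            {"neural": {"embedding_field": {
                "query_text": query, "model_id": "<model_id>", "k": 10}}},
        ]}}},
    )["hits"]["hits"]
    return sum(judgments.get((query, hit["_id"]), 0) / log2(rank + 2)
               for rank, hit in enumerate(hits))

best_params, best_score = None, float("-inf")
for norm, comb, w in product(NORMALIZATION, COMBINATION, WEIGHTS):
    put_pipeline("hso-trial", norm, comb, w)
    score = sum(dcg_at_10(q, "hso-trial") for q in query_set) / len(query_set)
    if score > best_score:
        best_params, best_score = (norm, comb, w), score

print("best parameter set:", best_params, "mean DCG@10:", best_score)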

The Hybrid Search Optimizer plugin integrates seamlessly with the Search Quality Evaluation Framework plugin, which can be leveraged to create query and judgment sets.

(Figure: RFC_hybrid_optimizer_grid_search)

This OpenSearch index (see the figure above) can live outside of the OpenSearch cluster that runs the hybrid search optimizer; it only needs to be accessible.

Since running all queries of a query set can introduce heavy load on the OpenSearch system, and the metrics calculation introduces additional load on top, you may consider separating the two.

We recommend running the hybrid search optimizer in a non-production environment to avoid user or customer impact.

Bayesian Optimization

Running all parameter combinations may be too complex or time-consuming: for a large query set, evaluating every combination against every query can simply take too long to finish. For these situations, users can switch from the grid search approach to a Bayesian optimization approach.

Problem Setup

As with the grid search approach, we want to maximize a specific search metric. The user chooses the search metric they want to optimize for when creating the Bayesian optimization job; this is the objective function of the hybrid search optimizer. The parameters to optimize do not differ from the ones the hybrid search optimizer uses for grid search.

Gaussian Process Modeling

The hybrid search optimizer uses a Gaussian Process as the underlying surrogate model, which provides us with uncertainty estimates about its predictions. The prior is the calculated search metric for an initial configuration of the hybrid search pipeline. Users can either use an existing pipeline, define a new one, or use the hybrid search optimizer's default.

Acquisition Function

The acquisition function balances exploration (trying out areas of the parameter space where the function is uncertain) and exploitation (focusing on areas where the function is expected to perform well). The hybrid search optimizer uses an acquisition function that measures expected improvement. It selects the next point by maximizing the expected improvement over the current best-known value.
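For reference, the standard expected improvement criterion for a maximization objective (the textbook form; the exact variant used by the optimizer is an implementation detail) is

$$\mathrm{EI}(x) = \mathbb{E}\big[\max(f(x) - f^{*},\, 0)\big] = \big(\mu(x) - f^{*}\big)\,\Phi(z) + \sigma(x)\,\varphi(z), \qquad z = \frac{\mu(x) - f^{*}}{\sigma(x)},$$

where $f^{*}$ is the best metric value observed so far, $\mu(x)$ and $\sigma(x)$ are the GP posterior mean and standard deviation at $x$, and $\Phi$ and $\varphi$ are the standard normal CDF and PDF.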

Sequential Optimization

The optimization process starts by sampling the objective function at a few initial points, often chosen at random or based on a space-filling design like Latin Hypercube Sampling. The hybrid search optimizer uses the results from these initial evaluations to update the Gaussian Process model, refining the surrogate model of the objective function. The next point in the parameter space is found by maximizing the acquisition function (expected improvement). By actually running the parameters of that next point, the results for the query set are retrieved, the search metric is calculated, and the objective function is thereby evaluated at the new point. Next, the GP model is updated with the new data point and the process repeats. The results of all tested parameter sets are stored for review.
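A minimal sketch of this loop using scikit-optimize's gp_minimize, one possible library choice rather than the plugin's actual implementation; the toy evaluate_pipeline stands in for building a trial pipeline and scoring the query set:

# Bayesian optimization over the hybrid search parameters with scikit-optimize.
# gp_minimize minimizes, so the search metric is negated.
from skopt import gp_minimize
from skopt.space import Categorical, Real

space = [
    Categorical(["min_max", "l2"], name="normalization"),
    Categorical(["arithmetic_mean", "harmonic_mean", "geometric_mean"],
                name="combination"),
    Real(0.0, 1.0, name="keyword_weight"),
]

def evaluate_pipeline(norm, comb, weight):
    """Stand-in for the real evaluation. In the plugin this would create a
    trial pipeline, run the query set, and return e.g. mean NDCG@10; here a
    toy response surface keeps the sketch runnable (comb is ignored)."""
    bonus = {"min_max": 0.05, "l2": 0.0}[norm]
    return 0.6 + bonus - (weight - 0.7) ** 2

def objective(params):
    norm, comb, weight = params
    return -evaluate_pipeline(norm, comb, weight)  # negate: we maximize the metric

result = gp_minimize(
    objective,
    space,
    acq_func="EI",        # expected improvement, as described above
    n_initial_points=10,  # initial (random) samples of the objective
    n_calls=40,           # total evaluations; doubles as a stopping criterion
    random_state=42,
)

print("best parameters:", result.x, "best metric:", -result.fun)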

Convergence

The process iteratively refines the surrogate model and converges to the optimal set of parameters. Convergence is determined by a stopping criterion, such as a maximum number of iterations, a small improvement over several iterations, or a specific confidence level.

The following illustration shows the Bayesian Optimization process:

(Figure: RFC_hybrid_optimizer_bayesian_optimization)

Optimization Jobs & their States

Depending on the number of parameter sets to try, the number of queries to run for each parameter set, and the size of the search platform, hybrid search optimization jobs may run for seconds, minutes, or even hours. To keep track of initialized jobs and their status, the hybrid search optimizer registers these jobs and updates their states accordingly.

CREATED: When a hybrid search optimization job is registered it receives the status CREATED
RUNNING: As soon as the job runs, the state transitions from CREATED to RUNNING
COMPLETED: A finished job has the state COMPLETED

By default, all jobs, their metadata, and their states are stored in a job index. The id of the job is the reference to the stored evaluation results in the index holding these.
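For illustration only (the field names below are hypothetical, not a finalized schema), a document in the job index might look like:

{
  "id": "hso-job-42",
  "state": "RUNNING",
  "optimization_type": "grid_search",
  "metric": "ndcg@10",
  "query_set_id": "queryset-2024-06",
  "created_at": "2024-06-01T12:00:00Z",
  "last_updated": "2024-06-01T12:05:00Z"
}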

Scope

The scope of the hybrid search optimizer is to provide users with the best set of parameters for their search pipeline based on their data, their users (when using implicit judgments) and their domain. Out of scope:

GitHub Repository

This work will initially be built outside of the OpenSearch GitHub project, but we hope to transfer the repository into the OpenSearch GitHub organization as soon as possible. We are tentatively calling the repository hybrid-search-optimizer.

Contributors

The following individuals will contribute to this effort. Please let us know in the comments if you would like to be involved.

Roadmap

We will attempt to follow the OpenSearch release schedule with incremental progress implemented in OpenSearch releases. This will enable real-time use and feedback.

Conclusion

Identifying the correct parameter set for hybrid search is a challenge that can be treated as an optimization problem. The hybrid search optimizer OpenSearch plugin provides two ways to arrive at an ideal parameter set:

Grid search: by testing all possible parameter combinations and evaluating the search results for each, the best one is identified.
Bayesian optimization: where testing all possible parameter combinations is too expensive, users can run a Bayesian optimization algorithm that balances exploration and exploitation to efficiently identify a suitable parameter set with reasonable tradeoffs.

Both approaches rely on query sets and judgments that can be provided either by the user or by the Search Quality Evaluation Framework, and both result in a parameter set tuned to a user's unique data.

Related component

Search:Relevance

Describe alternatives you've considered

No response

Additional context

No response

vibrantvarun commented 1 week ago

This issue should be moved under the neural-search repo.

minalsha commented 1 week ago

@opensearch-project/admin could you please help move this issue to neural search repo: https://github.com/opensearch-project/neural-search?

navneet1v commented 1 week ago

@wrigleyDan this is an interesting RFC, would love to have this feature.

I want to understand where we are thinking of hosting this optimizer: outside of OpenSearch, or inside OpenSearch? I am also wondering whether this could be a Python notebook that puts the optimization information into an OpenSearch index, which the hybrid query can then use later.

smacrakis commented 3 days ago

I don't think we should limit ourselves to those three parameters (normalization x 2, combination x 3, weights 0–1.0). I realize that the more parameters we have, the more data we need to optimize, but... Starting with normalization, it might be interesting to look at normScore = Score / P80Score, thus favoring high outliers. For rank-based scoring à la RRF, there is a parameter k to optimize. There is also the possibility of combining ranks with scores. Could we do this using, say, XGBoost to investigate non-linear scoring?

wrigleyDan commented 3 days ago

Thanks for your feedback @navneet1v!

I want to understand where we are thinking to have this optimizer? outside of Opensearch? or inside opensearch?

Excellent question; I don't have a perfect answer for that and am happy for any input. In general, I can imagine the optimizer being part of the neural-search plugin, as it uses OpenSearch's neural query and aims to improve installations that use the neural query as part of a hybrid search setup. From my perspective, making this as accessible as possible for all OpenSearch users is a reasonable goal, so shipping it with a component "close to OpenSearch" would make sense.

I am also thinking can this be a python notebook which puts the optimization information in an index of Opensearch and then it can be used by hybrid query later.

I agree. I built a notebook version of the grid search part of the RFC on an available dataset to see potential limitations. While the process overall works and produces output in the form of a search pipeline with search metrics, I think additional value comes from a more tightly integrated component that can run regularly and produce metrics that can be visualised and analysed via OpenSearch Dashboards, helping OpenSearch users make an informed decision based on the optimizer's results without needing to be data scientists.

wrigleyDan commented 3 days ago

Thanks for your thoughts @smacrakis! I love the idea of an additional normalizer that incorporates a new signal. Assuming that more normalization techniques, and other parameters as well, may be added to OpenSearch, the Hybrid Search Optimizer should be able to take these into account when looking for the best combination of parameters. We are already thinking about taking the concept of the optimizer a step further: we see this RFC as a starting point that provides users with the best parameter set based on the available configuration options and judgments. Knowing that some queries will not benefit from the identified "best configuration", we are exploring ways to dynamically identify a suitable set of parameters on a per-query basis. I'll see if we can include rank-based options in these thoughts as well; thanks for that idea.