wrigleyDan opened this issue 1 month ago
This issue should be moved to the neural-search repo.
@opensearch-project/admin could you please help move this issue to neural search repo: https://github.com/opensearch-project/neural-search?
@wrigleyDan this is an interesting RFC, would love to have this feature.
I want to understand where we are thinking of having this optimizer: outside of OpenSearch or inside OpenSearch? I am also wondering whether this could be a Python notebook that puts the optimization information into an OpenSearch index so that it can be used by hybrid queries later.
I don't think we should limit ourselves to those three parameters: normalization x 2; combination x 3; weights 0-10. I realize that the more parameters we have, the more data we need to optimize, but.... Starting with normalization, it might be interesting to look at normScore = Score/P80Score, thus favoring high outliers. For rank-based scoring a la RRF, there is a parameter k to optimize. There is the possibility of combining ranks with scores. Can we do this using, say, XGBoost to investigate non-linear scoring?
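A rough sketch of the two ideas mentioned here, purely for illustration (the P80-based normalization and the RRF k parameter are not tied to any existing OpenSearch implementation in this sketch):

```python
# Illustrative only: P80-based normalization and RRF's k parameter.
import numpy as np

def p80_normalize(scores):
    # normScore = score / P80(score): scores above the 80th percentile end up > 1.0
    p80 = np.percentile(scores, 80)
    return [s / p80 for s in scores] if p80 > 0 else list(scores)

def rrf_contribution(rank, k=60):
    # Reciprocal Rank Fusion contribution for a document at a given rank;
    # k is itself a parameter one could optimize.
    return 1.0 / (k + rank)
```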
Thanks for your feedback @navneet1v!
I want to understand where we are thinking of having this optimizer: outside of OpenSearch or inside OpenSearch?
Excellent question, I don't have a perfect answer for that and am happy for any input. In general, I can imagine the optimizer being part of the neural-search plugin as it uses the neural query of OpenSearch and aims to improve installations using neural query as part of a hybrid search setup. From my perspective making this as accessible as possible for all OpenSearch users is a reasonable goal, so shipping it with a component "close to OpenSearch" would make sense.
I am also wondering whether this could be a Python notebook that puts the optimization information into an OpenSearch index so that it can be used by hybrid queries later.
I agree. I built a notebook version of the grid search part of the RFC based on an available dataset to see potential limitations. While the process overall works and produces an output in the form of a search pipeline with search metrics, I think additional value comes from a more tightly integrated component that can run regularly and produce metrics that can be visualized and analyzed via OpenSearch Dashboards, helping OpenSearch users make an informed decision based on the optimizer's results without having to be data scientists.
Thanks for your thoughts @smacrakis! I love the idea of an additional normalizer that integrates a new idea. Assuming that more normalization parameters, and other parameters as well, might be added to OpenSearch, the Hybrid Search Optimizer should be able to take these into account when looking for the best combination of parameters. We are already taking the concept of the optimizer a step further: we see this as a starting point that provides users with the best parameter set based on the available configuration options and available judgements. Knowing that there will be queries that do not benefit from the identified "best configuration", we are exploring ways to dynamically identify a suitable set of parameters on a per-query basis. I'll see if we can include rank-based options in these thoughts as well, thanks for that idea.
We definitely need to think about adding a rank-based technique to the list of parameters for the optimizer; it's in the active implementation phase and is planned for 2.19, the next OpenSearch release: https://github.com/opensearch-project/neural-search/issues/659.
You've mentioned that you're planning to use User Behavior Insights and the Search Quality Evaluation Framework to get the judgment data. Do you want to put this system in production, or is the scope only the implementation phase? I can imagine user judgment varies from person to person, making such data non-deterministic.
Not sure how relevant this is, but a user suggested that we look at Syne Tune for parameter optimization.
Thanks for chiming in @martin-gaievski!
You've mentioned that you're planning to use User Behavior Insights and the Search Quality Evaluation Framework to get the judgment data. Do you want to put this system in production, or is the scope only the implementation phase? I can imagine user judgment varies from person to person, making such data non-deterministic.
You are correct in saying that user judgement varies from person to person. What we do is calculate implicit judgements based on user behavior data. The combination of UBI and the referenced framework does not mean we take individual user judgements; we take user interactions and derive judgements from these. The first model included in the referenced framework is COEC (Clicks Over Expected Clicks), which can be described as a rank-based click model. However, it does not matter which judgements are used for the optimizer: you can use it with any available kind of judgements.
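As a rough illustration of the COEC idea only (the data shapes and names below are assumptions, not the framework's actual schema or implementation):

```python
# Sketch of COEC (Clicks Over Expected Clicks) implicit judgments.
from collections import defaultdict

def coec_judgments(impressions, clicks, rank_ctr):
    """impressions: (query, doc_id, rank) tuples; clicks: (query, doc_id) tuples;
    rank_ctr: average click-through rate per display rank across all queries."""
    expected = defaultdict(float)
    for query, doc_id, rank in impressions:
        expected[(query, doc_id)] += rank_ctr.get(rank, 0.0)

    observed = defaultdict(int)
    for query, doc_id in clicks:
        observed[(query, doc_id)] += 1

    # A judgment > 1.0 means the document was clicked more often than its
    # display ranks alone would predict.
    return {pair: observed[pair] / exp for pair, exp in expected.items() if exp > 0}
```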
We now have a set of notebooks in a GitHub repository that take everyone through the hybrid search optimization steps using one specific dataset: https://github.com/o19s/opensearch-hybrid-search-optimization
It contains not only the straightforward "find the best parameter set by trying out all combinations" approach but also a more dynamic one that includes feature engineering and model training.
In the README.md file you can find a link to a video walkthrough if you prefer to watch rather than explore for yourself.
Happy to chat about any feedback you may have.
Is your feature request related to a problem? Please describe
Introduction
Hybrid search combines keyword and neural search to improve search relevance, and this combination shows promising results across industries and in benchmarks. Hybrid search in OpenSearch linearly combines keyword search (e.g. match queries) with neural search (which transforms queries into vector embeddings using machine learning models). This combination is configured in a search pipeline. The pipeline defines the post-processing of the result sets for keyword and neural search: the scores of each result set are normalized and then combined with one of the currently three available techniques (arithmetic, harmonic, geometric). This search pipeline configuration lets OpenSearch users define how to normalize the scores and how to weigh the result sets.
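As a rough illustration, a hybrid query combining a match clause with a neural clause and executed through such a search pipeline can look like the sketch below (index name, field names, model_id, and the pipeline name are placeholders; an example pipeline definition follows further down):

```python
# Sketch of a hybrid query that combines keyword (match) and neural search.
import requests

hybrid_query = {
    "query": {
        "hybrid": {
            "queries": [
                {"match": {"title": {"query": "wireless headphones"}}},
                {
                    "neural": {
                        "title_embedding": {
                            "query_text": "wireless headphones",
                            "model_id": "<model-id>",
                            "k": 100,
                        }
                    }
                },
            ]
        }
    }
}

# The search pipeline applies normalization and combination to the two result sets.
response = requests.post(
    "http://localhost:9200/my-index/_search",
    params={"search_pipeline": "hybrid-search-pipeline"},
    json=hybrid_query,
)
```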
Problem Statement
There exists no “one size fits all” solution. The best configuration depends on a plethora of factors related to any given search application’s data, users, or domain. With judgments, the hybrid search optimizer plugin aims to identify the ideal parameter set.
Describe the solution you'd like
Proposal
Develop tooling that can utilize any kind of judgments (implicit or explicit) together with a query set to find the best set of parameters to apply for hybrid search.
We treat the problem stated above as a parameter optimization problem. With this tooling we want to provide the ability to run a grid search over a set of parameter values. For OpenSearch installations where testing all parameter combinations is not feasible, we provide a Bayesian optimization approach (ch. 6.4 of Experimentation for Engineers).
As not every improvement is a meaningful improvement, we implement support that helps decide whether an observed increase in search metrics is significant. For every evaluation, a t-score and a p-value are calculated and stored together with the results to enable efficient decision making.
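A minimal sketch of how such a check could look, assuming per-query metric values (e.g. NDCG) for a baseline configuration and a candidate configuration over the same query set (function and variable names are illustrative):

```python
# Sketch of the significance check via a paired t-test.
from scipy import stats

def significance(baseline_scores, candidate_scores, alpha=0.05):
    """Paired t-test over per-query metric values for the same query set."""
    t_score, p_value = stats.ttest_rel(candidate_scores, baseline_scores)
    # The t-score and p-value are stored with the evaluation results; an increase
    # is treated as significant when the mean difference is positive and p < alpha.
    return t_score, p_value, (t_score > 0 and p_value < alpha)
```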
With this tooling we want to make the experimentation around identifying the best parameters for hybrid search an automated and regular process for continuous search result quality improvements.
We rely on the OpenSearch plugin User Behavior Insights and the new Search Quality Evaluation Framework for generating the necessary implicit judgments based on user behavior data. Of course, users are welcome to provide their own judgments, which provides flexibility for mature search teams that already have a process for implicit judgments in place.
Hybrid Search Optimization as a Parameter Optimization Challenge
There is a finite number of parameters available for retrieving high quality search results:
Normalization techniques: min_max and l2
Combination techniques: arithmetic, harmonic, geometric
Weights: values in the range of 0.0 to 1.0
Example search pipeline definition (see OpenSearch documentation):
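A minimal sketch of what such a definition can look like, created here via the REST API from Python (the pipeline name, host, and the 0.3/0.7 weights are illustrative values):

```python
# Sketch: creating a hybrid search pipeline with a normalization-processor.
import requests

pipeline_body = {
    "description": "Post processor for hybrid search",
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {"technique": "min_max"},
                "combination": {
                    "technique": "arithmetic_mean",
                    "parameters": {"weights": [0.3, 0.7]},
                },
            }
        }
    ],
}

requests.put(
    "http://localhost:9200/_search/pipeline/hybrid-search-pipeline",
    json=pipeline_body,
)
```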
This leads to a finite number of parameter combinations. Obviously, search platforms are very different, so even a small number of parameter sets may be impractical to run for a large query set or over a large index. Therefore we provide two options to explore different parameter sets: grid search and Bayesian optimization.
Grid Search
The grid search approach tries out all possible parameter combinations to identify the best set.
Example:
Evaluating the normalization techniques min_max and l2 together with the combination techniques arithmetic and harmonic would mean testing four combinations: min_max + arithmetic, min_max + harmonic, l2 + arithmetic, and l2 + harmonic.
Every combination is tested and validated with a query set and a set of judgments. The hybrid search optimizer’s objective is to find the parameter combination that maximizes a specific search metric (e.g. DCG, NDCG, AP, etc.). The user chooses the search metric they want to optimize for when creating the grid search optimization job.
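A minimal sketch of the grid search loop; evaluate_pipeline is an assumed placeholder for "build the pipeline from the given parameters, run every query of the query set, and compute the chosen metric against the judgments", not an existing API:

```python
# Sketch of the grid search over the parameter space.
from itertools import product

normalizations = ["min_max", "l2"]
combinations = ["arithmetic_mean", "harmonic_mean", "geometric_mean"]
neural_weights = [round(0.1 * i, 1) for i in range(11)]  # 0.0, 0.1, ..., 1.0

best_params, best_metric = None, float("-inf")
for normalization, combination, neural_weight in product(
    normalizations, combinations, neural_weights
):
    metric = evaluate_pipeline(normalization, combination, neural_weight)  # placeholder
    if metric > best_metric:
        best_params, best_metric = (normalization, combination, neural_weight), metric
```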
The Hybrid Search Optimizer plugin seamlessly integrates with the Search Quality Evaluation Framework plugin that can be leveraged to create query and judgment sets.
The OpenSearch index holding these query and judgment sets can live outside of the OpenSearch cluster that runs the hybrid search optimizer. However, the index needs to be accessible.
Since running all queries of a query set can introduce heavy load on the OpenSearch system and running the metrics calculation introduces additional load, you may consider separating them.
We recommend running the hybrid search optimizer in a non-production environment to avoid user or customer impact.
Bayesian Optimization
Running all parameter combinations for all queries of a query set may be too complex or time-consuming to finish in a reasonable amount of time. For these situations, users can switch from the grid search approach to a Bayesian optimization approach.
Problem Setup
As with the grid search approach, we want to maximize a specific search metric. The user chooses the search metric they want to optimize for when creating the Bayesian optimization job. This is the objective function of the hybrid search optimizer. The parameters to optimize do not differ from the ones the hybrid search optimizer uses for grid search.
Gaussian Process Modeling
The hybrid search optimizer uses a Gaussian Process as the underlying surrogate model, which provides us with uncertainty estimates about its predictions. The prior is informed by the calculated search metric for an initial configuration of the hybrid search pipeline. Users can either use an existing pipeline, define a new one, or use the hybrid search optimizer's default.
Acquisition Function
The acquisition function balances exploration (trying out areas of the parameter space where the function is uncertain) and exploitation (focusing on areas where the function is expected to perform well). The hybrid search optimizer uses an acquisition function that measures expected improvement. It selects the next point by maximizing the expected improvement over the current best-known value.
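For background (not specific to this proposal), the standard expected-improvement criterion for a maximization objective can be written as follows, where $f(x^+)$ is the best metric value observed so far, $\mu(x)$ and $\sigma(x)$ are the GP's predictive mean and standard deviation at a candidate parameter set $x$, and $\Phi$ and $\varphi$ are the standard normal CDF and PDF (with $\mathrm{EI}(x) = 0$ when $\sigma(x) = 0$):

$$
\mathrm{EI}(x) = \bigl(\mu(x) - f(x^+)\bigr)\,\Phi(Z) + \sigma(x)\,\varphi(Z),
\qquad Z = \frac{\mu(x) - f(x^+)}{\sigma(x)}
$$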
Sequential Optimization
The optimization process starts with sampling the objective function at a few initial points, often chosen at random or based on a space-filling design like Latin Hypercube Sampling. The hybrid search optimizer uses the results from the initial evaluations to update the Gaussian Process model, refining the surrogate model of the objective function. The next point in the parameter space is found by maximizing the acquisition function (expected improvement). By actually running the parameters of the next point, the results for the query set are retrieved, the search metric is calculated, and the actual objective function is evaluated at this new point. Next, the GP model is updated with the new data point and the process repeats. The results of all tested parameter sets are stored for review.
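A condensed sketch of this loop using scikit-optimize as one possible library choice (not necessarily what the plugin will use); evaluate_pipeline is the same assumed placeholder as in the grid search sketch:

```python
# Sketch of the Bayesian optimization loop with a GP surrogate and expected improvement.
from skopt import gp_minimize
from skopt.space import Categorical, Real

search_space = [
    Categorical(["min_max", "l2"], name="normalization"),
    Categorical(["arithmetic_mean", "harmonic_mean", "geometric_mean"], name="combination"),
    Real(0.0, 1.0, name="neural_weight"),
]

def objective(params):
    normalization, combination, neural_weight = params
    # gp_minimize minimizes, so negate the metric we want to maximize.
    return -evaluate_pipeline(normalization, combination, neural_weight)  # placeholder

result = gp_minimize(
    objective,
    search_space,
    n_initial_points=5,  # initial random samples of the objective
    n_calls=25,          # total evaluations: a simple stopping criterion
    acq_func="EI",       # expected improvement acquisition function
    random_state=42,
)
best_parameters, best_metric = result.x, -result.fun
```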
Convergence
The process iteratively refines the surrogate model and converges to the optimal set of parameters. Convergence is determined by a stopping criterion, such as a maximum number of iterations, a small improvement over several iterations, or a specific confidence level.
The following illustration shows the Bayesian Optimization process:
Optimization Jobs & their States
Depending on the number of parameter sets to try out, the number of queries to run for each parameter set, and the size of the search platform, hybrid search optimization jobs may run for seconds, minutes or even hours. To keep track of initialized jobs and see their status, the hybrid search optimizer registers these jobs and updates their states accordingly.
CREATED: When a hybrid search optimization job is registered, it receives the status CREATED.
RUNNING: As soon as the job runs, the state transitions from CREATED to RUNNING.
COMPLETED: A finished job has the state COMPLETED.
By default, all jobs, their metadata and their states are stored in a job index. The metadata stored with a job includes its id, which serves as the reference to the stored evaluation results in the index holding these.
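As a sketch of what a job record in that index could look like (field names are illustrative, not the plugin's final schema):

```python
# Sketch of an optimization job record; field names are illustrative.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
import uuid

class JobState(str, Enum):
    CREATED = "CREATED"      # job registered, not yet started
    RUNNING = "RUNNING"      # job currently executing
    COMPLETED = "COMPLETED"  # job finished; results are stored under job_id

@dataclass
class OptimizationJob:
    job_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    state: JobState = JobState.CREATED
    metric: str = "ndcg@10"        # search metric being optimized
    approach: str = "grid_search"  # or "bayesian_optimization"
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
```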
Scope
The scope of the hybrid search optimizer is to provide users with the best set of parameters for their search pipeline based on their data, their users (when using implicit judgments) and their domain. Out of scope:
GitHub Repository
This work will initially be built outside of the OpenSearch GitHub project, but we hope to transfer the repository so that it lives inside the OpenSearch GitHub project as soon as possible. We are tentatively calling the repository hybrid-search-optimizer.
Contributors
The following individuals will contribute to this effort. Please let us know in the comments if you would like to be involved.
Roadmap
We will attempt to follow the OpenSearch release schedule, with incremental progress shipped in OpenSearch releases. This will enable early real-world use and feedback.
Conclusion
Identifying the correct parameter set for hybrid search is a challenge that can be treated as an optimization problem. The hybrid search optimizer OpenSearch plugin provides two ways to arrive at an ideal parameter set:
Grid search approach: through testing of all possible parameter combinations, the best one is identified by evaluating the search results for each.
Bayesian optimization: in instances where testing of all possible parameter combinations is too expensive, users can run a Bayesian optimization algorithm that takes advantage of exploration and exploitation to efficiently identify a suitable parameter set with reasonable tradeoffs.
Both approaches rely on query sets and judgments that can either be provided by the user or by the search quality evaluation framework. Both approaches result in a parameter set tuned for a user's unique data.
Related component
Search:Relevance
Describe alternatives you've considered
No response
Additional context
No response