Open conggguan opened 6 months ago
Hey @conggguan thanks for opening the issue. I definitely agree the 1 QueryPhaseSearcher
limit is a pain point that we need to address and I think the main problem is that the function of QueryPhaseSearcher::searchWith
is too broad currently.
Like you mentioned, there are cases such as the neural-search
plugin where basically the HybridQueryPhaseSearcher::searchWith
is used to perform some preprocessing with the Query
and SearchContext
but there are also cases like the ConcurrentQueryPhaseSearcher
where it's used to execute a completely different search path. Cases like the latter are why we don't support multiple QueryPhaseSearcher
s today but cases like the former seem pretty reasonable to move out of the QueryPhaseSearcher
construct entirely.
I think the high level idea of having some sort of preprocess(Query, SearchContext)
is useful, but I'm wondering if you have thought about how to make this support multiple plugins? Especially if we want to be able to mutate SearchContext
from this preprocess entry point then it seems like we would need to have some notion of ordering for the processors.
@conggguan I read through the related issue https://github.com/opensearch-project/neural-search/issues/646 and from that it looks like you are exploring the search pipelines approach instead, does that mean you do not need this functionality?
Regardless it would be nice to hear if you have any thoughts on how to compose the preProcess
steps if there are multiple plugins providing their own preprocess implementation. Thanks!
Hi @jed326 Thanks your reply. As you mentioned, I am exploring the search pipelines approach for now, but this is more of a short-term plan. In the future, as the Neural-Search plugin evolves, I anticipate there will be an increasing need for preprocessing requirements.
And for your concerns of how to compose the different preprocess of plugins, I gave it some thought and came up with the following possibilities.
We can manage and prioritize different action by different level. For example,
Level | Action | Limitation |
---|---|---|
Basic Level | Only validation and error reporting | Only Read |
Mutate SubQuery | Do some change of subQuery | Only for registered query type, and only one preprocess for one query type. |
Mutate QueryCollectorContext | Change/add QueryCollector | |
Mutate SearchContext | Add/change rescore/size/searchExt |
Consider the compatibility, I think can use a similar way, gradually open up some interfaces for developers to use.
Thanks @conggguan. I'm also working on a feature (RFC to be published soon) where we need to modify the QueryCollectorContext
and we need to work around the 1 QueryPhaseSearcher
limitation as well. We were also thinking of a priority based approach similar to what you described where we can perform the preprocessing based on priority, sort of similar to what ActionFilters
do today:
https://github.com/opensearch-project/OpenSearch/blob/aca2e9de7daaadee77c33e21ff5215c3f1e8600f/server/src/main/java/org/opensearch/action/support/ActionFilter.java#L47-L50
But aside from that I think one of the big problems still is that the current searchWIth
plugin interface is very broad and should only be used for changing the actual search execution path while any query/context processing should happen outside of it because only 1 plugin should be changing the search execution path but multiple plugins can introduce their own processing.
Description
This RFC is related opensearch-project/neural-search#646.
Intro
Briefly, we want enhance the performance, divide a neuralSparseQuery to a less computationally expensive query and a more expensive query, we use the former to get a TopDocs firstly, and then use the more computationally expensive query as a rescoreQuery to adjust the score of the results.
limitation
There have lots of entry point to process the query, such as doRewrite(QueryRewriteContext queryShardContext), doToQuery(QueryShardContext queryShardContext). But each of them have no access to searchContext and whole query information. When we build a new feature which needs considering the global compound query information. Beside NeuralSparseQuery, HybridQuery also needs a interface carrying searchContext and whole query.
Existing compromise solution
The current solution is that register a QueryPhaseSearcher by the interface of SearchPlugin. So that we can get a searchWith entry point, all of query will reach this function and with searchContext, in this function, we can do what we want to do of searchContext and query.
pain point
Describe the solution you'd like
Open a plugin preProcess API that can do some custom progress before QueryPhaseSearcher.searchWith.
Design draft
Earlier AggregationProcessor
In QueryPhaseSearcher, we can Override aggregationProcessor() interface to add a preProcess or postProcess. We can do a similar but earlier interface in QueryPhase.
A TwoPhaseScorer
There is a class named TwoPhaseIterator, but indeed, it's two phase means there has two phases to determine the next iterate's step. Build a Scorer that can use score result of first loop and do a second score process. It will make's more easy to do some job about multi-stage query.
open the access of searchContext in QueryBuilder's doToQuery() and add a proProcess interface for searchContext
We can register some proProcess lambda function of this funciton.
Related component
QueryPhaseSearcher, SearchPlugin