opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
9.88k stars 1.84k forks source link

[BUG] Concurrent Segment Search doesn't work if a Plugin Overrides the QueryPhaseSearcher #9704

Open navneet1v opened 1 year ago

navneet1v commented 1 year ago

Describe the bug As part of Enabling the Normalization and Score Combination Feature(RFC: https://github.com/opensearch-project/neural-search/issues/126) in OpenSearch via Neural Search plugin, we used the Extension point of QueryPhaseSearcher(https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/search/SearchModule.java#L1248-L1259), but the problem with that extension point is it enables the Searcher at Cluster Level and not at the Index level. This make the Concurrent Search un-usable because another implementation of QueryPhaseSearcher is there.

So cutting this issue to start a conversation that, the fix needs to be done before Concurrent Segment Search can become GA.

To Reproduce Steps to reproduce the behavior:

  1. Use the full distribution of OpenSearch 2.10
  2. Enable the Concurrent Segment Search Feature.
  3. Try doing the search, the concurrent Segment search does't work.

Expected behavior Ideally there should be a way in which Plugins can define when to use their QueryPhaseSearcher at Index level. Also, core needs to make sure that all the QueryPhaseSearcher which are registered are initialized and during runtime based on the index/cluster setting right Searcher is picked up.

Plugins k-NN, MLCommons, Neural Search.

Screenshots NA

Host/Environment (please complete the following information): NA

Additional context The same issue was raised as a feature request too: https://github.com/opensearch-project/OpenSearch/issues/7020

sohami commented 1 year ago

@navneet1v I don't think this is a bug instead it is how the QueryPhaseSearcher is exposed to the plugin and uses either the core implementation of QueryPhaseSearcher or the one provided by plugin. It is upto a particular installation with set of plugins to see which one to use. If the use case here is to have multiple QueryPhaseSearcher support (i.e. core + ones provided by plugins), then that functionality needs to be built based on such use case. For example: With concurrent search, since it is part of core we have created this QueryPhaseSearcherWrapper which will hold on to both DefaultQueryPhaseSearcher and ConcurrentQueryPhaseSearcher and use one or the other based on which path to take for a request.

A plugin can extend this or compose it based on the use case. Like plugin can provide a QueryPhaseSearcher, which can keep map of core provided searchers and its own and then based on request state decide which one to use. Or even build this in the core to make that decision. Not sure why it is tied to concurrent search GA. Based on https://github.com/opensearch-project/OpenSearch/issues/7020 my understanding was you or @martin-gaievski were doing some PoC for the same ? Based on the code shared here, seems like it needs to extend QueryPhaseSearcherWrapper now instead of DefaultQueryPhaseSearcher ?

navneet1v commented 1 year ago

@sohami

A plugin can extend this or compose it based on the use case The problem here is the QueryPhaseSearcherWrapper is marked as opesearch.internal so it is subject to change. If it is marked as opesearch.api then it would make more sense to extend QueryPhaseSearcherWrapper.

Based on the code shared here, seems like it needs to extend QueryPhaseSearcherWrapper now instead of DefaultQueryPhaseSearcher ?

Yes we can change the neural search class to extend the QueryPhaseSearcherWrapper, but still it doesn't solve the problem. What will happen if some other backend plugin comes up and want to implement another query phase searcher. Thing is it cannot happen. Hence I think we should start thinking about this and not come up with these intermediate solutions. Reason why I am tieing this up with Concurrent Search is because we are adding it as a new Feature in OpenSearch Core and it changes the QueryPhaseSearcher.

Also, my main point here is the interface provided for extending QueryPhaseSearcher is not an actual extension.

It is upto a particular installation with set of plugins to see which one to use. Yes that is correct. But from OpenSearch project standpoint we provide full distribution and min distribution. So we need to make sure that in both those installation we are doing it correctly when more than 1 plugin is trying to override the QueryPhaseSearcher.

This is similar to the Codec extension point we had earlier in the OpenSearch. We extended that interface to make sure that more than one codec can be be present in OpenSearch. I was thinking we should come up with a similar solution here too. See https://github.com/opensearch-project/OpenSearch/pull/1387 and https://github.com/opensearch-project/OpenSearch/pull/1404

cc: @nknize

sohami commented 1 year ago

Yes we can change the neural search class to extend the QueryPhaseSearcherWrapper, but still it doesn't solve the problem. What will happen if some other backend plugin comes up and want to implement another query phase searcher. Thing is it cannot happen. Hence I think we should start thinking about this and not come up with these intermediate solutions

@navneet1v I didn't say in my last comment that extending the OpenSearch internal implementation is the best way to go. The way it was done currently is as I explained above where it expects depending on the installed plugin to either use plugin provided implementation or use the core ones. Given there are use cases to have multiple composite searchers we should definitely extend it and you can create PR with the change in core which solves the plugin use case in best way.

Reason why I am tieing this up with Concurrent Search is because we are adding it as a new Feature in OpenSearch Core and it changes the QueryPhaseSearcher

Concurrent search is changing the QueryPhaseSearcher used internally in the core. It doesn't affect how the plugins are implementing the QueryPhaseSearcher or the current extension points.

Also, my main point here is the interface provided for extending QueryPhaseSearcher is not an actual extension.

Agreed and I would think when QueryPhaseSearcher interface was added in plugin this use case was not there (@reta can correct me here) and hence it was provided in its current form. But again you can expand on the current implementation to provide a better one to help with the new use case and probably a generic one to help other plugins as well (if we know about those use cases).

reta commented 1 year ago

Agreed and I would think when QueryPhaseSearcher interface was added in plugin this use case was not there (@reta can correct me here) and hence it was provided in its current form.

@sohami correct, we'd been looking to the implement one specific case, the concurrent search, in a way that does allow injection of different implementations by plugins.

@navneet1v what @sohami is saying is certainly one of the options to implement the desired flow: provide your own QueryPhaseSearcher but delegate to the ConcurrentQueryPhaseSearcher / DefaultQueryPhaseSearcher when needed (it is possible now without any changes in core). Also, the QueryPhaseSearcher is not about just search anymore, it also includes AggregationProcessor.

However, what we probably need is to have the contextual QueryPhaseSearcher that could complement (or may be even replace) the way QueryPhaseSearcher is provided currently, something roughly like that:

interface ContextualQueryPhaseSearcherProvider {
    Optional<QueryPhaseSearcher> forContext(SearchContext context);
}

That would allow plugins to register many providers but only one is going to be picked for the context. Obviously, the issue is that since QueryPhaseSearchers are not composable, plugins could provide competing implementations that somehow have to be resolved deterministically (same one always wins, others loose).