opensearch-project / asynchronous-search

:arrow_forward: Run queries in the background and retrieve partial results along the way
https://opensearch.org/docs/latest/search-plugins/async/index/
Apache License 2.0
28 stars 48 forks source link

[Discuss] Moving asynchronous search repo to OpenSearch core #127

Open Bukhtawar opened 2 years ago

Bukhtawar commented 2 years ago

Is your feature request related to a problem? Please describe. Creating this issue to gather thoughts on moving this repo to core. Historically some decisions were made before the fork was done and plugins were those limited ways to accomplish building asynchronous search. Reasons why moving to core makes sense

Pros

  1. Asynchronous search is a native mechanism to perform search, most search requests could be modelled asynchronously
  2. Other plugins kNN/PPL should be easily integrated with asynchronous search
  3. Kibana core integration with browser session, triggering a search in asynchronous mode for queries running longer, auto-cancellation on session termination
  4. Ability to store response can be extended for synchronous searches as well
  5. Out of the box upgrades, simplifies the plugin release management
  6. Easier integration with search related enhancements like search resource tracking, back-pressure mechanisms etc

Cons

  1. Migration effort for existing users
  2. Proxy overhead of maintaining a common endpoint for both sync and async modes
  3. Would introduce a tighter coupling of the code/artefacts

Describe the solution you'd like Move the current repo to a module in OpenSearch core Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Additional context Add any other context or screenshots about the feature request here.

dblock commented 2 years ago

What are the cons? What can't be done as a plugin vs. being in core?

rramachand21 commented 2 years ago

I agree we should do this - What would we do with the current API exposed to customers? Will we be able to keep the backward compatibility once its moved to core?

dblock commented 2 years ago

@nknize @kartg @andrross @reta would like your (strong) opinions here please

andrross commented 2 years ago

Asynchronous search seems like a low-level building block that other features and plugins would build on top of, and as such seems like a good candidate to build directly into the core. I'm basically rephrasing a question above, but is the intent here to move the exact functionality of the existing plugin into core, so that existing users would get the same experience they just wouldn't have to install an extra plugin? Or is the proposal to build essentially asynchronous search v2.0 that existing users would have to make changes to use?

reta commented 2 years ago

I would agree with @andrross, I would consider it to be baked into core (in most efficient way possible) since it is directly improving the usability of search when asynchronicity is preffered.

Bukhtawar commented 2 years ago

I'm basically rephrasing a question above, but is the intent here to move the exact functionality of the existing plugin into core, so that existing users would get the same experience they just wouldn't have to install an extra plugin? Or is the proposal to build essentially asynchronous search v2.0 that existing users would have to make changes to use?

The idea is to bake this into core, so that existing search users/other search plugins(SQL/PPL/kNN) are able to leverage the same and also to unify search experience by having a common search API with a mode(sync/async) to orchestrate the search

For async search users this might be a breaking change once we cut over in a major version release(maybe 3.0), but till then we plan on supporting BWC

nknize commented 2 years ago

Does the current implementation add thread pool tracking to the _cat/thread_pool API?

Bukhtawar commented 2 years ago

Async search uses the same SEARCH threadpool for handling actual searches, however for overall management related activities like clean up expired stored responses, timer for handling timeouts, we use another threadpool as below which is most likely getting tracked

https://github.com/opensearch-project/asynchronous-search/blob/d57c8518ccc574b74d2bba3da42a318a50acf218/src/main/java/org/opensearch/search/asynchronous/plugin/AsynchronousSearchPlugin.java#L87-L95

andrross commented 2 years ago

The idea is to bake this into core, so that existing search users/other search plugins(SQL/PPL/kNN) are able to leverage the same and also to unify search experience by having a common search API with a mode(sync/async) to orchestrate the search

Would it be fair to describe this as the following?

  1. Implement asynchronous search in core, with a goal of having a unified search API supporting both sync and async modes.
  2. Deprecate the existing asynchronous search plugin

It potentially seems cleaner to reimplement this feature natively without ever needing to move the existing plugin into a core module. We can keep the plugin around for backward compatibility in the meantime, but once the native mechanism is ready and SQL, kNN, etc move to use it then we can remove support for the plugin. Is there anything that would prevent taking that approach?

Bukhtawar commented 2 years ago

We would want to reuse code as much as possible, minimize code duplication. It doesn't seem optimal to reimplement(except for what's needed to move to core) or maintain redundant pieces of code at multiple places. I would choose it as a last resort to avoid breaking changes

nknize commented 2 years ago

we use another threadpool as below which is most likely getting tracked

This is what I was quickly skimming. We need to be sure it is getting tracked so we can "easily" diagnose any thread pool issues if they arise. It makes handling stackOverflow questions much easier :)

It potentially seems cleaner to reimplement this feature natively...

@Bukhtawar it looks like a lot of the code is REST layer API implementation and business logic around managing search handlers. How much is actual core changes?

...without ever needing to move the existing plugin into a core module.

If a large chunk of the plugin is API implementation (e.g., say 80%) then I could see a strong justification to move it to core now so we can ensure consistency of the API as we move to segrep and decoupling readers from writers.

Bukhtawar commented 2 years ago

@nknize Since this is a plugin the core changes are minimal. The plumbing work around search progress listeners to support async search largely sits in core already. The plugin majorly orchestrates it and supports add on features around persisting responses for later retrieval, cancelling long running searches, refreshing results as search progresses etc

nknize commented 2 years ago

Thx @Bukhtawar!

I can see a good reason for moving this as a plugin (not module) to core. The progress not perfection here is that we would be adding the APIs that we need regardless of the async search implementation details (docrep vs segrep). My concern would be enabling this as a module in the "first version" w/o proper threadpool tracking in the cat APIs. We certainly do not want to add resource usage we cannot monitor using the existing APIs.

Wildsoft commented 2 years ago

It would be also great to have client api for async search to be integrated with RestHighLevelClient https://github.com/opensearch-project/asynchronous-search/issues/135

eirsep commented 2 years ago

Just a corollary:

Would there be value in storing final async search response on S3 compared to storing it on disk. Not just a disk space gain but as a comparison between memory overhead of indexing a large async search response (with replicas) vs storing it on S3?

@Bukhtawar thoughts?

msfroh commented 1 month ago

I see that nobody seems to argue against doing this -- it's been idle for 2 years, though.

Could we just move the code from this repo to OpenSearch core's /plugins/ directory?

andrross commented 1 month ago

Could we just move the code from this repo to OpenSearch core's /plugins/ directory?

@msfroh That goes against what we've recently recommended for other plugins. We'd need to be super clear about why its the right decision in this case. If the long term plan is to natively integrate async search into the core (i.e. not an optional plugin) then that might make sense. However, if the likely result is that the code moves into the core repo but otherwise goes untouched (which is sort of the status quo for this feature as I believe there hasn't been any real non-maintenance work recently) then I'm not really convinced its the right thing to do.

msfroh commented 1 month ago

If the long term plan is to natively integrate async search into the core (i.e. not an optional plugin) then that might make sense.

Aha -- I was assuming that's where we would go eventually. Make it a core feature of OpenSearch itself (that may not get much in the way of ongoing development, if it "just works").

andrross commented 1 month ago

I'll also note that 7 of the 19 open issues in this repo right now now are related to failing and/or flaky tests. That does not make me enthusiastic about bringing this code into the core repository.