[RFC] Cloud Native SQL Plugin

fddattal commented 1 month ago

Is your feature request related to a problem? Please describe

I want to run the SQL plugin in a cloud native environment, but I am unable to do so without forking the code.

Today, plugin execution is tightly coupled to OpenSearch core. Access to metadata of index, aliases, templates, node roles, and routing is tightly coupled with cluster state. Furthermore, plugins are vended full access to OpenSearch core including direct access to cluster state internals. This make it difficult to evolve plugin APIs to be more cloud native provider friendly, and eventually run plugins in a stateless mode.

Describe the solution you'd like

I would like to propose an interface to plugins for accessing OpenSearch core. The interface should expose the minimal fine grained state management operations and the operations needed to support plugins. This will ensure both cluster and cloud native implementation can support the plugin uniformly.

Then the SQL plugin will migrate to this interface and be enabled to run in cloud native environments.

Related component

Plugins

Describe alternatives you've considered

No response

Additional context

FAQ

How would the community benefit from this?

The work here is foundational work which is applicable to other plugins. The plugin interface can be expanded upon to enable other plugins them to become cloud native.
This proposal aids in decoupling plugins from OpenSearch Core by restricting their core access through an abstract interface. Having this interface enables: 2.a. The possibility for alternative implementations of core APIs. 2.b. Improve stability guarantees of the plugin APIs. 2.c. Complete audibility of plugin’s access to core APIs via this interface. 2.d. The enabling of alternative runtime environments to run plugins unmodified.

What other RFCs does this align with?

https://github.com/opensearch-project/OpenSearch/issues/13197 1.a. This work adds upon the RFC proposed for cluster state management. Similar to how cluster state management is being refactored to enable cloud native environments, so will plugin to core access.
https://github.com/opensearch-project/OpenSearch/issues/8110 2.a. In addition to code separation, a standard for plugin to core interaction is now defined to make core module implementations swappable.
https://github.com/opensearch-project/security/issues/2860 3.a. In a future where all plugin to core access is done via the interface. Security negotiation can be mediated by the interface and eventually lead to the deprecation of the mutable thread context for security management.

dblock commented 1 month ago

I like this proposal because it's effectively identical to https://github.com/opensearch-project/opensearch-sdk-java except that you are offering to refactor core vs. exposing a new interface on top of core. Generally, refactoring core is a costly undertaking, everything takes 10x more work. We're seeing that even relatively small changes to enable things like #10684 require weeks of iterating, just to get gradle checks to pass. You also will need to deal with things like settings, therefore the proposed plugin interface should include that (and possibly other things).

Wouldn't it be easier to make https://github.com/opensearch-project/opensearch-sdk-java the plugin interface you're proposing and implementing the SQL plugin on top of that? If you shortcut/remove/bypass all the remote aspects of that SDK, you effectively have a clean plugin interface, and you can iterate on that completely independently from core.

dbwiddis commented 3 weeks ago

@dblock's comments about SDK prompted me to think of the fact that we really wanted to implement those using OpenSearch Java Client. That continues on to the whole conversation around generated code and reference specifications.

Which brings me to the question: do we need to write these interfaces or can we generate them from the OpenAPI spec?

dblock commented 3 weeks ago

Which brings me to the question: do we need to write these interfaces or can we generate them from the OpenAPI spec?

I think we can. It's a ton of boilerplate. That project has moved ahead quite well!

https://github.com/opensearch-project/opensearch-api-specification

@Xtansia is working on a java generator from API, it could generate anything

fddattal commented 3 weeks ago

Hey folks! Thanks for the comments!

To summarize my understanding:

@dblock's feedback is that we should use the opensearch-sdk-java as our target for a cloud native sql plugin. This helps in two ways:

It avoids needing to refactor core allowing for faster iteration time
We would end up defining an interface which is already similar to the SDK

@dbwiddis' feedback is that we should generate the core api interface using the API spec. This helps to avoid manually maintaining the java code for interfacing with core.

I've taken a look at the opensearch-sdk-java and I do believe this would offer us a path forward if we invest a bit more into it. There are two main challenges which I see:

Today there's no in-process version of Extensions. I'd like to run the code in the same process to avoid the performance overhead of networking. We would need to prioritize this issue https://github.com/opensearch-project/opensearch-sdk-java/issues/688
The SDKClient which the opensearch-sdk-java exposes to Extensions is a concrete class which leaves little flexibility for myself to inject custom cloud native behavior. An interface would allow me that flexibility. Similar to the client generation, we could also generate this interface and completely rely on the spec for its definition.

In a similar vein, a solution which I have considered is migrating the SQL plugin to the Client interface rather than the concrete NodeClient. This would enable me to inject custom logic to my Client subclass to support the plugin.

I see this as a similar approach to the opensearch-sdk-java as both solutions are API based and would allow for functionality being swapped at an API action level.

Given that this Client is already in core and exposed to plugins, I would not expect additional core refactoring to enable this solution (except for below).

@dblock @saratvemulapalli I want to understand another point: I see in the opensearch-sdk-java that the Client interface is intended for deprecation. Could you help me understand why it's being considered for deprecation? What are its downsides? I assume the replacement would be a generated interface/client?

To add to the API discussion. I've been having discussions with @shwetathareja on decomposing cluster state access in core, and we feel that the existing public API's are not sufficiently fine-grained for accessing cluster state components.

To be compatible with her RFC here: https://github.com/opensearch-project/OpenSearch/issues/13197 we would need to introduce or extend the existing cluster state API so that we may access the following (not an exhaustive list, this is just for SQL):

Individual indices metadata
Individual index settings
Individual cluster setting values

This is fine grained access is particularly a problem for cluster state because

Cluster state can be large. Thus, accessing these APIs, even locally, can have a performance overhead
The full cluster state may not be supported in cloud native environments (eg: routing table may not be supported)

These APIs changes would be applicable to both SDK and Client based approaches.

Looking forward to hearing your thoughts!

dblock commented 3 weeks ago

Thanks for keeping an open mind @fddattal!

In picking similar-ish solutions in terms of complexity, risk and cost I would always take the one that's viable most long term.

Today there's no in-process version of Extensions.

Correct. There's probably a very minimal solution that assumes that both sides are the same version of the message being sent and passes it in-memory rather than through serialization/deserialization.

@dblock @saratvemulapalli I want to understand another point: I see in the opensearch-sdk-java that the Client interface is intended for deprecation. Could you help me understand why it's being considered for deprecation? What are its downsides? I assume the replacement would be a generated interface/client?

I think you are looking for https://github.com/opensearch-project/opensearch-sdk-java/issues/732 that's all @dbwiddis.

fddattal commented 3 weeks ago

Hey folks! I've taken a look at https://github.com/opensearch-project/OpenSearch/issues/13336 and I think that there's a shared solution that would position us well in the long term for a full extension migration.

Proposal

What I am proposing is the following:

We introduce a new client interface which plugins and extensions will use to interact with core.
The interface itself can be maintained as a separate library or module.
Data exchange over the interface is done using simple Java Pojo's.
The interface is asynchronous.
The library would maintain all the Pojo's for its Request/Response objects.
OpenSearch and cloud native environments may provide alternative implementations of this interface to suit their needs.

Tenets

The API definition should not depend on core
The API should function using message passing "as-if" executing remotely
Internal communication details are abstracted by the interface

Interface Definition

package org.opensearch.sdk;

import org.opensearch.sdk.model.ClearScrollRequest;
import org.opensearch.sdk.model.ClearScrollResponse;
import org.opensearch.sdk.model.DeleteCustomRequest;
import org.opensearch.sdk.model.DeleteCustomResponse;
import org.opensearch.sdk.model.GetCustomRequest;
import org.opensearch.sdk.model.GetCustomResponse;
import org.opensearch.sdk.model.GetIndexMappingsRequest;
import org.opensearch.sdk.model.GetIndexMappingsResponse;
import org.opensearch.sdk.model.GetIndexSettingRequest;
import org.opensearch.sdk.model.GetIndexSettingResponse;
import org.opensearch.sdk.model.GetSystemSettingRequest;
import org.opensearch.sdk.model.GetSystemSettingResponse;
import org.opensearch.sdk.model.MultiSearchRequest;
import org.opensearch.sdk.model.MultiSearchResponse;
import org.opensearch.sdk.model.PutCustomRequest;
import org.opensearch.sdk.model.PutCustomResponse;
import org.opensearch.sdk.model.ResolveIndicesAndAliasesRequest;
import org.opensearch.sdk.model.ResolveIndicesAndAliasesResponse;
import org.opensearch.sdk.model.ScrollRequest;
import org.opensearch.sdk.model.SearchRequest;
import org.opensearch.sdk.model.SearchResponse;

import java.util.concurrent.CompletionStage;

public interface Client {

    // for sql plugin

    CompletionStage<SearchResponse> search(SearchRequest request);

    CompletionStage<SearchResponse> scroll(ScrollRequest request);

    CompletionStage<ClearScrollResponse> clearScroll(ClearScrollRequest request);

    CompletionStage<GetIndexMappingsResponse> getIndexMappings(GetIndexMappingsRequest request);

    CompletionStage<GetIndexSettingResponse> getIndexSetting(GetIndexSettingRequest request);

    CompletionStage<GetSystemSettingResponse> getSystemSetting(GetSystemSettingRequest request);

    CompletionStage<ResolveIndicesAndAliasesResponse> resolveIndicesAndAliases(ResolveIndicesAndAliasesRequest request);

    CompletionStage<MultiSearchResponse> multiSearch(MultiSearchRequest request);

    // for data store interface

    CompletionStage<PutCustomResponse> putCustom(PutCustomRequest request);

    CompletionStage<GetCustomResponse> getCustom(GetCustomRequest request);

    CompletionStage<DeleteCustomResponse> deleteCustom(DeleteCustomRequest request);
}

Pros / Cons

Pros:

This proposal gives us the flexibility for building a solution that would work for OpenSearch and cloud native envrionments
It is aligned with efforts to decompose cluster state - https://github.com/opensearch-project/OpenSearch/issues/13197
The interface can be maintained outside of the core allowing us to develop it independently

Cons:

This will introduce a competing standard and require plugin migration
Additional effort would be needed to enhance the interface to expose new core capabilities.
1. Albiet this can be seen as a good thing because it would be a forcing function for API evolution stewardship
The implementers of the interface (core, cloud native) would need to perform object transformation to convert from the API Pojo to their internal core representations

What's Not Addressed Here

Versioning - The API itself is not currently versioned, but the API could be extended to add versioning at either the client or api action level.
Cross-plugin/module calls - Currently there's no way for a plugin or module to expose api actions for other plugins or modules to consume. We could expose a mechanism to register api action handlers with the client and execute those in a way similar to org.opensearch.Client#execute(ActionType, Request, ActionListener).

dblock commented 3 weeks ago

@fddattal I like this. Do you think it makes sense to reuse https://github.com/opensearch-project/opensearch-sdk-java for that and possibly produce multiple artifacts (that eventually merge)?

fddattal commented 3 weeks ago

@dblock I like that approach and it's certainly feasible to do so. We would modify the SDK package to be a multi-module gradle project, and the existing extension code would become one module, while the API would be another. Each releasing their own artifacts.

fddattal commented 3 weeks ago

I've put together a POC for this approach here:

I was able to:

Refactor the opensearch-sdk-java and consume it in sql and implement the new api interface in core.
Refactor enough of the SQL core and implement the SDK Client API to get a simple SQL select working

I found I spent a lot of time creating the model objects and writing the code to convert to/from the model and core data structures. I think having some standard tooling in the API to aide with this would help drive adoption.

shwetathareja commented 2 weeks ago

Data exchange over the interface is done using simple Java Pojo's.

@fddattal : one clarification, these Pojo are re-using existing OpenSearch cluster state sub objects/ data structures or altogether new Pojo's

fddattal commented 2 weeks ago

one clarification, these Pojo are re-using existing OpenSearch cluster state sub objects/ data structures or altogether new Pojo's

@shwetathareja These Pojos are new altogether and would live outside of core in the SDK package.

peternied commented 2 weeks ago

[Triage - attendees 1 2 3 4 5 6 7 8] @fddattal Thanks for creating this RFC

fddattal commented 2 weeks ago

Next Steps

Hey folks, I don't see any blocking concerns so I am moving ahead with a full implementation.

Below is what to expect regarding high level changes. Please let me know if you have any concerns!

Investigation Work

Investigate the level of effort required to fully refactor SQL to only use the SDK Pojo's throughout versus translating to SDK Pojo's at the API layer. Today the plugin is using core's objects which may contain features not supported by the SDK APIs. Depending on how this goes it would inform us on whether we need to add a generic request/response transformation layer in the SDK API now versus simply have the sql plugin migrate. A generic request transformation layer would be needed later to enable plugin-to-plugin interactions via the org.opensearch.sdk.Client.

OpenSearch Java SDK

Package will be refactored as a gradle multi-module project
New module opensearch-java-sdk-api will be introduced to contain the o.o.sdk.Client and associated Request/Response objects
[Contingent on Investigation 1 finding reader/writer interface is needed] Add a token based reader/writer interface similar to the xcontent API in core. All Request/Response objects would support interfacing with these token streams.

SQL Plugin

SQL plugin will access the the o.o.sdk.Client dependency exposed transitively through core.
[Contingent on Investigation 1 finding reader/writer interface is not needed] Refactor all read only paths to use the sdk.Client and Pojo's entirely
Cloud native compatible SQL integ tests will be renamed from *IT.java to *CloudNativeIT.java
A new remote cluster test gradle task will be added to run all *CloudNativeIT.java's

Core

Core :server will consume the org.opensearch.sdk:opensearch-java-sdk-api as an api dependency
A new parameter and overloaded method will be added to Plugin.java to recieve a concrete org.opensearch.sdk.Client implementation
Core :server will maintain a private implementation of org.opensearch.sdk.Client
Updates to the test framework will be made so that cloud native providers can inject infrastructure provisioning steps during integ tests
Updates to the test framework to allow core to inject custom RestClient implementations

shwetathareja commented 2 weeks ago

Core :server will consume the org.opensearch.sdk:opensearch-java-sdk-api as an api dependency

@fddattal can you elaborate why :server module would need this dependency?

@shwetathareja These Pojos are new altogether and would live outside of core in the SDK package.

How are you going to generate these Pojo? Using OpenAPI specification? What would be the performance overhead of translating across objects?

fddattal commented 2 weeks ago

Thanks for your comments @shwetathareja !

can you elaborate why :server module would need this dependency?

Core :server needs to provide an implementation of the o.o.sdk.Client interface to plugins. An instance of this class would need to be injected into plugins during their initialization in Node.java.

How are you going to generate these Pojo? Using OpenAPI specification?

For now, I plan to hand write the Java classes by hand and use Lombok to generate the boilerplate. We certainly could use a spec to generate the API in the future as more target runtimes are supported.

What would be the performance overhead of translating across objects?

I've benchmarked (on my laptop) translating a large search request with 10k boolean clauses and a large search response with 10k hits and found the request took 0.03 ms and response took 0.1 ms. If performance overhead due to object translation becomes a problem we could introduce interfaces in the API which map 1:1 with their corresponding core objects and have the core object implement those interfaces. For such implementations, the translation would be a noop.

Benchmark                                                Mode  Cnt       Score       Error  Units
SDKTranslatorBenchmark.testTranslateLargeSearchRequest   avgt    5   38929.944 ±   456.177  ns/op
SDKTranslatorBenchmark.testTranslateLargeSearchResponse  avgt    5  114267.749 ± 10530.752  ns/op

owaiskazi19 commented 2 weeks ago

Sorry to chime late. @fddattal since the RFC is around running SQL independently than core. And one of option provided was opensearch-sdk-java. 2 years back when we started working on sdk, SQL was the first plugin I integrated with sdk to run independently. Commit history of my fork for any help around.

dblock commented 2 weeks ago

@owaiskazi19 whoa I totally forgot about that one! full circle

andrross commented 2 weeks ago

SQL plugin will access the the o.o.sdk.Client dependency exposed transitively through core.

@fddattal I'm assuming "core" here means the :server module. Correct me if I'm wrong. But isn't the whole point here to break the dependency that the plugin has on the :server module?

fddattal commented 1 week ago

Thanks for your comment @andrross !

Correct me if I'm wrong. But isn't the whole point here to break the dependency that the plugin has on the :server module?

Yes it's true that "core" means the :server in this context.

I'd state our goal slightly differently - Refactor the read-only paths in SQL plugin which access core to one unifying cloud native interface. In doing so we would remove deep core access channels such as direct cluster state access and unnecessary core details being exposed to the plugin.

The steps taken here will help us to fully remove the core dependency in the future. However, the work will not be sufficient to do so because both non-read-only paths and accesses from core to plugin will not be refactored.

andrross commented 1 week ago

I'd state our goal slightly differently - Refactor the read-only paths in SQL plugin which access core to one unifying cloud native interface. In doing so we would remove deep core access channels such as direct cluster state access and unnecessary core details being exposed to the plugin.

@fddattal We currently have the org.opensearch.plugins java namespace in the :server module (with the main entry point being org.opensearch.plugins.Plugin), which on the surface looks like a nice clean interface for extending the functionality of OpenSearch. However, the problem, as has been stated many times, is that many very low level details of the server are exposed through this interface resulting in very tight coupling. With opensearch-java-sdk-api and o.o.sdk.Client will you essentially be implementing a "version 2" of this Plugin interface that doesn't expose any :server dependencies? Are there additional complexities here that I'm missing?

saratvemulapalli commented 1 week ago

Thanks @fddattal for the RFC. I've worked through extensions[1] which was essentially de-coupled the compute from OpenSearch but still relied on internal implementations of opensearch :server. I see this proposal as plugins will be de-coupled from internal implementations of core and yet access the information they'd like. I love the proposal, essentially see it as PluginsV2 (I agree with @andrross).

Here are my thoughts and proposal:

Tenets (in addition to the ones listed by @fddattal)

Interface abstracts inner implementations (including Pojo's) of core (:server)
The interface exposed to clients does the heavy lifting for sync and async implementations.
Core (:server) would have a default implementation in min distribution.

I'd like this interface to de-couple plugins in the following verticals, and deliver in phases:

Decouple In Memory Store (a.k.a cluster state): As we move the cluster management to Cloud Native[2], plugins should be agnostic of local vs remote implementations of cluster state, which this RFC already covers and your usecase for SQL.
Decouple plugin storage: Plugins rely on OpenSearch to store metadata in indices for persistence. This configuration doesn’t have to be in opensearch index. @dbwiddis's RFC[3] proposes an abstraction with alternative cloud native storage.
Decouple plugin configuration: Plugins rely on OpenSearch settings[4] to expose knobs for users to configure features. Plugins should be agnostic of local vs remote implementations of Cluster settings.
Decouple plugin security and tenancy: Plugins use OpenSearch security plugin and implement custom authorization mechanisms[5]. I'd like an interface which offloads AuthZ to a central authority. Also tangentially I am seeing use cases of plugins trying to support multi-tenancy[6], which could be solved.

I would want to deliver these verticals in phases and incrementally. Eventually once we have adoption we can explore the options of deprecating existing access's to the core :server

I would like to have the interface defined in OpenSearch repo as a library and default implementation in :server. Additionally plugins will take a dependency on the library. I do not mind calling it org.opensearch.plugins.Pluginv2 or org.opensearch.sdk.

Few things I'd like to push it down the road:

@dbwiddis' feedback is that we should generate the core api interface using the API spec. This helps to avoid manually maintaining the java code for interfacing with core.

I would not prioritize API generation until the interface is widely adopted and we are seeing pains of manually maintaining it. With the experience of Plugin interface[6], it really hasn't changed a lot for a long time.

Extensions are another way of extending features of OpenSearch but for phased and incremental approach I'd like the interface to be flexible enough to support extensions in future.

I would also like to get feedback from @reta @AmiStrn @samuel-oci @rursprung @abseth-amzn who worked in the area of plugins.

[1] https://github.com/opensearch-project/OpenSearch/issues/2447 [2] https://github.com/opensearch-project/OpenSearch/issues/13197 [3] https://github.com/opensearch-project/OpenSearch/issues/13336 [4] https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/common/settings/Settings.java [5] https://github.com/opensearch-project/security/issues/1895 [6] https://github.com/opensearch-project/ml-commons/issues/2358

dblock commented 1 week ago

With opensearch-java-sdk-api and o.o.sdk.Client will you essentially be implementing a "version 2" of this Plugin interface that doesn't expose any :server dependencies? Are there additional complexities here that I'm missing?

I think that's it. We put it in a different repo (a facade) to reduce blast radius when API/implementation/internals change in core. Instead of having to figure out how to fix N plugins we fix the SDK, which itself then depends on a stable version of core rather than on the next moving version of core.

andrross commented 1 week ago

@saratvemulapalli I think we're aligned and I really like how you've broken much of this work down. A couple points to follow up on though:

I would like to have the interface defined in OpenSearch repo as a library and default implementation in :server

I agree with this. Probably something like a new library in libs/pluginv2 or libs/plugin-api (or whatever we want to name it). I think this creates the groundwork for integration with JPMS where we can explicitly declare what gets publicly exported by these libraries.

Interface abstracts inner implementations (including Pojo's) of core (:server)

Just to clarify, as we build on the vision and work started in #5910 we should be moving many parts of the code in :server into a respective library (like libs/common and libs/core) and that would be fair game for the new plugin API to take a dependency (provided the library is exporting the class as a public API). This would avoid duplicating and translating POJOs for things like the Java object representation of the query DSL. Is this in line with your thinking as well?

reta commented 1 week ago

@andrross @saratvemulapalli @dblock I think the plugin model was pushed to its limits and dealing with those was the primarily reason behind introducing the extensions as a replacement (and the path forward). I think revamping / extending the plugin APIs (org.opensearch.plugins.Pluginv2) is the step in reverse direction, that would make the extensibility even more difficult to evolve and maintain.

Regarding in-proc / out-proc extensions, I think we could stick to out-proc model and explore the UNIX domain sockets [1] , which is very efficient way to communicate with processes on the same host - it should eliminate the additional latency concerns at large.

[1] https://openjdk.org/jeps/380

dblock commented 1 week ago

I agree with this. Probably something like a new library in libs/pluginv2 or libs/plugin-api (or whatever we want to name it). I think this creates the groundwork for integration with JPMS where we can explicitly declare what gets publicly exported by these libraries.

I personally dislike the option of putting this into this repo a lot because of ho difficult it is to iterate in core, and because the dependency would not be against a stable, previously released version of OpenSearch, but against all the same internals.

dblock commented 1 week ago

I think revamping / extending the plugin APIs (org.opensearch.plugins.Pluginv2) is the step in reverse direction, that would make the extensibility even more difficult to evolve and maintain.

Extensions = out-of-proc plugins with a well defined SDK Plugins v2 = in-proc plugins with a well defined SDK

We're basically saying let's do the well defined SDK part and worry about in-proc vs. out-of-proc later as just a runtime feature. Seems like a good shortcut to change the dependency from N plugins -> core to N plugins -> 1 sdk -> core without rewriting everything, no?

reta commented 1 week ago

Seems like a good shortcut to change the dependency from N plugins -> core to N plugins -> 1 sdk -> core without rewriting everything, no?

-1 to be fair, with the means we have at our disposal, I think we should:

not do Plugins v2 (I understand the API boundary / cleanup but this is why we have extensions now)
not focus on in-proc extensions (we need very compelling reasons to dismiss IPC mechanisms JVM/Java supports to justify work on that)

This is just my opinions, thanks @dblock !

dblock commented 1 week ago

@reta First, as always, thank you. I really appreciate, value, and respect your strong opinions.

One of the things @fddattal says above is "I want to run the SQL plugin in a cloud native environment, but I am unable to do so without forking the code.". AFAIK he's tying to remote/swap some parts of OpenSearch, such as access to metadata. Practically speaking the SQL plugin is tightly coupled with core, so if you wanted to cleanly do that you would have to fork. The alternative you propose is to rewrite SQL on top of extensions, but that also means figuring out how to host and run extensions, implementing security, and so much more. It's a pretty tall order, so adding plugin interfaces to the SDK that look similar to extension interfaces seems like at least a reasonable step forward, no?

reta commented 1 week ago

It's a pretty tall order, so adding plugin interfaces to the SDK that look similar to extension interfaces seems like at least a reasonable step forward, no?

Thanks @dblock , enhancing existing plugin APIs is fine (it is not != Plugins v2), we do this and it makes sense.

The alternative you propose is to rewrite SQL on top of extensions, but that also means figuring out how to host and run extensions, implementing security, and so much more

Changing the extensibility model (for getting the benefits) requires efforts - it has to be done or extensions will never take off (I believe).

anastead commented 1 week ago

SQL "plugin" ideally should be an extension of the query processing in the core. It happened to be plugin implementation and ideally moving forward we should not enforce an additional layer for out of proc for SQL as an example. Each plugin needs to be evaluated and categorized as core, application (auxiliary) or UX (dashboards) plugin (feature). I don't think we should map each plugin to one extensible interface - one size fits all approach. Ideally we should have an extensibility model for each of these categories. Can we approach this more pragmatically and understand the needs of SQL /core plugins?

andrross commented 1 week ago

Can we approach this more pragmatically and understand the needs of SQL /core plugins?

Thanks @anastead. To quote @fddattal "Access to metadata of index, aliases, templates, node roles, and routing is tightly coupled with cluster state.". Specifically, the SQL plugin needs to know things like "does index 'foo' exist?" or "give me the mappings for index 'foo'". To do this today, the plugin will use ClusterService and invoke something like clusterService.state().metadata().index("foo").mapping(). Now, Frank is trying to run the SQL plugin in an environment that doesn't have a real ClusterService instance so this becomes difficult to inject index mappings. Note that I say "difficult" but certainly not impossible because a ClusterService could be modified to fetch index mappings from anywhere as long as the public signature doesn't change.

enhancing existing plugin APIs is fine (it is not != Plugins v2), we do this and it makes sense.

As @reta suggesting, this may be the pragmatic solution for these core plugins, as we could reduce the coupling and expose only the information intended to be exposed instead of things like the entire ClusterService. However, enhancing will mean introducing new classes and/or methods as APIs, which will almost certainly increase complexity in the short term, at least for the core which has to continue to support all APIs, old and new.

dbwiddis commented 1 week ago

@fddattal I'm working on implementing these interface methods and have a few comments/questions (CC: @dhrubo-os and @arjunkumargiri):

    CompletionStage<PutCustomResponse> putCustom(PutCustomRequest request);

    CompletionStage<GetCustomResponse> getCustom(GetCustomRequest request);

    CompletionStage<DeleteCustomResponse> deleteCustom(DeleteCustomRequest request);

The use of the Custom class is a bit confusing here, as I see two different imports for the Custom class related to Cluster State (which I assume is where you got the name), one as a subclass of ClusterState and one as a subclass of Metadata. Both inherit from ToXContentFragment among others.

My implementations will want to implement ToXContentObject (ideally), and probably don't need the other inheritance, so I'm pondering a new class in this SDK package. Which also leads to a follow-up question: do we need the Custom name or is there a better name here?

Also, I'll note the package name org.opensearch.sdk is used with the Extensions framework. It's reasonable to have some overlap, as extensions might eventually implement this interface, but having the same package name in two different repositories will require some deconfliction.

Naming things is hard.

fddattal commented 1 week ago

Hey folks, thanks for all the comments! It's great to see so much interest in this area!

@reta I see that you have concerns for re-building plugins from the ground up given that extensions exist. I agree, we should not do this. I prefer an incremental evolution to our APIs that take us a step in the right direction towards decoupling plugins from core.

I conceded that extensions have the preferred architecture for plugin maintenance, but I do believe there will always be a future where some plugins exist which cannot be run outside of the main opensearch process. Taking the categorization that @anastead mentioned, SQL plugin is a core plugin because of its strict latency requirements. SQL functions as a request coordinator, and document processor at the Lucene level. As such, it would be performance prohibitive for having these components live out of the main process.

Given this, I believe we can incrementally evolve the current plugins api to help decouple it from core. Doing this will provide the core more flexibility to evolve without breaking plugins. Furthermore, having this decoupling would enable me to easily provide custom implementations of core services to the sql plugin.

Proposal

To tackle the direct issue of ClusterService usage, I propose the following:

Two new interface which thinly abstract over ClusterService access for plugins be defined:
1. PluginMetadataClient This interface would expose the subset of pull based ClusterService access paths
2. PluginMetadataNotificationService This interface would expose the push based ClusterService access paths
A new plugin interface be created called MetadataAwarePlugin which serves as the mechanism to vend these core abstractions to plugins
In the javadoc, we will document this new plugin type, the core abstractions, and what core access paths they intend to abstract over.
It would be the plugin author's responsibility to know that such core abstractions exist, and they can chose to migrate their code to these abstractions.
The SQL plugin will then be refactored to replace its core access paths with the core abstractions.

Here's how this might look in code:

package org.opensearch.plugins;

import org.opensearch.action.support.IndicesOptions;
import org.opensearch.cluster.metadata.MappingMetadata;
import org.opensearch.common.settings.Setting;

/**
 * MetadataAwarePlugin is a plugin type which exposes core metadata access abstractions.
 */
public interface MetadataAwarePlugin {
    default void initMetadataAwarePlugin(
        PluginMetadataClient pluginMetadataClient,
        PluginMetadataNotificationService pluginMetadataNotificationService
    ) {}
}

/**
 * PluginMetadataClient abstracts over internal OpenSearch metadata access paths for plugins.
 */
public interface PluginMetadataClient {    
    <T> ActionFuture<T> getSystemSettingValue(Setting<T> setting);
    <T> void getSystemSettingValue(Setting<T> setting, ActionListener<T> responseListener);

    ActionFuture<String[]> concreteIndexNames(String[] indexExpressions, IndicesOptions indicesOptions);
    void concreteIndexNames(String[] indexExpressions, IndicesOptions indicesOptions, ActionListener<String[]> responseListener);

    ActionFuture<MappingMetadata> getIndexMappingMetadata(String indexName);
    void getIndexMappingMetadata(String indexName, ActionListener<MappingMetadata> responseListener);

    <T> ActionFuture<T> getIndexSetting(String indexName, Setting<T> setting);
    <T> void getIndexSetting(String indexName, Setting<T> setting, ActionListener<T> responseListener);
}

/**
 * PluginMetadataNotificationService abstracts over OpenSearch notification capabilities for plugins.
 */
public interface PluginMetadataNotificationService {
    <T> void registerSystemSettingUpdateListener(Setting<T> setting, Consumer<T> callback);
}

Pros/Cons

Pros

Simple incremental step in the direction of decoupling plugins from core
Low maintenance burden because the interface semantics mirror core

Cons

This interface could not be maintained as a separate library from core until data exchange objects are moved out of core

Future Improvements

In a future were plugins only reply upon these core abstraction interfaces and data objects defined outside of core, then opensearch core will have complete flexibility to evolve its internals without breaking plugins.

fddattal commented 1 week ago

The use of the Custom class is a bit confusing here, as I see two different imports for the Custom class related to Cluster State (which I assume is where you got the name), one as a subclass of ClusterState and one as a subclass of Metadata. Both inherit from ToXContentFragment among others.

@dbwiddis That's a good catch! I think its in our best interest to define two separate APIs for modifying index level and cluster level customs. What are your thoughts?

dbwiddis commented 1 week ago

To tackle the direct issue of ClusterService usage, I propose the following:

@fddattal I've already started down the road of implementing against the proposed CompletionStage style here but now I see you're proposing an ActionFuture based implementation with specific arguments closer to my proposal in #13336.

I can work with either and am focusing on the lower-level implementations first, but I'd really like us to align on an interface style sooner rather than later. I thought we had.

The CompletionStage variant encapsulates arguments inside the Request and return values inside the Response classes, which is much more future-proof. For example, I initially added just "Index" and "id" strings to my GetCustomRequest implementation and later realized I needed FetchSourceContext. This was an easy addition to the request object and did not change the implementation in the code. (I'm currently only using one, but there will soon be dozens.)

The interface is also completely independent of core, as it uses base Java functionality

Your new proposal requires changing method arguments for the interface if an additional field is required. I don't like this at all.

I strongly prefer the request/response object model. I'm flexible on including ActionListeners and there's some benefit to PlainActionFuture which we don't get currently with CompletableFuture/CompletionStage, but that's a relatively easy fix if we want to make a PlainActionCompletableFuture compatible with the CompletionStage style.

Can we align on an approach?

samuel-oci commented 5 days ago

Sorry to chime in late, I agree with @reta we shouldn't be doing pluginsV2, there are already too many things in place. In particular I would like to understand better what kind of specific problem this suggestion is solving. @fddattal can you please elaborate to explain the problem you encountered and why this solves it?

fddattal commented 5 days ago

I would like to understand better what kind of specific problem this suggestion is solving. @fddattal can you please elaborate to explain the problem you encountered and why this solves it?

Absolutely @samuel-oci! The particular problem that I have comes down to the fact that I deploy a custom distribution of OpenSearch. In my distribution, cluster based components don't exist. So, all code which depends on those direct cluster object do not work.

For example the following would not work because accessing clusterService.state() does not work in my environment.

MappingMetadata mappings = clusterService.state().metadata().index("foo").mapping();

So what I am proposing is we introduce an interface to abstract these core details to allow plugins to run without needing knowledge of the distribution they're running on top of.

Taking the same example, you can imagine an interface that exposes an operation:

CompletionStage<GetMappingMetadataResponse> getMappingMetadata(GetMappingMetadataRequest);

// request/response objects

class GetMappingMetadataRequest {
  String index;
}

class GetMappingMetadataResponse {
  MappingMetadata metadata;
}

In the standard OpenSearch distribution, this interface would just thinly wrap the existing access pattern via cluster service.

However, if I had this interface, then I would cleanly be able to stub out its implementation so that it works in my custom distribution and the plugin would then be able to run without a fork.

The full extend of what this interface would need to support for the SQL plugin is detailed in this comment: https://github.com/opensearch-project/OpenSearch/issues/13274#issuecomment-2076253291

OpenSearch core would likewise benefit in having this interface because of the greater degree of decoupling which it provides.

I strongly prefer the request/response object model. ... Can we align on an approach?

Agreed @dbwiddis ! I am also inclined to prefer CompletionStage's and the request/response object model! I am happy to standardize on this style.

andrross commented 5 days ago

@reta @samuel-oci The "plugin V2" term isn't really precisely defined and can have different meaning to different folks so I'm going to avoid using it (I definitely think reimplementing the whole plugin interface would be the wrong approach anyway...)

What @fddattal is describing here seems pretty reasonable to me. He wants to run the SQL plugin in some way where the index metadata comes from some place other than cluster state, so tightly coupling the SQL plugin code to call clusterService.state().metadata().index("foo").mapping() doesn't make it easy to inject custom logic. The fix for this is pretty straightforward from a Java perspective: introduce some generic interface that exposes a getIndexMappings(String indexName) method with the default implementation continuing to do clusterService.state().metadata().index("foo").mapping() but it allows custom environments to inject whatever implementation they want (note the scope of his problem is a little bit bigger than just index mappings, but the problem is generally the same and I think its helpful to highlight one specific use case for discussion).

The primary question as I see it is where should this interface live? It absolutely could be a library completely external to OpenSearch core that plugins could choose to use if they want and it could evolve independently from core. However, the SQL plugin likely isn't the only plugin that could benefit from reduced compile-time dependency on core details. I see this as a potential opportunity to add something like an "IndexMetadataProvider" interface in the plugin API that could replace plugins' tight coupling on the entire ClusterService public Java API, for example. So should this be a part of an effort by the core to present a more constrained API to plugins?

I'll also note that the general shape of the problem is the same that @dbwiddis is looking at, except in his case he's dealing with wanting a more abstract way to store plugin metadata that isn't tightly coupled to the concept of a system index or cluster state.

samuel-oci commented 5 days ago

What @fddattal is describing here seems pretty reasonable to me. He wants to run the SQL plugin in some way where the index metadata comes from some place other than cluster state, so tightly coupling the SQL plugin code to call clusterService.state().metadata().index("foo").mapping() doesn't make it easy to inject custom logic.

@andrross @fddattal Not sure I understand the "why", in what scenario this is helpful to not be able to access cluster state from a plugin? If you want to run a plugin not from within an OpenSearch node why not just leverage extensions?

anastead commented 5 days ago

@samuel-oci why would we have to rewrite a core plugin to a new extensions interface when we just want to abstract some state namely cluster state? it seems like an overkill, am i missing something

reta commented 5 days ago

@samuel-oci why would we have to rewrite a core plugin to a new extensions interface when we just want to abstract some state namely cluster state? it seems like an overkill, am i missing something

Thanks @anastead , I think we have 2 alternative routes to pursue, largely with following tradeoffs:

extension: do more on SQL plugin (or better to say, extension), less (if anything) on OpenSearch core side
plugin (continue to be): do less (if anything) on SQL plugin, do more on OpenSearch core side

@fddattal @dbwiddis (and I think you as well) lean towards the 2nd option: do less (if anything) on SQL plugin, do more on OpenSearch core side. To be fair, I am OK with the approach, but I am not satisfied with the APIs being suggested to become part of OpenSearch. The API is very much oriented at specific narrow problem at hand, or to say it in another words - bringing somethings you need right now and who knows what is coming next. The functionality of PluginMetadataClient is already provided (or could be provided if missed somehow) by Client / NodeClient in a more generic way. So I don't really see why we even need that.

samuel-oci commented 5 days ago

@samuel-oci why would we have to rewrite a core plugin to a new extensions interface when we just want to abstract some state namely cluster state? it seems like an overkill, am i missing something

@reta I do not understand why we would like to prevent access to cluster state from a plugin? unless we are not planning to run it from within OpenSearch node. (I think I am missing something? :) ). I am just trying to understand the benefits of it and understand the "why" part. If we do not intend to run it from an OpenSearch node I am proposing extensions as the mechanism of choice.

bringing somethings you need right now and who knows what is coming next.

This is my feeling as well, but I try not to jump straight away into that conclusion, which is why I am trying to make sure I understand the use case.

reta commented 5 days ago

Thanks @samuel-oci

@reta I do not understand why we would like to prevent access to cluster state from a plugin?

We are not preventing that, I think the "cluster state" is somewhat misleading here, the plugin needs cluster wide data (source of truth) which is already exposed / could be accessed through different means.

samuel-oci commented 5 days ago

The particular problem that I have comes down to the fact that I deploy a custom distribution of OpenSearch. In my distribution, cluster based components don't exist. So, all code which depends on those direct cluster object do not work.

This sounds like a very specific problem in a private modified opensearch code running somewhere in @fddattal environment. What's the incentive to change public core in that case?

andrross commented 5 days ago

What's the incentive to change public core in that case?

Great question! As @fddattal stated "This proposal aids in decoupling plugins from OpenSearch Core by restricting their core access through an abstract interface...". But I agree with @reta that the details matter here. If the proposed interface doesn't get us towards those goals then it probably should not go in core. For me, providing a way for plugins to get index metadata (a common use case) that doesn't require interacting with the entirety of ClusterService and ClusterState seems like a win for decoupling.

samuel-oci commented 5 days ago

This proposal aids in decoupling plugins from OpenSearch Core by restricting their core access through an abstract interface

I re-read the previous comments and I think I better understand the intention now. So let me try again :) @fddattal are you trying to re-implement elements of core such as clusterState, and therefore you want to have a higher level interface that wraps on top of that? Hypothetical example: if you want the entire cluster state to be stored in a MYSQL database and never loaded into memory or see such object ever again?

shwetathareja commented 4 days ago

I want to add my 2 cents to the discussion:

Today, OpenSearch Cluster Manager has multiple responsibilities like metadata management, shard management, health checks, node membership, serving admin APIs along with the overall cluster management.

One of the goal which RFC https://github.com/opensearch-project/OpenSearch/issues/13197 aims to achieve is to refactor the ClusterManager in a way to separate out the metadata management (along with pluggable storage) from other aspects of the cluster as the first step. It could enable cloud native providers to simply build on top metadata management lib and may not need to set it up as a cluster infrastructure.

Lets take create-index API as an example: MetadataCreateIndexService class has most of the logic around running validation, applying the template, submits a task in pending queue, implements the ClusterStateTaskExecutor which actually executes the logic to create process task (create-index) and creates a new cluster state object with updated metadata etc. Then, eventually wait for the state to be published to all the nodes, shards to be assigned and then send the response back.

Now, the intention is to separate out the validations, applying the template and generate the new index object (resource) as Metadata management, processing this resource to generate the new over all cluster state as cluster management aspect and finally the pluggable storage layer (local or remote). The Shard management can itself be separated out. But, for now lets keep it out of the discussion.

With this context, Plugins can have both read and write access to Cluster State. What @fddattal wants to separate out the read access to Cluster State or simply put the metadata access layer as the first step. So, this metadata access layer has to be inside core. Also, I don't think we should be doing un-necessary transformation by creating new Pojos. The access layer performance characteristics with in the cluster setup should be almost as same as access to ClusterService.state() object. One of the issue that i see with NodeClient is the overhead of serde.

reta commented 4 days ago

Thanks @shwetathareja, we basically need to make Cluster Management core APIs fine grained (fe cluster view vs cluster management), right? I think this is would be useful and than we could may be introduce more lightweight ClusterViewPlugin (just a rough idea) for the plugins that do not need access to full fledged cluster management. Did I captured that correctly? Thank you.

One of the issue that i see with NodeClient is the overhead of serde.

It has executeLocally family of methods, no serde.

shwetathareja commented 4 days ago

Thanks @reta yes, you captured the essence right 👍 more on the lines of Metadata.

@fddattal was exploring the performance characteristics of NodeClient. Yes, we can use executeLocally so the transport overhead goes away but lets take TransportGetIndexAction, when executed locally as well it will have the overhead of generating response object which would add certain overhead as opposed to directly accessing the object from state() object. Also, transport actions can trip circuit breakers which will not be desirable here.

reta commented 4 days ago

Thanks @reta yes, you captured the essence right 👍 more on the lines of Metadata.

Got it, thanks. So I think we should take a step back and focus on Cluster Management APIs first, that would help us to design the plugin related abstractions after.

opensearch-project / OpenSearch