opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[META] Split and modularize the server monolith #8110

Open nknize opened 1 year ago

nknize commented 1 year ago

Related modularization discuss issue (more than just JPMS support): #5910 Related JPMS issue: #1588

[WIP] This meta issue tracks the scope for refactoring the monolithic :server module into separable libraries to support JPMS and serverless or cloud native implementation extensions.

Phase 0 - JPMS Support (Eliminate top level split packages, see #1838)

Phase 1 - Decoupling to support serverless and cloud native implementation extensions

Resulting libraries:

:libs:opensearch-common
:libs:opensearch-core
:libs:opensearch-cluster

relates #1838

nknize commented 1 year ago

/cc @rishabh6788 this work has largely been advanced by me over the last few months and is reaching some good breakthrough milestones. The Streamable classes (e.g., StreamInput, StreamOutput, Writeable, NamedWriteable) and OpenSearchException class PR is up to tease out a big part of the tightly coupled classes across the :server module into separable libraries. This unlocks a big portion of :server. The meta issue here seeks to resolve the remaining split packages across server as outlined in the #1838 issue you opened a while ago. A separate goal of #5910 is to ensure that as many of the existing extensible classes as possible are in the right library and right package name, so that we can thoughtfully expose or restrict extension points to downstream plugin consumers.

If you have some time I'd love for you to jump in and help review the PRs.

andrross commented 1 year ago

Thanks @nknize! I want to start the discussion on expediting the completion of phase 0 (elimination of packages split across jars) AND backporting it to the 2.x branch. I think the current state, where there is major structural divergence between main and 2.x, is undesirable because it causes conflicts on backports. Given that plugins also maintain a branch developing against OpenSearch main, I think they are dealing with the same problem. Also, plugins are not binary-compatible across versions anyway: they must at a minimum update version numbers and re-compile. I think it is easier for everyone to complete the structural refactoring quickly and do it across both branches (but also not do it near a release, to not add any extra variables as teams are completing integration for that release).

@reta @dblock What do you think?

reta commented 1 year ago

I think it is easier for everyone to complete the structural refactoring quickly and do it across both branches (but also not do it near a release, to not add any extra variables as teams are completing integration for that release).

@andrross Agree :100:, the elimination of package splits would be a huge win towards getting JPMS done, though for 2.x this is going to be a 100% breaking change (that goes way beyond @opensearch.internal). Would we selectively keep the structural differences between 2.x and 3.x+?

nknize commented 1 year ago

Would we selectively keep the structural differences between 2.x and 3.x+ ?

I think we should look at that on a case by case basis? Let's lean into progress not perfection here so we can iterate quickly.

I also think we shouldn't have FUD around breaking java signatures and package names in minor releases at this juncture (as long as we don't do it too close to a release). The codebase isn't well defined enough for that (hence @opensearch.internal marked on the majority of the codebase). Once we have JPMS and we start defining exports in module-info.java, we can use that as the mechanism to collectively invent and simplify the Java surface area we want to expose external to core, and guarantee non-breaking across minors on those classes. The only things we strictly guarantee non-breaking at this point are the DSL and index encodings (which is heavily enforced by lucene anyway). #7998 is a nice next-step utility for us to quickly and iteratively check what downstream plugins are affected by the change while we're working on the refactor. It doesn't mean we have to open a PR or even fix the breaking change. But it does enable a developer (like me) to quickly check the impact and include that in my PR description w/o reviewers having to do it themselves.

andrross commented 1 year ago

I think we should look at that on a case by case basis?

I agree with a case by case basis. I think the value here of getting rid of a blocker to adopting JPMS is clear, and the "breakages" will be super simple to resolve as it is almost exclusively a matter of fixing imports (and the plugins maintaining a version against main have to resolve these issues anyway), so there's a compelling case for doing the items marked as phase 0 now.

I also agree that it is not practical to be 100% strict on not changing any Java APIs given how tightly coupled the code bases are.

nknize commented 1 year ago

Semi-related: I opened an issue for someone to add useful Java annotations we could use to enforce bwc. It's not a requirement for this, but it is another mechanism that would be helpful once we start carving out the publicly exposed Java classes through module-info.java.

dblock commented 1 year ago

@reta @dblock What do you think?

The extensibility project (#2447) proposed that we first make plugins depend on an SDK, reducing the N plugins : 1 core dependency tree to N plugins : 1 SDK : 1 core, therefore avoiding this pain (only the SDK will need to change when core changes). We are at an experimental release for 2.9, but it looks like we're not choosing to wait for that to deliver and aren't going to try to port plugins to extensions as a prerequisite. I don't have any solutions or workarounds for what the downstream dependencies are experiencing or will experience; you just can't both refactor and not cause disruption.

Given that, I would keep doing what @nknize is doing, and GA 3.0 as soon as possible to stop the pain of backports to 2.x.

The only other idea I have is to make extensions work as plugins (in-proc) so that we can still cut off the direct dependency on core. Finally, if we go the SDK route, I am not sure what additional benefits core having all this modularity brings other than to make it easier to work inside of it. Here's my original comment to the refactoring proposal along those lines: https://github.com/opensearch-project/OpenSearch/issues/5910#issuecomment-1399031768.

rishabh6788 commented 1 year ago

/cc @rishabh6788 this work has largely been advanced by me over the last few months and is reaching some good breakthrough milestones. The Streamable classes (e.g., StreamInput, StreamOutput, Writeable, NamedWriteable) and OpenSearchException class PR is up to tease out a big part of the tightly coupled classes across the :server module into separable libraries. This unlocks a big portion of :server. The meta issue here seeks to resolve the remaining split packages across server as outlined in the #1838 issue you opened a while ago. A separate goal of #5910 is to ensure that as many of the existing extensible classes as possible are in the right library and right package name, so that we can thoughtfully expose or restrict extension points to downstream plugin consumers.

If you have some time I'd love for you to jump in and help review the PRs.

Tagging the correct Rishabh: @rishabhmaurya

nknize commented 1 year ago

...GA 3.0 as soon as possible to stop the pain of backports to 2.x.

+1 to work towards 3.0 GA

Backport pain will always exist within a repo when a namespace changes.

...first make plugins depend on an SDK reducing the N plugins : 1 core dependency tree to N plugins : 1 SDK : 1 core, first, therefore avoiding this pain

A big gap in this plan is that the SDK still transitively exports all of the core classes to "downstream" plugins, defeating the strong encapsulation provided by JPMS (which is the primary objective of phase 0). This means core isolation through the SDK is not achieved, and plugins can bypass the SDK and override core logic. This is exacerbated by the SDK itself not providing JPMS support. Opening RFCs to deprecate core mechanisms like ThreadContext is a side effect we can eliminate with JPMS support. The near-term pain (reduced by good DevOps mechanisms like checkCompatibility) is, I think, worth it to achieve the strong encapsulation we need to make the SDK truly isolate the core classes.
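The transitive-exposure concern can be made concrete with a JPMS sketch. This is illustrative only: the module names opensearch.sdk and opensearch.core and the package org.opensearch.sdk.api are hypothetical, not the project's actual descriptors. Declaring the core dependency with requires transitive re-exports core to every SDK consumer, while a plain requires keeps it implementation-only.

```java
// Hypothetical module-info.java for an SDK jar.
module opensearch.sdk {
    // "requires transitive" puts core on the readability graph of every
    // module that requires opensearch.sdk, so extensions can see (and
    // override) core classes, defeating the phase 0 encapsulation goal.
    requires transitive opensearch.core;

    exports org.opensearch.sdk.api;

    // A plain "requires opensearch.core;" instead would let the SDK use
    // core internally without exposing it to downstream consumers.
}
```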

These efforts are far better together.

andrross commented 1 year ago

GA 3.0 as soon as possible to stop the pain of backports to 2.x

Major version upgrades are disruptive to all users (including users who are not plugin developers). I have a hard time justifying, with the refactoring changes here, the disruption to the community at large that a major version release brings.

I don't have any solutions or workarounds to what the downstream dependencies are experiencing or will experience, you just can't both refactor and not cause disruption.

Assuming that downstream dependencies are maintaining a version against main and against 2.x, then they are already experiencing the disruption, and the way to resolve that disruption is to backport the refactorings and bring the branches in line with each other. Any community plugins (outside of the opensearch-project) that are only maintaining a version against the 2.x branch are blissfully unaware of these structural changes and would be disrupted by the backports. But we don't truly support semver with plugins today anyway because plugins have to make code changes and recompile for all releases. We should still be thoughtful about the "breaking" changes that we backport, but I'm trying to make the case that the changes in phase 0 here (structural that are trivial for a consumer to resolve) are worth it on balance.

nknize commented 1 year ago

Once Phase 0 is complete we can selectively export the specific packages we want downstreams to be able to import and extend. Those we don't can be refactored to an .internal namespace that's not exposed. Technically this completely removes the need for OpenSearch-SDK, because the SDK is already defined and implemented through the core (libraries and modules). To make life easier we could just move those exported API classes, along with existing sdk extension classes, to new .sdk. sub-packages in the existing core libraries. This enables us to finally provide enforceable API BWC, and the breaking changes will be all but gone (not going to say completely eliminated, because there may be some corner cases). This will also enable us to begin working on actually supporting semver in the public API. It greatly simplifies the extensibility implementation because it reduces it further from N plugins : 1 SDK : 1 core to N plugins : 1 core w/ "SDK".

Note too that ML Commons, common utils, and parts of security can move to core libraries further simplifying cross plugin dependencies. The one "gotcha" (blocker) is the code implemented in Kotlin should be refactored to java before bringing to core and we should enforce no Kotlin code in the core libraries.

dblock commented 1 year ago

A big gap in this plan is the sdk still transitively exports all of the core classes to "downstream" plugins defeating strong encapsulation provided by JPMS (which is the primary objective of phase 0).

This is only the current implementation at build time, not at runtime, to go faster in dev. At runtime extensions do not depend on core transitively.

dblock commented 1 year ago

Once Phase 0 is complete we can selectively export specific packages we want downstreams to be able to import and extend. Those we don't can be refactored to an .internal namespace that's not exposed. Technically this completely removes the need for OpenSearch-SDK because the SDK is already defined and implemented through the core (libraries and modules).

This is misleading. You still have jar hell. The SDK doesn't just move classes around or annotate them: it introduces a proper r/w API, allows ~a plugin~ an extension to have compatibility across multiple (major) versions, and allows extensions to run fully remote on a separate JVM runtime that is not core and brings none of the latter along. You can do semver with an SDK instead of doing semver along with a massive core.

nknize commented 1 year ago

This is only current implementation for build time...

Build time is exactly where all of the core classes are exposed, such that a developer can override core logic.

At runtime extensions do not depend on core transitively.

At runtime the classloader needs core classes directly (e.g., ClusterState for SDKClusterService); without the runtime dependency you'd get a CNF (ClassNotFoundException). This is what modularity prevents.
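The class-loading point can be demonstrated with a small standalone sketch (the class name CnfDemo is mine, and the demo assumes no OpenSearch jars are present on the class path): asking the classloader for a core class that is absent at runtime fails with ClassNotFoundException, which is exactly the failure mode a missing runtime dependency produces.

```java
// Hypothetical demo class; run in a plain JVM without OpenSearch jars.
public class CnfDemo {
    // Asks the classloader to resolve the named class; returns "resolved"
    // on success, or the name of the linkage failure on CNF.
    static String tryLoadCoreClass(String className) {
        try {
            Class.forName(className);
            return "resolved";
        } catch (ClassNotFoundException e) {
            return "ClassNotFoundException";
        }
    }

    public static void main(String[] args) {
        // A core class an extension would need at runtime:
        System.out.println(tryLoadCoreClass("org.opensearch.cluster.ClusterState"));
        // A class every JVM ships with, for contrast:
        System.out.println(tryLoadCoreClass("java.lang.String"));
    }
}
```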

nknize commented 1 year ago

This is misleading. You still have jar hell.

I don't understand what jar hell has to do with this. JPMS is about not exposing packages to downstreams, not about hijacking classes with the same package and class name.

...it introduces a proper r/w API,

A proper r/w API is about strong encapsulation and not allowing downstreams to access classes you don't want them to hijack at runtime. Take ThreadContext right now. Without JPMS, any downstream can import ThreadContext and use its public methods, including stashContext, to hijack the context. This is dangerous (unknowingly or not). Enter JPMS. In order for downstreams to even see it in their IDE, the core codebase would have to explicitly declare exports org.opensearch.common.util.concurrent;. Without it, downstreams can't even import the class because the package is not visible (the specific error: java: package org.opensearch.common.util.concurrent is not visible). To achieve the same without JPMS would require complete logic refactoring to prevent stashing the context (e.g., create an Immutable version, etc.). JPMS simplifies this encapsulation that we're essentially trying to duplicate in the SDK.
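As an illustration of the mechanism (a sketch with a hypothetical module name, not the project's real descriptor): a core module-info.java that simply omits the exports line for ThreadContext's package makes the class invisible to every downstream module, enforced by the compiler and the JVM rather than by convention.

```java
// Hypothetical module-info.java for the core jar.
module opensearch.core {
    // Only exported packages are readable by downstream modules.
    exports org.opensearch.core.common;

    // org.opensearch.common.util.concurrent is deliberately NOT exported,
    // so a plugin compiling with
    //     import org.opensearch.common.util.concurrent.ThreadContext;
    // fails with:
    //     java: package org.opensearch.common.util.concurrent is not visible
}
```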

In the context of an SDK I encourage anyone confused about the benefits of refactoring to support JPMS to achieve a proper r/w API with strong encapsulation to read JEP 403 and Project Jigsaw documentation.

You can do semver with an SDK instead of doing semver along with massive core.

💯 And you don't need a separate opensearch-sdk to achieve this. Nor do you need a separate SDK to achieve separating JVM processes (which doesn't alone solve the encapsulation issue).

dblock commented 1 year ago

The goal of the refactoring as proposed is to avoid leaking internals into plugins by establishing an r/w API with strong encapsulation. Even when that is achieved, it will be difficult to have a large ecosystem of plugins, for all the reasons that are not solved by refactoring.

If ~plugins~ extensions don't depend on core at all, then this benefit is not needed, and thus refactoring unnecessary for that purpose (it has other benefits).

I think we should do both, but I believe that extensions give a lot more benefits a lot quicker, and more incrementally with less disruption, as plugins can migrate to extensions gradually, extension authors don't have to work or deal with core anymore, and don't need to release with every new minor version of OpenSearch, making developing plugins/extensions a lot cheaper.

dbwiddis commented 1 year ago

At runtime the classloader needs core classes directly (e.g., ClusterState for SDKClusterService) without the runtime dependency you'd get a CNF exception. This is what modularity prevents.

Hey @nknize, just for some context: the SDK's use of ClusterState was (and still is) temporary, to have a value to return for SDKClusterService.state(). However, that was identified as a huge performance impact, and we (mostly @joshpalis) have been working to rewrite it all using REST API calls in https://github.com/opensearch-project/opensearch-sdk-java/issues/674. We still have the cluster state request transport call, but I've proposed removing it in https://github.com/opensearch-project/opensearch-sdk-java/issues/767, and we have to figure out how to handle some calls like https://github.com/opensearch-project/opensearch-sdk-java/issues/354 (which I think @joshpalis worked around when addressing issue 674).

TLDR: we're trying to get rid of it, but we're not there yet.

dbwiddis commented 1 year ago

If ~plugins~ extensions don't depend on core at all, then this benefit is not needed, and thus refactoring unnecessary for that purpose (it has other benefits).

@dblock In an ideal future world (thinking 3.x) this is the goal for extensions. The only commonality is intended to be request/response classes auto-generated from spec files... and probably some needed XContent parsing classes.

nknize commented 1 year ago

If ~plugins~ extensions don't depend on core at all....

Even with opensearch-sdk, the ~plugins~ extensions depend on core. You can't remove (or even minimize) that dependency without a refactor, and even then the opensearch-sdk will take a dependency on the minimal refactor and transitively expose it to downstream consumers. That's what phase 1 of this meta issue is about: refactoring critical-path logic away from the dependency chain through base libraries, such that a downstream only takes a dependency on the base library(ies) needed to implement its logic. For example, if I want to write a simple PCA aggregation as a plugin or extension, I have to take a dependency on opensearch-{opensearch-version} to implement my Aggregator, AggregatorFactory, etc. I have to do this even if it's written as an extension using opensearch-sdk. With the refactor into separate libraries, all I have to do is take a dependency on opensearch-core and opensearch-aggregations and implement the aggregations boilerplate logic. With that separable library (with strong encapsulation) solved by JPMS, what value does turning that aggregation into an extension bring? I think enumerating that here with a concrete example will help identify the key mechanisms provided by the opensearch-sdk implementation.

dblock commented 1 year ago

Even with opensearch-sdk the ~plugins~ extensions depend on core.

That is simply incorrect.

You are right that to expose aggregation into the SDK without a transitive dependency a refactor in core will be required.

Re: value of extensions I invite you to read https://github.com/opensearch-project/opensearch-sdk-java#introduction.

nknize commented 1 year ago

That is simply incorrect.

AnomalyDetection takes a dependency on core. At least its implementation dependency is (marginally) better than opensearch-sdk's api dependency. To load the core classes needed at runtime, the classes have to be available to the class loader; hence, "dependency". Let's take another example: AnalysisExtension, which has a slew of core dependency classes, many of which rely on other core classes like o.o.env.Environment.

I invite you to read...

I've read it...now we're in discussion mode. The part I've always liked about extensibility is "our goal is to isolate plugin interactions with OpenSearch". This is achieved by running in a separate DiscoveryNode, but let's be clear, this is starting up a new node. So it's being achieved through isolating via another node. That's not a bad thing (though it takes some bootstrapping). I just want to ensure we don't oversell extensibility as solving the "sdk" isolation conversation. It's still taking dependencies on core. I prefer we handle this isolation in core itself and keep the "sdk" api components in the core, not in opensearch-sdk.

dblock commented 1 year ago

@dbwiddis's comment above explains that this dependency is temporary, to save developer time in proving that AD as an extension can perform better than AD as a plugin; the SDK will not have a dependency on core at GA.

nknize commented 1 year ago

comment above explains that this dependency is temporary..

That's in regard to ClusterState; I'm talking about the project as a whole (e.g., "Let's take another example. AnalysisExtension..."). You're not going to get away from opensearch-sdk transitively taking a dependency on both core and lucene. Analyzers implement lucene Tokenizers, TokenStream, etc. What's the point in attempting to write all sorts of opensearch-sdk scaffolding logic to try to wrap these when JPMS already solves your encapsulation? So let's not oversell that extensions are going to provide something they're not going to provide. I think the real value in extensions is isolating the plugin so that it doesn't crash your entire cluster. THAT is what the mission of extensibility should be. Not "eliminate the need to take a dependency on core".

reta commented 1 year ago

@nknize (not adding oil to the fire, just sharing some thoughts) my understanding of the extensions is very much aligned with what @dblock has been saying. To me, the end result is that Java extensions do not depend on core but on the SDK (and the SDK consequently should not have any dependency on core); otherwise we are pretty much repeating the plugins path.

But ...

Let's take another example. AnalysisExtension which has a slew of core dependency classes, many which rely on other core classes like o.o.env.Environment

... completely changes the extension model for me. Replicating this feature (and many similar ones) without depending on core would be a nightmare (I don't see any other viable choices here vs depending on core).

What's the point in attempting to write all sorts of opensearch-sdk scaffolding logic to try and wrap these when JPMS already solves your encapsulation?

We are asking a bit too much from JPMS here: an extension author may decide they don't want to use JPMS but rely on the classpath instead, and that would lift the restrictions (this is essentially how we use modular Apache Lucene today). I don't know how we could enforce that (maybe a variation of JarHell could actually be the answer).

nknize commented 1 year ago

...the end result is that Java extensions do no depend on core but sdk (and sdk consequently should not have any dependency of core),

Maybe my message is being missed, but I'm not saying anything different from a theoretical point of view. Of course it would be nice for an extension / plugin to have a one-line dependency (call it opensearch-sdk or opensearch-api, it doesn't matter, and how it's packaged is separate from implementation). What I'm saying is to look at reality, and I'm giving very real use cases where the extensibility "SDK" proposal does not work.

Let me be direct. Extensibility should be separate from the SDK (e.g., it uses and interacts with the SDK but does not define it). Leave the "extensibility" conversation to talking about the ExtensionsManager and how the distributed system (the firing-up-extensions logic) launches the extension in its own discovery node. The SDK is a different problem, and JPMS in the core repo (libs, modules, plugins) plays an integral part here that the current opensearch-sdk design ignores.

replicating this feature (and many similar) without depending on core would be a nightmare

Yep. Same with aggs, scripted aggs, query languages, and many compute scenarios (e.g., Spark SQL). This is the reality we live in. So I'm asking that we be pragmatic, not wishful and theoretical (e.g., quantum physics says I can theoretically run through a wall unscathed). Let's explore all options on a case-by-case basis. For example, if an extension or plugin only implements a small/tidy aggregation, it should only take a dependency on the aggregation API; there's no need to take a dependency on a bloated fatjar that brings all sorts of other API classes that are never loaded. In my mind this is implemented by :libs:opensearch-aggregations defining the exports section of module-info.java, so leave it in the core repo where the aggregation framework and API are defined and easily testable. This doesn't mean it's a dependency of core; it's just an API that lives in the core repo, not a downstream repo.
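The dependency shape described above can be sketched from the consumer side (module and package names here are hypothetical, chosen for illustration): a small aggregation extension would require only the libraries whose APIs it implements, not a fat jar that drags in the rest of the server.

```java
// Hypothetical module-info.java for a minimal PCA aggregation extension.
module org.example.pca.aggregation {
    requires opensearch.core;          // base types (streams, writeables)
    requires opensearch.aggregations;  // Aggregator / AggregatorFactory API

    // Expose only the extension's own entry-point package.
    exports org.example.pca;
}
```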

We are asking a bit too much from JPMS here; an extension author may decide they don't want to use JPMS.

I'm not talking about JPMS in an extension. I'm talking about JPMS for strong encapsulation of the core repo classes, so we don't leak the core classes and expose vulnerabilities to downstreams. Downstreams (e.g., security) are already looking at immutability changes that would be unnecessary if JPMS were already being used. Let's stop the "reinvention" of mechanisms when many goals are already solved by the language (like JPMS).

Key takeaways: separate the extensions and sdk conversations. Start the SDK conversation in the core by defining the core API through JPMS module-info.java, not a downstream repo. Anything not enabled by JPMS is addressed on a case by case basis.

rishabhmaurya commented 1 year ago

~We probably need to address lucene split packages in server module - https://github.com/opensearch-project/OpenSearch/tree/main/server/src/main/java/org/apache/lucene as it would start failing when lucene jars are present in module path?~ ignore, I see it to be part of phase-0.

dblock commented 1 year ago

Start the SDK conversation in the core by defining the core API through JPMS module-info.java, not a downstream repo.

Nomenclature-wise we have opensearch-core so maybe calling it "SDK" is more confusing than anything because we created https://github.com/opensearch-project/opensearch-sdk-java and called it the "OpenSearch SDK for Java"?