Open Xtansia opened 1 year ago
I like it!
I'm curious if @reta or @saratvemulapalli or @peternied have thoughts?
@Xtansia certainly a huge +1. We have discussed this subject in the past, and the plugins / extensions would benefit the most from using OpenAPI as the universal specification format (compared to Smithy, for example, there is no new tool to learn). I think this is also a strong argument for using OpenAPI for the core clients (the POCs we have are good starting points).
Regarding the thin-client integration, the consensus we've reached so far is that opensearch-java should be the only recommended way to connect to OpenSearch from Java. We could use ServiceLoaders to discover and load the thin clients, something like (very roughly):
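public class OpenSearchClient {
    // Look up the thin client of the given type for a plugin.
    <A> A plugin(Class<A> clientClass);
    // Look up the thin client of the given type for an extension.
    <E> E extension(Class<E> clientClass);
}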
So whenever the thin-client for a plugin and/or extension is on the classpath, it could be used (the wiring details could be laid out later on, this is just a high-level idea).
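To make the ServiceLoader idea a bit more concrete, here is a minimal sketch of how the lookup could be wired. It is only an illustration under assumptions: `ThinClientProvider`, `CoreClient`, `SecurityClient` and `SecurityClientProvider` are hypothetical names and not part of opensearch-java, and the return type is wrapped in `Optional` purely for the sketch. Only `java.util.ServiceLoader` itself is a real JDK API.

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical SPI that each thin-client jar would implement and register
// under META-INF/services; this name is not part of opensearch-java.
interface ThinClientProvider {
    // The client type this provider can build.
    Class<?> clientType();

    // Builds the thin client on top of the given core client.
    Object create(CoreClient core);
}

// Stand-in for the core OpenSearchClient, reduced to what the sketch needs.
class CoreClient {
    private final Iterable<ThinClientProvider> providers;

    // Default wiring: discover providers on the classpath via ServiceLoader.
    CoreClient() {
        this(ServiceLoader.load(ThinClientProvider.class));
    }

    // Explicit wiring, handy for tests or setups without classpath discovery.
    CoreClient(Iterable<ThinClientProvider> providers) {
        this.providers = providers;
    }

    // Returns the thin client of the requested type if a provider for it was found.
    <A> Optional<A> plugin(Class<A> type) {
        for (ThinClientProvider provider : providers) {
            if (type.equals(provider.clientType())) {
                return Optional.of(type.cast(provider.create(this)));
            }
        }
        return Optional.empty();
    }
}

// Example thin client and provider, as an extension's jar might ship them.
class SecurityClient {
    final CoreClient core;

    SecurityClient(CoreClient core) {
        this.core = core;
    }
}

class SecurityClientProvider implements ThinClientProvider {
    public Class<?> clientType() {
        return SecurityClient.class;
    }

    public Object create(CoreClient core) {
        return new SecurityClient(core);
    }
}
```

With the provider registered (or passed in explicitly), `core.plugin(SecurityClient.class)` yields the thin client, and an empty `Optional` signals that the corresponding jar is not on the classpath.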
But this is Java only, I think for other languages (Rust/Python/Go/Ruby/...), we would need to work on finding out the "pluggable" options or just use different approaches altogether. This is probably the only unanswered question I have at the moment.
@Xtansia +1 Great presentation on the topic! I have a few questions after reading it:
Thanks all for the feedback and questions. I've done my best to answer your questions as I understood them, feel free to ask follow-up questions or clarify if I've misunderstood.
@reta
I think for other languages (Rust/Python/Go/Ruby/...), we would need to work on finding out the "pluggable" options or just use different approaches altogether. This is probably the only unanswered question I have at the moment.
I didn't want to get too deep into language-specific suggestions as I'm not up-to-speed on what's idiomatic for all the languages. But the most basic approach here is just something along the lines of an IOpenSearchClient interface, with the extension client taking it as a constructor argument. C# & Rust at least have some mechanisms for nicer ergonomics around this, essentially being able to define extension methods so that you could define OpenSearchClient.Security() -> SecurityExtensionClient from outside the library that owns OpenSearchClient.
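The constructor-argument approach could look roughly like the following, shown in Java for consistency with the snippet earlier in the thread. This is a sketch under assumptions: the `performRequest` method, its signature and the endpoint path are invented for illustration, and only the names `IOpenSearchClient` and `SecurityExtensionClient` come from the comment above.

```java
import java.util.Map;

// Hypothetical minimal contract the core client would expose; the method
// and its signature are assumed for illustration.
interface IOpenSearchClient {
    // Sends a request through the core client's transport and returns the raw body.
    String performRequest(String method, String path, Map<String, String> params);
}

// Hypothetical extension client: it owns no transport and only delegates
// to whatever core client it was constructed with.
class SecurityExtensionClient {
    private final IOpenSearchClient client;

    SecurityExtensionClient(IOpenSearchClient client) {
        this.client = client;
    }

    // The endpoint path is a placeholder, not a documented API.
    String getUser(String name) {
        return client.performRequest("GET", "/_extensions/security/users/" + name, Map.of());
    }
}
```

Because the extension client depends only on the interface, connection handling and authentication stay in the core client, and a fake implementation can be substituted in tests.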
@peternied
- Is the OpenSearch-Project responsible for managing code generation? What about distribution?
This depends on how you define "managing code generation" and "distribution", so if you could be more specific that would be great. In general the OpenSearch project would be responsible for implementing and distributing the code generators for the language clients we currently support along with associated reusable automation/CI workflows. The project would not be responsible for the execution of the generators nor the hosting of the final generated artifacts, except for where the project owns the given extension. So an external third-party extension developer will be responsible for their own running of generation and distributing their client artifacts, but making use of the tooling we'll provide.
- Will pre-release extensions and extension features be compatible with thin clients?
Could you please expand on this question? By "extension features", do you mean something like feature flags? For pre-release extensions, as the extension owners will control generation, it'd be reasonable to expect they could generate and publish pre-release clients.
- Have you considered non-open-source extensions?
As the extension owner controls the actual generation & publish step, they would be able to take our open tooling and run it inside their private CI and publish their thin clients to internal package registries if they desired.
- With multiple thin clients loaded at once, could using data types and APIs from other extensions cause issues with type duplication or incompatibility?
This is a potential concern, however we can define any truly global types in the core client/library, and handle the mapping in the code generators so that they output a reference to the shared type rather than recreating the definition.
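One way a generator could handle that mapping is a small lookup step that runs before any type is emitted. The sketch below is an assumption about how such a step might look; `TypeResolver`, the schema names and the package names are all hypothetical.

```java
import java.util.Map;

// Hypothetical generator helper: decides which Java type name to emit for a
// schema referenced in an extension's spec.
class TypeResolver {
    // Schemas treated as global, mapped to the type assumed to exist in the core library.
    private final Map<String, String> sharedTypes;
    // Package that receives types generated specifically for this extension.
    private final String extensionPackage;

    TypeResolver(Map<String, String> sharedTypes, String extensionPackage) {
        this.sharedTypes = sharedTypes;
        this.extensionPackage = extensionPackage;
    }

    // Shared schemas resolve to the existing core type; everything else gets a
    // type generated inside the extension's own package.
    String resolve(String schemaName) {
        String shared = sharedTypes.get(schemaName);
        return shared != null ? shared : extensionPackage + "." + schemaName;
    }
}
```

Two thin clients that both reference a shared schema would then compile against the same core type instead of each shipping its own copy.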
- How will extension developers be prompted to author OpenAPI specs to enable thin client generation?
In the general case this will fall under a larger umbrella item of complete documentation for "Getting Started Developing An Extension" and associated examples. For those that have begun development, or are soon to begin, within the OpenSearch project, it'll more likely be a push (either just a nudge, or some assistance in initial authoring) from the clients maintainers as we roll out the generators and want to get the adoption going.
Thanks for the thoughtful response, just a couple follow ups
Is the OpenSearch-Project responsible for managing code generation? What about distribution?
This depends on how you define "managing code generation" and "distribution", so if you could be more specific that would be great.
I think some of the following is covered by your response, but it would help me understand the future if it was clearly laid out in this or another future proposal
Let us consider OpenSearch extensions, like anomaly detection, called AD for short. Since it is part of the OpenSearch Project, it's hosted inside of github.com/opensearch-project/AD. The AD extension offers new APIs, a perfect fit for a thin client. Since there will be N different language clients, who is responsible for generating the clients?
Honing in on the AD client for python. Does this client get generated and checked into github.com/opensearch-project/AD-py-client, somewhere else, or not under direct source control? What if the AD extension team/contributors want to add improvements to the client, how do they do that?
Moving to distribution, how is the thin client consumed? For python the typical place is pypi. How does the AD thin client for python get registered and updated?
Do these answers change for an extension produced outside of the OpenSearch Project?
Will pre-release extensions and extension features be compatible with thin clients?
Could you please expand on this question? By "extension features", do you mean something like feature flags?
Not feature flags, I mean will the process allow for release vs snapshot builds of thin clients?
Thanks for clarifying and expanding, hope this clears things up a bit more.
Let us consider OpenSearch extensions, like anomaly detection, called AD for short. Since it is part of the OpenSearch Project, it's hosted inside of github.com/opensearch-project/AD. The AD extension offers new APIs, a perfect fit for a thin client. Since there will be N different language clients, who is responsible for generating the clients?
In this example, the AD maintainers/owners would be responsible for the triggering of the generation, as they'd be in the best position to make decisions around when the API of the extension has changed sufficiently. In general, I'd expect this to be almost entirely automated, whether triggered via tag push or a manual workflow run that regenerates all clients at once thus requiring minimal specific knowledge from the extension maintainers. It would just not be scalable for the core clients maintainers to directly monitor and own all extensions' generation. However, the clients maintainers would provide ongoing support and guidance for the process as well as taking care of the onboarding.
Honing in on the AD client for python. Does this client get generated and checked into github.com/opensearch-project/AD-py-client, somewhere else, or not under direct source control? What if the AD extension team/contributors want to add improvements to the client, how do they do that?
I see this as a choice to be made by the AD maintainers. It should be possible to take any of these approaches with the tooling the clients maintainers will provide:
- n repos, each with checked-in source code of the language client, allowing transparency of changes, adding extra libraries or making improvements.
In essence, I believe the tooling, GitHub Actions, etc. provided should be composable and flexible so that the extension maintainers can use them in their workflow as they please, even if there ends up being a "golden path" that's recommended within the OpenSearch project.
Moving to distribution, how is the thin client consumed? For python the typical place is pypi. How does the AD thin client for python get registered and updated?
In general, they will be consumed in whatever is the standard for the language, e.g. PyPI for Python, NuGet for .NET, Maven for Java. The clients maintainers (working with the build automation maintainers) will provide automations for publishing the clients to their given artifact registry. These already exist for the core clients, so this will mostly be re-using or expanding upon existing automations. There will be some difference in requirements for OpenSearch project extensions versus external ones, as we have requirements around security, artifact signing and separation of ownership, whereas external devs will more than likely just need a relatively simple GitHub workflow they can plug an npm API key into, for example. So we'll require support from the build maintainers to aid in provisioning access to our artifact registry accounts for repos within the OpenSearch project and ensuring we meet our requirements.
Not feature flags, I mean will the process allow for release vs snapshot builds of thin clients?
Unfortunately some languages and artifact registries do not have a good concept of a "snapshot build", so it may not be reasonable to actively publish a snapshot build of a given client. It will certainly be possible to have prerelease versions that publish e.g. explicitly versioned betas or release candidates (1.2.0-beta.1, 1.0.0-rc.3, etc.).
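A publishing workflow could tell these prerelease versions apart from final releases with a check along these lines. This is a minimal sketch that assumes SemVer-style version strings like the ones above; registries with their own version scheme would need a different rule, and `Versions` is a hypothetical helper name.

```java
// Hypothetical helper for a publish step; assumes SemVer-style version strings.
class Versions {
    // True when the version carries a pre-release tag such as "-beta.1" or "-rc.3".
    static boolean isPreRelease(String version) {
        // Build metadata after '+' may itself contain hyphens, so drop it first.
        int plus = version.indexOf('+');
        String withoutBuild = plus >= 0 ? version.substring(0, plus) : version;
        return withoutBuild.indexOf('-') >= 0;
    }
}
```

For example, `Versions.isPreRelease("1.2.0-beta.1")` is true while `Versions.isPreRelease("1.2.0")` is false, which a workflow could use to decide whether to mark the published artifact as a prerelease.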
What/Why?
Prior Art
What are you proposing?
Allowing the creation of thin-clients for extensions which will be composable with the core client in every supported programming language. Extensions will publish their REST interfaces in the shape of OpenAPI specs, a generator will then consume the spec and output a client that can be composed with the core client. Manual work and intervention will be minimized by automating as much of the process as possible, providing build-and-test tooling such as CI workflows, so that both OpenSearch project-owned extensions and externally developed extensions can benefit with uniform support.
What users have asked for this feature?
There have been many requests for complete support of plugins in clients. As extensions are an evolution of plugins, and the plugins will necessarily be migrated into extensions in time, these requests can be treated as a direct indicator of the need for extension support:
What problems are you trying to solve?
When a user wants to invoke an extension from a programming language of their choice, they currently lack an easy and reliable method to do so. At present a user would need to invoke a “raw” HTTP client directly, implementing any additional authentication alongside their core OpenSearch client, or in some cases a language's OpenSearch client exposes a “raw” request method, alleviating some of the duplication. However, the current solutions lack any definition of what endpoints are available, their request shape, strong typing of query parameters or any documentation.
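To illustrate the status quo, this is roughly what the "raw" HTTP route looks like in Java using the JDK's `java.net.http` API. It is a sketch only: the endpoint path is a made-up placeholder, basic authentication is just one possible scheme, and `RawExtensionCall` is a hypothetical name.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical example of calling an extension endpoint without a thin client.
class RawExtensionCall {
    // Builds the request by hand; the path below is a placeholder, not a documented API.
    static HttpRequest buildRequest(String baseUrl, String user, String password, String jsonBody) {
        // Authentication has to be re-implemented next to the core client's own handling.
        String credentials = Base64.getEncoder()
            .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_extensions/example/_search"))
            .header("Authorization", "Basic " + credentials)
            .header("Content-Type", "application/json")
            // The body is an untyped string, so nothing checks its shape at compile time.
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }
}
```

Everything here (path, parameters, body shape) is a string the caller has to get right from external documentation, which is the gap a generated thin client is meant to close.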
What is the extension owner experience going to be?
Extension owners will author an OpenAPI REST specification for their extension. A code generator will then consume it and output the complete compilable & runnable source code for a high level thin-client that is composable with the core client in every supported programming language. Automations such as CI workflows will be provided to streamline the process of generating and publishing a given thin-client, and there will be minimal ongoing maintenance overhead by the extension owner for the generated thin-client.
Example flow of adding a new API:
Are there any security considerations?
At this time, there are no specific security considerations related to this proposal.
Are there any breaking changes to the API?
No, there are no breaking changes to the API, as this relates to an entirely new development.
What is the user experience going to be?
Users will have an easy and reliable way to invoke any extension from their preferred programming language. They will also have access to all new APIs or API updates immediately after an extension is released. Furthermore, due to the thin per-extension nature of the clients, users will be able to pull in only the necessary extensions.
Example flow:
Are there any breaking changes to the User Experience?
Previously a subset of plugin APIs were included directly in the language clients, differing in coverage between languages. As extensions will be published as separate packages rather than bundled into the core clients this will be a breaking change for the relatively small set of plugins that were covered and will now be migrated to extensions. This is relatively minor as coverage for plugin APIs was generally poor if not non-existent.
Why should it be built? Any reason not to?
Building this proposal brings value to the OpenSearch community by providing a high-standard solution that supports both first-party and third-party extensions uniformly, increasing the "feel good factor" for third-party developers. It also lowers the barrier to entry and increases velocity, scalability, and reliability with well-thought-out tooling and automation. Not building this proposal could limit the flexibility, ease of use, and integration of extensions for the end-users.
What will it take to execute?
The language client maintainers will need to:
Extension owners will need to:
Questions:
Why independent thin-clients?
Will the explosion in number of thin-clients cause issues?
The number of thin-clients, as we multiply the number of extensions by the number of supported languages, will almost certainly be huge in the not too distant future. However, we can mitigate as much of the burden as possible by providing high-quality prebuilt automations (e.g. GitHub Actions) to streamline the process of generating and publishing a given thin-client. As they will in the general case be 100% generated, there will be essentially zero ongoing maintenance overhead for the extension owner. Extension owners can further reduce any overhead by taking a piecemeal approach to which language clients they generate and publish depending on demand (e.g. a machine learning extension may primarily care about Python).
Why not bundle extension clients into the core client?
As it will be possible for independent developers to create extensions, it would not be feasible to include client code for all extensions in the core client. So we would end up having to draw some kind of delineation, which would naturally end up being only first-party (opensearch-project owned) extensions or a subset thereof. This would in turn require supporting both the directly bundled approach and the externally generated thin-client approach. There will likely be many first-party extensions as well, so it would not be scalable to include all of them in one client, which would lead to further disparity of treatment between extensions.
Why OpenAPI specifications?
Defining the extension's API in a ubiquitous spec language such as OpenAPI enables developers to generate clients more easily, and allows users to use off-the-shelf tooling to generate clients or interact with the APIs as they desire. It can also be used in a spec-first approach where the spec is used to generate the necessary scaffolding to set up the routes and actions in the extension itself. Using other spec languages such as Smithy as the basis for code generation was considered in the context of https://github.com/opensearch-project/opensearch-java/issues/284, however OpenAPI was chosen as it’s a de-facto standard within the community.
Can this proposal be extended to support other types of specifications besides OpenAPI?
At this time, OpenAPI is the chosen specification language for this proposal due to its wide adoption and tooling support. Other specification types such as Smithy often have mechanisms to be converted into OpenAPI, potentially allowing extension authors to write their specifications in something other than OpenAPI and merely publish the converted output.
How will this proposal affect developers who are not familiar with OpenAPI?
OpenAPI is a widely adopted standard with plenty of resources available for learning, so it should not be a significant barrier to entry for most developers.
Will extension owners be able to extend upon their generated thin-clients?
As extension owners will have full control over the generation and publishing of their clients, they will be able to do anything they like with regard to modifying or extending them. The primary recommended manner in which extension owners would be able to extend upon the generated clients would be for them to create a new package/library that depends on the generated client. In their higher-level library they could implement any new logic or features necessary and recommend it as the definitive client for the extension, with users still free to use the simple generated client if desired. There may be other solutions such as checking-in the generated client source to Git and making any necessary additions within the same library.
Any further questions?