opensearch-project / opensearch-java

Java Client for OpenSearch
Apache License 2.0
114 stars 181 forks source link

[PROPOSAL] Experiment with using Smithy API spec to generate missing APIs #284

Closed Xtansia closed 2 months ago

Xtansia commented 1 year ago

What/Why

What are you proposing?

Performing an experiment using the WIP Smithy API specification for OpenSearch to generate missing functionality in the client for a small API without re-creating the entire client.

What users have asked for this feature?

What problems are you trying to solve?

What is the developer experience going to be?

A script/tool that can be run to automatically fetch the API spec and generate the chosen code.

Are there any security considerations?

n/a

Are there any breaking changes to the API

This experiment is only successful if it does not break compatibility.

What is the user experience going to be?

Same as developer experience.

Are there breaking changes to the User Experience?

See above.

Why should it be built? Any reason not to?

The learnings from this experiment, if successful:

What will it take to execute?

Xtansia commented 1 year ago

Options Explored

Pre-existing OpenAPI generator

A quick inspection of the existing Java generator for openapi-generator shows it is generally only workable when generating completely from scratch, as it expects to be generating a whole client, project files and all. There is some flexibility within the CLI tool to reduce surface area of what you desire to generate, however not to level necessary. All the generated code is also specific to whatever HTTP & JSON libraries have had templates written for.

Verdict: 👎 Any generated code would be completely independent of the existing client which is not desirable.

Custom OpenAPI templates/generator

The OpenAPI generator tool does allow for customising templates of an existing generator, or writing a whole new one. This would allow for being able to generate code that correctly integrates with the custom JSON and transport code used by the existing client. However an architectural limitation of the custom generators means it's only possible to generate files per service (ie. client class) or per model, not per API operation. Meaning it's unable to generate the "Request" classes like currently exist in the client that encapsulate body, query params & endpoint in one class. (This may be less of an issue in other languages where a file maps to a module containing multiple types rather than a single type/class)

Verdict: 👎 Generated code could be inter-operable with existing client, however would have an entirely different calling convention, making it unsuitable as we want to work towards regenerating the entire client with minimal breaking changes.

Custom Smithy Build plugin/generator

Smithy Build allows building plugins (written in Java/Kotlin) that are passed the Smithy model and allow processing it in whatever manner you desire, in a way that is orders of magnitude more flexible than the OpenAPI generator approach. Due to the ability to process the model from the start with custom code it is relatively easy to generate code in the same structure as the existing client. As we'd be dealing with the Smithy model directly it'd be possible to implement and handle custom Smithy traits if desired. Unfortunately the means by which to create these plugins is not documented, and so required studying the smithy-kotlin & smithy-rs plugins to understand how it works. The requirement of using Java/Kotlin will be fine here, but may be problematic for other language clients.

Verdict: 👍 (mostly) Will allow us to generate (near-)identical code to the rest of the client, without having to deal with parsing and manipulating the model from scratch. Comes at cost in onboarding new contributors due to lack of documentation.

Unsolved Issues

Xtansia commented 1 year ago

I created a draft PR with my current code for the Smithy build plugin generator #318

dblock commented 1 year ago

Good start!

  1. Dumb question, does it have to be Kotlin whereas everything else is Java?
  2. Is there really no "out-of-the-box-thirdparty-generator" that can get us far enough?
  3. What does this code generate? without me running it :)
  4. To answer (3) I think you want to hook up the generator to generate some code that can replace existing java client code (when we see a bunch of deleted .java files we can say that this much was replaced by the generator).
Xtansia commented 1 year ago

Good start!

  1. Dumb question, does it have to be Kotlin whereas everything else is Java?

No it doesn't have to be Kotlin, it could be written in Java. Kotlin just has the advantage of being less verbose, and the other plugins I was researching off of were in Kotlin so made it easier.

  1. Is there really no "out-of-the-box-thirdparty-generator" that can get us far enough?

None that I've found, Smithy has no Java generator, and all the OpenAPI generators expect to be generator from scratch and naturally don't know to create the specialised output desired for the client. We could get functional code, but it'd never be able to replace the current code.

  1. What does this code generate? without me running it :)
  2. To answer (3) I think you want to hook up the generator to generate some code that can replace existing java client code (when we see a bunch of deleted .java files we can say that this much was replaced by the generator).

Yeah the actual getting it to run inside this repo hasn't yet been done yet, as it's a bit tricky with the spec living in a different repo. And I've been experimenting on the remote store api which doesn't yet exist in the client, so not replacing anything existing. This is the spec & config I've been experimenting with so far https://github.com/opensearch-project/opensearch-api-specification/compare/main...Xtansia:opensearch-api-specification:experiment/codegen and this is the output for that config https://gist.github.com/Xtansia/f3eaaf8aeb1121299484e9e8a3513700. Once I get some proper semblance of it running contained within the repo, then I'll update the PR so that it includes the generated output.

dblock commented 1 year ago

Thanks @Xtansia! Looks like great progress. Add license headers to those generated files, and a header that says that the file was generated ;)

reta commented 1 year ago

Thanks @Xtansia, few clarifications from my side:

  1. Is there really no "out-of-the-box-thirdparty-generator" that can get us far enough?

It seems like one of the advertised Smithy features is the ability to generate the OpenAPI spec which could be fed to openapi-generator to generate a client for a wide variety of the programming languages and frameworks

A quick inspection of the existing Java generator for openapi-generator shows it is generally only workable when generating completely from scratch, as it expects to be generating a whole client, project files and all.

This is not true: the generator is highly configurable in what and how should be generated. It could generate only models, models etc (used it in the past and 100% sure of that).

Smithy vs OpenAPI

@dblock @wbeckler This is personal perspective: there are many specs like Smithy, backed by different vendors. The OpenAPI is vendor neutral, widely supported / accepted / understood specification, which seems to be a reasonably good way to share contracts for web APIs. The OpenAPI v3.1 brought a full alignment with JSON Schema draft 2020-12, this is big deal in a way how large pieces (models) could be described and referred to. I think it would be very beneficial to use OpenAPI spec as a basis for client code generation, just my 5 cents

Xtansia commented 1 year ago

Thanks @Xtansia, few clarifications from my side:

  1. Is there really no "out-of-the-box-thirdparty-generator" that can get us far enough?

It seems like one of the advertised Smithy features is the ability to generate the OpenAPI spec which could be fed to openapi-generator to generate a client for a wide variety of the programming languages and frameworks

You are correct and that's the foundation of how I tested with openapi-generator, all of these experimentations use the Smithy spec as its starting point.

A quick inspection of the existing Java generator for openapi-generator shows it is generally only workable when generating completely from scratch, as it expects to be generating a whole client, project files and all.

This is not true: the generator is highly configurable in what and how should be generated. It could generate only models, models etc (used it in the past and 100% sure of that).

That's correct, and is what I was hinting at by "There is some flexibility within the CLI tool to reduce surface area of what you desire to generate, however not to level necessary.", I was unable to find quite the level of filtering I desired, as I wanted more advanced filtering than "only models" but potentially I missed something and did it a disservice. However we then reach the point where regardless of the filtering, any code generated by the off-the shelf openapi-generator is never going to match up with the existing codebase of the client, and then if we do custom templates with openapi-generator the architectural limitations mean we'd be still be unable to match up with the existing client.

Smithy vs OpenAPI

@dblock @wbeckler This is personal perspective: there are many specs like Smithy, backed by different vendors. The OpenAPI is vendor neutral, widely supported / accepted / understood specification, which seems to be a reasonably good way to share contracts for web APIs. The OpenAPI v3.1 brought a full alignment with JSON Schema draft 2020-12, this is big deal in a way how large pieces (models) could be described and referred to. I think it would be very beneficial to use OpenAPI spec as a basis for client code generation, just my 5 cents

I believe there was an issue somewhere to discuss the benefits of each at some point, I'm unsure where though. It would be possible to generate from OpenAPI spec, but as far as I can tell to get code that matches the existing Java client would mean doing it from near-scratch.

reta commented 1 year ago

I think you mean this one, https://github.com/opensearch-project/opensearch-api-specification/issues/30 ?

Just recently we have had the discussion with @wbeckler regarding Smithy w/o OpenAPI, and one of the ideas we where thinking about is to have POCs to compare both approaches. I believe yours is the great one to showcase Smithy case.

Xtansia commented 1 year ago

For reference this is as far as I got with the custom openapi-generator templates/generator before hitting the fact I could not generate code in the same structure as the existing client: https://github.com/opensearch-project/opensearch-java/compare/main...Xtansia:opensearch-java:experiment/codegen-openapi

VachaShah commented 1 year ago

@Xtansia This is awesome, thank you! For @reta's question, is it also possible to post what gets generated (maybe a snippet of a class) from the custom Smithy build plugin vs the custom open-api generator? This would give more insights into what kind of client is generated for each option.

Xtansia commented 1 year ago

@VachaShah @reta Here's the example output for that openapi-generator, https://gist.github.com/Xtansia/b88ad73bde610a5e677b0b293c6155ea, while it's obviously not complete functional code, the main thing to notice is how the models only represent the body of the request/response, and the query params are shifted to the operation method and "flattened" (ie. if there's n query params then the method will have n+ params)

dblock commented 1 year ago

@VachaShah @reta Here's the example output for that openapi-generator, https://gist.github.com/Xtansia/b88ad73bde610a5e677b0b293c6155ea, while it's obviously not complete functional code, the main thing to notice is how the models only represent the body of the request/response, and the query params are shifted to the operation method and "flattened" (ie. if there's n query params then the method will have n+ params)

Mustache templates seems way easier than kotlin code, isn't it?

Xtansia commented 1 year ago

@VachaShah @reta Here's the example output for that openapi-generator, https://gist.github.com/Xtansia/b88ad73bde610a5e677b0b293c6155ea, while it's obviously not complete functional code, the main thing to notice is how the models only represent the body of the request/response, and the query params are shifted to the operation method and "flattened" (ie. if there's n query params then the method will have n+ params)

Mustache templates seems way easier than kotlin code, isn't it?

That's entirely subjective, and depends on the complexity of the template. I personally find code generally easier to reason about. But it's not really templates vs code, you could go to the effort of bringing mustache into the smithy plugin if you wanted to. The problem is entirely the generated code compatibility with the openapi-generator.

reta commented 1 year ago

For reference this is as far as I got with the custom openapi-generator templates/generator before hitting the fact I could not generate code in the same structure as the existing client: main...Xtansia:opensearch-java:experiment/codegen-openapi

@Xtansia thanks for sharing, I picked up your work here https://github.com/opensearch-project/opensearch-java/pull/336, there are no generator changes at the moment (no need, templates are just enough) but I hope to finalize the PoC by getting as close to exiting generated model as possible.

Xtansia commented 1 year ago

Thanks for putting together that more complete PoC for OpenAPI @reta! It has been really helpful.

I wanted to distil a summary of where we're currently at with regard to the pros & cons etc. I've tried to be even-handed, but it's possible I've missed some points, so do let me know if you've got anything to add or correct and I'll make sure to update it.

Explored Approaches / Proof of Concepts

Smithy Build Plugin

PoC PR: https://github.com/opensearch-project/opensearch-java/pull/318

Pros

Cons

Off-the-shelf OpenAPI Generator With Customised Templates

PoC PR: https://github.com/opensearch-project/opensearch-java/pull/336

Pros

Cons

Other Potential Approaches

The following approaches may be worth exploring to bridge the gap between the two current PoCs:

Modify Smithy Build Plugin To Utilise Templates

Provides only slight differentiation from current Smithy Build Plugin approach, in that it would allow for making some of the code generation more human friendly, but at the cost of having to integrate templating and mapping a cleaner model to pass into Mustache.

From Scratch OpenAPI Generator

Create a minimal CLI tool that uses existing OpenAPI model libraries to load and parse the spec, manipulating it and then passing into Mustache templates. Gives benefits of operating off of OpenAPI without the architecture restrictions of the openapi-generator tool.

Desirable Attributes

There's likely some common desirable attributes we have for whatever the solution ends up being, and I think it'd be good to solidify those.

reta commented 1 year ago

@Xtansia thanks a lot for putting everything together, I think the summary is quite fair. The only things I would suggest to add to OpenAPI pros: The OpenAPI spec could be used to generate large number of language/client combinations out fo the box, I am not sure how that would translate to Smithy. Thank you!

dblock commented 1 year ago

Is there some consensus on what to choose? I think the way we decide is that the person doing the work makes the call.

wbeckler commented 1 year ago

Not only is a diversity of approaches possible, it may be necessary, because smithy can't directly generate most languages. For example, there's a first attempt at Ruby based on OpenAPI: https://github.com/opensearch-project/opensearch-ruby/issues/139

The question here may be just whether the smithy + gradle plugin approach is the right approach for incremental java client API generation. As a byproduct of this approach, the clients for non-smithy languages would rely on an OpenAPI spec generated from the Smithy models.

reta commented 1 year ago

Is there some consensus on what to choose? I think the way we decide is that the person doing the work makes the call.

I don't think we have a clear winner :D (as in most cases in tech), one of the most convincing benefits (for me at least) of using OpenAPI - you just need to know one thing (which is an open specification), and this is it (whereas with Smithy, you have to know both).

The question here may be just whether the smithy + gradle plugin approach is the right approach for incremental java client API generation. As a byproduct of this approach, the clients for non-smithy languages would rely on an OpenAPI spec generated from the Smithy models.

That is very possible, the thing I am personally concerned is that we would definitely run into Smithy -> OpenAPI generation issues and essentially would have to deal with that.

Xtansia commented 1 year ago

Is there some consensus on what to choose? I think the way we decide is that the person doing the work makes the call.

I don't think we have a clear winner :D (as in most cases in tech), one of the most convincing benefits (for me at least) of using OpenAPI - you just need to know one thing (which is an open specification), and this is it (whereas with Smithy, you have to know both).

The question here may be just whether the smithy + gradle plugin approach is the right approach for incremental java client API generation. As a byproduct of this approach, the clients for non-smithy languages would rely on an OpenAPI spec generated from the Smithy models.

That is very possible, the thing I am personally concerned is that we would definitely run into Smithy -> OpenAPI generation issues and essentially would have to deal with that.

What main issues do you foresee with using Smithy as the definition language and then converting to OpenAPI, other than the adoption/learning hurdle for Smithy?

Do you know of any commonly used definition language for OpenAPI that don't involve writing thousands of lines of YAML/JSON directly? 😅

reta commented 1 year ago

Thanks @Xtansia

What main issues do you foresee with using Smithy as the definition language and then converting to OpenAPI, other than the adoption/learning hurdle for Smithy?

This is yet another generator (== code), right? And I don't know at least 2 things: a) how good it is b) does it support OpenApi 3.1 (this would be very handy for Json Schemas)

Do you know of any commonly used definition language for OpenAPI that don't involve writing thousands of lines of YAML/JSON directly? sweat_smile

:D the good part - it is rarely needed to modify anything at all (I do have real world examples of that from the past), but we want things "our way" (for a reason), so here we are :smile:

nhtruong commented 1 year ago

Here's my take on Smithy vs OpenAPI.

Smithy is superior in terms of features and maintainability. However, it has not been widely adopted like the well-established OpenAPI. It lacks parsers for many languages, and most contributors are not familiar with it.

I used to think that Smithy was a lot more readable (well it is, from the API Design standpoint). But weirdly enough, when writing the PoC generator for Ruby, I much preferred looking at the JSON object representing an endpoint in OpenAPI over its Smithy equivalence. I had the overview of the entire endpoint in one place in OpenAPI VS referencing several files in Smithy.

Where do we go from here?

1. Smithy all the way (for both spec and code generation)

This requires contributors to learn Smithy and its limited toolset to be able to contribute to the project. If I were a programmer looking to contribute to an Open Source project and learn something new along the way, I'd more likely pick an OpenAPI project over a Smithy one:

More importantly, we might risk alienating developers who have experience in maintaining OpenAPI specs, and in writing code generators for OpenAPI.

The biggest advantage of this approach is utilizing all of Smithy unique features without worrying about whether they are translatable to OpenAPI.

2. Only use Smithy as a tool to write OpenAPI Spec more efficiently

In other words, we keep maintaining the opensearch-api-specification repo as a Smithy project with Gradle plugin to generate OpenAPI Spec. We will avoid Smithy features that are not translatable to OpenAPI.

As @Xtansia has pointed out, writing JSON and YAML for OpenAPI from scratch is a pain. Smithy's Mixin and Traits make it a lot easier and faster to maintain than editing OpenAPI Json/Yaml files. Though there are 3 counter points that weaken the advantages of this approach:

3. OpenAPI all the way (scratch off what we have so far with Smithy and start over)

Even though we spent quite a bit of time setting up the Smithy spec repo (You can see a lot of care for this project in the Developer Guide <3), there are barely any endpoints in the spec, and most are incomplete. So, starting over with OpenAPI is not too wild of an idea. In fact, this might be the best approach, in my opinion:

reta commented 1 year ago

Thanks @nhtruong

Most importantly, we can leverage ElasticSearch's rest-api-spec and translate it to OpenAPI spec programmatically (It's a lot harder to write a program to translate that to Smithy spec). This will save us a lot of time back-filling existing endpoints, and cut down on human errors from writing the spec by hand. (Note that said rest-api-spec lacks info on Response and Request Body, as @Xtansia pointed out in our discussion. While many clients like Rust and Ruby do not need this info, strongly-typed language clients like Java and the future TypeScript client require it. We will still need to add this data manually afterward.)

It is funny you mentioned that, I just posted the same kind of question / comment here https://github.com/opensearch-project/opensearch-api-specification/issues/74#issuecomment-1424308290 this morning. Yes, @Xtansia is very right, the rest-api-spec has no bodies but here is what I came up with:

nhtruong commented 1 year ago

@reta it's quite a funny coincidence. I discovered that ES spec while studying this commit to review this PR. Told @wbeckler about it yesterday and he told me this morning that you just brought up the same thing 😆

Anyway, it's cool that OpenAPI 3.1 can do that. I'll have to look into it. But I have a hunch that we should still convert it to native OpenAPI spec to make it easier to maintain, more consistent with the spec for future endpoints, and more compatible with off-the-shell parsers. (Again just a hunch. Still need to do research on it)

Xtansia commented 1 year ago

Just wanted to share I'd started looking at writing a PoC "from-scratch" OpenAPI based generator, as I'd mentioned in a previous comment. It's still not quite at the point I want to create a draft PR, but if you want to have a look, it's pushed up here: https://github.com/Xtansia/opensearch-java/tree/experiment/codegen-openapi-scratch/java-codegen

nhtruong commented 1 year ago

@reta I'm working on converting the old OpenSchema API Spec to OpenAPI Spec. I misunderstood what you said by using the JSON schema. JSON schema is only meant for the request/response bodies, not the entire API spec right?

reta commented 1 year ago

thanks @nhtruong

JSON schema is only meant for the request/response bodies, not the entire API spec right?

Correct for request/response bodies, but the spec has own schema here [1]

[1] https://github.com/opensearch-project/OpenSearch/blob/main/rest-api-spec/src/main/resources/schema.json

Xtansia commented 1 year ago

I have raised a draft PR with my working PoC for the from scratch OpenAPI based generator: https://github.com/opensearch-project/opensearch-java/pull/366

sachetalva commented 1 year ago

Hi,

Out of the options listed in this response(https://github.com/opensearch-project/opensearch-java/issues/284#issuecomment-1424822276), I think the 2nd option might be a good way forward, i.e we can define the API models in Smithy and for use-cases which require OpenAPI we can consume the OpenAPI build output from Smithy.

We (myself and pgtgrly) just wanted to add some more context on why we had proposed to use Smithy instead of OpenAPI for the opensearch api specification repo (https://github.com/opensearch-project/OpenSearch/issues/1590).

Some of the key reasons were:

In order to speed up API model generation in Smithy, it should also be possible to programmatically generate the Smithy initial API model files using the OpenSearch rest-api-spec JSONs (https://github.com/opensearch-project/OpenSearch/tree/main/rest-api-spec/src/main/resources/rest-api-spec/api). Once the inital model files are created, these can be enhanced to include the request and response bodies.

We would also like to add the discussion this thread by Disney Streaming about the reason they went for Smithy over OpenAPI for it’s highly customizable codegen capabilities (https://www.reddit.com/r/scala/comments/zftfwf/comment/izemlqq/).

nhtruong commented 1 year ago

I've generated the OpenAPI Spec from the legacy spec (with come caveats). The specs and some of my findings when working on this can be found here.

@sachetalva Thanks for the input. Some of the points you mentioned above are not unique to Smithy:

Regarding "[Disney Streaming] went for Smithy over OpenAPI for it’s highly customizable codegen capabilities", I think you mislinked the thread? I don't see it in the thread you quoted. Regardless, we will have to write our own client generators for OpenSearch due to the level complexity of its API and the current clients:

Sure we can customize an existing codegen to handle those, in theory, but at that point, I think it's a lot simpler to write our own generator to retrofit new endpoints into the current clients' structures.

nhtruong commented 1 year ago

One thing I just discovered when translating the legacy spec to OpenAPI spec is that, Smithy has another advantage over OpenAPI: The ability to group related endpoints into one location (Similar to that of the legacy spec). This can't be done with OpenAPI because of the restrictions on Path and Operation references.

dblock commented 9 months ago

@Xtansia what do we want to do with this issue?

Xtansia commented 2 months ago

Closing this issue as the spec is now solely authored in OpenAPI and #366 has been merged