opensearch-project / opensearch-py

Python Client for OpenSearch
https://opensearch.org/docs/latest/clients/python/
Apache License 2.0

[PROPOSAL] Use a git submodule to include the OpenSearch.openapi.json file #626

Closed jayaddison closed 5 months ago

jayaddison commented 11 months ago

What/Why

What are you proposing?

The generate_api.py utility script currently retrieves the latest version of the OpenSearch API spec based on the main branch of the opensearch-api-specification.git repository.

When updating the development tip of opensearch-py (this repo), that's what we want, but it makes two scenarios challenging: developing against a branched version of the spec, and debugging past issues.

The proposed solution to this is to use a git submodule to reference a specific version of opensearch-api-specification.git. The submodule contents would be cloned locally as part of the existing nox -rs generate command.
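A minimal sketch of what that could look like inside the generate step, assuming the submodule lives at a hypothetical `opensearch-api-specification/` path and that the spec file sits at its top level (neither is confirmed here; the prototype branch may differ):

```python
import json
import subprocess
from pathlib import Path

# Hypothetical location of the submodule inside the opensearch-py checkout.
SPEC_SUBMODULE = Path("opensearch-api-specification")
# File name taken from the proposal title; its location inside the spec repo is assumed.
SPEC_FILE = SPEC_SUBMODULE / "OpenSearch.openapi.json"


def ensure_spec_checkout(repo_root, run=subprocess.run):
    """Check out the pinned submodule commit and return the spec file path."""
    run(
        ["git", "submodule", "update", "--init", "--", str(SPEC_SUBMODULE)],
        cwd=str(repo_root),
        check=True,  # fail the generate step if git reports an error
    )
    return Path(repo_root) / SPEC_FILE


def load_spec(spec_path):
    """Read the API spec from the local checkout instead of over HTTP(S)."""
    with open(spec_path, encoding="utf-8") as handle:
        return json.load(handle)
```

The `run` parameter exists only so the git call can be swapped out in tests; in the generate step it would default to `subprocess.run`.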

What users have asked for this feature?

As far as I know, I'm the only requestor so far.

What problems are you trying to solve?

In addition to the two items mentioned above (development against a branched spec, debugging of past issues), I'd like to increase the confidence I have that a software release can be rebuilt again in the future. Specifying dependency versions tends to help a lot with that, but it does come with a tradeoff: those versions need to be managed.

What is the developer experience going to be?

The effect of this change is on developers of the opensearch-py library itself.

Are there any security considerations?

The changes here remove one HTTP(S) request, and replace it with a local file read to a git submodule checkout.

The content we receive shouldn't change in either case, and the access level should be equivalent whether the spec arrives through a git clone of the submodule or through the HTTP request, because the repository is public.

One additional subprocess is created within the nox step -- a call to the local git binary on the host system.

Are there any breaking changes to the API?

No.

What is the user experience going to be?

The opensearch-py developer experience when generating client code from the OpenSearch spec should be unchanged.

When an updated version of the OpenSearch API spec becomes available, an additional manual step is required to update the submodule to the desired version; this update is then committed using git commit -m "..." or similar, as with any other change.
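As a rough illustration of that manual step (the actual update process is still undefined), the sketch below lists the git commands one might run to move the submodule to a chosen tag or commit and record the new pin. The submodule path, the helper names, and the commit message are all hypothetical:

```python
import subprocess

# Hypothetical submodule path; the real location would be decided in the pull request.
SPEC_SUBMODULE = "opensearch-api-specification"


def pin_commands(ref, submodule=SPEC_SUBMODULE):
    """Return the git commands that move the submodule to `ref` and record the new pin."""
    return [
        ["git", "-C", submodule, "fetch", "origin"],   # get new commits and tags
        ["git", "-C", submodule, "checkout", ref],     # ref: a tag or commit hash
        ["git", "add", submodule],                     # stage the updated pin
        ["git", "commit", "-m", f"Update API spec submodule to {ref}"],
    ]


def pin_spec(ref, run=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for command in pin_commands(ref):
        run(command, check=True)
```

Keeping the command list separate from its execution makes the steps easy to review or print before anything is run.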

Are there breaking changes to the User Experience?

Developers could unexpectedly find that when they run the nox -rs generate command, they do not see recent updates from the spec repository unless they explicitly update the submodule to include them.

Why should it be built? Any reason not to?

This could improve some of the development flexibility for the client code in this repository; however there is no evidence of branching in the spec repository, so it is unclear whether this is valuable in practice.

The proposal could make it easier to revisit historic versions of the client and regenerate code as it existed at the time.

Generally this proposal intends to make it easier to replicate a precise build of the opensearch-py client reliably.

What will it take to execute?

I have some prototype code available in a branch, and I'll offer this in a 'draft' status until there's consensus on whether the feature is worthwhile.

Any remaining open questions?

The process for updating the submodule version is undefined.

Additional context

This proposal branches off from some discussion on pull request #617.

Xtansia commented 11 months ago

The overall architecture of building and distributing the specs is not yet concrete: https://github.com/opensearch-project/opensearch-api-specification/issues/74 and https://github.com/opensearch-project/opensearch-api-specification/issues/84

So appropriate version tagging & branching of the spec is still to come.

I'm somewhat wary of using git submodules for this as they tend to be quite a nuisance to work with, and it means checking out the complete source and history of the spec repo for a single JSON file.

The .NET client repo currently does this the "dirty" way: it downloads the spec to a local file and checks that file into the client repo. Downloading the latest is then an explicit choice that's passed when running the generator. This mirrors the mechanism the generator used before the fork.
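Translated to Python purely for illustration (this is not the .NET generator's code, and the function and parameter names are hypothetical), the checked-in-file approach could be sketched like this, with the download as an explicit opt-in:

```python
import json
from pathlib import Path


def read_checked_in_spec(local_path, download_latest=False, fetch=None):
    """Return the spec from the checked-in copy, refreshing it only on request.

    `fetch` is a hypothetical callable that returns the latest spec as text;
    the default code path never touches the network.
    """
    local_path = Path(local_path)
    if download_latest:
        if fetch is None:
            raise ValueError("download_latest=True requires a fetch callable")
        # Overwrite the checked-in copy; the change then shows up in git diff.
        local_path.write_text(fetch(), encoding="utf-8")
    return json.loads(local_path.read_text(encoding="utf-8"))
```

Because the refreshed file is part of the working tree, a spec update appears as an ordinary diff in the client repository.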

jayaddison commented 11 months ago

Thanks for the context @Xtansia. I agree that git submodules can be cumbersome sometimes. I'll leave my pull request in draft status, neither opening it for review nor closing it, until a versioning strategy is agreed upon.

I've found the ApiGenerator code for the .NET client, and it seems there are eight or so client languages that depend on the spec (Go, Java, JavaScript, .NET, PHP, Python, Ruby, Rust, ... ?).

For the possible approaches (1) and (3) from opensearch-project/opensearch-api-specification#84, I get the sense that it might be possible to provide supporting automation to distribute information to the client repositories when tagged specifications are created.

Whether that could be the creation of GitHub issues as notifications to developers, or something more advanced like pull requests against each client, seems to depend on how consistent the approach is across the repositories.

(two disclaimers: I'm not hugely experienced with advanced capabilities of GitHub automation, and I'm all-too-familiar with projects that attempt to over-specify developer experience to the extent that it causes friction for otherwise-unaffected developers -- so it's worth being cautious about grand ideas. Asking each client to update a file could be fine -- and even if that or submodules became automated, it'd be no guarantee of the absence of bugs in autogenerated code anyway)

saimedhi commented 5 months ago

Closing this issue as discussed here.