opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g. k-means or linear regression, to help developers build ML-related features within OpenSearch.
Apache License 2.0

[FEATURE] Document MLClient API using spec files #2297

Open dbwiddis opened 3 months ago

dbwiddis commented 3 months ago

Is your feature request related to a problem?

As a downstream plugin, Flow Framework consumes the ML Commons API via its Client, so we are tightly coupled to that API. During the recent 2.13.0 release cycle, a new feature was added to one of the ML Commons APIs. Because we were not closely monitoring the feature's development, we did not notice that Flow Framework lacked support for it until manual testing of a workflow that used the new feature.

There is little visibility into new features without tracking Issues and PRs. Particularly around releases, the documentation is not always written until after the "code freeze" (Release Candidate generation) date, greatly shortening the time window to detect and react to the updates. In addition, this requires human intervention to search for and identify these changes.

We would like to develop an automated way to discover, via regular testing, when the API specification differs from our implementation, as early as possible.

What solution would you like?

In general: some file published in some common location that we can machine-read to identify required and optional parameters for the APIs we implement.

Specifically: OpenSearch recently transitioned to using the OpenAPI specification for its API. See, for example, the /cat API: https://github.com/opensearch-project/opensearch-api-specification/blob/main/spec/namespaces/cat.yaml

ML Commons could publish a similar API specification for its Client, using OpenAPI. It's easy to do, with web-based authoring tools. I did so for the Extensions SDK Hello World extension at the Swagger website without knowing anything about the spec: https://github.com/opensearch-project/opensearch-sdk-java/blob/main/src/main/java/org/opensearch/sdk/sample/helloworld/spec/openapi.yaml

Then over at Flow Framework we could just read in this spec file, iterate over the APIs we care about and parse the parameters, and make sure we have them all covered.
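A minimal sketch of what that check could look like, written in Python for brevity (Flow Framework itself is Java). It assumes the spec has already been parsed into a dictionary, for example with a YAML loader, and that it follows the standard OpenAPI layout of `paths` → HTTP method → `parameters`. The operation ID `example.get`, the path, and the parameter names are hypothetical and not taken from any ML Commons spec. The sketch only looks at `parameters` entries; `$ref` entries and request body schemas are not resolved and would need extra handling.

```python
from typing import Any

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}

# Hypothetical spec fragment, standing in for a parsed OpenAPI file.
EXAMPLE_SPEC: dict[str, Any] = {
    "paths": {
        "/example/{id}": {
            "parameters": [{"name": "id", "in": "path", "required": True}],
            "get": {
                "operationId": "example.get",
                "parameters": [{"name": "verbose", "in": "query"}],
            },
        }
    }
}


def spec_parameters(spec: dict[str, Any]) -> dict[str, dict[str, bool]]:
    """Map each operationId to {parameter name: is_required}."""
    result: dict[str, dict[str, bool]] = {}
    for path, path_item in spec.get("paths", {}).items():
        shared = path_item.get("parameters", [])  # path-level parameters
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue
            op_id = operation.get("operationId", f"{method.upper()} {path}")
            params: dict[str, bool] = {}
            for param in [*shared, *operation.get("parameters", [])]:
                if "name" in param:  # skip $ref entries, unresolved here
                    params[param["name"]] = bool(param.get("required", False))
            result[op_id] = params
    return result


def find_gaps(
    spec: dict[str, Any], implemented: dict[str, set[str]]
) -> dict[str, dict[str, list[str]]]:
    """Report spec parameters that the downstream implementation lacks.

    `implemented` maps the operation IDs we care about to the parameter
    names we currently support. Operations without gaps are omitted.
    """
    declared = spec_parameters(spec)
    gaps: dict[str, dict[str, list[str]]] = {}
    for op_id, supported in implemented.items():
        if op_id not in declared:
            raise ValueError(f"operation not found in spec: {op_id}")
        missing = {n: r for n, r in declared[op_id].items() if n not in supported}
        if missing:
            gaps[op_id] = {
                "required": sorted(n for n, r in missing.items() if r),
                "optional": sorted(n for n, r in missing.items() if not r),
            }
    return gaps
```

Run as part of regular testing, a non-empty result from `find_gaps` would flag a newly added parameter as soon as the spec file changes, rather than at manual testing time.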

What alternatives have you considered?

Having fallible humans regularly monitor PRs, the change log, and the documentation website, hoping to catch all the changes.

Do you have any additional context?

Other clients (in other languages) may want to consume this spec to auto-generate client classes. See https://github.com/opensearch-project/opensearch-api-specification/issues/189

navneet1v commented 3 months ago

+1 on this.

owaiskazi19 commented 2 months ago

This is how core detects changes that break downstream plugins: https://github.com/opensearch-project/OpenSearch/pull/12974. We should have something similar for ml-commons as a GitHub workflow.