[PROPOSAL] Restructuring The Repo

nhtruong commented 9 months ago

One of the biggest hurdles contributors have to deal with when contributing to this repo is its complex structure and rules. The ideas above aim to address some of such issues.

1. Naming of operations in the same group.

OpenSearch has many operations (i.e. url and http-verb parings) that serve identical purposes. They are grouped into operation groups. To distinguish one operation from another in the same group, we use their HTTP verb and differences in path params:

For example, the 4 operations of the search group are currently named:

@http(method: "GET", uri: "/_search")
Search_Get

@http(method: "POST", uri: "/_search")
Search_Post

@http(method: "GET", uri: "/{index}/_search")
Search_Get_WithIndex

@http(method: "POST", uri: "/{index}/_search")
Search_Post_WithIndex

While the cat.health group, which only has 1 operation:

@http(method: "GET", uri: "/_cat/health")
CatHealth

But it can get very complicated with groups that have many operations or many path parameters.

@http(method: "GET", uri: "/_nodes/hot_threads")
NodesHotThreads

@http(method: "GET", uri: "/_cluster/nodes/hot_threads")
NodesHotThreads_DeprecatedDash 

@http(method: "GET", uri: "/_cluster/nodes/hotthreads")
NodesHotThreads_DeprecatedCluster

@http(method: "GET", uri: "/_nodes/hotthreads")
NodesHotThreads_Deprecated

@http(method: "GET", uri: "/_nodes/{node_id}/hot_threads")
NodesHotThreads_WithNodeId

@http(method: "GET", uri: "/_nodes/{node_id}/hotthreads")
NodesHotThreads_WithNodeId_Deprecated

@http(method: "GET", uri: "/_cluster/nodes/{node_id}/hot_threads")
NodesHotThreads_WithNodeId_DeprecatedDash

@http(method: "GET", uri: "/_cluster/nodes/{node_id}/hotthreads")
NodesHotThreads_WithNodeId_DeprecatedCluster

Another downside of this is it's not clear if not clear what kind of component the model represents. CatHealth could be the name of the CatHealth operation, or it could be the out of the operation.

Proposed Naming Format:

OP_<namespace>_<group>_<optional_number>

The search operations would be come

@http(method: "GET", uri: "/_search")
OP_Search_1

@http(method: "POST", uri: "/_search")
OP_Search_2

@http(method: "GET", uri: "/{index}/_search")
OP_Search_3

@http(method: "POST", uri: "/{index}/_search")
OP_Search_4

While the cat.health operation will be come OP_Cat_Health This new naming format is a lot simpler and more clear on what type of structure is being defined. Note that the @http trait right above each operation definition already states what http-verb and path params accompanying the operation.

2. Naming of Operation Components

This is just a small change to make the namespace and operation name pop out more: OP_Cat_Health_QueryParam --> OP_Cat_Health_QUERYPARAM OP_Search_1_Input --> OP_Search_1_INPUT We're simply putting all words used to describe component types in ALL CAPS.

3. Splitting structures.smithy into multiple files

Right now, each operation group has a file called structures.smithy that holds all components of every operation in the group: shared query string params, qperation inputs and outputs, response bodies, and request bodies. Some of these files are quite large without the request and response bodies.

Proposed files:

inputs.smithy
outputs.smithy
query_params.smithy
request_body.smithy

4. No shared default values

There are many path and query params that are repeated in different operations like index_name and node_id. While they have identical schemas, they often differ slightly in the descriptions and especially the default values. When defining the shared smithy models for these params, we should avoid adding the @default traits to avoid accidentally inheriting these values from the shared models.

5. Organize shared data models by namespace/function rather than by types

Data models shared across repo are being placed in files like

common_booleans.smithy
common_enum_lists.smithy
...

While it works fine right now, once we add very complex data structures from response and request bodies, we will encounter reference hell where bit and pieces of a large data structures will appear in a dozen files.

Proposed solution

I don't have an exact proposal on how we organize these models right now, but the general ideal is putting related models as close together as possible as we divide them into namespace and functionality. It will become clearer when I backfill the repo with request and response bodies.

Note that all of these changes can be done programmatically while I backfill the missing request and response bodies.

dblock commented 9 months ago

While the cat.health operation will be come OP_Cat_Health

I like the intent, but OP feels strange. When are these names used, or are they purely internal? If they are internal, what's the benefit of adding OP and not calling these the same as the REST path, including variable parts, e.g. Get_Cat_Health and Post_Index_Search (everything starts with a verb, everything matches REST path)?

We're simply putting all words used to describe component types in ALL CAPS.

CAPS are usually CONSTANTS in other languages. What do these things represent in OpenSearch code? They should match those names IMO.

Splitting structures.smithy into multiple files

Seems reasonable.

No shared default values

Makes sense. Again I would say these need to match what's in OpenSearch core more than have their own variation.

Organize shared data models by namespace/function rather than by types

Same as above. Anything shared has to also be shared in OpenSearch core, then it makes sense to share these models here.

nhtruong commented 7 months ago

This is no longer needed as we've migrated over to native OpenAPI

opensearch-project / opensearch-api-specification