opensearch-project / opensearch-benchmark

OpenSearch Benchmark - a community driven, open source project to run performance tests for OpenSearch
https://opensearch.org/docs/latest/benchmark/
Apache License 2.0

[RFC] Renaming Components and Adding Branches to Workloads Repository #324

Open IanHoang opened 1 year ago

IanHoang commented 1 year ago

Synopsis

This RFC proposes improving the nomenclature of several components within OpenSearch Benchmark (OSB) so that they conform to standard terminology, for better readability and ease of maintenance. Although renaming components might seem like a small change, the proposed replacements will affect users on legacy versions of OSB (across at least eight minor versions). This RFC explains our motivation, proposes suitable replacements, and recommends ways to mitigate the inconvenience these replacements cause.

Meta Tasks: https://github.com/opensearch-project/opensearch-benchmark/issues/325


Motivation

Leading up to the release of OSB 1.0.0, members of the community have spent time identifying and resolving various issues across OSB’s code base. When members dove into the code base to understand different components, many found that a handful of components were too verbose, inconsistently formatted, lacked clarity, or were no longer appropriate in their context.

In addition, community members have received recurring questions and noticed confusion about these components: what they mean, how they work, and how they interact. Since OSB is still at an early stage of development, we propose that the community finalize suitable replacements and rename these components as soon as possible. It is better to rename them now than later, when additional features and workloads will have been incorporated into OSB. We can also take this time to determine whether other major components should be renamed.


Recommendations

We have identified a list of OSB components that are in question. Feel free to add other components, along with your rationale for why they should be renamed and what they should be renamed to.

Naming Conventions: names exposed to customers should be hyphenated (-) rather than use underscores (_). Literals within the codebase are restricted to underscores by the programming language syntax.
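As a minimal sketch of this convention, the helpers below derive the hyphenated, customer-facing spelling from an internal identifier and back. The function names and the sample name `test_procedure` are hypothetical and are not part of OSB.

```python
def to_customer_facing(internal_name: str) -> str:
    """Convert an internal snake_case identifier to its hyphenated, customer-facing form."""
    return internal_name.replace("_", "-")


def to_internal(customer_facing_name: str) -> str:
    """Convert a hyphenated customer-facing name back to an underscore-separated identifier."""
    return customer_facing_name.replace("-", "_")


if __name__ == "__main__":
    # "test_procedure" is only an illustrative name.
    print(to_customer_facing("test_procedure"))
    print(to_internal("test-procedure"))
```

Keeping the conversion in one place would mean the two spellings cannot drift apart, though whether OSB should centralize it this way is an open question.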


Avoiding Breaking Changes with Legacy Versions and New Versions of OSB

Some component names in the OSB repository also live in the workloads repository, so altering any of them will introduce breaking changes there. Once we have finalized the names and replaced them in the OSB repository, we will need a way for OSB users to keep using legacy versions (0.0.1 to 1.0.0, or any version released before the changes are implemented) without encountering issues. Of the options below, we are leaning towards option 1.

1. Adding More Branches

Add new branches containing the updated names while preserving the original branches. By adding more branches, we keep the legacy formats and updated formats in the same repository, which makes them much easier to manage and makes it easier to eventually deprecate the legacy branches. The branches are currently named 1, 2, 3, 6, and 7 and refer to the first three major versions of OpenSearch and versions 6 and 7 of Elasticsearch. To distinguish the new branches from the legacy branches, the new branches will follow a new naming convention (e.g. OS-1 and OS-2 for OpenSearch versions 1.X and 2.X). This naming convention will also clear up any confusion users had about what 1, 2, 3, 6, and 7 represent in the workloads repository.

The only drawback we can see with this idea is that there will be an excess number of branches. Even so, managing excess branches for a short period of time is more appealing than maintaining an extra repository, adding a layer of directories, or forcing all OSB users to upgrade to the latest version.

2. Creating a Separate Repository (legacy-workloads)

Create a new repository with the new changes. However, this would also be a nuisance, as we would have an additional repository to maintain for a short period of time before eventually deprecating the repository with the legacy formats.

3. Creating distinct directories in workloads

Add an additional directory layer between the workload name and its contents. When users have OSB installed on their machine, the path to the contents of each workload is already long, and an additional layer would make it even more cumbersome.

4. Deprecating all versions prior to the newly released version with these changes

This is another option but the least appealing, as it forces all users of OSB to upgrade.
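To illustrate how option 1 could look from the OSB side, here is a rough sketch of branch selection. It assumes the renamed components first ship in OSB 2.0.0 and that only the OpenSearch branches get the `OS-` prefix, since the RFC gives no new name for the Elasticsearch branches. All function and constant names are hypothetical.

```python
from typing import Optional, Tuple

# Legacy OpenSearch branch names listed in the RFC.
LEGACY_OPENSEARCH_BRANCHES = {"1", "2", "3"}

# Assumption: the renamed components first ship in OSB 2.0.0.
FIRST_RENAMED_OSB_VERSION = (2, 0, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


def new_branch_name(legacy_branch: str) -> Optional[str]:
    """Map a legacy OpenSearch branch to its new name, e.g. '1' -> 'OS-1'.

    Elasticsearch branches (6, 7) return None because the RFC does not
    state what their new names would be.
    """
    if legacy_branch in LEGACY_OPENSEARCH_BRANCHES:
        return f"OS-{legacy_branch}"
    return None


def select_branch(osb_version: str, legacy_branch: str) -> str:
    """Pick the workloads branch a given OSB version should check out."""
    if parse_version(osb_version) < FIRST_RENAMED_OSB_VERSION:
        # Legacy OSB versions keep using the original branch.
        return legacy_branch
    renamed = new_branch_name(legacy_branch)
    # Fall back to the original name when no new name is defined.
    return renamed if renamed is not None else legacy_branch
```

With this shape, `select_branch("1.0.0", "2")` keeps returning the legacy branch `2`, while `select_branch("2.0.0", "2")` returns `OS-2`, so both generations of OSB can share one repository.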


The changes proposed above are intended to be incorporated prior to the next major release (2.X). It's important that these changes occur before other major features are built on top of these pre-existing components. Although they may be a brief inconvenience for some, the proposed changes will benefit the long-term vision for OpenSearch Benchmark and the OpenSearch community.

We are looking forward to your feedback and support for this proposal.

How Can You Help?

IanHoang commented 7 months ago

Meta issue https://github.com/opensearch-project/opensearch-benchmark/issues/325

IanHoang commented 3 months ago

Maintainers have had a discussion regarding renaming components. We have created a document discussing the plan and timeline for 2.0.0. Community members are welcome to read and comment on this RFC with any ideas or additional proposals. We will finalize the list of items to rename and update this RFC and the META issue tracker next Wednesday.

andrross commented 3 months ago

I am personally very supportive of simplifying and standardizing terms in the OSB interface. Assuming you implement this and release a 2.0.0 major version, how will you ensure that conventions and standards are followed going forward for new features? (I don't have any great answers for you so please do share any mechanisms you come up with!)

IanHoang commented 3 months ago

@andrross To ensure that we adhere to new conventions and standards, here are a few ideas:

These are just some ideas but if you or anyone else have any ideas or comments, we're open to hearing them!

andrross commented 3 months ago

@IanHoang I'm talking more about ongoing development. Let's say a contributor comes along and adds a new feature that uses the phrase "execution" as a part of a CLI option for that feature. If you specifically review that change then you will almost certainly work to use the phrase "run" instead. How can you ensure that all such changes get the right level of scrutiny to make sure the terminology in use remains consistent?

IanHoang commented 3 months ago

@andrross To keep the renamed terminology consistent in ongoing development, maintainers and reviewers will need to review naming thoroughly so that contributors don't revert to the legacy terminology. Luckily, OSB doesn't have an extensive list of unique terms like test procedures or test executions, so it shouldn't be too daunting. Upholding code review best practices (keeping PRs to a healthy size, inspecting naming, checking functionality, etc.) would also improve our ability to catch cases where contributors use legacy terms.

One idea to make this effort a little less manual is to leverage a custom GitHub Action. This GitHub Action would search for legacy terminology in PRs and comment on it (similar to how the style-job GitHub Action is used on PRs in the documentation repository). However, this would only detect uses of legacy terminology; it would not detect situations where contributors use a renamed term in a completely new way. The only way I can imagine catching those situations is through a thorough review.

Although it's not a perfect solution, a combination of the two (an automated GitHub Action plus maintainers and contributors following PR best practices) should help ensure that the renamed terminology remains consistent.
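As a rough sketch of what the automated check could run, the script below scans the added lines of a unified diff for legacy terms. The term mapping is illustrative: only "execution" to "run" comes from this thread, and the real list would come from the finalized renames. The function name is hypothetical, and wiring it into a GitHub Action is left out.

```python
import re
from typing import Dict, List, Tuple

# Assumption: illustrative mapping of legacy term -> preferred term.
LEGACY_TERMS: Dict[str, str] = {"execution": "run"}


def find_legacy_terms(
    diff_text: str, legacy_terms: Dict[str, str] = LEGACY_TERMS
) -> List[Tuple[int, str, str]]:
    """Return (diff line number, legacy term, suggestion) for added lines using a legacy term."""
    findings = []
    for line_number, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect added lines; skip the '+++ b/<file>' header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for legacy, replacement in legacy_terms.items():
            # Match the term when it is not embedded in a longer run of letters,
            # so '--execution-id' and 'test_execution' both match but 'executions' does not.
            pattern = rf"(?<![A-Za-z]){re.escape(legacy)}(?![A-Za-z])"
            if re.search(pattern, line, flags=re.IGNORECASE):
                findings.append((line_number, legacy, replacement))
    return findings
```

As noted above, a check like this only flags legacy terms; it cannot tell when a renamed term is being used in a new, inconsistent way.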