Open chelma opened 1 year ago
Does this issue belong in https://github.com/opensearch-project/opensearch-devops? Or somewhere else? This repo is really for producing the distribution of OpenSearch.
Maybe. Probably? Looking for some guidance here as this is my first time posting a proposal to the project. Where do you think it should live and get visibility?
@dblock @chelma I think we should move this to the main repo for feedback/discussion.
@chelma I suggest bringing some of this discussion in some presentation form to the community meeting, too!
@dblock Great idea, will do!
I like the use of Docker, but it will slightly narrow the scope of the kinds of tests you'd like to run. I'm thinking of performance tests: confirming that there are no regressions in performance for a given workload across a similarly situated cluster. Another would be testing on Windows clusters, once OpenSearch is available for more than one release.
Those can be future concerns, but it would be nice to not over constrain the design now.
Great proposal thanks for putting this out there!
I don't think this tool should live inside of OpenSearch - I think it should be part of its own repository. The way upgrade testing is conducted shouldn't be tightly coupled with the version of OpenSearch.
Include a mock of the input(s) to the tool; it would help clarify the range of supported scenarios.
We've added a GitHub Action [1] that orchestrates the BWC framework for ad-hoc version to version tests, with the following input. If we could cleanly use this tool to replace the under-the-covers components of this GitHub Action our team would gladly adopt.
```yaml
jobs:
  last-supported-major-to-current:
    ...
    - uses: ./.github/actions/run-bwc-suite
      with:
        plugin-previous-branch: "1.3"
        plugin-next-branch: "2.x"
        report-artifact-name: BWC-Last-Supported-Major
  current-to-next-unreleased-major:
    ...
    - uses: ./.github/actions/run-bwc-suite
      with:
        plugin-previous-branch: "2.x"
        plugin-next-branch: "main"
        report-artifact-name: BWC-Next-Major
```
[1] https://github.com/opensearch-project/security/pull/2253
Cluster Admin use cases are framed around a knowledgeable admin. Depending on how we want this tool to be used, it might be worthwhile to invest in lowering the barrier to extracting value from it. Automatically discovering the current cluster configuration is a great way to add instant value. Helping the cluster admin know what is tested, or what could be tested, might also be of value.
This tool is framed around relatively local testing (containers); supporting remote/cloud-managed clusters as sources or destinations would be useful for many more migration scenarios.
Might want to describe the capabilities of the tests that are executed; it seems like the tool will need some level of test harness/tracking.
Check out https://github.com/opensearch-project/opensearch-benchmark. While it was built for performance benchmarking on a single cluster (with cluster standup/interaction), maybe it can be extended or aspects of it reused.
Orthogonally to the functionality proposed - I'd recommend the tool be written in a strongly typed language.
Per comments and discussion, changed name to "Upgrade Testing Framework".
@dblock @peternied I wrote up a new doc [1] exploring the user experience for the framework that I think addresses most of your comments/suggestions. Would love to get your eyes on it if you have a few spare cycles.
[1] https://github.com/opensearch-project/opensearch-migrations/issues/29
Summary Of Work Being Proposed
It is proposed that the OpenSearch Project create a framework that makes it easy to test the results of performing a cluster version upgrade on Elasticsearch/OpenSearch clusters. This framework will accelerate development of improvements to the user story for upgrades. Additionally, it will enable cluster administrators to create simulacra of their real-world clusters and attempt an upgrade in a safe environment to determine the impact on data, metadata, plugins, etc. Finally, it provides a place for the wider community to centralize its knowledge of how to perform an upgrade, edge cases associated with different versions of the software/plugins, and documentation of incompatibilities and their resolution, instead of spreading that knowledge across blog posts, private wikis, and tribal knowledge.
Terminology
Assumptions
Why Is The Work Needed?
The existing backwards compatibility (BWC) tests in the OpenSearch Project repos currently capture component-level, happy-path expectations. However:
What Use-Cases Will The Work Resolve?
Project Tenets
Proposed Design
It is proposed to make a command-line tool that can be executed to test upgrades between arbitrary cluster configurations. The tool will be composed of multiple abstraction layers to separate responsibilities and enhance extensibility. Docker will be used to set up the test cluster on the user's machine.
Python-Based Orchestration Layer
A Python orchestration layer will serve as the user's portal into the framework via a command-line interface. It will accept an incoming test request, set up the required cluster, execute the requested upgrade, initiate the analysis/testing steps at the appropriate times, provide terminal output indicating progress, and provide a reference to the detailed final results. Python is a flexible, practical language for interacting with operating systems and is widely used in industry. Additionally, new features and bug fixes can be tested via direct modification, without recompiling the code or using a specific integrated development environment, decreasing the level of effort to contribute to the framework.
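To make the orchestration layer's surface concrete, here is a minimal sketch of what the command-line interface might look like. The flag names, choices, and defaults below are illustrative assumptions, not a committed design:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI surface for the orchestration layer."""
    parser = argparse.ArgumentParser(prog="upgrade-testing-framework")
    parser.add_argument("--source-version", required=True,
                        help="Version of the cluster to start from, e.g. an ES 7.x release")
    parser.add_argument("--target-version", required=True,
                        help="Version to upgrade to, e.g. an OS 2.x release")
    parser.add_argument("--upgrade-style", choices=["rolling", "snapshot-restore"],
                        default="snapshot-restore",
                        help="How the upgrade is performed on the test cluster")
    parser.add_argument("--docker-image", default=None,
                        help="Optional user-supplied image for the source cluster nodes")
    return parser
```

A user would then invoke something like `upgrade-testing-framework --source-version ES_7_10_2 --target-version OS_2_5_0`, and the orchestration layer would drive the rest of the run from the parsed arguments.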
Docker-Based Configuration Management
Docker will be used to set up and configure the test cluster. This ensures the portability, isolation, and repeatability of the framework. Docker is widely used in industry for this purpose, lessening the knowledge burden required to use the framework. Users will be able to test clusters backed by arbitrary distros on any device (including laptops) rather than needing dedicated hosts. Node/cluster setup will be repeatable, and test teardown will be automatic. Users can easily bring their own setup by swapping out which Docker image(s) the framework uses for the simulated migration. Users can build an image on-device from a supplied Dockerfile, or improve the setup time by pulling pre-built images from a user-selected repo.
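As a sketch of how node configuration might map onto container settings, the function below assembles (but does not execute) a `docker run` invocation for a single test node. A real implementation might use the Docker SDK for Python instead; the image, port, and environment-variable choices here are illustrative assumptions:

```python
def docker_run_command(image: str, node_name: str, cluster_name: str,
                       http_port: int = 9200) -> list:
    """Build the argv for launching one test-cluster node as a container.

    The command is returned rather than executed, so the mapping from
    node settings to container flags can be inspected and tested.
    """
    return [
        "docker", "run", "--detach",
        "--name", node_name,
        # Expose the node's HTTP API on a host port for the analysis layer.
        "--publish", f"{http_port}:9200",
        # Cluster membership is driven through environment variables.
        "--env", f"cluster.name={cluster_name}",
        "--env", f"node.name={node_name}",
        image,
    ]
```

Swapping the `image` argument is all a user would need to do to bring their own Dockerfile-built or pre-pulled image into a run.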
Python-Based Analysis Layer
A Python analysis layer will interrogate the test cluster at each point of its upgrade to determine whether it is proceeding according to expectations, producing partial results that are later combined into a final report. For example, one expectation might be that the same number of documents exists in the test cluster before and after the upgrade. Another might be that the representation format of a given field changes over the course of the upgrade. Yet another might be that the upgrade fails due to an incompatibility between two plugin versions.
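The first example above (document counts preserved across the upgrade) could be expressed as a check like the following. This is a minimal sketch, assuming the analysis layer has already collected per-index counts (e.g. from the `_count` API) before and after the upgrade; the partial-result dictionary shape is a hypothetical convention:

```python
def check_doc_count_preserved(before: dict, after: dict) -> dict:
    """Compare per-index document counts captured before and after an upgrade.

    `before` and `after` map index name -> document count. Returns a
    partial result suitable for later aggregation into a final report.
    """
    mismatches = {
        index: (before[index], after.get(index))
        for index in before
        if after.get(index) != before[index]
    }
    return {
        "expectation": "document counts preserved across upgrade",
        "passed": not mismatches,
        "details": mismatches,
    }
```

Other expectations (field-format changes, plugin incompatibilities) would follow the same pattern: a small function that inspects cluster state at a checkpoint and emits a partial result.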
Report-Generation Layer
The report-generation layer will assemble the partial results created by the orchestration layer’s periodic invocation of the analysis layer into a final report for the user.
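Folding the partial results into a final report could be as simple as the sketch below, assuming each partial result carries a `passed` flag as in the hypothetical convention above:

```python
def generate_report(partial_results: list) -> dict:
    """Aggregate partial results from the analysis layer into a final report."""
    failures = [result for result in partial_results if not result.get("passed")]
    return {
        "total_checks": len(partial_results),
        "failed_checks": len(failures),
        # The upgrade is considered successful only if every check passed.
        "upgrade_ok": not failures,
        "failures": failures,
    }
```

A real report would likely include richer detail (timings, cluster logs, per-checkpoint snapshots), but the aggregation step itself stays this simple.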
Open Questions
Alternatives to Docker
Docker is the industry-standard tool for containerization. However, it is not a free-and-open-source tool. Per Docker, there are carveouts for individual developers, small companies, and open source development, but otherwise a license is required (see here). It's questionable whether this framework would qualify for their open source carveout (see here). In the event that the framework does not qualify, a license would likely be required for at least some users to leverage/contribute to the framework. License fees are small ($9/user/month, see here) and would only be needed by the specific Cluster Admins and OpenSearch Developers using the tool.
Therefore, Docker seems like a reasonable choice to build the framework around. However, if Docker is deemed not viable, for whatever reason, then a possible alternative is Podman.
About Podman
Disclaimer: the author has minimal experience w/ Podman outside of reading docs/blog posts.
Podman is a Linux-native, free, open source containerization program (see here) that is compatible with the same Open Containers Initiative mechanisms and formats that Docker relies on. This means that it supposedly behaves quite similarly to Docker and can use most Docker images in public repos without issue.
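Because Podman's CLI is largely command-compatible with Docker's, the framework could keep the runtime pluggable with a small selection policy like the sketch below. The preference order and error message are assumptions; `which` is injected (e.g. `shutil.which`) so the policy can be tested without either runtime installed:

```python
def pick_container_runtime(which) -> str:
    """Return the first available container runtime binary name.

    `which` is a lookup function like shutil.which: it returns a path
    string when the binary exists on PATH, or None otherwise.
    """
    for candidate in ("docker", "podman"):
        if which(candidate):
            return candidate
    raise RuntimeError("Neither docker nor podman was found on PATH")
```

The rest of the framework would then build commands against the returned binary name, keeping the Docker-vs-Podman decision in one place.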
The biggest differences between Docker and Podman for our use-cases appear to be: