opensearch-project / opensearch-benchmark-workloads

Official workloads used by OpenSearch Benchmark (OSB)
https://opensearch.org/docs/latest/benchmark/

OpenSearch Benchmark Workloads

This repository contains the default workload specifications for OpenSearch Benchmark, the OpenSearch benchmarking tool.

You should not need to use this repository directly unless you want to look under the hood or create your own workloads.

How to Contribute

If you want to contribute a workload, please ensure that it works against the main version of OpenSearch (that is, submit PRs against the main branch). We can then check whether it's feasible to backport the workload to earlier OpenSearch/Elasticsearch versions.

After making changes to a workload, developers should run a quick test of that workload in test mode to check for breaking changes.

See all details in the contributor guidelines.

The following steps apply when contributing a workload.

Create a README.md

For an example workload README file, see the http_logs workload.

Verify the workload’s structure

The workload must include the following files:

- workload.json
- index.json
- files.txt
- test_procedures/default.json
- operations/default.json

The names of both default.json files can be customized to something more descriptive. The workload can also include an optional workload.py file to add more dynamic functionality. For more information about each file's contents, see Anatomy of a workload.
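
Before opening a PR, a small script can catch a missing file early. The following is a minimal sketch, assuming the layout listed in this section (workload.json, index.json, and files.txt at the top level, plus one JSON file each in test_procedures/ and operations/). Because the default.json names may be customized, it accepts any .json file in those two directories. The function check_workload_structure is hypothetical and not part of OpenSearch Benchmark.

```python
from pathlib import Path

# Top-level files taken from the structure list in this README (an assumption
# about the layout, not an official OSB validation rule).
REQUIRED_FILES = ["workload.json", "index.json", "files.txt"]
# Directories holding the two default.json files; their names may be customized.
REQUIRED_JSON_DIRS = ["test_procedures", "operations"]


def check_workload_structure(workload_dir):
    """Return a list of layout problems; an empty list means the required files are present."""
    root = Path(workload_dir)
    problems = []
    for name in REQUIRED_FILES:
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")
    for dirname in REQUIRED_JSON_DIRS:
        subdir = root / dirname
        if not subdir.is_dir():
            problems.append(f"missing directory: {dirname}/")
        elif not any(subdir.glob("*.json")):
            # default.json may be renamed, so accept any JSON file here.
            problems.append(f"no .json file in {dirname}/")
    # workload.py is optional, so its absence is not reported.
    return problems
```

For example, check_workload_structure("path/to/my_workload") returns an empty list when the layout is complete. The check only looks at file presence; it does not validate the JSON contents.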

Testing the workload
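
Running the workload in OpenSearch Benchmark's test mode, which uses only a small subset of the data, is a quick way to catch breaking changes before opening a PR. The sketch below builds and runs such a command from Python. It assumes a cluster is already running at localhost:9200 and that the subcommand is execute-test (the name used by OSB 1.x; other releases may differ, so check opensearch-benchmark --help). The helpers build_test_mode_command and run_test_mode are hypothetical.

```python
import shlex
import subprocess


def build_test_mode_command(workload_path, target_hosts="localhost:9200", subcommand="execute-test"):
    """Build the argv for a test-mode run of a local workload against an existing cluster."""
    return [
        "opensearch-benchmark",
        subcommand,  # assumed OSB 1.x name; verify with `opensearch-benchmark --help`
        f"--workload-path={workload_path}",  # run a workload from a local directory
        "--pipeline=benchmark-only",  # do not provision a cluster; use the one at target_hosts
        f"--target-hosts={target_hosts}",
        "--test-mode",  # only a small subset of the data, for a quick smoke test
    ]


def run_test_mode(workload_path, **kwargs):
    """Run the test-mode command and return its exit code (non-zero means the run failed)."""
    cmd = build_test_mode_command(workload_path, **kwargs)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

Keeping command construction separate from execution makes it easy to print or adjust the flags (for example, client options for a secured cluster) before running anything.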

Create a PR

After testing the workload, create a pull request (PR) from your fork to the opensearch-project workloads repository. Add a sample output and summary result to the PR description. The OpenSearch Benchmark maintainers will review the PR.

Once the PR is approved, you must share the data corpora of your dataset so that the OpenSearch Benchmark team can add the dataset to a shared S3 bucket. If your data corpora are stored in an S3 bucket, you can use AWS DataSync to share them. Otherwise, you must tell the maintainers where the data corpora reside.

For more details, see this guide.

Backporting changes

For each pull request, the maintainers of this repository determine whether the change can be backported. Backporting a change involves cherry-picking a commit onto the branches that correspond to earlier versions of OpenSearch/Elasticsearch. This keeps workloads working on the latest main version of OpenSearch as well as on older versions.

Changes should be cherry-picked (git cherry-pick) from main onto the branch for the most recent OpenSearch version first, and then backward from there. For example:

main → OpenSearch 2 → OpenSearch 1 → Elasticsearch 7 → ... 

If a backported change causes a merge conflict, a new pull request should be raised to merge the change into the affected branch.
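
The cherry-pick chain described above can be scripted. The following is a minimal sketch, assuming a local clone with the release branches available and the change already merged on main. The branch names in BACKPORT_ORDER and the helpers backport_commands and run_backport are hypothetical; check the repository for the branches that actually exist.

```python
import subprocess

# Hypothetical branch names, newest first; the commit is assumed to be on "main" already.
BACKPORT_ORDER = ("main", "2", "1", "7")


def backport_commands(commit_sha, order=BACKPORT_ORDER):
    """Return git commands that cherry-pick commit_sha onto each older branch in turn."""
    commands = []
    for branch in order[1:]:
        commands.append(["git", "checkout", branch])
        # -x records "(cherry picked from commit <sha>)" in the new commit message.
        commands.append(["git", "cherry-pick", "-x", commit_sha])
    return commands


def run_backport(commit_sha, order=BACKPORT_ORDER):
    """Run the chain, stopping at the first failure; return the failed command or None."""
    for cmd in backport_commands(commit_sha, order):
        if subprocess.run(cmd, check=False).returncode != 0:
            # Usually a merge conflict: resolve it (or run `git cherry-pick --abort`)
            # and raise a separate pull request for that branch.
            return cmd
    return None
```

Stopping at the first failure mirrors the rule above: a conflicting backport is not forced through but handled in its own pull request.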

License

There is no single license for this repository. Licenses are chosen per workload, and a workload is typically licensed under the same terms as its source data. See the README file of each workload for more details.