opensearch-project / opensearch-benchmark

OpenSearch Benchmark - a community driven, open source project to run performance tests for OpenSearch
https://opensearch.org/docs/latest/benchmark/
Apache License 2.0

[Create-Workload Enhancements] Rearchitect Create-Workload Feature #587

Open IanHoang opened 3 months ago

IanHoang commented 3 months ago

Overview

This is an issue based off one of the proposed priorities in this RFC: https://github.com/opensearch-project/opensearch-benchmark/issues/395

Background

As of now, OSB's create-workload is a monolith that uses two modules of functions to create a custom workload. It was originally designed as a quick and easy way to build custom workloads from small corpora. While this approach has worked in the past, there is increasing demand for building custom workloads based on complex workloads, and more users are using this feature to do so.

Users of this feature have mentioned that the create-workload code is currently difficult to extend and maintain and, for newcomers to OSB, difficult to follow and interpret.

We should rearchitect the code to be more organized and scalable, which in turn will make it easier to extend and maintain. This work will also serve as the foundation for future development, such as extracting a random sampling of the documents and repairing incomplete workloads.

Proposed Design

While the existing approach is considered modular, create-workload in its current state is unwieldy. We have gathered feedback from users who have extended the feature and have used the feature to build custom workloads based on complex production workloads that are up to 10TB. Based on the feedback received, we should rearchitect create-workload to have the following components:

Proposed priority

The current structure also makes it difficult for newcomers to understand the code. The proposed approach would promote encapsulation and abstraction, making create-workload more organized and scalable, as well as easier to extend and maintain.
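
To make the encapsulation idea concrete, here is a minimal Python sketch. It is not the component list from this issue and not OSB's actual code: every name in it (`WorkloadParts`, `Extractor`, `IndexExtractor`, `CorpusExtractor`, `WorkloadBuilder`) is hypothetical, and the "cluster" is an in-memory dict rather than a real OpenSearch client. It only illustrates how separate steps could sit behind one small interface so that new steps can be added without touching existing ones.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class WorkloadParts:
    """Hypothetical container that accumulates the pieces of a custom workload."""
    indices: list = field(default_factory=list)
    corpora: list = field(default_factory=list)


class Extractor(ABC):
    """Hypothetical interface: each step hides its details behind extract()."""

    @abstractmethod
    def extract(self, source: dict, parts: WorkloadParts) -> None:
        ...


class IndexExtractor(Extractor):
    def extract(self, source: dict, parts: WorkloadParts) -> None:
        # Record each index name together with its mappings.
        for name, info in source.items():
            parts.indices.append({"name": name, "mappings": info["mappings"]})


class CorpusExtractor(Extractor):
    def extract(self, source: dict, parts: WorkloadParts) -> None:
        # Record how many documents each index would contribute to the corpus.
        for name, info in source.items():
            parts.corpora.append({"name": name, "doc_count": len(info["docs"])})


class WorkloadBuilder:
    """Runs the configured extractors in order; knows nothing about their internals."""

    def __init__(self, extractors):
        self.extractors = list(extractors)

    def build(self, source: dict) -> WorkloadParts:
        parts = WorkloadParts()
        for extractor in self.extractors:
            extractor.extract(source, parts)
        return parts


# In-memory stand-in for a cluster (assumption for illustration only).
fake_cluster = {
    "logs": {
        "mappings": {"properties": {"msg": {"type": "text"}}},
        "docs": [{"msg": "a"}, {"msg": "b"}],
    }
}

parts = WorkloadBuilder([IndexExtractor(), CorpusExtractor()]).build(fake_cluster)
print(parts.indices[0]["name"], parts.corpora[0]["doc_count"])
```

With a structure along these lines, a future step such as document sampling could be added as another `Extractor` subclass and passed to the builder, leaving the existing classes unchanged.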

IanHoang commented 2 months ago

Received feedback to add support for pbzip2 compression now that OSB supports it. Will create a separate PR for it.

gkamat commented 2 weeks ago

@IanHoang, it may be helpful to add some child tasks to this issue, since there are multiple items here.