rishabh6788 opened this issue 1 month ago
tagging @dblock @reta @andrross @peternied @getsaurabh02 for feedback and comments.
Thanks a lot @rishabh6788, the workflow + manual approval look great to me to start with.
@rishabh6788 Nice work - looking forward to seeing it in action!
@rishabh6788 What happens to the issue created in the "Create an issue in the repo" step? Is it automatically closed somewhere?
The issue's lifecycle is managed by the workflow: once it is opened and linked to the PR, a maintainer has to add the appropriate comment, and then it auto-closes. See this sample: https://github.com/opensearch-project/opensearch-benchmark/issues/567
@rishabh6788 This looks great. Few questions:
- Assuming you are planning to share the format of the comment that triggers perf runs in a separate issue. It would be great if we could support performing the runs both by freshly indexing the data on each run and by restoring the data from a snapshot.
- Does the maintainer need to be aware of, or verify, any load on the backend infra before approving the request (assuming not)? If there are any failures during execution after approval, can we post those errors to this newly created issue? Or even tag an infra team member to help with it?
Thank you for the feedback @sohami. For 1, the plan is to use a separate JSON file to hold all the varied configurations. This gives the user the flexibility to choose an indexing-only benchmark, a search-only benchmark (which will use our existing snapshots), and any feature flags, such as enabling concurrent search on the cluster. The user will add a comment on the PR containing the id of the configuration they want to run. The issue exists only to pause the workflow and get approval from a maintainer when a non-maintainer submits a request; it auto-closes once a maintainer has approved/denied the request, and the workflow then continues with execution.
For 2, the maintainer need not worry about the load on the infra; we will have monitors and alerts to let the infra team know about any faults and errors. In case of a workflow failure, a comment will be added on the PR with the job link so the maintainer can take a look and inform the infra team if they see any infra-related issues. We can also add a feature to create an issue for the infra team to look at.
Is your feature request related to a problem? Please describe
Creating this issue to gather feedback and track progress on adding a feature to submit performance benchmark runs on pull requests created in OpenSearch or any plugin repository, and to post the results back on the same PR.
The fundamentals of the proposed design are outlined below.
Describe the solution you'd like
The solution that we are proposing is very similar to how `gradle-check` runs on each PR. We will leverage GitHub Actions and the Jenkins remote job execution feature to enable this automation. The choice was between a `label` trigger and an `issue_comment` trigger to start the workflow. We recommend `issue_comment`, as it makes it easy to manage many different cluster and workload configurations; that is not impossible with labels, but there would be too many labels to maintain and manage.

Who can submit performance runs?
Any developer working on the OpenSearch project can submit a benchmark run request by adding a comment on the pull request. The comment has to be in a particular format for the job to be able to process it.
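As a rough sketch of how the `issue_comment` trigger could be wired up (the `benchmark run` comment prefix here is a hypothetical placeholder, not the final format):

```yaml
# Sketch only: the "benchmark run" prefix is an assumed comment format.
name: benchmark-pr
on:
  issue_comment:
    types: [created]

jobs:
  submit-benchmark:
    # issue_comment fires on both issues and PRs; run only for PR comments
    # that start with the agreed trigger phrase.
    if: ${{ github.event.issue.pull_request && startsWith(github.event.comment.body, 'benchmark run') }}
    runs-on: ubuntu-latest
    steps:
      - name: Echo requested configuration
        run: echo "Requested - ${{ github.event.comment.body }}"
```

The `github.event.issue.pull_request` check is needed because the `issue_comment` event fires for comments on plain issues as well as pull requests.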
Who can approve/deny performance run request?
Any available maintainer can approve/deny the performance run request.
How does the approval mechanism work?
We will use the tried and trusted https://github.com/trstringer/manual-approval GitHub Action to manage approvals. Once a legitimate comment is added on the PR, the approval action creates an issue and links it to the PR in a comment; the maintainer then has to comment `approve` or `deny` on that issue. Once done, the issue auto-closes and the rest of the execution continues.

Why is approval needed?
Running a benchmark requires significant infra resources, as it spins up a production-grade cluster and a benchmarking client, and we don't want a bad actor to abuse the system by submitting too many requests and overwhelming it. The other reason is the possibility of malicious code execution, given that we will be running an OpenSearch process containing the code submitted by the developer.
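Based on the manual-approval action's documented usage, the approval gate step could look roughly like this (the approver logins are placeholders; exact inputs should be verified against the action's README):

```yaml
# Sketch of the approval gate using trstringer/manual-approval.
# Approver logins below are placeholders for the repo's maintainers.
steps:
  - name: Wait for maintainer approval
    uses: trstringer/manual-approval@v1
    with:
      secret: ${{ github.TOKEN }}
      approvers: maintainer1,maintainer2
      minimum-approvals: 1
      issue-title: "Approve benchmark run for PR #${{ github.event.issue.number }}"
```

The action blocks the workflow until one of the listed approvers comments an approval or denial keyword on the issue it creates, which matches the pause-and-approve flow described above.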
How will you manage cluster and workload settings?
The plan is to have a JSON file containing various cluster and workload configurations, each with a unique id. This id will be part of the formatted comment used to invoke the workflow. The user looks at the config file and decides which configuration they want to use for the benchmark.
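As an illustration only (the ids, field names, and values below are made up for this sketch, not a committed schema), such a configuration file could look like:

```json
{
  "id-1": {
    "description": "Indexing-only benchmark on a single-node cluster",
    "cluster": { "instance_type": "r5.xlarge", "data_nodes": 1 },
    "workload": "nyc_taxis",
    "benchmark_type": "indexing-only"
  },
  "id-2": {
    "description": "Search-only benchmark restored from snapshot",
    "cluster": { "instance_type": "r5.xlarge", "data_nodes": 3 },
    "workload": "nyc_taxis",
    "benchmark_type": "search-only",
    "feature_flags": ["concurrent-search"]
  }
}
```

A user would then reference `id-1` or `id-2` in their PR comment to pick the indexing-only or search-only setup.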
Below is the sample flow:

![pr_benchmark_flow](https://github.com/opensearch-project/OpenSearch/assets/19776689/557d315d-6cf0-48cd-99d2-bedc1c308a39)
Related component
Other
Describe alternatives you've considered
An alternative that avoids the approval step is to let a maintainer add the required comment to initiate the workflow. In that case, the pull request submitter would have to ask a maintainer to add the desired comment on their behalf. I am open to any other recommendations that make this as smooth as possible for the maintainers.
Additional context
The jenkins benchmark job that will be triggered is https://build.ci.opensearch.org/job/benchmark-test/.
The cluster will be spun up using opensearch-cluster-cdk in an AWS account.
opensearch-benchmark will be used to run the performance benchmarks.
Related Milestones: