opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/

[PROPOSAL] Run performance benchmarks on pull requests #14553

Open rishabh6788 opened 1 month ago

rishabh6788 commented 1 month ago

Is your feature request related to a problem? Please describe

Creating this issue to gather feedback and track progress on adding a feature to submit performance benchmark runs on pull requests created in OpenSearch or any plugin repository, and to post the results back on the same pull request.

The basic fundamentals of the proposed design are outlined below.

Describe the solution you'd like

The solution we are proposing is very similar to how gradle-check runs on each PR. We will leverage GitHub Actions and the Jenkins remote job execution feature to enable this automation. The choice was between a label trigger and an issue_comment trigger for this workflow. We recommend issue_comment because it makes it easy to manage many different cluster and workload configurations; that is not impossible with labels, but it would require too many labels to maintain and manage.
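As a rough illustration of the issue_comment approach (the workflow name, trigger phrase, and job names below are assumptions for this proposal, not a final implementation):

```yaml
# Minimal sketch of an issue_comment-triggered workflow; names are placeholders.
name: benchmark-pull-request
on:
  issue_comment:
    types: [created]

jobs:
  parse-request:
    # React only to comments on pull requests that start with the agreed trigger phrase.
    if: github.event.issue.pull_request && startsWith(github.event.comment.body, 'run-benchmark-test')
    runs-on: ubuntu-latest
    steps:
      - name: Echo the requested benchmark configuration
        env:
          COMMENT_BODY: ${{ github.event.comment.body }}
        run: echo "Benchmark requested via comment:" "$COMMENT_BODY"
```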

Who can submit performance runs?

Any developer working on the OpenSearch project can submit a benchmark run request by adding a comment on the pull request. The comment has to follow a particular format for the job to be able to process it.
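For illustration only, since the exact keyword and fields are not finalized in this proposal, a trigger comment could look like:

```
run-benchmark-test id_1
```

where id_1 refers to one of the predefined cluster/workload configurations described below.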

Who can approve/deny performance run request?

Any available maintainer can approve/deny the performance run request.

How does the approval mechanism work?

We will use the tried and trusted https://github.com/trstringer/manual-approval GitHub Action to manage approvals. Once a legitimate comment is added on the PR, the approval action creates an issue and links it to the PR in a comment; the maintainer then has to comment approve or deny on that issue. Once that is done the issue auto-closes and the rest of the execution continues.
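Continuing the workflow sketch above, the approval gate could use the action roughly as follows (the approver list and issue title are placeholders; refer to the action's README for its exact inputs):

```yaml
  approval:
    needs: parse-request
    runs-on: ubuntu-latest
    steps:
      - uses: trstringer/manual-approval@v1
        with:
          secret: ${{ github.TOKEN }}
          # Placeholder approvers; in practice this would be the repository's maintainers.
          approvers: maintainer1,maintainer2
          minimum-approvals: 1
          issue-title: "Approve benchmark run requested on PR #${{ github.event.issue.number }}"
```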

Why is approval needed?

Running a benchmark requires significant infrastructure resources, since it spins up a production-grade cluster and a benchmarking client, and we don't want a bad actor to abuse the system by submitting too many requests and overwhelming it. The other reason is the possibility of executing malicious code, given that we will be running an OpenSearch process built from the code submitted by the developer.

How will you manage cluster and workload settings?

The plan is to have a JSON file containing various cluster and workload configurations, each with a unique id. This id will be part of the formatted comment used to invoke the workflow. The user needs to look at the config file and decide which configuration they want to use for the benchmark run.
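A hypothetical shape for that file (all field names and values here are illustrative, not a final schema):

```json
{
  "id_1": {
    "description": "Indexing-only benchmark on a small cluster",
    "cluster": {
      "data_nodes": 1,
      "instance_type": "r5.xlarge"
    },
    "workload": "nyc_taxis",
    "test_procedure": "indexing-only"
  }
}
```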

Below is the sample flow (see the pr_benchmark_flow diagram).

Related component

Other

Describe alternatives you've considered

The alternative that avoids the approval step is to let a maintainer add the required comment to initiate the workflow. In that case the pull request submitter would have to ask a maintainer to add the desired comment on their behalf. I am open to any other recommendation that makes this as smooth as possible for the maintainers.

Additional context

The Jenkins benchmark job that will be triggered is https://build.ci.opensearch.org/job/benchmark-test/.
The cluster will be spun up using opensearch-cluster-cdk in an AWS account.
opensearch-benchmark will be used to run the performance benchmark.
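For context, a typical opensearch-benchmark invocation against an already-running cluster looks roughly like the following (the workload name and endpoint are placeholders):

```sh
opensearch-benchmark execute-test \
  --workload=nyc_taxis \
  --pipeline=benchmark-only \
  --target-hosts=<cluster-endpoint>:443 \
  --client-options=use_ssl:true,verify_certs:true
```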

Related Milestones:

rishabh6788 commented 1 month ago

tagging @dblock @reta @andrross @peternied @getsaurabh02 for feedback and comments.

reta commented 1 month ago

> tagging @dblock @reta @andrross @peternied @getsaurabh02 for feedback and comments.

Thanks a lot @rishabh6788, the workflow + manual approval look great to me to start with.

peternied commented 1 month ago

[Triage - attendees] @rishabh6788 Thanks for creating this issue, looking forward to a pull request to resolve it.

peternied commented 1 month ago

@rishabh6788 Nice work - looking forward to seeing it in action!

andrross commented 1 month ago

@rishabh6788 What happens to the issue created in the "Create an issue in the repo" step? Is it automatically closed somewhere?

rishabh6788 commented 1 month ago

> @rishabh6788 What happens to the issue created in the "Create an issue in the repo" step? Is it automatically closed somewhere?

The issue's lifecycle will be managed by the workflow: once it is opened and added to the PR, the maintainer has to add the appropriate comment and the issue will then auto-close. See this sample: https://github.com/opensearch-project/opensearch-benchmark/issues/567

sohami commented 3 weeks ago

@rishabh6788 This looks great. A few questions:

  1. I am assuming you are planning to share the format of the comment that triggers perf runs in a separate issue. It would be great if we could consider performing the runs both by freshly indexing the data with each run and by restoring the data from a snapshot.
  2. Does the maintainer need to be aware of, or verify, any load on the backend infra before approving the request (assuming not)? If there are any failures during execution post approval, can we post those errors to the newly created issue? Or even tag an infra team member to help with it?

rishabh6788 commented 3 weeks ago

> @rishabh6788 This looks great. A few questions:
>
>   1. I am assuming you are planning to share the format of the comment that triggers perf runs in a separate issue. It would be great if we could consider performing the runs both by freshly indexing the data with each run and by restoring the data from a snapshot.
>   2. Does the maintainer need to be aware of, or verify, any load on the backend infra before approving the request (assuming not)? If there are any failures during execution post approval, can we post those errors to the newly created issue? Or even tag an infra team member to help with it?

Thank you for the feedback @sohami. For 1, the plan is to use a separate JSON file to hold all the varied configurations, so the user has the flexibility to choose an indexing-only benchmark, a search-only benchmark (which will use our existing snapshots), and also to set feature flags, such as enabling concurrent search on the cluster. The user will have to add a comment on the PR containing the id of the configuration they are interested in running. The issue is only for pausing the workflow to get an approval from a maintainer when a non-maintainer submits a request; the issue will auto-close once a maintainer has approved/denied the request and the workflow will continue with execution. A hypothetical sketch of such a configuration entry is below.
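To illustrate (field names are assumptions consistent with the earlier sketch, not a final schema), a search-only entry that restores data from a snapshot and enables concurrent search could look like:

```json
{
  "id_2": {
    "description": "Search-only benchmark restoring data from an existing snapshot",
    "cluster": {
      "data_nodes": 3,
      "additional_config": {
        "search.concurrent_segment_search.enabled": true
      }
    },
    "workload": "big5",
    "snapshot_restore": true,
    "test_procedure": "search-only"
  }
}
```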

For 2, the maintainer need not worry about the load on the infra; we will have monitors and alerts to let the infra team know if there are any faults or errors. In case of a workflow failure, a comment will be added on the PR with the job link so the maintainer can take a look and inform the infra team if they see any infra-related issues. We can also add a feature to create an issue for the infra team to look at.