peterzhuamazon commented 2 months ago

[RFC] Building a GitHub Automation App for OpenSearch GitHub Org

Introduction

As the OpenSearch organization continues its journey towards a more open and transparent future, we have faced several operational challenges that require manual steps by OpenSearch-Project admins. As OpenSearch moves towards a foundation model focused on scalability, efficiency, and transparency, we want to solve these challenges with a permanent solution. This RFC proposes developing a GitHub Automation App specifically for the OpenSearch GitHub Organization to handle automation tasks on behalf of admins going forward.

Motivation

The OpenSearch project wants to create a transparent, open-source, and community-focused development model. However, the manual processes we use to manage the repositories in our GitHub Organization are not efficient enough to scale. For instance, if an issue is opened in the wrong repository, the repo maintainers must tag the opensearch-project/admins group in a comment and wait for someone to manually transfer the issue. Similarly, if PR authors forget to add a label that triggers a specific GitHub Action, a repo maintainer must step in, which further delays the review process. These kinds of issues happen regularly. To address these challenges, we have written a detailed Problem Statement section that outlines the specific issues we face and the solutions we propose based on the GitHub Automation App. By automating key tasks through the App, we can make repository management more efficient, reduce the dependency on admins and maintainers, and build a more seamless collaboration environment.

Tenets

Open-source: Engage the community and ensure source code is publicly accessible and reviewed during development
Security: Have appropriate authentication and authorization mechanisms for each user request
Usability: Clear documentation and SOPs for users to follow, with a user-friendly interface
Efficiency: Automate repetitive or complicated tasks on behalf of admins
Scalability: Apply a modular design to handle different automation tasks and operations
Performance: Ensure responsive operations to support increasing demand
Availability: Provide users with a reliable service and the ability to inform them about potential maintenance
Transparency: Provide clear visibility into the tasks that the app is processing
Integration: Integrate natively with GitHub and other tools used by the organization

Problem Statement

1. Issue Management Automation

(1.1) Manual Issue Transfers: Transferring issues between repositories is currently a manual process, where users must tag the @opensearch-project/admin group, and an admin must manually transfer the issue. This process is time-consuming and can lead to delays in issue resolution.
- https://github.com/opensearch-project/opensearch-build/issues/4505
(1.2) Add RFC/Meta Issue to Roadmap: Assigning new RFCs or Meta issues to the appropriate roadmap project board and categorizing them is currently a manual process. Several issues are missed on the new OpenSearch Roadmap because of this.
- https://github.com/opensearch-project/.github/issues/196
(1.3) Merging Backport PRs and Removing Merged Branches: The existing Backport workflow can occasionally cause friction when a maintainer forgets to merge a PR. Moreover, repository rules require that all PRs be "approved", including Backports, which means an extra step to merge a Backport PR even when that PR has passed all checks.
- https://github.com/opensearch-project/automation-app/issues/34
- A proof of concept that uses GitHub Actions to auto-merge and then remove the branch has since been implemented, but it still requires changes to every repo to take effect:
  - https://github.com/opensearch-project/documentation-website/pull/6893

The App will automate issue management tasks, such as transferring issues between repositories, assigning RFC/Meta issues to the roadmap project board, and auto-merging backport PRs when GitHub Checks pass. This ensures that issues are handled efficiently and consistently.
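
The auto-merge rule for backport PRs can be sketched as a small decision function. This is only an illustration, not the App's implementation: the `backport/` branch prefix, the simplified `CheckRun` shape, the set of conclusions treated as passing, and all function names are assumptions made for the example.

```typescript
// Simplified, hypothetical view of a check run; the real GitHub payload has more fields.
interface CheckRun {
  name: string;
  status: string; // e.g. "queued", "in_progress", "completed"
  conclusion: string | null; // e.g. "success", "failure"; null while still running
}

interface BackportCandidate {
  headBranch: string;
  checkRuns: CheckRun[];
}

// Assumption: backport PRs come from branches that start with "backport/".
const BACKPORT_PREFIX = "backport/";

// Conclusions treated as passing in this sketch.
const PASSING_CONCLUSIONS = ["success", "neutral", "skipped"];

function allChecksPassed(checkRuns: CheckRun[]): boolean {
  // No checks at all is treated as "not yet verified", so nothing is merged.
  if (checkRuns.length === 0) return false;
  return checkRuns.every(
    (run) =>
      run.status === "completed" &&
      run.conclusion !== null &&
      PASSING_CONCLUSIONS.indexOf(run.conclusion) !== -1
  );
}

function shouldAutoMerge(pr: BackportCandidate): boolean {
  const isBackport = pr.headBranch.indexOf(BACKPORT_PREFIX) === 0;
  return isBackport && allChecksPassed(pr.checkRuns);
}
```

Keeping the decision separate from the merge call itself makes the rule easy to unit test, which matters for an app that would hold write access across the organization.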

2. Label Management and Documentation Support

(2.1) Enforce Documentation: There is currently no enforcement mechanism to ensure that developers open a doc issue on the documentation-website repo before a release, leading to documentation gaps and delayed involvement from tech writers.
- https://github.com/opensearch-project/documentation-website/issues/6365
- https://github.com/opensearch-project/opensearch-build/issues/4455
(2.2) Automate Labels: There is no automated way to ensure that the appropriate labels are added to issues and PRs, or created on demand.

The App will enforce the use of labels, such as need-documentation, and automate the addition of labels based on user requirements. The label will then trigger issue creation on the documentation-website repository. This will ensure that documentation requirements are raised early enough for quick follow-up before the release process begins.
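
As a rough sketch of the label-to-issue step, the function below builds the request for a documentation issue when an item carries the need-documentation label. The `LabeledItem` and `NewIssueRequest` shapes, the `[DOC]` title prefix, and the body wording are assumptions for illustration; only the label name and the documentation-website repository come from the text above.

```typescript
// Hypothetical, simplified view of a labeled issue or PR.
interface LabeledItem {
  repo: string;
  number: number;
  title: string;
  url: string;
  labels: string[];
}

// Hypothetical request shape; a real app would pass this to the GitHub API.
interface NewIssueRequest {
  owner: string;
  repo: string;
  title: string;
  body: string;
}

const DOC_LABEL = "need-documentation";

function buildDocIssue(item: LabeledItem): NewIssueRequest | null {
  // Items without the label do not trigger a documentation issue.
  if (item.labels.indexOf(DOC_LABEL) === -1) return null;
  return {
    owner: "opensearch-project",
    repo: "documentation-website",
    title: `[DOC] ${item.title}`, // title format is an assumption
    body: `Documentation requested for ${item.repo}#${item.number}: ${item.url}`,
  };
}
```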

3. Permission and Access Control Management

(3.1) Manage Repo Access: When team members leave, or new maintainers are added, their GitHub access must be manually updated by other maintainers. This step can be missed or delayed over time. Verifying a prospective maintainer's contributions and organizing the community voting process is also cumbersome at the moment.

The App will automate access control management by moving departing members to an Emeritus section, removing their access, and making PRs and announcements. When a new maintainer gets nominated, the app will assess their contributions and start a community voting thread, ensuring that access control is properly managed.

4. Executor of Metrics Project

(4.1) Metrics Dashboard Integration: We currently have a public metrics dashboard that uses a metrics cluster as its backend. However, there is no integrated way for users to interact with these metrics from within GitHub, nor is there an easy way to display important information during release phases, such as the build status, test reports, or health status of each plugin as part of the release progress.
- https://github.com/opensearch-project/opensearch-metrics/issues/57

The App will serve as the frontend executor of the metrics cluster, acting as a bridge between users and the backend metrics cluster. It will provide an interface to display useful metrics on demand and showcase important information during release phase. We can think of other use cases as well.

5. Bulk Operations Across Multiple Repositories

(5.1) Multi-Repository Updates: Making the same changes across multiple repositories requires either asking each repository’s maintainer to implement the changes, or assigning one person to manually open PRs across all repositories. Using the GH client is also not efficient as only admins with access to multiple repositories can do so.

The App will enable bulk PR creation across multiple repositories. This will simplify the process of implementing organization-wide changes, ensuring consistency and reducing the time required to manage multi-repo updates.

Proof of Concept

In the past few months, we have made a few proof of concept GitHub Apps to tackle issue management automation.

Auto Issue Transfer:

1. A user creates an issue on Repo A.
2. A Repo A maintainer discovers that the issue belongs in Repo B.
3. The Repo A maintainer adds a label named "Transfer to Repo B" to the issue.
4. The App detects the issues.labeled event on the issue, verifies the label content, and identifies Repo B as the destination.
5. The App verifies that the user who added the label is a maintainer of Repo A.
6. The App transfers the issue to Repo B and comments on the issue about the transfer.

Example: https://github.com/opensearch-ci-bot/test-docs-repository/issues/17
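
The label-driven transfer flow above could be sketched roughly as follows. This is not the POC code: the event shape, the maintainer lookup table, the list of known repos, and the function names are all hypothetical, and a real app would call the GitHub API to perform the transfer.

```typescript
// Hypothetical, simplified view of an issues.labeled event.
interface IssueLabeledEvent {
  sourceRepo: string;
  labelName: string;
  sender: string;
}

type TransferDecision =
  | { action: "transfer"; targetRepo: string; comment: string }
  | { action: "ignore"; reason: string };

// Matches labels of the form "Transfer to <repo>".
const TRANSFER_LABEL = /^Transfer to\s+(.+)$/i;

function decideTransfer(
  event: IssueLabeledEvent,
  maintainersByRepo: Record<string, string[]>, // assumed lookup, e.g. loaded from MAINTAINERS files
  orgRepos: string[]
): TransferDecision {
  // Verify the label content and extract the destination.
  const match = TRANSFER_LABEL.exec(event.labelName.trim());
  if (match === null) return { action: "ignore", reason: "not a transfer label" };

  const targetRepo = match[1].trim();
  if (orgRepos.indexOf(targetRepo) === -1) {
    return { action: "ignore", reason: "unknown destination repo" };
  }

  // Only a maintainer of the source repo may trigger a transfer.
  const maintainers = maintainersByRepo[event.sourceRepo] || [];
  if (maintainers.indexOf(event.sender) === -1) {
    return { action: "ignore", reason: "sender is not a maintainer" };
  }

  return {
    action: "transfer",
    targetRepo: targetRepo,
    comment: `Transferring this issue to ${targetRepo} at the request of @${event.sender}.`,
  };
}
```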

Add RFC/Meta Issues to Roadmap Project:

1. A user creates an RFC/Meta issue on Repo A using the RFC/Meta issue template.
2. The template requires the user to add the RFC or Meta label upon issue creation.
3. The App detects the issues.labeled event on the issue and verifies that the label is either RFC or Meta.
4. The App adds the issue to the OpenSearch Roadmap Project.
5. A Repo A maintainer adds a follow-up label, Roadmap:Security, since the issue is related to Security.
   - Note: Later, we will require the issue creator to select the roadmap entry from a drop-down list upon issue creation.
6. The App detects the issues.labeled event on the issue, verifies the label content, and identifies Security as the field entry.
7. The App sets the OpenSearch Roadmap field on the issue to Security.

Test Issues:
- https://github.com/peterzhu-organization/github-test-1/issues/4
- https://github.com/opensearch-project/opensearch-build/issues/4936
In Practice:
- https://github.com/opensearch-project/OpenSearch/issues/12602
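
The two label checks in the roadmap flow can be sketched as one mapping from a label name to an action. The `RoadmapAction` type, the field name `Roadmap`, and the function name are assumptions for illustration; the actual project update would go through the GitHub Projects API, which is not shown here.

```typescript
// Hypothetical description of what the App should do for a given label.
type RoadmapAction =
  | { kind: "add-to-project"; project: string }
  | { kind: "set-field"; field: string; value: string }
  | { kind: "none" };

const ROADMAP_PROJECT = "OpenSearch Roadmap";

// Matches follow-up labels such as "Roadmap:Security".
const ROADMAP_LABEL = /^Roadmap:(.+)$/;

function roadmapActionForLabel(label: string): RoadmapAction {
  const name = label.trim();

  // RFC and Meta labels put the issue on the roadmap project.
  if (name === "RFC" || name === "Meta") {
    return { kind: "add-to-project", project: ROADMAP_PROJECT };
  }

  // "Roadmap:<entry>" labels set the roadmap field to <entry>.
  const match = ROADMAP_LABEL.exec(name);
  if (match !== null && match[1].trim().length > 0) {
    return { kind: "set-field", field: "Roadmap", value: match[1].trim() }; // field name is an assumption
  }

  // Every other label is ignored.
  return { kind: "none" };
}
```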

Opportunities

While this RFC outlines several key areas in which the App can help improve repo management, we think there are additional opportunities for enhancement. We invite the community to propose more use cases and features that could be added to the GitHub Automation App.

Things to consider:

Are there any other manual processes that could benefit from automation?
What additional features would make this app more useful to the community?
Are there specific metrics that could be displayed or easily accessible with this app?

We encourage the community to provide feedback and suggestions by commenting on this RFC issue. Let us know what you think.

Next Steps

We will go ahead and publish Meta issues and Design Proposals, and start working on the App based on the aforementioned Proof of Concept.

Conclusion

By automating key tasks with the GitHub Automation App, we can further reduce manual intervention, formalize processes, and improve transparency. The app will create a more efficient, scalable, and contributor-friendly environment.

Thanks for reading.

gaiksaya commented 2 months ago

Wondering if it would take too many labels to transfer issues from one repo to another. How about something like "@app please transfer this issue to security repo"? The event would be triggered only when the app is mentioned, making it a push-based model. We don't have to keep monitoring the label events. Also, most maintainers currently tag admins in a similar fashion for transfers, so it wouldn't be much of a change from the user's perspective.

peterzhuamazon commented 2 months ago

Thanks for the comment. Note that the label-based transfer method is just a POC; we have since reviewed it and decided to use other methods. The section above is meant to showcase what the app is capable of, and we will definitely improve the functions later.

rishabh6788 commented 2 months ago

agree with @gaiksaya on too many labels. I have a few questions:

Is the code logic written in app? Where can I see the code for PoC?
Is it always listening to all the github events across all the repos?
wrt auto merge of backport prs, would it still require approval or not?

peterzhuamazon commented 2 months ago

Hi @rishabh6788 ,

Thanks for commenting.

1. The POC code is currently in a private fork, and I will publish it in a new repo when ready.
2. We can define exactly which events to listen to on which repos. As of now, it only listens to the issues.labeled event on all repos of opensearch-project, for adding issues to the roadmap.
3. The app will backport and then approve the PR. Once the checks pass, it should merge the PR and remove the branch as well. As of now, that behavior is implemented with GitHub Actions, which is cumbersome. Doing this through the app is more intuitive and easier.

Thanks.

dblock commented 2 months ago

I am really excited to see a bot that does all of this!

A few things that are important.

1. One should be able to contribute workflows/tasks to the bot easily. So the code organization needs to allow a single workflow to be authored independently from other workflows.
2. The blast radius of this bot is huge. It will have r/w admin-level access to the org. So we need a robust set of tests.

@peterzhuamazon I recommend moving whatever you have in a private repo to public as early as possible and starting with a very simple workflow so we can ensure that (1) and (2) are robust enough from the beginning.

peterzhuamazon commented 1 month ago

Thanks @dblock for the suggestions here.

Yeah, we plan to get the source code out in a new repo soon. As of now, the app is running the POC code and only focuses on adding issues to the project. We should soon break the code into different layers of abstractions/objects so it can be easily maintained and extended.

Thanks!

rishabh6788 commented 1 month ago

Thanks for the reply @peterzhuamazon, looks promising and excited to see what this app can do for us.

Not sure if it has already been considered, but can we have an app that listens to all the major events: push, pull_request, issue_comment, and label? The ones that are not in scope for this project can be a no-op, and we can implement logic for the ones we want. For example, for transfers, an action could be performed when an admin or repo maintainer adds a comment such as transfer: <destination-repo>, and the app just transfers the issue.
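
To make the idea concrete, here is a rough sketch of a dispatcher where unhandled events fall through to a no-op and an issue comment of the form transfer: <destination-repo> produces a transfer action. The event shape, the `senderIsMaintainer` flag, and the returned strings are hypothetical; they stand in for whatever the real framework and permission checks would provide.

```typescript
// Hypothetical, simplified event; a real webhook payload is much richer.
interface IncomingEvent {
  name: string; // e.g. "push", "pull_request", "issue_comment", "label"
  commentBody?: string;
  senderIsMaintainer: boolean; // assumed to be resolved before dispatch
}

type Handler = (event: IncomingEvent) => string;

// Matches a line such as "transfer: security".
const TRANSFER_COMMAND = /^transfer:[ \t]*(\S+)[ \t]*$/m;

function handleIssueComment(event: IncomingEvent): string {
  const match = TRANSFER_COMMAND.exec(event.commentBody || "");
  // Ignore comments without the command, or from users without permission.
  if (match === null || !event.senderIsMaintainer) return "no-op";
  return `transfer issue to ${match[1]}`;
}

// Only events with implemented logic are registered.
const handlers: Record<string, Handler> = {
  issue_comment: handleIssueComment,
};

function dispatch(event: IncomingEvent): string {
  // Events that are out of scope fall through to a no-op.
  if (!Object.prototype.hasOwnProperty.call(handlers, event.name)) return "no-op";
  return handlers[event.name](event);
}
```

Adding a new action in this layout means registering one more handler, without touching the existing ones.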

The same can be extended in future to probably add CHANGELOG entries to pull_requests across all repos that opt-in and other actions that you have already mentioned.

Also, it would be great if, before the code PoC, we could have a high-level design review of what the app design would look like and which components are involved in its functioning. I have basic questions, such as where this app will run and whether it will be a pull-based or webhook-based implementation.

gaiksaya commented 1 month ago

Agree with @rishabh6788's suggestion: a global app that does everything. I believe that, rather than listening (a pull-based model), a push-based model would be great, where we tag the app and then name the action item to carry out. For example:

@opensearch-ci-infra Transfer this issue to foo repo
@opensearch-ci-infra Add Changelog entry
@opensearch-ci-infra Run the performance test on this PR

In this way, you don't have to keep listening but just act on mention-based events. We can make it simple by keeping it comment-based. All action items can follow the same pattern. It would be great to take scaling and increasing scope into consideration. Starting small and incrementally increasing the action items would be the way to go.
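
A minimal sketch of parsing such mention-based commands is shown below. The `Command` type and the exact phrases matched are assumptions based on the three example comments above; a real app would still need the checks on who is mentioning it.

```typescript
// Handle taken from the example comments above.
const APP_HANDLE = "@opensearch-ci-infra";

type Command =
  | { kind: "transfer"; targetRepo: string }
  | { kind: "changelog" }
  | { kind: "performance-test" }
  | { kind: "unknown"; text: string };

// Returns null when the comment is not addressed to the app.
function parseMention(comment: string): Command | null {
  const trimmed = comment.trim();
  if (trimmed.indexOf(APP_HANDLE) !== 0) return null;

  // Reject longer handles such as "@opensearch-ci-infra-other".
  const after = trimmed.slice(APP_HANDLE.length);
  if (after.length > 0 && !/^\s/.test(after)) return null;

  const text = after.trim();
  const transfer = /^transfer this issue to (\S+) repo$/i.exec(text);
  if (transfer !== null) return { kind: "transfer", targetRepo: transfer[1] };
  if (/^add changelog entry$/i.test(text)) return { kind: "changelog" };
  if (/^run the performance test on this pr$/i.test(text)) return { kind: "performance-test" };

  // Addressed to the app, but not a recognized action item.
  return { kind: "unknown", text: text };
}
```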

peterzhuamazon commented 1 month ago

Hi @rishabh6788 @gaiksaya ,

The framework I am currently building already takes care of the listening step, so we do not need to worry about that implementation.

We can specify exactly which events we listen to, so we are not overwhelmed by events.

Also, this is a global app framework, but each action has its own listener, so the actions do not conflict or step on each other. The app can do all of these things at the same time without compromising performance.

We would, however, be limited by the GitHub App quota if all the actions run under the same app ID. If needed, we can create more GitHub App entities under the same umbrella and have them act together on our defined actions.

Thanks.

gaiksaya commented 1 month ago

My recommendation was the opposite of this. Listening will eventually become overwhelming and result in lots of API calls across the org. I was suggesting a push-based model, where the app acts only when tagged (mentioned, in GitHub terms). We do, however, need robust checks on who is mentioning the app, but that is doable as well. This might need some more research, but it would be much easier, as comments are easier than labels/events, which are permission-based and might take time to be adopted by the community. Based on the requirements and asks, it would be great to see the overall design with pros and cons so that we can make a call accordingly.

peterzhuamazon commented 1 month ago

We are not limited to labels; we can monitor any event. The Probot framework provides multiple ways to interact with GitHub, so we don't need to implement them ourselves. As the framework is organized around actions, each action can be its own instance, so nothing gets overwhelmed. If an action is too big, we can just split it into more fine-grained actions.

peterzhuamazon commented 4 weeks ago

The code is now public for review: https://github.com/opensearch-project/automation-app

Thanks.
