thefrontside / playhouse

Frontside's Backstage Portal
https://backstage.frontside.services/

Github Webhook Entity Provider #16

Open taras opened 2 years ago

taras commented 2 years ago

Motivation

GitHub is an important source of information for a Backstage portal. It contains all project source code, which is increasingly becoming the source of truth for the configuration of assets in an engineering organization's ecosystem. A GitHub repository contains configurations of CI/CD systems, package dependencies, and deployment configurations, to name just a few examples. The trend of treating a repository as a source of truth will only grow with the adoption of GitOps practices.

Backstage has several mechanisms for interacting with GitHub. The GithubURLReader is used to read one or more files from a repository via the GitHub REST API. Various GitHub processors in the @backstage/plugin-catalog-backend-module-github package are used to read organization information and discover repositories. GitHubEntityProvider can be used to pull groups and users for an organization into Backstage. Each of these mechanisms provides some functionality, but together they don't cover all of the use cases for ingesting data from GitHub into Backstage.

Furthermore, Custom Processors such as GithubDiscoveryProcessor are being deprecated in favour of Entity Providers. Entity Providers are replacing Custom Processors for ingestion because the custom processing ingestion pipeline proved to be inefficient when processing data from large GitHub instances with hundreds of organizations and thousands of repositories. For large GitHub instances, custom processors resulted in long processing delays because the pipeline would attempt to ingest every location indiscriminately, regardless of whether the location had new data. Entity Providers are a more scalable approach because they can react to changes in a GitHub instance as they happen instead of proactively looking for them.

The most efficient way of ingesting data from large GitHub instances is to trigger an entity provider as a response to a webhook. You can find an example of this in #10. In the most rudimentary form, we trigger the provider's read function when a specific event is triggered. Here is an example of triggering read on GithubOrgEntityProvider when a person is added to a team or an organization.

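  router.post('/github/webhook', async (req, res) => {
    // GitHub names the event type in the X-GitHub-Event header.
    const event = req.headers["x-github-event"];
    if (event === "membership") {
      // Re-read the organization data when the membership event arrives.
      await githubOrgEntityProvider.read();
      env.logger.info("Successfully triggered database update via github webhook event");
    }
    // TODO: we should forward requests to smee for local development
    // Acknowledge the delivery; without a response the request is left hanging.
    res.status(204).end();
  });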

The code in this webhook will have to handle every entity provider that interacts with GitHub. The complexity of the handler will grow as the number of entity providers increases. It will become increasingly difficult to debug and will inevitably lead to confusion. I would like us to get ahead of this by introducing an API that makes it easier to write and debug entity providers for the GitHub webhook.
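To make the problem concrete, here is one possible shape for such an API, offered only as a starting point for discussion. Each entity provider registers for the events it cares about, and a single dispatch function runs the matching handlers and records what each one did. Every name here (`WebhookRegistry`, `on`, `dispatch`, `HandlerResult`) is hypothetical and not part of Backstage or any existing plugin.

```typescript
// Hypothetical sketch: none of these names exist in Backstage.
type Handler = (payload: unknown) => Promise<void>;

interface HandlerResult {
  provider: string;
  event: string;
  ok: boolean;
  error?: string;
}

class WebhookRegistry {
  private handlers = new Map<string, { provider: string; handler: Handler }[]>();

  // Register a provider's handler for one GitHub event name.
  on(event: string, provider: string, handler: Handler): void {
    const list = this.handlers.get(event) ?? [];
    list.push({ provider, handler });
    this.handlers.set(event, list);
  }

  // Run every handler registered for the event, in registration order.
  // A failing handler is recorded and does not stop the remaining ones.
  async dispatch(event: string, payload: unknown): Promise<HandlerResult[]> {
    const results: HandlerResult[] = [];
    for (const { provider, handler } of this.handlers.get(event) ?? []) {
      try {
        await handler(payload);
        results.push({ provider, event, ok: true });
      } catch (e) {
        const error = e instanceof Error ? e.message : String(e);
        results.push({ provider, event, ok: false, error });
      }
    }
    return results;
  }
}
```

With something like this, the route handler would shrink to a single `dispatch` call keyed on the `x-github-event` header, and the returned results could be logged to show which provider handled which delivery.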

The specific APIs are TBD and @cowboyd will have very good opinions on the subject. I wanted to share some thoughts to get the ball rolling.

Detailed Design

Installation of a GitHub Webhook plugin

The GitHub Webhook Plugin will allow a developer to install the webhook as a regular Backstage plugin. This plugin will mount an Express route that will receive events once the webhook is added to an organization in GitHub.

Debug-ability

The goal of this plugin is to make debugging easier by giving developers a way to inspect the behavior of the entity providers that are handling events received by the webhook. Visibility into execution of the webhook will be provided by the Effection Inspector. For Effection Inspector to show execution of the entity providers, each entity provider must be written as an Effection task.

TypeScript types

We want to make it as easy as possible to write strictly typed handlers for these events. The API for extending the webhook handler should use types from https://github.com/octokit/webhooks#importing-types to guide implementors in hooking into the webhook.
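As a rough illustration of how payload types could guide implementors, the sketch below lets a mapping from event name to payload type drive the handler signature. The two payload interfaces are heavily simplified local stand-ins so that the snippet is self-contained; in a real plugin they would presumably be replaced by the types published by octokit/webhooks. `EventMap`, `TypedWebhook`, `on`, and `emit` are hypothetical names, not a proposed final API.

```typescript
// Simplified stand-ins for the octokit payload types (assumption: the real
// payloads carry many more fields than shown here).
interface MembershipPayload {
  action: 'added' | 'removed';
  member: { login: string };
  team: { name: string };
}

interface PushPayload {
  ref: string;
  repository: { full_name: string };
}

// Maps each supported event name to its payload type.
interface EventMap {
  membership: MembershipPayload;
  push: PushPayload;
}

class TypedWebhook {
  // Stored loosely; the public methods below enforce the types.
  private handlers = new Map<string, Array<(payload: any) => void>>();

  // The payload parameter is inferred from the event name.
  on<E extends keyof EventMap>(event: E, handler: (payload: EventMap[E]) => void): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }

  // Calls every handler for the event and returns how many were invoked.
  emit<E extends keyof EventMap>(event: E, payload: EventMap[E]): number {
    const list = this.handlers.get(event) ?? [];
    list.forEach(handler => handler(payload));
    return list.length;
  }
}
```

A handler registered with `on('membership', ...)` then gets editor completion for `payload.member.login`, while reading `payload.ref` inside it is rejected at compile time because that field belongs to the `push` payload.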

Extensibility

In addition to providing the Effection Inspector, Effection provides some useful building blocks for a custom API...

taras commented 2 years ago

@cowboyd I captured the ideas that I wanted to put down on paper. Can you please continue in the Extensibility section and describe what this API might look like?

taras commented 2 years ago

It appears there is an RFC for this in Backstage: https://github.com/backstage/backstage/issues/11082