mchmarny / github-activity-counter

Cloud Run service for GitHub event Webhook to monitor repo or org activity in real-time in Stackdriver and analyze activity through ad-hoc SQL queries in BigQuery
Apache License 2.0

github-activity-counter

A simple Cloud Run service that you can configure as the target of a GitHub event WebHook to monitor repository (or organization) activity in real time.

Besides capturing event throughput metrics in Stackdriver, this service also normalizes the GitHub activity data and stores the results in an easy-to-query BigQuery table, which can be used in Google Sheets or Data Studio.

Why

Supported Event Types

The current implementation supports the following event types:

You can customize this service to support additional event types.

Extracted Data

| Element | Type | Description |
| --- | --- | --- |
| ID | string | Immutable ID for the specific WebHook delivery (important in case of duplicate WebHook submissions) |
| Repo | string | Fully qualified name of the repository (e.g. mchmarny/github-activity-counter) |
| Type | string | The type of GitHub event (see Events for the complete list) |
| Actor | string | GitHub username of the user who initiated the event (e.g. the PR author rather than the PR merger, who could be an automation tool like prow) |
| EventAt | time | Original event time (not the WebHook processing time, except for push, which could include multiple commits) |
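
To make the shape of a normalized record concrete, here is a minimal Python sketch of the five elements listed above. It is not the service's actual code: the `Event` class and the `normalize_issue_event` and `dedupe` helpers are hypothetical names, and the field mapping assumes an `issues` payload where the delivery ID and event type come from GitHub's `X-GitHub-Delivery` and `X-GitHub-Event` request headers. How the real service maps each event type may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """One normalized row with the five extracted elements (hypothetical type)."""
    id: str             # WebHook delivery ID
    repo: str           # e.g. "mchmarny/github-activity-counter"
    type: str           # GitHub event type, e.g. "issues"
    actor: str          # user who initiated the event
    event_at: datetime  # original event time, not processing time


def normalize_issue_event(delivery_id: str, event_type: str, payload: dict) -> Event:
    """Flatten an `issues` payload; the field choices are assumptions."""
    issue = payload["issue"]
    created = datetime.strptime(issue["created_at"], "%Y-%m-%dT%H:%M:%SZ")
    return Event(
        id=delivery_id,
        repo=payload["repository"]["full_name"],
        type=event_type,
        actor=issue["user"]["login"],
        event_at=created.replace(tzinfo=timezone.utc),
    )


def dedupe(events):
    """Keep only the first event seen for each delivery ID."""
    seen = {}
    for event in events:
        seen.setdefault(event.id, event)
    return list(seen.values())


# Trimmed-down sample payload containing only the fields used above.
SAMPLE_PAYLOAD = {
    "repository": {"full_name": "mchmarny/github-activity-counter"},
    "issue": {"user": {"login": "octocat"}, "created_at": "2019-05-15T15:20:18Z"},
}
```

Because the ID is tied to the delivery rather than to the event content, a redelivered WebHook produces the same ID, which is what lets `dedupe` (or an equivalent query-time filter) drop duplicates.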

Pre-requirements

GCP Project and gcloud SDK

If you don't have one already, start by creating a new project and configuring the Google Cloud SDK. Similarly, if you have not done so already, you will need to set up Cloud Run.

Setup

To set up this service you will:

To start, clone this repo:

git clone https://github.com/mchmarny/github-activity-counter.git

And navigate into that directory:

cd github-activity-counter

Configure Dependencies

To work properly, the Cloud Run service will require a few dependencies:

To create these dependencies run the bin/setup script:

bin/setup

In addition to the above dependencies, the bin/setup script also creates a dedicated service account which will be used to run the Cloud Run service. To ensure that this service is able to do only the intended tasks and nothing more, we are going to configure it with a few explicit roles:

Finally, to ensure that our service only accepts data from GitHub, we are going to create a secret that will be shared between GitHub and our service:

export HOOK_SECRET=$(openssl rand -base64 32)

The above openssl command creates an opaque string. If for some reason you do not have openssl configured, you can just set HOOK_SECRET to your own secret. Just don't reuse other secrets or make it too easy to guess.
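
To illustrate what the shared secret is for, here is a minimal Python sketch of the scheme GitHub uses to sign deliveries: it computes an HMAC of the raw request body keyed with the secret and sends the hex digest in the `X-Hub-Signature-256` header, prefixed with `sha256=` (a legacy `X-Hub-Signature` header carries a SHA-1 variant). The function names are hypothetical, and which header this service actually checks is not stated here, so treat this as an illustration rather than the service's implementation. `generate_secret` is an assumed stand-in for the openssl command if openssl is not available.

```python
import base64
import hashlib
import hmac
import secrets


def generate_secret(num_bytes: int = 32) -> str:
    """Return random bytes as base64 text, similar to `openssl rand -base64 32`."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def sign(secret: str, body: bytes) -> str:
    """Compute the value GitHub would send in X-Hub-Signature-256."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def is_valid(secret: str, body: bytes, header_value: str) -> bool:
    """Compare signatures in constant time to avoid timing leaks."""
    expected = sign(secret, body).encode("utf-8")
    return hmac.compare_digest(expected, header_value.encode("utf-8"))
```

The signature must be computed over the exact bytes received, before any JSON parsing, because re-serializing the payload can change whitespace or key order and invalidate the digest.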

Build Container Image

Cloud Run runs container images. To build one for this service we are going to use the included Dockerfile and submit it, along with the source code, as a build job to Cloud Build using the bin/image script.

You should review each of the provided scripts to understand the individual commands.

bin/image

Deploy the Cloud Run Service

Once you have configured all the service dependencies, you can deploy your Cloud Run service. To do that, run the bin/service script:

bin/service

The output of the script will include the URL by which you can access that service.

Setup GitHub WebHook

GitHub has good instructions on how to set up your WebHook. In short it amounts to:

Test

To test the setup you can create an issue in the repo where you configured the WebHook. In the WebHook log there should be an indication the WebHook worked (response 200) or didn't.

Similarly, on the Cloud Run side, you should be able to see the logs generated by the eventcounter service using the function logs link, and eventually there should be new data in the BigQuery table.

Query

There are endless ways you could analyze this data (e.g. types of activities per repo or average activity frequency per user). Here, for example, is a SQL query for the types of activities per user over the last 28 days:

SELECT actor, type, count(1) as activities
FROM eventcounter.events
WHERE event_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 28 DAY)
GROUP BY actor, type
ORDER BY 3 desc

You can find a few more query samples in the queries directory.
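
For readers less familiar with SQL, the following Python sketch mirrors what the query above computes, using in-memory rows instead of BigQuery. The `activities_per_actor` function and the `(actor, type, event_time)` tuple layout are assumptions made for illustration; nothing here talks to BigQuery.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def activities_per_actor(rows, now=None, days=28):
    """Count events per (actor, type) within the last `days` days.

    rows: iterable of (actor, type, event_time) tuples with
    timezone-aware datetimes.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)  # WHERE event_time >= cutoff
    # GROUP BY actor, type with count(1)
    counts = Counter((actor, typ) for actor, typ, ts in rows if ts >= cutoff)
    # ORDER BY 3 DESC: largest counts first
    return [(actor, typ, n) for (actor, typ), n in counts.most_common()]
```

Passing an explicit `now` makes the 28-day window reproducible, which is handy when checking results against a fixed sample.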

Cleanup

To clean up all resources created by this sample, execute the bin/cleanup script:

bin/cleanup

Disclaimer

This is my personal project and it does not represent my employer. I take no responsibility for issues caused by this code. I do my best to ensure that everything works, but if something goes wrong, my apologies is all you will get.