ministryofjustice / operations-engineering

This repository is home to Operations Engineering's tools and utilities for managing, monitoring, and optimising software development processes at the Ministry of Justice. This repository is defined and managed in Terraform.
https://user-guide.operations-engineering.service.justice.gov.uk/
MIT License

FIREBREAK: 🔥 Create a Dashboard For Displaying KPIs #4420

Closed connormaglynn closed 4 months ago

connormaglynn commented 6 months ago

Background

We have a lot of automated jobs in the operations-engineering repository. These jobs were built to solve a problem. We have very little visibility over whether these jobs actually still solve that initial problem except for when we have to react to a symptom.

For example, we have a job that configures standards on a repository via a label. This job runs every day, but we have no easy way to determine whether it provides any value other than manually checking the logs every day.

Questions / Assumptions

What hypothesis do we want to test?/What do we want to learn?

We want to learn the following:

Peripheral benefits of doing this:

Definition of done

connormaglynn commented 6 months ago

To add a bit of commentary - this is a large idea with multiple approaches for a solution! ✅

Ideally, use this problem as an excuse to research a technology stack that interests you (that may also solve the problem 🙈 ) 🚀

connormaglynn commented 6 months ago

High-Level Overview of solution architecture 👇

The solution will consist of three components:

  1. KPI Dashboard - the UI that stakeholders will primarily interact with to view and analyse KPI metrics, e.g. line graphs, bar charts etc.
  2. KPI Database - a database to store simple time series data of KPIs, e.g. the number of repositories using the "standards" label over time. Used to populate the KPI Dashboard.
  3. Exporter - exporters will sit on clients and push data to the KPI Database (probably through an API). An exporter/push method of gathering metrics is preferred since the primary data sources will be ephemeral (e.g. cron jobs that don't live long enough to be scraped).
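To make the push model concrete, here is a minimal sketch of what an exporter could look like when called at the end of a short-lived job. Everything named here is an assumption for illustration rather than part of the actual solution: the payload fields (`name`, `value`, `recorded_at`), the function names, and the API URL are all hypothetical, and the real KPI Database API may look quite different.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_kpi_payload(name, value, recorded_at=None):
    """Build one time series data point for a KPI (field names are hypothetical)."""
    recorded_at = recorded_at or datetime.now(timezone.utc)
    return {
        "name": name,
        "value": value,
        "recorded_at": recorded_at.isoformat(),
    }


def http_post_json(url, payload):
    """Default transport: POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def push_kpi(name, value, api_url, send=http_post_json, recorded_at=None):
    """Push a single KPI data point to the (assumed) KPI Database API.

    `send` is injectable so a job can be tested without a network connection.
    """
    payload = build_kpi_payload(name, value, recorded_at)
    return send(api_url, payload)
```

A cron job would call something like `push_kpi("repositories_with_standards_label", count, api_url)` as its last step, so the data point is recorded before the job exits and nothing needs to scrape it later.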

Image

connormaglynn commented 6 months ago

Custom Implementation

🆙 Update

👀 Remaining Work

📝 Notes

connormaglynn commented 6 months ago

Custom Implementation

🆙 Update

Image

👀 Remaining Work

📝 Notes

connormaglynn commented 6 months ago

Grafana

🆙 Update

Image

👀 Remaining Work

📝 Notes

connormaglynn commented 6 months ago

🆙 Update

👀 Remaining Work

connormaglynn commented 6 months ago

🆙 Update

👀 Remaining Work