ministryofjustice / modernisation-platform

A place for the core work of the Modernisation Platform • This repository is defined and managed in Terraform
https://user-guide.modernisation-platform.service.justice.gov.uk
MIT License

[SPIKE] Better monitoring for GitHub Actions #6464

Closed richgreen-moj closed 7 months ago

richgreen-moj commented 8 months ago

User Story

As an MP Engineer, I want better visibility of the status of my GitHub Actions, so that I can spot errors and avoid conflicts more easily.

Value / Purpose

We have certain workflows which take a long time to complete, e.g. Terraform: Scheduled Baseline.

During a busy working day we may have multiple PRs we wish to merge, but we want to avoid hitting API errors and similar problems caused by too many GitHub Actions running concurrently. We do get failure notifications, but these are momentary and can get lost in Slack.
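As a rough illustration of the concurrency concern, the sketch below counts in-flight workflow runs before a merge. It assumes the GitHub REST API endpoint `GET /repos/{owner}/{repo}/actions/runs` with its `status` filter; the helper names and the `max_concurrent` threshold of 3 are hypothetical and not an agreed team limit.

```python
from collections import Counter
import json
import urllib.request

API = "https://api.github.com"


def fetch_runs(owner, repo, status="in_progress", token=None):
    """Fetch workflow runs with the given status from the GitHub REST API."""
    url = f"{API}/repos/{owner}/{repo}/actions/runs?status={status}&per_page=100"
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # A token raises the rate limit and is required for private repositories.
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["workflow_runs"]


def count_by_workflow(runs):
    """Count runs per workflow name (the `name` field of each run object)."""
    return Counter(run["name"] for run in runs)


def safe_to_merge(runs, max_concurrent=3):
    """Hypothetical rule of thumb: merge only while fewer than
    `max_concurrent` runs are already in flight."""
    return len(runs) < max_concurrent
```

For example, `safe_to_merge(fetch_runs("ministryofjustice", "modernisation-platform"))` would give a quick yes/no before merging, and `count_by_workflow` shows which workflows are currently occupying runners.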

I feel we could benefit from a bird's-eye view of the status of our GitHub Actions, to help us spot failing actions and deconflict when merging PRs.

I'm not sure how best we could do this, but one idea is:

Something to consider: workflow status badges - https://docs.github.com/en/actions/monitoring-and-troubleshooting-workflows/adding-a-workflow-status-badge
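A minimal sketch of what a status badge amounts to, assuming the badge URL pattern from the GitHub documentation linked above (`.../actions/workflows/<workflow file>/badge.svg`, with an optional `branch` query parameter). The function name and the workflow file name `scheduled-baseline.yml` are hypothetical; the real file name in the repository may differ.

```python
from urllib.parse import quote


def badge_markdown(owner, repo, workflow_file, label, branch=None):
    """Return markdown for a status badge that links to the workflow's page."""
    page = f"https://github.com/{owner}/{repo}/actions/workflows/{quote(workflow_file)}"
    badge = f"{page}/badge.svg"
    if branch:
        # Limit the badge to the status of runs on one branch.
        badge += f"?branch={quote(branch, safe='')}"
    return f"[![{label}]({badge})]({page})"


# Hypothetical workflow file name, used for illustration only.
print(badge_markdown(
    "ministryofjustice",
    "modernisation-platform",
    "scheduled-baseline.yml",
    "Scheduled Baseline",
    branch="main",
))
```

The resulting line can be pasted into a README or a user guide page; GitHub renders the image with the latest run status for that workflow.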

Useful Contacts

@richgreen-moj

Additional Information

No response

Proposal / Unknowns

No response

Definition of Done

ASTRobinson commented 7 months ago

Mocked up a badge dashboard and showed it to the team at yesterday's stand-up (received some good feedback): https://user-guide.modernisation-platform.service.justice.gov.uk/user-guide/workflow-status.html

Have also been reaching out to other teams to see if and how they are monitoring workflows (most just use a Slack message on failure).

Had a quick look at some third-party tools (Datadog, thundra-foresight) and a few more; need to dig in a little deeper to assess cost and value for money.

ASTRobinson commented 7 months ago

Added badges to the repo README to test visibility improvement 🤔 https://user-guide.modernisation-platform.service.justice.gov.uk/user-guide/workflow-status.html

ASTRobinson commented 7 months ago

During the spike, I delved into the Dashboard/Status Page as demonstrated here. I presented this to the team and received positive feedback. Additionally, I implemented the concept of status badges on the repository homepage (example here). To further refine these features, I've created two GitHub tickets:

Issue #6675 - Implementation of status badges on repositories.
Issue #6676 - Development of a comprehensive dashboard overview status page.
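One possible shape for the dashboard overview page is a generated markdown table with one badge per workflow. This is only a sketch under that assumption, not the layout of the actual workflow-status page; the function name and the example workflow file names are hypothetical.

```python
def dashboard_table(workflows):
    """Render (repo, workflow_file, label) tuples as a markdown status table.

    `repo` is given as "owner/name"; each badge links to its workflow page.
    """
    lines = [
        "| Repository | Workflow | Status |",
        "| --- | --- | --- |",
    ]
    for repo, workflow_file, label in workflows:
        page = f"https://github.com/{repo}/actions/workflows/{workflow_file}"
        lines.append(f"| {repo} | {label} | [![{label}]({page}/badge.svg)]({page}) |")
    return "\n".join(lines)


# Hypothetical workflow file names, used for illustration only.
print(dashboard_table([
    ("ministryofjustice/modernisation-platform", "scheduled-baseline.yml", "Scheduled Baseline"),
    ("ministryofjustice/modernisation-platform", "terraform-static-analysis.yml", "Static Analysis"),
]))
```

Keeping the list of workflows as data means a new workflow can be added to the page with a single tuple rather than hand-written markdown.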

In my exploration, I also assessed several third-party solutions. However, most proved to be costly, poor value for money, or outdated and no longer supported. These external tools also offered limited integration, which would require engineers to work in an additional user interface. Such an approach could distract our engineers and users from the primary codebase and reduce efficiency.

Considering these challenges, and in light of GitHub's recent release of Actions Usage Metrics (public beta as of March 28, 2024), investing time in these third-party solutions may not be the best strategy at this point. Revisiting GitHub's integrated metrics at a later stage is worth considering.