Closed alexwlchan closed 1 year ago
Okay, so a brief recap of the structure: there's a Docker Hub account called wellcometravis with email address wellcomedigitalplatform@wellcome.ac.uk. I found the password for this account as part of #5723.
This account is the owner of the wellcome organization, but I'm not sure we still use it – I think we replaced it entirely with Amazon ECR.
As a first step, I've removed a bunch of ex-staff from the organisation.
Working from the end of the list forwards: ❓
Not deleted images:
feature_similarity – still used in the data-science repo
nginx – only pulled a day ago, so looks like this one's still in use somewhere
nginx_api-gw – we're still mirroring this image into an ECR repo; we use it in one of the data-science APIs; it was only pulled 24 days ago
pa11y_dashboard – not pushed for 6 years, but apparently pulled just 16 days ago? I can't find references to it in our codebase, but leaving for now.
palette_similarity – still used in the data-science repo
Deleted images:
Dev tooling:
aws-azure-login – last pushed 4 years ago (2019) and last pulled a year ago (2022). I think everyone is using the Node app installed directly now; if not we can easily recreate this.
finatra_service_base – Finatra is an HTTP framework we haven't used in Wellcome Collection apps since 2019, so we can ditch this.
node-docker:8.9.1 – it was pushed 6 years ago (2017) and last pulled 3 years ago (2020). This corresponds to the release of Node 8.9.1. I can't find any references to this image in our GitHub repos and we're using much newer versions of Node now.
sbt_wrapper – this image is now stored in ECR
terraform_wrapper:77 – last pushed 4 years ago (2019) and never pulled. I think this dates from a time when we did everything using Makefiles; these days I think everyone uses the Terraform binary installed directly on their Mac or a HashiCorp provided image, but nobody is using a Docker wrapper we haven't maintained.
A whole bunch of now-deprecated build tooling:
weco-deploy – a deployment tool that we no longer use, so we definitely don't need this Docker image.
release_tooling – the precursor to weco-deploy, which we doubly don't use any more
image_builder – another bit of old build tooling
scalafmt – rather than running it as a standalone tool in a container, now we have it installed as an sbt plugin in the project and we run it through sbt
publish_service – this was used for publishing Docker images, but the only extant references I can find are in archived Loris and Archivematica repos. None of our current apps use this.
jslint – not referenced anywhere, pretty sure we use stuff installed in the npm/yarn project instead
format_json – not referred to anywhere
build_test_lambda – not referenced anywhere
build_test_python – not referenced anywhere
build_tooling – not referenced anywhere
publish_lambda – not referenced anywhere
test_lambda – not referenced anywhere
test_python – not referenced anywhere
format_python – now replaced with the official Black Docker image, see #5724
python3 – used as a base for now-deleted build tooling
turtlelint – was this used for ontology files? I don't even remember
Images related to Loris, a IIIF Image API server that we haven't used since March 2021.
loris
loris-uwsgi
cache-cleaner (a component for purging an EFS cache based on the age and size of items)
cache_cleaner (yes we had the same app in two repos)
Images related to Archivematica, which are
archivematica-mysql – we use RDS as our MySQL instance in Archivematica. Possibly we had this in a Docker Compose file at some point, but it hasn't been pushed for 4 years and we wouldn't run a database in Docker anyway.
clamavd, which is now stored in ECR
A bunch of per-resource web apps, not pushed for 5 years, which feel like a throwback to a very different way of doing the website:
common_webapp
wellcomecollection-nginx-proxy
wellcomecollection-router
wellcomecollection_eventbrite_nginx
wellcomecollection_eventbrite_webapp
wellcomecollection_events_nginx
wellcomecollection_events_webapp
wellcomecollection_exhibitions_nginx
wellcomecollection_exhibitions_webapp
wellcomecollection_whats_on_nginx
wellcomecollection_whats_on_webapp
wellcomecollection
Pipeline apps that are now stored in ECR:
transformer_sierra – catalogue pipeline
logtstash_transit (sic) – only has experimental tags, pulled 17 times; although we use bits of Logstash in places, this isn't it.
fluentbit – used for logging, but the image we actually use is stored in ECR
catalogue_webapp
content_webapp
Everything else:
palette_api – not referred to anywhere
myrepo – seems to have been used for ephemeral images in front-end builds
buildkite – similar purpose, ephemeral images in front-end builds
fake-sns – we were using this in our tests to mock SNS, but we replaced it with a LocalStack container in early 2022
java11 – used to be the base image for some of our apps, now replaced with images from ECR Public
typesafe_config_base – not used since 2021
elasticdump – this is a Dockerised version of an npm package that we thought we might use in the snapshot generator, but we haven't used in ages
nginx_webapp – we have a lot of different nginx images, but I can't find any references to this particular one
sqs_freezeray – a way to download messages from SQS but wrapped in some ECS packaging that we no longer use
sqs_redrive – a way to move messages between SQS queues, but largely superseded by the SQS redrive feature that AWS have introduced natively
php7-fm and php73-fm – these were a couple of images from big Wellcome that ended up in our account, but they haven't been pulled in ages. We want to close this account so I'm going to delete them, and the definitions are in the Git history if they need to be rebuilt.
logstash_transit – another holdover from old approaches to logging, not referred to anywhere
Down to a mere 8 images!
I've copied the feature_similarity, palette_similarity and nginx_api-gw images into ECR repositories in the data account, and they're now running there, so I'm going to delete the copies in Docker Hub.
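For anyone repeating this, copying an image from Docker Hub into ECR boils down to a pull, a re-tag and a push. Here's a rough sketch that builds those commands; it isn't what was actually run here. The account ID, region, tag and ECR repository names are all made up for illustration, and you'd need to have logged Docker in to the ECR registry first.

```python
from typing import List


def mirror_commands(
    image: str, tag: str, ecr_registry: str, ecr_repo: str
) -> List[List[str]]:
    """Build the docker commands that copy one Docker Hub image into ECR.

    Assumes Docker is already authenticated against `ecr_registry`
    (for example via `aws ecr get-login-password` piped into `docker login`).
    """
    source = f"{image}:{tag}"
    target = f"{ecr_registry}/{ecr_repo}:{tag}"
    return [
        ["docker", "pull", source],         # fetch the copy from Docker Hub
        ["docker", "tag", source, target],  # give it an ECR name
        ["docker", "push", target],         # upload it to ECR
    ]


if __name__ == "__main__":
    # Hypothetical account ID, region, tag and repo names, for illustration only.
    registry = "123456789012.dkr.ecr.eu-west-1.amazonaws.com"
    for name in ["feature_similarity", "palette_similarity", "nginx_api-gw"]:
        for cmd in mirror_commands(f"wellcome/{name}", "latest", registry, name):
            print(" ".join(cmd))
```

The function only returns argument lists, so you can eyeball them before passing each one to `subprocess.run(cmd, check=True)`.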
The flake8 and tox images are both mirrored in ECR, and all the repos get the images from there, so those should be okay to delete.
There are two repositories left: nginx:156 and pa11y_dashboard:latest. Both were pulled fairly recently, but I can't find any references to them in our codebases – I'm 99% sure these aren't us (and if they are, I have no idea where they're defined!).
I'm going to delete them so I can mop up the rest of the account.
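The "can't find any references" check is easy to script if you have the repos cloned locally. This is a rough sketch, not the search that was actually done for this issue: it walks a directory tree and looks for the literal string wellcome/<image name> in every file. The function name and the idea of a single checkout directory are assumptions.

```python
import os
from typing import Dict, Iterable, List


def find_image_references(root: str, image_names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each image name to the files under `root` that mention wellcome/<name>.

    This is a plain substring match, so "wellcome/nginx" will also match
    "wellcome/nginx_api-gw"; treat hits as leads to check by hand.
    """
    names = list(image_names)
    hits: Dict[str, List[str]] = {name: [] for name in names}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git internals; old objects would produce noisy matches.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    text = f.read()
            except OSError:
                continue  # unreadable file, nothing to search
            for name in names:
                if f"wellcome/{name}" in text:
                    hits[name].append(path)
    return hits
```

An empty list for an image means no file under that checkout mentions it, which is weaker than "nothing pulls it": anything defined outside those repos won't show up.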
Done. The wellcome organization is gone and the wellcometravis user has been deactivated.
There's a whole bunch of stuff in the wellcome Docker Hub namespace and I think ~95% of it is unused.
We should go through this account and see whether we need to keep any of it, or if it can be safely deleted (and not handed over).
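One way to start going through the account is to split the repositories by how recently they were pulled, since a recent pull suggests something somewhere still depends on the image. Here's a minimal sketch of that triage. The RepoActivity record, the 90-day window and the sample dates are all assumptions; the dates would have to be copied from whatever Docker Hub shows for each repository.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, NamedTuple, Optional


class RepoActivity(NamedTuple):
    """Hypothetical record of one repository's activity."""

    name: str
    last_pushed: datetime
    last_pulled: Optional[datetime]  # None means never pulled


def triage(
    repos: Iterable[RepoActivity],
    now: datetime,
    recent: timedelta = timedelta(days=90),  # assumed cut-off, tune to taste
) -> Dict[str, List[str]]:
    """Split repositories into ones to investigate and ones that look unused."""
    result: Dict[str, List[str]] = {"check_first": [], "likely_unused": []}
    for repo in repos:
        pulled_recently = (
            repo.last_pulled is not None and now - repo.last_pulled <= recent
        )
        bucket = "check_first" if pulled_recently else "likely_unused"
        result[bucket].append(repo.name)
    return result


if __name__ == "__main__":
    now = datetime(2023, 6, 1)  # illustrative date only
    repos = [
        RepoActivity("nginx", datetime(2019, 1, 1), now - timedelta(days=1)),
        RepoActivity("terraform_wrapper", datetime(2019, 1, 1), None),
    ]
    print(triage(repos, now))
```

A recent pull only says the image is worth a closer look; it doesn't say who pulled it, so it still needs a search through the codebase before deciding either way.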