davidkelliott closed this issue 3 months ago.
This issue is stale because it has been open 90 days with no activity.
Following some discussion, we think this story is about using up-to-date AMIs for the EC2 instances that host ECS/EKS containers.
I've gone through the code in Modernisation Platform Environments and created a spreadsheet documenting the use of ECS/EKS, noting where hard-coded AMI values are being used.
ECS
EKS
eks_node_version, which is optimised to get the latest available AMI for the EKS cluster's current Kubernetes version.
Here's a blog with some template code for automating the update of EC2 instances in an Auto Scaling group that hosts ECS services: https://aws.amazon.com/blogs/industries/automate-patching-by-replacing-amazon-ecs-container-instances/
Essentially, it looks up the latest version of the ECS-optimised AMI for your chosen platform and then updates the launch template with the new value. Care is taken to drain nodes and take them offline one at a time to avoid downtime.
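As a rough sketch of the "update the launch template" step only, assuming the Auto Scaling group uses a launch template and `ec2` is a boto3 EC2 client (or anything with the same methods), something like the following could create a new template version pointing at a newer AMI. How `latest_ami` is obtained is left open, and draining and replacing instances one at a time, as the blog does, is not covered. The function name is hypothetical.

```python
"""Sketch: point a launch template at a newer AMI (assumes a boto3-style EC2 client)."""


def update_launch_template_ami(ec2, template_id, latest_ami):
    """Create and set as default a new launch template version that uses latest_ami.

    Returns the new version number, or None if the default version already uses latest_ami.
    """
    # Look at the version currently marked as the template's default.
    current = ec2.describe_launch_template_versions(
        LaunchTemplateId=template_id, Versions=["$Default"]
    )["LaunchTemplateVersions"][0]
    if current["LaunchTemplateData"].get("ImageId") == latest_ami:
        return None  # Already up to date, nothing to do.

    # Copy the current version and override only the AMI.
    created = ec2.create_launch_template_version(
        LaunchTemplateId=template_id,
        SourceVersion=str(current["VersionNumber"]),
        LaunchTemplateData={"ImageId": latest_ami},
    )
    new_version = created["LaunchTemplateVersion"]["VersionNumber"]

    # Make it the default; an ASG that references "$Default" will launch new instances from it.
    ec2.modify_launch_template(LaunchTemplateId=template_id, DefaultVersion=str(new_version))
    return new_version
```

Existing instances keep their old AMI until they are replaced, which is the part the blog handles by draining and cycling nodes.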
Retrieving latest AMIs:
ECS
The ECS TF module's ec2-autoscaling example uses a data call to retrieve the latest ECS-optimised AMI by querying the Systems Manager Parameter Store API: https://github.com/terraform-aws-modules/terraform-aws-ecs/blob/master/examples/ec2-autoscaling/main.tf#L162C3-L165
This value is then used to set the image ID for the ECS Auto Scaling group: https://github.com/terraform-aws-modules/terraform-aws-ecs/blob/master/examples/ec2-autoscaling/main.tf#L296
Members could use this module, or build the same lookup into their own code, rather than hard-coding AMI IDs.
Or directly via SSM Parameter Store:
aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux-2023/recommended --region eu-west-2
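The value of the `recommended` parameter is a JSON document rather than a bare AMI ID, so the ID has to be pulled out of its `image_id` field. A minimal Python sketch, assuming boto3 is available wherever it runs; the function names are hypothetical and the sample value below is made up for illustration:

```python
import json

ECS_RECOMMENDED_PARAM = "/aws/service/ecs/optimized-ami/amazon-linux-2023/recommended"


def parse_recommended_ami(parameter_value):
    """Return the AMI ID from the JSON document stored in the 'recommended' parameter."""
    return json.loads(parameter_value)["image_id"]


def fetch_recommended_ecs_ami(region="eu-west-2"):
    """Query SSM Parameter Store, as the CLI command above does, and return the AMI ID."""
    import boto3  # Imported here so parse_recommended_ami works without boto3 installed.

    ssm = boto3.client("ssm", region_name=region)
    value = ssm.get_parameter(Name=ECS_RECOMMENDED_PARAM)["Parameter"]["Value"]
    return parse_recommended_ami(value)


# Example with an illustrative (made-up) parameter value:
sample = '{"image_id": "ami-0123456789abcdef0", "image_name": "example-name"}'
print(parse_recommended_ami(sample))  # ami-0123456789abcdef0
```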
EKS
The EKS TF module can be used with a data call to get, for instance, the latest Bottlerocket EKS-optimised AMI, as in its eks_managed_node_group example: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/eks_managed_node_group/main.tf#L527-L535
Or directly via SSM Parameter Store:
aws ssm get-parameter --name /aws/service/bottlerocket/aws-k8s-1.30/x86_64/latest/image_id --region eu-west-2 --query "Parameter.Value" --output text
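Because the parameter path embeds the Kubernetes version, anything automating this lookup should build the name from the cluster's current version so that the AMI tracks cluster upgrades. A small Python sketch, assuming x86_64 and arm64 are the architectures of interest and `ssm` is a boto3 SSM client; the function names are hypothetical:

```python
def bottlerocket_parameter_name(k8s_version, arch="x86_64"):
    """Build the public SSM parameter name for the latest Bottlerocket EKS AMI ID."""
    if arch not in ("x86_64", "arm64"):
        raise ValueError(f"unsupported architecture: {arch}")
    return f"/aws/service/bottlerocket/aws-k8s-{k8s_version}/{arch}/latest/image_id"


def latest_bottlerocket_ami(ssm, k8s_version, arch="x86_64"):
    """Return the latest Bottlerocket AMI ID; ssm is assumed to be a boto3 SSM client."""
    name = bottlerocket_parameter_name(k8s_version, arch)
    return ssm.get_parameter(Name=name)["Parameter"]["Value"]


# Usage (needs AWS credentials):
#   import boto3
#   ssm = boto3.client("ssm", region_name="eu-west-2")
#   print(latest_bottlerocket_ami(ssm, "1.30"))
```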
Based on my findings on the usage of ECS and EKS across the MP, here is a list of options that members could consider to ensure their infrastructure is patched with the latest AMIs:
Options
Reconsider whether workloads would be appropriate for Cloud Platform
Make users aware of new AMIs as they are released, via an updates channel in Slack?
My recommendation:
Raise a ticket to explore whether options 1/2/3 would be suitable for each of the applications I've identified that run ECS/EKS with pinned AMI IDs in their code.
@sukeshreddyg suggested that we could write a Lambda function that scans the AMIs in use by clusters in member accounts and compares them with the latest versions, so that we can alert the MP team when they are out of date. I will draft a story to explore this further.
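A minimal sketch of what that Lambda could look like for ECS, assuming it runs in (or with a role assumed into) a member account, that all clusters should be on the Amazon Linux 2023 recommended AMI (GPU or ARM variants would be falsely flagged), and that alerting is wired up separately. All function names are hypothetical; pagination and the 100-instance limit per describe call are ignored for brevity.

```python
"""Sketch: flag ECS container instances that are not on the recommended AMI."""
import json

ECS_RECOMMENDED_PARAM = "/aws/service/ecs/optimized-ami/amazon-linux-2023/recommended"


def find_outdated_instances(instance_amis, latest_ami):
    """Return the IDs of instances whose AMI differs from latest_ami, sorted for stable output."""
    return sorted(iid for iid, ami in instance_amis.items() if ami != latest_ami)


def ecs_instance_amis(ecs, ec2, cluster_arn):
    """Map EC2 instance ID -> AMI ID for the container instances in one ECS cluster."""
    arns = ecs.list_container_instances(cluster=cluster_arn)["containerInstanceArns"]
    if not arns:
        return {}  # e.g. Fargate-only clusters have no container instances.
    described = ecs.describe_container_instances(cluster=cluster_arn, containerInstances=arns)
    instance_ids = [ci["ec2InstanceId"] for ci in described["containerInstances"]]
    reservations = ec2.describe_instances(InstanceIds=instance_ids)["Reservations"]
    return {i["InstanceId"]: i["ImageId"] for r in reservations for i in r["Instances"]}


def lambda_handler(event, context):
    """Return a mapping of cluster ARN -> outdated instance IDs for the current account."""
    import boto3  # Imported here so the helpers above can be tested without boto3.

    ssm, ecs, ec2 = boto3.client("ssm"), boto3.client("ecs"), boto3.client("ec2")
    value = ssm.get_parameter(Name=ECS_RECOMMENDED_PARAM)["Parameter"]["Value"]
    latest = json.loads(value)["image_id"]
    report = {}
    for cluster_arn in ecs.list_clusters()["clusterArns"]:
        outdated = find_outdated_instances(ecs_instance_amis(ecs, ec2, cluster_arn), latest)
        if outdated:
            report[cluster_arn] = outdated
    # Alerting the MP team (e.g. via Slack) is not shown; the report is simply returned.
    return report
```

An EKS equivalent would need a different way of finding node instances per cluster, which this sketch does not attempt.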
Stories to write:
User Story
As a modernisation platform engineer
I want customers to use the most recent AMIs with their clusters
So that they are using up-to-date software
User Type(s)
Analytical Platform users
Data Platform users
Performance Monitoring
Other potential platform customers on MP
Value
Where ECS or EKS use EC2 instances, we need to ensure that they are using the latest recommended versions. We will start with investigating how we find out the latest versions and make users aware of this, then how we make these upgrades at a platform level if needed.
Questions / Assumptions / Hypothesis
Has this already been covered with the new ECS module raised after this issue was created? If so, is it just a question of migrating legacy users across?
Proposal
This story is about finding out where customers are not using up-to-date AMIs for ECS/EKS, for example where they hard-code the AMI ID rather than retrieving the latest version with a data call. Because this is a spike, the scope is a bit more free-form than that, but that's my interpretation.
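One way to start finding hard-coded AMIs is a simple text scan of the Terraform code. A rough Python sketch, assuming AMI IDs appear as literals of the form `ami-` plus 8 or 17 hex characters and that `.tf` and `.tfvars` files are the ones worth checking; the function names are hypothetical, and matches still need a human to judge whether they are pinned deliberately:

```python
"""Sketch: list hard-coded AMI IDs in Terraform files under a directory."""
import re
from pathlib import Path

# AMI IDs are "ami-" followed by 8 (older format) or 17 hexadecimal characters.
AMI_LITERAL = re.compile(r"\bami-[0-9a-f]{8}(?:[0-9a-f]{9})?\b")


def find_hardcoded_amis(text):
    """Return every AMI ID literal that appears in text."""
    return AMI_LITERAL.findall(text)


def scan_terraform(root):
    """Map each .tf/.tfvars file under root to the AMI IDs hard-coded in it."""
    hits = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in {".tf", ".tfvars"}:
            found = find_hardcoded_amis(path.read_text(encoding="utf-8", errors="ignore"))
            if found:
                hits[str(path)] = found
    return hits
```

Running `scan_terraform` over a checkout of Modernisation Platform Environments would give a starting list similar to the spreadsheet mentioned earlier.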
Definition of done
Reference
How to write good user stories