ministryofjustice / analytical-platform

Analytical Platform • This repository is defined and managed in Terraform
MIT License

🚚 Migrate Airflow workloads to APC #4490

Open jacobwoffenden opened 4 weeks ago

jacobwoffenden commented 4 weeks ago

User Story

As an Analytical Platform engineer
I want (current) Airflow jobs to schedule on APC
So that we can fully retire the Airflow EKS clusters

Value / Purpose

The Airflow EKS clusters are only partially managed in Terraform, are pinned to IMDSv1, use kube2iam, and have no observability 😭

Migrating these workloads to APC will allow us to retire more clusters and make use of the newer capabilities in EKS and the supported tooling.

Useful Contacts


User Types

Platform Engineering


If we... [do a thing] Then... [this will happen]


Migrate Airflow workloads to APC

Additional Information

This work was started in DPAT but never completed

Blocked by:

Definition of Done

jacobwoffenden commented 4 weeks ago

Blocked while Airflow component is being worked on

jacobwoffenden commented 3 weeks ago

Comms sent to ask-data-engineering with sheet to fill in

jacobwoffenden commented 3 weeks ago

Moving back to blocked while IRSA is being worked on

jacobwoffenden commented 3 weeks ago

I've cut a new release of the cross-account-ecr action and published a new version of template-airflow-python, which uses the new v1 action and correctly adds the APC accounts to the repo policy.

I then updated the example DAG to use the new image version and the APC dev context. Below is the output from running it: the task fails because it can't use IRSA yet, but the image still pulls.

vscode ➜ /workspaces/modernisation-platform-environments (main) [ aws: analytical-platform-compute-development:modernisation-platform-sandbox@eu-west-2 ] [ context: arn:aws:eks:eu-west-2:381491960855:cluster/analytical-platform-compute-development ] $ kubectl --namespace airflow get events                                     
LAST SEEN   TYPE     REASON      OBJECT                                        MESSAGE
59s         Normal   Scheduled   pod/task-1-cecda48866f94f90a3357d96206822b6   Successfully assigned airflow/task-1-cecda48866f94f90a3357d96206822b6 to
58s         Normal   Pulling     pod/task-1-cecda48866f94f90a3357d96206822b6   Pulling image ""
53s         Normal   Pulled      pod/task-1-cecda48866f94f90a3357d96206822b6   Successfully pulled image "" in 5.264s (5.264s including waiting). Image size: 76701464 bytes.
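
For anyone unfamiliar with the repo policy change mentioned above, here is a minimal sketch of what a cross-account pull statement on an ECR repository could look like. It is not taken from the cross-account-ecr action: the function name `build_pull_policy`, the `Sid`, and the account IDs are all hypothetical placeholders, and the action list is simply the standard set of ECR permissions needed to pull an image.

```python
import json

# Standard ECR permissions required to pull an image.
# Assumption: the real action may grant a different set.
PULL_ACTIONS = [
    "ecr:BatchCheckLayerAvailability",
    "ecr:BatchGetImage",
    "ecr:GetDownloadUrlForLayer",
]


def build_pull_policy(account_ids):
    """Build an ECR repository policy letting the given AWS accounts pull images.

    Hypothetical helper for illustration only.
    """
    # De-duplicate and sort so the generated policy is stable between runs.
    principals = [
        f"arn:aws:iam::{account_id}:root"
        for account_id in sorted(set(account_ids))
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountPull",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                "Action": PULL_ACTIONS,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account IDs, not the real APC accounts.
    policy = build_pull_policy(["111111111111", "222222222222"])
    print(json.dumps(policy, indent=2))
```

A policy of this shape only covers pulling the image, which matches the events above: the pull succeeds, while anything the task does at runtime still depends on IRSA.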
jacobwoffenden commented 2 weeks ago

APC OIDC added to APDP

jacobwoffenden commented 2 weeks ago

We've tested @AntFMoJ's toy DAG on APC with cross-account IRSA and it's working 🎉

Unfortunately, we are now blocked pending a discussion with Modernisation Platform about reuse of network ranges.

jacobwoffenden commented 1 week ago


jacobwoffenden commented 4 days ago

Moving to blocked while we figure out how to proceed with Direct Connect.