Open jacobwoffenden opened 5 months ago
Blocked while Airflow component is being worked on
Comms sent to ask-data-engineering with sheet to fill in https://docs.google.com/spreadsheets/d/1B8DOsSgnxGV1FjRv8dLv0wqDMo2RiiMqedFogLBpQEQ
Moving back to blocked while IRSA is being worked on
I've cut a new release of the cross-account-ecr action and published a new version of template-airflow-python, which uses the new v1 action and correctly adds the APC accounts to the repo policy.
I then updated the example dag to use the new image version and APC dev context (https://github.com/moj-analytical-services/airflow/pull/3613) and below is the output when running it (even though it fails because it can't use IRSA yet, it still pulls)
vscode ➜ /workspaces/modernisation-platform-environments (main) [ aws: analytical-platform-compute-development:modernisation-platform-sandbox@eu-west-2 ] [ context: arn:aws:eks:eu-west-2:381491960855:cluster/analytical-platform-compute-development ] $ kubectl --namespace airflow get events
LAST SEEN TYPE REASON OBJECT MESSAGE
59s Normal Scheduled pod/task-1-cecda48866f94f90a3357d96206822b6 Successfully assigned airflow/task-1-cecda48866f94f90a3357d96206822b6 to ip-10-200-33-237.eu-west-2.compute.internal
58s Normal Pulling pod/task-1-cecda48866f94f90a3357d96206822b6 Pulling image "189157455002.dkr.ecr.eu-west-1.amazonaws.com/template-airflow-python:v0.4"
53s Normal Pulled pod/task-1-cecda48866f94f90a3357d96206822b6 Successfully pulled image "189157455002.dkr.ecr.eu-west-1.amazonaws.com/template-airflow-python:v0.4" in 5.264s (5.264s including waiting). Image size: 76701464 bytes.
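For anyone following along, the repo policy change described above can be sketched roughly as below. This is a minimal illustration and not the actual cross-account-ecr action: the function name, the `Sid`, and the placeholder account ID are assumptions. The three ECR actions are the standard ones needed for a cross-account image pull.

```python
import json


def cross_account_pull_policy(account_ids):
    """Build an ECR repository policy letting the given AWS accounts pull images.

    Hypothetical helper for illustration; the real action may build its
    policy differently.
    """
    principals = [f"arn:aws:iam::{account_id}:root" for account_id in sorted(account_ids)]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountPull",  # assumed name
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:BatchGetImage",
                    "ecr:GetDownloadUrlForLayer",
                ],
            }
        ],
    }


# Placeholder account ID, not a real APC account.
print(json.dumps(cross_account_pull_policy(["111111111111"]), indent=2))
```

Granting pull on the repository only covers fetching the image, which matches the events above: the pull succeeds even though the task itself still fails without IRSA.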
APC OIDC added to APDP
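For context, once a cluster's OIDC provider is registered in the account that owns the job roles, each role needs a trust policy that accepts web identity tokens from that provider. A rough sketch of what such a trust policy looks like, assuming the standard IRSA pattern; the helper name, account ID, issuer URL, and service account name are all placeholders and not taken from our config:

```python
def irsa_trust_policy(account_id, oidc_issuer, namespace, service_account):
    """Build an IAM role trust policy for IRSA.

    account_id is the account where the OIDC provider is registered
    (the role's own account in the cross-account case).
    Hypothetical helper for illustration only.
    """
    issuer = oidc_issuer
    if issuer.startswith("https://"):
        issuer = issuer[len("https://"):]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Restrict to one service account in one namespace.
                        f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{issuer}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


# All values below are placeholders.
example = irsa_trust_policy(
    account_id="111111111111",
    oidc_issuer="https://oidc.eks.eu-west-2.amazonaws.com/id/EXAMPLE",
    namespace="airflow",
    service_account="example-job",
)
```

The `sub` condition is what scopes the role to a single Kubernetes service account, so a pod in another namespace cannot assume it.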
We've tested @AntFMoJ's toy DAG on APC with cross-account IRSA and it's working 🎉
Unfortunately we are now blocked in discussion with Modernisation Platform about reuse of network ranges.
Updates:
Moving to blocked while we figure out how to proceed with Direct Connect.
Meeting with HMCTS' network architect on 11/07/24 @ 11:30 BST
Escalated to HMCTS head of DTS people and profession on 24th July 2024. Our ask has now been raised with the lead PlatOps in HMCTS. Currently awaiting a response. If there is no movement by the end of the week, we will escalate to Martyn.
Meeting held with DTS PlatOps on 5/8/24 and the ask has been escalated. Waiting for a meeting to be arranged with HMCTS stakeholders.
Meeting with HMCTS arranged for 5/9/24 to discuss scope of work
Had meeting with HMCTS, they are going to put us in touch with CloudGateway
Sent chaser email on 15/10 and 22/10
Meeting arranged with CloudGateway for 4/11
VPN endpoint data sent to CloudGateway, awaiting response
Updated VPN configuration parameters and sent over. Apparently we are waiting on commercials too.
Pencilled some time in on Thursday 21/11 to bridge with CGW
nonprod was cut over on 21/11 🎉 prod is being arranged for 27/11
Now 2/12, maintenance posted https://status.analytical-platform.service.justice.gov.uk/posts/details/PK6KZ5V
User Story
As an Analytical Platform engineer I want (current) Airflow jobs to schedule on APC So that we can fully retire the Airflow EKS clusters
Value / Purpose
Airflow EKS clusters are partially managed in Terraform, pinned to IMDSv1, use kube2iam, and have no observability
Migrating these workloads to APC will allow us to retire more clusters and make use of the newer capabilities in EKS and the supported tooling.
Useful Contacts
@jacobwoffenden
User Types
Platform Engineering
Hypothesis
If we... [do a thing] Then... [this will happen]
Proposal
Migrate Airflow workloads to APC
Additional Information
This was sort of started in DPAT https://github.com/ministryofjustice/analytical-platform/issues/2843 but never happened
Blocked by:
Definition of Done