ministryofjustice / analytical-platform

Analytical Platform • This repository is defined and managed in Terraform
https://docs.analytical-platform.service.justice.gov.uk
MIT License

✨ Add CloudTrail to AP Users permissions to allow for better Airflow Error Tracking #4368

Closed mshodge closed 2 months ago

mshodge commented 5 months ago

Describe the feature request.

Airflow is used by a lot of analysts and data scientists, but sometimes jobs fail without adequate logging information in the Airflow UI. When I was a Data Engineer, my go-to would be to open CloudTrail and look for why the job was failing. However, since leaving the Data Engineering team I no longer have access to the CloudTrail logs. As part of my MLOps work we are exploring how to use Airflow for model training; however, some users of Airflow for model training have experienced issues with their jobs getting stuck and not completing. I would like to know why these things happen, and the only way of doing that is to check the CloudTrail logs.

The CloudTrail logs, from experience, can be dense (as so much traffic occurs), but with the right guidance we can help AP users use this as a resource to look at Airflow failures.
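As a rough illustration of what that guidance could look like, here is a minimal Python sketch that cuts CloudTrail output down to the failed API calls made under a given role. It assumes the user has been granted `cloudtrail:LookupEvents` (which is what this request is asking for) and that the Airflow job's IAM role can be recognised by a fragment of its ARN. The function names and the role name `airflow_prod_my_dag` are hypothetical, not anything that exists on the platform today.

```python
import json
from datetime import datetime, timedelta, timezone


def fetch_recent_events(hours=1):
    """Fetch recent CloudTrail events with boto3 (needs AWS credentials)."""
    import boto3  # imported lazily so the filter below works without AWS access

    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    paginator = boto3.client("cloudtrail").get_paginator("lookup_events")
    events = []
    for page in paginator.paginate(StartTime=start, EndTime=end):
        events.extend(page["Events"])
    return events


def failed_events(events, role_fragment):
    """Return (time, API call, error code, message) for failed calls by a role.

    `events` is a list of dicts shaped like the items returned by
    lookup_events: each has a "CloudTrailEvent" key holding the full
    record as a JSON string.
    """
    failures = []
    for event in events:
        record = json.loads(event["CloudTrailEvent"])
        # Successful calls carry no errorCode field, so skip them.
        if "errorCode" not in record:
            continue
        # Keep only calls made under the role we are interested in.
        arn = record.get("userIdentity", {}).get("arn", "")
        if role_fragment not in arn:
            continue
        failures.append(
            (
                record.get("eventTime"),
                record.get("eventName"),
                record["errorCode"],
                record.get("errorMessage", ""),
            )
        )
    return failures
```

A user would then call something like `failed_events(fetch_recent_events(hours=2), "airflow_prod_my_dag")` and read a handful of rows instead of the full trail. Note that `lookup_events` only covers the last 90 days of event history, so older failures would need a different route.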

An alternative is to create a service where users can search the Control Panel for their relevant CloudTrail logs, but this seems to be a bigger task and would rely on users being linked to their Airflow jobs (which aren't currently recorded on the Control Panel) or on users registering their Airflow jobs on the CP as well.

Describe the context.

No response

Value / Purpose

Helps users understand issues with their Airflow runs without having to report to the AP team straight away.

User Types

No response

jhpyke commented 4 months ago

Hi,

Just to clarify the ask in this case: is this referring to the CloudWatch log groups currently accessible via the Airflow UI (to allow for better filtering using CloudWatch), or the CloudTrail logs generated by the actions that a given DAG takes? Or some third option I hadn't considered? Just so we can better understand the permissions desired and how they might compare to the currently available permission set.

Thanks, Jake

github-actions[bot] commented 2 months ago

This issue is being marked as stale because it has been open for 60 days with no activity. Remove stale label or comment to keep the issue open.

github-actions[bot] commented 2 months ago

This issue is being closed because it has been open for a further 7 days with no activity. If this is still a valid issue, please reopen it. Thank you!