Closed SoumayaMauthoorMOJ closed 7 months ago
@SoumayaMauthoorMOJ - Note that the proposed solution (option 3 of #3777 ) would not be implemented by the team. We can have a chat on an alternative solution around creating Terraform.
Sure can you set something up for Monday 29th April?
Hi @SoumayaMauthoorMOJ!
Can you clarify what the result bucket will be please
Thanks!
@jacobwoffenden s3://mojap-athena-query-dump/{workgroup} please :-) (s3://dbt-query-dump/ is specific to create-a-derived-table)
Describe the feature request.
Create Athena workgroups for Airflow athena-based workflows using IAC with 1TB query limit and following tags based on data engineering tagging convention:
Similar to here (but written in Pulumi)
Ideally we would create workgroups per domain, but this will require more effort in terms of coming up with appropriate domains and helping users assign the appropriate domains. So I suggest we should just stick with these workgroups for now.
There will be a separate ticket to see if there's a way of setting a default workgroup via Airflow-wide environment variables
Describe the context.
Athena costs spiked in February:
https://mojdt.slack.com/archives/CBVUV2613/p1710165396335149
It was difficult to track the culprit because most people and airflow use the primary workgroup which means all costs is lumped into one bucket
Value / Purpose
Using named workgroups with tags allows you to have better Athena cost granularity and to set limits.
This is an interim solution until we build Athena monitoring solution to track costs at the per user level.
User Types
No response
Timelines
Ideally as quickly as possible, but more realistically 1st week of April
Definition of Done