ministryofjustice / analytical-platform

Analytical Platform • This repository is defined and managed in Terraform
https://docs.analytical-platform.service.justice.gov.uk
MIT License

✨ Grant Airflow DAG permission to trigger create-a-derived-table GitHub workflow #4166

Open SoumayaMauthoorMOJ opened 5 months ago

SoumayaMauthoorMOJ commented 5 months ago

Describe the feature request.

Grant the Airflow DAG permission to trigger the create-a-derived-table GitHub workflow using the most appropriate method, for example saving a GitHub personal access token to AWS Secrets Manager or Parameter Store, or using a GitHub App.

Describe the context.

We currently have no way of chaining Airflow and create-a-derived-table pipelines other than estimating when the Airflow pipeline will finish and scheduling the workflow for after that time. There are several possible solutions, but a simple workaround is to use an Airflow BashOperator that runs a curl command to trigger the GitHub Action through a workflow_dispatch event, e.g.:

curl -L \
  -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer <token>" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  https://api.github.com/repos/moj-analytical-services/create-a-derived-table/actions/workflows/<action_name>/dispatches \
  -d '{"ref":"main"}'

This would require the Airflow DAG to have the relevant GitHub permissions.
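A minimal sketch of the same request wrapped as a shell function that an Airflow BashOperator could call. It assumes the token is exposed to the task as an environment variable named `GITHUB_TOKEN` (a hypothetical name) and leaves the workflow as a caller-supplied argument, since the dispatch endpoint accepts either a workflow file name or a workflow ID. GitHub responds with HTTP 204 No Content when the dispatch is accepted, so the function treats any other status as a failure.

```shell
# Hypothetical helper for an Airflow BashOperator: send a workflow_dispatch event
# to create-a-derived-table. GITHUB_TOKEN is an assumed environment variable name.
# Usage: trigger_workflow <workflow-file-or-id> [ref]
trigger_workflow() {
  workflow="${1:-}"
  ref="${2:-main}"   # branch the workflow should run on
  if [ -z "$workflow" ]; then
    echo "usage: trigger_workflow <workflow-file-or-id> [ref]" >&2
    return 2
  fi
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "GITHUB_TOKEN is not set" >&2
    return 1
  fi

  # Print only the HTTP status code; GitHub answers 204 No Content on success.
  status=$(curl -sS -L -o /dev/null -w '%{http_code}' \
    -X POST \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    "https://api.github.com/repos/moj-analytical-services/create-a-derived-table/actions/workflows/${workflow}/dispatches" \
    -d "{\"ref\":\"${ref}\"}") || return 1

  if [ "$status" != "204" ]; then
    echo "workflow dispatch failed with HTTP ${status}" >&2
    return 1
  fi
}
```

If this is the last command in the BashOperator's `bash_command`, a non-zero return status fails the Airflow task, so a rejected dispatch shows up in Airflow instead of passing silently.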

Value / Purpose

No response

User Types

No response

SoumayaMauthoorMOJ commented 5 months ago

@jacobwoffenden let me know if you need more info :-)

SoumayaMauthoorMOJ commented 5 months ago

According to GitHub:

Personal access tokens are intended to access GitHub resources on behalf of yourself. To access resources on behalf of an organization, or for long-lived integrations, you should use a GitHub App. For more information, see "About creating GitHub Apps."

SoumayaMauthoorMOJ commented 4 months ago

@tomholt1 as discussed, please liaise with the AP team on this ticket :-)

Ed-Bajo commented 4 months ago

@tomholt1 - Can we just clarify what the ask is for the AP team here? We can definitely generate a PAT for the runners, but it's worth us understanding what other support your team will require (if any).

tomholt1 commented 4 months ago

Hey @Ed-Bajo, I think the context has changed a little since the ticket was opened. We won't need a PAT, as we plan to create a GitHub App and authenticate with it using a JWT. All we need is the relevant permissions to run the above curl command in a DAG, so that it triggers a GitHub Action when the Airflow job succeeds.

jacobwoffenden commented 4 months ago

@tomholt1 if you're going down the GitHub App route, I'm not sure you need input from our team any more; our team was originally suggested because we could provision a fine-grained access token using @moj-data-platform-robot.

jacobwoffenden commented 4 months ago

Is this the GitHub App https://github.com/organizations/moj-analytical-services/settings/apps/airflow-dags-github-actions ?

tomholt1 commented 4 months ago

Nope, I haven't set one up yet, although I'd be intrigued to know who set that up, as I wonder if they're trying to achieve the same thing we are. So just to confirm: we don't need any updated Airflow permissions to run the above curl command in an Airflow DAG?

jacobwoffenden commented 4 months ago

I can't easily see who created a GitHub App, or when.

As for using a GitHub App to trigger a GitHub Actions workflow, I'm not sure this is something the Analytical Platform would want to facilitate, so it would be the responsibility of Data Engineering to ensure the application is correctly configured, and the tokens are stored securely for consumption within an Airflow DAG.
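For the token or private key to be stored securely for consumption within an Airflow DAG, one option is to read it from AWS at task run time rather than embedding it in the DAG code. The sketch below assumes the AWS CLI is available in the task's environment and that the task's IAM role is allowed to call `secretsmanager:GetSecretValue` or `ssm:GetParameter` on the named item; the secret and parameter names are hypothetical, and this does not describe how the Analytical Platform actually provisions secrets.

```shell
# Hypothetical helpers: read credentials from AWS at task run time.
# Assumes the AWS CLI is installed and the task's IAM role may read the named item.

# Usage: fetch_secret <secret-id>   (AWS Secrets Manager)
fetch_secret() {
  secret_id="${1:-}"
  if [ -z "$secret_id" ]; then
    echo "usage: fetch_secret <secret-id>" >&2
    return 2
  fi
  aws secretsmanager get-secret-value \
    --secret-id "$secret_id" \
    --query SecretString \
    --output text
}

# Usage: fetch_parameter <parameter-name>   (SSM Parameter Store, decrypting SecureString values)
fetch_parameter() {
  name="${1:-}"
  if [ -z "$name" ]; then
    echo "usage: fetch_parameter <parameter-name>" >&2
    return 2
  fi
  aws ssm get-parameter \
    --name "$name" \
    --with-decryption \
    --query Parameter.Value \
    --output text
}
```

To hand a private key to openssl, it could be written to a file only the current user can read, e.g. `umask 077; pem_file=$(mktemp); fetch_secret example/github-app-private-key > "$pem_file"`, and deleted once it is no longer needed.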

It might also be worth consulting with @ministryofjustice/operations-engineering about GitHub App vs. fine-grained token for this use case.

tomholt1 commented 3 months ago

Updated use case

We currently have no way of chaining Airflow and create-a-derived-table pipelines other than estimating when the Airflow pipeline will finish and scheduling the workflow for after that time. There are several possible solutions, but a simple one is to use an Airflow BashOperator that runs a curl command to trigger the GitHub Action through a workflow_dispatch event, e.g.:

curl -L \
  -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer <auth_token>" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  https://api.github.com/repos/moj-analytical-services/create-a-derived-table/actions/workflows/<action_name>/dispatches \
  -d '{"ref":"main"}'

The plan is now to create a GitHub App in moj-analytical-services to control the permissions for the above curl request. After some digging, it looks like we will need to create a JWT to authenticate the request.

More on JWTs for authentication: we will need a PEM file (the app's private key) and a Client ID to generate the JWT.
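A sketch of generating that JWT in shell with openssl, following the claims GitHub documents for app JWTs: RS256 signing with the app's private key, `iat` backdated by 60 seconds to allow for clock drift, `exp` no more than 10 minutes ahead, and `iss` set to the app's Client ID. The function names and arguments are illustrative. Note that a JWT on its own only authenticates as the app itself; it would still need to be exchanged for an installation access token before calling the workflow dispatch endpoint.

```shell
# Base64url-encode stdin (RFC 7515): no line breaks, '+/' -> '-_', padding stripped.
b64url() {
  openssl base64 -A | tr '+/' '-_' | tr -d '='
}

# Hypothetical helper: print a GitHub App JWT signed with RS256.
# Usage: generate_app_jwt <client-id> <path-to-private-key.pem>
generate_app_jwt() {
  client_id="${1:-}"
  pem_file="${2:-}"
  if [ -z "$client_id" ] || [ ! -r "$pem_file" ]; then
    echo "usage: generate_app_jwt <client-id> <pem-file>" >&2
    return 2
  fi

  now=$(date +%s)
  iat=$((now - 60))    # backdate to tolerate clock drift
  exp=$((now + 540))   # stay inside GitHub's 10-minute maximum

  header=$(printf '%s' '{"typ":"JWT","alg":"RS256"}' | b64url)
  payload=$(printf '{"iat":%s,"exp":%s,"iss":"%s"}' "$iat" "$exp" "$client_id" | b64url)

  # RS256 is RSASSA-PKCS1-v1_5 over SHA-256, which openssl dgst -sign produces for an RSA key.
  signature=$(printf '%s' "${header}.${payload}" \
    | openssl dgst -sha256 -binary -sign "$pem_file" | b64url)
  if [ -z "$signature" ]; then
    echo "signing failed" >&2
    return 1
  fi
  printf '%s.%s.%s\n' "$header" "$payload" "$signature"
}
```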

The GitHub App should hold the following permissions:

I don't have the access needed to create a GitHub App, so it would be great if one could be created for us, along with some guidance on authenticating. Thanks team!
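On the authentication guidance: GitHub's flow is to use the app JWT to request a short-lived installation access token (valid for one hour) and then send that token as the `Bearer` value in the dispatch request. A minimal sketch follows. It assumes the app is installed on moj-analytical-services with the Actions repository permission set to read and write (which the workflow dispatch endpoint requires), and that the installation ID is known; it can be looked up with the JWT via `GET /repos/{owner}/{repo}/installation`. The request body narrows the token to the create-a-derived-table repository and `actions: write`. The function names are hypothetical, and the token is parsed with sed rather than assuming jq is installed on the workers.

```shell
# Hypothetical helper: pull the "token" field out of the access_tokens JSON response.
extract_token() {
  sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: exchange a GitHub App JWT for an installation access token
# scoped to create-a-derived-table with actions:write.
# Usage: get_installation_token <jwt> <installation-id>
get_installation_token() {
  jwt="${1:-}"
  installation_id="${2:-}"
  if [ -z "$jwt" ] || [ -z "$installation_id" ]; then
    echo "usage: get_installation_token <jwt> <installation-id>" >&2
    return 2
  fi

  # --fail makes curl exit non-zero on HTTP errors (e.g. 401 for an expired JWT).
  response=$(curl -sS --fail -X POST \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer ${jwt}" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    "https://api.github.com/app/installations/${installation_id}/access_tokens" \
    -d '{"repositories":["create-a-derived-table"],"permissions":{"actions":"write"}}') || return 1

  token=$(printf '%s\n' "$response" | extract_token)
  if [ -z "$token" ]; then
    echo "no token found in GitHub response" >&2
    return 1
  fi
  printf '%s\n' "$token"
}
```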

darren1988 commented 1 month ago

To be discussed at the next refinement session.

SoumayaMauthoorMOJ commented 1 month ago

@darren1988 can you invite @tomholt1 and me to the refinement session? I think @tomholt1 has made some progress on this with another team.

tomholt1 commented 1 month ago

No progress has been made, as I need the GitHub App to be created first; I've just chased @darren1988 on this.

SoumayaMauthoorMOJ commented 1 month ago

oops :-) thanks for clarifying

SoumayaMauthoorMOJ commented 2 days ago

Any update on this? I'm conscious @tomholt1 and I will be leaving soon :-)

darren1988 commented 2 days ago

@SoumayaMauthoorMOJ this ticket has been refined and is scheduled for our next sprint, running from 10/10/24 to 30/10/24.