nikhils98 / prefect-work-pool-hosting

Self host Prefect with ECS work pool using Pulumi
https://nikhils98.github.io/posts/self-host-prefect-with-ecs-work-pool/

Lack of documentation #1

Open mpecovnikplanet opened 2 weeks ago

mpecovnikplanet commented 2 weeks ago

Thank you very much for this example repo. It is really helping me with my own problem. I do think the example would be even better if you added some docs to the Pulumi-based classes explaining what they actually do.

For instance, it is not clear what DeploymentFargateTaskDef actually does compared to the worker service. Do I understand correctly, that

I am new to ECS, so sorry if I am being stupid. Thank you in advance for your answers.

nikhils98 commented 1 week ago

Hi, thanks for reaching out! I'm very glad that you find this repository useful. Unfortunately, I haven't had a chance to add documentation for the Pulumi codebase, so I agree that some parts may be difficult to follow, which is why I appreciate you asking this question.

Your understanding of server_ecs_service and worker_ecs_service is absolutely right.

deployment_fargate_task_def, on the other hand, is unrelated to the flow that "does the work". If you look at the main function in the main.py file at the root of the repository, you'll notice the line mean_and_median.deploy. This line updates the deployment in the Prefect server and is run automatically via GitHub Actions when changes are merged into main (see build_and_deploy.yml).

However, our Prefect server is not accessible from the internet (the reason is explained under the section Dynamically provision infrastructure in the accompanying guide), so GitHub cannot reach it directly. Instead, the workflow launches a Fargate task that runs in the same VPC as the Prefect server, and that Fargate task performs the deployment. deployment_fargate_task_def is the task definition for that Fargate task.
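To make the idea concrete, here is a minimal sketch of how a CI job could start such a one-off Fargate task with boto3's ECS run_task call. It is not the code from build_and_deploy.yml. The function names, the cluster name, the subnet and security group IDs, and the choice to disable the public IP are all assumptions made for illustration.

```python
from typing import Any


def build_run_task_request(
    cluster: str,
    task_definition: str,
    subnet_ids: list[str],
    security_group_ids: list[str],
    assign_public_ip: str = "DISABLED",  # assumption: task sits in private subnets
) -> dict[str, Any]:
    """Build the keyword arguments for boto3's ECS run_task call."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        # Fargate tasks use awsvpc networking, so the subnets decide which
        # VPC the task runs in. Use subnets of the Prefect server's VPC.
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnet_ids,
                "securityGroups": security_group_ids,
                "assignPublicIp": assign_public_ip,
            }
        },
    }


def launch_deployment_task(request: dict[str, Any]) -> str:
    """Start the task and return its ARN. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    response = boto3.client("ecs").run_task(**request)
    return response["tasks"][0]["taskArn"]


# Hypothetical values, for illustration only.
request = build_run_task_request(
    cluster="prefect-cluster",
    task_definition="deployment-task-def",
    subnet_ids=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
)
```

Calling launch_deployment_task(request) from the CI job would then start the task inside the VPC, where it can reach the Prefect server even though GitHub itself cannot.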

I hope this is useful.

One interesting thing to note is that our deployment is configured to always fetch the image tagged as mean_and_median-latest from ECR. So unless you change anything in the args to the mean_and_median.deploy line, there's no need to run the deployment task. But this is an optimization that I deliberately didn't go into as it would have made things even more complicated.
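One possible way to implement that optimization, as a rough sketch rather than anything the repository does: fingerprint the arguments passed to mean_and_median.deploy and only launch the deployment task when the fingerprint changes. The helper names and the example argument values below are hypothetical, and where the previous fingerprint is stored between runs is left open.

```python
import hashlib
import json
from typing import Any, Optional


def deploy_args_fingerprint(deploy_args: dict[str, Any]) -> str:
    """Return a stable SHA-256 hex digest of the deploy arguments."""
    # sort_keys makes the digest independent of dictionary key order.
    canonical = json.dumps(deploy_args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_redeploy(
    deploy_args: dict[str, Any], previous_fingerprint: Optional[str]
) -> bool:
    """True when no fingerprint was stored yet or the arguments have changed."""
    return previous_fingerprint != deploy_args_fingerprint(deploy_args)


# Hypothetical arguments; the real ones live in main.py.
args = {
    "name": "mean-and-median",
    "work_pool_name": "ecs-pool",
    "image": "example-registry/example-repo:mean_and_median-latest",
}
stored = deploy_args_fingerprint(args)
```

With this in place, a code-only change that pushes a new image under the same mean_and_median-latest tag leaves the fingerprint unchanged, so needs_redeploy(args, stored) returns False and the deployment task can be skipped.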