terraform-aws-modules / terraform-aws-app-runner

Terraform module to create AWS App Runner resources 🇺🇦
https://registry.terraform.io/modules/terraform-aws-modules/app-runner/aws
Apache License 2.0
33 stars 19 forks source link

Allow image_identifier to be lifecycle ignored #13

Closed booi closed 2 months ago

booi commented 4 months ago

Is your request related to a new offering from AWS?

No

Is your request related to a problem? Please describe.

Yes. When using apprunner in image mode you supply the image_identifier which includes the tag. Typically docs will just write someorg/someproject:latest.

However, in production you usually deploy a specific tag (in our case a semver or git sha). someorg/someproject:v1.5. This is to prevent accidentally deploying the wrong version due to a race condition in whoever sets the latest tag last.

But if you deploy a specific tag and then rerun terraform, terraform picks up that the image_identifier has changed and wants to reset it back to latest.

Describe the solution you'd like.

An option that would effectively add a lifecycle ignore_changes to the image_identifier field. That way terraform can do the initial provision but subsequent deployments can change the image_identifier without bringing terraform out of sync.

Describe alternatives you've considered.

We switched our deploys to tag latest right before deployment. This doesn't eliminate the issue but reduces the window that it could be an issue.

Additional context

bryantbiggs commented 4 months ago

this sounds like a bug for the upstream AWS provider

booi commented 4 months ago

this sounds like a bug for the upstream AWS provider

Hi @bryantbiggs, could you explain how it might be an upstream AWS provider issue? The solution I mentioned does work in the AWS provider, it just needs to be configured in this module.

bryantbiggs commented 4 months ago

because of this statement (if its true):

But if you deploy a specific tag and then rerun terraform, terraform picks up that the image_identifier has changed and wants to reset it back to latest

Using the ignore changes lifecycle is a very heavy handed solution and I suspect nearly everyone using this module would not want that behavior

booi commented 4 months ago

because of this statement (if its true):

But if you deploy a specific tag and then rerun terraform, terraform picks up that the image_identifier has changed and wants to reset it back to latest

Using the ignore changes lifecycle is a very heavy handed solution and I suspect nearly everyone using this module would not want that behavior

Most people do not use terraform for deployment so I'm not sure how that can be true. Is there an alternative solution?

bryantbiggs commented 4 months ago

How do you know that?

booi commented 4 months ago

Well professionally I've worked for a number of companies from small to large (3000+) that all use terraform to provision and configure cloud resources but generally hand off deployment to a CI/CD like github actions, jenkins etc.

Terraform is inherently more risky than a typical deployment as it affects cloud infrastructure and certainly isn't the recommended way to do deployments with ECS or k8s. It would also require you to commit the new image tag to git from a terraform action which isn't that common and doesn't play well in larger multiuser environment.

Have you seen different?

bryantbiggs commented 4 months ago

Disclaimer: I work at AWS as a container specialist with our container service teams (EKS, ECS, Lambda, etc.)

Have you seen different?

I have seen quite a bit 😬

Terraform is inherently more risky than a typical deployment as it affects cloud infrastructure and certainly isn't the recommended way to do deployments with ECS or k8s. It would also require you to commit the new image tag to git from a terraform action which isn't that common and doesn't play well in larger multiuser environment.

These are all very subjective statements. However, there are various parts where static configurations meet dynamically changing values (artifact versions and autoscaling values are the two most common culprits). You can see some of these pains here and here

In short, to support these we'd have to do a hacky thing by supporting dual resources - one with a lifecycle ignore block, and another without. Its a pattern that I abhor and am reluctant to support if at all possible. Instead, I would encourage users to explore alternative avenues, such as using an ECR data source and pulling the image value from there as the input to your AppRunner module configuration, or sometimes SSM parameter store works just as well

booi commented 4 months ago

I would encourage users to explore alternative avenues, such as using an ECR data source and pulling the image value from there as the input to your AppRunner module configuration

OHH I’m not sure why I didn’t think of that. That’s definitely a better approach. Let me see if that’s doable and if there’s a “first-run” issue. Thanks!

booi commented 4 months ago

@bryantbiggs I did a bunch of digging into this as it did come up again and I'm kind of at a loss. I tried to use an aws_ecr_image data query to insert the correct image into the image configuration which seems to work 'ok'-ish with 2 fairly major problems.

  1. On first run, the ECR repo is empty (terraform just created it). So the data query actually fails the entire terraform plan. It requires a lot of manual work to get it past this point.
  2. It's not easy to select the correct image. The 2 best candidates are (1) the latest tag or (2) the most recent image. Both have their shortcomings. (1) AppRunner only actually deploys if the deployed image tag is different than the one currently running. So the actual apprunner service will not be using the latest tag. Even if we also tag the current image with latest, that means terraform will force apprunner to redeploy to the latest tag even if it's the same causing terraform to run for at least 2-5 minutes waiting for the service to redeploy. (2) the most recent image is not necessarily the "current" image. CI/CD will build and push a container even if it doesn't intend on running it (e.g. a dev or test container).

I'm not sure how this module fits into a production environment with these shortcomings. Let me know if you think there's a different workaround. I do feel like an optional lifecycle policy on the image configuration would solve all these problems.

turingbeing commented 4 months ago

(2) the most recent image is not necessarily the "current" image. CI/CD will build and push a container even if it doesn't intend on running it (e.g. a dev or test container).

We segregate our ECR infrastructure to facilitate this, Dev / QA / UAT images are pushed to a different ECR, and only Approved Releases that are merged to main are built and pushed to the "Live / Production" ECR.

We also employ a fix-forward strategy, so generally don't rollback to previous container versions, most of the time those are caught by staging before making it to live.

We don't use the latest tag, but instead the short hash from the repo, which also corresponds to "app version". In our case most_recent works nicely.

Deployments are handled by Buildkite.

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot] commented 2 months ago

This issue was automatically closed because of stale in 10 days

github-actions[bot] commented 1 month ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.