hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

Terraform changes are always detected with the Cloud run revision when deploying with the gcloud CLI using tags #17218

Open JSkimming opened 7 months ago

JSkimming commented 7 months ago

TL;DR To work around the issue

Thanks to @yanweiguo for the fix in this comment.

The issue occurs because the gcloud CLI generates a revision if a tag is applied when updating the Cloud Run service. The fix is to update the service without a tag, then apply the tag in a separate, later command.

Here are the working steps:

  1. terraform apply
  2. gcloud run services update revision-cant-be-ignored --project my-example-proj --region northamerica-northeast2 --no-traffic --image us-docker.pkg.dev/cloudrun/container/hello
  3. gcloud run services update-traffic revision-cant-be-ignored --project my-example-proj --region northamerica-northeast2 --set-tags staging=LATEST
  4. gcloud run services update-traffic revision-cant-be-ignored --project my-example-proj --region northamerica-northeast2 --remove-tags staging --to-latest
  5. terraform apply -refresh-only
  6. terraform plan
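
Steps 2 to 4 above could be wrapped in a small script. This is only a sketch: the helper names (run, deploy_to_staging, promote_staging) and the DRY_RUN variable are hypothetical, while the gcloud commands are the ones listed in the steps.

```shell
#!/bin/sh
# Sketch of steps 2-4 above. The helper names and DRY_RUN are hypothetical.
SERVICE="revision-cant-be-ignored"
PROJECT="my-example-proj"
REGION="northamerica-northeast2"
IMAGE="us-docker.pkg.dev/cloudrun/container/hello"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Steps 2 and 3: deploy without a tag, then tag the latest revision separately.
deploy_to_staging() {
  run gcloud run services update "$SERVICE" --project "$PROJECT" --region "$REGION" \
    --no-traffic --image "$IMAGE" &&
  run gcloud run services update-traffic "$SERVICE" --project "$PROJECT" --region "$REGION" \
    --set-tags staging=LATEST
}

# Step 4: remove the tag and send all traffic to the latest revision.
promote_staging() {
  run gcloud run services update-traffic "$SERVICE" --project "$PROJECT" --region "$REGION" \
    --remove-tags staging --to-latest
}
```

Setting DRY_RUN=1 before calling deploy_to_staging prints the two commands instead of running them, which is a cheap way to check that no --tag flag sneaks into the update command.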

Terraform Version

1.7.3

Affected Resource(s)

google_cloud_run_v2_service

Terraform Configuration

provider "google" {
  project = "my-example-proj"
  region  = "northamerica-northeast2"
}

data "google_project" "project" {
}

resource "google_cloud_run_v2_service" "default" {
  name     = "revision-cant-be-ignored"
  location = "northamerica-northeast2"
  ingress  = "INGRESS_TRAFFIC_ALL"

  template {
    containers {
      image = "gcr.io/cloudrun/hello"

      env {
        name  = "FOO"
        value = "bar"
      }

      env {
        name  = "mickey"
        value = "mouse"
      }
    }
  }

  # Ignore this config as it is changed by a deployment.
  lifecycle {
    ignore_changes = [
      client,
      client_version,
      template[0].containers[0].image,
      #template[0].revision,
    ]
  }
}

Debug Output

If I DO NOT ignore the revision, terraform always detects changes after a deployment using the gcloud CLI.

```
data.google_project.project: Reading...
google_cloud_run_v2_service.default: Refreshing state... [id=projects/my-example-proj/locations/northamerica-northeast2/services/revision-cant-be-ignored]
data.google_project.project: Read complete after 2s [id=projects/my-example-proj]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # google_cloud_run_v2_service.default will be updated in-place
  ~ resource "google_cloud_run_v2_service" "default" {
        id   = "projects/my-example-proj/locations/northamerica-northeast2/services/revision-cant-be-ignored"
        name = "revision-cant-be-ignored"
        # (27 unchanged attributes hidden)

      ~ template {
          - revision = "revision-cant-be-ignored-00007-fic" -> null
            # (6 unchanged attributes hidden)

            # (2 unchanged blocks hidden)
        }

        # (1 unchanged block hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
```

If I DO ignore the revision, then terraform fails to apply as it tries to use the current revision.

```
google_cloud_run_v2_service.default: Modifying... [id=projects/my-example-proj/locations/northamerica-northeast2/services/revision-cant-be-ignored]
│ Error: Error updating Service "projects/my-example-proj/locations/northamerica-northeast2/services/revision-cant-be-ignored": googleapi: Error 409: Revision named 'revision-cant-be-ignored-00007-fic' with different configuration already exists.
│
│   with google_cloud_run_v2_service.default,
│   on main.tf line 9, in resource "google_cloud_run_v2_service" "default":
│    9: resource "google_cloud_run_v2_service" "default" {
│
```

Expected Behavior

Using Terraform to create the Cloud Run service and the gcloud CLI to deploy new images, I expect terraform plan to detect no changes when the image is ignored.

NOTE: I deploy using a tag before promoting 100% of traffic to the latest revision.

Actual Behavior

The Cloud Run service always shows as requiring changes in terraform plan after deploying new images using the gcloud CLI.

Steps to reproduce

  1. terraform apply
  2. gcloud run services update revision-cant-be-ignored --project my-example-proj --region northamerica-northeast2 --tag staging --no-traffic --image us-docker.pkg.dev/cloudrun/container/hello
  3. gcloud run services update-traffic revision-cant-be-ignored --project my-example-proj --region northamerica-northeast2 --remove-tags staging --to-latest
  4. terraform apply -refresh-only
  5. terraform plan

Changes are now detected.

```
data.google_project.project: Reading...
google_cloud_run_v2_service.default: Refreshing state... [id=projects/my-example-proj/locations/northamerica-northeast2/services/revision-cant-be-ignored]
data.google_project.project: Read complete after 2s [id=projects/my-example-proj]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # google_cloud_run_v2_service.default will be updated in-place
  ~ resource "google_cloud_run_v2_service" "default" {
        id   = "projects/my-example-proj/locations/northamerica-northeast2/services/revision-cant-be-ignored"
        name = "revision-cant-be-ignored"
        # (27 unchanged attributes hidden)

      ~ template {
          - revision = "revision-cant-be-ignored-00007-fic" -> null
            # (6 unchanged attributes hidden)

            # (2 unchanged blocks hidden)
        }

        # (1 unchanged block hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
```

Important Factoids

It's crucial that step 2 (above) uses a tag. If we deploy without a tag, the Cloud Run revision does not show up as a change in the final terraform plan:

  1. terraform apply
  2. gcloud run services update revision-cant-be-ignored --project my-example-proj --region northamerica-northeast2 --image us-docker.pkg.dev/cloudrun/container/hello
  3. terraform apply -refresh-only
  4. terraform plan

No changes are detected.

```
data.google_project.project: Reading...
google_cloud_run_v2_service.default: Refreshing state... [id=projects/my-example-proj/locations/northamerica-northeast2/services/revision-cant-be-ignored]
data.google_project.project: Read complete after 2s [id=projects/my-example-proj]

No changes. Your infrastructure matches the configuration.
```

References

No response

b/325032069

JSkimming commented 7 months ago

To add some context. Our deployment process uses Terraform to create infrastructure and the gcloud CLI to deploy changes, e.g. build docker images and update the cloud run service image.

As part of the release, we run terraform plan to detect changes, and if there are changes, the fully automated release process pauses, requiring manual approval before applying any infrastructural changes.
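
A gate like this can be sketched with terraform plan -detailed-exitcode, which exits with 0 when there are no changes, 1 on error, and 2 when changes are present. The function names below are hypothetical, and how the pipeline actually pauses for approval is not shown.

```shell
# Map the exit code of `terraform plan -detailed-exitcode` to a label.
# 0 = no changes, 2 = changes present, anything else = error.
classify_plan_exit() {
  case "$1" in
    0) echo "no-changes" ;;
    2) echo "changes" ;;
    *) echo "error" ;;
  esac
}

# Hypothetical gate: succeeds only when the release can proceed unattended.
release_gate() {
  terraform plan -detailed-exitcode -input=false >/dev/null
  code=$?
  result="$(classify_plan_exit "$code")"
  echo "terraform plan result: $result"
  [ "$result" = "no-changes" ]
}
```

With the behaviour described in this issue, release_gate would fail after every tagged deployment, because the plan reports the revision diff.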

We also deploy to a staging revision and execute final tests before promoting 100% of traffic to the latest revision. This bit is crucial as the problem does not occur if we deploy immediately sending 100% of traffic to the latest release (e.g. don't use tags).

But as we are detecting Terraform changes, all our releases require manual approval.

Our workaround is to ignore the Cloud Run revision, which works fine until we want to make Terraform changes to the Cloud Run service, at which point it fails because it cannot deploy a duplicate revision.

edwardmedia commented 7 months ago

@JSkimming not sure how much we can do here. Have you considered an import before you make Terraform changes to the Cloud Run service?

JSkimming commented 7 months ago

@edwardmedia Thanks for picking this up.

My view is: If the revision is not set in the Terraform files, a new one should always be set if changes are detected. Which is my reading of the documentation:

revision - (Optional) The unique name for the revision. If this field is omitted, it will be automatically generated based on the Service name.

In the scenario I describe, where the revision is not set using Terraform but is picked up in the terraform state after synchronising with the infrastructure, I am left with two choices:

  1. Ignore the revision, then I can't make changes to the Cloud run service (e.g. change an environment variable) with terraform, as it then fails with the message

    Error 409: Revision named 'revision-cant-be-ignored-00007-fic' with different configuration already exists.

  2. Don't ignore the revision, and always apply changes.

I think the behaviour should match the documentation "If this field is omitted, it will be automatically generated based on the Service name."

JSkimming commented 7 months ago

Have you considered an import before you make Terraform changes to the Cloud Run service?

@edwardmedia What would import do? The cloud-run service is already managed by Terraform, and is part of the state?

edwardmedia commented 7 months ago

@JSkimming because a diff has occurred between your config and the actual state on the server after you deployed the functions via gcloud. Import should bring your Terraform state to match the one on the server, and then you can apply whatever changes you need in the config. Keep the ignore .. part in the config. This approach might help in addressing the need below.

which works fine, until we do want to make Terraform changes to the Cloud run service, at which point it fails because it cannot deploy a duplicate revision.

JSkimming commented 7 months ago

@edwardmedia Thanks for the clarification. I've tried the import, but it gives the error Error: Resource already managed by Terraform

$ terraform import google_cloud_run_v2_service.default projects/my-example-proj/locations/northamerica-northeast2/services/revision-cant-be-ignored

google_cloud_run_v2_service.default: Importing from ID "projects/my-example-proj/locations/northamerica-northeast2/services/revision-cant-be-ignored"...
data.google_project.project: Reading...
google_cloud_run_v2_service.default: Import prepared!
  Prepared google_cloud_run_v2_service for import
data.google_project.project: Read complete after 2s [id=projects/my-example-proj]

│ Error: Resource already managed by Terraform
│
│ Terraform is already managing a remote object for google_cloud_run_v2_service.default. To import to this address you must first remove the existing object from the state.

Nonetheless, while I appreciate your suggestion, it still seems like a workaround, for which I already have one: to comment and uncomment the #template[0].revision line of the ignore_changes section.

Further background

I raise the issue because I believe there is either a bug or a flaw in the implementation. Namely, I can't rely on the default behaviour for auto-generating the revision.

The behaviour I believe it should provide is, by not specifying the revision it always auto-generates the revision. Also, changes to the cloud-run service that deploy new revisions outside of Terraform do not show as changes when running terraform plan.

Many fields in the state behave similarly, e.g. changes in the infrastructure that are maintained in the state do not show as changes with terraform plan but are updated with the next terraform apply. I'm suggesting revision be one such field.

I actually think there should not be a revision argument but instead a revision-suffix to mirror the CLI, but that may be outside the scope of this issue.

Summary

Can revision be auto-generated if omitted (as per the documentation), even if ignored?

edwardmedia commented 7 months ago

@JSkimming to solve the error you encountered during the import, you may want to review the state rm command before you run the import.

Thanks for the suggestion to compare the provider's functionality with gcloud's. While we try to make the Terraform tool as powerful as possible, there is a difference between them. Terraform still has limitations.

Taking a look at the api, some of the functionalities seem not supported directly.

I don't think there is a perfect solution for your case. Keep in mind, Terraform apply always tries to bring the state on the server up to match what you defined in the terraform config. Since you have updated the state on the server via gcloud, if you apply again without any changes in the config, what do you expect to happen?

ignore_changes is a workaround and might be helpful depending upon how you design your workflow.

Does this make sense?

JSkimming commented 6 months ago

@edwardmedia Thanks for the suggestion of using state rm. It didn't work, as there is still an inconsistency between the state of the terraform config and the data returned by the API. Namely, the terraform config does not specify a revision, and the API returns a revision.

However, your pointers to the API documentation did give me an alternative avenue for investigation.

When I deploy using a tag:

gcloud run services update revision-cant-be-ignored --tag staging --no-traffic --image gcr.io/cloudrun/hello

Then, after testing the tag-specific endpoint, I send all the traffic to latest and delete the tag.

gcloud run services update-traffic revision-cant-be-ignored --to-latest --clear-tags

At this point, the API returns a revision. Consequently, terraform plan highlights the inconsistency requiring a terraform apply.

But if I deploy without a tag:

gcloud run services update revision-cant-be-ignored --image us-docker.pkg.dev/cloudrun/container/hello

Then, the API does not return a revision, and terraform plan indicates there are no changes.

So, I have a workaround that involves making a second update that does not require manual intervention; the downside is the deployment creates two revisions instead of one.

JSkimming commented 6 months ago

Ultimately, though, the API is behaving inconsistently. By that I mean: deploying using a tag, then deleting the tag and sending all traffic to latest, leaves the service in the same state as deploying without a tag. Unfortunately, the API returns a revision in the former situation and no revision in the latter. I've compared the returned JSON; the only difference is computed values and the presence or absence of revision.
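
One way to check which of the two situations a service is in is to read the revision name from the service template. This is a sketch under assumptions: it assumes gcloud's default (v1-style) output, where the template revision name appears as spec.template.metadata.name, and the function names are hypothetical.

```shell
# Print the revision name set on the service template, or nothing if unset.
# Assumption: in gcloud's v1-style output this is spec.template.metadata.name.
template_revision() {
  gcloud run services describe "$1" --project "$2" --region "$3" \
    --format 'value(spec.template.metadata.name)'
}

# Label the value: empty means no revision is returned for the template.
revision_state() {
  if [ -z "$1" ]; then
    echo "no revision returned"
  else
    echo "revision returned: $1"
  fi
}
```

For example, revision_state "$(template_revision revision-cant-be-ignored my-example-proj northamerica-northeast2)" could be run after each deployment style to compare the two cases.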

I still believe this can and should be handled in the terraform provider, e.g. If no revision is specified in the configuration, then the provider could ignore the revision from the API.

Though to be consistent with the CLI, I think the terraform config should take revision_suffix, not a revision, and treat revision as a computed attribute as it does with latest_created_revision and latest_ready_revision.

What do you think, @edwardmedia?

edwardmedia commented 6 months ago

revision needs to be set computed?

JSkimming commented 6 months ago

revision needs to be set computed?

Yes.

Given how I now understand the limitations of revision, it makes the most sense that only the suffix can be specified (as with the CLI). The revision must start with <service name>-; if it doesn't, the API returns the error "The revision name must be prefixed by the name of the enclosing Service with a trailing -".
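
The prefix rule can be illustrated with a small check. This is a sketch only: it covers just the prefix rule quoted in the error message, not any other naming constraint the API may apply, and both function names are hypothetical.

```shell
# Succeeds when the revision name starts with "<service>-" followed by
# at least one more character. Only the prefix rule is checked.
valid_revision_name() {
  service="$1"
  revision="$2"
  case "$revision" in
    "$service"-?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Build a full revision name from a suffix, as a suffix-only input would.
revision_from_suffix() {
  echo "$1-$2"
}
```

With this check, "version-1" is rejected for the service revision-cant-be-ignored, while "revision-cant-be-ignored-version-1" is accepted.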

```
$ terraform apply
data.google_project.project: Reading...
data.google_project.project: Read complete after 2s [id=projects/my-example-proj]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # google_cloud_run_v2_service.default will be created
  + resource "google_cloud_run_v2_service" "default" {
      + conditions              = (known after apply)
      + create_time             = (known after apply)
      + creator                 = (known after apply)
      + delete_time             = (known after apply)
      + effective_annotations   = (known after apply)
      + effective_labels        = (known after apply)
      + etag                    = (known after apply)
      + expire_time             = (known after apply)
      + generation              = (known after apply)
      + id                      = (known after apply)
      + ingress                 = "INGRESS_TRAFFIC_ALL"
      + last_modifier           = (known after apply)
      + latest_created_revision = (known after apply)
      + latest_ready_revision   = (known after apply)
      + launch_stage            = (known after apply)
      + location                = "northamerica-northeast2"
      + name                    = "revision-cant-be-ignored"
      + observed_generation     = (known after apply)
      + project                 = "my-example-proj"
      + reconciling             = (known after apply)
      + terminal_condition      = (known after apply)
      + terraform_labels        = (known after apply)
      + traffic_statuses        = (known after apply)
      + uid                     = (known after apply)
      + update_time             = (known after apply)
      + uri                     = (known after apply)

      + template {
          + max_instance_request_concurrency = (known after apply)
          + revision                         = "version-1"
          + service_account                  = (known after apply)
          + timeout                          = (known after apply)

          + containers {
              + image = "gcr.io/cloudrun/hello"

              + env {
                  + name  = "FOO"
                  + value = "bar"
                }
              + env {
                  + name  = "mickey"
                  + value = "mouse"
                }
            }
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

google_cloud_run_v2_service.default: Creating...
╷
│ Error: Error creating Service: googleapi: Error 400: template.revision: The revision name must be prefixed by the name of the enclosing Service with a trailing -
│ Details:
│ [
│   {
│     "@type": "type.googleapis.com/google.rpc.BadRequest",
│     "fieldViolations": [
│       {
│         "description": "The revision name must be prefixed by the name of the enclosing Service with a trailing -",
│         "field": "template.revision"
│       }
│     ]
│   }
│ ]
│
│   with google_cloud_run_v2_service.default,
│   on main.tf line 9, in resource "google_cloud_run_v2_service" "default":
│    9: resource "google_cloud_run_v2_service" "default" {
│
╵
```

JSkimming commented 5 months ago

@edwardmedia, what happens now? Does another team pick this up?

maleksah commented 4 months ago

Hello,

I have the same issue on my clients' projects.

We have Cloud Run services that are deployed by Terraform (all the infra stuff, like Cloud Run VPC access, secrets, CPU, memory, max instances, etc.). We put ignore_changes on the container image because we delegate this responsibility to another application pipeline.

lifecycle {
  ignore_changes = [
    template[0].containers[0].image,
  ]
}

We have another CD pipeline (application centric) that updates the image of the cloud run services (gcloud run deploy...).

The issue is that every time I run a plan after a gcloud deploy on a CR service, I see a change on revision: Terraform wants to set it to null, so the service is redeployed even if nothing differs from the current revision. And if I make a change, to VPC access for example, the deployment fails (Error 409: Revision named 'xxxx-00085-woc' with different configuration already exists) when I put

lifecycle {
  ignore_changes = [
    template[0].containers[0].image,
    template[0].revision,
  ]
}

Is there a way to fix this, please?

For example, when the lifecycle ignore_changes includes the revision, a new revision could still be generated whenever there is any change to the CR service.

Thanks

yanweiguo commented 3 months ago

This is a duplicate of https://github.com/hashicorp/terraform-provider-google/issues/13410.

The cause is explained in https://github.com/hashicorp/terraform-provider-google/issues/13410#issuecomment-2143043183.

@JSkimming @maleksah would using the following two commands instead of gcloud run services update --tag work for you?

  1. gcloud run services update (without --tag)
  2. gcloud run services update-traffic --set-tags

You have to look up the latest created revision name in step 1.
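
The lookup and the tagging could be sketched as follows. Assumptions: the latest created revision is read from status.latestCreatedRevisionName in gcloud's default output, and the function names are hypothetical.

```shell
# Print the name of the most recently created revision of a service.
# Assumption: gcloud exposes it as status.latestCreatedRevisionName.
latest_created_revision() {
  gcloud run services describe "$1" --project "$2" --region "$3" \
    --format 'value(status.latestCreatedRevisionName)'
}

# Attach a tag to a specific revision in a separate update-traffic call.
# Arguments: service project region tag revision
tag_revision() {
  gcloud run services update-traffic "$1" --project "$2" --region "$3" \
    --set-tags "$4=$5"
}
```

A deployment would then call latest_created_revision right after the untagged update and pass the result to tag_revision.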

JSkimming commented 2 months ago

@yanweiguo Thanks for the tip, it works.

You have to look up the latest created revision name in step 1.

Fortunately, I don't need to look up the latest created revision, as I use LATEST like this:

gcloud run services update-traffic [SERVICE] --set-tags staging=LATEST