google_cloud_run_v2_service always tainted and must be replaced if deployed to

dv01d commented 7 months ago

Terraform Version

Terraform v1.7.0, hashicorp/google v5.27.0

Affected Resource(s)

google_cloud_run_v2_service

Terraform Configuration

resource "google_cloud_run_v2_service" "hello" {

  name     = "hello-${var.env}"
  project  = local.project_id
  location = local.default_region
  labels   = {
    application_name = "hello-${var.env}"
  }
  #Use a dummy image to initialize
  template {
    service_account = local.cloudrun_sa

    containers {
      image = "us-docker.pkg.dev/cloudrun/container/hello"

      # Testing environments shouldn't be running all the time so set min to 0
      # Similarly they shouldn't see much traffic so max should be 2
      ports {
        container_port = 8080
      }

      env {
        name = "SECRET"
        value_source {
          secret_key_ref {
            secret = "super-secret-${var.env}"
            version = "latest"
          }
        }
      }
    }
    scaling {
      min_instance_count = 0
      max_instance_count = 2
    }

  }

  # Prevent Terraform from managing ongoing deployments or deleting the resource
  lifecycle {
    ignore_changes = [
      client,
      client_version,
      template[0].containers[0].env[0].name,
      template[0].containers[0].env[0].value_source,
      template[0].containers[0].image,
      template[0].labels["application_name"],
      template[0].labels["commit-sha"],
      template[0].labels["managed-by"],
    ]
  }
}

Debug Output

No response

Expected Behavior

Terraform should at least try to merge in the out-of-band changes, or ignore them, and more importantly should not delete everything.

Actual Behavior

Any change to a deployment taints the resource, and rather than being updated or reconciled, the service is destroyed and recreated.

Steps to reproduce

terraform plan/apply
Deploy to the Cloud Run service created by Terraform, or even just edit and save the YAML for the service, resulting in a no-op (not even a new revision).
terraform plan/apply results in deletion and recreation of the Cloud Run service

Important Factoids

I attempted many ignore_changes entries, as you can see, to try to prevent deletion, but at this point I can't tell what I can ignore that would stop Terraform from recreating the service every time.

References

Similar behavior related to revisions is described here, but this seems worse than originally documented, as it always causes some data loss through the destroy: https://github.com/hashicorp/terraform-provider-google/issues/14569

ggtisc commented 6 months ago

Hi @dv01d!

Please share the output log and the attributes that you are changing, so we can see in detail how the API is handling the request. So far I have changed several different attributes, and every run ended in an update in place.

dv01d commented 6 months ago

Actually, I'm not changing anything at the moment. As stated, I was doing a no-op: editing and saving the YAML via the console. Terraform wants to destroy the service. The intent here is to have Terraform create the resource and then have the Cloud Run service updated by other means (i.e. deploying a new image via gcloud, the console, or CI/CD), but that doesn't seem possible. I just tried again after updating to 5.28, and while the run after the "deployment" wanted to delete it, I couldn't replicate it again, even when changing the image and deploying from the console. So perhaps it is resolved on subsequent runs after the provider upgrade.

Here is some plan output with prevent_destroy set:

OpenTofu planned the following actions, but then encountered a problem:

  # google_cloud_run_v2_service.test is tainted, so it must be replaced
-/+ resource "google_cloud_run_v2_service" "test" {
      - annotations             = {} -> null
      - client                  = "cloud-console" -> null
      ~ conditions              = [
          - {
              - execution_reason     = ""
              - last_transition_time = "2024-04-30T19:34:03.008453Z"
              - message              = ""
              - reason               = ""
              - revision_reason      = ""
              - severity             = ""
              - state                = "CONDITION_SUCCEEDED"
              - type                 = "RoutesReady"
            },
          - {
              - execution_reason     = ""
              - last_transition_time = "2024-04-30T19:27:03.475384Z"
              - message              = ""
              - reason               = ""
              - revision_reason      = ""
              - severity             = ""
              - state                = "CONDITION_SUCCEEDED"
              - type                 = "ConfigurationsReady"
            },
        ] -> (known after apply)
      ~ create_time             = "2024-04-30T19:27:03.337249Z" -> (known after apply)
      ~ creator                 = "email@example.com" -> (known after apply)
      - custom_audiences        = [] -> null
      + delete_time             = (known after apply)
      ~ effective_annotations   = {} -> (known after apply)
      ~ etag                    = "\"CKePxbEGEJj6hrIB/cHJvamVjdHMvcHJqLXQtY2xvdWRydW4tZ"" -> (known after apply)
      + expire_time             = (known after apply)
      ~ generation              = "2" -> (known after apply)
      ~ id                      = "projects/prj-t-cloudrun-ecgb/locations/us-central1/services/test-test" -> (known after apply)
      ~ ingress                 = "INGRESS_TRAFFIC_ALL" -> (known after apply)
      ~ last_modifier           = "email@example.com" -> (known after apply)
      ~ latest_created_revision = "projects/prj-t-cloudrun-ecgb/locations/us-central1/services/test-test/revisions/test-test-00001-qds" -> (known after apply)
      ~ latest_ready_revision   = "projects/prj-t-cloudrun-ecgb/locations/us-central1/services/test-test/revisions/test-test-00001-qds" -> (known after apply)
      ~ launch_stage            = "GA" -> (known after apply)
        name                    = "test-test"
      ~ observed_generation     = "2" -> (known after apply)
      ~ reconciling             = false -> (known after apply)
      ~ terminal_condition      = [
          - {
              - execution_reason     = ""
              - last_transition_time = "2024-04-30T19:34:03.044660Z"
              - message              = ""
              - reason               = ""
              - revision_reason      = ""
              - severity             = ""
              - state                = "CONDITION_SUCCEEDED"
              - type                 = "Ready"
            },
        ] -> (known after apply)
      ~ traffic_statuses        = [
          - {
              - percent  = 100
              - revision = ""
              - tag      = ""
              - type     = "TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST"
              - uri      = ""
            },
        ] -> (known after apply)
      ~ uid                     = "738b39ba-854f-45c2-9971-1cee712e6967" -> (known after apply)
      ~ update_time             = "2024-04-30T19:33:59.373407Z" -> (known after apply)
      ~ uri                     = "https://test-test-i4f7rpijka-uc.a.run.app" -> (known after apply)
        # (5 unchanged attributes hidden)

      ~ template {
          - annotations                      = {} -> null
          - labels                           = {} -> null
          ~ max_instance_request_concurrency = 80 -> (known after apply)
          - session_affinity                 = false -> null
          ~ timeout                          = "300s" -> (known after apply)
            # (1 unchanged attribute hidden)

          ~ containers {
              - args       = [] -> null
              - command    = [] -> null
              - depends_on = [] -> null
                # (1 unchanged attribute hidden)

              ~ ports {
                  ~ name           = "http1" -> (known after apply)
                    # (1 unchanged attribute hidden)
                }

              - resources {
                  - cpu_idle          = true -> null
                  - limits            = {
                      - "cpu"    = "1000m"
                      - "memory" = "512Mi"
                    } -> null
                  - startup_cpu_boost = false -> null
                }

              - startup_probe {
                  - failure_threshold     = 1 -> null
                  - initial_delay_seconds = 0 -> null
                  - period_seconds        = 240 -> null
                  - timeout_seconds       = 240 -> null

                  - tcp_socket {
                      - port = 3000 -> null
                    }
                }

                # (1 unchanged block hidden)
            }

            # (1 unchanged block hidden)
        }

      - traffic {
          - percent = 100 -> null
          - type    = "TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST" -> null
        }
    }

Plan: 1 to add, 0 to change, 1 to destroy.
╷
│ Error: Instance cannot be destroyed
│ 
│   on cloudrun.tf line 152:
│  152: resource "google_cloud_run_v2_service" "test" {
│ 
│ Resource google_cloud_run_v2_service.test has lifecycle.prevent_destroy set, but the plan calls for this resource to be destroyed. To avoid this error and continue with the plan, either
│ disable lifecycle.prevent_destroy or reduce the scope of the plan using the -target flag.
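
For reference, the error above points at a lifecycle block that is not shown anywhere in this thread. The following is a minimal sketch of what such a block might look like, not the actual contents of cloudrun.tf: the name and location are taken from the plan output, and the image and the ignore_changes entries are assumptions borrowed from the original configuration. Note that the plan header gives the taint as the reason for replacement; ignore_changes only suppresses attribute diffs and does not clear a taint recorded in state.

```hcl
# Hypothetical reconstruction; the real cloudrun.tf is not shown in this thread.
resource "google_cloud_run_v2_service" "test" {
  name     = "test-test"   # from the plan output
  location = "us-central1" # from the resource id in the plan output

  template {
    containers {
      # Placeholder image (assumption); real images are deployed out of band.
      image = "us-docker.pkg.dev/cloudrun/container/hello"
    }
  }

  lifecycle {
    # Turns any planned destroy, including the destroy half of a replace,
    # into the plan-time error shown above.
    prevent_destroy = true

    # Suppresses diffs on attributes changed outside Terraform.
    # It has no effect on a resource that is already marked as tainted.
    ignore_changes = [
      client,
      client_version,
      template[0].containers[0].image,
    ]
  }
}
```

If the taint turns out to be stale, `terraform untaint google_cloud_run_v2_service.test` (or the OpenTofu equivalent) removes the marker from state. Whether that is safe here depends on why the resource was tainted in the first place, which the output above does not show.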

ggtisc commented 6 months ago

If I understand correctly, this issue doesn't happen when you update the resource properties, but changing the provider version from 5.27.0 to 5.28.0 forces destruction of the existing resource instead of an update in place. Is that right?
