hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

Cloud Run: cannot reconcile service edited through console #13410

Open mattmoor opened 1 year ago

mattmoor commented 1 year ago

Terraform Version

terraform -v
Terraform v1.2.3
on darwin_arm64
+ provider registry.terraform.io/chainguard-dev/ko v0.0.4
+ provider registry.terraform.io/hashicorp/google v4.47.0
+ provider registry.terraform.io/hashicorp/google-beta v4.47.0
...

Affected Resource(s)

Terraform Configuration Files

This should affect virtually any Cloud Run service deployed through terraform.

Debug Output

N/A

Panic Output

N/A

Expected Behavior

terraform reconciles the service

Actual Behavior

After ~20 minutes it times out and prints an error with a 409 because the named revision already exists.

Steps to Reproduce

  1. Deploy a service via terraform,
  2. Edit it via the Console's editor (not yaml),
  3. Deploy the service again via terraform.

Important Factoids

The Knative resource model used by Cloud Run supports "bring your own revision name" where you can use spec.template.metadata.name to name the revision that the Service will create. This is used by the Cloud Run console when edits are made.

If changes are made to the service without removing or updating this name, then things will fail to deploy.

cc @steren

References

b/272367711

edwardmedia commented 1 year ago

@mattmoor when you execute step 3, do you mean you want to re-deploy the terraform config which is the same as step 1? Can you share your config and the debug log?

mattmoor commented 1 year ago

@edwardmedia it doesn't matter, it could be asking terraform to reconcile things back to how they were, or deploying something new.

The edit we made for 2. was to add a trivial env var to trigger a rollout, e.g. FOO=bar env var.

You can repro it with the examples in https://github.com/chainguard-dev/terraform-google-prober

mattmoor commented 1 year ago

I'd recommend the basic one as the complex one spins up GCLB, which is pricy.

https://github.com/chainguard-dev/terraform-google-prober/tree/main/examples/basic

steren commented 1 year ago

here are some more details:

The YAML of my service is now:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: tf-test
  namespace: '607903476290'
  selfLink: /apis/serving.knative.dev/v1/namespaces/607903476290/services/tf-test
  uid: 5493a021-319a-446e-92eb-99e7bfd39d48
  resourceVersion: AAXxsHOMS4Y
  generation: 2
  creationTimestamp: '2023-01-07T18:09:16.917847Z'
  labels:
    cloud.googleapis.com/location: us-central1
  annotations:
    run.googleapis.com/client-name: cloud-console
    serving.knative.dev/creator: steren.giannini@gmail.com
    serving.knative.dev/lastModifier: steren.giannini@gmail.com
    client.knative.dev/user-image: us-docker.pkg.dev/cloudrun/container/hello
    run.googleapis.com/ingress: all
    run.googleapis.com/ingress-status: all
spec:
  template:
    metadata:
      name: tf-test-00002-juj
      annotations:
        run.googleapis.com/client-name: cloud-console
        autoscaling.knative.dev/maxScale: '100'
    spec:
      containerConcurrency: 80
      timeoutSeconds: 300
      serviceAccountName: 607903476290-compute@developer.gserviceaccount.com
      containers:
      - image: us-docker.pkg.dev/cloudrun/container/hello
        ports:
        - name: http1
          containerPort: 8080
        env:
        - name: FOO
          value: Bar
        resources:
          limits:
            cpu: 1000m
            memory: 512Mi
  traffic:
  - percent: 100
    latestRevision: true

Note the spec.template.metadata.name attribute set.

I run terraform plan. Note that it does *not* call out that spec.template.metadata.name will be reset to null:

terraform plan
data.google_iam_policy.noauth: Reading...
data.google_iam_policy.noauth: Read complete after 0s [id=3450855414]
google_cloud_run_service.default: Refreshing state... [id=locations/us-central1/namespaces/steren-playground/services/tf-test]
google_cloud_run_service_iam_policy.noauth: Refreshing state... [id=v1/projects/steren-playground/locations/us-central1/services/tf-test]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # google_cloud_run_service.default will be updated in-place
  ~ resource "google_cloud_run_service" "default" {
        id                         = "locations/us-central1/namespaces/steren-playground/services/tf-test"
        name                       = "tf-test"
        # (4 unchanged attributes hidden)

      ~ metadata {
          ~ annotations      = {
              - "client.knative.dev/user-image"     = "us-docker.pkg.dev/cloudrun/container/hello" -> null
              ~ "run.googleapis.com/client-name"    = "cloud-console" -> "terraform"
              - "run.googleapis.com/ingress"        = "all" -> null
                # (3 unchanged elements hidden)
            }
            # (6 unchanged attributes hidden)
        }

      ~ template {

          ~ spec {
                # (3 unchanged attributes hidden)

              ~ containers {
                    # (3 unchanged attributes hidden)

                  - env {
                      - name  = "FOO" -> null
                      - value = "Bar" -> null
                    }

                    # (2 unchanged blocks hidden)
                }
            }

            # (1 unchanged block hidden)
        }

        # (1 unchanged block hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

I run terraform apply and it indeed seems to hang.

2 things:

  1. Is spec.template.metadata.name treated specially by Terraform? I would expect it to be reset to null in the plan and apply
  2. Why does it hang?

FabianFrank commented 1 year ago

This problem also happens if you provision a cloud run service using terraform google_cloud_run_v2_service and then deploy revisions in CI/CD. Either Terraform will always detect and apply "changes" because the revision has changed or if you add the revision to lifecycle ignore_changes then you get the error Revision named ... with different configuration already exists.

jamiezieziula commented 1 year ago

I had this same issue - I deployed a cloudrun service via Terraform (terraform cloud) and then subsequently deployed new revisions with updated image tags via this GitHub actions workflow.

I was getting the same error:

Error 409: Revision named 'service-name-00048-fts' with different configuration already exists.

I was able to circumvent this error with 2 code changes. The first - adding the random_uuid terraform generator, and providing a unique revision name to the terraform resource.

resource "random_uuid" "cloudrun_revision_id" {
  keepers = {
    first = timestamp()
  }
}

resource "google_cloud_run_v2_service" "service" {
  name         = var.cloudrun_name

  template {
    revision = "${var.cloudrun_name}-${random_uuid.cloudrun_revision_id.result}"
  }
}

Secondly, ignoring the following lifecycle changes:

  lifecycle {
    ignore_changes = [
      annotations,
      client_version,
      client,
      labels,
      template.0.annotations,
      template.0.labels,
    ]
  }

Not ideal, but a fairly simple workaround that allowed me to manage the service from two angles. I hope this helps!

FabianFrank commented 1 year ago

@jamiezieziula the problem with that solution though is that every time you've deployed a new revision without terraform then the next terraform run will create a new revision even if there are no changes.

enchorb commented 1 year ago

Any update on a fix for this that doesn't involve deploying a new revision on every TF apply?

steren commented 1 year ago

While I agree this should be fixed, I am also wondering if the same issue occurs in the v2 resources.

I recommend switching to v2 as a workaround.

FabianFrank commented 1 year ago

I recommend switching to v2 as a workaround.

It does reproduce with v2, see https://github.com/hashicorp/terraform-provider-google/issues/13410#issuecomment-1404610413

steren commented 1 year ago

I'm not sure everyone on that thread is talking about the same things.

Let me recap:

  1. Any update to spec.template will create a new Revision. This is how Cloud Run works.
  2. If spec.template.metadata.name is set, and a revision already exists with this name, Cloud Run will reject the update. This is how Cloud Run works.
  3. The issue reported by @mattmoor is when using Terraform, then using the UI to make a change. The UI will set spec.template.metadata.name. What is unclear is why this name isn't just reconciled by Terraform.

If you have an issue with 1. or 2., unfortunately, these are "working as intended". Please confirm that you are in the conditions of 3.

FabianFrank commented 1 year ago

  1. Any update to spec.template will create a new Revision. This is how Cloud Run works.

This is expected and desired. You want changes to the terraform config to update your Cloud Run service.

  2. If spec.template.metadata.name is set, and a revision already exists with this name, Cloud Run will reject the update. This is how Cloud Run works.

I do not set this property in my terraform config or anywhere else manually. I expect terraform, the UI, gcloud CLI, REST API, etc. to generate a new unique revision name when needed. This generally works, except, see below.

  3. The issue reported by @mattmoor is when using Terraform, then using the UI to make a change. The UI will set spec.template.metadata.name. What is unclear is why this name isn't just reconciled by Terraform.

AFAIK it boils down to what you set lifecycle ignore_changes to. Usually, after realizing that UI/API deployments cause terraform to want to create a new revision, people set it to something like this:

  lifecycle {
    ignore_changes = [template[0].revision, labels, annotations, template[0].annotations, template[0].containers[0].image, client, client_version, template[0].labels]
  }

However, ignoring template[0].revision seems to stop Terraform from generating a new revision name when it actually should deploy a new revision and then it fails to deploy with the error googleapi: Error 409: Requested entity already exists. If you stop ignoring template[0].revision then Terraform will detect changes and redeploy unexpectedly if you for example deploy a new image (which is ignored via template[0].containers[0].image) because the revision has changed.

justinmahood commented 1 year ago

Hey folks, there may be a few things getting tangled up here but want to share the results of some testing that may help.

@mattmoor's original issue (Deploy a CR service via v1 Terraform -> make a change in another client that deploys a new revision -> deploy again via TF and fail) will always be a problem with the v1 Terraform resource. The v2 TF resource should work correctly in that circumstance, provided that you did not provide a revision name in the initial V2 Terraform deployment. No ignore_changes should be required.

@FabianFrank , can you share more details of your repro using the v2 resource, per https://github.com/hashicorp/terraform-provider-google/issues/13410#issuecomment-1404610413? Assuming you did not provide a revision name in the initial terraform deployment, I cannot reproduce this behavior.

trriplejay commented 1 year ago

Hi @justinmahood, I can't speak for @FabianFrank but it seems like I'm experiencing the same issue as him using google_cloud_run_v2_service.

My resource definition does not include a revision field, even for the initial creation. my ignore_changes is set to ignore template[0].containers[0].image because that is the field I want to update outside of the context of terraform (via gcloud run deploy <service> --image <image>)

There are two issues I'm running into, neither of which result in the behavior I would ideally want:

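with the revision field left out, and also not included in ignore_changes:

  • initial deployment works. subsequent changes on the TF side create revisions as expected
  • gcloud run deploy successfully deploys a new revision
  • now a tf plan sees that the revision is different, and will deploy a new revision just to remove the "revision" field, even if no other changes were made to the service definition
~ template {
          - revision                         = "myservice-00004-bub" -> null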

so, it seems like the solution to that would be to add this revision field to ignore_changes, and this does solve the problem of creating unnecessary revisions, however if I have to actually make a change to the TF definition after my gcloud run deploy (for example changing max_instance_count from 10 to 20) that is when I see this other error:

Error: Error updating Service "projects/.../services/myservice": googleapi: Error 409: Requested entity already exists

even though the plan action seems to make the correct plan.

so... the repro steps for my case would be:

  1. deploy service using google_cloud_run_v2_service. revision should not be present, and template[0].containers[0].image as well as template[0].revision should be in the lifecycle ignore_changes block
  2. deploy a new revision of the service using something like gcloud run deploy <service> --image <myimage>
  3. modify a value in the TF definition. plan -> successfully shows 1 to change, apply results in above mentioned 409

When i do a terraform state show google_cloud_run_v2_service.myservice, i can see that there is a revision in there:

 revision                         = "myservice-00004-bub"

and my guess is that a subsequent apply is trying to create another revision with that same name, which explains the 409, but what i want is for it to let google auto-generate a new revision name. is something like that possible?
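For anyone trying to reproduce this, the configuration from step 1 might look roughly like the sketch below. The resource name myservice is taken from the state output above; the location and the initial image are placeholder assumptions, and the scaling value mirrors the 10 -> 20 example.

```hcl
# Sketch of the repro configuration. Location and image are assumed placeholders.
resource "google_cloud_run_v2_service" "myservice" {
  name     = "myservice"
  location = "us-central1" # assumption

  template {
    # `revision` is deliberately not set here.

    scaling {
      max_instance_count = 10 # changing this to 20 in step 3 leads to the 409
    }

    containers {
      # Assumed initial image; later replaced via `gcloud run deploy --image`.
      image = "us-docker.pkg.dev/cloudrun/container/hello"
    }
  }

  lifecycle {
    ignore_changes = [
      template[0].containers[0].image, # updated outside Terraform
      template[0].revision,            # avoids spurious redeploys, but keeps the stale name
    ]
  }
}
```

With template[0].revision ignored, the name written by gcloud stays in state, which would be consistent with the next apply resending it and hitting the 409.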

FabianFrank commented 1 year ago

@trriplejay explained it perfectly, that is what I am experiencing!

steren commented 1 year ago

I think I know what's happening: When you use gcloud run deploy or the Cloud Console clients, a ("nice") revision name is set by the client.

Subsequent Terraform updates would need to either remove or update this name, ignoring it means that the same name is used, therefore being rejected.

The Cloud Run team could evaluate updating the behavior of these clients, so that they do not set generated "nice" names, but leave the name field empty. This was originally done because the server-side generated revision names are a bit ugly (no generation number and a large set of random letters); these server-side generated names of Cloud Run were put in place to be consistent with Knative. We'll follow up.

trriplejay commented 1 year ago

interesting, thanks for the explanation @steren! do you think there is some workaround? maybe directly using the api to create a revision without giving a name?

FabianFrank commented 1 year ago

I think the correct behavior would be to ignore changes in the revision that occur outside terraform, but still generate a new revision when a change needs to be applied. Sort of like a one way ignore_changes.

mattcollier commented 1 year ago

Hello All, I too have been impacted by this issue as accurately described by others. Since this issue made it clear that the issue had to do with a revision name that was not being regenerated, I focused my efforts there.

I spotted the autogenerate_revision_name flag in the example here: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/cloud_run_service#example-usage---cloud-run-service-sql

Which is documented here: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/cloud_run_service#autogenerate_revision_name

What this documentation does not say is that the default is false! I found that detail in the source: https://github.com/hashicorp/terraform-provider-google/blob/a56669d837a2ef157a470ed1c4c13cc52526c9ad/google/resource_cloud_run_service.go#L783

So, I added autogenerate_revision_name = true to my google_cloud_run_service resource and I was able to get a successful deployment, a new revision was created. Prior to this, I was seeing Error 409: Requested entity already exists.

Before I get too excited, I was hoping that one of you folks could confirm my findings. Thank you.

[UPDATE] I went ahead and made another Cloud Run revision using the Google Console. I was then able to deploy another revision via terraform without difficulty.
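A minimal sketch of what that looks like on the v1 resource, for reference. The service name, location, and image are borrowed from the YAML earlier in the thread and are only illustrative; the one line that matters is autogenerate_revision_name.

```hcl
# Sketch only: name, location and image are illustrative values from earlier in the thread.
resource "google_cloud_run_service" "default" {
  name     = "tf-test"
  location = "us-central1"

  # Defaults to false per the provider source linked above. When true, the
  # provider generates a new revision name instead of resending the one left
  # behind by the Console or gcloud.
  autogenerate_revision_name = true

  template {
    spec {
      containers {
        image = "us-docker.pkg.dev/cloudrun/container/hello"
      }
    }
  }

  traffic {
    percent         = 100
    latest_revision = true
  }
}
```

This applies to google_cloud_run_service only; whether google_cloud_run_v2_service has an equivalent is a separate question.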

trriplejay commented 1 year ago

@mattcollier that sounds perfect, however I notice that you're using the v1 resource google_cloud_run_service rather than google_cloud_run_v2_service. It looks like v2 does not support this flag. I might switch to v1 if it resolves this issue though. thanks for finding that!

i wonder if there's a reason this was left out of v2?

steren commented 1 year ago

As I described above, the root cause comes from the 2018-ish design choice of having gcloud and Cloud Console name revisions by default because automatic names weren't so nice. We should change that. Cloud Run API should just generate good names automatically and clients should not implicitly name revisions if they don't want to.

autogenerate_revision_name = true in google_cloud_run_service was probably added to address this problem or to mimic the behavior of other clients. But it is a logic built into the Terraform resource, it basically "patches" the root cause.

Others on the team could chime in, but I don't think we want this feature added to google_cloud_run_v2_service. The idea with google_cloud_run_v2_service is that it maps exactly to the Cloud Run Admin API v2 resources. This enables the Cloud Run team to guarantee that any Cloud Run feature added to the Admin API v2 automatically appears in google_cloud_run_v2_service. Therefore, we want to avoid any hand-crafting of the behavior. I am not even sure if the infrastructure used allows it.

It's great that autogenerate_revision_name = true exists in google_cloud_run_service, but as I said, we want to fix that at the root, in the Cloud Run API and CLI/UI clients

MarekUniq commented 1 year ago

Hi!

@trriplejay - We have a very similar issue with resource "google_cloud_run_v2_service".

Background: "terraform" is used to keep cloud infrastructure configuration up to date. "gcloud run deploy" is used in pipelines to deploy newer versions.

"terraform" is executed ~once per day to validate/update infrastructure configuration. "gcloud run deploy" could be executed multiple times per day.

Scenario 1. If there has been a "gcloud run deploy", then terraform identifies that template.revision has changed and initiates a new deploy (an unnecessary deploy, because there is no real configuration update/drift to fix). That is one unnecessary/redundant deploy per day per Cloud Run service, which adds up to tens or hundreds of unnecessary/redundant deploys per day.

Scenario 2. If there has been a "gcloud run deploy" and the setting lifecycle { ignore_changes = [ template[0].revision ] } is in place, then the next terraform execution that discovers a real configuration drift (example: template { scaling { max_instance_count = 6 -> 5) reports the error:

Error: Error updating Service "projects//locations//services/***": googleapi: Error 409: Requested entity already exists

Any workarounds? Any plans to fix it? By fixing I mean that either 1) "scenario 1" should not cause an unnecessary deploy OR 2) "scenario 2" should perform a successful deploy.

Regards Marek Läll

Hi @justinmahood, I can't speak for @FabianFrank but it seems like I'm experiencing the same issue as him using google_cloud_run_v2_service.

My resource definition does not include a revision field, even for the initial creation. my ignore_changes is set to ignore template[0].containers[0].image because that is the field I want to update outside of the context of terraform (via gcloud run deploy <service> --image <image>)

There are two issues I'm running into, neither of which result in the behavior I would ideally want:

with the revision field left out, and also not included in ignore_changes:

  • initial deployment works. subsequent changes on the TF side create revisions as expected
  • gcloud run deploy successfully deploys a new revision
  • now a tf plan sees that the revision is different, and will deploy a new revision just to remove the "revision" field, even if no other changes were made to the service definition
~ template {
          - revision                         = "myservice-00004-bub" -> null

so, it seems like the solution to that would be to add this revision field to ignore_changes, and this does solve the problem of creating unnecessary revisions, however if I have to actually make a change to the TF definition after my gcloud run deploy (for example changing max_instance_count from 10 to 20) that is when I see this other error:

Error: Error updating Service "projects/.../services/myservice": googleapi: Error 409: Requested entity already exists

even though the plan action seems to make the correct plan.

so... the repro steps for my case would be:

  1. deploy service using google_cloud_run_v2_service. revision should not be present, and template[0].containers[0].image as well as template[0].revision should be in the lifecycle ignore_changes block
  2. deploy a new revision of the service using something like gcloud run deploy <service> --image <myimage>
  3. modify a value in the TF definition. plan -> successfully shows 1 to change, apply results in above mentioned 409

When i do a terraform state show google_cloud_run_v2_service.myservice, i can see that there is a revision in there:

 revision                         = "myservice-00004-bub"

and my guess is that a subsequent apply is trying to create another revision with that same name, which explains the 409, but what i want is for it to let google auto-generate a new revision name. is something like that possible?

trriplejay commented 1 year ago

hi @MarekUniq , yeah that's almost exactly what I'm trying to do. For now I'm just going to live with the extra deployments.

It sounds like @steren wants to fix your scenario 1 by updating the gcloud client so that it will stop sending its friendly revision name and then it should play nicely with terraform.

MarekUniq commented 1 year ago

hi @MarekUniq , yeah that's almost exactly what I'm trying to do. For now I'm just going to live with the extra deployments.

It sounds like @steren wants to fix your scenario 1 by updating the gcloud client so that it will stop sending its friendly revision name and then it should play nicely with terraform.

Hi!

@trriplejay @steren - Just to point out that it may happen that fixing "gcloud client" is not enough. While creating a repeatable test case, I used the "Edit & Deploy New Revision" button in the Google Cloud GUI CloudRun and the result is the same. The same means that terraform will detect that template.revision is changed and trigger an additional deployment.

Therefore, I think the Google Cloud GUI CloudRun button "Edit & Deploy New Revision" should also be fixed.

Regards Marek Läll

steren commented 1 year ago

Yes, please see my comment

MarekUniq commented 1 year ago

Subsequent Terraform updates would need to either remove or update this name, ignoring it means that the same name is used, therefore being rejected.

There are two ways to interpret ignoring (ignore_changes):

  1. ignore while detecting changes but use it in case there is going to be new deploy (use terraform value)
  2. ignore while detecting changes and also ignore in case there is going to be new deploy (use current deploy value)

lifecycle { ignore_changes = [ template[0].revision - should follow case 1.
lifecycle { ignore_changes = [ template[0].containers[0].image - should follow case 2.

That would be my expectation. I understand that Terraform always uses case 2. Adding support for case 1 would also solve this problem. (An additional keyword such as ignore_changes_only_for_compare, or something similar, would help.)

justinmahood commented 1 year ago

Hey folks, just wanted to update the thread on what we're doing on the Cloud Run team to address this issue. As @steren mentioned above, this is an issue with our two major clients (gcloud CLI and the GCP Console UI) setting a 'prettified' revision name in the spec.

After evaluating, we're going to change our clients to leave the revision name empty by default. We're also updating the behavior of the control plane to use the 'pretty' name scheme if a revision name is not specified.

TL;DR - We're updating our clients and control plane, that will address the root cause of this problem. We'll keep this thread posted when there's an update.

MarekUniq commented 1 year ago

TL;DR - We're updating our clients and control plane, that will address the root cause of this problem. We'll keep this thread posted when there's an update.

@justinmahood Very likely this will fix the major part of the issue. Thank you very much for your effort!

There is a very similar issue related to 4 other properties. It is minor compared to revision but still quite unpleasant. Here is the scenario:

  1. terraform apply (to create resource "google_cloud_run_v2_service")
  2. gcloud --project "{{project}}" run deploy "{{service-name}}" --image "europe-north1-docker.pkg.dev/{{project}}/{{repository}}/image:develop-1204" --region "europe-north1"
  3. terraform apply

The step "3. terraform apply" identifies the following differences to apply:

# google_cloud_run_v2_service.{{service-name}} will be updated in-place
~ resource "google_cloud_run_v2_service" "{{service-name}}" {
    ~ annotations             = {
        - "client.knative.dev/user-image" = "europe-north1-docker.pkg.dev/{{project}}/{{repository}}/image:develop-1204" -> null
      }
    - client                  = "gcloud" -> null
    - client_version          = "424.0.0" -> null
      id                      = "projects/{{project}}/locations/europe-north1/services/{{service-name}}"
      name                    = "{{service-name}}"
      # (17 unchanged attributes hidden)

    ~ template {
        ~ annotations                      = {
            - "client.knative.dev/user-image" = "europe-north1-docker.pkg.dev/{{project}}/{{repository}}/image:develop-1204" -> null
          }
        - revision                         = "{{service-name}}-00002-roy" -> null
          # (4 unchanged attributes hidden)

          # (3 unchanged blocks hidden)
      }

      # (1 unchanged block hidden)
  }

Those changes are irrelevant because they are just informative (those differences don't change deployment behavior).

Yes, I can ignore informative properties with clause:

lifecycle {
    ignore_changes = [
      annotations["client.knative.dev/user-image"],
      client,
      client_version,
      template[0].annotations["client.knative.dev/user-image"],
    ]
  }

And now the "not nice" part. Imagine the scenario goes on with step 4:

  1. terraform apply
  2. gcloud run deploy
  3. terraform apply -- Change in terraform config. Example: scaling.min_instance_count is increased
  4. terraform apply

then the last intentional update was done by terraform but in the informative properties you can still see values:

but it is not true. In reality the last deploy was done by:

Are there any suggestions about how to overcome this little nuance?

hedlund commented 1 year ago

The Cloud Run team could evaluate updating the behavior of these clients, so that they do not set generated "nice" names, but leave the name field empty. This was originally done because the server-side generated revision names are a bit ugly (no generation number and a large set of random letters); these server-side generated names of Cloud Run were put in place to be consistent with Knative. We'll follow up.

@steren Do you have any Issue Tracker ID for this fix to Cloud Run that we can monitor? Or should we talk to our Google customer engineer about it?

Trying to migrate a few services to the Cloud Run second generation execution runtime, and thought to use the V2 resources at the same time, and this popped up like an unpleasant blast from the past (we used CR prior to the autogenerate_revision_name being introduced in the old resources), making it very difficult for us to use. 😄

gnubibi33 commented 1 year ago

Hello, any update ? :) Same trouble for me. Deploy CR V2 with Terraform and updating image with gcloud, and not able to apply new conf with terraform after that :(

++

glasnt commented 1 year ago

I was getting similar issues, but a harder error, not just that the revision existed, but the service already exists:

│ Error: Error creating Service: googleapi: Error 409: Resource 'server-dcab' already exists.
│
│   with module.dynamic-python-webapp.google_cloud_run_v2_service.server,
│   on infra/service.tf line 17, in resource "google_cloud_run_v2_service" "server":
│   17: resource "google_cloud_run_v2_service" "server" {

This was based on a partial application of a terraform apply, as opposed to a manual edit.

The resolution in my case was to delete the service completely and reapply it.

(same error code, same resource, different scope on the error, but thought it appropriate to add to this existing issue)

robinshin commented 1 year ago

I'm having the exact same issue, I guess it's not yet fixed. Any update?

marcusthelin commented 1 year ago

We need a solution for this asap

dgniewek commented 1 year ago

Any updates on that issue in v2 service?

pinkertr commented 1 year ago

Any updates here GCloud team? We want this asap

enricojonas commented 1 year ago

It's frustrating that v2 was introduced without considering this real-world scenario. We are still fighting this issue and it causes a lot of confusion / errors but at the same time we are forced to use v2 for new features... Any timeline on the gcloud fix?!

sebastiangug commented 1 year ago

has this been progressed at all? the issue has been open for ~5 months now, I've had to upgrade to the v2 resource for other features and now completely stuck on this.

justinmahood commented 1 year ago

Hey folks! Good news: As of today, the requisite features to resolve the core issue here have been rolled out everywhere.

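Prerequisites:

  • Cloud Run Service v2 terraform provider
  • gcloud version >= 446.0

Specific workflow:

  • Define a service using the google_cloud_run_v2_service terraform provider (do not set a revision name), and apply.
  • Deploy an updated container using gcloud run deploy or the Cloud Console UI
  • Update the TF resource to make a change to the service definition. Apply it.

There should no longer appear any conflict with revision name.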
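As a rough illustration of the first workflow step, a v2 service definition with no revision name might look like the sketch below. The service name, location, and image are placeholder assumptions, and the lifecycle block is optional: it only applies if the image is deployed outside Terraform, as several people in this thread do.

```hcl
# Sketch under assumptions: name, location and image are placeholders.
resource "google_cloud_run_v2_service" "service" {
  name     = "my-service"   # hypothetical
  location = "europe-west1" # hypothetical

  template {
    # No `revision` here, so the revision name is left to Cloud Run.

    containers {
      image = "gcr.io/cloudrun/hello" # placeholder image
    }
  }

  lifecycle {
    # Optional: keeps Terraform from reverting images deployed via gcloud or CI/CD.
    # template[0].revision is intentionally NOT listed here.
    ignore_changes = [
      template[0].containers[0].image,
    ]
  }
}
```

The point of the sketch is what is absent: revision is neither set in the template nor listed in ignore_changes.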

simon-verzijl commented 10 months ago

Hey folks! Good news: As of today, the requisite features to resolve the core issue here have been rolled out everywhere.

Prerequisites:

  • Cloud Run Service v2 terraform provider
  • gcloud version >= 446.0

Specific workflow:

  • Define a service using the google_cloud_run_v2_service terraform provider (do not set a revision name), and apply.
  • Deploy an updated container using gcloud run deploy or the Cloud Console UI
  • Update the TF resource to make a change to the service definition. Apply it.

There should no longer appear any conflict with revision name.

I found that this does work unless you use the --revision-suffix parameter in gcloud run (which we use to be able to easily identify which build number is currently active)

i.e. :

gcloud run deploy cloudrun-test-v2 --project myProject --platform managed --region europe-west1 --no-traffic \
  --image=gcr.io/cloudrun/hello --revision-suffix=12345

If we then run terraform plan (without changing anything in terraform config), terraform will see a change in the revision like :

  ~ template {
          - revision                         = "cloudrun-test-v2-12345-xyz" -> null

which results in a redeploy after every gcloud run deploy we do

Adding template[0].revision to the lifecycle ignore_changes (and then making a change in terraform for this cloud run which we do want to result in a redeploy) gives :

Error: Error updating Service "projects/myProject/locations/europe-west1/services/cloudrun-test-v2": googleapi: 
Error 409: Revision named 'cloudrun-test-v2-12345-xyz' with different configuration already exists.

When leaving out the --revision-suffix=12345 it works just fine though. But in that case we can't add our build-number in the suffix anymore.

dai0115 commented 7 months ago

I am still facing this issue. I have met the prerequisites:

  • Cloud Run Service v2 terraform provider
  • gcloud version >= 446.0

Additionally, as per the comment above, I have also removed the --revision-suffix specification. Are there any other options that should not be specified when deploying from gcloud? Currently, I am specifying the following:

--tag=latest --no-use-http2 --allow-unauthenticated --quiet --platform=managed

nakiami10 commented 7 months ago

I can confirm, after a lot of trial and error, that leaving template[0].revision out of ignore_changes does indeed produce a successful new deployment and applies the changes to the resource.

The only issue I had is that the suffix name is fully auto-generated. However, I manage my deployments and rollouts through tagging anyway; I just make sure 100% of the traffic goes to that revision.

My use case is a bit complex, since I deploy to different AZs, and being able to automate this reduces the risk of deploying the wrong revision to the wrong region.

EricStG commented 4 months ago

Still getting the issue here too. Either we see a change to template.revision, or we get a 409 if we ignore it. We are not using --revision-suffix, but we are using --no-traffic and --tag.

arisp8 commented 4 months ago

We are using the deploy-cloudrun GitHub Action to deploy instances to Cloud Run.

We use the tag option, which translates to this gcloud CLI command: gcloud run deploy cloud-run-name --image {image_url} --tag provided-tag

(For completeness, we also use these options: --update-env-vars, --update-secrets, --format, --region, --project)

When running terraform plan after gcloud has deployed a newer revision, we get this output:

~ template {
  - revision                         = "cloud-run-name-00012-abc" -> null
    # (6 unchanged attributes hidden)

    # (4 unchanged blocks hidden)
}

I've tried this with both gcloud v446 suggested above, and with more recent versions such as 477.0.0 and the issue persists.

When removing the --tag option, terraform plan returns no changes, which is the intended behaviour. We need --tag, so removing it is not an option for us, but I'd suggest that others remove it if they have an alternative that works for them.

Is there any way to make this work while keeping --tag? Adding template.0.revision to ignore_changes doesn't seem to be a viable approach.
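For anyone who wants to detect this drift in a deployment pipeline, here is a minimal sketch. It relies only on terraform plan -detailed-exitcode (exit code 0 means no changes, 1 means an error, 2 means a diff is present). The function name plan_has_drift is hypothetical, and the script assumes it is run from the directory containing the Terraform configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether `terraform plan` sees a diff.
# Uses the documented -detailed-exitcode semantics:
#   0 = no changes, 1 = error, 2 = changes present.
plan_has_drift() {
  local rc=0
  terraform plan -detailed-exitcode -input=false >/dev/null || rc=$?
  case "$rc" in
    0) echo "clean"; return 1 ;;
    2) echo "drift"; return 0 ;;
    *) echo "error"; return 2 ;;
  esac
}
```

Calling plan_has_drift right after a gcloud run deploy step would print "drift" in the --tag scenario described above and "clean" when no revision name was left in the template.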

yanweiguo commented 4 months ago

gcloud v446 and above does not set a revision name unless --tag or --revision-suffix is set. When either flag is used, gcloud constructs a name on the client side and sets it; this is intended behavior. In this case, the cause is explained in this comment
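To check whether a previous deploy left a revision name in the service template, something like the following may help. This is a sketch, not a confirmed recipe: it assumes that gcloud run services describe exposes the Knative-style field spec.template.metadata.name mentioned at the top of this issue, and the function names and arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the revision name pinned in the service
# template, if any. An empty result means the next revision name will be
# auto-generated. Assumes the describe output has spec.template.metadata.name.
template_revision_name() {
  local service="$1" region="$2"
  gcloud run services describe "$service" \
    --region "$region" \
    --format='value(spec.template.metadata.name)'
}

# Succeeds (exit 0) when a revision name is pinned in the template.
has_pinned_revision() {
  local name
  name="$(template_revision_name "$1" "$2")" || return 2
  [ -n "$name" ]
}
```

For example, has_pinned_revision cloud-run-name europe-west1 should succeed after a deploy that used --tag or --revision-suffix, which is the situation in which Terraform reports the revision diff.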

yanweiguo commented 4 months ago

My understanding for this issue is:

  1. Create a service via Terraform
  2. Update the service via gcloud with the --tag or --revision-suffix flag. This step constructs a name on the gcloud side and sets it as the name of the revision to be created.
  3. Run terraform plan or terraform apply; it shows diffs. Terraform wants to update the service to remove the revision name set by gcloud in step 2.

@dai0115 @nakiami10 @EricStG @arisp8 What's the purpose and expected behavior for step 3 above? If you intend to update the service and create a new revision, updating the service with the revision name removed sounds like the right way to me.

arisp8 commented 4 months ago

Hi @yanweiguo - thanks for the additional questions!

  1. We create the service via terraform so that we can maintain all the core settings we want to apply in each revision (e.g. memory, cpus, liveness probes, etc.)
  2. We then use gcloud run deploy cloud-run-name --image {image_url} --tag abc to deploy a new revision. We use the tag so that we're able to connect directly to specific revisions using https://abc---service-name-cloudrundomain-nw.a.run.app/. The revision name is still an automatic name generated by Google Cloud.
  3. Run terraform plan. Given that we didn't update the service, we just deployed a new revision with default settings (the only exception being --tag) then we would expect no diff at this step.

Before using Cloud Run V2 we were using V1 which didn't have the same problem because we could use ignore_changes to make terraform ignore the ones that we expect to come from our deployment script. Despite having tried, I wasn't able to make it work the same way with v2.

EricStG commented 4 months ago

Hi @yanweiguo for us, we see a different behaviour when we use --tag, even if we clear them after the fact

Basically, our google_cloud_run_v2_service has this lifecycle

  lifecycle {
    ignore_changes = [
      template[0].containers[0].image,
      traffic,
      client,
      client_version
    ]
  }

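Running:

  • terraform apply
  • gcloud run deploy (without --tag)
  • terraform plan

Leads to a clean plan (no new revision being created)

but running:

  • terraform apply
  • gcloud run deploy --tag whatever
  • gcloud run deploy --clear-tags
  • terraform plan

Results in a new revision being created in the plan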

yanweiguo commented 3 months ago

@arisp8 As I explained, gcloud run deploy cloud-run-name --image {image_url} --tag abc generates a revision name on the client side and sets it when deploying the service. After this, the service on the Cloud Run side has been updated with the specified revision name. This is working as intended. Terraform then finds this diff and wants to remove it.

Would using the following two commands instead of gcloud run deploy cloud-run-name --image {image_url} --tag abc work for you?

  1. gcloud run deploy cloud-run-name --image {image_url}
  2. gcloud run services update-traffic --set-tags

You have to look up the latest created revision name in step 1 though.

You said v1 with ignore_changes works for you. Did you set autogenerate_revision_name to true in v1?

yanweiguo commented 3 months ago

Hi @yanweiguo for us, we see a different behaviour when we use --tag, even if we clear them after the fact

Basically, our google_cloud_run_v2_service has this lifecycle

  lifecycle {
    ignore_changes = [
      template[0].containers[0].image,
      traffic,
      client,
      client_version
    ]
  }

Running:

  • terraform apply
  • gcloud run deploy (without --tag)
  • terraform plan

Leads to a clean plan (no new revision being created)

but running:

  • terraform apply
  • gcloud run deploy --tag whatever
  • gcloud run deploy --clear-tags
  • terraform plan

Results in a new revision being created in the plan

@EricStG gcloud run deploy --tag generates a revision name and sets it in the template as the revision name to deploy. gcloud run deploy --clear-tags doesn't remove that revision name from the template.

Would using the following two commands instead of gcloud run deploy cloud-run-name --image {image_url} --tag abc work for you?

  1. gcloud run deploy cloud-run-name --image {image_url}
  2. gcloud run services update-traffic --set-tags

You have to look up the latest created revision name in step 1 though.
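The two steps above could be combined roughly as follows. This is only a sketch: the function name deploy_and_tag and its arguments are hypothetical, it reads status.latestCreatedRevisionName from the deploy output (an assumption about which status field to use), and it passes the tag to --set-tags in TAG=REVISION form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: deploy without --tag so gcloud does not pin a revision
# name in the service template, then attach the tag to the new revision.
# Arguments (all placeholders): service name, image URL, tag, region.
deploy_and_tag() {
  local service="$1" image="$2" tag="$3" region="$4"
  local revision

  # Step 1: deploy and capture the name of the revision that was created.
  revision="$(gcloud run deploy "$service" \
    --image "$image" \
    --region "$region" \
    --format='value(status.latestCreatedRevisionName)')" || return 1

  # Step 2: point the tag at that revision without editing the template.
  gcloud run services update-traffic "$service" \
    --region "$region" \
    --set-tags "${tag}=${revision}"
}
```

Usage would look like deploy_and_tag cloud-run-name {image_url} abc europe-west1. Because step 1 carries no --tag, no client-side revision name should end up in the template for Terraform to remove.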

EricStG commented 3 months ago

I'll give that a try. We're already getting the revision name, so it should be trivial. For anyone interested, this is how we do it:

revision=$(gcloud run deploy service-name --image <image> --format=get\(status.latestReadyRevisionName\))