hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

google_monitoring_uptime_check_config stops working on update #19733

Open Ziemowit-LH opened 1 month ago

Ziemowit-LH commented 1 month ago

Terraform Version & Provider Version(s)

Terraform v1.9.6 on linux_amd64

Affected Resource(s)

google_monitoring_uptime_check_config

Terraform Configuration

variable "project" { default = "my-project" }
variable "region" { default = "europe-west3" }
variable "timeout" { default = "50s" }

resource "google_cloud_run_service" "hello" {
  name     = "hello"
  project  = var.project
  location = var.region
  template {
    spec {
      containers {
        image = "us-docker.pkg.dev/cloudrun/container/hello"
      }
    }
  }
}
resource "google_monitoring_uptime_check_config" "hello" {
  display_name     = "tf-hello"
  project          = var.project
  timeout          = var.timeout
  period           = "60s"
  selected_regions = ["EUROPE", "SOUTH_AMERICA", "USA_VIRGINIA"]
  http_check {
    use_ssl = true
    accepted_response_status_codes {
      status_class = "STATUS_CLASS_2XX"
    }
    service_agent_authentication {
      type = "OIDC_TOKEN"
    }
  }
  monitored_resource {
    labels = {
      "location"           = google_cloud_run_service.hello.location
      "project_id"         = google_cloud_run_service.hello.project
      "service_name"       = google_cloud_run_service.hello.name
      "configuration_name" = "" # necessary to avoid permadiff
      "revision_name"      = "" # necessary to avoid permadiff
    }
    type = "cloud_run_revision"
  }
}

Debug Output

No response

Expected Behavior

Both creation and update of the uptime check via this code work.

Actual Behavior

Creating the uptime check works fine but changing any parameters causes it to start failing with "The request was not authorized to invoke this service. Read more at https://cloud.google.com/run/docs/securing/authenticating Additional troubleshooting documentation can be found at: https://cloud.google.com/run/docs/troubleshooting#401"

Steps to reproduce

  1. terraform apply
  2. change something, e.g. increase the timeout by 1 second (see the snippet after this list)
  3. terraform apply
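
For example, the only difference between the two applies can be the timeout default (a minimal sketch; per the behavior described below, any in-place change appears to trigger it):

variable "timeout" { default = "51s" } # was "50s" for the first apply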

Important Factoids

Terraform and the provider are on their latest versions. Modifying the uptime check via the Cloud Console or gcloud works fine. Example:

  1. create
    gcloud monitoring uptime create \
    "ag-hello" \
    --timeout=50 \
    --period=1 \
    --regions=europe,south-america,usa-virginia \
    --protocol=https \
    --status-classes=2xx \
    --service-agent-auth=oidc-token \
    --resource-labels="configuration_name=","location=europe-west3","project_id=project-id","service_name=hello" \
    --resource-type=cloud-run-revision
  2. Wait a bit to see the uptime check working
  3. Find the name of the newly created check: gcloud monitoring uptime list-configs --project dlh-opsd-dev-aircraft-change --filter='name:ag-hello' --format 'value(name)'
  4. Update with gcloud monitoring uptime update "ag-hello-KklYrBidLoI" --timeout=51
  5. Uptime check keeps working

References

No response

EDIT: modified the code to move the variables to the top.

b/372054640

Ziemowit-LH commented 1 month ago

Considering the error, this probably only happens with service_agent_authentication { type = "OIDC_TOKEN" } in the config.
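
A control case to confirm that (an untested sketch) would be the same check with the block removed; note the Cloud Run service would then have to allow unauthenticated invocations for the check to pass at all:

  http_check {
    use_ssl = true
    accepted_response_status_codes {
      status_class = "STATUS_CLASS_2XX"
    }
    # service_agent_authentication block removed for the control case
  }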

ggtisc commented 1 month ago

Hi @Ziemowit-LH!

I tried to replicate this issue with the following configuration, but everything succeeded without errors:

resource "google_cloud_run_service" "crs_19733" {
  name     = "crs-19733"
  location = "us-central1"
  template {
    spec {
      containers {
        image = "us-docker.pkg.dev/cloudrun/container/hello"
      }
    }
  }
}

resource "google_monitoring_uptime_check_config" "monitoring_uptime_check_config_19733" {
  display_name     = "Monitoring Uptime Check Config 19733"
  timeout          = "50s"
  period           = "60s"
  selected_regions = ["EUROPE", "SOUTH_AMERICA", "ASIA_PACIFIC", ]
  http_check {
    request_method = "GET"
    use_ssl        = true
    accepted_response_status_codes {
      status_class = "STATUS_CLASS_2XX"
    }
    service_agent_authentication {
      type = "OIDC_TOKEN"
    }
  }
  monitored_resource {
    labels = {
      "location"           = "us-central1"
      "project_id"         = "my-project"
      "service_name"       = "crs-19733"
    }
    type = "cloud_run_revision"
  }
}

Could you share an example of what you are using for the configuration_name and revision_name labels so I can try again?

Ziemowit-LH commented 1 month ago

Hello @ggtisc

Ah, I see you did not provide values for "configuration_name" and "revision_name" at all, which results in a permadiff on these two labels and replacement of the resource on each terraform apply.

You do get the following on each apply, right?

      ~ monitored_resource {
          ~ labels = { # forces replacement
              - "configuration_name" = null
              - "revision_name"      = null
                # (3 unchanged elements hidden)
            }
            # (1 unchanged attribute hidden)
        }

This is exactly why, in my code, you see

      "configuration_name" = ""
      "revision_name"      = ""

which does work around the permadiff but in turn causes the issue this defect was opened for.

Note: setting these two labels to null behaves as if the lines were not present, i.e. permadiff and replacement.
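
I.e. this variant of the two label lines (sketch) behaves the same as leaving them out entirely:

      "configuration_name" = null
      "revision_name"      = null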

ggtisc commented 1 month ago

I tried to replicate this issue, but if I use only "" for configuration_name and revision_name I get the following error:

Error: Error creating UptimeCheckConfig: googleapi: Error 404: Error confirming monitored resource:
β”‚ type:          "cloud_run_revision"
β”‚ labels {
β”‚   key: "configuration_name"
β”‚   value: ""
β”‚ }
β”‚ labels {
β”‚   key: "location"
β”‚   value: "us-central1"
β”‚ }
β”‚ labels {
β”‚   key: "project_id"
β”‚   value: "my-project"
β”‚ }
β”‚ labels {
β”‚   key: "revision_name"
β”‚   value: ""
β”‚ }
β”‚ labels {
β”‚   key: "service_name"
β”‚   value: "crs-19733"
β”‚ }
β”‚  is in project: 1234567890
β”‚ Details:
β”‚ [
β”‚   {
β”‚     "@type": "type.googleapis.com/google.rpc.DebugInfo",
β”‚     "detail": "[ORIGINAL ERROR] generic::not_found: Could not fetch resource metadata for cloud_run_revision"
β”‚   }
β”‚ ]
β”‚ 
β”‚   with google_monitoring_uptime_check_config.monitoring_uptime_check_config_19733,
β”‚   on main.tf line 35, in resource "google_monitoring_uptime_check_config" "monitoring_uptime_check_config_19733":
β”‚   35: resource "google_monitoring_uptime_check_config" "monitoring_uptime_check_config_19733" {

On the other hand, if I don't use these labels I get a permadiff even when nothing has changed after running a new terraform apply.

Ziemowit-LH commented 1 month ago

Hi, thanks for checking! I'm a bit confused because I don't get anything like that.

Let's recap: I took your code and modified it a bit to reduce the number of replacements we need to do.

resource "google_cloud_run_service" "crs_19733" {
  name     = "crs-19733"
  location = "europe-west3"
  project  = "my-project"
  template {
    spec {
      containers {
        image = "us-docker.pkg.dev/cloudrun/container/hello"
      }
    }
  }
}

resource "google_monitoring_uptime_check_config" "monitoring_uptime_check_config_19733" {
  project          = google_cloud_run_service.crs_19733.project
  display_name     = "Monitoring Uptime Check Config 19733"
  timeout          = "51s"
  period           = "60s"
  selected_regions = ["EUROPE", "SOUTH_AMERICA", "ASIA_PACIFIC", ]
  http_check {
    request_method = "GET"
    use_ssl        = true
    accepted_response_status_codes {
      status_class = "STATUS_CLASS_2XX"
    }
    service_agent_authentication {
      type = "OIDC_TOKEN"
    }
  }
  monitored_resource {
    labels = {
      "location"           = google_cloud_run_service.crs_19733.location
      "project_id"         = google_cloud_run_service.crs_19733.project
      "service_name"       = google_cloud_run_service.crs_19733.name
      "configuration_name" = ""
      "revision_name"      = ""
    }
    type = "cloud_run_revision"
  }
}

I create the uptime check and it works. Then I modify the timeout and it fails. My company has policies that prevent me from creating resources outside the EU. Could you test in this region (europe-west3)?

Edit: I modified the code in the opening post for easier testing by others.

Ziemowit-LH commented 1 month ago

@ggtisc Additional info: the error "generic::not_found: Could not fetch resource metadata for cloud_run_revision" you encountered appears when the API has not yet registered the new Cloud Run service (hence the 404). It generally works on the second run.
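
Side note: if the transient 404 gets in the way of testing, a possible workaround (an untested sketch using the hashicorp/time provider's time_sleep resource) is to delay the check's creation until the new service has had time to register:

resource "time_sleep" "wait_for_cloud_run" {
  depends_on      = [google_cloud_run_service.hello]
  create_duration = "60s"
}

# and add to the uptime check resource:
#   depends_on = [time_sleep.wait_for_cloud_run]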

Please retry with the code below, modifying only the variable lines, and apply again if the first run returns a 404.

variable "project" { default = "my-project" }
variable "region" { default = "europe-west3" }
variable "timeout" { default = "51s" }

resource "google_cloud_run_service" "hello" {
  name     = "hello"
  project  = var.project
  location = var.region
  template {
    spec {
      containers {
        image = "us-docker.pkg.dev/cloudrun/container/hello"
      }
    }
  }
}

resource "google_monitoring_uptime_check_config" "hello" {
  display_name     = "wellhellothere"
  project          = var.project
  timeout          = var.timeout
  period           = "60s"
  selected_regions = ["EUROPE", "SOUTH_AMERICA", "USA_VIRGINIA"]
  http_check {
    use_ssl = true
    accepted_response_status_codes {
      status_class = "STATUS_CLASS_2XX"
    }
    service_agent_authentication {
      type = "OIDC_TOKEN"
    }
  }
  monitored_resource {
    labels = {
      "location"           = google_cloud_run_service.hello.location
      "project_id"         = google_cloud_run_service.hello.project
      "service_name"       = google_cloud_run_service.hello.name
      "configuration_name" = "" # permadiff workaround
      "revision_name"      = "" # permadiff workaround
    }
    type = "cloud_run_revision"
  }
}

What's also interesting is that lifecycle does not fully work around this. If I comment out configuration_name and revision_name and add:

  lifecycle {
    ignore_changes = [
      monitored_resource[0].labels.revision_name,
      monitored_resource[0].labels.configuration_name,
      ]
  }

to the uptime check, nothing changes (i.e. it would still replace the uptime check).

But with

  lifecycle {
    ignore_changes = [
      monitored_resource[0].labels
    ]
  }

it actually does work as expected and does not try to replace the uptime check.
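
So a variant that avoids both the empty-string labels and the permadiff would look roughly like this (a sketch of only the relevant parts of the uptime check resource; whether the 401-on-update still occurs with it is untested):

  monitored_resource {
    labels = {
      "location"     = google_cloud_run_service.hello.location
      "project_id"   = google_cloud_run_service.hello.project
      "service_name" = google_cloud_run_service.hello.name
      # configuration_name / revision_name intentionally omitted
    }
    type = "cloud_run_revision"
  }

  lifecycle {
    ignore_changes = [
      monitored_resource[0].labels
    ]
  }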