hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

google_monitoring_uptime_check_config stops working on update #19733

Open Ziemowit-LH opened 1 month ago

Ziemowit-LH commented 1 month ago

Terraform Version & Provider Version(s)

Terraform v1.9.6 on linux_amd64

Affected Resource(s)

google_monitoring_uptime_check_config

Terraform Configuration

variable "project" { default = "my-project" }
variable "region" { default = "europe-west3" }
variable "timeout" { default = "50s" }

resource "google_cloud_run_service" "hello" {
  name     = "hello"
  project  = var.project
  location = var.region
  template {
    spec {
      containers {
        image = "us-docker.pkg.dev/cloudrun/container/hello"
      }
    }
  }
}
resource "google_monitoring_uptime_check_config" "hello" {
  display_name     = "tf-hello"
  project          = var.project
  timeout          = var.timeout
  period           = "60s"
  selected_regions = ["EUROPE", "SOUTH_AMERICA", "USA_VIRGINIA"]
  http_check {
    use_ssl = true
    accepted_response_status_codes {
      status_class = "STATUS_CLASS_2XX"
    }
    service_agent_authentication {
      type = "OIDC_TOKEN"
    }
  }
  monitored_resource {
    labels = {
      "location"           = google_cloud_run_service.hello.location
      "project_id"         = google_cloud_run_service.hello.project
      "service_name"       = google_cloud_run_service.hello.name
      "configuration_name" = "" # necessary to avoid permadiff
      "revision_name"      = "" # necessary to avoid permadiff
    }
    type = "cloud_run_revision"
  }
}

Debug Output

No response

Expected Behavior

Both creation and update of the uptime check via this code work.

Actual Behavior

Creating the uptime check works fine but changing any parameters causes it to start failing with "The request was not authorized to invoke this service. Read more at https://cloud.google.com/run/docs/securing/authenticating Additional troubleshooting documentation can be found at: https://cloud.google.com/run/docs/troubleshooting#401"

Steps to reproduce

  1. terraform apply
  2. change something, e.g. increase the timeout by 1 second (see the snippet after this list)
  3. terraform apply
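
For example, the only difference between the two applies can be the timeout default (a minimal sketch; per the behavior described below, any in-place change appears to trigger it):

variable "timeout" { default = "51s" } # was "50s" for the first apply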

Important Factoids

Terraform and the provider are on their latest versions. Modifying the uptime check via the Cloud Console or gcloud works fine. Example:

  1. create
    gcloud monitoring uptime create \
    "ag-hello" \
    --timeout=50 \
    --period=1 \
    --regions=europe,south-america,usa-virginia \
    --protocol=https \
    --status-classes=2xx \
    --service-agent-auth=oidc-token \
    --resource-labels="configuration_name=","location=europe-west3","project_id=project-id","service_name=hello" \
    --resource-type=cloud-run-revision
  2. Wait a bit to see the uptime check working
  3. Find the name of the newly created check: gcloud monitoring uptime list-configs --project dlh-opsd-dev-aircraft-change --filter='name:ag-hello' --format 'value(name)'
  4. Update with gcloud monitoring uptime update "ag-hello-KklYrBidLoI" --timeout=51
  5. Uptime check keeps working

References

No response

EDIT: modified the code to move the variables to the top.

b/372054640

Ziemowit-LH commented 1 month ago

Considering the error, this probably only happens with service_agent_authentication { type = "OIDC_TOKEN" } in the config.
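
A control case to confirm that (an untested sketch) would be the same check with the block removed; note the Cloud Run service would then have to allow unauthenticated invocations for the check to pass at all:

  http_check {
    use_ssl = true
    accepted_response_status_codes {
      status_class = "STATUS_CLASS_2XX"
    }
    # service_agent_authentication block removed for the control case
  }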

ggtisc commented 1 month ago

Hi @Ziemowit-LH!

I tried to replicate this issue with the following configuration, but everything succeeded without errors:

resource "google_cloud_run_service" "crs_19733" {
  name     = "crs-19733"
  location = "us-central1"
  template {
    spec {
      containers {
        image = "us-docker.pkg.dev/cloudrun/container/hello"
      }
    }
  }
}

resource "google_monitoring_uptime_check_config" "monitoring_uptime_check_config_19733" {
  display_name     = "Monitoring Uptime Check Config 19733"
  timeout          = "50s"
  period           = "60s"
  selected_regions = ["EUROPE", "SOUTH_AMERICA", "ASIA_PACIFIC", ]
  http_check {
    request_method = "GET"
    use_ssl        = true
    accepted_response_status_codes {
      status_class = "STATUS_CLASS_2XX"
    }
    service_agent_authentication {
      type = "OIDC_TOKEN"
    }
  }
  monitored_resource {
    labels = {
      "location"           = "us-central1"
      "project_id"         = "my-project"
      "service_name"       = "crs-19733"
    }
    type = "cloud_run_revision"
  }
}

Could you share an example of what you are using for the configuration_name and revision_name labels so I can try again?

Ziemowit-LH commented 1 month ago

Hello @ggtisc

Ah, I see you did not provide values for "configuration_name" and "revision_name" at all, which results in a permadiff on these two labels and replacement of the resource on each terraform apply.

You do get the following on each apply, right?

      ~ monitored_resource {
          ~ labels = { # forces replacement
              - "configuration_name" = null
              - "revision_name"      = null
                # (3 unchanged elements hidden)
            }
            # (1 unchanged attribute hidden)
        }

This is exactly why, in my code, you see

      "configuration_name" = ""
      "revision_name"      = ""

which does work around the permadiff but in turn causes the issue this defect was opened for.

Note: setting these two labels to null behaves as if the lines were not present, i.e. permadiff and replacement.
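
I.e. this variant of the two label lines (sketch) behaves the same as leaving them out entirely:

      "configuration_name" = null
      "revision_name"      = null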

ggtisc commented 1 month ago

I tried to replicate this issue, but if I use only "" for configuration_name and revision_name I get the following error:

Error: Error creating UptimeCheckConfig: googleapi: Error 404: Error confirming monitored resource:
β”‚ type:          "cloud_run_revision"
β”‚ labels {
β”‚   key: "configuration_name"
β”‚   value: ""
β”‚ }
β”‚ labels {
β”‚   key: "location"
β”‚   value: "us-central1"
β”‚ }
β”‚ labels {
β”‚   key: "project_id"
β”‚   value: "my-project"
β”‚ }
β”‚ labels {
β”‚   key: "revision_name"
β”‚   value: ""
β”‚ }
β”‚ labels {
β”‚   key: "service_name"
β”‚   value: "crs-19733"
β”‚ }
β”‚  is in project: 1234567890
β”‚ Details:
β”‚ [
β”‚   {
β”‚     "@type": "type.googleapis.com/google.rpc.DebugInfo",
β”‚     "detail": "[ORIGINAL ERROR] generic::not_found: Could not fetch resource metadata for cloud_run_revision"
β”‚   }
β”‚ ]
β”‚ 
β”‚   with google_monitoring_uptime_check_config.monitoring_uptime_check_config_19733,
β”‚   on main.tf line 35, in resource "google_monitoring_uptime_check_config" "monitoring_uptime_check_config_19733":
β”‚   35: resource "google_monitoring_uptime_check_config" "monitoring_uptime_check_config_19733" {

On the other hand, if I don't use these labels I get a permadiff even when nothing has changed after running a new terraform apply.

Ziemowit-LH commented 1 month ago

Hi, thanks for checking! I'm a bit confused because I don't get anything like that.

Let's recap: I took your code and modified it a bit to reduce the number of replacements we need to do.

resource "google_cloud_run_service" "crs_19733" {
  name     = "crs-19733"
  location = "europe-west3"
  project  = "my-project"
  template {
    spec {
      containers {
        image = "us-docker.pkg.dev/cloudrun/container/hello"
      }
    }
  }
}

resource "google_monitoring_uptime_check_config" "monitoring_uptime_check_config_19733" {
  project          = google_cloud_run_service.crs_19733.project
  display_name     = "Monitoring Uptime Check Config 19733"
  timeout          = "51s"
  period           = "60s"
  selected_regions = ["EUROPE", "SOUTH_AMERICA", "ASIA_PACIFIC", ]
  http_check {
    request_method = "GET"
    use_ssl        = true
    accepted_response_status_codes {
      status_class = "STATUS_CLASS_2XX"
    }
    service_agent_authentication {
      type = "OIDC_TOKEN"
    }
  }
  monitored_resource {
    labels = {
      "location"           = google_cloud_run_service.crs_19733.location
      "project_id"         = google_cloud_run_service.crs_19733.project
      "service_name"       = google_cloud_run_service.crs_19733.name
      "configuration_name" = ""
      "revision_name"      = ""
    }
    type = "cloud_run_revision"
  }
}

I create the uptime check and it works. Then I modify the timeout and it fails. My company has policies that prevent me from creating resources outside the EU. Could you test in this region (europe-west3)?

Edit: I modified the code in the opening post for easier testing by others.

Ziemowit-LH commented 1 month ago

@ggtisc Additional info: the error "generic::not_found: Could not fetch resource metadata for cloud_run_revision" you encountered appears when the API has not yet registered the new Cloud Run service (hence the 404). It generally works on the second run.
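
Side note: if the transient 404 gets in the way of testing, a possible workaround (an untested sketch using the hashicorp/time provider's time_sleep resource) is to delay the check's creation until the new service has had time to register:

resource "time_sleep" "wait_for_cloud_run" {
  depends_on      = [google_cloud_run_service.hello]
  create_duration = "60s"
}

# and add to the uptime check resource:
#   depends_on = [time_sleep.wait_for_cloud_run]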

Please retry with the code below, modifying only the variable lines, and apply again if the first run returns a 404.

variable "project" { default = "my-project" }
variable "region" { default = "europe-west3" }
variable "timeout" { default = "51s" }

resource "google_cloud_run_service" "hello" {
  name     = "hello"
  project  = var.project
  location = var.region
  template {
    spec {
      containers {
        image = "us-docker.pkg.dev/cloudrun/container/hello"
      }
    }
  }
}

resource "google_monitoring_uptime_check_config" "hello" {
  display_name     = "wellhellothere"
  project          = var.project
  timeout          = var.timeout
  period           = "60s"
  selected_regions = ["EUROPE", "SOUTH_AMERICA", "USA_VIRGINIA"]
  http_check {
    use_ssl = true
    accepted_response_status_codes {
      status_class = "STATUS_CLASS_2XX"
    }
    service_agent_authentication {
      type = "OIDC_TOKEN"
    }
  }
  monitored_resource {
    labels = {
      "location"           = google_cloud_run_service.hello.location
      "project_id"         = google_cloud_run_service.hello.project
      "service_name"       = google_cloud_run_service.hello.name
      "configuration_name" = "" # permadiff workaround
      "revision_name"      = "" # permadiff workaround
    }
    type = "cloud_run_revision"
  }
}

What's also interesting is that lifecycle does not fully work around this. If I comment out configuration_name and revision_name and add:

  lifecycle {
    ignore_changes = [
      monitored_resource[0].labels.revision_name,
      monitored_resource[0].labels.configuration_name,
      ]
  }

to the uptime check, nothing changes (i.e. it would still replace the uptime check).

But with

  lifecycle {
    ignore_changes = [
      monitored_resource[0].labels
    ]
  }

it actually does work as expected and does not try to replace the uptime check.
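
So a variant that avoids both the empty-string labels and the permadiff would look roughly like this (a sketch of only the relevant parts of the uptime check resource; whether the 401-on-update still occurs with it is untested):

  monitored_resource {
    labels = {
      "location"     = google_cloud_run_service.hello.location
      "project_id"   = google_cloud_run_service.hello.project
      "service_name" = google_cloud_run_service.hello.name
      # configuration_name / revision_name intentionally omitted
    }
    type = "cloud_run_revision"
  }

  lifecycle {
    ignore_changes = [
      monitored_resource[0].labels
    ]
  }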