google_monitoring_uptime_check_config delete fails due to broken google api

alexanderlumix commented 1 month ago


Terraform Version & Provider Version(s)

Terraform v1.7.3 on darwin_arm64

provider registry.terraform.io/hashicorp/google v5.2.0

Affected Resource(s)

google_monitoring_uptime_check_config

Terraform Configuration

Before

resource "google_monitoring_uptime_check_config" "service-uptime-check" {
  display_name = "service-uptime-check"
  timeout      = "60s"

  http_check {
    path         = "/"
    port         = 443
    use_ssl      = true
    validate_ssl = true
    accepted_response_status_codes {
      status_class = "STATUS_CLASS_2XX"
      status_value = 0
    }
  }

  monitored_resource {
    type   = "uptime_url"
    labels = {
      project_id = "xxxx"
      host       = "example.com"
    }
  }

  content_matchers {
    content = "OK"
    matcher = "CONTAINS_STRING"
  }
}

resource "google_monitoring_alert_policy" "service-uptime-check" {
  display_name = "service-uptime-check"
  combiner     = "OR"
  conditions {
    display_name = "service-uptime-check"
    condition_threshold {
      filter          = "metric.type=\"monitoring.googleapis.com/uptime_check/check_passed\" AND metric.label.check_id=\"${split("/", google_monitoring_uptime_check_config.service-uptime-check.id)[3]}\" AND resource.type=\"uptime_url\""
      duration        = "60s"
      comparison      = "COMPARISON_GT"
      threshold_value = 1
      aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_NEXT_OLDER"
        cross_series_reducer = "REDUCE_COUNT_FALSE"
        group_by_fields      = ["resource.label.*"]
      }

      trigger {
        count = 1
      }
    }
  }
}
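As a side note on the filter above: split("/", ...id)[3] takes the last segment of an id of the form projects/PROJECT/uptimeCheckConfigs/CHECK_ID. Assuming the provider version in use exports the uptime_check_id attribute on google_monitoring_uptime_check_config (documented for the 5.x provider, but worth verifying against your version), the same filter can be written without string splitting. This is only a sketch; the reference still creates the same implicit dependency between the two resources, so it does not by itself change the delete behavior described below.

```hcl
resource "google_monitoring_alert_policy" "service-uptime-check" {
  display_name = "service-uptime-check"
  combiner     = "OR"

  conditions {
    display_name = "service-uptime-check"

    condition_threshold {
      # Assumption: uptime_check_id is exported by the provider and equals the
      # last path segment of the check's id (what split("/", ...)[3] returned).
      filter = join(" AND ", [
        "metric.type=\"monitoring.googleapis.com/uptime_check/check_passed\"",
        "metric.label.check_id=\"${google_monitoring_uptime_check_config.service-uptime-check.uptime_check_id}\"",
        "resource.type=\"uptime_url\"",
      ])
      duration        = "60s"
      comparison      = "COMPARISON_GT"
      threshold_value = 1

      aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_NEXT_OLDER"
        cross_series_reducer = "REDUCE_COUNT_FALSE"
        group_by_fields      = ["resource.label.*"]
      }

      trigger {
        count = 1
      }
    }
  }
}
```

Building the filter with join() is purely a readability choice; a single quoted string with the same interpolation is equivalent.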

After

Debug Output

google_monitoring_uptime_check_config.livekit-uptime-check: Destroying... [id=projects/example/uptimeCheckConfigs/service-uptime-check-9WA23SgrDNo]
╷
│ Error: Error when reading or editing UptimeCheckConfig: googleapi: Error 400: Request contains an invalid argument.

Expected Behavior

The resources should be deleted.

Actual Behavior

Resources are not deleted. I found the culprit: the behavior of the delete API (https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.uptimeCheckConfigs/delete) in combination with an alert policy whose filter includes:

metric.label.check_id

This combination creates a link between the uptime check and the alert policy, which the Cloud Console shows in the UI.

The UI handles this case and shows a dialog when you try to delete the check.

But if you try to delete the uptime check through the API, you get a 400 error:

{
  "error": {
    "code": 400,
    "message": "Request contains an invalid argument.",
    "status": "INVALID_ARGUMENT"
  }
}
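A plain terraform destroy should delete the alert policy first, because its filter references the uptime check and Terraform destroys dependents before their dependencies; that may be why an apply-then-destroy test does not reproduce the error. One hypothesis, not confirmed in this thread, is that the failure happens when terraform apply replaces the uptime check: without create_before_destroy, Terraform deletes the old check while the alert policy still references it, and only updates the policy's filter afterwards. Under that assumption, a possible workaround is to have the policy replaced together with the check via the lifecycle replace_triggered_by argument (Terraform 1.2 and later), so the policy is deleted before the check. This is a sketch, not a verified fix.

```hcl
resource "google_monitoring_alert_policy" "service-uptime-check" {
  display_name = "service-uptime-check"
  combiner     = "OR"

  conditions {
    display_name = "service-uptime-check"

    condition_threshold {
      filter          = "metric.type=\"monitoring.googleapis.com/uptime_check/check_passed\" AND metric.label.check_id=\"${split("/", google_monitoring_uptime_check_config.service-uptime-check.id)[3]}\" AND resource.type=\"uptime_url\""
      duration        = "60s"
      comparison      = "COMPARISON_GT"
      threshold_value = 1

      aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_NEXT_OLDER"
        cross_series_reducer = "REDUCE_COUNT_FALSE"
        group_by_fields      = ["resource.label.*"]
      }

      trigger {
        count = 1
      }
    }
  }

  lifecycle {
    # Replace this policy whenever the uptime check's id changes, i.e. when
    # the check itself is replaced. Since the policy depends on the check,
    # Terraform destroys the policy before deleting the old check, so the
    # check is no longer referenced when its delete call is made.
    replace_triggered_by = [
      google_monitoring_uptime_check_config.service-uptime-check.id
    ]
  }
}
```

The trade-off is that the alert policy gets a new resource name every time the check is replaced. An alternative, also unverified here, is lifecycle { create_before_destroy = true } on the uptime check, so the new check is created and the policy updated before the old check is deleted.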

Steps to reproduce

terraform apply

Important Factoids

No response

References

No response

ggtisc commented 1 month ago

Hi @alexanderlumix!

I tried to replicate this issue, but after creating the resources with terraform apply and then destroying them with terraform destroy, everything succeeded without errors. They were removed completely from the tfstate file and from the Cloud Console (Monitoring > Alerting and Monitoring > Uptime checks, respectively).

Could you provide more information so we can see what is happening, such as your provider configuration? Please don't share sensitive information, just the structure you are using, as in this example:

provider "google" {
  billing_project = "my-project"
  project = "my-project"
  region = "us-central1"
}

terraform {
  required_providers {
    google = {
      # source  = "hashicorp/google-beta"
      version = "5.2.0"
    }
  }
}

alexanderlumix commented 1 month ago


Hey,

Did you use metric.label.check_id in your filter? That is the important part: it is what links the two resources.

terraform {
  required_providers {
    ...
    google = {
      source  = "hashicorp/google"
      version = "5.2.0"
    }
    ...
  }
}

ggtisc commented 1 month ago

Hi @alexanderlumix!

I used the same code you sent, including metric.label.check_id in the condition_threshold filter, with the same Terraform and Google provider versions. So you may just need to refresh your environment, check your credentials and authentication, and run terraform apply again. You could also run terraform init -upgrade first.

hashicorp / terraform-provider-google