hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

google_monitoring_alert_policy, google_monitoring_dashboard, google_logging_metric always request changes on terraform apply/plan when created with for_each #10118

Open xoloitzcuintle314 opened 3 years ago

xoloitzcuintle314 commented 3 years ago

Hello, community.

I am working on a universal approach for creating multiple monitoring resources (alarms, log-based metrics, dashboards) for different GCP resources (GKE, LB, etc.). The problem I am facing is that the GCP Terraform resources google_monitoring_alert_policy, google_monitoring_dashboard and google_logging_metric always request changes on terraform apply/plan when they are created with for_each.

Please find the code and extra info below:

Terraform version = 1.0.7, 0.13 (tested on both, same result)
GCP provider version = 3.74.0, 3.71.0 (tested on both, same result)
State file location = gcs, local (tested on both, same result)

Terraform module for alarms (alert conditions are created via MQL):

data "google_monitoring_notification_channel" "notification_channels" {
  for_each = var.gcp_all_notification_channels_mail
}

locals {
  notification_channels_ids = [
    for notification_channel in data.google_monitoring_notification_channel.notification_channels :
    notification_channel.name
  ]
}

resource "google_monitoring_alert_policy" "alert_policy" {
  for_each     = var.gcp_alarms
  combiner     = "OR"
  display_name = each.key

  conditions {
    display_name = each.value.title

    condition_monitoring_query_language {
      query    = format("fetch %s | metric '%s' | filter %s | group_by %s | every %s | condition %s", each.value.fetch, each.value.metric, each.value.filter, each.value.group_by, each.value.every, each.value.condition)
      duration = each.value.duration

      trigger {
        count = each.value.trigger
      }
    }
  }

  documentation {
    content = each.value.documentation
  }

  notification_channels = local.notification_channels_ids
}

variables.tf

variable "gcp_project_id" {
    description = "GCP project ID for module implementation"
    type = string
}

variable "gcp_alarms" {
  description             = "List of GCP alarms combining multiple conditions"
  type                    = map(object({
    title                 = string
    fetch                 = string
    metric                = string
    filter                = string
    group_by              = string
    every                 = string
    condition             = string
    trigger               = number
    notification_channels  = list(string)
    duration              = string
    documentation         = string
  }))
  default         = {}
}

.tfvars example:

gcp_gke_alarms = {

    #Node level
    "CPU usage is too high on GKE node [Terraform_new]" = {
        title                = "CPU usage is too high on GKE node [Terraform_new]",
        fetch                = "k8s_node",
        metric               = "kubernetes.io/node/cpu/allocatable_utilization",
        filter               = "(resource.cluster_name == '{cluster_name}' && resource.node_name =~ 'node_name-.*')",
        group_by             = "1m, [value_allocatable_utilization_max: max(value.allocatable_utilization)]",
        every                = "1m | group_by [resource.node_name], [value_allocatable_utilization_max_max:  max(value_allocatable_utilization_max)]",
        condition            = "val() > 0.5 '1'",
        duration             = "60s",
        trigger              = 0,
        notification_channels  = ["projects/project/notificationChannels/id", "projects/project/notificationChannels/id", "projects/project/notificationChannels/id"],
        documentation = link
    },
    "RAM usage is too high on GKE node [Terraform_new]" = {
        title                = "RAM usage is too high on GKE node [Terraform_new]",
        fetch                = "k8s_node",
        metric               = "kubernetes.io/node/memory/allocatable_utilization",
        filter               = "(resource.cluster_name == 'cluster_name' && resource.node_name =~ 'node_name-.*') && (metric.memory_type == 'non-evictable')",
        group_by             = "1m, [value_allocatable_utilization_max: max(value.allocatable_utilization)]",
        every                = "1m",
        condition            = "val() > 0.5 '1'",
        duration             = "60s",
        trigger              = 0,
        notification_channels  = ["projects/project/notificationChannels/id", "projects/project/notificationChannels/id", "projects/project/notificationChannels/id"],
        documentation = link
    },

This code should create a similar set of alarms (e.g. CPU, RAM, volume, traffic, errors) for multiple clusters.

The same approach is used for dashboards and log-based metrics:
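To illustrate the "same alarm set for multiple clusters" idea, here is a minimal sketch of how one set of alarm templates could be expanded per cluster. It assumes the `'{cluster_name}'` placeholder from the first filter above is meant to be substituted; `var.gke_cluster_names` and `local.gcp_alarms_per_cluster` are hypothetical names that do not exist in the module.

```hcl
# Hypothetical input: the clusters that should each get the full alarm set.
variable "gke_cluster_names" {
  type    = list(string)
  default = []
}

locals {
  # Build one map per cluster, then merge them into a single map.
  # Keys must be unique, so the cluster name is appended to the alarm name.
  gcp_alarms_per_cluster = merge([
    for cluster in var.gke_cluster_names : {
      for name, alarm in var.gcp_alarms :
      "${name} (${cluster})" => merge(alarm, {
        # Substitute the literal "{cluster_name}" placeholder in the filter.
        filter = replace(alarm.filter, "{cluster_name}", cluster)
      })
    }
  ]...)
}
```

The resulting map could then be passed to `for_each` in place of `var.gcp_alarms`. Alarms whose filter does not contain the placeholder are copied unchanged.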

Module for log-based metrics:

resource "google_logging_metric" "logging_metric" {
  for_each        = (var.gcp_logs_based_metrics)
  name            = (each.key)
  description     = each.value.description
  filter          = each.value.filter
  metric_descriptor {
    unit          = each.value.unit
    metric_kind   = each.value.metric_kind
    value_type    = each.value.value_type
  }
}

Module for dashboards:

resource "google_monitoring_dashboard" "gcp-gke-dashboards" {
  for_each = var.gcp_gke_dashboards
  dashboard_json = <<EOF
{
  "category": "CUSTOM",
  "displayName": "[${each.value.studio}] ${each.key}",
---
JSON
---
EOF
}

After creating the resources described above, I run terraform apply/plan again and the output shows changes requested for those resources (alarm condition trigger, notification channels, dashboard position, and others).

Expected result: once the alarm sets for multiple clusters have been applied, a subsequent plan should show no changes.

Actual result: 0 to add, 15 to change, 0 to destroy.

For alerts:

        conditions {
            name         = "projects/project_id/alertPolicies/id/conditions/id"
            # (1 unchanged attribute hidden)
          ~ condition_monitoring_query_language {
                # (2 unchanged attributes hidden)
              + trigger {
                  + count = 0
                }
            }
        }
        # (1 unchanged block hidden)
    }
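The plan above keeps re-adding a `trigger` block with `count = 0`, which suggests the block is missing from state after each apply. A possible workaround, sketched here under the assumption that a zero-count trigger is not returned by the API, is to emit the block only when the configured count is greater than 0. This is not a confirmed fix, and leaving the trigger out may change how the condition fires, so it should be checked against the intended alerting behavior.

```hcl
resource "google_monitoring_alert_policy" "alert_policy" {
  for_each     = var.gcp_alarms
  combiner     = "OR"
  display_name = each.key

  conditions {
    display_name = each.value.title

    condition_monitoring_query_language {
      query    = format("fetch %s | metric '%s' | filter %s | group_by %s | every %s | condition %s", each.value.fetch, each.value.metric, each.value.filter, each.value.group_by, each.value.every, each.value.condition)
      duration = each.value.duration

      # Assumption: a trigger with count = 0 is dropped on read, so only
      # declare the block when a positive count is configured.
      dynamic "trigger" {
        for_each = each.value.trigger > 0 ? [each.value.trigger] : []
        content {
          count = trigger.value
        }
      }
    }
  }

  documentation {
    content = each.value.documentation
  }

  notification_channels = local.notification_channels_ids
}
```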

For dashboard:

  ~ {
                          ~ widget = {
                              ~ xyChart = {
                                  ~ dataSets          = [
                                      ~ {
                                          ~ timeSeriesQuery    = {
                                              + apiSource        = "DEFAULT_CLOUD"
                                                # (2 unchanged elements hidden)
                                            }
                                            # (3 unchanged elements hidden)
                                        },
                                    ]
                                    # (3 unchanged elements hidden)
                                }
                                # (1 unchanged element hidden)
                            }
                          + xPos   = 0
                            # (3 unchanged elements hidden)
                        },
                      ~ {
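In the dashboard diff, `apiSource` and `xPos = 0` are present in the configured JSON but absent from state. One blunt way to silence this kind of diff is Terraform's `lifecycle` `ignore_changes` meta-argument. The sketch below assumes a hypothetical `json` attribute on each dashboard entry holding the full definition; it is a stopgap rather than a fix, because it also hides intentional edits to the dashboard. Removing default-valued fields such as `"xPos": 0` from the JSON may be a less intrusive alternative.

```hcl
resource "google_monitoring_dashboard" "gcp-gke-dashboards" {
  for_each = var.gcp_gke_dashboards

  # Hypothetical attribute: the complete dashboard definition as a JSON string.
  dashboard_json = each.value.json

  lifecycle {
    # Terraform stops comparing dashboard_json after the resource is created.
    # Later changes to the JSON are NOT applied until this is removed.
    ignore_changes = [dashboard_json]
  }
}
```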

Please elaborate:

1) Is this an issue with the GCP Terraform provider? (Other providers, e.g. AWS, do not show this behavior.)
2) Is there a different way to implement this approach?

Thank you.

b/275102303

Affected Resource(s)

melinath commented 1 year ago

We should exclude google_monitoring_dashboard from this ticket since that's covered by https://github.com/hashicorp/terraform-provider-google/issues/7242