hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

azurerm_monitor_diagnostic_setting re-applies changes/updates every time running plan or apply... #10388

Open kon1180 opened 3 years ago

kon1180 commented 3 years ago

Community Note

Terraform (and AzureRM Provider) Version

terraform {
  required_version = "~> 0.13.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 2.44.0" # "~> 2.33.0"
    }
  }
}

Affected Resource(s)

azurerm_monitor_diagnostic_setting when applying to Azure SQL DB and Synapse Pool

Expected Behaviour

After running terraform apply, a subsequent terraform plan or apply should report no changes.

Actual Behaviour

After running terraform plan/apply, Terraform wants to apply the same changes again and again. Below is a screenshot of what I see every time I re-run terraform plan: it proposes those same changes on every run, even though Terraform should only update what has actually changed.

Steps to Reproduce

  1. terraform plan

Code

The resources created and the azurerm_monitor_diagnostic_setting resources used to send their logs out. Azure SQL DB:

# Create Azure SQL DB
resource "azurerm_sql_database" "sqldb" {
  name                = "${var.org}-sqldb-dgtlbi-${var.environment}-001"
  resource_group_name = var.resource_group_name
  location            = var.location
  server_name         = azurerm_sql_server.sqlserver.name
  max_size_bytes      = 2147483648 # 2GB
  edition             = "Basic"

}

# Sending SQL DB Logs to Qradar eventhub
# Logging categories: https://docs.microsoft.com/en-us/azure/azure-monitor/platform/resource-logs-categories#microsoftsqlserversdatabases
resource "azurerm_monitor_diagnostic_setting" "sqldb_log_qradar" {
  name                           = "${var.org}-sqldb-dgtlbi-${var.environment}-001-log-name"
  target_resource_id             = azurerm_sql_database.sqldb.id
  eventhub_name                  = local.qradar_eventhub_name
  eventhub_authorization_rule_id = local.qradar_eventhub_authorization_rule_id

  log {
    category = "Deadlocks"

    retention_policy {
      enabled = true
      days    = 0
    }
  }

  log {
    category = "Errors"

    retention_policy {
      enabled = true
      days    = 0
    }
  }

  log {
    category = "SQLSecurityAuditEvents"

    retention_policy {
      enabled = true
      days    = 0
    }
  }

  metric {
    category = "AllMetrics"
    enabled  = true

    retention_policy {
      days    = 0
      enabled = true
    }
  }
}

Azure Synapse Pool


# Create Azure Synapse Pool - no workspace
resource "azurerm_mssql_database" "syn" {
  name      = "${var.org}-syn-dgtlbi-${var.environment}-001"
  server_id = azurerm_sql_server.sqlserver.id
  sku_name  = "DW100c"

}

# Sending Synapse Pool Logs to Qradar eventhub
# Logging categories: https://docs.microsoft.com/en-us/azure/azure-monitor/platform/resource-logs-categories#microsoftsqlserversdatabases
resource "azurerm_monitor_diagnostic_setting" "syn_log_qradar" {
  name                           = "${var.org}-syn-dgtlbi-${var.environment}-001-log-name"
  target_resource_id             = azurerm_mssql_database.syn.id
  eventhub_name                  = local.qradar_eventhub_name
  eventhub_authorization_rule_id = local.qradar_eventhub_authorization_rule_id

  log {
    category = "SQLSecurityAuditEvents"

    retention_policy {
      enabled = true
      days    = 0
    }
  }

  log {
    category = "SqlRequests"

    retention_policy {
      enabled = true
      days    = 0
    }
  }

  log {
    category = "ExecRequests"

    retention_policy {
      enabled = true
      days    = 0
    }
  }

  metric {
    category = "AllMetrics"
    enabled  = true

    retention_policy {
      days    = 0
      enabled = true
    }
  }
}
richl-pg commented 3 years ago

Seeing exactly the same thing

kon1180 commented 3 years ago

Seeing exactly the same thing

Is there another resource you are using to avoid this issue? I know azurerm_resource_group_template_deployment was buggy in the same way, so I switched back to the older azurerm_template_deployment, which does exactly the same thing with no problems.

kon1180 commented 3 years ago

A quick workaround: I ran apply with the configuration I wanted, then set the following within the resource blocks, since I cannot have production showing changes every single time:

  # ignore_changes is here given the bug I reported: https://github.com/terraform-providers/terraform-provider-azurerm/issues/10388
  lifecycle {
    ignore_changes = [log, metric]
  }
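
For reference, a minimal sketch of where that workaround sits inside the resource (reusing the names from the SQL DB example above; only one log block shown for brevity):

resource "azurerm_monitor_diagnostic_setting" "sqldb_log_qradar" {
  name                           = "${var.org}-sqldb-dgtlbi-${var.environment}-001-log-name"
  target_resource_id             = azurerm_sql_database.sqldb.id
  eventhub_name                  = local.qradar_eventhub_name
  eventhub_authorization_rule_id = local.qradar_eventhub_authorization_rule_id

  log {
    category = "Errors"

    retention_policy {
      enabled = true
      days    = 0
    }
  }

  metric {
    category = "AllMetrics"
    enabled  = true

    retention_policy {
      enabled = true
      days    = 0
    }
  }

  # Workaround: ignore the drift the provider keeps reporting for these blocks.
  lifecycle {
    ignore_changes = [log, metric]
  }
}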
surlypants commented 3 years ago

this looks to be the result of casing mismatch (at least on my end with 2.45.0):

.../Microsoft.OperationalInsights/workspaces/... -> .../microsoft.operationalinsights/workspaces/...

(forces replacement)

kon1180 commented 3 years ago

this looks to be the result of casing mismatch (at least on my end with 2.45.0):

.../Microsoft.OperationalInsights/workspaces/... -> .../microsoft.operationalinsights/workspaces/...

(forces replacement)

Changing to all lowercase didn't work; it still forced replacement on 2.44, just as you saw on 2.45.

surlypants commented 3 years ago

The case mismatch is a bug between the Azure API and the provider; it has affected many other Azure resources. I am saying that this casing bug is now also impacting this resource.

eg, search for "case" in the provider changelog: https://github.com/terraform-providers/terraform-provider-azurerm/blob/master/CHANGELOG-v2.md

surlypants commented 3 years ago

Woof. I just realized that this behavior is quite problematic, particularly with Linux web apps. The replacement of the resource triggers the container to be recycled twice: first as Terraform removes the diagnostic setting, and again as it puts it back.

ouch

surlypants commented 3 years ago

Please treat the casing issue as high priority. Ongoing destruction and recreation is unfit for production.

kon1180 commented 3 years ago

Please treat the casing issue as high priority. Ongoing destruction and recreation is unfit for production.

Is there a way for me to do that?

surlypants commented 3 years ago

@kon1180 can you run a plan (without your workaround) and see if the resourceID is being changed (per casing) forcing a replace ?

tomlewis0 commented 3 years ago

I'm experiencing what appears to be the same issue as the issue author: Terraform attempts to update the resource due to apparent changes across all the log and metric blocks that are not defined in the config.

This seems unrelated to #10857 as highlighted by @surlypants.

@kon1180 can you run a plan (without your workaround) and see if the resourceID is being changed (per casing) forcing a replace ?

The log_analytics_workspace_id is identified as unchanged in this case.

weq commented 3 years ago

As far as I can tell, it is a casing issue that makes Terraform believe the setting has been altered. The state file contains something like log_analytics_workspace_id = "/subscriptions/GUID/resourceGroups/rg/providers/Microsoft.OperationalInsights/workspaces/ws" and the provider wants to change it to all lowercase.

AliMBajwa commented 3 years ago

Also running into this. I've noticed it will also apply a retention policy even if you haven't given one in your configuration.

bizintelligence commented 3 years ago

I am facing the same issue: every time we deploy a change, Terraform shows an update in place. It also happens with global parameters on the Azure Data Factory resource.

cforce commented 2 years ago

Same here; the bug is unsolved. I am astonished that this has remained unsolved for 9 months.

flo-02-mu commented 2 years ago

I did a bit of investigation: as soon as one diagnostic setting category (regardless of whether it is a log or a metric) is enabled for a target resource, the API returns all available diagnostic categories for that target resource with default values (enabled: false and retention: 0):

az rest --method get --url https://management.azure.com/subscriptions/****/resourceGroups/test-for-github-10388/providers/Microsoft.KeyVault/vaults/test-for-github-10388/providers/Microsoft.Insights/diagnosticSettings/github-issue-10388?api-version=2021-05-01-preview
{
  "id": "/subscriptions/****/resourcegroups/test-for-github-10388/providers/microsoft.keyvault/vaults/test-for-github-10388/providers/microsoft.insights/diagnosticSettings/github-issue-10388",
  "identity": null,
  "kind": null,
  "location": null,
  "name": "github-issue-10388",
  "properties": {
    "eventHubAuthorizationRuleId": null,
    "eventHubName": null,
    "logAnalyticsDestinationType": null,
    "logs": [
      {
        "category": "AuditEvent",
        "categoryGroup": null,
        "enabled": false,
        "retentionPolicy": {
          "days": 0,
          "enabled": false
        }
      },
      {
        "category": "AzurePolicyEvaluationDetails",
        "categoryGroup": null,
        "enabled": false,
        "retentionPolicy": {
          "days": 0,
          "enabled": false
        }
      }
    ],
    "metrics": [
      {
        "category": "AllMetrics",
        "enabled": true,
        "retentionPolicy": {
          "days": 0,
          "enabled": false
        }
      }
    ],
    "serviceBusRuleId": null,
    "storageAccountId": null,
    "workspaceId": "/subscriptions/****/resourceGroups/oasis-tst-loganalytics/providers/Microsoft.OperationalInsights/workspaces/oasis-tst-analytics"
  },
  "resourceGroup": "test-for-github-10388",
  "tags": null,
  "type": "Microsoft.Insights/diagnosticSettings"
}

Terraform then interprets this existence as a change from null to false and wants to change it back, which does not work. A simple way of changing the behaviour would be to only check for enabled settings and treat "null" and "disabled" identically, e.g. by adding

// Only keep settings that are explicitly enabled, treating nil and false the same.
if v.Enabled != nil && *v.Enabled {
    results = append(results, output)
}

at https://github.com/hashicorp/terraform-provider-azurerm/blob/main/internal/services/monitor/monitor_diagnostic_setting_resource.go#L480

This has some side effects, though, when "enabled = false" is specified explicitly, as is done in the tests (e.g. https://github.com/hashicorp/terraform-provider-azurerm/blob/main/internal/services/monitor/monitor_diagnostic_setting_resource_test.go#L277). The setup would then mean either specifying "enabled = true" or not specifying the block at all; I'm not sure if this is desired.
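
To put that trade-off in configuration terms, a sketch of the proposed behaviour only (not how the provider currently works), using the AuditEvent category from the Key Vault response above: an explicitly disabled block, as the acceptance tests write it, would end up meaning the same thing as omitting the block entirely.

# Explicitly disabled, as the acceptance tests do today...
log {
  category = "AuditEvent"
  enabled  = false

  retention_policy {
    enabled = false
  }
}

# ...which, under the proposal, would be equivalent to simply omitting
# the "AuditEvent" log block from the configuration.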

PPACI commented 2 years ago

I would like to see the proposed solution implemented. Some Azure Policy assignments and Microsoft Defender for Cloud automatically enable diagnostic logs, and that operation then changes the state as explained above.

The ignore_changes lifecycle workaround doesn't work in this scenario.

For us, the workaround is to declare the policy-managed settings in the Terraform code as if those audit logs had been set up by Terraform. It works, but we would like to keep some diagnostic logs (especially the security ones) outside of Terraform, and we currently are unable to.

The proposed "explicitly specify enabled=true/false to manage" would be perfect for us.

I can propose a PR with the changes if that's OK for you.

SenorRagequit commented 2 years ago

Still happening. As soon as you define anything, Azure automatically sets everything else that isn't defined to disabled/0. Terraform sees that as a new change, since 0 isn't null.

biswarup1290dass commented 2 years ago

I am seeing a similar problem for AKS diagnostics, where I split the log categories between two destinations (some configured to go to Log Analytics and some to a Storage Account).

Below is my TF file to create the azurerm_monitor_diagnostic_setting resources:

resource "azurerm_monitor_diagnostic_setting" "aks_audit" {
  lifecycle {
    ignore_changes = [target_resource_id]
  }
  name                       = local.diag_name
  log_analytics_workspace_id = azurerm_log_analytics_workspace.workspace.id
  target_resource_id         = data.azurerm_kubernetes_cluster.aks.id

  log {
    category = "cloud-controller-manager"
    enabled  = true

    retention_policy {
      days    = 90
      enabled = true
    }
  }

  log {
    category = "cluster-autoscaler"
    enabled  = true

    retention_policy {
      days    = 90
      enabled = true
    }
  }

  log {
    category = "kube-apiserver"
    enabled  = true

    retention_policy {
      days    = 90
      enabled = true
    }
  }

  log {
    category = "kube-controller-manager"
    enabled  = true

    retention_policy {
      days    = 90
      enabled = true
    }
  }

  log {
    category = "kube-scheduler"
    enabled  = true

    retention_policy {
      days    = 90
      enabled = true
    }
  }

  log {
    category = "guard"
    enabled  = true

    retention_policy {
      days    = 90
      enabled = true
    }
  }

  metric {
    category = "AllMetrics"
    enabled  = true

    retention_policy {
      days    = 90
      enabled = true
    }
  }
}

resource "azurerm_monitor_diagnostic_setting" "aks_audit_sa" {
  lifecycle {
    ignore_changes = [target_resource_id]
  }
  name                       = local.sa_diag_name
  storage_account_id         = azurerm_storage_account.audit_sa.id
  target_resource_id         = data.azurerm_kubernetes_cluster.aks.id
  log {
    category = "kube-audit"
    enabled  = true

    retention_policy {
      days    = 90
      enabled = true
    }
  }

  log {
    category = "kube-audit-admin"
    enabled  = true

    retention_policy {
      days    = 90
      enabled = true
    }
  }
}

The plan shows something like

# azurerm_monitor_diagnostic_setting.aks_audit will be updated in-place
  ~ resource "azurerm_monitor_diagnostic_setting" "aks_audit" {
        id                 = <REDACTED>
        name               = <REDACTED>
        # (2 unchanged attributes hidden)

      - log {
          - category = "csi-azuredisk-controller" -> null
          - enabled  = false -> null

          - retention_policy {
              - days    = 0 -> null
              - enabled = false -> null
            }
        }
      - log {
          - category = "csi-azurefile-controller" -> null
          - enabled  = false -> null

          - retention_policy {
              - days    = 0 -> null
              - enabled = false -> null
            }
        }
      - log {
          - category = "csi-snapshot-controller" -> null
          - enabled  = false -> null

          - retention_policy {
              - days    = 0 -> null
              - enabled = false -> null
            }
        }
      - log {
          - category = "kube-audit" -> null
          - enabled  = false -> null

          - retention_policy {
              - days    = 0 -> null
              - enabled = false -> null
            }
        }
      - log {
          - category = "kube-audit-admin" -> null
          - enabled  = false -> null

          - retention_policy {
              - days    = 0 -> null
              - enabled = false -> null
            }
        }

        # (7 unchanged blocks hidden)
    }

  # azurerm_monitor_diagnostic_setting.aks_audit_sa will be updated in-place
  ~ resource "azurerm_monitor_diagnostic_setting" "aks_audit_sa" {
        id                 = <REDACTED>
        name               = <REDACTED>
        # (2 unchanged attributes hidden)

      - log {
          - category = "cloud-controller-manager" -> null
          - enabled  = false -> null

          - retention_policy {
              - days    = 0 -> null
              - enabled = false -> null
            }
        }
      - log {
          - category = "cluster-autoscaler" -> null
          - enabled  = false -> null

          - retention_policy {
              - days    = 0 -> null
              - enabled = false -> null
            }
        }
      - log {
          - category = "csi-azuredisk-controller" -> null
          - enabled  = false -> null

          - retention_policy {
              - days    = 0 -> null
              - enabled = false -> null
            }
        }
      - log {
          - category = "csi-azurefile-controller" -> null
          - enabled  = false -> null

          - retention_policy {
              - days    = 0 -> null
              - enabled = false -> null
            }
        }
      - log {
          - category = "csi-snapshot-controller" -> null
          - enabled  = false -> null

          - retention_policy {
              - days    = 0 -> null
              - enabled = false -> null
            }
        }
      - log {
          - category = "guard" -> null
          - enabled  = false -> null

          - retention_policy {
              - days    = 0 -> null
              - enabled = false -> null
            }
        }
      - log {
          - category = "kube-apiserver" -> null
          - enabled  = false -> null

          - retention_policy {
              - days    = 0 -> null
              - enabled = false -> null
            }
        }
      - log {
          - category = "kube-controller-manager" -> null
          - enabled  = false -> null

          - retention_policy {
              - days    = 0 -> null
              - enabled = false -> null
            }
        }
      - log {
          - category = "kube-scheduler" -> null
          - enabled  = false -> null

          - retention_policy {
              - days    = 0 -> null
              - enabled = false -> null
            }
        }

      - metric {
          - category = "AllMetrics" -> null
          - enabled  = false -> null

          - retention_policy {
              - days    = 0 -> null
              - enabled = false -> null
            }
        }
        # (2 unchanged blocks hidden)
    }

Plan: 0 to add, 2 to change, 0 to destroy.

This is the plan on every terraform plan execution, without any variable changes. I had to add the lifecycle meta-argument when this happened last time, but the request handling on the Azure API side mentioned above forces the provider to identify a change every time.

I had to explicitly specify all the categories (log as well as metric) that I don't need, as below, to make Terraform dodge the "undesired" changes.

ex:

  log {
    category = "CATEGORY"
    enabled  = false

    retention_policy {
      days    = 0
      enabled = false
    }
  }

Until this issue is resolved, I need to keep adding such a block whenever Azure introduces and enforces a new diagnostic log category.

murdibb commented 2 years ago

Hi all,

Below is my workaround, which results in no unneeded changes being detected. AzureRM provider version: 2.93.1, Terraform version: 0.13.7.

1) A local list variable containing the categories for which I want to enable logs/metrics:

locals {
  example_recovery_vault_diagnostic_categories_enabled = [
    "AzureSiteRecoveryJobs",
    "AzureSiteRecoveryRecoveryPoints"
  ]
}

2) A data source to get ALL the available categories:

data "azurerm_monitor_diagnostic_categories" "example" {
  resource_id = azurerm_recovery_services_vault.example.id
}

3) Dynamic blocks that generate all the categories, both the ones I need and the ones I want to disable, with their respective settings. As you can see, the condition for enabled = true is whether the category matches my list:

resource "azurerm_monitor_diagnostic_setting" "example" {
  name                           = azurerm_recovery_services_vault.example.name
  target_resource_id             = azurerm_recovery_services_vault.example.id
  log_analytics_workspace_id     = data.azurerm_log_analytics_workspace.example.id
  log_analytics_destination_type = "AzureDiagnostics"

  dynamic "log" {
    iterator = log_category
    for_each = data.azurerm_monitor_diagnostic_categories.example.logs

    content {
      enabled  = contains(local.example_recovery_vault_diagnostic_categories_enabled,log_category.value) ? true : false
      category = log_category.value
      retention_policy {
        days     = 0
        enabled  = contains(local.example_recovery_vault_diagnostic_categories_enabled,log_category.value) ? true : false
      }
    }
  }

  dynamic "metric" {
    iterator = metric_category
    for_each = data.azurerm_monitor_diagnostic_categories.example.metrics

    content {
      enabled  = contains(local.example_recovery_vault_diagnostic_categories_enabled,metric_category.value) ? true : false
      category = metric_category.value
      retention_policy {
        days     = 0
        enabled  = contains(local.example_recovery_vault_diagnostic_categories_enabled,metric_category.value) ? true : false
      }
    }
  }
}
or-shachar commented 2 years ago

@murdibb worked for us - thanks!

MageshSrinivasulu commented 2 years ago

It works for us as well. Tested it.

guidooliveira commented 2 years ago

(Quoting @murdibb's full workaround from above: the local list of enabled categories, the azurerm_monitor_diagnostic_categories data source, and the dynamic log/metric blocks.)

Unfortunately, even this isn't working for me anymore; the data source doesn't seem to be evaluated during plan, which still triggers changes.

kamilzzz commented 1 year ago

Version 3.39.0 of the azurerm provider introduced a new enabled_log block on the azurerm_monitor_diagnostic_setting resource.

In my case, transitioning the existing log blocks to enabled_log blocks fixed the issue, and Terraform no longer tries to apply the same changes on every plan/apply run.
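
For anyone making the same transition, a minimal sketch (the category name is only an example; retention_policy handling differs between provider versions, so check the documentation for your version):

# Before (pre-3.39.0 style):
# log {
#   category = "Errors"
#
#   retention_policy {
#     enabled = true
#     days    = 0
#   }
# }

# After, using the enabled_log block available from azurerm 3.39.0 onwards:
enabled_log {
  category = "Errors"
}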

mtrin commented 1 year ago

Version 3.39.1 with enabled_log still has the same issue for me

mukeshinit commented 1 year ago

The new enabled_log block introduced in version 3.39.0 of the azurerm provider fixed the issue for the log blocks.

But the same issue still exists for the metric block, which tries to apply the same changes over and over on every Terraform plan/apply run.

shanilhirani commented 1 year ago

Would having an enabled_metrics block improve the situation, given that some have reported that the new enabled_log attribute resolved their Terraform drift?

Stolz commented 11 months ago

For me it keeps happening even when using enabled_log. My workaround is to ignore the changes with a lifecycle ignore_changes block:

resource "azurerm_monitor_diagnostic_setting" "ds" {
  log_analytics_workspace_id = var.log_analytics_workspace_id
  name                       = var.diagnostic_setting_name
  target_resource_id         = var.target_resource_id

  dynamic "enabled_log" {
    for_each = var.logs_categories 

    content {
      category = enabled_log.value
    }
  }

  lifecycle {
    ignore_changes = [enabled_log, log, metric] # Workaround for BUG https://github.com/terraform-providers/terraform-provider-azurerm/issues/10388
  }
}
Kulesza commented 8 months ago

The same problem exists for the Synapse workspace.

The diagnostic setting has no metrics configured, but an AllMetrics category is created automatically, and every run forces Terraform to try to remove it.

      - metric {
          - category = "AllMetrics" -> null
          - enabled  = false -> null

          - retention_policy {
              - days    = 0 -> null
              - enabled = false -> null
            }
        }
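
A possible mitigation, mirroring the explicit-disable approach described earlier in this thread (a sketch only; whether it actually silences the perpetual diff depends on the provider version), is to declare the auto-created category as disabled so the configuration matches what the API returns:

metric {
  category = "AllMetrics"
  enabled  = false

  retention_policy {
    days    = 0
    enabled = false
  }
}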