hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

Log Analytics workspace got deleted after a Terraform update #25815

Closed xiangyx closed 6 months ago

xiangyx commented 6 months ago

Terraform Version

1.3.9

AzureRM Provider Version

3.100.0

Affected Resource(s)/Data Source(s)

azurerm_log_analytics_workspace; azurerm_log_analytics_solution

Terraform Configuration Files

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.100.0"
    }
  }
  required_version = ">=1.3.4"
}

provider "azurerm" {
  features {
    template_deployment {
      delete_nested_items_during_deletion = true
    }
  }
}

resource "azurerm_resource_group" "sentinel" {
  name     = "example"
  location = "West Europe"
}

resource "azurerm_log_analytics_workspace" "sentinel" {
  name                = azurerm_resource_group.sentinel.name
  location            = azurerm_resource_group.sentinel.location
  resource_group_name = azurerm_resource_group.sentinel.name
  retention_in_days   = 90

  lifecycle {
    prevent_destroy = true
  }
}

resource "azurerm_monitor_diagnostic_setting" "la_diagnostics" {
  count                          = 1
  name                           = "LogAnalyticsDiagnostics"
  target_resource_id             = azurerm_log_analytics_workspace.sentinel.id
  log_analytics_destination_type = "AzureDiagnostics"
  log_analytics_workspace_id     = azurerm_log_analytics_workspace.sentinel.id

  enabled_log {
    category = "Audit"
  }
  metric {
    category = "AllMetrics"
    enabled  = true
  }
}

resource "azurerm_log_analytics_solution" "sentinel" {
  solution_name         = "SecurityInsights"
  location              = azurerm_resource_group.sentinel.location
  resource_group_name   = azurerm_resource_group.sentinel.name
  workspace_resource_id = azurerm_log_analytics_workspace.sentinel.id
  workspace_name        = azurerm_log_analytics_workspace.sentinel.name
  plan {
    publisher = "Microsoft"
    product   = "OMSGallery/SecurityInsights"
  }

  lifecycle {
    prevent_destroy = true
  }
}

resource "azurerm_sentinel_alert_rule_scheduled" "sentinel" {
  name                       = "example"
  log_analytics_workspace_id = azurerm_log_analytics_workspace.sentinel.id
  display_name               = "example alert rule"
  severity                   = "Informational"
  query                      = <<QUERY
        AzureActivity |
        where OperationName == "Create or Update Virtual Machine" |
        where ActivityStatus == "Succeeded" |
        make-series dcount(ResourceId) default=0 on EventSubmissionTimestamp in range(ago(7d), now(), 1d) by Caller
        QUERY
}

resource "azurerm_resource_group_template_deployment" "parser" {
  deployment_mode     = "Incremental"
  name                = "parser-custom"
  resource_group_name = azurerm_resource_group.sentinel.name

  template_content = jsonencode(
    {
      "$schema" : "https://schema.management.azure.com/schemas/2019-08-01/deploymentTemplate.json#",
      "contentVersion" : "1.0.0.0",
      "resources" : [
        {
          "type" : "Microsoft.OperationalInsights/workspaces",
          "apiVersion" : "2017-03-15-preview",
          "name" : "Yaoexample",
          "location" : "West Europe",
          "resources" : [
            {
              "type" : "savedSearches",
              "apiVersion" : "2020-08-01",
              "name" : "ASim_NetworkSessionCustom",
              "dependsOn" : [
                "[concat('Microsoft.OperationalInsights/workspaces/', 'Yaoexample')]"
              ],
              "properties" : {
                "etag" : "*",
                "displayName" : "ASIM NetworkSession custom filtering glue parser",
                "category" : "Security",
                "FunctionAlias" : "ASim_NetworkSessionCustom",
                "query" : "union isfuzzy=true\nASim_NetworkSession_LG_PaloAlto_Traffic()",
                "version" : 1,
                "functionParameters" : ""
              }
            }
          ]
        }
      ]
    }
  )
}

resource "azurerm_log_analytics_query_pack" "default-query-pack" {
  name                = "default-query-pack"
  resource_group_name = azurerm_resource_group.sentinel.name
  location            = azurerm_resource_group.sentinel.location
}

resource "azurerm_log_analytics_query_pack_query" "example" {
  name          = "19952bc3-0bf9-49eb-b713-6b80e7a41847"
  query_pack_id = azurerm_log_analytics_query_pack.default-query-pack.id
  body          = "let newExceptionsTimeRange = 1d;\nlet timeRangeToCheckBefore = 7d;\nexceptions\n| where timestamp < ago(timeRangeToCheckBefore)\n| summarize count() by problemId\n| join kind= rightanti (\nexceptions\n| where timestamp >= ago(newExceptionsTimeRange)\n| extend stack = tostring(details[0].rawStack)\n| summarize count(), dcount(user_AuthenticatedId), min(timestamp), max(timestamp), any(stack) by problemId  \n) on problemId \n| order by  count_ desc\n"
  display_name  = "Exceptions - New in the last 24 hours"
}

Debug Output/Panic Output

Terraform will perform the following actions:

  # azurerm_monitor_diagnostic_setting.la_diagnostics[0] will be updated in-place
  ~ resource "azurerm_monitor_diagnostic_setting" "la_diagnostics" {
        id                             = "/subscriptions/***/resourceGroups/example-rg/providers/Microsoft.OperationalInsights/workspaces/staging-sentinel|LogAnalyticsDiagnostics"
      + log_analytics_destination_type = "AzureDiagnostics"
        name                           = "LogAnalyticsDiagnostics"
        # (2 unchanged attributes hidden)

        # (4 unchanged blocks hidden)
    }

  # module.log-parsers[0].azurerm_resource_group_template_deployment.parser["parser-custom.json"] will be destroyed
  # (because key ["parser-custom.json"] is not in for_each map)
  - resource "azurerm_resource_group_template_deployment" "parser" {
      - deployment_mode     = "Incremental" -> null
      - id                  = "/subscriptions/***/resourceGroups/example-rg/providers/Microsoft.Resources/deployments/parser-parser-custom" -> null
      - name                = "parser-parser-custom" -> null
      - output_content      = jsonencode({}) -> null
      - parameters_content  = jsonencode(
            {
              - location      = {
                  - value = "northeurope"
                }
              - workspaceName = {
                  - value = "staging-sentinel"
                }
            }
        ) -> null
      - resource_group_name = "example-rg" -> null
      - tags                = {} -> null
      - template_content    = jsonencode(
            ...
        ) -> null
    }

  # module.saved-queries.azurerm_log_analytics_query_pack_query.saved_query["customSavedQuery.json"] will be destroyed
  # (because key ["customSavedQuery.json"] is not in for_each map)
  - resource "azurerm_log_analytics_query_pack_query" "saved_query" {
      - body           = <<-EOT
            Syslog
            | where ProcessName == "dnstap"
        EOT -> null
      - categories     = [] -> null
      - description    = "customSavedQuery" -> null
      - display_name   = "customSavedQuery" -> null
      - id             = "/subscriptions/***/resourceGroups/example-rg/providers/Microsoft.OperationalInsights/queryPacks/DefaultQueryPack/queries/d9a75bcd...." -> null
      - name           = "d9a75bcd...." -> null
      - query_pack_id  = "/subscriptions/***/resourceGroups/example-rg/providers/Microsoft.OperationalInsights/queryPacks/DefaultQueryPack" -> null
      - resource_types = [
          - "microsoft.operationalinsights/workspaces",
        ] -> null
      - solutions      = [] -> null
      - tags           = {
          - "labels" = ""
        } -> null
    }

Plan: 0 to add, 1 to change, 2 to destroy.

Error message after the first attempt:
╷
│ Error: Provider produced inconsistent result after apply
│ 
│ When applying changes to
│ azurerm_monitor_diagnostic_setting.la_diagnostics[0], provider
│ "provider[\"registry.terraform.io/hashicorp/azurerm\"]" produced an
│ unexpected new value: Root resource was present, but now absent.
│ 
│ This is a bug in the provider, which should be reported in the provider's
│ own issue tracker.
╵

Error message after the second attempt:
╷
│ Error: Instance cannot be destroyed
│ 
│   on main.tf line 32:
│   32: resource "azurerm_log_analytics_solution" "sentinel" {
│ 
│ Resource azurerm_log_analytics_solution.sentinel has
│ lifecycle.prevent_destroy set, but the plan calls for this resource to be
│ destroyed. To avoid this error and continue with the plan, either disable
│ lifecycle.prevent_destroy or reduce the scope of the plan using the -target
│ flag.
╵

Expected Behaviour

We expected only the parser and the saved query to be deleted, as in a normal cleanup task, and the Log Analytics workspace and the Sentinel workspace on top of it to keep running as usual.

Deployments prior to this failure, from earliest to most recent, were:

  1. Update a watchlist in Sentinel (success)
  2. Update the azurerm provider from 3.99.0 to 3.100.0 (success)

Actual Behaviour

The Log Analytics workspace and the Sentinel workspace on top of it were both deleted, even though we had set the prevent_destroy = true lifecycle protection on both resources.

Steps to Reproduce

We tried to reproduce the issue by replaying the deployment history in the following steps:

  1. Update the azurerm provider from version 3.99.0 to 3.100.0 and run terraform init -upgrade.
  2. Remove the resources azurerm_resource_group_template_deployment.parser and azurerm_log_analytics_query_pack_query.example, then deploy with terraform apply.

Unfortunately, we could not reproduce the issue this way, which makes us more worried about using Terraform to manage our Sentinel infrastructure: losing a workspace is severe, and here the deletion did not appear in the execution plan and bypassed our protection layers. By reporting this issue, we hope to learn more about the potential causes and to get recommendations for preventing such cases.

xiangyx commented 6 months ago

Hello,

While waiting for someone to take a look at this issue, we manually restored the staging environment and today tried a Terraform deployment to our production environment. Terraform again tried to delete the workspace without any indication in the execution plan, just as in the staging run posted above. Thankfully, the deletion failed because we had removed the delete permission from the service principal on our production Log Analytics workspace before the trial.
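The permission removal described above is a guard that sits outside the Terraform plan. A different guard of the same kind, which this thread does not mention and which is shown here only as a hedged sketch, is an Azure management lock on the workspace. The resource name sentinel_workspace, the lock name, and the notes text are hypothetical; the scope reuses the workspace resource from the configuration posted in the issue.

```hcl
# Hypothetical addition, not part of the configuration posted in this issue.
# A CanNotDelete lock makes Azure reject delete requests on the workspace
# for as long as the lock exists, whichever client sends them.
resource "azurerm_management_lock" "sentinel_workspace" {
  name       = "sentinel-workspace-cannot-delete"
  scope      = azurerm_log_analytics_workspace.sentinel.id
  lock_level = "CanNotDelete"
  notes      = "Blocks deletion of the Sentinel workspace, including deletions that do not appear in a Terraform plan."
}
```

Creating and removing locks requires permission on Microsoft.Authorization/locks, so the identity running Terraform may need a broader role for this to apply.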

Here is a digest of the Terraform plan

Nothing in the above should change the Log Analytics workspace.

The error message we got, repeated five times in the output, is:

╷
│ Error: removing items provisioned by this Template Deployment: deleting Nested Resource "/subscriptions/***/resourceGroups/Sentinel/providers/Microsoft.OperationalInsights/workspaces/sentinel": resources.Client#DeleteByID: Failure sending request: StatusCode=403 -- Original Error: Code="AuthorizationFailed" Message="The client '***-***-***' with object id '***-***-***' does not have authorization to perform action 'Microsoft.OperationalInsights/workspaces/delete' over scope '/subscriptions/***/resourceGroups/Sentinel/providers/Microsoft.OperationalInsights/workspaces/sentinel' or the scope is invalid. If access was recently granted, please refresh your credentials."
│ 
│ 
╵

Again, we really want to know why and how Terraform tries to delete the workspace, and very much look forward to hearing anything on this issue.

xiangyx commented 6 months ago

We have found the root cause of this issue. We were using a template deployment (azurerm_resource_group_template_deployment) to deploy the log parsers, and its template included the Log Analytics workspace as a resource. Because the azurerm provider was configured with delete_nested_items_during_deletion = true, deleting the template deployment also deleted everything contained in that template, including the workspace. To prevent the issue, set the flag to false.
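The fix described above can be sketched as the provider configuration below. It is a minimal sketch based on the provider block posted in the issue, with only the flag value changed. It also suggests why prevent_destroy did not help: the provider removed the workspace while destroying the template deployment, not through a planned destroy of azurerm_log_analytics_workspace.sentinel, so no destroy action for the workspace appeared in the plan.

```hcl
provider "azurerm" {
  features {
    template_deployment {
      # With false, destroying an azurerm_resource_group_template_deployment
      # removes only the deployment itself and leaves the resources it
      # provisioned (here, the Log Analytics workspace) in place.
      delete_nested_items_during_deletion = false
    }
  }
}
```

The setting applies to every template deployment managed through this provider configuration, so resources created by those templates have to be cleaned up separately once a deployment is removed.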

Thanks to everyone for taking the time to look into this issue!
