hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

Diagnostic settings does not support retention for new diagnostic settings #23051

Open kavinkvb opened 10 months ago

kavinkvb commented 10 months ago

Is there an existing issue for this?

Community Note

Terraform Version

0.13.4

AzureRM Provider Version

3.68.0

Affected Resource(s)/Data Source(s)

azurerm_monitor_diagnostic_setting

Terraform Configuration Files

```hcl
# Create Azure log analytics workspace
resource "azurerm_log_analytics_workspace" "main" {
  count               = var.enable_log_analytics_workspace ? 1 : 0
  name                = var.cluster_name
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  sku                 = var.log_analytics_workspace_sku
  retention_in_days   = var.log_retention_in_days
  daily_quota_gb      = var.assign_policy ? var.log_analytics_dailyquota : -1
  tags                = var.tags
}
resource "azurerm_log_analytics_solution" "main" {
  count                 = var.enable_log_analytics_workspace ? 1 : 0
  solution_name         = "ContainerInsights"
  location              = azurerm_resource_group.main.location
  resource_group_name   = azurerm_resource_group.main.name
  workspace_resource_id = azurerm_log_analytics_workspace.main[0].id
  workspace_name        = azurerm_log_analytics_workspace.main[0].name

  plan {
    publisher = "Microsoft"
    product   = "OMSGallery/ContainerInsights"
  }

}

resource "azurerm_storage_account" "main" {
  count                    = var.enable_logs_storage ? 1 : 0
  name                     = var.storage_log_name
  resource_group_name      = azurerm_resource_group.main.name
  location                 = azurerm_resource_group.main.location
  account_tier             = "Standard"
  account_replication_type = "GRS"
  access_tier              = "Cool"
}

resource "azurerm_storage_management_policy" "main" {
  count              = var.enable_logs_storage ? 1 : 0
  storage_account_id = azurerm_storage_account.main[0].id
  rule {
    name    = "deleteBlobAfter365days"
    enabled = true
    filters {
      blob_types = ["appendBlob"]
    }
    actions {
      base_blob {
        delete_after_days_since_modification_greater_than = 365
      }
    }
  }
}

resource "azurerm_log_analytics_data_export_rule" "main" {
  count                   = var.enable_logs_storage ? 1 : 0
  name                    = "${var.cluster_name}-data-export"
  resource_group_name     = azurerm_resource_group.main.name
  workspace_resource_id   = azurerm_log_analytics_workspace.main[0].id
  destination_resource_id = azurerm_storage_account.main[0].id
  table_names             = ["ContainerInventory", "ContainerLog", "ContainerNodeInventory", "InsightsMetrics", "KubeEvents", "KubeMonAgentEvents", "KubeNodeInventory", "KubePodInventory", "KubePVInventory", "KubeServices"]
  enabled                 = true
}

# Enable diagnostic settings for the AKS
resource "azurerm_monitor_diagnostic_setting" "aks_logs" {
  count                      = var.enable_log_analytics_workspace ? 1 : 0
  name                       = var.cluster_name
  target_resource_id         = azurerm_kubernetes_cluster.main.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.main[0].id

  log {
    category = "kube-apiserver"
    enabled  = true
    retention_policy {
      enabled = true
      days    = var.log_retention_in_days
    }
  }

  log {
    category = "kube-audit"
    enabled  = true
    retention_policy {
      enabled = true
      days    = var.log_retention_in_days
    }
  }

  log {
    category = "cluster-autoscaler"
    enabled  = true
    retention_policy {
      enabled = true
      days    = var.log_retention_in_days
    }
  }
  log {
    category = "kube-scheduler"
    enabled  = true
    retention_policy {
      enabled = true
      days    = var.log_retention_in_days
    }
  }

  log {
    category = "kube-controller-manager"
    enabled  = true
    retention_policy {
      enabled = true
      days    = var.log_retention_in_days
    }
  }

  log {
    category = "kube-apiserver"
    enabled  = true
    retention_policy {
      enabled = true
      days    = var.log_retention_in_days
    }
  }

  metric {
    category = "AllMetrics"
    enabled  = true

    retention_policy {
      enabled = true
      days    = var.log_retention_in_days
    }
  }
}
```

Debug Output/Panic Output

```
creating Monitor Diagnostics Setting "test-tf-mod-azure-aks-qX9ajc" for Resource "/subscriptions/****/resourceGroups/test-tf-mod-azure-aks-qX9ajc/providers/Microsoft.ContainerService/managedClusters/test-tf-mod-azure-aks-qX9ajc": insights.DiagnosticSettingsClient#CreateOrUpdate:
Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="BadRequest" Message="Diagnostic settings does not support retention for new diagnostic settings."
```

Expected Behaviour

The diagnostic setting should be created as expected.

Actual Behaviour

No response

Steps to Reproduce

No response

Important Factoids

No response

References

No response

tberreis commented 10 months ago

Same here with azurerm provider 3.69.0 and 3.70.0.

It looks like Microsoft moved the deprecation of retention policies forward. When trying to set up the diagnostic settings in the portal, I get the following error:

Storage retention via diagnostic settings is being deprecated and new rules can no longer be configured. To maintain your existing retention rules please migrate to Azure Storage Lifecycle Management by September 30th 2025. [What do I need to do?](https://go.microsoft.com/fwlink/?linkid=2243231)

September 30, 2023 – You will no longer be able to use the API (CLI, PowerShell, or templates), or the Azure portal, to configure retention settings unless you're changing them to 0. Existing retention rules will still be respected.

See https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-azure-storage-lifecycle-policy

Created a support case to clarify with MS. I'll keep you posted.


First response from the support team:

The retention days below are no longer available; instead, the configuration should be done on the storage account itself (the destination) via the storage account's Lifecycle Management. [...] I would recommend removing the retention policy segment, or at least setting retention to false with a "days" value of 0 in the template, and trying again.
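
For the configuration above, that suggestion would look roughly like this (a minimal sketch using the legacy log block; I have not confirmed whether the API still accepts a disabled retention_policy for new settings):

```hcl
log {
  category = "kube-apiserver"
  enabled  = true

  # Per the support suggestion: either drop retention_policy entirely,
  # or explicitly disable it with days = 0 so no retention is requested.
  retention_policy {
    enabled = false
    days    = 0
  }
}
```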

fabiostawinski commented 10 months ago

I removed the "days" from the implementation and it worked. The retention is set on the target resource here, so the result is the same: we also have a Log Analytics workspace, like you do, and I see you have the retention days defined on it as well, so that should be the quick fix.

sehgalnamit commented 10 months ago

Thanks, I also got this issue yesterday. Glad I could find this on the internet. My block looks like this:

log {
  category = "StorageDelete"
  enabled  = true

  retention_policy {
    enabled = true
    days    = 365 # I will change this to 0
  }
}

I found this: https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-azure-storage-lifecycle-policy

If I set this to 0, that means unlimited retention and the cost of my Log Analytics workspace will increase, unless the workspace automatically deletes data according to the retention configured at the workspace level.
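
For what it's worth, the retention that actually applies is the one configured on the workspace resource itself, as in the original configuration above. A minimal sketch, with illustrative names and values:

```hcl
# Data retention is enforced by the destination workspace, independent of
# the (deprecated) retention_policy block on the diagnostic setting.
resource "azurerm_log_analytics_workspace" "example" {
  name                = "example-law"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}
```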

MarkBibby1 commented 10 months ago

Same issue started 23rd Aug.23

Previous:

dynamic "log" {
  for_each = data.azurerm_monitor_diagnostic_categories.aks_diag_cat.log_category_types
  content {
    category = log.value
    enabled  = true
    retention_policy {
      days    = 30
      enabled = true
    }
  }
}

New (working):

dynamic "log" {
  for_each = data.azurerm_monitor_diagnostic_categories.aks_diag_cat.log_category_types
  content {
    category = log.value
    enabled  = true
  }
}

udith-ranmuthugala-rft commented 10 months ago

Can someone kindly clarify: if we are targeting a Log Analytics workspace when enabling a diagnostic setting, do we need to specify the "retention_policy" block at all, since it is only relevant when targeting a storage account?

Ref: https://github.com/Azure/azure-cli/issues/21328

rhaddadi commented 10 months ago

I'm experiencing the same issue when trying to create diagnostic settings for PostgreSQL and Key Vault, while previously we had no problem deploying with the retention policy set.

sagarallscripts commented 10 months ago

I have started seeing this issue today, while trying to deploy resources to Azure using Terraform.

Error: creating Monitor Diagnostics Setting "xyx-diag-setting" for Resource "xyz-keyvault": diagnosticsettings.DiagnosticSettingsClient#CreateOrUpdate: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="BadRequest" Message="Diagnostic settings does not support retention for new diagnostic settings."

mani-trimble commented 10 months ago

https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-azure-storage-lifecycle-policy

rezamanshouri commented 10 months ago

Possible related issues:

https://github.com/Azure/ResourceModules/issues/3572
https://github.com/MicrosoftDocs/azure-docs/issues/113490

MarkBibby1 commented 10 months ago

https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-azure-storage-lifecycle-policy

The deprecation dates look incorrect.

zioalex commented 10 months ago

This is happening to me only when creating new resources. I'm trying to stick with azurerm 3.70.0, which worked before.

cgrpa commented 10 months ago

I'm seeing a similar issue on 3.67.0, which is completely new.

cgrpa commented 10 months ago

I removed the retention rules and now I can apply.

zioalex commented 10 months ago

This is happening to me only when creating new resources. I'm trying to stick with azurerm 3.70.0, which worked before.

Unfortunately, version 3.70.0 produces the same error. It looks like something else changed on the backend.

Khayoann2 commented 10 months ago

Same issue here. If you have a workaround, please share, because it is not possible for us to completely disable the retention rule due to cost, and we have a lot of resources, logs, and environments. It is really blocking us. šŸ‘Ž To be honest, I don't understand how these breaking changes are managed in terms of communication. Maybe I missed something, but it's clearly not communicated well enough.

AndrewHuddleston commented 10 months ago

Same issue here; I'm going to remove it from my code as mentioned above. You can set log retention on the Log Analytics workspace, which is probably what Microsoft is going to recommend people do going forward anyway.

Khayoann2 commented 10 months ago

Same issue here; I'm going to remove it from my code as mentioned above. You can set log retention on the Log Analytics workspace, which is probably what Microsoft is going to recommend people do going forward anyway.

This is exactly what I'm doing right now. The problem on my side is that if you need different retention periods depending on the log category for the same resource, I can't find a way to be as granular as before.

shanilhirani commented 10 months ago

Removing it seems to re-introduce the perpetual plan changes, as previously posted.

MageshSrinivasulu commented 10 months ago

Does someone have the Terraform code for adding a lifecycle management rule to a storage account to implement the retention policy?

tberreis commented 10 months ago

Does someone have the Terraform code for adding a lifecycle management rule to a storage account to implement the retention policy?

Please have a look at https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/storage_management_policy.html.
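
A minimal sketch of such a rule, along the lines of the azurerm_storage_management_policy block in the original configuration above (the resource names, the insights-logs- prefix, and the 365-day value are illustrative):

```hcl
# Lifecycle rule on the destination storage account, replacing the old
# per-diagnostic-setting retention.
resource "azurerm_storage_management_policy" "logs" {
  storage_account_id = azurerm_storage_account.example.id

  rule {
    name    = "delete-diagnostic-logs-after-365-days"
    enabled = true

    filters {
      blob_types   = ["appendBlob"]
      prefix_match = ["insights-logs-"]
    }

    actions {
      base_blob {
        delete_after_days_since_modification_greater_than = 365
      }
    }
  }
}
```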

MageshSrinivasulu commented 10 months ago

If we have the logs and metrics sent to an event hub, do we need to have retention_policy enabled?

Annesars90 commented 10 months ago

Same issue here when sending diagnostic settings to a Log Analytics workspace. The documentation only describes the deprecation of the retention policy when logs are sent to a storage account.

Khayoann2 commented 10 months ago

@Annesars90 On my side, I manage the retention policy at the Log Analytics workspace level and removed every retention period at the diagnostic settings level in the code. I deleted every diagnostic setting and recreated them from scratch with the new config, adding the retention period on the Log Analytics workspace. The only thing I don't know how to do yet is keep the same granularity as before.

jvdk13 commented 9 months ago

I removed the "days" from the implementation and it worked. The retention is set on the target resource here, so the result is the same: we also have a Log Analytics workspace, like you do, and I see you have the retention days defined on it as well, so that should be the quick fix.

This is actually a problem when you keep all your logs in a centralized Log Analytics workspace. I want different retention values for different environments but still want to keep all my logs in the same workspace.

worldpwn commented 1 month ago

I just want to share, for anyone who stumbles upon this issue, how to temporarily work around it in case you need to redeploy your environment from scratch.

So we had a pipeline with almost the exact same configuration:

# With retention_policy
resource "azurerm_monitor_diagnostic_setting" "aks_logs" {
  count                      = var.enable_log_analytics_workspace ? 1 : 0
  name                       = var.cluster_name
  target_resource_id         = azurerm_kubernetes_cluster.main.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.main[0].id

  log {
    category = "kube-apiserver"
    enabled  = true
    retention_policy {
      enabled = true
      days    = var.log_retention_in_days
    }
  }
}

The pipeline and the infrastructure were created a year ago. This configuration still deploys because there are no changes, but if you need to recreate this component, it fails with this error.

To bypass this, all you have to do is remove the retention_policy block and deploy. After that, add the retention_policy block back and redeploy.

# Without retention_policy
resource "azurerm_monitor_diagnostic_setting" "aks_logs" {
  count                      = var.enable_log_analytics_workspace ? 1 : 0
  name                       = var.cluster_name
  target_resource_id         = azurerm_kubernetes_cluster.main.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.main[0].id

  log {
    category = "kube-apiserver"
    enabled  = true
  }
}

This will not fix the issue permanently, but at least it postpones it if you need to get past this blocking behavior as soon as possible.