hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

azurerm_security_center_workspace failed after 30m timeout #5475

Open szaher opened 4 years ago

szaher commented 4 years ago

Community Note

azurerm_security_center_workspace failed after 30m timeout

I am creating two azurerm_security_center_workspace resources in each run; each one is created in a separate resource group with a unique name.

module.omsla.azurerm_security_center_workspace.omssc: Still creating... [29m50s elapsed]
module.omsla.azurerm_security_center_workspace.omssc: Still creating... [30m0s elapsed]

Error: Error waiting: timeout while waiting for state to become 'Populated' (last state: 'Waiting', timeout: 30m0s)

on ../../resources/oms/oms_main.tf line 24, in resource "azurerm_security_center_workspace" "omssc":
24: resource "azurerm_security_center_workspace" "omssc" {

I suspect the Refresh function here https://github.com/terraform-providers/terraform-provider-azurerm/blob/master/azurerm/internal/services/securitycenter/resource_arm_security_center_workspace.go#L115

Also the api version is 1.0 https://github.com/terraform-providers/terraform-provider-azurerm/blob/master/azurerm/internal/services/securitycenter/resource_arm_security_center_workspace.go#L8 while v3.0 is available https://github.com/Azure/azure-sdk-for-go/tree/master/services/preview/security/mgmt/v3.0/security

I think the workspace ID and everything else is populated, but Terraform doesn't pick it up and I'm not sure why. Maybe I am wrong... Any idea how I can get around this problem?

Terraform (and AzureRM Provider) Version

1.40.0

Affected Resource(s)

Terraform Configuration Files

resource "azurerm_log_analytics_workspace" "omsla" {
  name                = var.res_omsla_name
  location            = var.res_location
  resource_group_name = var.res_omsla_rg_name
  retention_in_days   = var.res_omsla_retention_days
  sku                 = var.res_omsla_sku
  tags                = var.res_tags
  lifecycle {
    ignore_changes = [
      name
    ]
  }
}

resource "azurerm_security_center_subscription_pricing" "omssc-pricing" {
  tier = "Standard"
}

resource "azurerm_security_center_workspace" "omssc" {
  scope        = "/subscriptions/${var.res_subscription_id}"
  workspace_id = azurerm_log_analytics_workspace.omsla.id
  depends_on = [
    azurerm_security_center_subscription_pricing.omssc-pricing,
    azurerm_log_analytics_workspace.omsla
  ]
}

Debug Output

Panic Output

module.omsla.azurerm_security_center_workspace.omssc: Still creating... [29m50s elapsed]
module.omsla.azurerm_security_center_workspace.omssc: Still creating... [30m0s elapsed]

Error: Error waiting: timeout while waiting for state to become 'Populated' (last state: 'Waiting', timeout: 30m0s)

on ../../resources/oms/oms_main.tf line 24, in resource "azurerm_security_center_workspace" "omssc":
24: resource "azurerm_security_center_workspace" "omssc" {

Expected Behavior

module.omsla.azurerm_security_center_workspace.omssc: Still creating... [15m0s elapsed]
module.omsla.azurerm_security_center_workspace.omssc: Creation complete after 15m5s [id=/subscriptions/xxxx-xxxx-xxxx-xxxx-xxx0003a96/providers/Microsoft.Security/workspaceSettings/default]

Apply complete! Resources: 8 added, 0 changed, 0 destroyed.

Actual Behavior

Steps to Reproduce

  1. terraform apply

Important Factoids

References

Also the api version is 1.0 https://github.com/terraform-providers/terraform-provider-azurerm/blob/master/azurerm/internal/services/securitycenter/resource_arm_security_center_workspace.go#L8 while v3.0 is available https://github.com/Azure/azure-sdk-for-go/tree/master/services/preview/security/mgmt/v3.0/security

r0b2g1t commented 4 years ago

Hi, I noticed this bug too. When I use the azurerm_security_center_workspace resource, I'm able to assign a workspace to the "default" Security Center workspace settings with Terraform. But after a terraform destroy, the creation process of the azurerm_security_center_workspace ends with a timeout and Terraform is no longer able to finish the job. After that I need to set the default Security Center workspace settings with the Azure CLI. I could reproduce this behaviour in multiple subscriptions. This issue can't be fixed by removing the settings manually, either with the Azure CLI or via the web GUI.

anttipo commented 3 years ago

I'm also struggling with this when bootstrapping new subscriptions where the Security Center is one of the resources. Sometimes the workspace creation succeeds after 30, 45 or 75 minutes, but this is incredibly inconsistent. For me, terraform destroy sometimes works just fine, but if I change variables involved with ASC, the workspace just gets stuck for no obvious reason.

MattGarner-N commented 3 years ago

This is still happening

Suseelraj commented 3 years ago

any fixes yet?

fluffy-cakes commented 3 years ago

So I grew tired of this failing and of waiting for an update that has never come. Here is my workaround, which runs in an instant with no issues.

The terraform code:

resource "null_resource" "deploy_workspace_settings" {
    count                = var.enable_security_center ? 1 : 0
    provisioner "local-exec" {
        command          = <<EOT
            Import-Module ${path.module}/workspace_settings.psm1
            New-WorkspaceSetting `
                -ClientId       "${var.ARM_CLIENT_ID}" `
                -ClientSecret   "${var.ARM_CLIENT_SECRET}" `
                -SubscriptionId "${var.ARM_SUBSCRIPTION_ID}" `
                -TenantId       "${var.ARM_TENANT_ID}" `
                -WorkspaceId    "${var.workspace_id}" `
                -WorkspaceScope "${var.scope_id}"
        EOT
        interpreter      = ["pwsh", "-Command"]
    }
}

resource "null_resource" "destroy_workspace_settings" {
    count                = var.enable_security_center ? 1 : 0
    triggers             = {
        "ClientId"       = var.ARM_CLIENT_ID
        "ClientSecret"   = var.ARM_CLIENT_SECRET
        "SubscriptionId" = var.ARM_SUBSCRIPTION_ID
        "TenantId"       = var.ARM_TENANT_ID
    }
    provisioner "local-exec" {
        when             = destroy
        command          = <<EOT
            Import-Module ${path.module}/workspace_settings.psm1
            Remove-WorkspaceSetting `
                -ClientId       "${self.triggers.ClientId}" `
                -ClientSecret   "${self.triggers.ClientSecret}" `
                -SubscriptionId "${self.triggers.SubscriptionId}" `
                -TenantId       "${self.triggers.TenantId}"
        EOT
        interpreter      = ["pwsh", "-Command"]
    }
}

^^ For some reason the ARM template resource would fail just as often as the Terraform one, so it became a PS script.

The Terraform code above in turn calls a PowerShell module that makes the API calls (https://docs.microsoft.com/en-us/rest/api/securitycenter/workspacesettings).

function Get-Authenticated {
    [CmdletBinding()]
    param (
        [ValidateNotNullOrEmpty()]
        [string] $ClientId,
        [ValidateNotNullOrEmpty()]
        [string] $ClientSecret,
        [ValidateNotNullOrEmpty()]
        [string] $TenantId
    )

    $headers        = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
    $headers.Add("Content-Type", "application/x-www-form-urlencoded")

    # Client secrets can contain special characters, so we encode it to be used without errors
    $encode         = [System.Web.HTTPUtility]::UrlEncode("$ClientSecret")
    $body           = "client_id=$ClientId&client_secret=$encode&grant_type=client_credentials&scope=https%3A%2F%2Fmanagement.azure.com%2F.default"

    $url            = "https://login.microsoftonline.com/" + $TenantId + "/oauth2/v2.0/token"
    # The token endpoint expects a POST request carrying the form-encoded body
    $authentication = Invoke-RestMethod $url -Method 'POST' -Headers $headers -Body $body

    return $authentication.access_token
}

function New-WorkspaceSetting {
    [CmdletBinding()]
    param (
        [ValidateNotNullOrEmpty()]
        [string] $ClientId,
        [ValidateNotNullOrEmpty()]
        [string] $ClientSecret,
        [ValidateNotNullOrEmpty()]
        [string] $SubscriptionId,
        [ValidateNotNullOrEmpty()]
        [string] $TenantId,
        [ValidateNotNullOrEmpty()]
        [string] $WorkspaceId,
        [ValidateNotNullOrEmpty()]
        [string] $WorkspaceScope
    )

    $workspaceName = "default"
    $token         = Get-Authenticated -ClientId "$ClientId" -ClientSecret "$ClientSecret" -TenantId "$TenantId"

    $headers       = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
    $headers.Add("Authorization", "Bearer $token")
    $headers.Add("Content-Type", "application/json")

    # Build the request body as an object and let PowerShell serialize it to JSON
    $body = @{
        properties = @{
            workspaceId = $WorkspaceId
            scope       = $WorkspaceScope
        }
    } | ConvertTo-Json

    $url           = "https://management.azure.com/subscriptions/" + $SubscriptionId + "/providers/Microsoft.Security/workspaceSettings/" + $workspaceName + "?api-version=2017-08-01-preview"
    $response      = Invoke-RestMethod $url -Method 'PUT' -Headers $headers -Body $body
    $response
}

function Remove-WorkspaceSetting {
    [CmdletBinding()]
    param (
        [ValidateNotNullOrEmpty()]
        [string] $ClientId,
        [ValidateNotNullOrEmpty()]
        [string] $ClientSecret,
        [ValidateNotNullOrEmpty()]
        [string] $SubscriptionId,
        [ValidateNotNullOrEmpty()]
        [string] $TenantId
    )

    $workspaceName = "default"
    $token         = Get-Authenticated -ClientId "$ClientId" -ClientSecret "$ClientSecret" -TenantId "$TenantId"

    $headers       = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
    $headers.Add("Authorization", "Bearer $token")

    $url           = "https://management.azure.com/subscriptions/" + $SubscriptionId + "/providers/Microsoft.Security/workspaceSettings/" + $workspaceName + "?api-version=2017-08-01-preview"
    # DELETE takes no request body ($body was never defined in this function)
    $response      = Invoke-RestMethod $url -Method 'DELETE' -Headers $headers
    $response
}

MattoHopkins commented 3 years ago

@tombuildsstuff is there an update on this?

This failed after 60 minutes. When reapplying, it said the resource already existed, so I imported it via terraform import and then reapplied, which took another 50 minutes to modify before it finally passed.
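On Terraform 1.5 or later, the import step described above can also be declared in configuration instead of being run as a CLI command. The following is only a minimal sketch and is not taken from this thread: the resource address reuses the name from the original report, the subscription ID is a placeholder, and the ID shape follows the one shown under Expected Behavior.

```hcl
# Hypothetical sketch: adopt an already existing workspace setting into state.
# Requires Terraform 1.5+; the subscription ID below is a placeholder.
import {
  to = azurerm_security_center_workspace.omssc
  id = "/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Security/workspaceSettings/default"
}
```

If the resource is declared inside a module, the address in "to" would need the module prefix, for example module.omsla.azurerm_security_center_workspace.omssc.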

Is the safe bet for this just to extend the timeout to 2-3 hours? Is that how long this normally takes to build?
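If extending the wait is the route taken, a per-resource timeouts block is the usual place to do it. This is a minimal sketch, assuming a provider version in which azurerm_security_center_workspace accepts a timeouts block. The 3h values are illustrative only, and the variable names are borrowed from the configuration in the original report. A longer wait only tolerates the slowness described in this issue; it does not fix it.

```hcl
resource "azurerm_security_center_workspace" "omssc" {
  scope        = "/subscriptions/${var.res_subscription_id}"
  workspace_id = azurerm_log_analytics_workspace.omsla.id

  # Assumed values: allow create and update to wait up to 3 hours
  # instead of the 30m limit reported in the error message.
  timeouts {
    create = "3h"
    update = "3h"
  }
}
```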

Thanks for your help

fluffy-cakes commented 3 years ago

2-3hrs time out? @MattoHopkins , how much time do you have on your hands?? The API call I posted deploys the resource within seconds; I strongly advise against waiting so long for this resource to be deployed, unless you're so bored.

anttipo commented 3 years ago

The actual solution cannot indeed be waiting for that long. I run similar operations as part of a CD pipeline and we can't have it running for that long.

MattoHopkins commented 3 years ago

> 2-3hrs time out? @MattoHopkins , how much time do you have on your hands?? The API call I posted deploys the resource within seconds; I strongly advise against waiting so long for this resource to be deployed, unless you're so bored.

Yeah, of course mate. That's why I've commented here: to avoid having to wait (and others having to wait) that long. I get that your API calls are more efficient, but that doesn't change the fact that this resource is bugged :)

favoretti commented 2 years ago

I have been running into this issue on and off for about a year now. I did a few passes at trying to remediate it in the code, but so far I don't see why this is happening. I'll try to give it another go. Indeed, via the portal or CLI it's instantaneous; via the Go SDK it can take up to 3 hours.

favoretti commented 3 weeks ago

Going through the list of the issues assigned to me... Popping this back up on the list of the interested folks - do we still see this happening? We haven't seen this for a while in our envs, so I'm thinking of closing this one unless people are still running into the issue, in which case a reproduction scenario would be helpful.