hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

vmss resource not in state file when error occurs during deployment also leading to destroy errors #9309

Closed obourdon closed 3 years ago

obourdon commented 3 years ago

Community Note

Terraform (and AzureRM Provider) Version

Affected Resource(s)

Terraform Configuration Files

# No need for HCL code, failing test case included in AzureRM provider code

Debug Output

If you run the following command in the AzureRM provider code cloned from this repository:

TF_ACC=1 go test github.com/terraform-providers/terraform-provider-azurerm/azurerm/internal/services/compute/tests -v -run=TestAccAzureRMLinuxVirtualMachineScaleSet_imagesNoPlan  -timeout 180m -ldflags="-X=github.com/terraform-providers/terraform-provider-azurerm/version.ProviderVersion=acc"

You get the following:

=== RUN   TestAccAzureRMLinuxVirtualMachineScaleSet_imagesNoPlan
=== PAUSE TestAccAzureRMLinuxVirtualMachineScaleSet_imagesNoPlan
=== CONT  TestAccAzureRMLinuxVirtualMachineScaleSet_imagesNoPlan
--- FAIL: TestAccAzureRMLinuxVirtualMachineScaleSet_imagesNoPlan (96.27s)
    testing.go:684: Step 1 error: Resource specified by ResourceName couldn't be found: azurerm_linux_virtual_machine_scale_set.test
    testing.go:745: Error destroying resource! WARNING: Dangling resources
        may exist. The full state and error is shown below.

        Error: errors during apply: Error deleting Subnet "internal" (Virtual Network "acctestnw-201113083726607487" / Resource Group "acctestRG-201113083726607487"): network.SubnetsClient#Delete: Failure sending request: StatusCode=400 -- Original Error: Code="InUseSubnetCannotBeDeleted" Message="Subnet internal is in use by /subscriptions/594db79d-c34f-4d4b-9bed-438dd7fa3697/resourceGroups/acctestRG-201113083726607487/providers/Microsoft.Network/networkInterfaces/|providers|Microsoft.Compute|virtualMachineScaleSets|acctestvmss-201113083726607487|virtualMachines|0|networkInterfaces|example/ipConfigurations/internal and cannot be deleted. In order to delete the subnet, delete all the resources within the subnet. See aka.ms/deletesubnet." Details=[]

        State: azurerm_resource_group.test:
          ID = /subscriptions/594db79d-c34f-4d4b-9bed-438dd7fa3697/resourceGroups/acctestRG-201113083726607487
          provider = provider.azurerm
          location = westeurope
          name = acctestRG-201113083726607487
          tags.% = 0
        azurerm_subnet.test:
          ID = /subscriptions/594db79d-c34f-4d4b-9bed-438dd7fa3697/resourceGroups/acctestRG-201113083726607487/providers/Microsoft.Network/virtualNetworks/acctestnw-201113083726607487/subnets/internal
          provider = provider.azurerm
          address_prefix = 10.0.2.0/24
          address_prefixes.# = 1
          address_prefixes.0 = 10.0.2.0/24
          delegation.# = 0
          enforce_private_link_endpoint_network_policies = false
          enforce_private_link_service_network_policies = false
          name = internal
          resource_group_name = acctestRG-201113083726607487
          service_endpoints.# = 0
          virtual_network_name = acctestnw-201113083726607487
        azurerm_virtual_network.test:
          ID = /subscriptions/594db79d-c34f-4d4b-9bed-438dd7fa3697/resourceGroups/acctestRG-201113083726607487/providers/Microsoft.Network/virtualNetworks/acctestnw-201113083726607487
          provider = provider.azurerm
          address_space.# = 1
          address_space.0 = 10.0.0.0/16
          bgp_community =
          ddos_protection_plan.# = 0
          dns_servers.# = 0
          guid = 527761f0-7360-42f6-93f1-c85665b19b4c
          location = westeurope
          name = acctestnw-201113083726607487
          resource_group_name = acctestRG-201113083726607487
          subnet.# = 1
          subnet.1005719214.address_prefix = 10.0.2.0/24
          subnet.1005719214.id = /subscriptions/594db79d-c34f-4d4b-9bed-438dd7fa3697/resourceGroups/acctestRG-201113083726607487/providers/Microsoft.Network/virtualNetworks/acctestnw-201113083726607487/subnets/internal
          subnet.1005719214.name = internal
          subnet.1005719214.security_group =
          tags.% = 0
          vm_protection_enabled = false

          Dependencies:
            azurerm_resource_group.test
FAIL
FAIL    github.com/terraform-providers/terraform-provider-azurerm/azurerm/internal/services/compute/tests   96.345s
FAIL

Expected Behaviour

The VMSS resource (and potentially its attached, failing instances) should still be stored in the state file so that the delete step works.

Actual Behaviour

The VMSS and its attached instances are missing from the state file but are still present in the Azure resource list, so they have to be deleted manually.
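When cleaning up by hand, the name of the scale set that blocks the subnet deletion can be read from the pipe-separated network interface reference in the `InUseSubnetCannotBeDeleted` message above. The following is only a minimal sketch of that lookup; the helper name `scaleSetFromNICRef` is hypothetical and the pipe-separated layout is assumed from this single error message, not from any documented Azure format.

```go
package main

import (
	"fmt"
	"strings"
)

// scaleSetFromNICRef is a hypothetical helper. It returns the token that
// follows "virtualMachineScaleSets" in a pipe-separated reference, or ""
// when no such token exists. The layout is assumed from the error message
// quoted in this issue.
func scaleSetFromNICRef(ref string) string {
	parts := strings.Split(ref, "|")
	for i, p := range parts {
		if p == "virtualMachineScaleSets" && i+1 < len(parts) {
			return parts[i+1]
		}
	}
	return ""
}

func main() {
	// Reference copied from the error message in the debug output.
	ref := "|providers|Microsoft.Compute|virtualMachineScaleSets|acctestvmss-201113083726607487|virtualMachines|0|networkInterfaces|example/ipConfigurations/internal"
	fmt.Println(scaleSetFromNICRef(ref))
}
```

The returned name identifies the resource that is absent from the state dump and still has to be removed before the subnet can be deleted.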

Steps to Reproduce

See above in debug output section

References

None found

obourdon commented 3 years ago

My take is that any failing scenario during VMSS deployment will behave the same way, for instance when specifying a probe whose condition can never succeed (missing probe service, ...).

If I am right about this, I will also add a test scenario to check this case.

obourdon commented 3 years ago

After further checking, it seems that failing probes do not trigger this issue, as the debug traces show:

GET /subscriptions/SUBSCRIPTION_ID/providers/Microsoft.Compute/locations/westeurope/operations/4b7dbe56-a2e6-472c-85ee-b32a19961f6f?api-version=2020-06-01 HTTP/1.1
Host: management.azure.com
User-Agent: Go/go1.13.15 (amd64-darwin) go-autorest/v14.2.1 Azure-SDK-For-Go/v48.1.0 compute/2020-06-01 HashiCorp Terraform/0.12.7-sdk (+https://www.terraform.io) Terraform Plugin SDK/1.13.1 terraform-provider-azurerm/acc pid-222c6c49-1b0a-5959-a213-6608f9eb8820
X-Ms-Correlation-Request-Id: 664d93e2-55b3-7f4f-bbe1-9a2b4aedb1a3
Accept-Encoding: gzip

2020/11/13 14:09:17 [DEBUG] AzureRM Response for https://management.azure.com/subscriptions/SUBSCRIPTION_ID/providers/Microsoft.Compute/locations/westeurope/operations/4b7dbe56-a2e6-472c-85ee-b32a19961f6f?api-version=2020-06-01:
HTTP/2.0 200 OK
Cache-Control: no-cache
Content-Type: application/json; charset=utf-8
Date: Fri, 13 Nov 2020 13:09:16 GMT
Expires: -1
Pragma: no-cache
Server: Microsoft-HTTPAPI/2.0
Server: Microsoft-HTTPAPI/2.0
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-Ms-Correlation-Request-Id: 664d93e2-55b3-7f4f-bbe1-9a2b4aedb1a3
X-Ms-Ratelimit-Remaining-Resource: Microsoft.Compute/GetOperation3Min;14996,Microsoft.Compute/GetOperation30Min;29974
X-Ms-Ratelimit-Remaining-Subscription-Reads: 11951
X-Ms-Request-Id: 250107be-5c77-471c-85a6-4a616d75b458
X-Ms-Routing-Request-Id: FRANCECENTRAL:20201113T130917Z:6fd4b516-5f49-4ee2-9370-ebed448a81bd

{
  "startTime": "2020-11-13T13:07:05.7142297+00:00",
  "endTime": "2020-11-13T13:08:27.2772214+00:00",
  "status": "Succeeded",
  "name": "4b7dbe56-a2e6-472c-85ee-b32a19961f6f"
}
2020/11/13 14:09:17 [DEBUG] Virtual Machine Scale Set "acctestvmss-201113140648857075" (Resource Group "acctestRG-201113140648857075") was created

Because the create operation is reported as "Succeeded", the TF state file contains the scale set resource, which can then be destroyed as usual (even though the instance state is correctly diagnosed as UNHEALTHY).
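To make the trace above easier to reason about, here is a small sketch that decodes the operation body and checks its `status` field. The struct and function names (`operationStatus`, `parseOperation`, `succeeded`) are hypothetical and only mirror the four JSON fields visible in the trace; this is not how the provider or the Azure SDK tracks operations.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// operationStatus mirrors only the fields visible in the trace above.
// The type is hypothetical and exists purely for illustration.
type operationStatus struct {
	StartTime time.Time `json:"startTime"`
	EndTime   time.Time `json:"endTime"`
	Status    string    `json:"status"`
	Name      string    `json:"name"`
}

// parseOperation decodes the JSON body of an operation status response.
func parseOperation(body []byte) (operationStatus, error) {
	var op operationStatus
	err := json.Unmarshal(body, &op)
	return op, err
}

// succeeded reports whether the operation finished with status "Succeeded".
func (op operationStatus) succeeded() bool {
	return op.Status == "Succeeded"
}

func main() {
	// Body copied from the debug trace.
	body := []byte(`{
  "startTime": "2020-11-13T13:07:05.7142297+00:00",
  "endTime": "2020-11-13T13:08:27.2772214+00:00",
  "status": "Succeeded",
  "name": "4b7dbe56-a2e6-472c-85ee-b32a19961f6f"
}`)
	op, err := parseOperation(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("succeeded:", op.succeeded())
	fmt.Println("duration:", op.EndTime.Sub(op.StartTime))
}
```

With a failing probe the operation itself still ends as "Succeeded", which is consistent with the scale set being recorded in state in this scenario.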

tombuildsstuff commented 3 years ago

hey @obourdon

Thanks for opening this issue.

As mentioned in #9316, unfortunately this behaviour is by design, since the Azure API uses a ProvisioningState of "Failed" to mean multiple things, up to and including "this resource will eventually recover". Whilst that's unfortunate, to ensure consistency during failure conditions we require operators to determine how best to proceed in their environment, since the Azure API doesn't provide enough context to determine this automatically, and again this would likely change on a per-operator basis.

Since this behaviour is unfortunately by design due to the behaviour of the Azure API - whilst I'd like to thank you for opening this issue (and the associated PR), I'm going to close this issue for the moment - but a more detailed explanation is available in #9316.

Thanks!
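For readers following the thread, the behaviour discussed above can be pictured with a deliberately simplified model. It assumes that a resource ID is written to state only after the long-running create operation completes without error; the `state` type, the `create` function and `errFailed` are hypothetical names for illustration and are not taken from the provider's code.

```go
package main

import (
	"errors"
	"fmt"
)

// state is a toy stand-in for a state file: resource address -> resource ID.
type state map[string]string

// errFailed stands in for a create operation ending in a failed state.
var errFailed = errors.New(`provisioning state "Failed"`)

// create models a create step that records the resource ID only after the
// long-running operation has completed without error. This ordering is an
// assumption made for illustration.
func create(s state, address, id string, waitForCompletion func() error) error {
	if err := waitForCompletion(); err != nil {
		// Nothing is recorded: the operator has to decide what to do with
		// whatever was left behind in Azure.
		return fmt.Errorf("creating %s: %w", address, err)
	}
	s[address] = id
	return nil
}

func main() {
	s := state{}
	_ = create(s, "azurerm_subnet.test", "subnet-id", func() error { return nil })
	err := create(s, "azurerm_linux_virtual_machine_scale_set.test", "vmss-id", func() error {
		return errFailed
	})
	fmt.Println("error:", err)

	_, tracked := s["azurerm_linux_virtual_machine_scale_set.test"]
	fmt.Println("scale set tracked in state:", tracked)
}
```

In this model a later destroy only knows about the subnet, which matches the `InUseSubnetCannotBeDeleted` failure in the original report.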

ghost commented 3 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!