**Open** · LukasNajman opened 1 year ago
Hi @LukasNajman, does this comment help with this issue? https://github.com/hashicorp/terraform-provider-azurerm/issues/4330#issuecomment-546018260
Similar to the `azurerm_lb_backend_address_pool` resource - Azure allows adding a VM to an LB's Backend Address Pool asynchronously during creation, but during deletion the ordering unfortunately matters.
Hi @wuxu92, thanks for the comment. I am aware of it and can confirm that adding an explicit dependency from `azurerm_network_interface_backend_address_pool_association` to `azurerm_linux_virtual_machine` solves the problem. A dependency in the inverse order also works.
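A minimal sketch of that explicit dependency (resource names and the IP configuration name here are illustrative, not from my actual configuration):

```hcl
resource "azurerm_network_interface_backend_address_pool_association" "example" {
  network_interface_id    = azurerm_network_interface.example.id
  ip_configuration_name   = "internal"
  backend_address_pool_id = azurerm_lb_backend_address_pool.example.id

  # Explicit dependency: on `terraform destroy`, Terraform now deletes this
  # association before it starts deleting the VM.
  depends_on = [azurerm_linux_virtual_machine.example]
}
```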
But I still see that as a workaround with limitations. For example, in my case I am creating the load balancer (and thus the `azurerm_network_interface_backend_address_pool_association` resource) in a different module than the virtual machines. To create the dependency, I would need to pass the virtual machines as an input variable to the load balancer module. Unfortunately, that will not work, as Terraform requires dependencies to be declared statically, not through variables.
I can declare a dependency of the whole load balancer module on the VM module, which also solves the problem. But that does not seem right to me, mainly because it is not intuitive and there is no way to enforce it.
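For completeness, the module-level variant I mean looks roughly like this (the module names and source paths are hypothetical):

```hcl
module "load_balancer" {
  source = "./modules/load_balancer"

  # Coarse-grained workaround: every resource in this module now waits for
  # every resource in the VM module, not just the virtual machines.
  depends_on = [module.virtual_machines]
}
```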
From what I understand, the problem is that deletion of the VMs and of the BE pool association runs concurrently, and there are two possible outcomes:
The VM is deleted before the BE pool association. Then I see the following error message in the logs:
{"error":{"code":"ResourceNotFound","message":"The Resource 'Microsoft.Compute/virtualMachines/vm-test-8' under resource group 'azurerm-bug-repro' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix"}}: timestamp=2023-01-17T12:53:29.835+0100
I suppose that may be because azurerm is trying to modify the VM resource after the delete operation of the BE pool association has completed. But it seems this error is ignored and does not cause the whole destroy operation to fail.
The BE pool association is deleted before the VM. In that case, we get the error from the bug report:
waiting for removal of Backend Address Pool Association for NIC "nic-test-6" (Resource Group "azurerm-bug-repro"): Code="OperationNotAllowed" Message="Operation 'startTenantUpdate' is not allowed on VM 'vm-test-6' since the VM is marked for deletion. You can only retry the Delete operation (or wait for an ongoing one to complete)." Details=[]
I believe this is azurerm trying to modify the VM resource but failing to do so because the resource is already marked for deletion. However, this error is not ignored and causes the whole operation to fail.
Apart from the described workaround, I see two possible solutions. One of them: ignore the `OperationNotAllowed` error, as is already done with `ResourceNotFound`.

[Image: dependency graph without explicit dependency]
[Image: dependency graph with explicit dependency]
Hi @LukasNajman, thanks for raising this and for the additional suggestions. Usually we would be unable to fix this from the provider, as the correct order of operations can only be effected by the dependency graph - so where no implicit dependency is inferred, you must explicitly create one.
However, in this case it might be possible to parse the error and infer that the association is being deleted because the VM is undergoing deletion. Achieving this, though, will likely require use of our upcoming transport layer, which in turn will probably require the entire `network` package to be migrated, and this will take some time.
### Is there an existing issue for this?
### Community Note
### Terraform Version
1.3.7
### AzureRM Provider Version
3.39.1
### Affected Resource(s)/Data Source(s)

`azurerm_linux_virtual_machine`, `azurerm_lb_backend_address_pool`, `azurerm_network_interface_backend_address_pool_association`
### Terraform Configuration Files
### Debug Output/Panic Output
### Expected Behaviour

Resources created with `terraform apply` should be destroyable with `terraform destroy`.

### Actual Behaviour
The resource deletion fails with an error:
With 10 VMs, it failed 3 times out of 3. The problem does not happen with 3 VMs.
Adding an explicit dependency from `azurerm_network_interface_backend_address_pool_association` to `azurerm_linux_virtual_machine` helps. However, I consider this a workaround, not a fix, as it is not usable when the VMs and the load balancer are created in different modules.

### Steps to Reproduce
1. `terraform apply`
2. `terraform destroy`
### Important Factoids
Running in westeurope
### References
https://github.com/hashicorp/terraform-provider-azurerm/issues/4330