Closed J0F3 closed 10 minutes ago
Thank you @J0F3 for reporting this APIM issue. It seems different from the creation provisioning poller issue since the delete doesn't appear to be a long-running operation at first glance. However, I'm not certain yet and will investigate further when I have the chance.
Thank you @wuxu92.
What I can say from my troubleshooting is that the API definitely returns 202 with a location header that contains the URL with the asyncId and asyncCode query parameters. So this seems to be the same principle as for Create/Update, and the polling from https://github.com/hashicorp/go-azure-sdk/pull/1090 should work. But maybe the resource itself needs an update so that it also recognizes the delete as a long-running operation that needs polling.
I also noticed that it really takes some time (approx. 1 s in my case, but that probably depends on the size/complexity of the API) until the API is really gone from the APIM. That is also what actually causes the issue (because the GET request comes too early).
I hope this helps with further investigation.
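To make the 202-plus-location mechanism above concrete, here is a minimal sketch of following a polling URL until the operation settles. It is written in Python for brevity, not in the provider's Go, and it is not the go-azure-sdk poller: poll_location, the fetch callable and the example URL are hypothetical names used only for illustration. It assumes that the polling URL keeps answering 202 while the delete is in progress and answers with some other status once it is done.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical transport hook: takes a URL, returns (status_code, headers).
Fetch = Callable[[str], Tuple[int, Mapping[str, str]]]


def poll_location(fetch: Fetch, location: str, interval: float = 1.0,
                  max_attempts: int = 30) -> int:
    """Poll `location` until the service stops answering 202 Accepted.

    Assumption: 202 means "still in progress"; any other status is final.
    Returns the final status code, raises TimeoutError if attempts run out.
    """
    url = location
    for _ in range(max_attempts):
        status, headers = fetch(url)
        if status != 202:
            return status
        # The service may hand back an updated polling URL; follow it if present.
        # (A real client would look the header up case-insensitively.)
        url = headers.get("Location", url)
        time.sleep(interval)
    raise TimeoutError(f"operation still pending after {max_attempts} polls")


if __name__ == "__main__":
    # Fake service: two "in progress" answers, then a final 200.
    responses = iter([(202, {}), (202, {}), (200, {})])
    final = poll_location(
        lambda url: next(responses),
        "https://example.invalid/operation?asyncId=ID&asyncCode=CODE",
        interval=0.0,
    )
    print(final)
```

Only after such a loop returns would a follow-up GET or create be safe to issue, which is the step that seems to be missing in the behaviour described above.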
That said, I can no longer reproduce the 202 responses with the API version "2022-08-01" 😵. I always get just a 200. But when I use the API version "2024-06-01-preview", for example, I get the 202 responses from the API. So it seems to be API-version related.
@J0F3 Yes, the latest stable version 2024-05-01 added the 202 response: https://github.com/Azure/azure-rest-api-specs/blob/b9e65c8997ce097af3f773a48d2ea2e0535f3cca/specification/apimanagement/resource-manager/Microsoft.ApiManagement/stable/2024-05-01/apimapis.json#L458. This issue can be resolved once AzureRM upgrades its SDK: https://github.com/hashicorp/go-azure-sdk/blob/25c3e8aca56e73ede8722b5183526a9e1ef1addf/resource-manager/apimanagement/2024-05-01/api/method_delete.go#L91. However, as you mentioned in your investigation, the asyncId parameter is also required, so go-azure-sdk#1090 may still be a dependency.
Yes, but something also seems strange with the API version 2024-05-01. It is the latest stable version listed in the documentation, but I cannot use it in actual calls. The latest version which worked for me is "2024-06-01-preview". When using "2024-05-01" I get "API version query parameter is not specified or was specified incorrectly. Supported versions: 2014-02-14-preview,2014-02-14,2015-09-15,2016-07-07,2016-10-10,2017-03-01,2018-01-01,2018-06-01-preview,2019-01-01,2019-12-01-preview,2019-12-01,2020-06-01-preview,2020-12-01,2021-01-01-preview,2021-04-01-preview,2021-08-01,2021-12-01-preview,2022-04-01-preview,2022-08-01,2022-09-01-preview,2023-03-01-preview,2023-05-01-preview,2023-09-01-preview,2024-06-01-preview"
This is a known issue with the API version. The service team provided "an estimated completion of the fix in the second week of January." We need a custom poller in the provider before the service team fixes it.
Ah ok, good to know. Thx!
@J0F3 I tested this configuration on my machine and found that the issue is not caused by an API problem, but by the deletion of api overlapping with the creation of additional_api.
When we change local.additional_version from false to true, azurerm_api_management_api.api will be replaced, meaning it will be destroyed and then created. At the same time, azurerm_api_management_api.additional_api[0] will also be created. Since there is no dependency between api and additional_api, their destruction and creation processes run in parallel. When the creation of additional_api is called, the destruction of api might not have started yet or could still be ongoing, which causes the error.
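The ordering problem described here can be illustrated with a toy model. The sketch below uses Python's standard graphlib module to show which operations become runnable at the same time, with and without a dependency edge between the two resources. The node labels are made up for this illustration, the version set is left out for brevity, and this is not Terraform's actual graph walker; it only shows the general idea of dependency-ordered scheduling.

```python
from graphlib import TopologicalSorter


def waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group operations into waves; everything in one wave may run in parallel.

    `deps` maps each operation to the set of operations it must wait for.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all operations with no unmet dependency
        result.append(ready)
        sorter.done(*ready)
    return result


# Replacing `api` is modelled as a destroy followed by a create.
without_edge = {
    "destroy api": set(),
    "create api": {"destroy api"},
    "create additional_api[0]": set(),  # nothing to wait for
}
# Same plan, plus the edge that a dependency of additional_api on api would add.
with_edge = {**without_edge, "create additional_api[0]": {"create api"}}

if __name__ == "__main__":
    print(waves(without_edge))
    print(waves(with_edge))
```

Without the edge, "destroy api" and "create additional_api[0]" land in the same wave, which matches the overlap described above. With the edge, the creation of additional_api can only start after api has been destroyed and recreated.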
@wuxu92 Thank you.
Yes, you are right. I looked at the debug log again and also saw that the recreation of azurerm_api_management_api.api and the creation of azurerm_api_management_api.additional_api[0] overlap. It is not easy to see in the log, but the destroy action actually finishes after the "already exists" error happened.
So this basically means that it cannot be fixed in the provider itself, right? Instead, an explicit depends_on in the configuration must ensure that the recreation and the creation of the additional API version run in sequence.
Or do you have any other suggestion for fixing the error?
Thx!
@J0F3 Yes, the only thing I can think of is to add the depends_on meta-argument to sequence the resources. However, I ran into the issue from #23322 after adding depends_on, but we can track that in that issue.
I tested it quickly by adding depends_on = [ azurerm_api_management_api.api ] to the "additional_api" resource. That seems to fix it, and the apply runs through without any error.
So the full config would now look like this:
provider "azurerm" {
  resource_provider_registrations = "none"
  features {}
}

locals {
  apim_rg            = "test-rg"
  apim_name          = "test-apim"
  additional_version = false
}

resource "azurerm_api_management_api_version_set" "additional_version" {
  name                = local.apim_name
  api_management_name = local.apim_name
  resource_group_name = local.apim_rg
  display_name        = "replace-api-bug-repro"
  versioning_scheme   = "Header"
  version_header_name = "Version"
}

resource "azurerm_api_management_api" "api" {
  name                = local.additional_version == true ? "replace-api-bug-repro-cloud" : "replace-api-bug-repro"
  resource_group_name = local.apim_rg
  api_management_name = local.apim_name
  revision            = "1"
  display_name        = "replace-api-bug-repro - ${local.additional_version ? "multi version" : "single version"}"
  path                = "test/api"
  protocols           = ["https"]

  import {
    content_format = "openapi"
    content_value  = file("${path.module}/openapi.yaml")
  }

  version_set_id = local.additional_version ? azurerm_api_management_api_version_set.additional_version.id : null
  version        = local.additional_version ? "cloud" : null
}

resource "azurerm_api_management_api" "additional_api" {
  count               = local.additional_version ? 1 : 0
  name                = "replace-api-bug-repro"
  resource_group_name = local.apim_rg
  api_management_name = local.apim_name
  revision            = "1"
  display_name        = "replace-api-bug-repro - ${local.additional_version ? "multi version" : "single version"}"
  path                = "test/api"
  protocols           = ["https"]

  import {
    content_format = "openapi"
    content_value  = file("${path.module}/openapi.yaml")
  }

  version_set_id = azurerm_api_management_api_version_set.additional_version.id
  version        = null

  depends_on = [azurerm_api_management_api.api]
}
In this particular test case, the error from https://github.com/hashicorp/terraform-provider-azurerm/issues/23322 did not occur for me. But I am very aware of this issue. We have these errors almost on a daily basis somewhere. 🙈
However, as this issue can be fixed with depends_on and is not really an issue of the provider, I am going to close this issue.
@wuxu92 Thank you very much for looking into it and for giving the hint where the cause actually is!
Is there an existing issue for this?
Community Note
Terraform Version
1.9.8
AzureRM Provider Version
4.10.0
Affected Resource(s)/Data Source(s)
azurerm_api_management_api
Terraform Configuration Files
Debug Output/Panic Output
https://gist.github.com/J0F3/5ec2d9c1d4425b871271b56a89a55ab5
Expected Behaviour
azurerm_api_management_api should be replaced as shown in terraform plan without any error.
Actual Behaviour
The first run of terraform apply always fails with 'A resource with the ID ... already exist...'. The second run of terraform apply then works and completes the needed changes. But in a CI/CD pipeline this always leads to a broken pipeline and needs retries.
The reason for that is similar to the reason of https://github.com/hashicorp/terraform-provider-azurerm/issues/23322, where azurerm does not correctly follow the polling URL of the Azure API where a long-running operation should be tracked. Instead, azurerm goes ahead and tries to create the API again (with the new setting), but then it finds the previous API, for which the delete operation has not finished yet.
Here is the relevant part of the debug log:
Here it is clearly visible that the DELETE call was made to the Azure API and that the creation starts immediately afterwards and then finds the API that is about to be deleted. It can be identified by the "displayName" of the API, which still contains "single version", i.e. the API that must be deleted. I added that string to the "displayName" on purpose so that it is possible to track which version of the API actually triggers the conflict.
As you can see in the documentation of the Azure API, the DELETE operation normally responds with "202" and URLs where the actual progress can be tracked: https://learn.microsoft.com/en-us/rest/api/apimanagement/apis/delete?view=rest-apimanagement-2024-05-01&tabs=HTTP#apimanagementdeleteapi But this response seems to be ignored by azurerm. So it is similar to what happens in https://github.com/hashicorp/terraform-provider-azurerm/issues/23322, for which a PR with a fix is still pending: https://github.com/hashicorp/go-azure-sdk/pull/1090. So maybe this would also be fixed when the PR is merged. But I am not sure. @wuxu92 Can you please confirm/verify that?
Steps to Reproduce
Apply the above configuration with local.additional_version=false
Change local.additional_version=false to local.additional_version=true. The plan now shows that the API must be replaced, which makes sense:
Now when applying the changes (terraform apply), it fails with:
The second time terraform apply is run, the apply succeeds.
Important Factoids
No response
References
I think this is also related to https://github.com/hashicorp/go-azure-sdk/pull/1090