Open thllxb opened 1 year ago
Hi @thllxb, thanks for opening the issue.
Does "retry" mean you re-runed terraform destroy?
In the log file, it stopped when the delete operation of azurerm_log_analytics_workspace failed, and the error is from the Azure service backend. I assume a service fluctuation and retry later would be helpful.
@ziyeqf
What I meant by retry
is terraform retry the delete automatically after the first 504 error.
From the debug log, you can see the first delete happened on line1 with the 504 error respond on line152. Then a second delete request was sent on line199 with a 404 respond on line212 confirming the resource was already gone. That looks like a retry to me. By now terraform was performing well.
What I think the issue is, after the second delete request got a 404 respond, it should just be done, but instead, terraform threw an error on line235 complaining The async operation failed
which I assume was from the 504 error. That is what needs to be fixed.
@thllxb Thank you for reply. And yes, you are right, it retied after the first request returned 504. Then, it failed because the 404 response. the DELETE request will return 200/202 normally. I assume the first request successfully deleted the resource although it returned 504, then the second request failed.
In this case manually re-running terraform destroy
works because it will refresh the state file.
@ziyeqf I agree that a second 'tf destroy' would work fine, but actually we don't have to. The 404 response is already a confirmation of the deletion even though it is a error code. Can we consider 404 acceptable for a delete request?
@thllxb Yes..I think for the delete operation in this scenario 404 is an acceptable response, but there is a little difference between 404 "resource not found" and 200 "delete successfully".
If we take 404 as success, then it might ignore the difference between the local state file and the real cloud state in other scenarios. I don't think the difference should be handled by a 404-returned delete operation.
@ziyeqf I'm not following what you mean exactly. There is nothing in the cloud after the 404. Why do we still care about remote state?
@thllxb Thanks for reporting this. It looks like the cause here is a broken API. Unfortunately there isn't much we can do about the API returning 504 whilst actually performing the delete request. We should correctly acknowledge a 404 but whenever a 5xx error is received we can't make assumptions about whether the request was processed or not.
@manicminer
whenever a 5xx error is received we can't make assumptions about whether the request was processed or not.
You are correct. 5xx should not mean any assumption but be followed with a retry which what it is already doing.
We should correctly acknowledge a 404
You are correct. That's exactly my ask in this ticket.
Thanks for clarifying. Given that the StatusCode
embedded in that error is 0
(zero) and not 404, it's likely that go-autorest is not handling this correctly and we should be able to improve our handling here once we moved moved away from go-autorest onto a different transport SDK (in the near future).
Is there an existing issue for this?
Community Note
Terraform Version
1.2.1
AzureRM Provider Version
3.28.0
Affected Resource(s)/Data Source(s)
azurerm_log_analytics_workspace
Terraform Configuration Files
Debug Output/Panic Output
Expected Behaviour
azurerm_log_analytics_workspace gets deleted by a retry, after the initial 504 gateway error, without throwing an error.
Actual Behaviour
azurerm_log_analytics_workspace got deleted by a retry, after the initial 504 gateway error, and STILL threw the 504 error.
Error: issuing AzureRM delete request for Log Analytics Workspaces 'tf-azure-aks-a34d7170-ci-TestAzureRBAC-eastus-law': performing Delete: workspaces.WorkspacesClient#Delete: Failure sending request: StatusCode=0 -- Original Error: Code="Failed" Message="The async operation failed."
Steps to Reproduce
Important Factoids
No response
References
No response