Open egorshulga opened 1 year ago
While testing on another environment, I spotted more resources that do not settle after terraform apply:
# azurerm_mssql_server_extended_auditing_policy.auditingSettings must be replaced
-/+ resource "azurerm_mssql_server_extended_auditing_policy" "auditingSettings" {
~ id = "/subscriptions/.../resourceGroups/ResourceGroup/providers/Microsoft.Sql/servers/sql-server/extendedAuditingSettings/Default" -> (known after apply)
~ server_id = "/subscriptions/.../resourceGroups/Resourcegroup/providers/Microsoft.Sql/servers/sql-server" -> "/subscriptions/.../resourceGroups/ResourceGroup/providers/Microsoft.Sql/servers/sql-server" # forces replacement
# (5 unchanged attributes hidden)
}
# module.houseProd.azurerm_linux_web_app.app_service will be updated in-place
~ resource "azurerm_linux_web_app" "app_service" {
id = "/subscriptions/.../resourceGroups/ResourceGroup/providers/Microsoft.Web/sites/app-service"
name = "app-service"
~ service_plan_id = "/subscriptions/.../resourceGroups/Resourcegroup/providers/Microsoft.Web/serverfarms/app-service-plan" -> "/subscriptions/.../resourceGroups/ResourceGroup/providers/Microsoft.Web/serverfarms/app-service-plan"
tags = {}
# (18 unchanged attributes hidden)
# (3 unchanged blocks hidden)
}
Oh, on yet another environment I also discovered this unsettling thing (luckily, that's the second to last env 😅)
# azurerm_subnet_network_security_group_association.subnet1-nsg[0] must be replaced
-/+ resource "azurerm_subnet_network_security_group_association" "subnet1-nsg" {
~ id = "/subscriptions/.../resourceGroups/ResourceGroup/providers/Microsoft.Network/virtualNetworks/vnet/subnets/subnet1" -> (known after apply)
~ network_security_group_id = "/subscriptions/.../resourceGroups/Resourcegroup/providers/Microsoft.Network/networkSecurityGroups/nsg" -> "/subscriptions/.../resourceGroups/ResourceGroup/providers/Microsoft.Network/networkSecurityGroups/nsg" # forces replacement
# (1 unchanged attribute hidden)
}
Hello Egor and @tombuildsstuff. I'd like to report that we, too, are seeing this, and cannot resolve it with mere "matching case".
In our case, it's an NSG Association that is unsettled. Here's the PLAN (edited to anonymize, but still illustrating the issue):
# azurerm_subnet_network_security_group_association.sql_subnet_assignment must be replaced
-/+ resource "azurerm_subnet_network_security_group_association" "sql_subnet_assignment" {
~ id = "/subscriptions/<id>/resourceGroups/SOME-Project-EUS2-DEV-RG/providers/Microsoft.Network/virtualNetworks
/SOME-Project-EUS2-DEV-VNET_10.432.411.0_22/subnets/some-project-dev-eus2-sql-subnet" -> (known after apply)
~ subnet_id = "/subscriptions/<id>/resourceGroups/some-project-eus2-dev-rg/providers/Microsoft.Network
/virtualNetworks/SOME-Project-EUS2-DEV-VNET_10.432.411.0_22/subnets/some-project-dev-eus2-sql-subnet"
-> "/subscriptions/<id>/resourceGroups/SOME-Project-EUS2-DEV-RG/providers/Microsoft.Network
/virtualNetworks/SOME-Project-EUS2-DEV-VNET_10.432.411.0_22/subnets/some-project-dev-eus2-sql-subnet" # forces replacement
# (1 unchanged attribute hidden)
}
Note that it wants to change the case of the resource group name in the subnet_id from all-lower-case to the ORIGINAL-Weird-CASING that our Cloud Enablement team created.
From: .../resourceGroups/some-project-eus2-dev-rg/providers To: .../resourceGroups/SOME-Project-EUS2-DEV-RG/providers
This fails on apply (because the NSG is still associated with a SQL Managed Instance):
2023-06-14T21:28:07.7909957Z Error: removing Network Security Group Association from Subnet:
(Name "some-project-dev-eus2-sql-subnet" /
Virtual Network Name "SOME-Project-EUS2-DEV-VNET_10.432.411.0_22" /
Resource Group "SOME-Project-EUS2-DEV-RG"):
network.SubnetsClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error:
Code="NetworkSecurityGroupCannotBeRemovedDueToNipOnSubnet"
Message="Network security group cannot be removed from subnet
/subscriptions/<id>/resourceGroups/SOME-Project-EUS2-DEV-RG/providers/Microsoft.Network
/virtualNetworks/SOME-Project-EUS2-DEV-VNET_10.432.411.0_22/subnets/some-project-dev-eus2-sql-subnet
because it has network intent policy
/subscriptions/<id>/resourceGroups/SOME-Project-EUS2-DEV-RG/providers/Microsoft.Network
/networkIntentPolicies/mi_default_<guid_and_ip_address> applied."
Details=[]
2023-06-14T21:28:07.7911487Z
2023-06-14T21:28:07.8257005Z ##[error]Bash exited with code '1'.
2023-06-14T21:28:07.8269639Z ##[section]Finishing: Terraform Apply
Now, my position is that this should not be replaced at all (and that Egor's PR 21115 would have corrected this).
Now, I've edited directly into the state files, and determined that what we have stored is the CORRECT-Weird-CASING. I wish I could insert an image to show you, but you'll have to just take my word for it that the following is a copy/paste from the state file, straight out of Azure Blob Storage where it is stored (again, edited only for anonymity... but I didn't change the case of the RG name, other than to anonymize it):
{
"mode": "managed",
"type": "azurerm_subnet",
"name": "sql_subnet",
"provider": "provider[\"registry.terraform.io/hashicorp/azurerm\"]",
"instances": [
{
"schema_version": 0,
"attributes": {
"address_prefixes": [
"10.432.411.192/27"
],
"delegation": [
{
"name": "sqlDelegation",
"service_delegation": [
{
"actions": [
"Microsoft.Network/virtualNetworks/subnets/join/action",
"Microsoft.Network/virtualNetworks/subnets/prepareNetworkPolicies/action",
"Microsoft.Network/virtualNetworks/subnets/unprepareNetworkPolicies/action"
],
"name": "Microsoft.Sql/managedInstances"
}
]
}
],
"enforce_private_link_endpoint_network_policies": true,
"enforce_private_link_service_network_policies": false,
"id": "/subscriptions/<id>/resourceGroups/SOME-Project-EUS2-DEV-RG/providers/Microsoft.Network
/virtualNetworks/SOME-Project-EUS2-DEV-VNET_10.432.411.0_22/subnets/some-project-dev-eus2-sql-subnet",
"name": "some-project-dev-eus2-sql-subnet",
"private_endpoint_network_policies_enabled": false,
"private_link_service_network_policies_enabled": true,
"resource_group_name": "SOME-Project-EUS2-DEV-RG",
"service_endpoint_policy_ids": [],
"service_endpoints": [
"Microsoft.AzureActiveDirectory",
"Microsoft.Storage"
],
"timeouts": null,
"virtual_network_name": "SOME-Project-EUS2-DEV-VNET_10.432.411.0_22"
},
"sensitive_attributes": [],
"private": "<redacted>"
}
]
},
Note that the "id" of the sql_subnet is ALREADY in the CORRECT-Weird-CASING in the state file:
"id": "/subscriptions/
I find it significant that this is not something that we control, but is actually reading directly out of state -- it is just one "field" as part of the id of the subnet involved. If we go visit Azure Portal -> Virtual Networks -> this VNet -> This Subnet, we cannot even see the resource id for the subnet. That's buried, assigned by ARM at the time of resource creation.... and it is CORRECT in the state file.
But when "plan" comes along, something as part of the plan process has lower-cased it, and the plan THINKS that it has to be right-cased via replacement.... which can't happen due to the SQL Managed Instance behind it.
We really don't want to have to destroy and re-create this managed instance. It would take DAYS, between provisioning the new managed instance, and restoring backups.... without a certainty of outcome.
Can you please revisit this bug from the above perspective?
Thanks!
Jeff Woods Azure Architect Oaks, PA
@tombuildsstuff @egorshulga I see #22070 was recently also submitted, in response to #20138. That PR might also resolve this issue, or at least lead in that general direction. Tom assigned @mbfrahry back to this just six hours ago, after the original PR requestor made some requested changes. Here's hoping....
I am wondering if a change like the following, in subnet_network_security_group_association_resource.go around line 178, might resolve it. It is based on the PR in 22070, but in a different module. I don't have a Go compiler installed (and couldn't anyway due to company policy), and I don't know Go well enough to be confident in my usage of EqualFold, but the essence should be the same: if the originally passed-in subnet ID is identical when compared case-insensitively, just use the one passed in, which bypasses any case comparison issues that might cause the resource to be regenerated. If, however, there are "significant" differences (i.e. differences beyond letter case), set the subnet ID to the new one (which WOULD indicate that it has been moved and needs a teardown and replacement).
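// d.Set("subnet_id", resp.ID)
// Untested sketch: compare the full configured subnet ID with the ID returned
// by the API (assumed to be a *string), ignoring case.
configured, _ := d.Get("subnet_id").(string)
if configured != "" && resp.ID != nil && strings.EqualFold(configured, *resp.ID) {
	// Only the casing differs: keep the configured value so no diff is produced.
	d.Set("subnet_id", configured)
} else {
	// A real difference (or nothing configured yet): use the ID from the API.
	d.Set("subnet_id", resp.ID)
}
d.Set("network_security_group_id", securityGroup.ID)
return nil
}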
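To make the intent of the snippet above concrete outside the provider, here is a minimal, self-contained sketch of the "keep the configured ID when only the case differs" rule. The function name preserveConfiguredID and the sample IDs are hypothetical and are not part of the AzureRM provider; this only illustrates the comparison, not how the provider should implement it.

```go
package main

import (
	"fmt"
	"strings"
)

// preserveConfiguredID returns the value that would be written to state.
// If the configured ID and the ID returned by the API differ only by case,
// the configured value is kept so that no diff is produced. Otherwise the
// API value is used, which signals a real change.
// Hypothetical helper for illustration only.
func preserveConfiguredID(configured, fromAPI string) string {
	if configured != "" && strings.EqualFold(configured, fromAPI) {
		return configured
	}
	return fromAPI
}

func main() {
	configured := "/subscriptions/<id>/resourceGroups/some-project-eus2-dev-rg/providers/Microsoft.Network/virtualNetworks/vnet/subnets/sql-subnet"
	fromAPI := "/subscriptions/<id>/resourceGroups/SOME-Project-EUS2-DEV-RG/providers/Microsoft.Network/virtualNetworks/vnet/subnets/sql-subnet"

	// Case-only difference: the configured value is kept.
	fmt.Println(preserveConfiguredID(configured, fromAPI) == configured)

	// A different subnet name: the API value is used.
	moved := strings.Replace(fromAPI, "sql-subnet", "other-subnet", 1)
	fmt.Println(preserveConfiguredID(configured, moved) == moved)
}
```

The empty-string check matters after an import, when nothing has been configured yet and the API value is the only one available.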
@egorshulga FYI, I managed to get past my issue (I still believe it's a bug) with a judiciously-placed:
lifecycle {
ignore_changes = [
subnet_id
]
}
It means I can't ever, ever change the underlying subnet... but the presence of the SQL managed instance in it guarantees that anyway, since the MI applies the network intent policy to keep the subnet dedicated to nothing but the managed instance. So while it is still a bug in AzureRM, I can live with it, since I have no desire to tear down the SQL managed instance.
Still hope HashiCorp fixes it. In the meantime, you can probably circumvent casing issues (at least, resource group names embedded in resource ID paths) in the same manner.
Jeff Woods Azure Architect Oaks, PA
We were pretty lucky: the unsettling resources were more or less transient and included no data resources, so we went the way of dropping and recreating them. For example, for NSG Associations we recreated the appropriate NSGs.
And that was the biggest risk in our migration to Terraform. We kept our eyes on the stuff we were recreating, so that we wouldn't drop something that we must not drop.
This is happening to me too right now, on an azurerm_private_endpoint, seemingly due to the fact that the resource ID of the azurerm_storage_account it's referring to in the private_connection_resource_id is being returned in a different case than the state contains.
I've sometimes been able to work around this by removing and reimporting resources with the altered case, but in this case I can't do that.
The only way I can see to "fix" this right now is to do an ugly case conversion on the fly.
@eehret I also have the exact same issue with private endpoints for my function apps during DNS zone mapping. Here the ID keeps changing even after multiple applies. I'm tired of making these changes, so as a last resort I added ignore_changes, and I know this is the worst option because I can't change the DNS zone later for the private endpoints.
Please give some proper workarounds for this. Why are Azure and Terraform such a bad combination? So many things are outdated, like LogicApps API Connections kind 2, where the latest API version is 2016, and many others like that.
I have started facing this issue while using Front Door. The only change is the casing of "resourceGroups" in the ID:
"/subscriptions/sub_id/resourceGroups/resource_group_name/providers/Microsoft.Cdn/profiles/resource_name"
"/subscriptions/sub_id/resourcegroups/resource_group_name/providers/Microsoft.Cdn/profiles/resource_name"
and every time it tries to replace the origin groups, origins and endpoints.
Terraform Version
1.4.2
AzureRM Provider Version
3.47.0
Affected Resource(s)/Data Source(s)
azurerm_data_protection_backup_instance_blob_storage, azurerm_mssql_server_microsoft_support_auditing_policy
Terraform Configuration Files
(Related to existing resources, see description below)
Debug Output/Panic Output
Expected Behaviour
Terraform Plan should settle down. After running terraform apply, subsequent calls to terraform plan should not show any differences.
Actual Behaviour
Terraform Plan does not settle down. Terraform detects some differences. For azurerm_data_protection_backup_instance_blob_storage it is just an update operation; for azurerm_mssql_server_microsoft_support_auditing_policy it is even worse: it is a re-create operation.
Steps to Reproduce
We are migrating from a really large and long-lived ARM template to Terraform. We already have 5 environments that we must not re-create (this includes the Prod environment). We have ~200 resources in NonProd environments and ~240 resources in Prod environments.
Initially, when we implemented the Terraform code and imported all of the resources into state, Terraform wanted to re-create ~200 of them (effectively all). We analyzed the reason and found that the Resource Group name is causing this. It seems to be a well-known issue, reported multiple times: Azure is case-insensitive for almost all resources, while AzureRM is case-sensitive by default. The name of the Resource Group appears in resource IDs, and a lot of resources simply parse the ID to get the name of their Resource Group. Additionally, Resource Group names are not stored consistently across different resources in Azure (the best example would be LogAnalytics, which stores the name of its Resource Group lowercased; we observed this across different subscriptions, so we'd assume that this is true for everyone using Azure).
We found a workaround: manipulating Resource IDs by fixing their Resource Group names on the fly while performing the initial import. This reduced the number of operations to ~60 updates (there are still some properties stored on the Azure side which have inconsistent casing) and ~80 re-creations (some of the inconsistent properties stored on the Azure side can't be updated without re-creating the resource). Luckily, among the re-created resources there were none that we can't re-create (so we are taking the risk of increased downtime during deployment to Prod, and this includes re-creation of these resources: WebTests, ActionGroups, Alerts, RoleAssignments, NSG Associations).
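As an illustration of what such an on-the-fly fix might look like, here is a minimal sketch that rewrites the Resource Group segment of a resource ID to a known canonical casing before the ID is used for import. The function name fixResourceGroupCase is hypothetical, and this is not the script we actually used; it only shows the idea under the assumption that the canonical Resource Group name is known up front.

```go
package main

import (
	"fmt"
	"strings"
)

// fixResourceGroupCase rewrites the resource group segment of an Azure
// resource ID to canonicalName when the two match ignoring case.
// The "resourceGroups" key itself is also normalized, since Azure may
// return it as "resourcegroups". IDs without a matching segment are
// returned unchanged. Hypothetical helper for illustration only.
func fixResourceGroupCase(resourceID, canonicalName string) string {
	segments := strings.Split(resourceID, "/")
	for i := 0; i+1 < len(segments); i++ {
		if strings.EqualFold(segments[i], "resourceGroups") &&
			strings.EqualFold(segments[i+1], canonicalName) {
			segments[i] = "resourceGroups"
			segments[i+1] = canonicalName
			break
		}
	}
	return strings.Join(segments, "/")
}

func main() {
	id := "/subscriptions/sub_id/resourcegroups/resourcegroup/providers/Microsoft.Sql/servers/sql-server"
	fmt.Println(fixResourceGroupCase(id, "ResourceGroup"))
}
```

Working on path segments rather than doing a plain string replacement avoids accidentally rewriting a resource whose own name happens to contain the Resource Group name.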
Unfortunately, even after applying the workarounds and re-creating resources, we still observe an unsettled Terraform Plan. We suspect that Azure (a) refuses to update some properties if they differ only in casing and (b) decides on the casing of RG names for us, so neither we nor the AzureRM provider that provisions infrastructure for us can influence this.
In one of the previous issues I've seen a remark, that we can't easily decide on case-sensitivity, and we should be absolutely sure that this is how Azure behaves (and so it does not introduce issues for thousands of developers). I would like to leave this link to documentation: https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/resource-name-rules
If you would allow me, I would like to ask these questions (and use them to start a discussion):
(1) What is your view on the casing issues in general?
It seems that when AzureRM and Terraform are used from the very beginning of a Resource Group's life, the handling is much better. Numerous issues arise only when we migrate long-living existing infrastructure.
(2) Identifying all case-insensitive properties and marking them as such seems to require a tremendous amount of work. However, I spotted that some of the properties already use DiffSuppressFunc: suppress.CaseDifference, so I would assume that this work is already in progress. Do you have a plan to tackle all of the casing issues, or do you prefer going on a case-by-case basis?
(3) Would you kindly accept contributions that make the properties causing trouble for us case-insensitive?
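For readers unfamiliar with the DiffSuppressFunc mechanism mentioned in question (2), here is a rough sketch of the kind of predicate it relies on: a function that receives the old and new values of an attribute and returns true when the difference should be ignored during plan. The name caseInsensitiveDiffSuppress is hypothetical. The real Terraform plugin SDK callback also receives a resource-data argument, which is omitted here so that the example compiles without the SDK, and I am assuming that suppress.CaseDifference behaves along these lines.

```go
package main

import (
	"fmt"
	"strings"
)

// caseInsensitiveDiffSuppress reports whether a planned change should be
// ignored because the old and new values differ only by letter case.
// Hypothetical stand-in: the key parameter is unused and kept only to
// mirror the general shape of a diff-suppress callback.
func caseInsensitiveDiffSuppress(_ string, oldValue, newValue string) bool {
	return strings.EqualFold(oldValue, newValue)
}

func main() {
	oldID := "/subscriptions/.../resourceGroups/Resourcegroup/providers/Microsoft.Web/serverfarms/app-service-plan"
	newID := "/subscriptions/.../resourceGroups/ResourceGroup/providers/Microsoft.Web/serverfarms/app-service-plan"

	// Only the casing of the Resource Group differs, so the diff is suppressed.
	fmt.Println(caseInsensitiveDiffSuppress("service_plan_id", oldID, newID))
}
```

Unlike ignore_changes, a predicate like this would still let a genuine change to the attribute show up in the plan.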