Open nimblenitin opened 1 year ago
The private endpoint is getting recreated because of a difference between the stored subnet_id and the received one (part of the VNet name is coming back in upper case).
I did some testing and I was unable to reproduce the issue, even playing with lower and upper case names. Can you provide me with more details about the specific names?
It is annoying because it is a case-sensitivity issue, but I am not sure it can be handled directly in the private endpoint resource in this case; that specific property has the ForceNew flag, and I don't see a way to make it ignore the difference.
Still no luck :(
Please go to the Azure portal and check two things:
1. Browse to the virtual network resource, and check if the name is ja-jm-JPL-OSSBSS-DevOps-Prod-vnet or ja-jm-jpl-ossbss-devops-prod-vnet
2. Browse to the private endpoint, and in the overview check if the Virtual network/subnet label says ja-jm-JPL-OSSBSS-DevOps-Prod-vnet/ja-jm-jpl-ossbss-devops-prod-non-dmz-snet or ja-jm-jpl-ossbss-devops-prod-vnet/ja-jm-jpl-ossbss-devops-prod-non-dmz-snet
Finally, are you using the refresh=false flag?
So it is ja-jm-jpl-ossbss-devops-prod-vnet/ja-jm-jpl-ossbss-devops-prod-non-dmz-snet, i.e. the latter one for both points. I did not find the refresh flag anywhere. It is trying to replace these with the ones in capital letters, which is inaccurate. Not sure why.
I cannot find any reason why the subnet data resource would return that inaccurate value, so I guess this is as far as I can go with this issue. One last long shot you could try is to force the data resource to be "recreated", either by removing it from the state or by changing its path.
I am seeing this as well. I have outputted the data field to a file and done various case sensitive string comparisons to the value stored in the tfstate file and can find no differences, and yet it wants to rebuild every time. However if I replace the data reference with a static string ie the output in the file, it doesn't want to rebuild. This seems to be a deterministic vs non deterministic issue.
resource "local_file" "test" {
  content  = data.azurerm_subnet.pe_snet.id
  filename = "/tmp/data"
}
resource "azurerm_private_endpoint" "blob" {
  count               = var.pe_blob ? 1 : 0
  provider            = azurerm.pe
  name                = "${var.storage_main.name}-blobendpoint"
  location            = "North Europe"
  resource_group_name = local.pe_endpoints.rg
  subnet_id           = "/subscriptions/XXXXX/resourceGroups/YYYY/providers/Microsoft.Network/virtualNetworks/VVVVVV/subnets/SSSSS"
  #data.azurerm_subnet.pe_snet.id
  tags = local.tags
  private_service_connection {
    ...
  }
}
data "azurerm_subnet" "pe_snet" {
  provider             = azurerm.pe
  virtual_network_name = data.azurerm_virtual_network.pe_vnet.name
  name                 = local.pe_endpoints.snet
  resource_group_name  = local.pe_endpoints.rg
}
Could it be that it's flagging the wrong thing as forcing the change?
One thing to note is that I am doing this in a module, not at the top level.
$ terraform version
Terraform v1.3.1
on linux_amd64
+ provider registry.terraform.io/hashicorp/azurerm v3.37.0
+ provider registry.terraform.io/hashicorp/local v2.2.3
Still happens on this version
$ terraform version
Terraform v1.3.6
on linux_amd64
+ provider registry.terraform.io/hashicorp/azurerm v3.37.0
+ provider registry.terraform.io/hashicorp/local v2.2.3
This is something deeper, either in terraform or the azurerm provider as I have another instance of it. Again this is in a module. Basically my code calls a module, part of that it passes the resource group to the module. eg
module "storage" {
  depends_on = [
    azurerm_resource_group.main
  ]
  providers = {
    azurerm      = azurerm
    azurerm.corp = azurerm.pe
  }
  acl_default_action = "Allow"
  allowed_ips        = []
  default_tags       = var.default_tags
  prod               = true
  sa_rg              = azurerm_resource_group.main.name
  source             = "../shared/modules/Storage-PE"
  storage_main = {
    "name" = "ProdAcmeBot",
    "tier" = "Standard",
    "type" = "ZRS"
  }
}
In the module I then use a data source on the RG to work out the location where the SA should go, e.g.
data "azurerm_resource_group" "main" {
  name = var.sa_rg
}
resource "azurerm_storage_account" "main" {
  name                              = replace(lower(var.storage_main.name), "/[^a-z0-9]/", "")
  resource_group_name               = data.azurerm_resource_group.main.name
  location                          = data.azurerm_resource_group.main.location
  account_tier                      = try(var.storage_main.tier, "Standard")
  account_replication_type          = try(var.storage_main.type, "ZRS")
  account_kind                      = "StorageV2"
  enable_https_traffic_only         = try(var.storage_main.enable_https_traffic_only, true)
  min_tls_version                   = "TLS1_2"
  shared_access_key_enabled         = var.shared_access_key_enabled
  is_hns_enabled                    = var.is_hns_enabled
  infrastructure_encryption_enabled = var.infrastructure_encryption_enabled
  access_tier                       = try(var.storage_main.access_tier, var.access_tier)
  #nfsv3_enabled = "true"
  blob_properties {
    delete_retention_policy {
      days = var.delete_retention_policy
    }
    versioning_enabled = var.versioning_enabled
    container_delete_retention_policy {
      days = var.container_delete_retention_policy
    }
  }
  network_rules {
    default_action             = var.acl_default_action
    ip_rules                   = var.boot_diags_sa ? setunion(var.Boot_diags_ips, var.allowed_ips) : var.allowed_ips
    virtual_network_subnet_ids = []
    bypass                     = ["Logging", "Metrics", "AzureServices"]
  }
  tags = local.tags
}
However, the code wants to rebuild all the time. Again, statically defining it stops this rebuild. Using anything like a local or data, i.e. a value worked out at runtime, forces it to rebuild. Vars are fine, as they are filled in via the preprocessor rather than worked out at runtime.
  # module.storage.azurerm_storage_account.main must be replaced
-/+ resource "azurerm_storage_account" "main" {
      ~ id                       = "RADACT" -> (known after apply)
      + large_file_share_enabled = (known after apply)
      ~ location                 = "northeurope" -> (known after apply) # forces replacement
        name                     = "RADACT"
Now that you mention this, I've experienced a similar situation using a data resource inside of a module (but with key vault). Moving the data out was my solution, and it is something that I always recommend.
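A minimal sketch of what "moving the data out" could look like for the storage example above. The input variable sa_location is hypothetical (it is not part of the module shown in this thread), and the sketch only illustrates the pattern; it is not a confirmed fix for the underlying behaviour.

```hcl
# Root module: hand the module the values it needs instead of letting it
# look them up again. Referencing the resource group's attributes also
# gives Terraform the ordering that depends_on was providing.
module "storage" {
  source = "../shared/modules/Storage-PE"

  sa_rg       = azurerm_resource_group.main.name
  sa_location = azurerm_resource_group.main.location # hypothetical input

  storage_main = {
    "name" = "ProdAcmeBot",
    "tier" = "Standard",
    "type" = "ZRS"
  }
}

# Inside the module: a plain variable replaces data.azurerm_resource_group.main.
variable "sa_location" {
  type        = string
  description = "Azure region for the storage account (hypothetical input)."
}

resource "azurerm_storage_account" "main" {
  name                     = replace(lower(var.storage_main.name), "/[^a-z0-9]/", "")
  resource_group_name      = var.sa_rg
  location                 = var.sa_location
  account_tier             = try(var.storage_main.tier, "Standard")
  account_replication_type = try(var.storage_main.type, "ZRS")
}
```

The trade-off raised in the next comment still applies: the caller now has to supply the location, so the module is less self-contained.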
That isn't a solution, as what you are saying is that you can't programmatically determine a value of something inside a module; you have to pass the data in, making the module no longer self-contained.
We have started to notice this issue on our Azure Terraform builds, and it has become extremely annoying. Any progress here? @catriona-m
It is also a change in behavior, and going back and modifying the estate of code to implement workarounds is non-trivial.
I found a workaround:
lifecycle {
  ignore_changes = [
    subnet_id,
    tags, # I always add tags
  ]
}
That's not a workaround; it's running away and not dealing with it. By lifecycling it you will never pick up any valid changes in the infra or code.
That isn't a solution, as what you are saying is that you can't programmatically determine a value of something inside a module; you have to pass the data in, making the module no longer self-contained.
I am aware that it is not the solution; I just mentioned that I write my code in a different way that lets me avoid this (and other) issues with using a data source inside the same module that consumes it. That being said, I agree that this situation must be fixed, so that you can write your code as you want.
ignore_changes only ignores changes made on the Azure side, like the capitalization of a data resource or variable. It will make adjustments if you change your code https://developer.hashicorp.com/terraform/language/meta-arguments/lifecycle#ignore_changes
you could try passing the ids through terraform's upper/lower functions. Assuming Azure IDs are case-insensitive, it should still allow creating resources and also trick Terraform into not editing stuff when an ID with funny casing is returned by Azure/the Azure TF provider
ignore_changes only ignores changes made on the Azure side, like the capitalization of a data resource or variable. It will make adjustments if you change your code https://developer.hashicorp.com/terraform/language/meta-arguments/lifecycle#ignore_changes
this is not true
still no movement i see
any updates from Microsoft on this issue?
it may be that Azure is case-insensitive (https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/resource-name-rules) and Terraform is case-sensitive. Can you try using a lower() when passing the parameter?
This issue has popped up elsewhere too: https://www.reddit.com/r/sysadmin/comments/xiknp2/terraform_azure_resource_not_case_sensitive/
I would open up a bug with Hashicorp
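A sketch of the lower() suggestion, applied to the subnet data source posted earlier in this thread. It lowercases the names fed into the lookup rather than the whole ID, because lowercasing the full ID would also change fixed path segments such as resourceGroups, which the provider may not accept (not verified here). It assumes Azure treats these names case-insensitively, as the linked naming rules suggest, and whether it removes the diff in this case is untested.

```hcl
data "azurerm_subnet" "pe_snet" {
  provider = azurerm.pe

  # lower() normalises whatever casing the upstream value carries.
  name                 = lower(local.pe_endpoints.snet)
  virtual_network_name = lower(data.azurerm_virtual_network.pe_vnet.name)
  resource_group_name  = local.pe_endpoints.rg
}
```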
I have seen that; it's actually worse, as you can input in one case and it comes out in a different case when it's returned. However, that's not the crux of the issue here: it's when the subnet is referenced via a variable or data when in a module. If you get the exact same string and define it as a static string in code it doesn't rebuild, so that isn't a case thing.
Hi all,
I work on Terraform Core rather than this provider, so I can't comment on the details of how the Azure API behaves here, but I found my way here after someone asked me how a provider can potentially deal with a situation like this, and so I figured I might as well leave my answer here in case it's useful to someone else too.
The azurerm_private_endpoint resource type is implemented using this provider's own wrapper around the Terraform plugin SDK, but it imports (using a type alias) the schema.Schema type from the central SDK, so I think what I'm about to describe should be workable in this provider, but I cannot be 100% sure.
The two SDK features relevant to problems like this are:
- DiffSuppressFunc: an optional callback function a provider can implement for each resource argument to define whether two values are functionally equivalent despite being non-equal.
- DiffSuppressOnRefresh: tells the SDK to also apply the DiffSuppressFunc to the result of refreshing the object from the remote API. This should typically always be set along with any new addition of DiffSuppressFunc; this would ideally be the default behavior, but it isn't only because it was added relatively late in the SDK's life and we were concerned that it might cause unexpected behavior for existing providers whose DiffSuppressFunc implementations might assume they run only during planning.
The subnet_id argument of azurerm_private_endpoint does not currently set either of those.
If there is no definition of DiffSuppressFunc then the SDK uses a default rule that requires the two strings to be exactly equal, including case-sensitivity. That default rule doesn't seem to be sufficient for this situation, so a possible way to fix it would be to add a DiffSuppressFunc field to that schema definition which returns true if the old and new values are equivalent, and then also set DiffSuppressOnRefresh: true to make sure that same rule gets applied when the provider updates its records based on what's currently stored in the API.
Correctly implementing DiffSuppressFunc will require first determining exactly what rules the remote API uses for case folding. The documentation linked earlier says that the API defines "alphanumeric" as including only the ASCII letters and digits, so perhaps an ASCII-only definition of case is sufficient if it's guaranteed that letters from other alphabets or letters with diacritics can never appear in these strings.
Looking elsewhere in the provider codebase I see that there's an existing function suppress.CaseDifference which implements Unicode case folding. Unicode case folding should be strictly more complete than ASCII case folding and so that could be a sufficient implementation as long as the documentation is accurate that non-ASCII letters and letters with diacritics are never valid.
This "diff suppress" functionality takes precedence over ForceNew, so if the given function returns true then the SDK won't report to Terraform that the argument has changed or that the change requires replacing the object.
I can't promise that this is the whole story but I hope this will be a useful starting point if someone wanted to investigate this further!
What is most intriguing about all this in my case is that I am using the same variable string value for virtualNetworkRules while deploying a Key Vault and a Storage Account (using azapi provider); they both use the exact same block to define the networkAcls:
networkAcls = {
  bypass        = "Logging, Metrics, AzureServices"
  defaultAction = "Deny"
  ipRules       = var.ip_rules
  virtualNetworkRules = [
    {
      id = var.snet_id
    }
  ]
}
Yet, it is only the Key Vault resource creation that exhibits this behavior repeatedly; even when I hard-code the string value, or reference it through a data mode.
So then I checked the JSON view for the KV within Azure portal; it turns out that it is ARM that is causing this anomaly with camel-casing of RID for some unknown reason:
"virtualNetworkRules": [
  {
    "id": "/subscriptions/<>/resourcegroups/<>/providers/microsoft.network/virtualnetworks/<>/subnets/<>",
  }
]
I have tried different API versions without any change in behavior. The issue points to Azure side of the equation!
I've come across a couple scenarios that seem related and quite interesting.
We have repositories of in-house modules and various repositories which utilise them. We have a private_endpoint module which chiefly contains:
data "azurerm_subnet" "subnet" {
  name                 = var.subnet_name
  virtual_network_name = var.virtual_network_name
  resource_group_name  = var.virtual_network_resource_group_name
}
resource "azurerm_private_endpoint" "pe" {
  name                = join("-", [var.environment, var.application, var.label, "pe", var.location])
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = data.azurerm_subnet.subnet.id
  // SNIP
}
And you call the module in another repository like:
module "private_endpoint_storage" {
  source                              = "url"
  label                               = "storage"
  environment                         = var.environment
  application                         = var.application
  resource_group_name                 = azurerm_resource_group.rg.name
  location                            = azurerm_resource_group.rg.location
  tags                                = local.tags
  private_dns_zone_id                 = data.azurerm_private_dns_zone.zone_storage.id
  private_connection_resource_id      = module.storage.storage_id
  subresource_names                   = ["blob"]
  subnet_name                         = module.subnet.subnet_name
  virtual_network_name                = var.virtual_network_name
  virtual_network_resource_group_name = var.vnet_resource_group_name
  depends_on = [
    module.subnet
  ]
}
Scenario 1
A nightly pipeline that tests some of our modules with the latest provider version by running a plan, apply and destroy: it builds a subnet and a storage account and sticks a private endpoint on the account. It looks like:
resource "azurerm_resource_group" "rg" {
  name     = join("-", [var.environment, var.application, "nightly", var.location])
  location = var.location
  tags     = local.tags
}
module "subnet" {
  // SNIP
}
module "storage" {
  source              = "url"
  environment         = var.environment
  application         = var.application
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  tags                = local.tags
  container_name      = ["example-container"]
}
module "private_endpoint_storage" {
  source                              = "url"
  label                               = "storage"
  environment                         = var.environment
  application                         = var.application
  resource_group_name                 = azurerm_resource_group.rg.name
  location                            = azurerm_resource_group.rg.location
  tags                                = local.tags
  private_dns_zone_id                 = data.azurerm_private_dns_zone.zone_storage.id
  private_connection_resource_id      = module.storage.storage_id
  subresource_names                   = ["blob"]
  subnet_name                         = module.subnet.subnet_name
  virtual_network_name                = var.virtual_network_name
  virtual_network_resource_group_name = var.vnet_resource_group_name
  depends_on = [
    module.subnet
  ]
}
With this code base you can run a plan+apply on it over and over and it never changes anything, expected behaviour.
Scenario 2
A repo that builds a private AKS cluster (with things like NSG, route table, managed identities, etc) and sticks a private endpoint over the AKS management plane, looks a bit like:
resource "azurerm_resource_group" "rg" {
  // SNIP
}
module "subnet_aks" {
  // SNIP
}
module "rt_aks" {
  // SNIP
}
module "nsg" {
  // SNIP
}
module "aks" {
  // SNIP
}
module "private_endpoint_k8s" {
  source                              = "url"
  label                               = "k8s"
  environment                         = var.environment
  application                         = var.application
  resource_group_name                 = azurerm_resource_group.rg.name
  location                            = azurerm_resource_group.rg.location
  tags                                = local.tags
  private_dns_zone_id                 = data.azurerm_private_dns_zone.zone_k8s.id
  private_connection_resource_id      = module.aks.aks_id
  subresource_names                   = ["management"]
  subnet_name                         = module.subnet_aks.subnet_name
  virtual_network_name                = var.virtual_network_name
  virtual_network_resource_group_name = var.vnet_resource_group_name
  depends_on = [
    module.subnet_aks,
  ]
}
In this case even though we're using the same subnet and endpoint modules and without changing any code, every plan/apply produces a PE replacement due to subnet_id:
  # module.private_endpoint_k8s.data.azurerm_subnet.subnet will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "azurerm_subnet" "subnet" {
      + address_prefix                                 = (known after apply)
      + address_prefixes                               = (known after apply)
      + enforce_private_link_endpoint_network_policies = (known after apply)
      + enforce_private_link_service_network_policies  = (known after apply)
      + id                                             = (known after apply)
      + name                                           = "my-subnet-xxx"
      + network_security_group_id                      = (known after apply)
      + private_endpoint_network_policies_enabled      = (known after apply)
      + private_link_service_network_policies_enabled  = (known after apply)
      + resource_group_name                            = "my-rg-xxx"
      + route_table_id                                 = (known after apply)
      + service_endpoints                              = (known after apply)
      + virtual_network_name                           = "my-vnet-xxx"
    }
  # module.private_endpoint_k8s.azurerm_private_endpoint.pe must be replaced
  // SNIP
      ~ subnet_id = "/subscriptions/***/resourceGroups/my-rg-xxx/providers/Microsoft.Network/virtualNetworks/my-vnet-xxx/subnets/my-subnet-xxx" # forces replacement -> (known after apply) # forces replacement
Very strange, the only real difference between the module calls across the scenarios is that we naturally target different dns zone IDs, the rest is effectively the same even though scenario 2 has a lot more going on with other modules.
I've checked the subnet ID in the portal, logs and statefiles and we always lowercase our vnet names, subnet names and RG names. The entire ID is exactly as should be expected but for some reason it thinks it's changed even though it hasn't.
I decided to run both scenarios with TF_LOG=DEBUG and scenario 1 didn't really show anything meaningful but for scenario 2 this interesting message popped up:
[DEBUG] Resource state not found for node "module.private_endpoint_k8s.data.azurerm_subnet.subnet", instance module.private_endpoint_k8s.data.azurerm_subnet.subnet
This led me to believe there might have been a weird and rare statefile/ID problem, but if I look through the statefile all seems well:
{
  "module": "module.private_endpoint_k8s",
  "mode": "data",
  "type": "azurerm_subnet",
  "name": "subnet",
  "provider": "provider[\"registry.terraform.io/hashicorp/azurerm\"]",
  "instances": [
    {
      // SNIP
      "attributes": {
        "id": "/subscriptions/xxx/resourceGroups/my-rg-xxx/providers/Microsoft.Network/virtualNetworks/my-vnet-xxx/subnets/my-subnet-xxx",
        "name": "my-subnet-xxx",
        "virtual_network_name": "my-vnet-xxx"
      },
      // SNIP
Todo
Gut instinct makes me believe this might go away if the data.azurerm_subnet is removed from our PE module and the module signature, when deploying, instead looks more like:
module "subnet" {}
module "private_endpoint" {
  subnet_id = module.subnet.id
  // instead of supplying the vnet name, rg name and subnet name
}
So no data lookup for it. I'll refactor and try it out tomorrow. Very hard to figure out why scenario 1 is happy but 2 is sad. It seems like the data.azurerm_subnet bit of the statefile is either being ignored or it isn't matching/being compared right to what the Azure API seems to return (which could be malformed on the Az API side).
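The scenario 2 plan output above reports that the data source "will be read during apply (depends on a resource or a module with changes pending)". Terraform defers a data source read like this when something it depends on has pending changes, and a module-level depends_on makes everything inside the module depend on the listed objects, so the subnet id becomes "(known after apply)". One experiment that might be worth trying (an assumption, not something confirmed in this thread) is to drop the depends_on and rely on the reference to module.subnet_aks.subnet_name for ordering:

```hcl
module "private_endpoint_k8s" {
  source                              = "url"
  label                               = "k8s"
  environment                         = var.environment
  application                         = var.application
  resource_group_name                 = azurerm_resource_group.rg.name
  location                            = azurerm_resource_group.rg.location
  tags                                = local.tags
  private_dns_zone_id                 = data.azurerm_private_dns_zone.zone_k8s.id
  private_connection_resource_id      = module.aks.aks_id
  subresource_names                   = ["management"]
  virtual_network_name                = var.virtual_network_name
  virtual_network_resource_group_name = var.vnet_resource_group_name

  # This reference already orders the module after the subnet output it uses,
  # without tying every object in the module to all of module.subnet_aks.
  subnet_name = module.subnet_aks.subnet_name

  # depends_on = [module.subnet_aks] # removed for the experiment
}
```

If the replacement stops after this change, it would suggest that something in module.subnet_aks shows pending changes on every plan in scenario 2, which would also explain why scenario 1 is unaffected.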
I can assure you that removing the data from the module and just passing the ID will work.
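For anyone following along, a sketch of the private_endpoint module from the earlier comment with the lookup removed. The subnet_id variable follows the "Todo" signature proposed above; the module.subnet.id output name is taken from that proposal and is otherwise an assumption about the subnet module.

```hcl
# Inside the private_endpoint module: accept the ID instead of looking it up.
variable "subnet_id" {
  type        = string
  description = "Full resource ID of the subnet that hosts the private endpoint."
}

resource "azurerm_private_endpoint" "pe" {
  name                = join("-", [var.environment, var.application, var.label, "pe", var.location])
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_id
  // SNIP (private_service_connection and the rest unchanged)
}

# In the calling repository (other arguments omitted):
# module "private_endpoint" {
#   source    = "url"
#   subnet_id = module.subnet.id
# }
```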
Yep, I have switched to static references wherever I can. For some of the harder stuff I have even considered splitting out the data logic and generating a JSON object so I can ingest it into the real TF job. It all seems a little laborious for something that should just work.
On Tue, 21 Nov 2023, 17:14 Lucas Fernández, @.***> wrote:
I've come across a couple scenarios that seem related and quite interesting.
We have repositories of in-house modules and various repositories which utilise them. We have a private_endpoint module which chiefly contains:
data "azurerm_subnet" "subnet" { name = var.subnet_name virtual_network_name = var.virtual_network_name resource_group_name = var.virtual_network_resource_group_name } resource "azurerm_private_endpoint" "pe" { name = join("-", [var.environment, var.application, var.label, "pe", var.location]) location = var.location resource_group_name = var.resource_group_name subnet_id = data.azurerm_subnet.subnet.id // SNIP }
And you call the module in another repository like:
module "private_endpoint_storage" { source = "url" label = "storage" environment = var.environment application = var.application resource_group_name = azurerm_resource_group.rg.name location = azurerm_resource_group.rg.location tags = local.tags private_dns_zone_id = data.azurerm_private_dns_zone.zone_storage.id private_connection_resource_id = module.storage.storage_id subresource_names = ["blob"] subnet_name = module.subnet.subnet_name virtual_network_name = var.virtual_network_name virtual_network_resource_group_name = var.vnet_resource_group_name depends_on = [ module.subnet ] }
Scenario 1
A nightly pipeline that tests some of our modules with the latest provider version by running a plan, apply and destroy: building a subnet, storage account and sticks a private endpoint on the account. It looks like:
resource "azurerm_resource_group" "rg" { name = join("-", [var.environment, var.application, "nightly", var.location]) location = var.location tags = local.tags } module "subnet" { // SNIP } module "storage" { source = "url" environment = var.environment application = var.application resource_group_name = azurerm_resource_group.rg.name location = azurerm_resource_group.rg.location tags = local.tags container_name = ["example-container"] } module "private_endpoint_storage" { source = "url" label = "storage" environment = var.environment application = var.application resource_group_name = azurerm_resource_group.rg.name location = azurerm_resource_group.rg.location tags = local.tags private_dns_zone_id = data.azurerm_private_dns_zone.zone_storage.id private_connection_resource_id = module.storage.storage_id subresource_names = ["blob"] subnet_name = module.subnet.subnet_name virtual_network_name = var.virtual_network_name virtual_network_resource_group_name = var.vnet_resource_group_name depends_on = [ module.subnet ] }
With this code base you can run a plan+apply on it over and over and it never changes anything, expected behaviour. Scenario 2
A repo that builds a private AKS cluster (with things like NSG, route table, managed identities, etc) and sticks a private endpoint over the AKS management plane, looks a bit like:
resource "azurerm_resource_group" "rg" { // SNIP }module "subnet_aks" { // SNIP }module "rt_aks" { // SNIP }module "nsg" { // SNIP }module "aks" { // SNIP }module "private_endpoint_k8s" { source = "url" label = "k8s" environment = var.environment application = var.application resource_group_name = azurerm_resource_group.rg.name location = azurerm_resource_group.rg.location tags = local.tags private_dns_zone_id = data.azurerm_private_dns_zone.zone_k8s.id private_connection_resource_id = module.aks.aks_id subresource_names = ["management"] subnet_name = module.subnet_aks.subnet_name virtual_network_name = var.virtual_network_name virtual_network_resource_group_name = var.vnet_resource_group_name depends_on = [ module.subnet_aks, ] }
In this case even though we're using the same subnet and endpoint modules and without changing any code, every plan/apply produces a PE replacement due to subnet_id:
module.private_endpoint_k8s.data.azurerm_subnet.subnet will be read during apply
(depends on a resource or a module with changes pending)
<= data "azurerm_subnet" "subnet" {
- address_prefix = (known after apply)
- address_prefixes = (known after apply)
- enforce_private_link_endpoint_network_policies = (known after apply)
- enforce_private_link_service_network_policies = (known after apply)
- id = (known after apply)
- name = "my-subnet-xxx"
- network_security_group_id = (known after apply)
- private_endpoint_network_policies_enabled = (known after apply)
- private_link_service_network_policies_enabled = (known after apply)
- resource_group_name = "my-rg-xxx"
- route_table_id = (known after apply)
- service_endpoints = (known after apply)
- virtual_network_name = "my-vnet-xxx"
}
module.private_endpoint_k8s.azurerm_private_endpoint.pe must be replaced
// SNIP
~ subnet_id = "/subscriptions/***/resourceGroups/my-rg-xxx/providers/Microsoft.Network/virtualNetworks/my-vnet-xxx/subnets/my-subnet-xxx" -> (known after apply) # forces replacement
Very strange. The only real difference between the module calls across the two scenarios is that we naturally target different DNS zone IDs; the rest is effectively the same, even though scenario 2 has a lot more going on with other modules.
I've checked the subnet ID in the portal, logs and statefiles and we always lowercase our vnet names, subnet names and RG names. The entire ID is exactly as should be expected but for some reason it thinks it's changed even though it hasn't.
I decided to run both scenarios with TF_LOG=DEBUG and scenario 1 didn't really show anything meaningful but for scenario 2 this interesting message popped up:
[DEBUG] Resource state not found for node "module.private_endpoint_k8s.data.azurerm_subnet.subnet", instance module.private_endpoint_k8s.data.azurerm_subnet.subnet
This led me to believe there might have been a weird and rare statefile/ID problem, but if I look through the statefile all seems well:
{
  "module": "module.private_endpoint_k8s",
  "mode": "data",
  "type": "azurerm_subnet",
  "name": "subnet",
  "provider": "provider[\"registry.terraform.io/hashicorp/azurerm\"]",
  "instances": [
    {
      // SNIP
      "attributes": {
        "id": "/subscriptions/xxx/resourceGroups/my-rg-xxx/providers/Microsoft.Network/virtualNetworks/my-vnet-xxx/subnets/my-subnet-xxx",
        "name": "my-subnet-xxx",
        "virtual_network_name": "my-vnet-xxx"
      },
      // SNIP
Todo
My gut instinct is that this might go away if data.azurerm_subnet is removed from our PE module, so that when deploying, the module signature looks more like:
module "subnet" {}

module "private_endpoint" {
  subnet_id = module.subnet.id // instead of supplying the vnet name, rg name and subnet name
}
In other words, don't do a data lookup for it. I'll refactor and try it out tomorrow. It is very hard to figure out why scenario 1 is happy but scenario 2 is sad. It seems like the data.azurerm_subnet part of the statefile is either being ignored, or it isn't being matched/compared correctly against what the Azure API returns (which could be malformed on the Azure API side).
I can assure you that removing the data from the module and just passing the ID will work.
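For anyone wanting to try the same refactor, here is a minimal sketch of what the inside of such a private endpoint module could look like once the data lookup is gone. The variable names are illustrative only and are not taken from the module used in this thread:

```hcl
# Hypothetical module internals: the caller passes the subnet ID in,
# so the module no longer needs a data "azurerm_subnet" lookup.
variable "name" {
  type = string
}

variable "location" {
  type = string
}

variable "resource_group_name" {
  type = string
}

variable "subnet_id" {
  type        = string
  description = "Full resource ID of the subnet that hosts the endpoint."
}

variable "private_connection_resource_id" {
  type = string
}

variable "subresource_names" {
  type = list(string)
}

resource "azurerm_private_endpoint" "pe" {
  name                = var.name
  location            = var.location
  resource_group_name = var.resource_group_name

  # subnet_id forces replacement when it changes, so it should be a
  # value that is already known at plan time.
  subnet_id = var.subnet_id

  private_service_connection {
    name                           = "${var.name}-connection"
    private_connection_resource_id = var.private_connection_resource_id
    subresource_names              = var.subresource_names
    is_manual_connection           = false
  }
}
```

The caller would then set subnet_id = module.subnet.id, assuming the subnet module exposes an output with that name.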
You must have an overwhelmingly complex scenario there... For me it has been quite simple: if I am creating the subnet in the same deployment, I already have the subnet_id from there; if not, I just add the data source to read the ID and then pass the value to the module (that is the simple explanation; the second case is a little more complex).
Not massively, but we rely heavily on private endpoints, so that adds some fun. It's actually really easy to have a load of data sources in a module and write all the data to a JSON blob for ingestion in other places as an environment config. You just build your structure with a local. This completely sidesteps the issue, because for the jobs that ingest the object it's all static data. It is really stupid to have to do it, though.
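A rough sketch of that JSON blob approach, under some assumptions: it uses the local_file resource from the hashicorp/local provider to write the file, and the file name, local names and data source label are all made up for illustration. The producer and the consumer would normally live in separate configurations:

```hcl
# --- Producer configuration: does the data lookups once ---
data "azurerm_subnet" "aks" {
  name                 = "my-subnet-xxx"
  virtual_network_name = "my-vnet-xxx"
  resource_group_name  = "my-rg-xxx"
}

locals {
  # Build the structure that other jobs will ingest.
  env_config = {
    subnet_ids = {
      aks = data.azurerm_subnet.aks.id
    }
  }
}

resource "local_file" "env_config" {
  filename = "${path.module}/env-config.json"
  content  = jsonencode(local.env_config)
}

# --- Consumer configuration (a separate root module) ---
locals {
  # The file is read at plan time, so every value in it is static
  # from Terraform's point of view.
  env = jsondecode(file("${path.module}/env-config.json"))
}

# A private endpoint in the consumer would then use:
#   subnet_id = local.env.subnet_ids.aks
```

How the file gets from the producer to the consumer (pipeline artifact, repository commit, and so on) is left open here.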
I am experiencing the same behaviour in my Production environment for both an Azure Web App and PostgreSQL Server.
Private endpoints are getting replaced for both resources with each deploy, not optimal at all.
The best part is that in my Dev environment this is not the case! I use parametrised terraform scripts (basically the only difference in the scripts are the -dev and -prod suffixes) and I also use the subnet_id with the direct reference to the already existing subnet.
Given I have exactly the same terraform scripts for both Dev and Prod subscriptions, I suspect it could be an issue with the terraform backend.
I now have a config module with all the subnet IDs in it statically to avoid this issue. You can of course construct the ID from the subscription, resource group, VNet and subnet names. That is, until they change the format.
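Constructing the ID could look roughly like the sketch below. The variable names are assumptions for illustration; the path segments follow the subnet ID format shown in the plan output earlier in this thread, and the casing of every segment has to match what is recorded in state:

```hcl
variable "subscription_id" { type = string }
variable "vnet_resource_group_name" { type = string }
variable "virtual_network_name" { type = string }
variable "subnet_name" { type = string }

locals {
  # Built purely from input variables, so the value is known at plan
  # time and never shows up as "(known after apply)".
  subnet_id = format(
    "/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/%s/subnets/%s",
    var.subscription_id,
    var.vnet_resource_group_name,
    var.virtual_network_name,
    var.subnet_name
  )
}

output "subnet_id" {
  value = local.subnet_id
}
```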
It seems that some devs were able to avoid this situation by using alternative ways of assigning the subnet_id.
Not sure if I misunderstood the advice, but basically I tried removing the terraform data reference to the subnet (which already exists in the Azure subscription). Instead, I tried a fixed string for the private endpoint's subnet_id.
Just to have a clearer idea with code snippets:
I removed this code block:
data "azurerm_subnet" "my-subnet" {
name = "my-subnet"
virtual_network_name = data.azurerm_virtual_network.my-default-vnet.name
resource_group_name = data.azurerm_virtual_network.my-default-vnet.resource_group_name
}
which was referenced like this:
resource "azurerm_private_endpoint" "my-private-endpoint" {
name = "my-private-endpoint"
subnet_id = data.azurerm_subnet.my-subnet.id
...
}
And instead went with:
resource "azurerm_private_endpoint" "my-private-endpoint" {
name = "my-private-endpoint"
subnet_id = "/subscriptions/<MY-SUBSCRIPTION-ID>/resourceGroups/<MY-VNET-RG>/providers/Microsoft.Network/virtualNetworks/my-default-vnet/subnets/my-subnet"
...
}
Unfortunately it did not fix the issue for me (using Terraform v3.74.0).
The code I have is really simple as well, no modules defined because I'm going with a more granular approach by initialising different terraform providers (with separate backends) depending on the infrastructure component.
Basically, for each "macro component" I want to deploy, I define different data.tf, variables.tf, main.tf and provider.tf files.
Are there any other options/suggestions worth trying?
It did for us. You do have to be 100% spot on with the casing, though, as Azure and the provider are quite fussy and not consistent.
Double check everything in your state with
terraform state show
eg
terraform state show module.storage.azurerm_private_endpoint.main
@kraduk after your suggestion I went back and checked the tfstate files for all my resources in each environment. Eventually I found out that while our Dev and Prod subnet_id definitions were semantically the same, there was indeed a difference in casing for 2 characters (it would be interesting to know why this difference appeared in the first place!).
Anyway, I changed back to the implementation I mentioned in my comment above, this time also using the correct casing for the Prod environment, and it fixed the issue. Thanks a lot for your suggestion!
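A casing-only mismatch like this one can also be flagged automatically rather than by reading state by hand. The following is a sketch only, assuming Terraform 1.5 or later (for check blocks); the variable expected_subnet_id and the data source label are hypothetical, and the expected value would be copied from terraform state show:

```hcl
variable "expected_subnet_id" {
  type        = string
  description = "Subnet ID exactly as recorded in state for the private endpoint."
}

data "azurerm_subnet" "existing" {
  name                 = "my-subnet"
  virtual_network_name = "my-default-vnet"
  resource_group_name  = "my-vnet-rg" # hypothetical resource group name
}

check "subnet_id_casing" {
  assert {
    # Fails only when the two IDs are equal ignoring case but differ
    # in their exact spelling.
    condition = !(
      lower(data.azurerm_subnet.existing.id) == lower(var.expected_subnet_id) &&
      data.azurerm_subnet.existing.id != var.expected_subnet_id
    )
    error_message = "Subnet ID differs from the expected value only by letter casing."
  }
}
```

A failing check is reported as a warning and does not block the plan, so this only makes the mismatch visible; it does not fix it.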
I have a suggestion and that is to remove any dependencies in depends_on. Instead rely on whatever variable the dependency is outputting.
This solved it for me when my azurerm_private_endpoint got recreated for apparently no reason. In my case I had a for_each loop in conjunction with a depends_on, so it might not be the exact behaviour as you have, but worth a try.
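Applied to the scenario 2 module call from earlier in the thread, the suggestion might look like the sketch below. The likely reason it helps: with a module-level depends_on, Terraform defers reading the data sources inside the module until apply whenever the dependency has pending changes, which matches the "(depends on a resource or a module with changes pending)" note in the plan output above. The subnet id is then "known after apply", and since subnet_id forces replacement, the endpoint gets rebuilt. This is an interpretation of the plan output, not something confirmed by the provider maintainers:

```hcl
module "private_endpoint_k8s" {
  source                         = "url"
  label                          = "k8s"
  environment                    = var.environment
  application                    = var.application
  resource_group_name            = azurerm_resource_group.rg.name
  location                       = azurerm_resource_group.rg.location
  tags                           = local.tags
  private_dns_zone_id            = data.azurerm_private_dns_zone.zone_k8s.id
  private_connection_resource_id = module.aks.aks_id
  subresource_names              = ["management"]

  # Referencing the output already creates an implicit dependency on
  # module.subnet_aks, so ordering is preserved without depends_on.
  subnet_name                         = module.subnet_aks.subnet_name
  virtual_network_name                = var.virtual_network_name
  virtual_network_resource_group_name = var.vnet_resource_group_name

  # depends_on = [module.subnet_aks]  # removed
}
```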
It's not that at all, I'm afraid, as I have no dependencies other than implicit ones. It's the data sources that do it, as they are non-deterministic at plan time for TF.
I have a suggestion and that is to remove any dependencies in depends_on. Instead rely on whatever variable the dependency is outputting. This solved it for me when my azurerm_private_endpoint got recreated for apparently no reason. In my case I had a for_each loop in conjunction with a depends_on, so it might not be the exact behaviour as you have, but worth a try.
Thanks for this - it worked for me too
Are we saying here that data "azurerm_subnet" is not usable in its current form unless we are happy that it periodically forces replacement?
A typical use case is when the subnet is not created in the same Terraform deployment.
Is the only alternative to set a variable with the resource ID?
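If the variable route is taken, a validation rule can at least catch malformed IDs early. A sketch with a hypothetical variable name and a deliberately loose pattern; it checks the fixed path segments, including their casing, but cannot verify the casing of the resource names themselves:

```hcl
variable "subnet_id" {
  type        = string
  description = "Resource ID of an existing subnet managed outside this deployment."

  validation {
    # can() turns a regex mismatch (which raises an error) into false.
    condition = can(regex(
      "^/subscriptions/[^/]+/resourceGroups/[^/]+/providers/Microsoft\\.Network/virtualNetworks/[^/]+/subnets/[^/]+$",
      var.subnet_id
    ))
    error_message = "subnet_id must be a full subnet resource ID."
  }
}
```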
Is there an existing issue for this?
Community Note
Terraform Version
1.3.4
AzureRM Provider Version
3.19.0
Affected Resource(s)/Data Source(s)
azurerm_private_endpoint
Terraform Configuration Files
Debug Output/Panic Output
Expected Behaviour
This resource is already created; it should not be replaced on reapply.
Actual Behaviour
It is replacing the resource.
Steps to Reproduce
terraform plan
terraform apply
terraform plan
terraform apply
Important Factoids
No response
References
No response