hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

azurerm_private_endpoint is getting recreated #19200

Open nimblenitin opened 1 year ago

nimblenitin commented 1 year ago

Terraform Version

1.3.4

AzureRM Provider Version

3.19.0

Affected Resource(s)/Data Source(s)

azurerm_private_endpoint

Terraform Configuration Files

resource "azurerm_private_endpoint" "pe_blob" {
  #count               = var.deploy_private_endpoint == true ? 1 : 0 # IF var.deploy_private_endpoint IS equal to TRUE, then deploy Private Endpoint.
  name                = "${var.storage_account_name}-pe"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  subnet_id           = data.azurerm_subnet.readysn.id

  private_service_connection {
    name                           = "${var.storage_account_name}-pe-connection"
    is_manual_connection           = false
    private_connection_resource_id = azurerm_storage_account.storage_account.id
    subresource_names              = ["blob"]
  }
}

Debug Output/Panic Output

# azurerm_private_endpoint.pe_blob must be replaced
-/+ resource "azurerm_private_endpoint" "pe_blob" {
      ~ custom_dns_configs       = [] -> (known after apply)
      ~ id                       = "/subscriptions/xxxxxxxxxx/resourceGroups/xxx-prod-rg/providers/Microsoft.Network/privateEndpoints/xxxx-pe" -> (known after apply)
        name                     = "xxx-pe"
      ~ network_interface        = [
          - {
              - id   = "/subscriptions/xxx/resourceGroups/xxx-prod-rg/providers/Microsoft.Network/networkInterfaces/xxx-pe.nic.xxx"
              - name = "xxx-pe.nic.xxx-xxx"
            },
        ] -> (known after apply)
      ~ private_dns_zone_configs = [
          - {
              - id                  = "/subscriptions/xxx/resourceGroups/xxx-xxx-prod-rg/providers/Microsoft.Network/privateEndpoints/fhiostoreprod-pe/privateDnsZoneGroups/deployedByPolicy/privateDnsZoneConfigs/storageBlob-privateDnsZone"
              - name                = "storageBlob-privateDnsZone"
              - private_dns_zone_id = "/subscriptions/xxx/resourcegroups/private-dns-zones-rg/providers/microsoft.network/privatednszones/privatelink.blob.core.windows.net"
              - record_sets         = [
                  - {
                      - fqdn         = "xxx.privatelink.blob.core.windows.net"
                      - ip_addresses = [
                          - "xxx",
                        ]
                      - name         = "xxx"
                      - ttl          = 10
                      - type         = "A"
                    },
                ]
            },
        ] -> (known after apply)
      ~ subnet_id                = "/subscriptions/xxx/resourceGroups/xxx-prod-rg/providers/Microsoft.Network/virtualNetworks/ja-xxx/subnets/xxxt" -> "/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Network/virtualNetworks/xxx/subnets/xxx" # forces replacement
      - tags                     = {
          - "Jio-Azure-JPL-Application-Name" = "xxx"
          - "Jio-Azure-JPL-Business-Impact"  = "xxx"
          - "Jio-Azure-JPL-Business-Unit"    = "xxx"
          - "Jio-Azure-JPL-Cost-Center"      = "xxx"
          - "Jio-Azure-JPL-Environment"      = "xxx"
          - "Jio-Azure-JPL-Functional-Owner" = "xxx"
          - "Jio-Azure-JPL-Technical-Owner"  = "xxx"
        } -> null
        # (2 unchanged attributes hidden)

      - private_dns_zone_group {
          - id                   = "/subscriptions/xxx/resourceGroups/xxx-rg/providers/Microsoft.Network/privateEndpoints/fhiostoreprod-pe/privateDnsZoneGroups/deployedByPolicy" -> null
          - name                 = "deployedByPolicy" -> null
          - private_dns_zone_ids = [
              - "/subscriptions/xxx/resourcegroups/private-dns-zones-rg/providers/microsoft.network/privatednszones/privatelink.blob.core.windows.net",
            ] -> null
        }

      ~ private_service_connection {
            name                           = "xxx-pe-connection"
          ~ private_ip_address             = "xxx" -> (known after apply)
            # (3 unchanged attributes hidden)
        }
    }

Expected Behaviour

This resource has already been created; it should not be replaced on re-apply.

Actual Behaviour

It is replacing the resource.

Steps to Reproduce

terraform plan
terraform apply
terraform plan
terraform apply

CorrenSoft commented 1 year ago

The private endpoint is getting recreated because the stored subnet_id differs from the one received: part of the VNet name is coming back in upper case.

I did some testing and I was unable to reproduce the issue, even playing with lower and upper case names. Can you provide me with more details about the specific names?

It is annoying because it is a case-sensitivity issue, but I am not sure it can be handled directly in the private endpoint resource: that property has the ForceNew flag, and I don't see a way to make it ignore the difference.

CorrenSoft commented 1 year ago

Still no luck :(

Please go to the Azure portal and check two things:

1- Browse to the virtual network resource, and check if the name is ja-jm-JPL-OSSBSS-DevOps-Prod-vnet or ja-jm-jpl-ossbss-devops-prod-vnet.
2- Browse to the private endpoint, and in the overview check if the Virtual network/subnet label says ja-jm-JPL-OSSBSS-DevOps-Prod-vnet/ja-jm-jpl-ossbss-devops-prod-non-dmz-snet or ja-jm-jpl-ossbss-devops-prod-vnet/ja-jm-jpl-ossbss-devops-prod-non-dmz-snet.

Finally, are you using the -refresh=false flag?

nimblenitin commented 1 year ago

So it is ja-jm-jpl-ossbss-devops-prod-vnet/ja-jm-jpl-ossbss-devops-prod-non-dmz-snet, i.e. the latter one for both points. I did not find the refresh flag anywhere. It is trying to replace these with the ones in capital letters, which is inaccurate. Not sure why.

CorrenSoft commented 1 year ago

I cannot find any reason why the subnet data resource would return that inaccurate value, so I guess this is as far as I can go with this issue. One last long shot you could try is to force the data resource to be "recreated", either by removing it from the state or by changing its path.

kraduk commented 1 year ago

I am seeing this as well. I have output the data field to a file and done various case-sensitive string comparisons against the value stored in the tfstate file, and I can find no differences, yet it wants to rebuild every time. However, if I replace the data reference with a static string, i.e. the output in the file, it doesn't want to rebuild. This seems to be a deterministic vs non-deterministic issue.

resource "local_file" "test" {
    content  = data.azurerm_subnet.pe_snet.id
    filename = "/tmp/data"
}

resource "azurerm_private_endpoint" "blob" {
  count               = var.pe_blob ? 1 : 0
  provider            = azurerm.pe
  name                = "${var.storage_main.name}-blobendpoint"
  location            = "North Europe"
  resource_group_name = local.pe_endpoints.rg
  subnet_id           = "/subscriptions/XXXXX/resourceGroups/YYYY/providers/Microsoft.Network/virtualNetworks/VVVVVV/subnets/SSSSS"
  #data.azurerm_subnet.pe_snet.id
  tags                = local.tags
  private_service_connection {
...

data "azurerm_subnet" "pe_snet" {
  provider             = azurerm.pe
  virtual_network_name = data.azurerm_virtual_network.pe_vnet.name
  name                 = local.pe_endpoints.snet
  resource_group_name  = local.pe_endpoints.rg
}

Could it be that it's flagging the wrong thing as forcing the change?

One thing to note is that I am doing this in a module, not at the top level.

$ terraform version
Terraform v1.3.1
on linux_amd64
+ provider registry.terraform.io/hashicorp/azurerm v3.37.0
+ provider registry.terraform.io/hashicorp/local v2.2.3

kraduk commented 1 year ago

Still happens on this version

$ terraform version
Terraform v1.3.6
on linux_amd64
+ provider registry.terraform.io/hashicorp/azurerm v3.37.0
+ provider registry.terraform.io/hashicorp/local v2.2.3

kraduk commented 1 year ago

This is something deeper, either in Terraform or in the azurerm provider, as I have another instance of it. Again, this is in a module. Basically, my code calls a module and, as part of that, passes the resource group to the module, e.g.:

module "storage" {
  depends_on = [
    azurerm_resource_group.main
  ]
  providers = {
    azurerm      = azurerm
    azurerm.corp = azurerm.pe
  }
  acl_default_action = "Allow"
  allowed_ips        = []
  default_tags       = var.default_tags
  prod               = true
  sa_rg              = azurerm_resource_group.main.name
  source             = "../shared/modules/Storage-PE"
  storage_main = {
    "name" = "ProdAcmeBot",
    "tier" = "Standard",
    "type" = "ZRS"
  }
}

In the module I then use a data source on the resource group to work out the location in which to put the storage account, e.g.:

data "azurerm_resource_group" "main" {
  name = var.sa_rg
}

resource "azurerm_storage_account" "main" {
  name                              = replace(lower(var.storage_main.name), "/[^a-z0-9]/", "")
  resource_group_name               = data.azurerm_resource_group.main.name
  location                          = data.azurerm_resource_group.main.location
  account_tier                      = try(var.storage_main.tier, "Standard")
  account_replication_type          = try(var.storage_main.type, "ZRS")
  account_kind                      = "StorageV2"
  enable_https_traffic_only         = try(var.storage_main.enable_https_traffic_only, true)
  min_tls_version                   = "TLS1_2"
  shared_access_key_enabled         = var.shared_access_key_enabled
  is_hns_enabled                    = var.is_hns_enabled
  infrastructure_encryption_enabled = var.infrastructure_encryption_enabled
  access_tier                       = try(var.storage_main.access_tier, var.access_tier)
  #nfsv3_enabled                     = "true"

  blob_properties {
    delete_retention_policy {
      days = var.delete_retention_policy
    }
    versioning_enabled = var.versioning_enabled
    container_delete_retention_policy {
      days = var.container_delete_retention_policy
    }
  }
  network_rules {
    default_action             = var.acl_default_action
    ip_rules                   = var.boot_diags_sa ? setunion(var.Boot_diags_ips, var.allowed_ips) : var.allowed_ips
    virtual_network_subnet_ids = []
    bypass                     = ["Logging", "Metrics", "AzureServices"]
  }
  tags = local.tags
}

However, the plan wants to rebuild it every time. Again, statically defining the value stops the rebuild. Using anything like a local or a data source, i.e. a value worked out at runtime, forces a rebuild. Variables are fine, as they are filled in up front rather than worked out at runtime.

  # module.storage.azurerm_storage_account.main must be replaced
-/+ resource "azurerm_storage_account" "main" {
      ~ id                                = "RADACT" -> (known after apply)
      + large_file_share_enabled          = (known after apply)
      ~ location                          = "northeurope" -> (known after apply) # forces replacement
        name                              = "RADACT"

CorrenSoft commented 1 year ago

Now that you mention this, I've experienced a similar situation using a data resource inside of a module (but with key vault). Moving the data out was my solution, and it is something that I always recommend.
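A minimal sketch of that arrangement, reusing the data source and module path posted earlier in this thread. The `subnet_id` module input is hypothetical (the module would need to declare it), and every literal name is a placeholder rather than a value from this issue.

```hcl
# Root module: look the subnet up once, outside the module that consumes it.
# All literal names below are placeholders.
data "azurerm_subnet" "pe_snet" {
  name                 = "example-snet"
  virtual_network_name = "example-vnet"
  resource_group_name  = "example-network-rg"
}

module "storage" {
  source = "../shared/modules/Storage-PE"

  # Hypothetical input: the module would declare `variable "subnet_id"`
  # and use it directly instead of running its own data lookup.
  subnet_id = data.azurerm_subnet.pe_snet.id

  # Remaining module arguments omitted for brevity.
}
```

The trade-off is the one raised in the replies: the caller has to know the subnet, so the module is less self-contained.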

kraduk commented 1 year ago

That isn't a solution, as what you are saying is that you can't programmatically determine a value inside a module; you have to pass the data in, making the module no longer self-contained.

mark88d commented 1 year ago

We have started to notice this issue on our Azure Terraform builds, and it has become extremely annoying. Any progress here? @catriona-m

kraduk commented 1 year ago

It is also a change in behavior, and going back and modifying the estate of code to implement workarounds is non-trivial.

Drejmester commented 1 year ago

I found a workaround:

lifecycle {
  ignore_changes = [
    subnet_id,
    tags, # I always add tags
  ]
}

kraduk commented 1 year ago

That's not a workaround; it's running away and not dealing with it. By lifecycling it you will never pick up any valid changes in the infra or code.

CorrenSoft commented 1 year ago

That isn't a solution, as what you are saying is that you can't programmatically determine a value inside a module; you have to pass the data in, making the module no longer self-contained.

I am aware that it is not the solution; I was only saying that I structure my code in a different way, which lets me avoid this (and other) issues with a data source inside the same module that consumes it. That said, I agree that this situation must be fixed so that you can write your code as you want.

CGAndrej commented 1 year ago

ignore_changes only ignores changes made on the Azure side, like the capitalization of a data resource or variable. It will make adjustments if you change your code https://developer.hashicorp.com/terraform/language/meta-arguments/lifecycle#ignore_changes

nickelTyle commented 1 year ago

That's not a workaround; it's running away and not dealing with it. By lifecycling it you will never pick up any valid changes in the infra or code.

You could try passing the IDs through Terraform's upper/lower functions. Assuming Azure IDs are case-insensitive, that should still allow resources to be created, and it would also trick Terraform into not changing things when Azure or the Azure provider returns an ID with odd casing.

nickelTyle commented 1 year ago

ignore_changes only ignores changes made on the Azure side, like the capitalization of a data resource or variable. It will make adjustments if you change your code https://developer.hashicorp.com/terraform/language/meta-arguments/lifecycle#ignore_changes

This is not true.

kraduk commented 1 year ago

Still no movement, I see.

mark88d commented 1 year ago

Any updates from Microsoft on this issue?

CGAndrej commented 1 year ago

It may be that Azure is case-insensitive (https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/resource-name-rules) and Terraform is case-sensitive. Can you try using lower() when passing the parameter?

This issue has popped up elsewhere too: https://www.reddit.com/r/sysadmin/comments/xiknp2/terraform_azure_resource_not_case_sensitive/

I would open up a bug with Hashicorp

kraduk commented 1 year ago

I have seen that. It's actually worse, as you can input a value in one case and it comes out in a different case when it's returned. However, that's not the crux of the issue here: the problem occurs when the subnet is referenced via a variable or data source inside a module. If you take the exact same string and define it as a static string in code, it doesn't rebuild, so that isn't a case thing.

apparentlymart commented 1 year ago

Hi all,

I work on Terraform Core rather than this provider, so I can't comment on the details of how the Azure API behaves here, but I found my way here after someone asked me how a provider can potentially deal with a situation like this, and so I figured I might as well leave my answer here in case it's useful to someone else too.

The azurerm_private_endpoint resource type is implemented using this provider's own wrapper around the Terraform plugin SDK, but it imports (using a type alias) the schema.Schema type from the central SDK, so I think what I'm about to describe should be workable in this provider, but I cannot be 100% sure.

The two SDK features relevant to problems like this are DiffSuppressFunc and DiffSuppressOnRefresh.

The subnet_id argument of azurerm_private_endpoint does not currently set either of those:

https://github.com/hashicorp/terraform-provider-azurerm/blob/d69a2a85c6b8dcdfcd985a709a7f9b55a2d0c396/internal/services/network/private_endpoint_resource.go#L66-L71

If there is no definition of DiffSuppressFunc then the SDK uses a default rule that requires the two strings to be exactly equal, including case-sensitivity. That default rule doesn't seem to be sufficient for this situation, so a possible way to fix it would be to add a DiffSuppressFunc field to that schema definition which returns true if the old and new values are equivalent, and then also set DiffSuppressOnRefresh: true to make sure that same rule gets applied when the provider updates its records based on what's currently stored in the API.

Correctly implementing DiffSuppressFunc will require first determining exactly what rules the remote API uses for case folding. The documentation linked earlier says that the API defines "alphanumeric" as including only the ASCII letters and digits, so perhaps an ASCII-only definition of case is sufficient if it's guaranteed that letters from other alphabets or letters with diacritics can never appear in these strings.

Looking elsewhere in the provider codebase I see that there's an existing function suppress.CaseDifference which implements Unicode case folding. Unicode case folding should be strictly more complete than ASCII case folding and so that could be a sufficient implementation as long as the documentation is accurate that non-ASCII letters and letters with diacritics are never valid.

This "diff suppress" functionality takes precedence over ForceNew, so if the given function returns true then the SDK won't report to Terraform that the argument has changed or that the change requires replacing the object.

I can't promise that this is the whole story but I hope this will be a useful starting point if someone wanted to investigate this further!

kbk574 commented 1 year ago

What is most intriguing about all this in my case is that I am using the same variable string value for virtualNetworkRules while deploying a Key Vault and a Storage Account (using azapi provider); they both use the exact same block to define the networkAcls

  networkAcls = {
    bypass        = "Logging, Metrics, AzureServices"
    defaultAction = "Deny"
    ipRules       = var.ip_rules
    virtualNetworkRules = [
      {
        id = var.snet_id
      }
    ]
  }

Yet, it is only the Key Vault resource creation that exhibits this behavior repeatedly; even when I hard-code the string value, or reference it through a data mode.

So then I checked the JSON view for the Key Vault in the Azure portal; it turns out that it is ARM that causes this anomaly, altering the camel-casing of the resource ID for some unknown reason:

        "virtualNetworkRules": [
            {
                "id": "/subscriptions/<>/resourcegroups/<>/providers/microsoft.network/virtualnetworks/<>/subnets/<>"
            }
        ]

I have tried different API versions without any change in behavior. The issue points to Azure side of the equation!

BlondeBurrito commented 11 months ago

I've come across a couple scenarios that seem related and quite interesting.

We have repositories of in-house modules and various repositories which utilise them. We have a private_endpoint module which chiefly contains:

data "azurerm_subnet" "subnet" {
  name                 = var.subnet_name
  virtual_network_name = var.virtual_network_name
  resource_group_name  = var.virtual_network_resource_group_name
}

resource "azurerm_private_endpoint" "pe" {
  name                = join("-", [var.environment, var.application, var.label, "pe", var.location])
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = data.azurerm_subnet.subnet.id
  // SNIP
}

And you call the module in another repository like:

module "private_endpoint_storage" {
  source                              = "url"
  label                               = "storage"
  environment                         = var.environment
  application                         = var.application
  resource_group_name                 = azurerm_resource_group.rg.name
  location                            = azurerm_resource_group.rg.location
  tags                                = local.tags
  private_dns_zone_id                 = data.azurerm_private_dns_zone.zone_storage.id
  private_connection_resource_id      = module.storage.storage_id
  subresource_names                   = ["blob"]
  subnet_name                         = module.subnet.subnet_name
  virtual_network_name                = var.virtual_network_name
  virtual_network_resource_group_name = var.vnet_resource_group_name
  depends_on = [
    module.subnet
  ]
}

Scenario 1

A nightly pipeline that tests some of our modules with the latest provider version by running a plan, apply and destroy: it builds a subnet and a storage account and sticks a private endpoint on the account. It looks like:

resource "azurerm_resource_group" "rg" {
  name     = join("-", [var.environment, var.application, "nightly", var.location])
  location = var.location
  tags     = local.tags
}

module "subnet" {
  // SNIP
}

module "storage" {
  source              = "url"
  environment         = var.environment
  application         = var.application
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  tags                = local.tags
  container_name      = ["example-container"]
}

module "private_endpoint_storage" {
  source                              = "url"
  label                               = "storage"
  environment                         = var.environment
  application                         = var.application
  resource_group_name                 = azurerm_resource_group.rg.name
  location                            = azurerm_resource_group.rg.location
  tags                                = local.tags
  private_dns_zone_id                 = data.azurerm_private_dns_zone.zone_storage.id
  private_connection_resource_id      = module.storage.storage_id
  subresource_names                   = ["blob"]
  subnet_name                         = module.subnet.subnet_name
  virtual_network_name                = var.virtual_network_name
  virtual_network_resource_group_name = var.vnet_resource_group_name
  depends_on = [
    module.subnet
  ]
}

With this code base you can run a plan+apply on it over and over and it never changes anything, expected behaviour.

Scenario 2

A repo that builds a private AKS cluster (with things like NSG, route table, managed identities, etc) and sticks a private endpoint over the AKS management plane, looks a bit like:

resource "azurerm_resource_group" "rg" {
  // SNIP
}
module "subnet_aks" {
  // SNIP
}
module "rt_aks" {
  // SNIP
}
module "nsg" {
  // SNIP
}
module "aks" {
  // SNIP
}
module "private_endpoint_k8s" {
  source                              = "url"
  label                               = "k8s"
  environment                         = var.environment
  application                         = var.application
  resource_group_name                 = azurerm_resource_group.rg.name
  location                            = azurerm_resource_group.rg.location
  tags                                = local.tags
  private_dns_zone_id                 = data.azurerm_private_dns_zone.zone_k8s.id
  private_connection_resource_id      = module.aks.aks_id
  subresource_names                   = ["management"]
  subnet_name                         = module.subnet_aks.subnet_name
  virtual_network_name                = var.virtual_network_name
  virtual_network_resource_group_name = var.vnet_resource_group_name
  depends_on = [
    module.subnet_aks,
  ]
}

In this case even though we're using the same subnet and endpoint modules and without changing any code, every plan/apply produces a PE replacement due to subnet_id:

# module.private_endpoint_k8s.data.azurerm_subnet.subnet will be read during apply
# (depends on a resource or a module with changes pending)
<= data "azurerm_subnet" "subnet" {
+ address_prefix                                 = (known after apply)
+ address_prefixes                               = (known after apply)
+ enforce_private_link_endpoint_network_policies = (known after apply)
+ enforce_private_link_service_network_policies  = (known after apply)
+ id                                             = (known after apply)
+ name                                           = "my-subnet-xxx"
+ network_security_group_id                      = (known after apply)
+ private_endpoint_network_policies_enabled      = (known after apply)
+ private_link_service_network_policies_enabled  = (known after apply)
+ resource_group_name                            = "my-rg-xxx"
+ route_table_id                                 = (known after apply)
+ service_endpoints                              = (known after apply)
+ virtual_network_name                           = "my-vnet-xxx"
 }
# module.private_endpoint_k8s.azurerm_private_endpoint.pe must be replaced
// SNIP
~ subnet_id                = "/subscriptions/***/resourceGroups/my-rg-xxx/providers/Microsoft.Network/virtualNetworks/my-vnet-xxx/subnets/my-subnet-xxx" # forces replacement -> (known after apply) # forces replacement

Very strange, the only real difference between the module calls across the scenarios is that we naturally target different dns zone IDs, the rest is effectively the same even though scenario 2 has a lot more going on with other modules.
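The plan output above says the subnet data source is deferred to apply because it "depends on a resource or a module with changes pending", which is how Terraform treats data sources covered by an explicit depends_on. One experiment that may be worth trying, as an assumption that nobody in this thread has confirmed as a fix, is to drop the module-level depends_on and rely on the reference that already exists through subnet_name. A sketch, with the scenario 2 call otherwise unchanged:

```hcl
module "private_endpoint_k8s" {
  source                              = "url"
  label                               = "k8s"
  environment                         = var.environment
  application                         = var.application
  resource_group_name                 = azurerm_resource_group.rg.name
  location                            = azurerm_resource_group.rg.location
  tags                                = local.tags
  private_dns_zone_id                 = data.azurerm_private_dns_zone.zone_k8s.id
  private_connection_resource_id      = module.aks.aks_id
  subresource_names                   = ["management"]
  # This reference already orders the lookup after the subnet module's
  # output, without making the whole module wait on module.subnet_aks.
  subnet_name                         = module.subnet_aks.subnet_name
  virtual_network_name                = var.virtual_network_name
  virtual_network_resource_group_name = var.vnet_resource_group_name

  # Experiment: the explicit `depends_on = [module.subnet_aks]` is left out.
}
```

If the replacement stops appearing in the plan, the deferral of the data source was the trigger rather than any change in the ID itself; if it persists, this explanation does not apply.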

I've checked the subnet ID in the portal, logs and statefiles and we always lowercase our vnet names, subnet names and RG names. The entire ID is exactly as should be expected but for some reason it thinks it's changed even though it hasn't.

I decided to run both scenarios with TF_LOG=DEBUG and scenario 1 didn't really show anything meaningful but for scenario 2 this interesting message popped up:

[DEBUG] Resource state not found for node "module.private_endpoint_k8s.data.azurerm_subnet.subnet", instance module.private_endpoint_k8s.data.azurerm_subnet.subnet

This led me to believe there might have been a weird and rare statefile/ID problem, but if I look through the statefile all seems well:

{
    "module": "module.private_endpoint_k8s",
    "mode": "data",
    "type": "azurerm_subnet",
    "name": "subnet",
    "provider": "provider[\"registry.terraform.io/hashicorp/azurerm\"]",
    "instances": [
      {
        // SNIP
        "attributes": {
          "id": "/subscriptions/xxx/resourceGroups/my-rg-xxx/providers/Microsoft.Network/virtualNetworks/my-vnet-xxx/subnets/my-subnet-xxx",
          "name": "my-subnet-xxx",
          "virtual_network_name": "my-vnet-xxx"
        },
      // SNIP

Todo

Gut instinct makes me believe that this might go away if the data.azurerm_subnet is removed from our PE module and the module signature instead looks more like:

module "subnet" {}
module "private_endpoint" {
  subnet_id = module.subnet.id
// instead of supplying the vnet name, rg name and subnet name
}

So if we don't do a data lookup for it, this might go away; I'll refactor and try it out tomorrow. It's very hard to figure out why scenario 1 is happy but scenario 2 is sad. It seems like the data.azurerm_subnet part of the statefile is either being ignored or isn't being compared correctly against what the Azure API returns (which could be malformed on the Azure API side).
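On the module side, the refactor described here could be sketched as follows. This is an untested illustration: the `subnet_id` variable and the connection name are hypothetical, and the other variables are assumed to be the ones the module already declares.

```hcl
# Inside the private_endpoint module (hypothetical refactor).
# The subnet ID is passed in by the caller instead of being looked up here.
variable "subnet_id" {
  type        = string
  description = "Full resource ID of the subnet that hosts the private endpoint."
}

resource "azurerm_private_endpoint" "pe" {
  name                = join("-", [var.environment, var.application, var.label, "pe", var.location])
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_id # replaces data.azurerm_subnet.subnet.id

  private_service_connection {
    name                           = join("-", [var.label, "connection"]) # hypothetical name
    is_manual_connection           = false
    private_connection_resource_id = var.private_connection_resource_id
    subresource_names              = var.subresource_names
  }
}
```

With this shape the module no longer needs subnet_name, virtual_network_name or virtual_network_resource_group_name, at the cost of the caller having to supply the ID.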

CorrenSoft commented 11 months ago

I've come across a couple scenarios that seem related and quite interesting.

We have repositories of in-house modules and various repositories which utilise them. We have a private_endpoint module which chiefly contains:

data "azurerm_subnet" "subnet" {
  name                 = var.subnet_name
  virtual_network_name = var.virtual_network_name
  resource_group_name  = var.virtual_network_resource_group_name
}

resource "azurerm_private_endpoint" "pe" {
  name                = join("-", [var.environment, var.application, var.label, "pe", var.location])
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = data.azurerm_subnet.subnet.id
  // SNIP
}

And you call the module in another repository like:

module "private_endpoint_storage" {
  source                              = "url"
  label                               = "storage"
  environment                         = var.environment
  application                         = var.application
  resource_group_name                 = azurerm_resource_group.rg.name
  location                            = azurerm_resource_group.rg.location
  tags                                = local.tags
  private_dns_zone_id                 = data.azurerm_private_dns_zone.zone_storage.id
  private_connection_resource_id      = module.storage.storage_id
  subresource_names                   = ["blob"]
  subnet_name                         = module.subnet.subnet_name
  virtual_network_name                = var.virtual_network_name
  virtual_network_resource_group_name = var.vnet_resource_group_name
  depends_on = [
    module.subnet
  ]
}

Scenario 1

A nightly pipeline that tests some of our modules with the latest provider version by running a plan, apply and destroy: building a subnet, storage account and sticks a private endpoint on the account. It looks like:

resource "azurerm_resource_group" "rg" {
  name     = join("-", [var.environment, var.application, "nightly", var.location])
  location = var.location
  tags     = local.tags
}

module "subnet" {
  // SNIP
}

module "storage" {
  source              = "url"
  environment         = var.environment
  application         = var.application
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  tags                = local.tags
  container_name      = ["example-container"]
}

module "private_endpoint_storage" {
  source                              = "url"
  label                               = "storage"
  environment                         = var.environment
  application                         = var.application
  resource_group_name                 = azurerm_resource_group.rg.name
  location                            = azurerm_resource_group.rg.location
  tags                                = local.tags
  private_dns_zone_id                 = data.azurerm_private_dns_zone.zone_storage.id
  private_connection_resource_id      = module.storage.storage_id
  subresource_names                   = ["blob"]
  subnet_name                         = module.subnet.subnet_name
  virtual_network_name                = var.virtual_network_name
  virtual_network_resource_group_name = var.vnet_resource_group_name
  depends_on = [
    module.subnet
  ]
}

With this code base you can run plan+apply over and over and it never changes anything, which is the expected behaviour.

Scenario 2

A repo that builds a private AKS cluster (with things like an NSG, route table, managed identities, etc.) and sticks a private endpoint over the AKS management plane. It looks a bit like:

resource "azurerm_resource_group" "rg" {
  // SNIP
}
module "subnet_aks" {
  // SNIP
}
module "rt_aks" {
  // SNIP
}
module "nsg" {
  // SNIP
}
module "aks" {
  // SNIP
}
module "private_endpoint_k8s" {
  source                              = "url"
  label                               = "k8s"
  environment                         = var.environment
  application                         = var.application
  resource_group_name                 = azurerm_resource_group.rg.name
  location                            = azurerm_resource_group.rg.location
  tags                                = local.tags
  private_dns_zone_id                 = data.azurerm_private_dns_zone.zone_k8s.id
  private_connection_resource_id      = module.aks.aks_id
  subresource_names                   = ["management"]
  subnet_name                         = module.subnet_aks.subnet_name
  virtual_network_name                = var.virtual_network_name
  virtual_network_resource_group_name = var.vnet_resource_group_name
  depends_on = [
    module.subnet_aks,
  ]
}

In this case, even though we're using the same subnet and endpoint modules and haven't changed any code, every plan/apply produces a PE replacement due to subnet_id:

# module.private_endpoint_k8s.data.azurerm_subnet.subnet will be read during apply
# (depends on a resource or a module with changes pending)
<= data "azurerm_subnet" "subnet" {
+ address_prefix                                 = (known after apply)
+ address_prefixes                               = (known after apply)
+ enforce_private_link_endpoint_network_policies = (known after apply)
+ enforce_private_link_service_network_policies  = (known after apply)
+ id                                             = (known after apply)
+ name                                           = "my-subnet-xxx"
+ network_security_group_id                      = (known after apply)
+ private_endpoint_network_policies_enabled      = (known after apply)
+ private_link_service_network_policies_enabled  = (known after apply)
+ resource_group_name                            = "my-rg-xxx"
+ route_table_id                                 = (known after apply)
+ service_endpoints                              = (known after apply)
+ virtual_network_name                           = "my-vnet-xxx"
 }
# module.private_endpoint_k8s.azurerm_private_endpoint.pe must be replaced
// SNIP
~ subnet_id                = "/subscriptions/***/resourceGroups/my-rg-xxx/providers/Microsoft.Network/virtualNetworks/my-vnet-xxx/subnets/my-subnet-xxx" # forces replacement -> (known after apply) # forces replacement

Very strange. The only real difference between the module calls across the scenarios is that we naturally target different DNS zone IDs; the rest is effectively the same, even though scenario 2 has a lot more going on with other modules.

I've checked the subnet ID in the portal, logs and statefiles, and we always lowercase our vnet names, subnet names and RG names. The entire ID is exactly as expected, but for some reason Terraform thinks it has changed even though it hasn't.

I decided to run both scenarios with TF_LOG=DEBUG. Scenario 1 didn't show anything meaningful, but for scenario 2 this interesting message popped up:

[DEBUG] Resource state not found for node "module.private_endpoint_k8s.data.azurerm_subnet.subnet", instance module.private_endpoint_k8s.data.azurerm_subnet.subnet

This led me to believe there might have been a weird and rare statefile/ID problem, but if I look through the statefile all seems well:

{
  "module": "module.private_endpoint_k8s",
  "mode": "data",
  "type": "azurerm_subnet",
  "name": "subnet",
  "provider": "provider[\"registry.terraform.io/hashicorp/azurerm\"]",
  "instances": [
    {
      // SNIP
      "attributes": {
        "id": "/subscriptions/xxx/resourceGroups/my-rg-xxx/providers/Microsoft.Network/virtualNetworks/my-vnet-xxx/subnets/my-subnet-xxx",
        "name": "my-subnet-xxx",
        "virtual_network_name": "my-vnet-xxx"
      },
    // SNIP

Todo

Gut instinct makes me believe that this might go away if the data.azurerm_subnet is removed from our PE module and the module call instead looks more like:

module "subnet" {}
module "private_endpoint" {
  subnet_id = module.subnet.id
// instead of supplying the vnet name, rg name and subnet name
}

So if we don't do a data lookup for it, this might go away. I'll refactor and try it out tomorrow. It's very hard to figure out why scenario 1 is happy but scenario 2 is sad. It seems like the data.azurerm_subnet part of the statefile is either being ignored or isn't being compared correctly with what the Azure API returns (which could be malformed on the Azure API side).

I can assure you that removing the data from the module and just passing the ID will work.
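
A minimal sketch of that refactor, assuming the private endpoint module exposes a `subnet_id` input in place of the three name inputs. The variable names and the `private_service_connection` naming below are illustrative assumptions, not the actual in-house module:

```hcl
# Inside the private_endpoint module (hypothetical layout).
# No data "azurerm_subnet" lookup: the caller supplies the ID directly.
variable "name" { type = string }
variable "location" { type = string }
variable "resource_group_name" { type = string }
variable "private_connection_resource_id" { type = string }
variable "subresource_names" { type = list(string) }

variable "subnet_id" {
  type        = string
  description = "Full resource ID of the subnet that hosts the private endpoint."
}

resource "azurerm_private_endpoint" "pe" {
  name                = var.name
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_id # passed in, not read from a data source

  private_service_connection {
    name                           = "${var.name}-connection"
    is_manual_connection           = false
    private_connection_resource_id = var.private_connection_resource_id
    subresource_names              = var.subresource_names
  }
}
```

The caller would then pass something like `subnet_id = module.subnet_aks.id` (assuming the subnet module has an `id` output) instead of `subnet_name`, `virtual_network_name` and `virtual_network_resource_group_name`.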

kraduk commented 11 months ago

Yep, I have switched to static references wherever I can. For some of the harder stuff I have even considered splitting out the data logic and generating a JSON object that I can ingest into the real TF job. It all seems a little laborious for something that should just work.

CorrenSoft commented 11 months ago

You must have an overwhelmingly complex scenario there... For me it has been quite simple: if I am creating the subnet in the same deployment, I already have the subnet_id from there; if not, I just add the data source to read the ID and then pass the value to the module (that is the simple explanation; the second case is a little more complex).

kraduk commented 11 months ago

Not massively, but we are heavy users of private endpoints, so that adds some fun. It's actually really easy to have a load of data sources in a module and write all the data to a JSON blob for ingestion in other places as an environment config. You just build your structure with a local. This completely sidesteps the issue, because for the jobs that ingest the object it's all static data. It is really stupid to have to do it though.
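
As a rough sketch of how this could look: a separate "lookup" configuration does the data reads and writes the result to a JSON file, and the real job decodes that file so every value is static at plan time. The file name `env-config.json`, the local and variable names, and the use of the `local_file` resource from the hashicorp/local provider are assumptions for illustration, not details from this thread.

```hcl
# Lookup configuration, applied on its own (hypothetical names throughout).
variable "subnet_name" { type = string }
variable "virtual_network_name" { type = string }
variable "virtual_network_resource_group_name" { type = string }

data "azurerm_subnet" "pe" {
  name                 = var.subnet_name
  virtual_network_name = var.virtual_network_name
  resource_group_name  = var.virtual_network_resource_group_name
}

locals {
  # Build the structure to export with a local.
  env_config = {
    subnet_ids = {
      pe = data.azurerm_subnet.pe.id
    }
  }
}

# Write the structure out as JSON for other jobs to ingest.
resource "local_file" "env_config" {
  filename = "${path.module}/env-config.json"
  content  = jsonencode(local.env_config)
}
```

The consuming configuration would then read the file rather than query Azure, for example:

```hcl
# Consuming configuration: values come from the file, so they are known at plan time.
locals {
  env_config = jsondecode(file("${path.module}/env-config.json"))
}

output "pe_subnet_id" {
  value = local.env_config.subnet_ids.pe
}
```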

FrancescoCipolla-TomTom commented 10 months ago

I am experiencing the same behaviour in my Production environment for both an Azure Web App and a PostgreSQL Server. Private endpoints are replaced for both resources on every deploy, which is not optimal at all. The best part is that this is not the case in my Dev environment! I use parametrised Terraform scripts (basically the only difference between them is the -dev and -prod suffixes) and I also set subnet_id with a direct reference to the already existing subnet. Given that I have exactly the same Terraform scripts for both the Dev and Prod subscriptions, I suspect it could be an issue with the Terraform backend.

kraduk commented 10 months ago

I now have a config module with all the subnet IDs in it statically to avoid this issue. You can of course construct the ID from the subscription, RG, vnet and subnet name. That is, until they change the format.
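
A minimal sketch of constructing the ID, following the ID layout shown in the state excerpts earlier in this thread. The variable names are assumptions, and the casing of every segment has to match what Azure returns:

```hcl
# Hypothetical inputs; all values are plain strings known at plan time.
variable "subscription_id" { type = string }
variable "vnet_resource_group_name" { type = string }
variable "virtual_network_name" { type = string }
variable "subnet_name" { type = string }

locals {
  # Same shape as the subnet IDs in the plan output above.
  subnet_id = "/subscriptions/${var.subscription_id}/resourceGroups/${var.vnet_resource_group_name}/providers/Microsoft.Network/virtualNetworks/${var.virtual_network_name}/subnets/${var.subnet_name}"
}

output "subnet_id" {
  value = local.subnet_id
}
```

Because no data source is involved, `local.subnet_id` never becomes "known after apply", at the cost of hard-coding the ID format.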

FrancescoCipolla-TomTom commented 10 months ago

It seems that some devs were able to avoid this situation by assigning the subnet_id in a different way. Not sure if I misunderstood the advice, but basically I tried removing the Terraform data reference to the subnet (which already exists in the Azure subscription) and instead provided a fixed string as the private endpoint's subnet_id. To make it clearer with code snippets:

I removed this code block:

data "azurerm_subnet" "my-subnet" {
  name                 = "my-subnet"
  virtual_network_name = data.azurerm_virtual_network.my-default-vnet.name
  resource_group_name  = data.azurerm_virtual_network.my-default-vnet.resource_group_name
}

which was referenced like this:

resource "azurerm_private_endpoint" "my-private-endpoint" {
  name      = "my-private-endpoint"
  subnet_id = data.azurerm_subnet.my-subnet.id
  ...
}

And instead went with:

resource "azurerm_private_endpoint" "my-private-endpoint" {
  name      = "my-private-endpoint"
  subnet_id = "/subscriptions/<MY-SUBSCRIPTION-ID>/resourceGroups/<MY-VNET-RG>/providers/Microsoft.Network/virtualNetworks/my-default-vnet/subnets/my-subnet"
  ...
}

Unfortunately it did not fix the issue for me (using AzureRM provider v3.74.0). The code I have is really simple as well: no modules are defined, because I'm going with a more granular approach, initialising different Terraform providers (with separate backends) depending on the infrastructure component. Basically, for each "macro component" I want to deploy, I define separate data.tf, variables.tf, main.tf and provider.tf files. Are there any other options/suggestions worth trying?

kraduk commented 9 months ago

It did for us. You do have to be 100% spot on with the casing though, as Azure and the provider are quite fussy and not consistent.

Double check everything in your state with

terraform state show

eg

terraform state show module.storage.azurerm_private_endpoint.main

FrancescoCipolla-TomTom commented 9 months ago

@kraduk after your suggestion I went back and checked the tfstate files for all my resources in each environment. Eventually I found that while our Dev and Prod subnet_id definitions were semantically the same, two characters did indeed differ in casing (it would be interesting to know why this difference appeared in the first place!).

Anyway, I switched back to the implementation I mentioned in my comment above, this time with the correct casing for the Prod environment, and it fixed the issue. Thanks a lot for your suggestion!

RickardCardell commented 7 months ago

I have a suggestion and that is to remove any dependencies in depends_on. Instead rely on whatever variable the dependency is outputting. This solved it for me when my azurerm_private_endpoint got recreated for apparently no reason. In my case I had a for_each loop in conjunction with a depends_on, so it might not be the exact behaviour as you have, but worth a try.
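
As an illustration, this is the `private_endpoint_k8s` call from earlier in the thread with the explicit `depends_on` dropped. The call already consumes `module.subnet_aks.subnet_name`, which gives Terraform an implicit dependency on the subnet module. As far as I understand it, a module-level `depends_on` can cause data sources inside the module to be deferred to apply time whenever the dependency has pending changes, which would match the "will be read during apply" message in the plan output above. Treat it as a sketch; whether it helps depends on what else the module reads.

```hcl
module "private_endpoint_k8s" {
  source                              = "url"
  label                               = "k8s"
  environment                         = var.environment
  application                         = var.application
  resource_group_name                 = azurerm_resource_group.rg.name
  location                            = azurerm_resource_group.rg.location
  tags                                = local.tags
  private_dns_zone_id                 = data.azurerm_private_dns_zone.zone_k8s.id
  private_connection_resource_id      = module.aks.aks_id
  subresource_names                   = ["management"]
  # Implicit dependency: referencing the output is enough to order the modules.
  subnet_name                         = module.subnet_aks.subnet_name
  virtual_network_name                = var.virtual_network_name
  virtual_network_resource_group_name = var.vnet_resource_group_name
  # depends_on = [module.subnet_aks] has been removed.
}
```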

kraduk commented 7 months ago

It's not that at all, I'm afraid, as I have no dependencies other than implicit ones. It's the data sources that do it, as they are non-deterministic at plan time for TF.

stuharper commented 6 months ago

I have a suggestion and that is to remove any dependencies in depends_on. Instead rely on whatever variable the dependency is outputting. This solved it for me when my azurerm_private_endpoint got recreated for apparently no reason. In my case I had a for_each loop in conjunction with a depends_on, so it might not be the exact behaviour as you have, but worth a try.

Thanks for this - it worked for me too

MilesCameron-DMs commented 2 months ago

Are we saying here that data "azurerm_subnet" is not usable in its current form unless we are happy for it to periodically force replacement?

A typical use case is when the subnet is not created in the same Terraform deployment.

Is the only alternative to set a variable with the resource ID?