hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

Azurerm_frontdoor with v2.24.0 breaks when azure frontdoor is edited in portal. #8208

Closed andrstor closed 3 years ago

andrstor commented 4 years ago

Community Note

Terraform (and AzureRM Provider) Version

Terraform v0.12.21
+ provider.azurerm v2.24.0

Affected Resource(s)

Terraform Configuration Files

provider "azurerm" {
  version = "=2.24.0"
  features {} # https://www.terraform.io/docs/providers/azurerm/index.html#features
}

resource "azurerm_resource_group" "example" {
  name     = "andreastester"
  location = "norway east"
}

resource "azurerm_frontdoor" "example" {
  name                                         = "andreastester"
  resource_group_name                          = azurerm_resource_group.example.name
  enforce_backend_pools_certificate_name_check = false

  routing_rule {
    name               = "exampleRoutingRule1"
    accepted_protocols = ["Http", "Https"]
    patterns_to_match  = ["/*"]
    frontend_endpoints = ["exampleFrontendEndpoint1"]
    forwarding_configuration {
      forwarding_protocol = "MatchRequest"
      backend_pool_name   = "exampleBackendBing"
    }
  }

  backend_pool_load_balancing {
    name = "exampleLoadBalancingSettings1"
  }

  backend_pool_health_probe {
    name = "exampleHealthProbeSetting1"

  }

  backend_pool {
    name = "exampleBackendBing"
    backend {
      host_header = "www.bing.com"
      address     = "www.bing.com"
      http_port   = 80
      https_port  = 443
    }

    load_balancing_name = "exampleLoadBalancingSettings1"
    health_probe_name   = "exampleHealthProbeSetting1"
  }

  frontend_endpoint {
    name                              = "exampleFrontendEndpoint1"
    host_name                         = "andreastester.azurefd.net"
    custom_https_provisioning_enabled = false
  }
}

Debug Output

https://gist.github.com/andrstor/0aa07440e0a01befb23351db3257340f

Panic Output

Expected Behavior

Terraform identifies that no changes are required or tries to recover its state.

Actual Behavior

Error: flattening backend_pool: ID was missing the healthProbeSettings element

Steps to Reproduce

  1. terraform apply
  2. Do anything in the Azure portal that triggers a change. For instance, add a rule engine rule to the routing rule.
  3. terraform plan

Even if you undo the manual change again, the resource is still broken with azurerm v2.24.0. This works with v2.23.0.

Important Factoids

None

References

eliasgrueninger commented 4 years ago

Same issue here but on another level: Error: flattening frontend_endpoint: ID was missing the frontDoorWebApplicationFirewallPolicies element

lyubomirr commented 4 years ago

Same behaviour when trying to import the resource.

MelHarbour commented 4 years ago

Seeing the same issue as @eliasgrueninger on the Firewall Policies element, but we're running 2.21, which implies that it's not necessarily a change in the most recent version of the provider, but could be a change at the Azure end?

nmiodice commented 4 years ago

I found a similar issue (#8231) and did not realize there was an open issue already.

The only difference in my case is that the AFD & related resources were not edited in the Azure Portal - these resources are fully managed by Terraform. cc @tombuildsstuff , that is the only difference I see between these two issues, though the root cause is likely the same.

kevinchabreck commented 4 years ago

I experienced this issue when trying to update the resource, and then again when trying to import it after removing it from the terraform state.

import azurerm_frontdoor.fd <my frontdoor resource ID>
azurerm_frontdoor.fd: Importing from ID "<my frontdoor resource ID>"...
azurerm_frontdoor.fd: Import prepared!
  Prepared azurerm_frontdoor for import
azurerm_frontdoor.fd: Refreshing state... [id=<my frontdoor resource ID>]

Error: flattening `backend_pool`: ID was missing the `healthProbeSettings` element

I tested this on terraform v0.12.24 with azurerm provider versions v2.13.0 and v2.24.0 with the same results. Perhaps @MelHarbour is right about it being a change in Azure's API?

lyubomirr commented 4 years ago

@kevinchabreck I've had the exact same issue with the same steps as you, but it started working once I changed the provider version to 2.23.0. You can try it.

MelHarbour commented 4 years ago

As a follow-on, we re-ran a previous release on an environment that hadn't been modified outside of Terraform since the last apply, and it also failed with the same error. So it looks distinctly like something's changed in the underlying platform.

kevinchabreck commented 4 years ago

@lyubomirr downgrading to azurerm v2.23.0 seemed to work! Additionally, I had previously modified my resource ID string during the import from .../providers/Microsoft.Network/frontdoors/... to .../providers/Microsoft.Network/frontDoors/... due to an error I got when attempting a previous import. Reverting this and importing the resource ID exactly as it is shown in the Azure console (ie. the one with the lowercase D in frontdoors) fixed this issue. Thanks for the tip!

MelHarbour commented 4 years ago

Did another bit of testing - realised that we were actually using 2.24, so I've also downgraded to 2.23 and it appears to be working again, so looks like the regression is in provider version 2.24.
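
The downgrade reported in the comments above amounts to a one-line change to the provider block from the original report. This is a minimal sketch, assuming the Terraform 0.12-style `version` argument used in that report, and assuming the state has not already been written by a newer provider version:

```hcl
# Interim pin to the last provider version reported as working in this thread.
provider "azurerm" {
  version = "=2.23.0" # 2.24.0 is the version reported as failing above
  features {}
}
```

After changing the pin, `terraform init` needs to be re-run so that the selected provider version is installed.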

MelHarbour commented 4 years ago

Presumably the issue was introduced in https://github.com/terraform-providers/terraform-provider-azurerm/pull/8146 as part of the rewriting around IDs.

tombuildsstuff commented 4 years ago

@MelHarbour as you mentioned above, unfortunately the FrontDoor API is broken here in that it's returning these in the incorrect case.

Azure API's are supposed to be case-insensitive for Requests, while URI's listed in Responses should be case-sensitive - the Resource Group can be case-insensitive, but unfortunately the way this has been implemented means that the entire URI can be insensitive at Request time.

Other downstream API's treat URI's in a mixed manner - where some API's require that URI's are case sensitive - and others aren't bothered by casing - as such we intentionally treat them as case-sensitive (since that's how the HTTP spec recommends treating URI's).

Unfortunately the FrontDoor API implements this in such a fashion that they're case-insensitive in the response too (specifically being lower-cased when updated through the Portal) - such that #8146 tries to make this consistent by updating/requiring the ID's to be in a consistent format (since users should be able to rely on these being the same). As such it appears there's some more edge-cases to handle here, as a follow up to #8146 - would anyone be able to provide an example of the ID formats being returned/in the state for these fields?

Thanks!

andrstor commented 4 years ago

@tombuildsstuff I don't know if this is what you are asking, but I have noticed this behaviour:

After creating the frontdoor resource with azurerm v2.24 it looks like this in https://resources.azure.com/.

After editing it manually in the portal (whatever change), it looks like this. Notice how many of the ID's have all become lowercase.

When running terraform plan again now, it fails. You can actually edit the resource in https://resources.azure.com/ manually (edit the template and use PUT), and terraform will start working again (if you managed to correctly adjust all the lowercase ID's).

tombuildsstuff commented 4 years ago

@andrstor perfect, thanks - where the entire ID is being lower-cased, that's a bug in the API (since the health probe names etc should be consistent) - so we'll need to raise an API bug here either way unfortunately

robselway commented 4 years ago

Hi @tombuildsstuff - just to clarify - is the only resolution here to wait for Azure to fix the issue? I'm trying to figure out whether it's worth replacing this resource with another script until it's resolved.

cpressland commented 4 years ago

We fought so hard with Azure Support during some previous Azure Front Door Terraform/API issues to get them to recognise the Azure API was a bit of a mess and provided multiple examples via Terraform, Azure Portal, and Azure CLI. Response was simply that this isn't an issue because the Azure Portal still works, I kinda get it, but I equally don't think it's this project's responsibility to have to constantly build work-arounds to a buggy API. I'll raise this issue with our Account Managers etc again and see if we can get any traction.

tombuildsstuff commented 4 years ago

@robselway

just to clarify - is the only resolution here to wait for Azure to fix the issue?

Based on what I can see, unfortunately yes.

The Azure API Specification states that values should be returned in the casing they're submitted (although the HTTP Specification states URI's should be case-sensitive but I digress) - so unfortunately this is an API bug which needs to be fixed, since this should be returned in the same casing we're submitting it in here.

For what it's worth we've also raised this on our end - unfortunately the Networking API's differ from every other Azure API here, so I don't think we can easily work around this (unless perhaps we can find the original casing from the specific sub-element, but that's assuming the API doesn't change to break the casing there too)


@JeffreyRichter this is a good example of the Networking API's returning URI's in a case-insensitive manner which differs from the recommendation in the ARM Spec (which unfortunately differs from the HTTP Specification, where the entire URI is case sensitive, see this StackOverflow answer for more details).

Whilst it'd be possible to work around this bug if the "resource type" segment could be parsed case-insensitively, where the entire URI is lower-cased in some responses, but not in others (see this comment for example responses) - there's not much we can do here, since we can't guarantee these are the correct casing (or that the Networking API's won't change this casing in a future update).

Whilst this Github issue is the wrong place for this discussion, I feel like perhaps the current ambiguous behaviour defined in the ARM Specification ("return in the casing the user passed it in as") is the cause of this confusion - perhaps it'd be clearer if the ARM Specification stated that the entire URI must (to use the language of RFC 7230) be treated as case-sensitive in Responses - WDYT?

Thanks!

nmiodice commented 4 years ago

@tombuildsstuff , do you have a (GitHub ?) we can use to track this?

SunnyOswal commented 4 years ago

Facing the same issue with azurerm provider 2.26.0.

tombuildsstuff commented 4 years ago

@WodansSon would you be able to get an ETA for a fix from the service team here?

WodansSon commented 4 years ago

@WodansSon would you be able to get an ETA for a fix from the service team here?

@tombuildsstuff I will reach out to the Front Door service team and see how quickly we can get a fix in place for this issue, it will most likely include some cross team collaboration with the portal team to roll back their changes to get this issue totally fixed.

nmiodice commented 4 years ago

Is there a viable workaround on the TF side that we can use until it's fixed?

surlypants commented 4 years ago

Is there a viable workaround on the TF side that we can use until it's fixed?

I could not simply just downgrade. I had to import the portal-modified resource into a temporary workspace then push that state file to the appropriate place and then downgrade.

nmiodice commented 4 years ago

@surlypants can you elaborate on this?

push that state file to the appropriate place

surlypants commented 4 years ago

@surlypants can you elaborate on this?

push that state file to the appropriate place

I'll try...

The (remote) state file in the terragrunt workspace containing our FD could not plan (with 2.25.0) post portal touch. Downgrading to 2.23.0 still would not pass a plan. I thus imported the resource to a local state file, downloaded the remote state file and replaced its front door module's "instances" array with that from the local state file. I then state push-ed the result back up to the remote backend. Finally, 2.23.0 would plan. We now are back to the time-outs workaround from:

https://github.com/terraform-providers/terraform-provider-azurerm/issues/7925

Side note: I tried setting required_providers: azurerm = ">= 2.23.0, <=2.25.0" at the top level workspace and scoped 2.23.0 specifically to the frontdoor workspace; but it seems that when providing a range, the lowest always wins. So our entire infra is back to 2.23.0
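
The side note above matches how Terraform resolves provider versions, assuming both constraints end up in the same configuration tree: every constraint on a provider is combined, and one version satisfying all of them is selected. The sketch below uses a hypothetical layout (a root module plus a child module holding the Front Door) and the 0.12-style string syntax. With a range in the root and an exact pin in the child, only 2.23.0 satisfies both.

```hcl
# Root module (hypothetical layout): a range constraint.
terraform {
  required_providers {
    azurerm = ">= 2.23.0, <= 2.25.0"
  }
}

# Child module holding the Front Door (hypothetical, lives in its own file):
#
# terraform {
#   required_providers {
#     azurerm = "= 2.23.0"
#   }
# }
#
# Both constraints apply to the one azurerm provider selected for the whole
# configuration, so the only version that satisfies both is 2.23.0.
```

In other words, the exact pin wins because it is the narrowest constraint, not because it is the lowest version.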

hope this helps / is understandable

nmiodice commented 4 years ago

Definitely, thank you @surlypants !

phenggeler commented 4 years ago

Hello, is there an update on ETA for this issue? We are not able to manage Frontdoor via Terraform while this issue persists.

naikajah commented 4 years ago

we are also hitting this issue when migrating to 2.24. Could we please get an update as to when the bug will be fixed?

jmcshane commented 4 years ago

Is there anything that can be done once you face this error with a provider greater than 2.24? I'm getting:

Error: Resource instance managed by newer provider version

The current state of module.front_door.azurerm_frontdoor.frontdoor was created
by a newer provider version than is currently selected. Upgrade the azurerm
provider to work with this state.

Error: Resource instance managed by newer provider version

The current state of
module.front_door.azurerm_frontdoor_firewall_policy.policy was created by a
newer provider version than is currently selected. Upgrade the azurerm
provider to work with this state.

Can you delete the backend settings and reset the front door? Can you delete the front door and have it recreated?

GarethOates commented 4 years ago

I'm facing the same problem as @jmcshane where I cannot use a provider newer than 2.23.0. This is causing me problems when I want to use features released in later versions, like 2.27.0 for example. I want to know that it's safe to recreate the resource using a newer provider.

Lahiri commented 4 years ago

I'm also struggling with this problem, is a fix in the works?

scottzilla commented 4 years ago

@GarethOates @Lahiri you should be able to bring in a provider of a different version using provider aliases, but you'll need to explicitly reference them where appropriate.

See: https://www.terraform.io/docs/configuration/providers.html#alias-multiple-provider-configurations

GarethOates commented 4 years ago

@GarethOates @Lahiri you should be able to bring in a provider of a different version using provider aliases, but you'll need to explicitly reference them where appropriate.

See: https://www.terraform.io/docs/configuration/providers.html#alias-multiple-provider-configurations

Yeah I tried that but I think as someone mentioned before it just reverts to the oldest version. It didn't work for me anyway even though on init it downloaded both provider versions.

CoopCNIT commented 4 years ago

I tried to run this provider setup to get around it:

provider "azurerm" { 
  version         = "2.15.0"
  features {}
  subscription_id = var.subscription_id
  client_id       = var.client_id
  client_secret   = var.client_secret
  tenant_id       = var.tenant_id
}
provider "azurerm" { 
  alias           = "temp"
  version         = "2.30.0"
  features {}
  subscription_id = var.subscription_id
  client_id       = var.client_id
  client_secret   = var.client_secret
  tenant_id       = var.tenant_id
}

..but it fails with this error:

Error: Failed to query available provider packages

Could not retrieve the list of available versions for provider
hashicorp/azurerm: no available releases match the given constraints 2.15.0,
2.30.0

cpressland commented 4 years ago

@scottzilla We already have a ton of subscriptions and use Terraform provider Aliases extensively. Unfortunately it doesn't let us mix and match Provider versions though, I get the same error as @CoopCNIT.

IanMoroney commented 4 years ago

@WodansSon , Is there any update from the Azure Front Door service team?

scottzilla commented 4 years ago

Sorry to mislead, I could've sworn I had done this in the past. You are correct, it does appear to resolve a single provider with a superset of the version spec, yuck.
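
For contrast with the failing setup above, aliases are meant for configuring the same provider version differently, for example against another subscription. This is a minimal sketch under that assumption; the `temp` alias is taken from the earlier comment, while the variable `other_subscription_id` is hypothetical and used only for illustration:

```hcl
terraform {
  required_providers {
    # A single constraint covers every azurerm configuration, aliased or not.
    azurerm = "= 2.23.0"
  }
}

variable "subscription_id" {}
variable "other_subscription_id" {} # hypothetical second subscription

# Default configuration.
provider "azurerm" {
  features {}
  subscription_id = var.subscription_id
}

# Aliased configuration: same provider version, different settings.
provider "azurerm" {
  alias           = "temp"
  features {}
  subscription_id = var.other_subscription_id
}
```

Resources opt into the aliased configuration with `provider = azurerm.temp`. The alias does not give them a separate provider version.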

cpressland commented 4 years ago

I decided to take some time to try and fix this tonight. After further investigation, case sensitivity is the only issue in play here. A fix was put in for this with #8046, but this was ultimately rejected. I started by upgrading to the latest release, removing my Front Door from the Terraform State and attempting to reimport.

Unfortunately this failed, but with something new:

➜  terraform import "module.uksouth-frontdoor.azurerm_frontdoor.frontdoor" "/subscriptions/<snipped>/resourceGroups/frontdoor/providers/Microsoft.Network/frontdoors/bink-frontdoor"

Error: Error parsing Resource ID "/subscriptions/<snipped>/resourceGroups/frontdoor/providers/Microsoft.Network/frontdoors/bink-frontdoor": ID was missing the `frontDoors` element

Changing to the case Terraform is expecting returns us back to a broken, but familiar state.

➜  terraform import "module.uksouth-frontdoor.azurerm_frontdoor.frontdoor" "/subscriptions/<snipped>/resourceGroups/frontdoor/providers/Microsoft.Network/frontDoors/bink-frontdoor"

Error: flattening `backend_pool`: ID was missing the `healthProbeSettings` element

If I look on https://resources.azure.com/ I can see that my Front Door has healthProbeSettings with entirely lower case IDs, example:

              "healthProbeSettings": {
                "id": "/subscriptions/<snipped>/resourcegroups/frontdoor/providers/microsoft.network/frontdoors/bink-frontdoor/healthprobesettings/healthz"
              },
              "loadBalancingSettings": {
                "id": "/subscriptions/<snipped>/resourcegroups/frontdoor/providers/microsoft.network/frontdoors/bink-frontdoor/loadbalancingsettings/standard"
              },

So, my hunch is that all I need to do is update the resource in the AzureRM API to be in the case Terraform is expecting to work around this issue, same as I did for the import command.

So, I switched to Read/Write mode on the Resources site, went into edit mode and manually started updating all IDs to be case sensitive, but only on the specific keys Terraform was complaining about, example:

"id": "/subscriptions/<snipped>/resourcegroups/frontdoor/providers/microsoft.network/frontdoors/bink-frontdoor/healthprobesettings/healthz"

became

"id": "/subscriptions/<snipped>/resourcegroups/frontdoor/providers/microsoft.network/frontdoors/bink-frontdoor/healthProbeSettings/healthz"

Upon re-running my import, Terraform now moans about a totally different object! HURAHH! Progress!

Error: flattening `backend_pool`: ID was missing the `loadBalancingSettings` element

So, I ended up fixing case on the following IDs:

Running a Terraform Import between each modification, slowly getting further to success.

Now that front door was fixed, I tried reimporting our WAF policy and was hit with a familiar message:

Error: Error parsing Resource ID "/subscriptions/<snipped>/resourcegroups/frontdoor/providers/Microsoft.Network/frontdoorwebapplicationfirewallpolicies/policy": ID was missing the `frontDoorWebApplicationFirewallPolicies` element

This was simply a case of fixing my import command to be case-sensitive as with the original import example.

After these imports I was able to run a terraform plan, it wanted to apply changes but these were all case related changes so I let it proceed. It did panic the provider but after a few runs of terraform apply it got itself unstuck. I was getting messages like:

Error: Provider produced inconsistent final plan

When expanding the plan for
module.uksouth-frontdoor.azurerm_frontdoor.frontdoor to include new values
learned so far during apply, provider
"registry.terraform.io/hashicorp/azurerm" produced an invalid new value for
.frontend_endpoint[6].web_application_firewall_policy_link_id: was
cty.StringVal("/subscriptions/<snipped>/resourcegroups/frontdoor/providers/Microsoft.Network/frontDoorWebApplicationFirewallPolicies/policy"),
but now
cty.StringVal("/subscriptions/<snipped>/resourceGroups/frontdoor/providers/Microsoft.Network/frontDoorWebApplicationFirewallPolicies/policy").

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

But I guess this makes sense if it's changing objects in this weird example of case-inconsistency.

Anyway, issue fixed for me. I really hope the above helps others and I really hope in the meantime @tombuildsstuff considers merging #8046.

GarethOates commented 4 years ago

So the fix in #8046 was rejected in favour of #8146 which was included in provider version 2.24.0, but that didn't actually fix the problem?

tombuildsstuff commented 4 years ago

@cpressland indeed - unfortunately this is a bug on the Azure side which needs fixing.

The HTTP Specification defines that URI's in Requests and Responses must be case sensitive (although servers may opt to parse Request URI's case insensitively, as IIS has done historically). Whilst some Azure API's do support case-insensitivity - others require that URI's are sent case sensitively and behave incorrectly (e.g. accepting the data, but failing in subtle ways) if these are in the incorrect casing - ultimately this means that we need these to be case-sensitive in all cases.

In the case of this bug, the issue is the Portal/FrontDoor API's are returning these case insensitively (the root-cause being a misinterpretation of ambiguous wording in the Azure API Specifications) - as such this issue needs a fix on the Microsoft side (in the FrontDoor API) - and as such we have no plans to merge #8046, as previous attempts at doing so caused subtle issues in other API's.

@WodansSon could you reach out to the Portal/Service Teams for an update here, since this has been quite a while since we've had an update from them?

Thanks!

GarethOates commented 4 years ago

and as such we have no plans to merge #8046, as previous attempts at doing so caused subtle issues in other API's.

I understand your desire to have the problem fixed at the root, but so much time has passed since this issue was raised and there are clearly a lot of users who want to be able to manage FrontDoor through terraform. Can this fix not be merged in as a temporary workaround until a more permanent solution is offered from the Microsoft team? As far as I can see, it's a localized change to a front door specific file. There are so many features since provider 2.23.0 came out that many users will most likely be wanting to take advantage of, but cannot just now due to this bug.

cpressland commented 4 years ago

@tombuildsstuff - fair enough, from a technical perspective I completely agree that this is Microsoft's burden to fix, I just don't see it actually happening in a sane timeframe. Else, could we mark my previous post as on-topic again? It does include potential workarounds to this issue for users currently blocked by this. I'm happy to edit it to make it more on-topic or even put it somewhere else if you have a good suggestion?

tombuildsstuff commented 4 years ago

@GarethOates @cpressland we've tried that historically and it leads to further cases of this, which ultimately ends up breaking other usages in subtle ways.

It's worth calling out that the Azure Networking team get a lot of exceptions, covering:

Whilst I appreciate it's frustrating to be blocked here - we can't keep layering workarounds on top of workarounds, doing so is exacerbating the problem and leading to the issues we see today (and a bunch of more subtle, and harder to diagnose, issues like the JSON ordering issue mentioned above).

We've reached out to the Portal Team who've committed to fixing this bug on their side (which we've chased them on) - after which we can then work with the Networking Team to fix the root-cause of these bugs.

As Microsoft have committed to fixing these bugs (and for the reasons outlined above) - unfortunately we have no plans to introduce a hack for this API bug. In the interim, since this bug is only triggered when editing the resource in the Portal - I believe it should be possible to work around this using RBAC (or a Management Lock).
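
The Management Lock suggestion could look roughly like the sketch below. It is untested against this bug and assumes the `azurerm_frontdoor.example` resource from the original report. The lock name and notes are hypothetical. Locks apply to every principal, so a `ReadOnly` lock also blocks Terraform's own updates to the Front Door until the lock is removed.

```hcl
# Sketch only: prevents edits to the Front Door (including from the Portal)
# by placing a read-only lock on the resource itself.
resource "azurerm_management_lock" "frontdoor_readonly" {
  name       = "frontdoor-terraform-only" # hypothetical name
  scope      = azurerm_frontdoor.example.id
  lock_level = "ReadOnly"
  notes      = "Edit this Front Door through Terraform only."
}
```

Because the lock has to be lifted before each change to the Front Door, restricting write access through RBAC may be the less disruptive of the two options.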

From our side, whilst we appreciate this isn't ideal (and is frustrating to be blocked) - we're working with Microsoft to fix this and will post an update as soon as we have one.

Thanks!

surlypants commented 4 years ago

once you go through the workaround I described previously, you can upgrade providers.

bradcavanagh-unity3d commented 4 years ago

I've just had to upgrade to 2.33.0 (from 2.2!) because of this issue, and now we're seeing this bug block our deployments. I know that you ultimately have to wait for Microsoft to fix the underlying problem, but could you at least in the meantime mark comments that have workarounds as "on topic" so they're easy to find?

GarethOates commented 3 years ago

@WodansSon would you be able to get an ETA for a fix from the service team here?

@tombuildsstuff I will reach out to the Front Door service team and see how quickly we can get a fix in place for this issue, it will most likely include some cross team collaboration with the portal team to roll back their changes to get this issue totally fixed.

Did you ever hear back from the Microsoft Portal team about this issue? What's the ETA on a fix from their end?

Rorschachly commented 3 years ago

I am also wondering if this fix could be applied sooner. Just as @GarethOates thought. Thanks.

ajklotz commented 3 years ago

I'm blocked by this issue and I cannot downgrade as some others have mentioned because I require azurerm resources that are not available in 2.23. As of now, managing the Azure Front Door is impossible and I'm surprised that azurerm_frontdoor is in this state

tombuildsstuff commented 3 years ago

@WodansSon did you get an update from the Service/Portal Team here?

WodansSon commented 3 years ago

@tombuildsstuff

@WodansSon did you get an update from the Service/Portal Team here?

The last I heard from the service team, Nov. 4th 2020, is that the deployment of the fix is ongoing and completed for several stages, but there was an Azure lockdown blocking the complete rollout of the fix to all regions. They stated that they would continue to roll out the change to the rest of the regions after the lockdown was lifted. I will continue to put pressure on the portal team in an attempt to get this pushed out ASAP.

WodansSon commented 3 years ago

@andrstor @tombuildsstuff The fix has been deployed to all regions, so I am going to go ahead and close this issue. Sorry it took so long for the turnaround. 🚀