hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

Application Gateway request_routing_rule order change even with Azurerm 3.0.2 #16136

Open aport1996 opened 2 years ago

aport1996 commented 2 years ago

Community Note

Terraform (and AzureRM Provider) Version

Terraform v1.1.7 on windows_amd64

Affected Resource(s)

Terraform Configuration Files

  dynamic "request_routing_rule" {
    for_each = var.application_gateway_request_routing_rule

    content {
      http_listener_name          = request_routing_rule.value["http_listener_name"]
      name                        = request_routing_rule.value["name"]
      redirect_configuration_name = request_routing_rule.value["redirect_configuration_name"]
      rule_type                   = request_routing_rule.value["rule_type"]
      backend_address_pool_name   = request_routing_rule.value["backend_address_pool_name"]
      backend_http_settings_name  = request_routing_rule.value["backend_http_settings_name"]
      url_path_map_name           = request_routing_rule.value["url_path_map_name"]
    }
  }

    {
      http_listener_name          = "HTTP-DEV-XXX-LISTENER"
      name                        = "XXX-DEV-HTTPS-REDIRECT-RULE"
      redirect_configuration_name = "XXX-DEV-HTTPS-REDIRECT"
      rule_type                   = "Basic"
      backend_address_pool_name   = null
      backend_http_settings_name  = null
      url_path_map_name           = null
    },

Debug Output

  - request_routing_rule {
      - http_listener_id            = "/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Network/applicationGateways/xxx/httpListeners/HTTP-DEV-XXX-LISTENER" -> null
      - http_listener_name          = "HTTP-DEV-XXX-LISTENER" -> null
      - id                          = "/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Network/applicationGateways/xxx/requestRoutingRules/XXX-DEV-HTTPS-REDIRECT-RULE" -> null   
      - name                        = "XXX-DEV-HTTPS-REDIRECT-RULE" -> null
      - priority                    = 0 -> null
      - redirect_configuration_id   = "/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Network/applicationGateways/xxx/redirectConfigurations/XXX-DEV-HTTPS-REDIRECT" -> null     
      - redirect_configuration_name = "XXX-DEV-HTTPS-REDIRECT" -> null
      - rule_type                   = "Basic" -> null
    }

  + request_routing_rule {
      + http_listener_id            = "/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Network/applicationGateways/xxx/httpListeners/HTTP-DEV-XXX-LISTENER"
      + http_listener_name          = "HTTP-DEV-XXX-LISTENER"
      + id                          = "/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Network/applicationGateways/xxx/requestRoutingRules/XXX-DEV-HTTPS-REDIRECT-RULE"
      + name                        = "XXX-DEV-HTTPS-REDIRECT-RULE"
      + redirect_configuration_id   = "/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Network/applicationGateways/xxx/redirectConfigurations/XXX-DEV-HTTPS-REDIRECT"
      + redirect_configuration_name = "XXX-DEV-HTTPS-REDIRECT"
      + rule_type                   = "Basic"
    }

Panic Output

Expected Behaviour

No change should be detected. I also tried running terraform apply, and it does somehow "modify" the application gateway, but when I run terraform plan again I still have the same issue.

Actual Behaviour

Terraform tries to change the order of request_routing_rules only (and all of them, I only provided you one sample output since we have many of them on this app gateway). It keeps happening even after a terraform apply.

Steps to Reproduce

  1. Configure request_routing_rule using dynamic blocks as per the above code example in an application gateway
  2. terraform plan - you will see the attempted change
  3. terraform apply - terraform will apply the change even though there is no difference
  4. terraform plan - terraform still tries to do the same changes

Important Factoids

Some time ago when this issue was not known yet, I remember that I tried to create a new application gateway from scratch and the issue was not there, but after some months between various changes, the issue appeared again randomly and never went away. I don't know what is causing the issue but we have it on 3 different application gateways and can't get rid of it.

References

aport1996 commented 2 years ago

Just wanted to add that on another application gateway, the issue is on backend_http_settings, so I assume this is just randomly happening on all the blocks. Not sure why one or the other is specifically affected though in each separate app gateway. But for this specific case, there was actually a difference in the http settings and once I fixed that (in the code only by aligning to what was in the portal) and ran the plan again no infrastructure changes were detected.

I still believe that according to the new version of azurerm I should've seen only the change in one of the backend_http_settings and not the addition and removal of all of them.

owaisaamir commented 2 years ago

I am still observing this issue in the probe and backend_http_settings blocks with the 3.0.2 provider. Also, a fix is required to skip the ordering change for the path_rule block under url_path_map.

mbfrahry commented 2 years ago

Hey @owaisaamir, do you mind posting the application gateway configuration you're using that is causing a plan diff?

owaisaamir commented 2 years ago

Hi @mbfrahry, I am using dynamic blocks with sets for the probe, backend_http_settings and path_rule blocks. Below is the sample configuration used.

locals {
  az_app_gw_health_probes = ["a", "b", "c"]
  backend_address_pool = {
    "a" = ["X", "Y"]
    "b" = ["W", "V"]
  }
  backend_address_pool_to_use = "a"
}

resource "azurerm_application_gateway" "app_gw" {
  name                = "test"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location

  zones = [1, 2, 3]

  sku {
    name = "Standard_v2"
    tier = "Standard_v2"
  }

  autoscale_configuration {
    min_capacity = 1
    max_capacity = 5
  }

  gateway_ip_configuration {
    name      = "appGatewayIpConfig"
    subnet_id = azurerm_subnet.app_gw_subnet.id
  }

  frontend_port {
    name = "https"
    port = 443
  }

  frontend_ip_configuration {
    name                 = "appGwPublicFrontendIp"
    public_ip_address_id = azurerm_public_ip.app_gw_ip.id
  }

  dynamic "backend_address_pool" {
    for_each = local.backend_address_pool

    content {
      name  = backend_address_pool.key
      fqdns = backend_address_pool.value
    }
  }

  dynamic "probe" {
    for_each = toset(local.az_app_gw_health_probes)

    content {
      name     = probe.key
      port     = 443
      protocol = "Https"

      path = format("/%s", probe.key)

      match {
        body        = ""
        status_code = ["200"]
      }

      interval            = 60
      timeout             = 5
      unhealthy_threshold = 1

      pick_host_name_from_backend_http_settings = true
    }
  }

  dynamic "backend_http_settings" {
    for_each = toset(local.az_app_gw_health_probes)

    content {
      name       = backend_http_settings.key
      port       = 443
      protocol   = "Https"
      probe_name = backend_http_settings.key

      pick_host_name_from_backend_address = true

      request_timeout = 20

      connection_draining {
        enabled           = true
        drain_timeout_sec = 120
      }

      cookie_based_affinity = "Disabled"
    }
  }

  ssl_policy {
    policy_type = "Predefined"
    policy_name = "AppGwSslPolicy20170401S"
  }

  ssl_profile {
    name = "TLS_1_2"

    ssl_policy {
      policy_type = "Predefined"
      policy_name = "AppGwSslPolicy20170401S"
    }
  }

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.app_gw_user.id]
  }

  ssl_certificate {
    name                = "cert"
    key_vault_secret_id = azurerm_key_vault_certificate.cert.versionless_secret_id
  }

  http_listener {
    name     = "main"
    protocol = "Https"

    frontend_ip_configuration_name = "appGwPublicFrontendIp"
    frontend_port_name             = "https"

    ssl_certificate_name = "cert"

    ssl_profile_name = "TLS_1_2"
  }

  url_path_map {
    name = "mypath"

    default_backend_address_pool_name  = local.backend_address_pool_to_use
    default_backend_http_settings_name = local.az_app_gw_health_probes[0]

    dynamic "path_rule" {
      for_each = toset(local.az_app_gw_health_probes)

      content {
        name = path_rule.key
        paths = [
          format("/%s", path_rule.key)
        ]

        backend_address_pool_name  = local.backend_address_pool_to_use
        backend_http_settings_name = path_rule.key
      }
    }
  }

  request_routing_rule {
    name               = "mypath"
    rule_type          = "PathBasedRouting"
    http_listener_name = "main"
    url_path_map_name  = "mypath"
  }
}

Adding an element to local.az_app_gw_health_probes causes the ordering plan diff.

mbfrahry commented 2 years ago

Thanks for that info @owaisaamir! Just to confirm, this is a different problem than the original issue of seeing changes without any changes being made to the config? What you're describing is seeing changes across all the backend_http_settings when you're just looking to add one new one?

If that's the case, then that's just an unfortunate consequence of moving many of the attributes in application gateway from a List to a Set. We've traded the ordering issues that were causing many problems for a much noisier plan when adding new blocks.

mbfrahry commented 2 years ago

Hey @Nyxbiker, what was your configuration for application gateway and what did you have to do to your config to get it to line up with the portal? My first thought is that we're not generating the Hash for backend_http_settings correctly so it'd be useful to see which attributes you had to modify to prevent a plan from occurring

owaisaamir commented 2 years ago

> Thanks for that info @owaisaamir! Just to confirm, this is a different problem than the original issue of seeing changes without any changes being made to the config? What you're describing is seeing changes across all the backend_http_settings when you're just looking to add one new one?
>
> If that's the case, then that's just an unfortunate consequence of moving many of the attributes in application gateway from a List to a Set. We've traded the ordering changes that were causing issues for many for a much noisier plan when adding new blocks.

This is painful when I have a lot (currently 20-25) of backend_http_settings and probes that get disturbed. It adds uncertainty about the changes being made and can cause panic when planning updates that must not cause connection failures.

mbfrahry commented 2 years ago

I hate to hear that those changes are causing uncertainty for your configuration but the alternative is for permanent diffs to occur if the ordering is different in the config versus what Azure is returning from the API which was happening often.

This is an issue being tracked by the Terraform proper team but there haven't been any proposed solutions. https://github.com/hashicorp/terraform/issues/28281

Because this is a separate issue from the original, I'm going to mark this conversation as off topic but I encourage you to make an issue on this repo, or better yet, on the Terraform repo so the Core team can try and come up with a solution for you.

johannespetereit commented 2 years ago

I can confirm that the rules are indeed not fixed. I added one of each of the following blocks (in the middle of about 10 configurations, as order should not matter anymore according to #6896), using azurerm 3.0.2 and Terraform 1.1.7. Here are the results:

| Type | Result |
| --- | --- |
| backend_address_pool | ✔ works, only one change detected |
| backend_http_settings | ❌ does NOT work, all settings deleted and re-added |
| http_listener | ❌ does NOT work, all settings deleted and re-added |
| probe | ✔ works, only one change detected |
| redirect_configuration | ❌ does NOT work, all settings deleted and re-added |
| request_routing_rule | ❌ does NOT work, all settings deleted and re-added |
| url_path_map | ❌ does NOT work, all settings deleted and re-added |

So summa summarum only backend address pools and probes seem to now produce a clean output, everything else still acts like everything is deleted and readded in different order.

If this information helps: We are using dynamic blocks (which seems to be the common use case when applying proxy rules via configuration).

If the issue cannot be tackled using sets - wouldn't the API allow us to create separate resources for routing configuration? In most scenarios rules for one central gateway are added from different, distributed projects anyway (kind of like API management, which usually is managed decentrally too).

We were really relying on the issue being resolved with using sets, as I have multiple customers who are dreading the planned changes of multiple thousands of lines (not exaggerated) when a simple rule in their central AGWs is being added in production.

mbfrahry commented 2 years ago

Hi @johannespetereit, unfortunately, what you're seeing is a separate issue than what is being reported on here where no changes being made to the configuration are causing diffs in Terraform. With that in mind, I'm going to mark this conversation as off-topic.

This issue you're seeing is being tracked on the Terraform proper repo and I encourage you to make noise there https://github.com/hashicorp/terraform/issues/28281 or open a separate issue to track what's going on.

johannespetereit commented 2 years ago

@mbfrahry thanks for your reply. I think I'm grasping the issue, but I also think that many, many customers were waiting for an advertised fix with 3.0. I also realize that another attempt will probably not happen until the next major update of this provider, which is far, far away, and that is a bit frustrating. I will ask for the original issue to be opened again. In my view, this has no way of moving forward: azurerm is a downstream API to Terraform, and the Azure API is an upstream API to azurerm. I totally understand Terraform categorizing this as a minor optimization for the Terraform team; the azurerm provider is in charge of handling the core logic (getting the current state and providing the planned state, while Terraform only supplies a diff). In my experience it is not helpful to hope that an upstream API will change on account of a single downstream provider having difficulties getting its API to comply with the contract, and I don't think this paradigm will shift because of "community pressure" from a single provider. We will therefore start looking into alternatives which are still in the "ARM-days" history of our repos.

aport1996 commented 2 years ago

> Hey @Nyxbiker, what was your configuration for application gateway and what did you have to do to your config to get it to line up with the portal? My first thought is that we're not generating the Hash for backend_http_settings correctly so it'd be useful to see which attributes you had to modify to prevent a plan from occurring

Hi @mbfrahry, the config for backend_http_settings is also a dynamic block as below:

dynamic "backend_http_settings" {
  for_each = var.application_gateway_backend_http_settings

  content {
    name                                = backend_http_settings.value["name"]
    host_name                           = backend_http_settings.value["host_name"]
    cookie_based_affinity               = backend_http_settings.value["cookie_based_affinity"]
    affinity_cookie_name                = backend_http_settings.value["affinity_cookie_name"]
    pick_host_name_from_backend_address = backend_http_settings.value["pick_host_name_from_backend_address"]
    port                                = backend_http_settings.value["port"]
    protocol                            = backend_http_settings.value["protocol"]
    probe_name                          = backend_http_settings.value["probe_name"]
    path                                = backend_http_settings.value["path"]
    trusted_root_certificate_names      = backend_http_settings.value["trusted_root_certificate_names"]
    request_timeout                     = backend_http_settings.value["request_timeout"]
  }
}

And I basically just noticed that in the Azure Portal we had some settings with cookie affinity enabled, so I proceeded to align these properties in the code by changing "cookie_based_affinity" to "Enabled" and "affinity_cookie_name" to the cookie name that was set in the Portal.

I think that this specific issue is related to what you were discussing above with johannespetereit and owaisaamir though, and I agree with them that this is a huge issue because especially in big configurations (we have 51 request routing rules in one app gateway only) it becomes super difficult to figure out what has changed, that would be causing the huge Terraform output for one small property difference.

Coming back to the original issue, I noticed in the output that Terraform shows "- priority = 0 -> null" on the request_routing_rule that is to be "removed". So I thought that I should maybe add "priority = 0" to the request routing rules, since Terraform might see it as a difference (although it's marked as an optional property in the docs) and that might cause the huge output. But I then get the error "Error: expected request_routing_rule.49.priority to be in the range (1 - 20000), got 0", so I couldn't test it.

I'm not sure if this is related to the issue that I'm currently experiencing though because I don't have this issue in 1 out of 3 application gateways (that are all using the same parent module with dynamic blocks).

Please note that the initial example is just for one sample routing_rule, but I have this removal and addition issue for all of the request_routing_rules in the affected application gateways. I checked if there were any differences between the code and the portal and I couldn't find any. Also, even after running terraform apply (that should just align everything that isn't) I still have the issue after running terraform plan again, which makes me think that this "priority" property that I see in the output might be causing the issue (but we don't have it set either in the code or the portal). I also tried to ignore the "priority" property in request_routing_rules to see if it would fix the issue, but I can't since lifecycle ignore does not support splat expressions etc.
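A coarser fallback, offered here as a sketch rather than a recommended fix: `ignore_changes` cannot address a single attribute across all blocks, but it does accept the name of a whole nested block. The fragment below shows only the relevant part of the resource (the resource label `app_gw` is illustrative). The trade-off is significant: Terraform then stops managing routing rules on that gateway altogether, so later rule changes in the configuration would no longer be applied.

```hcl
resource "azurerm_application_gateway" "app_gw" {
  # ... all other arguments and blocks unchanged ...

  lifecycle {
    # Assumption: the whole set-typed block is ignored, because an
    # individual attribute inside it (such as priority) cannot be
    # addressed without a splat expression.
    ignore_changes = [
      request_routing_rule,
    ]
  }
}
```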

Huntermsi commented 2 years ago

Hi there. I observe the same issue. There is no way to add a new routing rule, listener, etc. without maintenance and downtime, which makes it very complicated. I tried azurerm versions 2.98.0 and 3.2, and both have this issue.

mahmoudghorbelMG commented 2 years ago

Hi all, I experience the same issue when updating an app gateway with a new configuration (with Terraform v1.1.9 and azurerm v3.8.0). It would be better to add new configuration through a child resource attached to the parent resource (the gateway), like the way a new certificate or secret is added to an existing key vault resource. That allows a separation between Terraform projects: the one that maintains the creation of core (shared) resources (like key vault and app gateway), and the others that maintain the creation of web apps, VMs, etc. in the backend. That also spares us from using dynamic blocks.

eissko commented 2 years ago

@johannespetereit @Nyxbiker @mbfrahry

Do I understand correctly that we have just given up on this? What more can we do to move this forward? @johannespetereit - btw, thanks for the time you invested in analyzing the problem.

nomoresecrets commented 2 years ago

Having the same issue after migration from azurerm 2.x to 3.x

We have dynamic http_listener, request_routing_rule and backend_http_settings blocks, and every time the application gateway gets a new/random order of these items :/

adamrushuk commented 2 years ago

Please could somebody explain the current state of this issue, as our terraform plans can be 1000's lines when we use AppGW. Should we look for alternatives for the time being if currently unfixable, or is this still being assessed?

mahmoudghorbelMG commented 2 years ago

Personally, I can't wait for a "coming soon" version. I am working on a drastic alternative: I am developing my own Go provider that will allow me to add new settings (listener, backend, rules, cert, etc.) as separate resources to an existing app gateway (data) defined in a core project with the azurerm provider. Currently, it works with the HTTP backend. I will add the other settings when I find some free time. I also faced the 429 error (retry later) when calling the Azure API several times in parallel.

Update: This is the current version of the provider I have implemented: https://registry.terraform.io/providers/Citeo/azurermagw/0.3.0 I have the same repo on my own git, but the latest version is on the GitHub of Citeo (where I currently work).

johannespetereit commented 2 years ago

Not really comfortable posting this as a workaround, but maybe it helps someone:

We automated our rules/backends/listeners etc. into variables. Furthermore we use heavy transformation on the properties using locals. (the modules' input variables are high level variables, leaning on the k8s ingress definitions. From these inputs we generate the actual AGW properties in the format AGW consumes them)

With that architecture, not being able to check the end result in the plan is one of our main concerns with this bug.

To get a readable version of the plan, we also push all these changes into a storage table (one table row per property). The table rows provide a meaningful diff in the plan (this is only a text diff and doesn't show 1000s of changes, but the actual ones, as long as you get the order fixed):

resource "azurerm_storage_table_entity" "agw_settings" {
  for_each = { for key, config in {
    http_listeners          = local.http_listeners
    backend_address_pools   = local.backend_address_pools
    redirect_configurations = local.redirect_configurations
    probes                  = local.probes
    backend_http_settings   = local.backend_http_settings
    request_routing_rules   = local.request_routing_rules
    url_path_maps           = local.url_path_maps
  } : key => config }
  storage_account_name = data.azurerm_storage_account.config_storage.name
  table_name           = azurerm_storage_table.rules_table.name
  partition_key        = "main"
  row_key              = each.key
  entity = {
    value = jsonencode(each.value)
  }
}

This can at least give you an idea what is changing before you approve your plan.


We actually write the properties into storage for a second reason: we use a null_resource PS Script which performs all the configuration on AGW using the storage table as source (so we moved away from terraform for the routing configuration). Sadly this script is owned by the customer, so I can't post it here. Using a custom script has the advantage that we can have ignore_property on all rules in AGW, so we know that nothing gets messed up there. But I can't really say I recommend building a custom script, as a clean update (=only update real changes) comes with a massive code overhead, and is error prone.

dsiperek-vendavo commented 2 years ago

Any updates on this issue?

mahmoudghorbelMG commented 2 years ago

I implemented a provider to overcome such behaviors. https://registry.terraform.io/providers/Citeo/azurermagw/0.3.0 I am not a Go expert, but I have done my best as a devops :). Currently, I use it in a dev environment and it works fine. If you can test it and give feedback, I'll be grateful.

torivara commented 1 year ago

Is anyone working on resolving this issue? I am contemplating using PowerShell for plan output parsing here, just to get some reasonable plan results. Has anyone successfully parsed the terraform plan output to calculate the actual changes? It might save some time if someone could share a script I can start with :-)

This is a huge issue and I am surprised it doesn't get more attention. Either this is not a priority to fix, or it is nearly impossible to fix.

For now I will try to parse the plan output, and maybe find some workarounds myself.

eissko commented 1 year ago

@torivara I went through the same phase and was surprised by it. I tried to be active and understand whether this is the real state or whether I had just missed the point. And the outcome was: it is the real state. Still.

People are basically doing workarounds outside of Terraform. This is too critical a resource to rely on such messy plans in production.

Peter

rolandjohann commented 1 year ago

@mbfrahry how can we approach a fix for the time being? Currently this resource is unusable because it deletes all request_routing_rule blocks to create them afterwards in a non atomic operation leading to a downtime of the AGW connected services.

anrub commented 1 year ago

> @mbfrahry how can we approach a fix for the time being? Currently this resource is unusable because it deletes all request_routing_rule blocks to create them afterwards in a non atomic operation leading to a downtime of the AGW connected services.

Are you sure that leads to downtime? I observe a lot of changes in the plan, but the Azure API seems to handle that through an "update" and not a "drop and create". Did you observe different behaviour?

Looking at the REST API, this is the only way it could work. There is only one create or update method, so you can't delete request routing rules in isolation. azurerm just PUTs the updated complete configuration of the instance at the API and azure acts accordingly.

premchavhan99 commented 1 year ago

Hi all, facing issues with application gateway, after importing the application gateway, request_routing_rule, http_listeners and backend_http_setting are getting recreated, is it related to this issue or should I create another ticket for that? terraform version: 1.1.1 azurerm provider version: v3.35.0

jmigone commented 1 year ago

Not sure if it helps anyone, but I had this issue and I discovered that after a recent manual certificate update, tf suddenly started caring if a priority was set wrt request routing rules. As soon as I set the priority, my request routing rules were sorted out.

Terraform v0.12.31

MAmmerlaan commented 1 year ago

Had comparable issues. After applying, a new plan indicated recreation of the request_routing_rule, and had no idea why. Updated to latest terraform and azurerm provider. To no avail (but is always a good start :) )

After hours of debugging, I looked at the state file and found that 'url_path_map_name' was empty, even though I set it on every apply (and it came back as a successful change).

But when you look at the state file, no name was set. I cleared the name (url_path_map_name = "") and ran a plan / apply. Only a single change was applied, and that block is no longer re-applied.

Hope this is helpful to someone.

Kapsztajn commented 1 year ago

Had the same issue with a recreation of request_routing_rule in the application gateway. We upgraded the provider to the newest version and added priority to every request_routing_rule, after that application gateway is not changing.
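For readers using the dynamic block from the original report, setting a priority on every rule might look like the fragment below. This is a minimal sketch, not a confirmed fix: it assumes each element of `var.application_gateway_request_routing_rule` is extended with a hypothetical `priority` key holding a unique value in the 1 to 20000 range that the provider's validation error earlier in this thread reports. The block belongs inside the `azurerm_application_gateway` resource.

```hcl
dynamic "request_routing_rule" {
  for_each = var.application_gateway_request_routing_rule

  content {
    name      = request_routing_rule.value["name"]
    rule_type = request_routing_rule.value["rule_type"]

    # Assumption: a "priority" key has been added to every element of
    # the variable; it is not part of the original configuration.
    priority = request_routing_rule.value["priority"]

    http_listener_name          = request_routing_rule.value["http_listener_name"]
    redirect_configuration_name = request_routing_rule.value["redirect_configuration_name"]
    backend_address_pool_name   = request_routing_rule.value["backend_address_pool_name"]
    backend_http_settings_name  = request_routing_rule.value["backend_http_settings_name"]
    url_path_map_name           = request_routing_rule.value["url_path_map_name"]
  }
}
```

Whether this removes the diff appears to depend on the gateway: some commenters report success, while others still see rules deleted and re-added with priorities set.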

sknaresh2000 commented 1 year ago

I have the same happening in http_listener. Has anyone faced this?

anrub commented 1 year ago

> I have the same happening in http_listener. Has anyone faced this?

Yes, I am not sure, when exactly it happens, as sometimes there are "no changes" and sometimes all listeners are removed/added at one update, although there is no real change to any of them.

LoicC04 commented 1 year ago

I have the same issue on 3.42.0. Applying the resource for the first time creates all requested resources. Then, a simple plan shows that it will destroy the resources and create them again even though there are no changes.

nicktolhurst commented 1 year ago

This renders the plan pretty useless on a very heavily configured application gateway. Without a plan, deployments into production are super risky. Is there any plan to fix this? 😮‍💨

mmohoney commented 1 year ago

On 3.19.1 running into same issue. Our gateway is heavily configured and any changes we make causes the plan to show a create and destroy for each of the blocks. This makes the changes difficult, and a lot of manual reviewing is needed. Would really like to see a fix for this especially on such an important piece of infrastructure.

velmafia commented 1 year ago

The same problem was reported here: https://github.com/hashicorp/terraform-provider-azurerm/issues/6896 where @neil-yechenwei proposed a pull request: https://github.com/hashicorp/terraform-provider-azurerm/pull/7021. The PR was "temporarily closed" (waiting for version 3.0), but in the end it has never been merged.

Is the root cause of this issue the same as in #6896, and can the solution proposed in #7021 fix it? Or is this a different problem?

eissko commented 1 year ago

@mbfrahry where can we follow the progress of, and perhaps help with, the application gateway refactoring you mentioned here - https://github.com/hashicorp/terraform-provider-azurerm/pull/19963#issuecomment-1421208093

Thank you, Peter

cveld commented 1 year ago

Is there any way to work around this behavior? I have two request_routing_rule blocks. In the state the priority is sorted ["20", "10"] but during the plan phase the plan reports ["10", "20"]. This causes a change in every plan run. What would be the property that I can use as a workaround? I am assuming there is some magic sorting implemented in the azurerm provider.

eissko commented 1 year ago

> Is there any way to work around this behavior? I have two request_routing_rule blocks. In the state the priority is sorted ["20", "10"] but during the plan phase the plan reports ["10", "20"]. This causes a change in every plan run. What would be the property that I can use as a workaround? I am assuming there is some magic sorting implemented in the azurerm provider.

There is no straightforward workaround. You can try as mentioned:

odeeka commented 1 year ago

> I implemented a provider to overcome such behaviors. https://registry.terraform.io/providers/Citeo/azurermagw/0.3.0 I am not a Go expert, but I have done my best as a devops :). Currently, I use it in a dev environment and it works fine. If you can test it and give feedback, I'll be grateful.

I tried to use it but got an API error about a 'location' attribute that doesn't exist in the provider.

azurermagw_binding_service.binding-service-resource: Creating...
╷
│ Error: Unable to create the resource. ######## API response = 400
│ {
│   "error": {
│     "code": "LocationRequired",
│     "message": "The location property is required for this definition."
│   }
│ }
│
│   with azurermagw_binding_service.binding-service-resource,
│   on main.tf line 40, in resource "azurermagw_binding_service" "binding-service-resource":
│   40: resource "azurermagw_binding_service" "binding-service-resource" {
│
│ Check the API response

rbev commented 11 months ago

Is there any sort of timeline on this being fixed? this is a pretty frustrating bug

rbev commented 11 months ago

Looking into the plan output as json i do see things like this:

{
  "before": {
    "request_routing_rule": [
      {
        "backend_address_pool_id": "",
        "backend_address_pool_name": "",
        "backend_http_settings_id": "",
        "backend_http_settings_name": "",
        "http_listener_id": "/subscriptions/REDACTED/resourceGroups/REDACTED/providers/Microsoft.Network/applicationGateways/REDACTED/httpListeners/global-default-https-redirect",
        "http_listener_name": "global-default-https-redirect",
        "id": "/subscriptions/REDACTED/resourceGroups/REDACTED/providers/Microsoft.Network/applicationGateways/REDACTED/requestRoutingRules/global-default-https-redirect",
        "name": "global-default-https-redirect",
        "priority": 22,
        "redirect_configuration_id": "/subscriptions/REDACTED/resourceGroups/REDACTED/providers/Microsoft.Network/applicationGateways/REDACTED/redirectConfigurations/global-default-https-redirect",
        "redirect_configuration_name": "global-default-https-redirect",
        "rewrite_rule_set_id": "",
        "rewrite_rule_set_name": "",
        "rule_type": "Basic",
        "url_path_map_id": "",
        "url_path_map_name": ""
      }
    ]
  },
  "after": {
    "request_routing_rule": [
      {
        "backend_address_pool_id": "",
        "backend_address_pool_name": null,
        "backend_http_settings_id": "",
        "backend_http_settings_name": null,
        "http_listener_id": "/subscriptions/REDACTED/resourceGroups/REDACTED/providers/Microsoft.Network/applicationGateways/REDACTED/httpListeners/global-default-https-redirect",
        "http_listener_name": "global-default-https-redirect",
        "id": "/subscriptions/REDACTED/resourceGroups/REDACTED/providers/Microsoft.Network/applicationGateways/REDACTED/requestRoutingRules/global-default-https-redirect",
        "name": "global-default-https-redirect",
        "priority": 22,
        "redirect_configuration_id": "/subscriptions/REDACTED/resourceGroups/REDACTED/providers/Microsoft.Network/applicationGateways/REDACTED/redirectConfigurations/global-default-https-redirect",
        "redirect_configuration_name": "global-default-https-redirect",
        "rewrite_rule_set_id": "",
        "rewrite_rule_set_name": null,
        "rule_type": "Basic",
        "url_path_map_id": "",
        "url_path_map_name": null
      }
    ]
  }
}

Is it just that the provider is using null for the omitted variables and Azure is sending back an empty string? It can't be ordering, because this one is the first item in both lists and it always shows as a delete/create.
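
One rough way to check that hypothesis is to compare a single before/after pair from the plan JSON and separate fields that differ only as empty string versus null from fields that really differ. This is only a sketch: `classify_diff` is a made-up helper name, and the sample dictionaries are a trimmed copy of the rule pasted above.

```python
def classify_diff(before: dict, after: dict) -> dict:
    """Split differing fields into 'empty_vs_null' and 'real' buckets."""

    def is_empty(value):
        # Treat "" and null (None in Python) as the same "not set" value.
        return value is None or value == ""

    result = {"empty_vs_null": [], "real": []}
    for key in sorted(set(before) | set(after)):
        b, a = before.get(key), after.get(key)
        if b == a:
            continue  # identical, nothing to report
        if is_empty(b) and is_empty(a):
            result["empty_vs_null"].append(key)
        else:
            result["real"].append(key)
    return result


# Trimmed copy of the rule from the plan output above.
before = {
    "name": "global-default-https-redirect",
    "priority": 22,
    "rule_type": "Basic",
    "backend_address_pool_name": "",
    "backend_http_settings_name": "",
    "rewrite_rule_set_name": "",
    "url_path_map_name": "",
}
after = {
    "name": "global-default-https-redirect",
    "priority": 22,
    "rule_type": "Basic",
    "backend_address_pool_name": None,
    "backend_http_settings_name": None,
    "rewrite_rule_set_name": None,
    "url_path_map_name": None,
}

print(classify_diff(before, after))
```

For the rule above, every differing field lands in the `empty_vs_null` bucket and `real` stays empty, which fits the empty-string-versus-null reading but does not by itself prove that this is what triggers the delete/create.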

samrobillard commented 9 months ago

I'm having the same issue but with http_listeners where it always recreates the listeners because the host_name changes to null for some reason.

Nopesound commented 7 months ago

I don't know if this helps, but I had the same problem and solved it this way. I opened the state file and replicated the exact order of both the configuration blocks and the properties they contain. I backed out all the changes I had made in the configuration (of 4 rules only one had changed, which is an important detail) and kept only the one change that had been made through the portal, which was present in neither the state nor the Terraform configuration. With that, the plan no longer showed the rules being deleted. I then reintroduced all my changes to the rule and reran the plan, and Terraform gave me the same scenario again: all the routing rules had to be deleted and then recreated.

After an afternoon of banging our heads against this with a colleague, we concluded that since the routing rules are an array, modifying their order, or even the properties of a single rule, triggers this behaviour: the complete removal of all elements and their recreation. At this point a question arises: is there a latency between writing a routing rule and its actual implementation?

IopenDoor commented 5 months ago

I was able to fix it with azurerm 3.92

andyr8939 commented 3 months ago

I had this issue with the latest provider 3.100.4, and after far too long troubleshooting I found the cause was incorrect backend settings on a path-based routing rule.

Basically, the routing rule had recently been changed from Basic to path-based, so a url_path_map was added. But the backend settings had been left in the request_routing_rule section, where they are not used when you do PathBasedRouting. Instead, they move into the url_path_map and form part of the default.

As an example, I commented out the two lines shown commented below, and that solved my problem.

  request_routing_rule {
    name                       = "routerule-webapp-443"
    rule_type                  = "PathBasedRouting"
    http_listener_name         = "listener-webapp-443"
    # backend_address_pool_name  = "bepool-webapp-empty"
    # backend_http_settings_name = "behttp-webapp-443"
    priority                   = "340"
    url_path_map_name          = "urlmap-webapp-443"
  }

  url_path_map {
    name                               = "urlmap-webapp-443"
    default_backend_address_pool_name  = "bepool-webapp-empty"
    default_backend_http_settings_name = "behttp-webapp-443"

    path_rule {
      name                       = "frontend"
      backend_address_pool_name  = "bepool-webapp"
      backend_http_settings_name = "behttp-webapp-443"

      paths = [
        "/*",
      ]
    }
  }
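
If you want to scan for the same mistake, here is a small sketch that flags PathBasedRouting rules which still carry backend settings directly on the rule. It is based only on the observation above, not on anything the provider documents; `misplaced_backend_settings` is a made-up name, and the rule dictionaries are assumed to be `request_routing_rule` entries taken from state or plan JSON.

```python
def misplaced_backend_settings(rules):
    """Return (rule name, offending fields) for path-based rules that still
    set backend fields directly on the rule."""
    backend_fields = ("backend_address_pool_name", "backend_http_settings_name")
    flagged = []
    for rule in rules:
        if rule.get("rule_type") != "PathBasedRouting":
            continue  # Basic rules are expected to carry these fields
        # rule.get(f) is falsy for both "" and None, so unset fields are skipped.
        extras = [f for f in backend_fields if rule.get(f)]
        if extras:
            flagged.append((rule.get("name"), extras))
    return flagged


rules = [
    {
        "name": "routerule-webapp-443",
        "rule_type": "PathBasedRouting",
        "backend_address_pool_name": "bepool-webapp-empty",
        "backend_http_settings_name": "behttp-webapp-443",
        "url_path_map_name": "urlmap-webapp-443",
    },
]
print(misplaced_backend_settings(rules))
```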

dsczltch commented 2 months ago

@mbfrahry I can see this ticket has been assigned to you since last year. Could you please ask the App Gateway team to improve the Azure Application Gateway API and fix this Terraform provider? Our team is waiting for this fix: the service is difficult to update with the current behaviour, and most of our downtime comes from this.

GitHub ticket https://github.com/hashicorp/terraform-provider-azurerm/issues/6896 opened in 2020 did not fix the root issue even though it's closed. :/

chuncheungy commented 2 months ago

The latest Terraform documentation for application_gateway says:

The backend_address_pool, backend_http_settings, http_listener, private_link_configuration, request_routing_rule, redirect_configuration, probe, ssl_certificate, and frontend_port properties are Sets as the service API returns these lists of objects in a different order from how the provider sends them. As Sets are stored using a hash, if one value is added or removed from the Set, Terraform considers the entire list of objects changed and the plan shows that it is removing every value in the list and re-adding it with the new information. Though Terraform is showing all the values being removed and re-added, we are not actually removing anything unless the user specifies a removal in the configfile.

Do we really have downtime when modifying a part of the rules even though in the Terraform plan it "removes and adds" the whole set of rules? I ran a curl loop to test accessibility and did not notice any errors during terraform apply.
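
To make the quoted explanation a bit more concrete, here is a toy model of a set whose elements are keyed by a hash of their content. It is not the provider's or Terraform's actual hashing code (`element_key` and `set_diff` are invented for illustration); it only shows why a set has no notion of "update this element in place", while a pure reordering produces no diff at all.

```python
import hashlib
import json


def element_key(element: dict) -> str:
    """Key an element by a hash of its full content (toy stand-in for a set hash)."""
    payload = json.dumps(element, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def set_diff(before: list, after: list):
    """Diff two hash-keyed sets: elements can only be removed or added."""
    old = {element_key(e): e for e in before}
    new = {element_key(e): e for e in after}
    removed = [old[k] for k in old.keys() - new.keys()]
    added = [new[k] for k in new.keys() - old.keys()]
    return removed, added


before = [
    {"name": "rule-a", "priority": 10},
    {"name": "rule-b", "priority": 20},
]

# Same elements in a different order: keys are unchanged, so nothing is reported.
reordered = [before[1], before[0]]
print(set_diff(before, reordered))

# One field changed on rule-a: its key changes, so it shows as remove + add.
edited = [
    {"name": "rule-b", "priority": 20},
    {"name": "rule-a", "priority": 11},
]
print(set_diff(before, edited))
```

In this toy model only the edited element is reported. The real plan shows every element being replaced, as the documentation describes, so there is more going on in the provider than this sketch captures.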

dsczltch commented 2 months ago

@chuncheungy Since the dry run is unreadable, misconfigurations made by engineers are impossible to spot in it, and those are what produce downtime. We have multiple app gateways across multiple applications and environments. Errors happen, and the current implementation of the App Gateway API and its Terraform provider prevents us from identifying this kind of error, which is the main objective of IaC and a dry run.
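
As a stopgap for the unreadable plan, the rules can be paired by name outside Terraform. The sketch below assumes the two lists come from `change.before` and `change.after` under `resource_changes` in `terraform show -json` output; `summarize_rule_changes` is a hypothetical helper, and the sample data is invented.

```python
def summarize_rule_changes(before_rules, after_rules):
    """Pair request_routing_rule entries by name and list fields that really differ."""

    def norm(value):
        # Treat "" and None as the same "not set" value.
        return None if value == "" else value

    old = {r["name"]: r for r in before_rules or []}
    new = {r["name"]: r for r in after_rules or []}
    summary = {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": {},
    }
    for name in sorted(old.keys() & new.keys()):
        fields = {
            key: (old[name].get(key), new[name].get(key))
            for key in sorted(set(old[name]) | set(new[name]))
            if norm(old[name].get(key)) != norm(new[name].get(key))
        }
        if fields:
            summary["changed"][name] = fields
    return summary


# Invented sample: rule "b" changes priority, rule "c" is new, rule "a" only
# differs by empty string versus null.
before_rules = [
    {"name": "a", "priority": 1, "url_path_map_name": ""},
    {"name": "b", "priority": 2},
]
after_rules = [
    {"name": "b", "priority": 3},
    {"name": "a", "priority": 1, "url_path_map_name": None},
    {"name": "c", "priority": 4},
]
print(summarize_rule_changes(before_rules, after_rules))
```

One caveat: values that are only known after apply are left out of `after` in the plan JSON, so they would show up here as changed. Filtering those via `after_unknown` is left out of this sketch.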

tpcgold commented 2 weeks ago

How is it possible that Microsoft still hasn't fixed this issue (still present with 4.0.1)? It's pretty annoying when an App Gateway is deployed and the "PathBasedRouting" is not working!

E.g. when using it with AKS, this means the whole cluster needs to be destroyed in order to destroy and redeploy the gateway!

There is no way to switch from "Basic" routing to "PathBasedRouting", as it's a deadlock situation (if it doesn't deploy correctly, and whether it works with the same script seems to be totally random).