hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

Creating new firewall rule with azurerm_firewall_policy_rule_collection_group force existing rule to be recreated. #18114

Open juicybaba opened 2 years ago

juicybaba commented 2 years ago

Is there an existing issue for this?

Community Note

Terraform Version

1.1.9

AzureRM Provider Version

3.19.1

Affected Resource(s)/Data Source(s)

azurerm_firewall_policy_rule_collection_group

Terraform Configuration Files

resource "azurerm_firewall_policy_rule_collection_group" "example" {
  name               = "example-fwpolicy-rcg"
  firewall_policy_id = azurerm_firewall_policy.example.id
  priority           = 500

  network_rule_collection {
    name     = "rule 1"
    …
  }

  network_rule_collection {
    name     = "rule 2"
    …
  }

  network_rule_collection { # <- this is the new rule
    name     = "rule 3"
    …
  }

}

Debug Output/Panic Output

No matter whether the new rule is added after, between, or before the existing rules, the plan always replaces rule 2 with rule 3 and recreates rule 2. This happens in `azurerm 3.0` only.

Expected Behaviour

rule 3 should be added without impacting existing rules.

This may be the cause. I tried to find a way to insert rule 3 at the end of the list, as mentioned above, but I couldn't make it work.

Actual Behaviour

No response

Steps to Reproduce

No response

Important Factoids

No response

References

No response

wuxu92 commented 2 years ago

Hi @henryxuteck, just a quick confirmation: does azurerm 2.x work as you expect, and did the recreate action first appear in 3.x?

juicybaba commented 2 years ago

Hi @henryxuteck, just a quick confirmation: does azurerm 2.x work as you expect, and did the recreate action first appear in 3.x?

Yes, 2.0 works fine, 3.0 doesn't.

wuxu92 commented 2 years ago

Hi @juicybaba, I have tried your configuration, adding rule 3 after rule 2, and I don't see rule 2 being recreated, so I'm not sure how to reproduce "the plan always replaces rule2 with rule3 and recreates rule 2". I have reviewed the result in the Azure Portal and the rules' order is the same as listed in the Terraform configuration.

In the azurerm 3.0 update, the Firewall resources changed these nested items from Set to List, but that should not cause this kind of behavior.

juicybaba commented 2 years ago

@wuxu92 thanks for getting back to me, and sorry for my bad example. I think the root cause is that no matter which order the network rule collections are passed in, they are re-ordered alphabetically. For example, when appending network rule collection 2 to existing network rule collections 1 and 3, Terraform will replace collection 3 with collection 2 and then recreate collection 3.

And again, this happens in azurerm 3.0

My interpretation of "order matters" in the azurerm 3.0 upgrade guide quoted below is that it refers to the order in which the rule collections are fed in, not the name of each rule collection. Maybe I am wrong.

Firewall: The behavior of the nested items will be changed to List instead of Sets where required, meaning that the order of these items now matters. Note that if you're referencing these nested items within your Terraform Configuration, then this may require some code changes.

image
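For reference, a minimal sketch (with hypothetical collection names) of why declaration order in a map does not survive: Terraform always iterates map keys in lexicographic order, regardless of the order in which they are written.

```hcl
locals {
  # Declared out of order on purpose (hypothetical names).
  collections = {
    network_rule_collections_1 = { priority = 30000 }
    network_rule_collections_3 = { priority = 40000 }
    network_rule_collections_2 = { priority = 40000 }
  }

  # keys() — like for_each and dynamic blocks — walks the map in
  # lexicographic key order, so "_2" lands between "_1" and "_3",
  # shifting "_3" one position later in the generated list of blocks.
  ordered = keys(local.collections)
}
```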

wuxu92 commented 2 years ago

I have tried adding xxx2 after xxx3; the resulting order matches the configuration and no recreation happened.

image

Could you please share your full azurerm_firewall_policy_rule_collection_group configuration before and after applying, so we can see the difference?

wuxu92 commented 2 years ago

Here is the original PR for the change from Set to List, and it looks fine to me: https://github.com/hashicorp/terraform-provider-azurerm/pull/15008

juicybaba commented 2 years ago

@wuxu92 thanks for getting back to me. Please see the full configuration for azurerm_firewall_policy, azurerm_firewall_policy_rule_collection_group and local.firewall_rule_collection_groups. The only difference between before and after is that a new network rule collection, network_rule_collections_2, is added to local.firewall_rule_collection_groups.

Perhaps the iteration in the dynamic block re-ordered the rules?

Before:

resource "azurerm_firewall_policy" "firewall_test" {
  resource_group_name = module.resourcegroup.resource_group.name
  location            = module.resourcegroup.resource_group.location
  base_policy_id      = null
  name                = format("fwpol-test")
  # https://github.com/hashicorp/terraform-provider-azurerm/issues/9620
  tags                     = tomap({ for key, val in module.resourcegroup.resource_group.tags : lower(key) => val })
  sku                      = "Standard"
  threat_intelligence_mode = "Deny"
  dns { proxy_enabled = true }
}

resource "azurerm_firewall_policy_rule_collection_group" "default" {
  for_each           = local.firewall_rule_collection_groups
  name               = format("fwrcg-%s", each.key)
  firewall_policy_id = azurerm_firewall_policy.firewall_test.id
  priority           = each.value.priority

  dynamic "network_rule_collection" {
    iterator = self
    for_each = each.value.network_rule_collections
    content {
      name     = format("nrc-%s", self.key)
      priority = self.value.priority
      action   = self.value.action
      dynamic "rule" {
        for_each = self.value.rules
        content {
          name              = rule.key
          protocols         = rule.value.protocols
          destination_ports = rule.value.destination_ports

          destination_addresses = rule.value.destination_addresses
          destination_ip_groups = rule.value.destination_ip_groups
          destination_fqdns     = rule.value.destination_fqdns

          source_addresses = rule.value.source_addresses
          source_ip_groups = rule.value.source_ip_groups
        }
      }
    }
  }

  dynamic "application_rule_collection" {
    iterator = self
    for_each = each.value.application_rule_collections

    content {
      name     = format("arc-%s", self.key)
      priority = self.value.priority
      action   = self.value.action
      dynamic "rule" {
        for_each = lookup(self.value, "rules", {})
        content {
          name = rule.key

          destination_fqdns     = rule.value.destination_fqdns
          destination_fqdn_tags = rule.value.destination_fqdn_tags

          source_addresses = rule.value.source_addresses
          source_ip_groups = rule.value.source_ip_groups

          terminate_tls = false
          dynamic "protocols" {
            for_each = rule.value.protocols
            content {
              type = protocols.value.type
              port = protocols.value.port
            }
          }
        }
      }
    }
  }

  dynamic "nat_rule_collection" {
    iterator = self
    for_each = each.value.nat_rule_collections
    content {
      name     = format("natrc-%s", self.key)
      priority = self.value.priority
      action   = "Dnat"
      dynamic "rule" {
        for_each = lookup(self.value, "rules", {})
        content {
          name      = rule.key
          protocols = rule.value.protocols

          destination_address = rule.value.destination_address
          destination_ports   = rule.value.destination_ports
          translated_address  = rule.value.translated_address
          translated_port     = rule.value.translated_port

          source_addresses = rule.value.source_addresses
          source_ip_groups = rule.value.source_ip_groups
        }
      }
    }
  }

}

locals {
  firewall_rule_collection_groups = {
    landingzones = {
      priority = 40000
      network_rule_collections = {
        network_rule_collections_1 = {
          priority = 30000
          action   = "Allow"
          rules = {
            rule1 = {
              protocols             = ["TCP"]
              destination_ports     = ["*"]
              destination_addresses = ["10.0.0.0/8"]
              destination_ip_groups = null
              destination_fqdns     = null
              source_addresses      = ["10.0.0.0/8"]
              source_ip_groups      = null
            }
            rule2 = {
              protocols             = ["TCP"]
              destination_ports     = ["*"]
              destination_addresses = ["11.0.0.0/8"]
              destination_ip_groups = null
              destination_fqdns     = null
              source_addresses      = ["11.0.0.0/8"]
              source_ip_groups      = null
            }

          }
        },
        network_rule_collections_3 = {
          priority = 40000
          action   = "Allow"
          rules = {
            rule1 = {
              protocols             = ["TCP"]
              destination_ports     = ["*"]
              destination_addresses = ["10.0.0.0/8"]
              destination_ip_groups = null
              destination_fqdns     = null
              source_addresses      = ["10.0.0.0/8"]
              source_ip_groups      = null
            }

            rule2 = {
              protocols             = ["TCP"]
              destination_ports     = ["*"]
              destination_addresses = ["11.0.0.0/8"]
              destination_ip_groups = null
              destination_fqdns     = null
              source_addresses      = ["11.0.0.0/8"]
              source_ip_groups      = null
            }

          }
        }

      }
      application_rule_collections = {}
      nat_rule_collections         = {}
    }
  }
}

After:

resource "azurerm_firewall_policy" "firewall_test" {
  resource_group_name = module.resourcegroup.resource_group.name
  location            = module.resourcegroup.resource_group.location
  base_policy_id      = null
  name                = format("fwpol-test")
  # https://github.com/hashicorp/terraform-provider-azurerm/issues/9620
  tags                     = tomap({ for key, val in module.resourcegroup.resource_group.tags : lower(key) => val })
  sku                      = "Standard"
  threat_intelligence_mode = "Deny"
  dns { proxy_enabled = true }
}

resource "azurerm_firewall_policy_rule_collection_group" "default" {
  for_each           = local.firewall_rule_collection_groups
  name               = format("fwrcg-%s", each.key)
  firewall_policy_id = azurerm_firewall_policy.firewall_test.id
  priority           = each.value.priority

  dynamic "network_rule_collection" {
    iterator = self
    for_each = each.value.network_rule_collections
    content {
      name     = format("nrc-%s", self.key)
      priority = self.value.priority
      action   = self.value.action
      dynamic "rule" {
        for_each = self.value.rules
        content {
          name              = rule.key
          protocols         = rule.value.protocols
          destination_ports = rule.value.destination_ports

          destination_addresses = rule.value.destination_addresses
          destination_ip_groups = rule.value.destination_ip_groups
          destination_fqdns     = rule.value.destination_fqdns

          source_addresses = rule.value.source_addresses
          source_ip_groups = rule.value.source_ip_groups
        }
      }
    }
  }

  dynamic "application_rule_collection" {
    iterator = self
    for_each = each.value.application_rule_collections

    content {
      name     = format("arc-%s", self.key)
      priority = self.value.priority
      action   = self.value.action
      dynamic "rule" {
        for_each = lookup(self.value, "rules", {})
        content {
          name = rule.key

          destination_fqdns     = rule.value.destination_fqdns
          destination_fqdn_tags = rule.value.destination_fqdn_tags

          source_addresses = rule.value.source_addresses
          source_ip_groups = rule.value.source_ip_groups

          terminate_tls = false
          dynamic "protocols" {
            for_each = rule.value.protocols
            content {
              type = protocols.value.type
              port = protocols.value.port
            }
          }
        }
      }
    }
  }

  dynamic "nat_rule_collection" {
    iterator = self
    for_each = each.value.nat_rule_collections
    content {
      name     = format("natrc-%s", self.key)
      priority = self.value.priority
      action   = "Dnat"
      dynamic "rule" {
        for_each = lookup(self.value, "rules", {})
        content {
          name      = rule.key
          protocols = rule.value.protocols

          destination_address = rule.value.destination_address
          destination_ports   = rule.value.destination_ports
          translated_address  = rule.value.translated_address
          translated_port     = rule.value.translated_port

          source_addresses = rule.value.source_addresses
          source_ip_groups = rule.value.source_ip_groups
        }
      }
    }
  }

}

locals {
  firewall_rule_collection_groups = {
    landingzones = {
      priority = 40000
      network_rule_collections = {
        network_rule_collections_1 = {
          priority = 30000
          action   = "Allow"
          rules = {
            rule1 = {
              protocols             = ["TCP"]
              destination_ports     = ["*"]
              destination_addresses = ["10.0.0.0/8"]
              destination_ip_groups = null
              destination_fqdns     = null
              source_addresses      = ["10.0.0.0/8"]
              source_ip_groups      = null
            }
            rule2 = {
              protocols             = ["TCP"]
              destination_ports     = ["*"]
              destination_addresses = ["11.0.0.0/8"]
              destination_ip_groups = null
              destination_fqdns     = null
              source_addresses      = ["11.0.0.0/8"]
              source_ip_groups      = null
            }

          }
        },
        network_rule_collections_3 = {
          priority = 40000
          action   = "Allow"
          rules = {
            rule1 = {
              protocols             = ["TCP"]
              destination_ports     = ["*"]
              destination_addresses = ["10.0.0.0/8"]
              destination_ip_groups = null
              destination_fqdns     = null
              source_addresses      = ["10.0.0.0/8"]
              source_ip_groups      = null
            }

            rule2 = {
              protocols             = ["TCP"]
              destination_ports     = ["*"]
              destination_addresses = ["11.0.0.0/8"]
              destination_ip_groups = null
              destination_fqdns     = null
              source_addresses      = ["11.0.0.0/8"]
              source_ip_groups      = null
            }

          }
        },
        network_rule_collections_2 = {
          priority = 40000
          action   = "Allow"
          rules = {
            rule1 = {
              protocols             = ["TCP"]
              destination_ports     = ["*"]
              destination_addresses = ["10.0.0.0/8"]
              destination_ip_groups = null
              destination_fqdns     = null
              source_addresses      = ["10.0.0.0/8"]
              source_ip_groups      = null
            }

            rule2 = {
              protocols             = ["TCP"]
              destination_ports     = ["*"]
              destination_addresses = ["12.0.0.0/8"]
              destination_ip_groups = null
              destination_fqdns     = null
              source_addresses      = ["12.0.0.0/8"]
              source_ip_groups      = null
            }

          }
        }

      }
      application_rule_collections = {}
      nat_rule_collections         = {}
    }
  }
}

jr8279 commented 1 year ago

Has anyone found a solution for this issue? We are upgrading to a premium firewall, which requires us to move to Azure Firewall Policy and Firewall Rule Collection Groups. Currently, with classic rules, each rule collection gets its own tf file, which allows for easy management of our rules at scale.

I've attempted a solution similar to juicybaba's, with identical results: yamldecode orders the objects lexicographically by key, and that forces all of our rule collections to be rewritten almost every time a new rule collection is added. Looking at the tfstate file, the Rule Collection Group gets an Azure Resource ID, but the rule collections do not; they are just an ordered list of objects.

I'm stumped as to how our operations team can manage our 40+ rule collections without keeping them all in a single variable map and only adding new rule collections at the bottom.

rgb000 commented 1 year ago

We have the same issue here. As we can see in the Azure Portal, each time we create a rule collection group it ends up with its own specific, unpredictable rule and collection order which cannot be changed, and if this order does not match the one defined in Terraform, Terraform tries to change it. We have a Terraform module for azurerm_firewall_policy_rule_collection_group; using the same module we added rules for two different environments and got a different rule order in each, and thus an unstable state:

├── rule-collection-group-dev
│   ├── dnat-rule-collection-dev-1
│   │   ├── rule1
│   │   └── rule2
│   ├── network-rule-collection-dev-1
│   │   ├── rule1
│   │   ├── rule2
│   │   └── rule3
│   └── network-rule-collection-dev-2
│       ├── rule1
│       ├── rule2
│       └── rule3
└── rule-collection-group-qa
    ├── dnat-rule-collection-qa-1
    │   ├── rule1
    │   └── rule2
    ├── network-rule-collection-qa-1
    │   ├── rule1
    │   ├── rule2
    │   └── rule3
    └── network-rule-collection-qa-2
        ├── rule1
        ├── rule2
        └── rule3

├── rule-collection-group-dev
│   ├── network-rule-collection-dev-1
│   │   ├── rule2
│   │   ├── rule1
│   │   └── rule3
│   ├── dnat-rule-collection-dev-1
│   │   ├── rule1
│   │   └── rule2
│   └── network-rule-collection-dev-2
│       ├── rule3
│       ├── rule1
│       └── rule2
└── rule-collection-group-qa
    ├── dnat-rule-collection-qa-1
    │   ├── rule1
    │   └── rule2
    ├── network-rule-collection-qa-2
    │   ├── rule3
    │   ├── rule2
    │   └── rule1
    └── network-rule-collection-qa-1
        ├── rule1
        ├── rule2
        └── rule3

Then, if you manually delete the collection group from the portal and run Terraform again, the order can be different again. So, our opinion here:

  1. There is some problem with the way the azurerm_firewall_policy_rule_collection_group resource interacts with the Azure API (the order in which objects are created/appended to the resource), or
  2. There is some problem with the Azure Firewall Policy service itself (each rule gets its own order index when added, without any sorting in place).

But based on our observation of the Azure Firewall Policy service, each new rule/collection/collection group is always appended (added at the end) no matter what the name/type is. Thus we think this is an option 1 issue.

jr8279 commented 1 year ago

There are several issues open with HashiCorp for the azurerm_firewall_policy_rule_collection_group resource. As I've worked on this issue and talked to Microsoft reps about it, I think the problem is with how Microsoft designed Firewall Policy Rule Collection Groups, not with how Terraform deploys them.

https://github.com/hashicorp/terraform-provider-azurerm/issues/16586

Microsoft made the decision with Rule Collection Groups to make the collections and their rules an ordered list rather than making each collection its own resource. This means there isn't a way for Azure to identify each collection individually, so if the list changes it must recreate the list. Also notice in the Azure Portal that the Rule Collections in a Rule Collection Group are displayed in reverse order of how they are processed (Application, then Network, then NAT).

image

To reduce the potential impact to our deployments we broke the collections into individual yml files, and when we pull the collections together we sort them by priority. There will still be periodic changes to the list as we clean up old rules, but it is a higher probability that new rules will be placed at the bottom of the priority list.
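A minimal sketch of that approach (the file layout, field names, and the uniqueness of priorities are all assumptions on my part, not from the thread):

```hcl
locals {
  # Hypothetical layout: one rule collection per yml file under rules/,
  # each with at least a numeric "priority" field.
  raw = [
    for f in fileset(path.module, "rules/*.yml") :
    yamldecode(file("${path.module}/${f}"))
  ]

  # Order by priority. Zero-padding makes the lexicographic sort()
  # behave numerically; assumes priorities are unique across collections.
  sorted_collections = [
    for p in sort([for c in local.raw : format("%08d", c.priority)]) :
    one([for c in local.raw : c if format("%08d", c.priority) == p])
  ]
}
```

With the collections in a stable, priority-sorted list, a newly added collection with the highest priority number lands at the end, so existing entries keep their positions.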

Microsoft has also confirmed with me that when updates are made to the firewall rules, the instance of the firewall will be removed from the pool, have the updates made to it, and then the instance will be returned to the pool before moving to the next firewall instance (a 1-2 minute process per instance). So any restructuring of the collections and rules will not impact production.

ozen10 commented 1 year ago

Microsoft has also confirmed with me that when updates are made to the firewall rules, the instance of the firewall will be removed from the pool, have the updates made to it, and then the instance will be returned to the pool before moving to the next firewall instance (a 1-2 minute process per instance). So any restructuring of the collections and rules will not impact production.

Hi @jr8279, just checking if I got this correctly: there should be no downtime on network connectivity even while the collection groups and rules are being recreated?

jr8279 commented 1 year ago

According to Microsoft, there shouldn't be downtime. But depending on how many firewall instances you are using, there could be a period of time where some traffic is using the new rule list and some traffic is using the old rule list. They told me it takes 1-2 minutes per instance to update.

wuxu92 commented 10 months ago

@juicybaba The cause should be that maps in a Terraform configuration are automatically ordered alphabetically by key. Say we define a local variable like:

    rc_map = {
        "g1" = {
            "name" = "g1"
        }
        "g3" = {
            "name" = "g3"
        }
        "g2" = {
            "name" = "g2"
        }
    }

In the output of terraform console, the keys are reordered:

terraform console
> local.rc_map
{
  "g1" = {
    "name" = "g1"
  }
  "g2" = {
    "name" = "g2"
  }
  "g3" = {
    "name" = "g3"
  }
}

So if we want the rule_collection blocks to keep their order, we have to define the local variable as a list and then loop over the list.

A local variable like the one below (changing network_rule_collections to a list):

locals {
  firewall_rule_collection_groups_list = {
    landingzones = {
      priority = 40000
      network_rule_collections = [
        {
          key      = "network_rule_collections_1"
          priority = 30000
          action   = "Allow"
          rules = {
            rule1 = {
              protocols             = ["TCP"]
              destination_ports     = ["*"]
              destination_addresses = ["10.0.0.0/8"]
              destination_ip_groups = null
              destination_fqdns     = null
              source_addresses      = ["10.0.0.0/8"]
              source_ip_groups      = null
            }
            rule2 = {
              protocols             = ["TCP"]
              destination_ports     = ["*"]
              destination_addresses = ["11.0.0.0/8"]
              destination_ip_groups = null
              destination_fqdns     = null
              source_addresses      = ["11.0.0.0/8"]
              source_ip_groups      = null
            }

          }
        },
        {
key      = "network_rule_collections_3"
          priority = 40000
          action   = "Allow"
          rules = {
            rule1 = {
              protocols             = ["TCP"]
              destination_ports     = ["*"]
              destination_addresses = ["10.0.0.0/8"]
              destination_ip_groups = null
              destination_fqdns     = null
              source_addresses      = ["10.0.0.0/8"]
              source_ip_groups      = null
            }

            rule2 = {
              protocols             = ["TCP"]
              destination_ports     = ["*"]
              destination_addresses = ["11.0.0.0/8"]
              destination_ip_groups = null
              destination_fqdns     = null
              source_addresses      = ["11.0.0.0/8"]
              source_ip_groups      = null
            }

          }
        },
        {
          key      = "network_rule_collections_2"
          priority = 40000
          action   = "Allow"
          rules = {
            rule1 = {
              protocols             = ["TCP"]
              destination_ports     = ["*"]
              destination_addresses = ["10.0.0.0/8"]
              destination_ip_groups = null
              destination_fqdns     = null
              source_addresses      = ["10.0.0.0/8"]
              source_ip_groups      = null
            }

            rule2 = {
              protocols             = ["TCP"]
              destination_ports     = ["*"]
              destination_addresses = ["12.0.0.0/8"]
              destination_ip_groups = null
              destination_fqdns     = null
              source_addresses      = ["12.0.0.0/8"]
              source_ip_groups      = null
            }

          }
        }
      ]
      application_rule_collections = {}
      nat_rule_collections         = {}
    }
  }
}

And the new resource configuration consumes the list in the dynamic collection blocks. Appending new items to the list will then no longer show a recreate operation in the terraform plan output (though the plan output alone does not prove there is no replace/recreate on the Azure Firewall service side; the comments above give more details):

resource "azurerm_firewall_policy_rule_collection_group" "default2" {
  for_each           = local.firewall_rule_collection_groups_list
  name               = format("fwrcg-%s", each.key)
  firewall_policy_id = azurerm_firewall_policy.firewall_test.id
  priority           = each.value.priority

  dynamic "network_rule_collection" {
    iterator = self
    for_each = {for idx, v in each.value.network_rule_collections: idx => v}
    content {
      name     = format("nrc-%s", self.value.key)
      priority = self.value.priority
      action   = self.value.action
      dynamic "rule" {
        for_each = self.value.rules
        content {
          name              = rule.key
          protocols         = rule.value.protocols
          destination_ports = rule.value.destination_ports

          destination_addresses = rule.value.destination_addresses
          destination_ip_groups = rule.value.destination_ip_groups
          destination_fqdns     = rule.value.destination_fqdns

          source_addresses = rule.value.source_addresses
          source_ip_groups = rule.value.source_ip_groups
        }
      }
    }
  }
}
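As a side note (an untested simplification, not from the thread): a dynamic block's `for_each` also accepts a list directly, in which case the index-to-map conversion can be dropped; `self.value` is then the list element and the declared order is preserved.

```hcl
  dynamic "network_rule_collection" {
    iterator = self
    # A list keeps its declared order; self.key is the index,
    # self.value is the element.
    for_each = each.value.network_rule_collections
    content {
      name     = format("nrc-%s", self.value.key)
      priority = self.value.priority
      action   = self.value.action
      # rule blocks as in the configuration above
    }
  }
```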