hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0
4.62k stars 4.65k forks source link

azurerm_firewall_policy_rule_collection_group - deleting a rule results in all rules above showing as deletes and reconverging to new rules in plan #23292

Open kahawai-sre opened 1 year ago

kahawai-sre commented 1 year ago

Is there an existing issue for this?

Community Note

Terraform Version

1.5.7

AzureRM Provider Version

3.72.0

Affected Resource(s)/Data Source(s)

azurerm_firewall_policy_rule_collection_group - inline network_rule_collection

Terraform Configuration Files

terraform {
  required_version = ">= 1.5.0"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "= 3.72.0"
    }
  }
}

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "australiaeast"
}

resource "azurerm_firewall_policy" "example" {
  name                = "example-fwpolicy"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
}

resource "azurerm_firewall_policy_rule_collection_group" "example" {
  name               = "example-fwpolicy-rcg"
  firewall_policy_id = azurerm_firewall_policy.example.id
  priority           = 500

  network_rule_collection {
    name     = "network_rule_collection1"
    priority = 400
    action   = "Deny"
    rule {
      name                  = "rule1"
      protocols             = ["TCP", "UDP"]
      source_addresses      = ["10.0.0.1"]
      destination_addresses = ["192.168.1.1", "192.168.1.2"]
      destination_ports     = ["80", "1000-2000"]
    }
    rule {
      name                  = "rule2"
      protocols             = ["TCP", "UDP"]
      source_addresses      = ["10.0.0.2"]
      destination_addresses = ["192.168.1.1", "192.168.1.2"]
      destination_ports     = ["80", "1000-2000"]
    }
    rule {
      name                  = "rule3"
      protocols             = ["TCP", "UDP"]
      source_addresses      = ["10.0.0.3"]
      destination_addresses = ["192.168.1.1", "192.168.1.2"]
      destination_ports     = ["80", "1000-2000"]
    }
  }
}

Debug Output/Panic Output

Deleting the "rule2" rule block above, will result in the following confusing PLAN output:

Terraform will perform the following actions:

  # azurerm_firewall_policy_rule_collection_group.example will be updated in-place
  ~ resource "azurerm_firewall_policy_rule_collection_group" "example" {
        id                 = "/subscriptions/29c7e3d4-e589-4555-a1d8-ac60fcc05a2c/resourceGroups/example-resources/providers/Microsoft.Network/firewallPolicies/example-fwpolicy/ruleCollectionGroups/example-fwpolicy-rcg"
        name               = "example-fwpolicy-rcg"
        # (2 unchanged attributes hidden)

      ~ network_rule_collection {
            name     = "network_rule_collection1"
            # (2 unchanged attributes hidden)

          ~ rule {
              ~ name                  = "rule2" -> "rule3"
              ~ source_addresses      = [
                  - "10.0.0.2",
                  + "10.0.0.3",
                ]
                # (6 unchanged attributes hidden)
            }
          - rule {
              - destination_addresses = [
                  - "192.168.1.1",
                  - "192.168.1.2",
                ] -> null
              - destination_fqdns     = [] -> null
              - destination_ip_groups = [] -> null
              - destination_ports     = [
                  - "80",
                  - "1000-2000",
                ] -> null
              - name                  = "rule3" -> null
              - protocols             = [
                  - "TCP",
                  - "UDP",
                ] -> null
              - source_addresses      = [
                  - "10.0.0.3",
                ] -> null
              - source_ip_groups      = [] -> null
            }

            # (1 unchanged block hidden)
        }
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Expected Behaviour

Deleting the "rule" block with name = "rule2" should only show that rule being deleted in the plan, and no other changes.

Actual Behaviour

Deleting the "rule" block with name = "rule2" show "rule3" being deleted, and "rule2" block being modified with the properties of "rule3". This is very confusing esp. as the number of rules grows large.

Is this expected behavior? If so, it makes it very hard to use terraform to manage Azure Firewall Policy rule collections.

As a comparison, with inline NSG rules, rules can be added and deleted, and plan only shows the changed rule(s).

Example PLAN output when deleting rule block for "rule2":

Terraform will perform the following actions:

  # azurerm_firewall_policy_rule_collection_group.example will be updated in-place
  ~ resource "azurerm_firewall_policy_rule_collection_group" "example" {
        id                 = "/subscriptions/29c7e3d4-e589-4555-a1d8-ac60fcc05a2c/resourceGroups/example-resources/providers/Microsoft.Network/firewallPolicies/example-fwpolicy/ruleCollectionGroups/example-fwpolicy-rcg"
        name               = "example-fwpolicy-rcg"
        # (2 unchanged attributes hidden)

      ~ network_rule_collection {
            name     = "network_rule_collection1"
            # (2 unchanged attributes hidden)

          ~ rule {
              ~ name                  = "rule2" -> "rule3"
              ~ source_addresses      = [
                  - "10.0.0.2",
                  + "10.0.0.3",
                ]
                # (6 unchanged attributes hidden)
            }
          - rule {
              - destination_addresses = [
                  - "192.168.1.1",
                  - "192.168.1.2",
                ] -> null
              - destination_fqdns     = [] -> null
              - destination_ip_groups = [] -> null
              - destination_ports     = [
                  - "80",
                  - "1000-2000",
                ] -> null
              - name                  = "rule3" -> null
              - protocols             = [
                  - "TCP",
                  - "UDP",
                ] -> null
              - source_addresses      = [
                  - "10.0.0.3",
                ] -> null
              - source_ip_groups      = [] -> null
            }

            # (1 unchanged block hidden)
        }
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Steps to Reproduce

terraform init terraform apply delete "rule" block for "rule2" terraform plan

Important Factoids

No response

References

No response

kahawai-sre commented 1 year ago

@magodo - Not sure if you can help at all here? Did the PR resolving https://github.com/hashicorp/terraform-provider-azurerm/issues/10083 intend to solve the problem described above?

magodo commented 1 year ago

@kahawai-sre Per my understanding, this is how terraform expects to work. For the inlined NSR in NSG, I think it was even worse (as the security_rule is defined as a Set):

Removing the rule with `name` = `test2`:

      ~ security_rule       = [
          - {
              - access                                     = "Allow"
              - description                                = ""
              - destination_address_prefix                 = "*"
              - destination_address_prefixes               = []
              - destination_application_security_group_ids = []
              - destination_port_range                     = "*"
              - destination_port_ranges                    = []
              - direction                                  = "Inbound"
              - name                                       = "test1"
              - priority                                   = 100
              - protocol                                   = "Tcp"
              - source_address_prefix                      = "*"
              - source_address_prefixes                    = []
              - source_application_security_group_ids      = []
              - source_port_range                          = "*"
              - source_port_ranges                         = []
            },
          - {
              - access                                     = "Allow"
              - description                                = ""
              - destination_address_prefix                 = "*"
              - destination_address_prefixes               = []
              - destination_application_security_group_ids = []
              - destination_port_range                     = "*"
              - destination_port_ranges                    = []
              - direction                                  = "Inbound"
              - name                                       = "test3"
              - priority                                   = 102
              - protocol                                   = "Udp"
              - source_address_prefix                      = "*"
              - source_address_prefixes                    = []
              - source_application_security_group_ids      = []
              - source_port_range                          = "*"
              - source_port_ranges                         = []
            },
          + {
              + access                                     = "Allow"
              + description                                = null
              + destination_address_prefix                 = "*"
              + destination_address_prefixes               = []
              + destination_application_security_group_ids = []
              + destination_port_range                     = "*"
              + destination_port_ranges                    = []
              + direction                                  = "Inbound"
              + name                                       = "test1"
              + priority                                   = 100
              + protocol                                   = "Tcp"
              + source_address_prefix                      = "*"
              + source_address_prefixes                    = []
              + source_application_security_group_ids      = []
              + source_port_range                          = "*"
              + source_port_ranges                         = []
            },
          + {
              + access                                     = "Allow"
              + description                                = null
              + destination_address_prefix                 = "*"
              + destination_address_prefixes               = []
              + destination_application_security_group_ids = []
              + destination_port_range                     = "*"
              + destination_port_ranges                    = []
              + direction                                  = "Inbound"
              + name                                       = "test3"
              + priority                                   = 102
              + protocol                                   = "Udp"
              + source_address_prefix                      = "*"
              + source_address_prefixes                    = []
              + source_application_security_group_ids      = []
              + source_port_range                          = "*"
              + source_port_ranges                         = []
            },
          - {
              - access                                     = "Deny"
              - description                                = ""
              - destination_address_prefix                 = "*"
              - destination_address_prefixes               = []
              - destination_application_security_group_ids = []
              - destination_port_range                     = "*"
              - destination_port_ranges                    = []
              - direction                                  = "Inbound"
              - name                                       = "test2"
              - priority                                   = 101
              - protocol                                   = "Udp"
              - source_address_prefix                      = "*"
              - source_address_prefixes                    = []
              - source_application_security_group_ids      = []
              - source_port_range                          = "*"
              - source_port_ranges                         = []
            },
        ]

10083 was changing the type from set to list, to make the plan diff a bit more readable than the NSG/NSR ones.

kahawai-sre commented 1 year ago

@magodo thanks, but for me the inline NSG rule schema works very well (as per your suggestion in https://github.com/hashicorp/terraform-provider-azurerm/issues/22630). I can modify rules, remove rules in any position in the order, add rules in any position, and plan works as expected i.e. it only shows modifications, additions and deletes where they have been explicitly made (assuming no properties are inadvertently set to null). I'm not sure why Azure Firewall Rule Collection groups rules should be any different, other than that the rules are nested in their own sub class of collection (network, application, nat). For me the plan output is relatively unusable in it's current form. If e.g. a single network rule is deleted, and there are 200 rules above it in the order, the plan will show hundreds of lines making review of the proposed target state very difficult to validate, for what should be a single change.

magodo commented 1 year ago

@kahawai-sre

Per my understanding, only Set type can meet your requirement that "when a rule is added/deleted, the change should only show that rule", by manually set all the attributes/blocks (even for those are absent).

The reason why the FW rule related resources are using List prevalently is because there are other cx are complaining "when a rule is updated, the change should only show that rule". This can be supported easily by using type List, and without the effort to manually zero set those not interested attributes/blocks. Also, it can avoid a bunch of weired bugs introduced by Set type, in the legacy SDK.

Since there are different cx have different needs on this, and there is actually no ideal way to meet both requirements with the current SDK, I tend to keep it as is. Till we migrated to the terraform-plugin-framework, we can consider changing all these to type Set so that:

Actually, I've tried to verify above behavior in my self-made demo provider: https://github.com/magodo/terraform-provider-demo/tree/89b8be93da21c80a45494023a87a4fffe8f89895. I have following config:

resource "demo_foo" "test" {
  list_nested_block {
    name = "foo1"
    age  = 1
  }
  list_nested_block {
    name = "foo2"
  }
  list_nested_block {
    name = "foo3"
  }
  set_nested_block {
    name = "bar1"
    age  = 1
  }
  set_nested_block {
    name = "bar2"
  }
  set_nested_block {
    name = "bar3"
  }
}

After the inital apply.

1. Create/Delete

Change it to:

  list_nested_block {
    name = "foo1"
    age  = 1
  }
  # list_nested_block {
  #   name = "foo2"
  # }
  list_nested_block {
    name = "foo3"
  }
  list_nested_block {
    name = "foo4"
  }
  set_nested_block {
    name = "bar1"
    age  = 1
  }
  # set_nested_block {
  #   name = "bar2"
  # }
  set_nested_block {
    name = "bar3"
  }
  set_nested_block {
    name = "bar4"
  }

The plan shows:

  # demo_foo.test will be updated in-place
  ~ resource "demo_foo" "test" {
        id      = "77370a54-f845-1005-c209-0ffe6c94566b"
        # (5 unchanged attributes hidden)

      ~ list_nested_block {
          ~ name = "foo2" -> "foo3"
        }
      ~ list_nested_block {
          ~ name = "foo3" -> "foo4"
        }

      - set_nested_block {
          - name = "bar2" -> null
        }
      + set_nested_block {
          + name = "bar4"
        }

        # (3 unchanged blocks hidden)
    }

2. Update

If I change the name of list_nested_block foo2 to foo4, and set_nested_block bar2 to bar4, the plan shows:

  # demo_foo.test will be updated in-place
  ~ resource "demo_foo" "test" {
        id      = "77370a54-f845-1005-c209-0ffe6c94566b"
        # (5 unchanged attributes hidden)

      ~ list_nested_block {
          ~ name = "foo2" -> "foo4"
        }

      - set_nested_block {
          - name = "bar2" -> null
        }
      + set_nested_block {
          + name = "bar4"
        }

        # (4 unchanged blocks hidden)
    }
kahawai-sre commented 1 year ago

Thanks @magodo, very much appreciate you looking into this. That is what I figured. If Set can be handled similar to inline NSG rules, for me it is a shame we did not persist with Set for AzFW rules, it is easy to deliver to both cx outcomes by forcing each string or list type property to an empty string "" or list [] vs default Null, then we get the best of both. If your test provider mockup indicates 100% that the same handling as per inline NSG rules will not work in the AzFw RCG case, then sure I guess this is stuck.

For me I'm blocked on using AzureRM for managing FW policies at scale now. It's not tenable to have security or platform folk try to approve PLAN outputs at scale, esp. for larger policies, in the current state.

This creates a gap for how our organization can consistently deliver and manages the capability. We can revert to AzAPI, but that has a limitation for initial deployments where if there are any dynamically determined fields in the "properties" payload, initial deployment will show the entire properties payload as "determined after apply", rather than just the specific dynamically determined property within, which again makes intended deployments hard to validate and approve.

Would sincerely appreciate your feedback on how we might proceed in lieu of the pending TF plugin framework.

magodo commented 1 year ago

@kahawai-sre

Regarding your specific needs, I believe if the FW RCG can be defined standalone (like the NSR), it should be good for you to manage them at a scale. You have mentioned that standalone resources might not tolerate those out-of-band drift, whilst since v1.5 terraform introduced the check block, which can assits you to detect those unwanted drifts. Checkout this video on one of the use case for drift detecting.

The unfortunate fact is that, there is one PR trying to create those standalone FW RC resources, but was closed: https://github.com/hashicorp/terraform-provider-azurerm/pull/22810. I'll ask the reviewer to see whether we can continue looking into this PR.

kahawai-sre commented 1 year ago

Thanks again @magodo - that sounds like a great way forward. Appreciate that.

kahawai-sre commented 1 year ago

@magodo, there is no resolution given the PR is also blocked and there is currently no alternative resolution path. Did you manage to get any further information on re-opening the PR?

magodo commented 1 year ago

@kahawai-sre Pinged the maintainer in the PR.