hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0
9.85k stars 9.19k forks source link

Network Firewall: Destroying Rule Group does not first detach from Policy causing endless loop on InvalidOperationException #19616

Open rschwartz-tpn opened 3 years ago

rschwartz-tpn commented 3 years ago

Community Note

Terraform CLI and Terraform AWS Provider Version

Terraform v0.13.7

Affected Resource(s)

aws_networkfirewall_rule_group

Terraform Configuration Files

Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.

locals {
  rule_files = [
    "rule1",    # files with a few lines of Suricata rules
    # "rule2"
  ]
}

# Loop over files and create rule groups
module "make-rules" {
  source    = "./make-it"
  for_each  = toset(local.rule_files)
  rule_file = each.key
}

# returns:
# rule-arns = {
#   "rule1" = {
#     "rule-arn" = "arn:aws:network-firewall:us-east-1:xxx:stateful-rulegroup/rule1"
#   }
#   "rule2" = {
#     "rule-arn" = "arn:aws:network-firewall:us-east-1:xxx:stateful-rulegroup/rule2"
#   }
# }

# flatten vars from above into one list
locals {
  biglist = flatten([for value in module.make-rules :
    values(value)  
  ])
}

# biglist = [
#   "arn:aws:network-firewall:us-east-1:xxx:stateful-rulegroup/rule1",
#   "arn:aws:network-firewall:us-east-1:xxx:stateful-rulegroup/rule2",
# ]

resource "aws_networkfirewall_firewall_policy" "test-policy" {
  name = "test-policy"

  firewall_policy {
    stateless_default_actions          = ["aws:forward_to_sfe"]
    stateless_fragment_default_actions = ["aws:forward_to_sfe"]

    dynamic "stateful_rule_group_reference" {
      iterator = rule
      for_each = toset(local.biglist) 
      content {
        resource_arn = rule.key
      }
    }
  }
}

make-rules module:

variable "rule_file" {
  type = string
}

# create rule group
resource "aws_networkfirewall_rule_group" "test-rules" {

  capacity = 10
  name     = var.rule_file
  type     = "STATEFUL"
  rule_group {
    rules_source {
      rules_string = file("${path.module}/${var.rule_file}")
    }
  }
}

output "rule-arn" {
  value = aws_networkfirewall_rule_group.test-rules.arn
}

Debug Output

2021-06-01T15:35:06.569Z [INFO]  plugin.terraform-provider-aws_v3.42.0_x5: 2021/06/01 15:35:06 [DEBUG] [aws-sdk-go] DEBUG: Response Network Firewall/DeleteRuleGroup Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 400 Bad Request
Connection: close
Content-Length: 157
Content-Type: application/x-amz-json-1.0
Date: Tue, 01 Jun 2021 15:35:05 GMT
X-Amzn-Requestid: XXX

-----------------------------------------------------: timestamp=2021-06-01T15:35:06.569Z
2021-06-01T15:35:06.569Z [INFO]  plugin.terraform-provider-aws_v3.42.0_x5: 2021/06/01 15:35:06 [DEBUG] [aws-sdk-go] {"__type":"com.amazonaws.networkfirewall.customerapi.v20201112#InvalidOperationException","Message":"Unable to delete the object because it is still in use"}: timestamp=2021-06-01T15:35:06.569Z

Panic Output

Expected Behavior

I would have expected the rule group to have been removed from the policy before attempting to delete it.

Actual Behavior

If you run the code with both files in local.rule_files listed, everything creates fine. If you modify the contents of any of the files, all is good. However, if you delete a file (ie: comment out "rule2" like in my pasted example above) and try to apply it, it just loops forever saying 'Still destroying...'

Steps to Reproduce

  1. terraform apply

Terraform shows that it wants to remove the rule from the policy:

Terraform will perform the following actions:

  # aws_networkfirewall_firewall_policy.test-policy will be updated in-place
  ~ resource "aws_networkfirewall_firewall_policy" "test-policy" {
        arn          = "arn:aws:network-firewall:us-east-1:xx:firewall-policy/test-policy"
        id           = "arn:aws:network-firewall:us-east-1:xx:firewall-policy/test-policy"
        name         = "test-policy"
        tags         = {}
        tags_all     = {}
        update_token = "dea69a4e-0fb1-4d0f-8183-e909837caf34"

      ~ firewall_policy {
            stateless_default_actions          = [
                "aws:forward_to_sfe",
            ]
            stateless_fragment_default_actions = [
                "aws:forward_to_sfe",
            ]

            stateful_rule_group_reference {
                resource_arn = "arn:aws:network-firewall:us-east-1:xx:stateful-rulegroup/rule1"
            }
          - stateful_rule_group_reference {
              - resource_arn = "arn:aws:network-firewall:us-east-1:xx:stateful-rulegroup/rule2" -> null
            }
        }
    }

  # module.make-rules["rule2"].aws_networkfirewall_rule_group.test-rules will be destroyed
  - resource "aws_networkfirewall_rule_group" "test-rules" {
      - arn          = "arn:aws:network-firewall:us-east-1:xx:stateful-rulegroup/rule2" -> null
      - capacity     = 10 -> null
      - id           = "arn:aws:network-firewall:us-east-1:xx:stateful-rulegroup/rule2" -> null
      - name         = "rule2" -> null
      - tags         = {} -> null
      - tags_all     = {} -> null
      - type         = "STATEFUL" -> null
      - update_token = "56741c80-1189-4af7-82a8-4eebd6aec554" -> null

      - rule_group {

          - rules_source {
              - rules_string = <<~EOT
                    drop ip 91.228.32.21/32 any -> any any (msg:"voipbl.txt.drop Hit!"; sid:10;)
                    drop ip 91.230.155.84/32 any -> any any (msg:"voipbl.txt.drop Hit!"; sid:11;)
                    drop ip 91.245.28.14/32 any -> any any (msg:"voipbl.txt.drop Hit!"; sid:12;)
                    drop ip 91.245.29.184/32 any -> any any (msg:"voipbl.txt.drop Hit!"; sid:13;)
                    drop ip 91.245.254.20/32 any -> any any (msg:"voipbl.txt.drop Hit!"; sid:14;)
                    drop ip 91.245.254.36/32 any -> any any (msg:"voipbl.txt.drop Hit!"; sid:15;)
                EOT -> null
            }
        }
    }

Plan: 0 to add, 1 to change, 1 to destroy.

Important Factoids

We are looping over the module with a for_each to create rules. We then use a Dynamic Block to attach all the rules to the policy. Looking through the debug, it seems like the first action TF is trying is to delete the rule group. There are some references to:

2021/06/01 17:30:17 [TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hashicorp/aws\"] (close)" is waiting for "aws_networkfirewall_firewall_policy.test-policy"
2021/06/01 17:30:17 [TRACE] dag/walk: vertex "module.make-rules (close)" is waiting for "module.make-rules.output.rule-arn (expand)"
2021/06/01 17:30:17 [TRACE] dag/walk: vertex "meta.count-boundary (EachMode fixup)" is waiting for "aws_networkfirewall_firewall_policy.test-policy"
2021/06/01 17:30:17 [TRACE] dag/walk: vertex "module.make-rules.output.rule-arn (expand)" is waiting for "module.make-rules[\"rule2\"].aws_networkfirewall_rule_group.test-rules (destroy)"
2021/06/01 17:30:17 [TRACE] dag/walk: vertex "local.biglist (expand)" is waiting for "module.make-rules (close)"
2021/06/01 17:30:17 [TRACE] dag/walk: vertex "root" is waiting for "meta.count-boundary (EachMode fixup)"

So I'm not sure if there is some dependency on local.biglist to be complete before it tries to remove the rule from the policy.

References

anGie44 commented 3 years ago

Hi @rschwartz-tpn , thank you for raising this issue. If memory serves me right, to get terraform to do the modification first to the aws_networkfirewall_firewall_policy resource and then the rule_group deletion, we've updated our tests in the provider to use the lifecycle create_before_destroy attribute to ensure the rule_group deletion doesn't hang. Lmk if this workaround is feasible and/or if it works as expected!

Ex:


resource "aws_networkfirewall_rule_group" "test" {
  count    = 4
  capacity = 100
  name     = "tf-ac-test-${count.index}"
  # ... other configuration ...

  lifecycle {
    create_before_destroy = true
  }
}
rschwartz-tpn commented 3 years ago

Hi @anGie44 - The lifecycle meta-argument inserted in the rule_group has solved my problem. Is this a temporary workaround while the bug gets fixed? Thanks!

phill0555 commented 1 year ago

I believe we are facing the same issue. We have tried adding the lifecycle meta-argument which doesn't appear to be helping.

We have resources: Firewall Policy stateful rule group stateless rule group

We are making a simple change to the stateless rule group name.

The rule group simply doesn't seem to acknowledge the relationship between itself and the policy.

Should I create a new issue with the configuration? Or provide it in this ticket?

simon-wessel commented 1 year ago

I can confirm that adding the lifecycle create_before_destroy does not help. We are still running into timeouts when trying to remove rules and are therefore currently unable to fully manage our firewall rules using Terraform.

@phill0555 Did you find a solution?

AndreasSko commented 3 months ago

We are also facing the same issue. The lifecycle hook on it's own doesn't work for us as Terraform will attempt to create the new resource with the same name, so the request fails with A resource with the specified name already exists. Our "workaround" is to also rename the group when we know that a change will force a recreation. But it would of course be nice if the provide could handle that situation itself 🙂

phill0555 commented 3 months ago

I can confirm that adding the lifecycle create_before_destroy does not help. We are still running into timeouts when trying to remove rules and are therefore currently unable to fully manage our firewall rules using Terraform.

@phill0555 Did you find a solution?

Was quite a while ago now, but I think we saw that changing rules within the rule group didn't cause issues. So as long as you're happy with the rule group name, you can add and remove rules.

If you want to add a new rule group, (I think someone tested this at the time but can't fully remember) the solution was to:

It was a while ago now though and I'm not involved in the project that we encountered the issue in. Our first solution was to not change the rule group name.

simon-wessel commented 3 months ago

We solved the problem by adding a hash to the rule group name based on the rule group's content. When the content changes, the name changes as well and therefore Terraform will delete the old rule and create a new one. It's an hackfix, but the provider is not able to handle it otherwise.