Applying aws_wafv2_web_acl removes DDoS auto mitigation

sivanovhm commented 2 years ago

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

We have Shield Advanced enabled for a specific CloudFront resource, and we've also enabled DDoS auto mitigation on it.

The problem:

Once DDoS auto mitigation is enabled, AWS automatically adds a rule to the web ACL with the following name format: ShieldMitigationRuleGroup___
Whenever terraform apply is executed, the ShieldMitigationRuleGroup rule is removed from the ACL, simply because it is not defined in our aws_wafv2_web_acl, so Terraform marks it as a difference to be removed.
Once the rule is removed, DDoS auto mitigation is turned off and we have to turn it back on manually from the console.
The rule group cannot be imported, since it is managed by AWS and our account does not even have permissions to view it.

This has caused us problems a couple of times now, since we sometimes forget to re-enable it after applying Terraform.

Terraform CLI and Terraform AWS Provider Version

Terraform v1.0.0
provider registry.terraform.io/hashicorp/aws v3.74.0

Affected Resource(s)

aws_wafv2_web_acl

Terraform Configuration Files

aws_wafv2_web_acl resource

Debug Output

      - rule {
          - name     = "ShieldMitigationRuleGroup_111111111111_22222222-2222-2222-2222-222222222222_33333333-3333-3333-3333-333333333333" -> null
          - priority = 11 -> null

          - override_action {

              - none {}
            }

          - statement {

              - rule_group_reference_statement {
                  - arn = "arn:aws:wafv2:us-east-1:153427709519:global/rulegroup/ShieldMitigationRuleGroup_111111111111_22222222-2222-2222-2222-222222222222_33333333-3333-3333-3333-333333333333" -> null
                }
            }

          - visibility_config {
              - cloudwatch_metrics_enabled = true -> null
              - metric_name                = "ShieldMitigationRuleGroup_111111111111_22222222-2222-2222-2222-222222222222_33333333-3333-3333-3333-333333333333" -> null
              - sampled_requests_enabled   = true -> null
            }

Panic Output

Expected Behavior

The Shield Advanced auto mitigation rule should not be removed.

Actual Behavior

The Shield Advanced auto mitigation rule is removed.

Steps to Reproduce

Create aws_wafv2_web_acl and apply to CF distribution via terraform
Add aws_shield_protection to the CloudFront distribution
Enable DDoS auto mitigation: open the AWS Console -> go to WAF & Shield -> on the left side, expand AWS Shield -> click Protected Resources -> click the resource you just protected -> at the bottom, select the "Enable" radio button under "Automatic application layer DDoS mitigation" -> Next -> Save
terraform plan
terraform apply

Important Factoids

References

makknife commented 2 years ago

i also plan on using shield advanced with waf and noticed there's no tf resource to enable shield's automatic ddos mitigation

sounds like this could be an aws wafv2 api backend issue rather than terraform since aws has not made this a feature in a configurable resource i.e. wafv2 acl, and there's no shield api call when you tf apply. https://docs.aws.amazon.com/waf/latest/DDOSAPIReference/API_EnableApplicationLayerAutomaticResponse.html

but i second having a tf resource to abstract making these api calls

maybe consider using null_resource and local-exec as a last resort. this is where i am headed in the interim. it may not be elegant, but it may work conveniently until this gets resolved i.e. in tf cloud they have awscli and also curl installed https://awscli.amazonaws.com/v2/documentation/api/latest/reference/shield/enable-application-layer-automatic-response.html

https://www.terraform.io/language/resources/provisioners/syntax#provisioners-are-a-last-resort
https://www.terraform.io/language/resources/provisioners/null_resource
https://www.terraform.io/language/resources/provisioners/local-exec

or maybe use an external data source to create and grab the ddos rulegroup arn https://registry.terraform.io/providers/hashicorp/external/latest/docs/data-sources/data_source

since aws shield enable-application-layer-automatic-response returns no output, you could use a null_resource with a local-exec provisioner for the creation, and an external data source for the getter. i.e.

aws shield enable-application-layer-automatic-response --resource-arn arn:aws:cloudfront::XXXXXX:distribution/E36CP7BIX5EYXX --action Block={}

aws wafv2 get-web-acl --name internet-ingress-example-waf --scope CLOUDFRONT --id YYYY --query 'WebACL.Rules[?Priority==`10000000`].Statement.RuleGroupReferenceStatement.ARN' --output text

arn:aws:wafv2:us-east-1:153427709519:global/rulegroup/ShieldMitigationRuleGroup_XXXXX_2ece0f4b-a0c1-408a-9dc0-e7ff67dac624_6187a29e-02a3-417b-b17d-9845c8842748/35968a00-bdc0-4494-b653-e613f881a61e
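The getter above could be wrapped in an external data source, as suggested earlier in this comment. This is only a sketch under assumptions: the variable names web_acl_name and web_acl_id and the label shield_rule_group are hypothetical, and it assumes the aws CLI is installed and authenticated wherever Terraform runs. The external data source expects the program to print a single JSON object of string values, so the query is reshaped to return an object instead of plain text.

```hcl
# Hypothetical inputs: the name and ID of the web ACL attached to the distribution.
variable "web_acl_name" { type = string }
variable "web_acl_id" { type = string }

# Reads the Shield-managed rule from the web ACL through the aws CLI.
# CLOUDFRONT-scoped web ACLs are queried in us-east-1.
data "external" "shield_rule_group" {
  program = [
    "aws", "wafv2", "get-web-acl",
    "--name", var.web_acl_name,
    "--id", var.web_acl_id,
    "--scope", "CLOUDFRONT",
    "--region", "us-east-1",
    "--output", "json",
    # Same priority filter as the command above, reshaped into a JSON object.
    "--query", "WebACL.Rules[?Priority==`10000000`] | [0].{name: Name, arn: Statement.RuleGroupReferenceStatement.ARN}",
  ]
}

output "shield_rule_group_arn" {
  value = data.external.shield_rule_group.result.arn
}
```

If no rule with that priority exists yet (auto mitigation not enabled), the query returns null rather than an object, so the data source should be expected to fail in that case.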

deleting the ddos mitigation rule group in the acl does not remove the association in shield. nor does shield recreate a rule group. it will still say enabled, so you have to disable-application-layer-automatic-response followed by enable-application-layer-automatic-response to make a new group. also update-application-layer-automatic-response does nothing if the group is removed from the acl.
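The disable-then-enable sequence described above could be driven from Terraform with a null_resource and a local-exec provisioner. This is a rough sketch, not a tested module: protected_resource_arn and recreate_token are hypothetical variables, and it assumes the aws CLI is available on the machine running Terraform and that the resource already has Shield Advanced protection and an associated web ACL.

```hcl
# Hypothetical inputs: the ARN of the Shield-protected resource, and a token
# that can be changed by hand to force the provisioner to run again.
variable "protected_resource_arn" { type = string }

variable "recreate_token" {
  type    = string
  default = "1"
}

resource "null_resource" "shield_auto_mitigation" {
  # A change to either value replaces the resource, which re-runs the commands.
  triggers = {
    resource_arn   = var.protected_resource_arn
    recreate_token = var.recreate_token
  }

  provisioner "local-exec" {
    # Disable first so that enabling creates a fresh rule group.
    # "|| true" is an assumption: it tolerates a failing disable call,
    # for example when automatic response is not currently enabled.
    command = <<-EOT
      aws shield disable-application-layer-automatic-response --resource-arn "${self.triggers.resource_arn}" || true
      aws shield enable-application-layer-automatic-response --resource-arn "${self.triggers.resource_arn}" --action Block={}
    EOT
  }
}
```

Changing recreate_token forces the commands to run again after an apply has removed the rule. This only re-creates the rule group; on its own it does not stop a later apply of aws_wafv2_web_acl from removing the rule again.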

makknife commented 2 years ago

actually, after you apply ddos auto mitigation, what you can do is describe the web acl and get the rule group reference statement and put that into tf code. then you can make changes without it getting removed. i just verified this.

as defined in https://docs.aws.amazon.com/waf/latest/developerguide/ddos-automatic-app-layer-response-rg.html

i.e. here is a sample rule from aws wafv2 get-web-acl

      {
        "Name": "ShieldMitigationRuleGroup_XXXXX_2ece0f4b-a0c1-408a-9dc0-e7ff67dac624_95e24be0-e68c-43d1-9edb-d0f7c356d8e3",
        "Priority": 10000000,
        "Statement": {
          "RuleGroupReferenceStatement": {
            "ARN": "arn:aws:wafv2:us-east-1:153427709519:global/rulegroup/ShieldMitigationRuleGroup_XXXXX_2ece0f4b-a0c1-408a-9dc0-e7ff67dac624_95e24be0-e68c-43d1-9edb-d0f7c356d8e3/db6ff746-4a07-4825-8c3c-61319d9621a5"
          }
        },
        "OverrideAction": {
          "None": {}
        },
        "VisibilityConfig": {
          "SampledRequestsEnabled": false,
          "CloudWatchMetricsEnabled": true,
          "MetricName": "ShieldMitigationRuleGroup_XXXXX_2ece0f4b-a0c1-408a-9dc0-e7ff67dac624_95e24be0-e68c-43d1-9edb-d0f7c356d8e3"
        }
      }

you can see that the aws managed rule group arn in account 153427709519 is generated every time the shield auto mitigation association is made using the shield api call EnableApplicationLayerAutomaticResponse (with a suffix added to the generated rule name). thus, the tf code rule block for the above example would be

  rule {
    name     = "ShieldMitigationRuleGroup_XXXXX_2ece0f4b-a0c1-408a-9dc0-e7ff67dac624_95e24be0-e68c-43d1-9edb-d0f7c356d8e3"
    priority = 10000000

    override_action {
      none {}
    }

    statement {
      rule_group_reference_statement {
        arn = "arn:aws:wafv2:us-east-1:153427709519:global/rulegroup/ShieldMitigationRuleGroup_XXXXX_2ece0f4b-a0c1-408a-9dc0-e7ff67dac624_95e24be0-e68c-43d1-9edb-d0f7c356d8e3/db6ff746-4a07-4825-8c3c-61319d9621a5"
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "ShieldMitigationRuleGroup_XXXXX_2ece0f4b-a0c1-408a-9dc0-e7ff67dac624_95e24be0-e68c-43d1-9edb-d0f7c356d8e3"
      sampled_requests_enabled   = false
    }
  }

szymon-lyszkowski-dragon commented 2 years ago

@justinretzolk Any timelines on fixing that bug?

sivanovhm commented 2 years ago

@makknife your workaround sounds valid. However, we took a different approach to this. We ended up doing a custom AWS Config rule which triggers a lambda function via SSM Document (as a remediation action) for the shield protected resources which have a certain tag.

The Config rule also uses a lambda to check if the AWS resources comply with the rule. So basically:

Config rule
SSM Document
Two lambdas
- The first checks whether the resources have a certain tag and, if they do, evaluates its value. Based on the value, it returns NON_COMPLIANT or COMPLIANT back to AWS Config. If they don't have the tag, it just returns NOT_APPLICABLE and nothing happens.
- The second lambda just turns on auto mitigation for the resource flagged by the first lambda.

AWS Config -> Lambda to evaluate -> AWS Config -> SSM Document -> Lambda 2 to re-enable.

It's cool but given that your solution works, I would rather go with that for the sake of simplicity :D

If you need the whole AWS config set up, I'll ask around if we can share with the community. Just let me know.

justinretzolk commented 2 years ago

Hey @szymon-lyszkowski-dragon 👋 Thank you for checking in on this. Unfortunately, I'm not able to provide an estimate on when this will be looked into due to the potential of shifting priorities (we prioritize work by count of ":+1:" reactions, as well as a few other things). A larger prioritization document is in the works, but in the meantime additional information may be found in our issue lifecycle document.

nodomain commented 2 years ago

"If you need the whole AWS config set up, I'll ask around if we can share with the community. Just let me know."

Need :) Cool approach!

sivanovhm commented 2 years ago

@nodomain

Got a green light. Sharing the tf "module" we use to remediate this. shield-aws-custom-config-rule

nodomain commented 2 years ago

Thanks!

nodomain commented 2 years ago

Unfortunately I cannot use this since AWS Config is already in use for central account maintenance.

Does anybody have another idea for working around this issue?
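One stopgap that has not been mentioned in this thread, and has not been verified against it, is Terraform's generic lifecycle ignore_changes setting on the rule blocks. The sketch below assumes a minimal CloudFront-scoped web ACL; the resource label, name, and metric name are placeholders.

```hcl
resource "aws_wafv2_web_acl" "example" {
  name  = "example-web-acl" # placeholder name
  scope = "CLOUDFRONT"

  default_action {
    allow {}
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "example-web-acl" # placeholder metric name
    sampled_requests_enabled   = true
  }

  lifecycle {
    # Tell Terraform to leave rule differences alone, including the
    # ShieldMitigationRuleGroup rule that AWS adds outside of Terraform.
    ignore_changes = [rule]
  }
}
```

The trade-off is significant: with this setting Terraform also ignores later changes to your own rule blocks, so it may only suit web ACLs whose rules rarely change.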

lifeofguenter commented 2 years ago

We tried the approach of importing the generated rule. Our plan looks fine, but apply then fails with:

09:53:11  │ Error: Provider produced inconsistent final plan
09:53:11  │ 
09:53:11  │ When expanding the plan for aws_wafv2_web_acl.main to include new values
09:53:11  │ learned so far during apply, provider "registry.terraform.io/hashicorp/aws"
09:53:11  │ produced an invalid new value for .rule: planned set element

github-actions[bot] commented 1 year ago

This functionality has been released in v4.56.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!
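To pick up the release mentioned above, the provider version constraint would need to allow v4.56.0 or later. A minimal sketch, assuming the configuration does not already pin the AWS provider elsewhere:

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      # v4.56.0 is the release named in the comment above.
      version = ">= 4.56.0"
    }
  }
}
```

After changing the constraint, terraform init -upgrade selects a provider version that satisfies it.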

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

hashicorp / terraform-provider-aws