hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: destruction of wafv2 rule group happens in wrong order #28331

Open Dominik-Gubrynowicz opened 1 year ago

Dominik-Gubrynowicz commented 1 year ago

Terraform Core Version

v1.3.6

AWS Provider Version

v4.46.0

Affected Resource(s)

Expected Behavior

On aws_wafv2_rule_group destroy, the destruction order should be different:

Actual Behavior

Currently, the destruction order is as follows:

Relevant Error/Panic Output Snippet

╷
│ Error: Error deleting WAFv2 RuleGroup: WAFAssociatedItemException: AWS WAF couldn’t perform the operation because your resource is being used by another resource or it’s associated with another resource.
│ 
│ 
╵

Terraform Configuration Files

https://github.com/Dominik-Gubrynowicz/terraform-aws-wafv2-rulegroup-destruction-error/tree/master

Steps to Reproduce

The link above points to a repo that has two branches:

How to reproduce this bug:

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

None

jaumebalust commented 1 year ago

As a workaround, I had to delete the whole waf_web_acl_association and the waf_web_acl, and then I could delete the rule_group.

justinretzolk commented 1 year ago

Hey all 👋 Thank you for taking the time to raise this! Terraform itself is responsible for generating the graph that determines order of operations, and doesn't currently have a way for providers to supply additional information regarding ordering. That said, you can control this to some degree with create_before_destroy (this issue in the Terraform Core repository has quite a bit more information that I found helpful when brushing up on this particular pattern).

Can someone who has run into this test using the meta-argument to see if that corrects the issue?
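For anyone trying this, a minimal sketch of where the meta-argument goes. The resource name, scope, capacity, and metric name are placeholders rather than values from the reporter's repo, and the sketch has not been run against the reproduction case.

```hcl
# Sketch only: the resource name and all values below are hypothetical.
resource "aws_wafv2_rule_group" "example" {
  name     = "example-rule-group"
  scope    = "REGIONAL"
  capacity = 10

  visibility_config {
    cloudwatch_metrics_enabled = false
    metric_name                = "example-rule-group"
    sampled_requests_enabled   = false
  }

  lifecycle {
    # Ask Terraform to create a replacement object before destroying the
    # existing one, instead of the default destroy-then-create order.
    create_before_destroy = true
  }
}
```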

Dominik-Gubrynowicz commented 1 year ago

Hello, thank you for your suggestion. I have tested this solution, and it seems to work when you're just removing a rule group from the configuration. However, most of the pain comes from modifying a rule group. When you add create_before_destroy to your lifecycle config and try to change a rule group parameter that requires resource replacement (e.g. capacity), the apply fails with the following error, because WAF doesn't accept duplicate resources:

╷
│ Error: creating WAFv2 RuleGroup (example-rule1): WAFDuplicateItemException: AWS WAF couldn’t perform the operation because some resource in your request is a duplicate of an existing one.
│ 
│   with aws_wafv2_rule_group.example_rule_group1,
│   on main.tf line 1, in resource "aws_wafv2_rule_group" "example_rule_group1":
│    1: resource "aws_wafv2_rule_group" "example_rule_group1" {
│ 
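One pattern that might sidestep the duplicate-name error is to make the name change whenever a replacement is forced, so the new rule group never collides with the old one. The following is an untested sketch, assuming the hashicorp/random provider is available; the `random_id` resource, the local value, and the name format are illustrative and not part of the original configuration.

```hcl
# Sketch only: assumes the hashicorp/random provider; names and values are hypothetical.
locals {
  rule_group_capacity = 10
}

resource "random_id" "rule_group_suffix" {
  byte_length = 4

  # A new suffix is generated whenever any keeper value changes,
  # here whenever the capacity is edited.
  keepers = {
    capacity = local.rule_group_capacity
  }
}

resource "aws_wafv2_rule_group" "example_rule_group1" {
  # The suffix keeps the replacement's name distinct from the existing group.
  name     = "example-rule1-${random_id.rule_group_suffix.hex}"
  scope    = "REGIONAL"
  capacity = local.rule_group_capacity

  visibility_config {
    cloudwatch_metrics_enabled = false
    metric_name                = "example-rule1"
    sampled_requests_enabled   = false
  }

  lifecycle {
    create_before_destroy = true
  }
}
```

The trade-off is that the rule group name is no longer stable, which may matter if anything outside Terraform refers to it by name.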
justinretzolk commented 1 year ago

Thanks for getting back to me @Dominik-Gubrynowicz! I'm thinking this is something that we're going to need upstream Terraform changes for.

Relates https://github.com/hashicorp/terraform/issues/31309

cornevandyk commented 4 months ago

adding to this, it also applies to rule groups added to aws_fms_policy resources, with exactly the same outcome as described above. create_before_destroy doesn't work here, probably because the name property is fixed once created.

I tried to game the system by renaming the resource, but of course that fails as well, since it's effectively the same as a destroy & recreate operation.

Interestingly, CloudFormation does not require the name (although it can't be changed after creation), but the API for CreateRuleGroup and UpdateRuleGroup does require it. I'm guessing that CloudFormation just makes up a random name.

cornevandyk commented 4 months ago

fyi I tested this with CloudFormation, with the same outcome. I also spoke to AWS Support - the UpdateRuleGroup API call doesn't even include the capacity property. Their proposed workaround is to create a replacement group, attach it to the ACL, and remove the previous group, then delete it. Unfortunately in a pipeline-driven setup it means two PRs.

imo, this could be abstracted by an IaC provider by hiding the above steps.
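The workaround proposed by AWS Support could be expressed in Terraform roughly as follows. This is a sketch of the first of the two changes only; all resource names, capacities, and metric names are hypothetical, and it has not been verified against a real account.

```hcl
# Step 1 (first PR): add the replacement group and point the ACL at it.
# The previous group stays in the configuration, now unreferenced.
resource "aws_wafv2_rule_group" "example_v1" {
  name     = "example-rule-v1"
  scope    = "REGIONAL"
  capacity = 10

  visibility_config {
    cloudwatch_metrics_enabled = false
    metric_name                = "example-rule-v1"
    sampled_requests_enabled   = false
  }
}

resource "aws_wafv2_rule_group" "example_v2" {
  name     = "example-rule-v2"
  scope    = "REGIONAL"
  capacity = 20 # the changed value that would otherwise force a replacement

  visibility_config {
    cloudwatch_metrics_enabled = false
    metric_name                = "example-rule-v2"
    sampled_requests_enabled   = false
  }
}

resource "aws_wafv2_web_acl" "example" {
  name  = "example-acl"
  scope = "REGIONAL"

  default_action {
    allow {}
  }

  rule {
    name     = "example-rule-group"
    priority = 1

    override_action {
      none {}
    }

    statement {
      rule_group_reference_statement {
        # Previously aws_wafv2_rule_group.example_v1.arn
        arn = aws_wafv2_rule_group.example_v2.arn
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = false
      metric_name                = "example-rule-group"
      sampled_requests_enabled   = false
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = false
    metric_name                = "example-acl"
    sampled_requests_enabled   = false
  }
}

# Step 2 (second PR): delete the example_v1 block once step 1 has been applied.
```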