hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: Provider produced inconsistent final plan for complex `aws_wafv2_web_acl` configurations #27273

Closed mkielar closed 1 year ago

mkielar commented 2 years ago

Related:

Terraform Core Version

1.3.1

AWS Provider Version

4.34.

Affected Resource(s)

Expected Behavior

Running terraform apply should finish successfully.

Actual Behavior

Running terraform apply fails and outputs a ~2.5 MB Go stack trace.

Relevant Error/Panic Output Snippet

Plan: 0 to add, 1 to change, 0 to destroy.

Error: Provider produced inconsistent final plan

When expanding the plan for aws_wafv2_web_acl.main to include new values
learned so far during apply, provider "registry.terraform.io/hashicorp/aws"
produced an invalid new value for .rule: planned set element
cty.ObjectVal(map[string]cty.Value{"action":cty.ListValEmpty(cty.Object(map[string]cty.Type{"allow":cty.List(cty.Object(map[string]cty.Type{"custom_request_handling":cty.List(cty.Object(map[string]cty.Type{"insert_header":cty.Set(cty.Object(map[string]cty.Type{"name":cty.String,
"value":cty.String}))}))})),
"block":cty.List(cty.Object(map[string]cty.Type{"custom_response":cty.List(cty.Object(map[string]cty.Type{"custom_response_body_key":cty.String,
"response_code":cty.Number,

... (this continues for 2.5 MB; see "_expected/out.log" in the attached ZIP file)

Terraform Configuration Files

See attached: tf-waf-custom-response-bug.zip

Steps to Reproduce

  1. terraform init
  2. terraform apply -var='v=1'
    This deploys all resources and provisions v1 of the Custom Response that we configure for the WAF WebACL to display a Maintenance Page. This should succeed.
  3. terraform apply -var='v=2' --auto-approve > out.log 2>&1
    This mimics a modification to the HTML in the Custom Response (Terraform uses a different file to generate a change in the WebACL Custom Response configuration). You should see a very long exception logged.
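
For readers without the ZIP, the mechanism in steps 2 and 3 can be sketched roughly as follows. This is a hypothetical illustration, not the attached configuration: the resource name `main` and the variable `v` come from the report, while the file layout, the body key, the rule, and the response code are assumptions. On its own this sketch is not expected to reproduce the error, since the report says the failure only appeared with the full production rule set.

```hcl
# Hypothetical sketch; the real configuration is in tf-waf-custom-response-bug.zip.
variable "v" {
  type        = string
  description = "Selects which maintenance page file to deploy (assumed behaviour)."
}

resource "aws_wafv2_web_acl" "main" {
  name  = "example"  # assumed name
  scope = "REGIONAL" # assumed scope

  default_action {
    allow {}
  }

  # Changing var.v swaps the file, which changes the content of this body.
  custom_response_body {
    key          = "maintenance"
    content      = file("${path.module}/maintenance_v${var.v}.html") # assumed file layout
    content_type = "TEXT_HTML"
  }

  rule {
    name     = "website-maintenance" # assumed rule
    priority = 1

    action {
      block {
        custom_response {
          custom_response_body_key = "maintenance"
          response_code            = 503
        }
      }
    }

    # Matches every request path; stands in for the real match conditions.
    statement {
      byte_match_statement {
        positional_constraint = "STARTS_WITH"
        search_string         = "/"

        field_to_match {
          uri_path {}
        }

        text_transformation {
          priority = 0
          type     = "NONE"
        }
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = false
      metric_name                = "website-maintenance"
      sampled_requests_enabled   = false
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = false
    metric_name                = "example"
    sampled_requests_enabled   = false
  }
}
```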

Debug Output

N/A

Panic Output

See _expected/out.log in attached ZIP file.

Important Factoids

  1. I initially thought the error was caused by the HTML in the Custom Response, but when I tried to reproduce it with a minimal configuration, I could only do so after adding all of the rules we use in production. E.g., if you remove the last set of rules (the dynamic section operating on local.managed_rules), the error no longer occurs. This suggests that the error results from the overall complexity (or perhaps size?) of the change to apply, or from a combination of settings, rather than from a single setting. But that's just my impression.
  2. I have tried several other changes to work around the issue, and it seems that trying to change any custom response in this particular setup causes that error.
    2.1. I eventually changed the website rule to return a 307 response with a Location header, and that still failed when applying the change.
    2.2. I also tried removing the custom response for the api rule so that it also responds with only a status and headers, and Terraform failed to remove the custom response in this case as well (here the response is just simple JSON, so the content seems irrelevant).
  3. I also tried running that with TF_LOG=debug (unfortunately I don't have that log anymore), and I remember seeing several "Produced inconsistent plan, but we don't care because it's using legacy SDK" sort of messages around all WAFv2 resources (not just WebACL). Perhaps that's related?

References

No response

Would you like to implement a fix?

No

ewbankkit commented 2 years ago

@mkielar Thanks for raising this issue 👍. Does the same error occur if you use v4.33.0 of the Terraform AWS Provider?

Possibly related:

mkielar commented 2 years ago

@ewbankkit, yup, v4.33.0 also fails. I haven't verified whether it's exactly the same error, but it looks similar after a brief review. Attached: out_4_33.zip

xael-fry commented 2 years ago

It looks like the same problem as #23936, #27175, #23390, #23992

I tried with 4.34.0, 4.10.0 and 4.0.0 and had the same issue. If I roll back to 3.74.0, the issue does not occur.

mmaetzler commented 1 year ago

FYI: This issue with WAFv2 has been reported multiple times:

elbazon commented 1 year ago

A frustrating issue... I am encountering it as well.

I ended up creating it manually and simply ignoring it for now. I hope this will be resolved.

smailc commented 1 year ago

Is this being worked on? It just started happening for me. Adding any new block and attempting to apply gives a huge provider error.

petur commented 1 year ago

This started happening after upgrading to version 4.52.0, even with no changes in the configuration. After the provider is upgraded, terraform plan shows a huge diff for the resource (even if nothing is really changing), then when it tries to apply it fails with this error.

Staying on 4.51.0 isn't a viable workaround because oversize_handling will become required soon.

bkona-alopa commented 1 year ago

Same thing is happening after upgrading to version 4.52.0.

nodomain commented 1 year ago

Workaround to pin version to 4.51.0 did not do the trick here. Any other workarounds?

nunofernandes commented 1 year ago

The workaround for me was to taint the resource and then apply, which recreates the WAFv2 WebACL. Another option that worked for me (sometimes) was to use the AWS console to delete a few of the rules in the WAF and then run an apply.

YakDriver commented 1 year ago

NOTE: I cannot reproduce this error using Terraform v1.5+/AWS provider v5.7+ after trying various configurations. Retry using a minimum of Terraform v1.4.2/AWS provider v4.67.0, but preferably Terraform v1.5.3+/AWS provider v5.8.0+, and let us know if this is still a problem! If we don't hear back and can't reproduce, we plan to close this on or around July 20, 2023. The evidence suggests this is OBE (i.e., fixed in the interim).

For more details see #23992 (comment) and #28672 (comment).

mkielar commented 1 year ago

@YakDriver, thanks for looking into the issue.

I have tested some configurations this morning, mostly the ones I currently have + the ones I'm planning to migrate to:

1.4.2 + 4.67.0: worked!
1.4.6 + 4.67.0: worked!
1.5.3 + 5.8.0:  worked! # Needed to refactor excluded_rule => rule_action_override for this to work.

I can also say I've introduced some (sometimes significant) changes to my WAF deployment scripts since this ticket was raised, and all of them worked without issues. Looks like this is indeed resolved, at least for my case.
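
For anyone hitting the same refactor on provider 5.x, a minimal sketch of the `excluded_rule` to `rule_action_override` change inside a `managed_rule_group_statement` might look like this. The resource name, rule group, and overridden rule name are illustrative assumptions and not taken from the configuration in this issue.

```hcl
# Illustrative sketch only; names and the chosen rule group are assumptions.
resource "aws_wafv2_web_acl" "example" {
  name  = "example"
  scope = "REGIONAL"

  default_action {
    allow {}
  }

  rule {
    name     = "common-rule-set"
    priority = 1

    override_action {
      none {}
    }

    statement {
      managed_rule_group_statement {
        name        = "AWSManagedRulesCommonRuleSet"
        vendor_name = "AWS"

        # Provider 4.x form, no longer accepted in 5.x:
        # excluded_rule {
        #   name = "SizeRestrictions_BODY"
        # }

        # Provider 5.x form: override the rule's action to count,
        # which is what excluding a rule effectively did.
        rule_action_override {
          name = "SizeRestrictions_BODY"

          action_to_use {
            count {}
          }
        }
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = false
      metric_name                = "common-rule-set"
      sampled_requests_enabled   = false
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = false
    metric_name                = "example"
    sampled_requests_enabled   = false
  }
}
```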

YakDriver commented 1 year ago

@mkielar Thank you for your response! Some of the issues in this family were related to Terraform core fixes (yours, I believe) and provider fixes (such as tag-related problems).

justinretzolk commented 1 year ago

Hi all 👋 As was mentioned above, this issue appears to be fixed when using a minimum Terraform version of 1.4.2 and a minimum AWS Provider version of 4.67.0 (preferably Terraform 1.5.3 or later and AWS Provider 5.8.0 or later). If you experience additional unexpected behaviors with versions that meet these parameters, please open a new issue so that we can investigate further.
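
As a rough sketch, the minimum versions mentioned above could be enforced in a configuration with constraints like the following. The exact constraint style is a choice, not a requirement; raising the bounds to 1.5.3 and 5.8.0 would match the preferred versions instead.

```hcl
terraform {
  # Minimum Terraform core version reported to work.
  required_version = ">= 1.4.2"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Minimum AWS provider version reported to work.
      version = ">= 4.67.0"
    }
  }
}
```

After changing the constraints, `terraform init -upgrade` selects a provider version that satisfies them.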

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.