hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0
9.74k stars 9.1k forks source link

Unexpected EOF on apply resulting in loss of state data #10462

Open repl-mike-roest opened 4 years ago

repl-mike-roest commented 4 years ago

Community Note

Terraform Version

0.11.11 2019-10-09 15:53:59.448 - Downloading plugin for provider "null" (2.1.2)... 2019-10-09 15:54:00.418 - Downloading plugin for provider "grafana" (1.5.0)... 2019-10-09 15:54:01.600 - Downloading plugin for provider "aws" (2.20.0)... 2019-10-09 15:54:06.609 - Downloading plugin for provider "template" (2.1.2)... 2019-10-09 15:54:07.611 - Downloading plugin for provider "random" (2.2.1)... 2019-10-09 15:54:08.705 - Downloading plugin for provider "external" (1.2.0)...

Affected Resource(s)

Terraform Configuration Files

I don't have a reproducable configuration.

Debug Output

2019-10-09 15:55:39.198 Error: Error applying plan: 2019-10-09 15:55:39.198 3 error(s) occurred: 2019-10-09 15:55:39.198 aws_lambda_function.sync: 1 error(s) occurred: 2019-10-09 15:55:39.198 aws_lambda_function.sync: unexpected EOF 2019-10-09 15:55:39.198 aws_elastic_beanstalk_environment.beanstalk: 1 error(s) occurred: 2019-10-09 15:55:39.198 aws_elastic_beanstalk_environment.beanstalk: unexpected EOF 2019-10-09 15:55:39.198 aws_lambda_function.proxy: 1 error(s) occurred: 2019-10-09 15:55:39.198 aws_lambda_function.proxy: unexpected EOF 2019-10-09 15:55:39.198 Terraform does not automatically rollback in the face of errors. 2019-10-09 15:55:39.198 Instead, your Terraform state file has been partially updated with 2019-10-09 15:55:39.198 any resources that successfully completed. Please address the error 2019-10-09 15:55:39.198 above and apply again to incrementally change your infrastructure.

Expected Behavior

These 3 resources should have been left in the state file

Actual Behavior

they were removed from the state file (while remaining in AWS) so subsequent runs tried to recreate the resources it thought was missing

Steps to Reproduce

Unknown we've only seen this twice on 2 completely distinct and seperate terraform configurations. Both contained a aws_elastic_beanstalk_environment resource.

nywilken commented 4 years ago

hi @repl-mike-roest sorry to hear you are running into trouble here. You mention not having any reproducible configuration. By any chance would you be able to share any debug log information so that the maintainers can better understand what might be happening.

Both contained a aws_elastic_beanstalk_environment resource.

Do both configurations contain lambda functions as well? Is there any special dependencies on the beanstalk environment resource.

repl-mike-roest commented 4 years ago

@nywilken we're making changes to our automated deployment here to turn on debug logging to see if we can capture that. It happened again yesterday with the exact same resources in a different environment. The first place I saw it did not have any lambda functions it was just a beanstalk application & environment and the supporting minutae (security groups, roles etc).

there is a API gateway acting as a HTTP_PROXY to the beanstalk environment that is the only dependency on the beanstalk environment resource. The lambda's are seperate (dynamo streams processors so they're not related to the beanstalk at all).

These deployment tasks are running on AWS fargate tasks.

repl-mike-roest commented 4 years ago

We had this happen again and managed to get it with debug log on. I'm unable to create it as a gist as it's 16.5 MB but I've thrown the file in google drive after cleaning out any secrets: https://drive.google.com/file/d/1_VPVR3iUYe_wCYOUtTTKzMQUeTgOPHle/view?usp=sharing

repl-mike-roest commented 4 years ago

Actually had 2 runs in a row (this is a automated system) the first I attached corrupted the state file. Then the next run further corrupted the state file (I've also uploaded this second log in google drive) https://drive.google.com/file/d/1wrcI7BkXGoydSaYrj7m99vJpMrRR7wD_/view?usp=sharing

repl-mike-roest commented 4 years ago

It seems like enabling debug logging has caused this to happen consistenly, after reverting the state file to the last good value it caused the problem again on the next run.

We're running this task via ECS fargate using cloudwatch logs log connector inside a custom docker container built based on node:10.16-alpine with terraform installed via

RUN wget https://releases.hashicorp.com/terraform/0.11.11/terraform_0.11.11_linux_amd64.zip && \
   mkdir ./terraform && unzip -o terraform_0.11.11_linux_amd64.zip -d ./terraform && \
   install terraform/terraform /usr/local/bin

Once I disabled the debug logging it was able to complete and correctly update the state file. Seems like maybe the increased level of log output from the cloudwatch logs is somehow triggering whatever is causing the EOF's which is then triggering the statefile corruption.

repl-mike-roest commented 4 years ago

I ran this exact same task (with the debug logging) on a regular docker EC2 instance and it worked correctly.

Ray-B commented 4 years ago

Any update on this?

justinretzolk commented 2 years ago

Hey y'all 👋 Thank you for taking the time to file this issue! Given that there's been a number of Terraform and AWS provider releases since this was initially filed, can anyone confirm whether you're still experiencing this behavior?