runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io

Insufficient Disk Space for a plan causes obtuse failures #1047

Open dbolack opened 4 years ago

dbolack commented 4 years ago

We run our Atlantis instance on a (now clearly too) small AWS instance. Because we manage multiple sets of state and use a private GitHub repo for modules, we might have a much larger cache and plan scratch space than most.

We have found that when disk space is close enough to full, we get plan failures that, frankly, make no sense, along with some form of corruption in the plugin cache. The only time there was a clue was when a provider needed to be downloaded for the plan.

I'm not entirely certain how to describe reproducing this other than to say: fill up the disk and run a big plan with multiple providers.

Example error:

running "/usr/local/bin/terraform plan -input=false -refresh -no-color -out \"/home/ec2-user/.atlantis/repos/enthought/terraform/12345/default/silos/projectname/silos::projectname-default.tfplan\" -var-file /home/ec2-user/tfvars/jumpcloud.tfvars" in "/home/ec2-user/.atlantis/repos/enthought/terraform/12345/default/silos/projectname": exit status 1

Error: Failed to instantiate provider "aws" to obtain schema: fork/exec /home/ec2-user/.atlantis/repos/enthought/terraform/12345/default/silos/projectname/.terraform/plugins/linux_amd64/terraform-provider-aws_v2.63.0_x4: permission denied
lkysow commented 4 years ago

Hmm. I'm not sure how best to deal with this in Atlantis. I don't think it makes sense to build disk space checking into Atlantis. I think this is best dealt with via your own health checking.
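
For anyone wiring up that kind of external check, here is a minimal sketch in Python, assuming the Atlantis data directory is the /home/ec2-user/.atlantis path seen in the error above. The function names and the 10% threshold are made up for illustration and are not part of Atlantis.

```python
import shutil

# Assumption: 10% free is an arbitrary example threshold, tune it for your host.
MIN_FREE_FRACTION = 0.10


def free_fraction(path):
    """Return the free share (0.0 to 1.0) of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a zero size; treat them as full.
        return 0.0
    return usage.free / usage.total


def has_enough_space(path, min_free_fraction=MIN_FREE_FRACTION):
    """True when the volume holding `path` has at least the required free share."""
    return free_fraction(path) >= min_free_fraction


def check(path):
    """Return a process exit code: 0 when healthy, 1 when space is low."""
    return 0 if has_enough_space(path) else 1
```

A cron job or monitoring agent could call `sys.exit(check("/home/ec2-user/.atlantis"))` and alert on a non-zero exit code, which catches the problem before a plan fails in a confusing way.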

grimm26 commented 4 years ago

Could we at least have a status or healthcheck endpoint that reports available disk space and whether Atlantis can run? Or just add available disk space to the /status output?

lkysow commented 4 years ago

Sure, adding it to the /status endpoint makes sense.
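
To make the idea concrete, here is a rough sketch of what a status document with disk information could look like. It is written in Python purely for illustration; the field names are hypothetical and do not reflect the actual /status schema or any existing Atlantis code.

```python
import json
import shutil


def status_payload(data_dir):
    """Build a status document that includes disk usage for `data_dir`.

    All field names are hypothetical, chosen only for this example.
    """
    usage = shutil.disk_usage(data_dir)
    # Guard against filesystems that report a total size of zero.
    free_percent = round(100 * usage.free / usage.total, 1) if usage.total else 0.0
    return {
        "data_dir": data_dir,
        "disk_total_bytes": usage.total,
        "disk_free_bytes": usage.free,
        "disk_free_percent": free_percent,
    }


def status_json(data_dir):
    """Serialize the payload the way an HTTP handler might return it."""
    return json.dumps(status_payload(data_dir), sort_keys=True)
```

Reporting both raw bytes and a percentage lets a monitoring system alert on whichever fits its thresholds.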

nitrocode commented 10 months ago

As a workaround, this pre_workflow_hook has been helping me keep my volumes clean:

https://github.com/runatlantis/atlantis/issues/3238#issuecomment-1869094337