jmccann / drone-terraform

Drone plugin for triggering Terraform deployments
http://plugins.drone.io/jmccann/terraform/
Apache License 2.0

Persist the state failed in Drone Server #95

Open PierreRAFFA opened 5 years ago

PierreRAFFA commented 5 years ago

Hi there, I am facing an issue on apply where, for some reason, the state could not be pushed from the Drone server. When this happens on my local machine, I just run the command shown at the end of the error output. But how should I handle it when it happens on the Drone server? Does it make sense to create a dedicated step after the deployment that runs terraform state push errored.tfstate? Any ideas? Thanks.

Failed to save state: failed to upload state: RequestError: send request failed
caused by: Put https://********.s3.eu-central-1.amazonaws.com/stag/us-east-1/infra/terraform.tfstate: dial tcp: lookup ********.s3.eu-central-1.amazonaws.com on 127.0.0.11:53: read udp 127.0.0.1:50397->127.0.0.11:53: i/o timeout

Error: Failed to persist state to backend.

The error shown above has prevented Terraform from writing the updated state
to the configured backend. To allow for recovery, the state has been written
to the file "errored.tfstate" in the current working directory.

Running "terraform apply" again at this point will create a forked state,
making it harder to recover.

To retry writing this state, use the following command:
    terraform state push errored.tfstate

caioquirino commented 5 years ago

It looks like you have connection issues from your Drone server :) More specifically, DNS problems. Once you fix that, it should work :) You can try changing the DNS server you are using, or investigate why this one is not working.

PierreRAFFA commented 5 years ago

@caioquirino Oh sorry... I should have been more precise. The pipeline has worked every time, but this situation happened once last week. When it does happen, even if only rarely, how should I handle it?

caioquirino commented 5 years ago

This is a connection issue, so it helps to understand how your Drone containers are running and, based on that, investigate what could have happened. Examples:

If they are running on virtual machines, EC2, etc., you can SSH into that machine and check whether the connection to the DNS server is working.

If they are running on ECS, Kubernetes, or any other container orchestrator, you can run a container with a dig or host command (e.g. host github.com) and look at the output, or even add a step to your Drone pipeline that does that.
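
The DNS check suggested above could be wrapped in a small shell helper and run as a pipeline step before Terraform. This is only a sketch: the function name check_dns and the RESOLVER variable are hypothetical, and it assumes the step's image ships a lookup tool such as host that exits non-zero when resolution fails.

```shell
#!/bin/sh
# Hypothetical helper (not part of the plugin): succeed only if NAME resolves.
# RESOLVER is an assumed override for images without `host`,
# for example RESOLVER="getent hosts".
check_dns() {
  name="$1"
  resolver="${RESOLVER:-host}"
  # Intentionally unquoted so a multi-word resolver command is split into words.
  if $resolver "$name" >/dev/null 2>&1; then
    echo "DNS OK: $name"
    return 0
  fi
  echo "DNS lookup failed: $name" >&2
  return 1
}
```

Calling check_dns github.com (or the backend's hostname) at the start of a step would make a DNS outage fail fast, before Terraform has changed anything.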

This appears to be a connectivity issue, all the more so since it happened only once :)
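
For the dedicated recovery step asked about in the original question, one possible shape is a script that retries the command printed in the error output. This is a minimal sketch under several assumptions: the helper names retry and push_errored_state are hypothetical, the attempt count and delay are arbitrary, errored.tfstate is assumed to still be in the step's working directory, and the step would need to be configured to run after the apply has failed.

```shell
#!/bin/sh
# retry MAX DELAY CMD...: run CMD until it succeeds or MAX attempts are used.
retry() {
  max="$1"; delay="$2"; shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Push the recovery file only if a failed apply left one behind.
push_errored_state() {
  file="${1:-errored.tfstate}"
  if [ ! -f "$file" ]; then
    echo "no $file found, nothing to push"
    return 0
  fi
  # Assumed values: up to 5 attempts, 10 seconds apart.
  retry 5 10 terraform state push "$file"
}
```

Running push_errored_state in a step that does nothing when no errored.tfstate exists keeps the normal path unaffected, while a transient DNS or network failure gets a few more chances to upload the state before anyone has to intervene by hand.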