Hang during terraform plan in 0.6.7

landro commented 8 years ago

Performing a terraform plan on an existing infrastructure (based on cloudstack and aws providers) that was created using 0.6.6 crashes:

crash.txt

phinze commented 8 years ago

Hi @landro - this is the same crash reported in #4036 and elsewhere that we believe is fixed in #4024. We're drawing a circle around a few important bugfixes and we'll be looking at a bugfix release soon.

In the meantime, if you're able to check latest master to see if it solves your issue, that'd be helpful!

landro commented 8 years ago

I built fra master, and gave it a go. Now it hangs again (instead of crashing). I've attached debug logs debug.txt

phinze commented 8 years ago

Okay, so the key portion of the hanging debug log is here:

2015/11/24 16:42:56 [DEBUG] vertex aws_route53_record.api, waiting for: provider.aws
2015/11/24 16:42:56 [DEBUG] vertex provider.aws (close), waiting for: aws_route53_record.api
2015/11/24 16:42:56 [DEBUG] vertex root, waiting for: provider.aws (close)
2015/11/24 16:43:01 [DEBUG] vertex aws_route53_record.api, waiting for: provider.aws
2015/11/24 16:43:01 [DEBUG] vertex provider.aws (close), waiting for: aws_route53_record.api
2015/11/24 16:43:01 [DEBUG] vertex root, waiting for: provider.aws (close)
2015/11/24 16:43:06 [DEBUG] vertex aws_route53_record.api, waiting for: provider.aws
2015/11/24 16:43:06 [DEBUG] vertex provider.aws (close), waiting for: aws_route53_record.api
2015/11/24 16:43:06 [DEBUG] vertex root, waiting for: provider.aws (close)
2015/11/24 16:43:11 [DEBUG] vertex aws_route53_record.api, waiting for: provider.aws
2015/11/24 16:43:11 [DEBUG] vertex provider.aws (close), waiting for: aws_route53_record.api
2015/11/24 16:43:11 [DEBUG] vertex root, waiting for: provider.aws (close)

Those three lines will likely continue to repeat every 5s while it hangs.

Will dig in some more and follow up.

phinze commented 8 years ago

Can you share the config for your aws_route52_record.api and describe how you are authenticating the AWS provider in your environment? That will help us narrow this down.

landro commented 8 years ago

resource "aws_route53_record" "api" {
   zone_id = "${var.route53_zone_id}"
   name = "my.domain.com"
   type = "A"
   ttl = "300"
   records = [
    "${cloudstack_ipaddress.lbext_tcp.0.ipaddress}",
    "${cloudstack_ipaddress.lbext_tcp.1.ipaddress}"   
    ]
}

I'm setting TF_VAR_access_key and TF_VAR_secret_key, reading into variable,

  variable "access_key" {}
  variable "secret_key" {}
  provider "aws" {
    access_key = "${var.access_key}"
    secret_key = "${var.secret_key}"
    region     = "eu-central-1"
  }

and passing them to the aws provider like so

  provider "aws" {
    access_key = "${var.access_key}"
    secret_key = "${var.secret_key}"
    region     = "eu-central-1"
  }

phinze commented 8 years ago

Great thank you. Going to see if I can cook up a reproduction with this.

phinze commented 8 years ago

Our initial attempts to reproduce have not been successful. Does Terraform v0.6.6 still complete a plan successfully in your environment?

phinze commented 8 years ago

Another shot in the dark to try out would be to switch from using variable and provider blocks to using environment variables to configure your providers (so AWS_* and CLOUDSTACK_*).

Sorry for the trouble! We'll get to the bottom of this.

landro commented 8 years ago

So I tried running a plan using 0.6.6 (and the original provider config), and that also hangs for a long time, repetedly showing

2015/11/25 08:27:32 [DEBUG] vertex aws_route53_record.api, waiting for: provider.aws 2015/11/25 08:27:37 [DEBUG] vertex provider.aws (close), waiting for: aws_route53_record.api

for like 5 mins and the it moves on, before it repeatedly shows the same thing for another 5 mins a few more times. In totalt, 20 minutes pass before plan is completed. Debug log attached. 0.6.6_terraform_plan_hang.txt

Grep-ing for "waiting for: aws_route53_record.api" in the log, shows it appearing every 5 sec during the entire run.

I executed terraform plan one more time - again it took 20 minutes.

landro commented 8 years ago

Anywhere I could add debugging code in order to shed light on what's going on? What does 2015/11/25 08:27:32 [DEBUG] vertex aws_route53_record.api, waiting for: provider.aws 2015/11/25 08:27:37 [DEBUG] vertex provider.aws (close), waiting for: aws_route53_record.api actually mean? What kind of thing are we waiting for - a process, a response from aws?

landro commented 8 years ago

I tried configuring the aws provider like so:

  provider "aws" {
    access_key = "whatever"
    secret_key = "something"
    region     = "eu-central-1"
  }

That also triggers waiting for: provider.aws

However:

  provider "aws" {
    access_key = "whatever"
    secret_key = "something"
    region     = "eu-central-9"
  }

immediately fails with: Not a valid region: eu-central-9 as expected

dverbeek84 commented 8 years ago

I have the same problem, only i am not using route53. It just hangs forrever

2015/11/25 21:19:25 [INFO] Terraform version: 0.6.7  b7c41aed92c2d6ea841f618b54e400fcbf824845+CHANGES
2015/11/25 21:19:25 [DEBUG] Detected home directory from env var: /Users/Danny
2015/11/25 21:19:25 [DEBUG] Discovered plugin: atlas = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-atlas
2015/11/25 21:19:25 [DEBUG] Discovered plugin: aws = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-aws
2015/11/25 21:19:25 [DEBUG] Discovered plugin: azure = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-azure
2015/11/25 21:19:25 [DEBUG] Discovered plugin: cloudflare = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-cloudflare
2015/11/25 21:19:25 [DEBUG] Discovered plugin: cloudstack = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-cloudstack
2015/11/25 21:19:25 [DEBUG] Discovered plugin: consul = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-consul
2015/11/25 21:19:25 [DEBUG] Discovered plugin: digitalocean = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-digitalocean
2015/11/25 21:19:25 [DEBUG] Discovered plugin: dme = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-dme
2015/11/25 21:19:25 [DEBUG] Discovered plugin: dnsimple = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-dnsimple
2015/11/25 21:19:25 [DEBUG] Discovered plugin: docker = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-docker
2015/11/25 21:19:25 [DEBUG] Discovered plugin: dyn = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-dyn
2015/11/25 21:19:25 [DEBUG] Discovered plugin: google = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-google
2015/11/25 21:19:25 [DEBUG] Discovered plugin: heroku = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-heroku
2015/11/25 21:19:25 [DEBUG] Discovered plugin: mailgun = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-mailgun
2015/11/25 21:19:25 [DEBUG] Discovered plugin: null = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-null
2015/11/25 21:19:25 [DEBUG] Discovered plugin: openstack = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-openstack
2015/11/25 21:19:25 [DEBUG] Discovered plugin: packet = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-packet
2015/11/25 21:19:25 [DEBUG] Discovered plugin: rundeck = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-rundeck
2015/11/25 21:19:25 [DEBUG] Discovered plugin: template = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-template
2015/11/25 21:19:25 [DEBUG] Discovered plugin: terraform = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-terraform
2015/11/25 21:19:25 [DEBUG] Discovered plugin: tls = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-tls
2015/11/25 21:19:25 [DEBUG] Discovered plugin: vsphere = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-vsphere
2015/11/25 21:19:25 [DEBUG] Discovered plugin: chef = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provisioner-chef
2015/11/25 21:19:25 [DEBUG] Discovered plugin: file = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provisioner-file
2015/11/25 21:19:25 [DEBUG] Discovered plugin: local-exec = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provisioner-local-exec
2015/11/25 21:19:25 [DEBUG] Discovered plugin: remote-exec = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provisioner-remote-exec
2015/11/25 21:19:25 [DEBUG] Detected home directory from env var: /Users/Danny
2015/11/25 21:19:25 [DEBUG] Attempting to open CLI config file: /Users/Danny/.terraformrc
2015/11/25 21:19:25 [DEBUG] Detected home directory from env var: /Users/Danny

landro commented 8 years ago

@dverbeek84 - have you tried building from master?

dverbeek84 commented 8 years ago

@landro yes, i just did. It was not working. It also hangs. switching back to 0.6.6 still works.

mlehner commented 8 years ago

Switching to AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION resolved the issue for me. Originally my access credentials using variables and TFVAR environment variables similar to @landro.

I was experiencing this issue while running 'terraform plan' or 'terraform refresh'. All the aws resources would just repeatedly say they were waiting on another resource every 5 seconds. I tried three different versions, but none of them worked. v0.6.6 (hang), v0.6.7 (crash) and master (hang)

phinze commented 8 years ago

Hey folks, some of the occurrences here might be explained by the HCL issue we found in #4082, which caused config parsing to hang on any file that ended in a comment line. That should be fixed on master with a make updatedeps to fetch the latest hashicorp/hcl dependency (and will be included in the forthcoming v0.6.8 release).

antonbabenko commented 8 years ago

I think such bug is rather critical and would be great if new version of Terraform is released (at least it should contain just this fix, if possible). Kind of, version 0.6.7.1 :)

landro commented 8 years ago

so removed trailing # and set credentials using AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION - and everything started working both in master and 0.6.6 Setting aws credential using variables, still hangs

phinze commented 8 years ago

@antonbabenko agreed! We're cutting a release today. :)

phinze commented 8 years ago

@landro and @mlehner - can either of you share with us some repro config / steps on the "AWS credentials with variables" hang? I've so far been unable to reproduce based on your description of your setup.

landro commented 8 years ago

I'll try to reproduce in a simple repo later today, @phinze

phinze commented 8 years ago

Thanks @landro!

landro commented 8 years ago

Managed to reproduce issue with a dead simple tf-file https://github.com/landro/terraform_aws_bug

HTH in tracking down the cause of this issue, @phinze

phinze commented 8 years ago

This is super helpful, @landro. Will dig into this tomorrow.

phinze commented 8 years ago

Hello again, unfortunately I'm not having any luck reproducing this using your example quite yet.

I can plan, apply, refresh, and destroy using v0.6.8. I also tried v0.6.6 and v0.6.7 and was unable to reproduce any hang.

The r53 APIs are quite slow - often taking minutes to complete creation, but I've gotten no hangs.

My prereqs config executed separately to get a valid zone ID:

variable "my_access_key" {}
variable "my_secret_key" {}

provider "aws" {
  access_key = "${var.my_access_key}"
  secret_key = "${var.my_secret_key}"
  region     = "eu-central-1"
}

resource "aws_vpc" "primary" {
  cidr_block = "10.1.2.0/24"
}

resource "aws_route53_zone" "primary" {
  vpc_id = "${aws_vpc.primary.id}"
  name = "example.com"
}

output "zone_id" {
  value = "${aws_route53_zone.primary.id}"
}

The repro config:

variable "my_access_key" {}
variable "my_secret_key" {}

provider "aws" {
  access_key = "${var.my_access_key}"
  secret_key = "${var.my_secret_key}"
  region     = "eu-central-1"
}

resource "aws_route53_record" "default" {
   zone_id = "Z3TP2LAVI4BAXL" # Output from prereqs config
   name = "example.com"
   type = "A"
   ttl = "300"
   records = [
    "10.0.0.1"
    ]
}

Other details:

No AWS_* env vars set.
TF_VAR_my_access_key and TF_VAR_my_secret_key set to valid values.
Using darwin_amd64 on a mac.

Questions for @landro and anybody else seeing hangs:

Are you still seeing hangs on v0.6.8
Anything else could be affecting your environment that would help me reproduce?

jwadolowski commented 8 years ago

Hi guys,

I experienced the same for both v0.6.7 and v0.6.8. After a little bit of debugging and network tracing it turned out that the reason it hangs and loops forever is the fact that AWS provider does some background checks and tries to connect 169.254.169.254:80 (source code reference) before it moves on with further steps.

On AWS machines response to GET request to http://169.254.169.254/latest/meta-data/ is almost immediate. Surprisingly on my machine it was sometimes possible to establish TCP connection to this host (actually the varying factor was the network I was connected to).

As soon as I added a rule that drops all connections to 169.254.0.0/16 this issue is completely gone.

wirescrossed commented 8 years ago

After upgrading to 0.6.7 it works fine on OS X for me.

catsby commented 8 years ago

Hey @landro – are you running this on your local machine, or from an EC2 instance?

landro commented 8 years ago

Sorry for the long wait. Just to clarify @phinze - I manage to reproduce the issue using

  TF_VAR_my_access_key=acc TF_VAR_my_secret_key=sec TF_LOG=TRACE terraform plan

without any prereqs. I run the above command as is - literally using acc as my access key and sec as my secret key, and without modifying the bug.tf file in any way. I have run this from my laptop @catsby, on different networks, home, two different company networks and mobile 4g network. Same hangs all over.

I tried looking into what's going on with the 169.254.169.254:80 port that @jwadolowski mentioned. It turns out, it accepts connections even when I'm disconnected from the internet (I take down my wifi connection).

nmap -sT -p80 169.254.169.254 Nmap scan report for 169.254.169.254 Host is up (0.000065s latency). PORT STATE SERVICE 80/tcp open http

Nmap done: 1 IP address (1 host up) scanned in 0.03 seconds I noticed however, that often, curling this service hangs.

I believe, it might be a good idea to find a better way to sort out if we're running terraform from amazon or not. BTW, in my current project, we're using a private cloud running cloudstack, and only using aws for dns.

phinze commented 8 years ago

@landro this is very good information. We had an internal chat about this issue and #3243 + #2693 and decided to revamp AWS creds handling in the provider. @catsby explains more in his comment over here: https://github.com/hashicorp/terraform/issues/2693#issuecomment-162029528

Stay tuned for a new PR that hopefully addresses all these issues! :+1:

catsby commented 8 years ago

Hello friends! I just merged #4254 which refactored our AWS Authentication chain some. I'm not sure it will help here, however, some bits have moved and it's worth checking out.

If any of you could be so kind as to build from master and evaluate, please let us know your findings!

catsby commented 8 years ago

Hey all – I'm going to go ahead and close this, please let us know if you're still hitting this!

ghost commented 4 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

hashicorp / terraform

Hang during terraform plan in 0.6.7 #4043