Closed landro closed 8 years ago
Hi @landro - this is the same crash reported in #4036 and elsewhere that we believe is fixed in #4024. We're drawing a circle around a few important bugfixes and we'll be looking at a bugfix release soon.
In the meantime, if you're able to check latest master to see if it solves your issue, that'd be helpful!
I built from master and gave it a go. Now it hangs again (instead of crashing). I've attached debug logs: debug.txt
Okay, so the key portion of the hanging debug log is here:
2015/11/24 16:42:56 [DEBUG] vertex aws_route53_record.api, waiting for: provider.aws
2015/11/24 16:42:56 [DEBUG] vertex provider.aws (close), waiting for: aws_route53_record.api
2015/11/24 16:42:56 [DEBUG] vertex root, waiting for: provider.aws (close)
2015/11/24 16:43:01 [DEBUG] vertex aws_route53_record.api, waiting for: provider.aws
2015/11/24 16:43:01 [DEBUG] vertex provider.aws (close), waiting for: aws_route53_record.api
2015/11/24 16:43:01 [DEBUG] vertex root, waiting for: provider.aws (close)
2015/11/24 16:43:06 [DEBUG] vertex aws_route53_record.api, waiting for: provider.aws
2015/11/24 16:43:06 [DEBUG] vertex provider.aws (close), waiting for: aws_route53_record.api
2015/11/24 16:43:06 [DEBUG] vertex root, waiting for: provider.aws (close)
2015/11/24 16:43:11 [DEBUG] vertex aws_route53_record.api, waiting for: provider.aws
2015/11/24 16:43:11 [DEBUG] vertex provider.aws (close), waiting for: aws_route53_record.api
2015/11/24 16:43:11 [DEBUG] vertex root, waiting for: provider.aws (close)
Those three lines will likely continue to repeat every 5s while it hangs.
Will dig in some more and follow up.
Can you share the config for your aws_route53_record.api and describe how you are authenticating the AWS provider in your environment? That will help us narrow this down.
resource "aws_route53_record" "api" {
  zone_id = "${var.route53_zone_id}"
  name    = "my.domain.com"
  type    = "A"
  ttl     = "300"
  records = [
    "${cloudstack_ipaddress.lbext_tcp.0.ipaddress}",
    "${cloudstack_ipaddress.lbext_tcp.1.ipaddress}"
  ]
}
I'm setting TF_VAR_access_key and TF_VAR_secret_key, reading them into variables,

variable "access_key" {}
variable "secret_key" {}
and passing them to the aws provider like so:

provider "aws" {
  access_key = "${var.access_key}"
  secret_key = "${var.secret_key}"
  region     = "eu-central-1"
}
Great, thank you. Going to see if I can cook up a reproduction with this.
Our initial attempts to reproduce have not been successful. Does Terraform v0.6.6 still complete a plan successfully in your environment?
Another shot in the dark to try out would be to switch from using variable and provider blocks to using environment variables to configure your providers (so AWS_* and CLOUDSTACK_*).
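For what it's worth, a minimal sketch of that environment-variable approach (all values below are placeholders, and the exact workflow is my assumption, not something from this thread):

```shell
# Configure AWS credentials via environment variables instead of
# variable/provider interpolation. All values here are placeholders.
export AWS_ACCESS_KEY_ID="AKIAPLACEHOLDER"
export AWS_SECRET_ACCESS_KEY="placeholder-secret"
export AWS_DEFAULT_REGION="eu-central-1"

# With these set, the provider block in the .tf file can shrink to
# just `provider "aws" {}` and `terraform plan` runs as usual.
echo "region=$AWS_DEFAULT_REGION"
```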
Sorry for the trouble! We'll get to the bottom of this.
So I tried running a plan using 0.6.6 (and the original provider config), and that also hangs for a long time, repeatedly showing

2015/11/25 08:27:32 [DEBUG] vertex aws_route53_record.api, waiting for: provider.aws
2015/11/25 08:27:37 [DEBUG] vertex provider.aws (close), waiting for: aws_route53_record.api

for about 5 minutes before it moves on, and then it repeatedly shows the same thing for another 5 minutes, a few more times. In total, 20 minutes pass before the plan is completed. Debug log attached: 0.6.6_terraform_plan_hang.txt
Grepping for "waiting for: aws_route53_record.api" in the log shows it appearing every 5 seconds during the entire run.
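For anyone else triaging similar logs, here is a self-contained illustration of that grep (the sample log lines are fabricated to mirror the ones quoted above):

```shell
# Count how often a vertex is reported as waiting; a hung run
# accumulates roughly one such line every 5 seconds.
# (Sample log fabricated here so the snippet is self-contained.)
cat > sample.log <<'EOF'
2015/11/25 08:27:32 [DEBUG] vertex aws_route53_record.api, waiting for: provider.aws
2015/11/25 08:27:37 [DEBUG] vertex provider.aws (close), waiting for: aws_route53_record.api
2015/11/25 08:27:42 [DEBUG] vertex aws_route53_record.api, waiting for: provider.aws
EOF
grep -c "waiting for: provider.aws" sample.log  # prints 2
```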
I executed terraform plan one more time - again it took 20 minutes.
Is there anywhere I could add debugging code in order to shed light on what's going on? What does

2015/11/25 08:27:32 [DEBUG] vertex aws_route53_record.api, waiting for: provider.aws
2015/11/25 08:27:37 [DEBUG] vertex provider.aws (close), waiting for: aws_route53_record.api

actually mean? What kind of thing are we waiting for - a process, a response from AWS?
I tried configuring the aws provider like so:
provider "aws" {
  access_key = "whatever"
  secret_key = "something"
  region     = "eu-central-1"
}
That also triggers waiting for: provider.aws
However:
provider "aws" {
  access_key = "whatever"
  secret_key = "something"
  region     = "eu-central-9"
}
immediately fails with "Not a valid region: eu-central-9", as expected.
I have the same problem, only I am not using route53. It just hangs forever.
2015/11/25 21:19:25 [INFO] Terraform version: 0.6.7 b7c41aed92c2d6ea841f618b54e400fcbf824845+CHANGES
2015/11/25 21:19:25 [DEBUG] Detected home directory from env var: /Users/Danny
2015/11/25 21:19:25 [DEBUG] Discovered plugin: atlas = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-atlas
2015/11/25 21:19:25 [DEBUG] Discovered plugin: aws = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-aws
2015/11/25 21:19:25 [DEBUG] Discovered plugin: azure = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-azure
2015/11/25 21:19:25 [DEBUG] Discovered plugin: cloudflare = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-cloudflare
2015/11/25 21:19:25 [DEBUG] Discovered plugin: cloudstack = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-cloudstack
2015/11/25 21:19:25 [DEBUG] Discovered plugin: consul = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-consul
2015/11/25 21:19:25 [DEBUG] Discovered plugin: digitalocean = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-digitalocean
2015/11/25 21:19:25 [DEBUG] Discovered plugin: dme = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-dme
2015/11/25 21:19:25 [DEBUG] Discovered plugin: dnsimple = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-dnsimple
2015/11/25 21:19:25 [DEBUG] Discovered plugin: docker = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-docker
2015/11/25 21:19:25 [DEBUG] Discovered plugin: dyn = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-dyn
2015/11/25 21:19:25 [DEBUG] Discovered plugin: google = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-google
2015/11/25 21:19:25 [DEBUG] Discovered plugin: heroku = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-heroku
2015/11/25 21:19:25 [DEBUG] Discovered plugin: mailgun = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-mailgun
2015/11/25 21:19:25 [DEBUG] Discovered plugin: null = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-null
2015/11/25 21:19:25 [DEBUG] Discovered plugin: openstack = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-openstack
2015/11/25 21:19:25 [DEBUG] Discovered plugin: packet = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-packet
2015/11/25 21:19:25 [DEBUG] Discovered plugin: rundeck = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-rundeck
2015/11/25 21:19:25 [DEBUG] Discovered plugin: template = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-template
2015/11/25 21:19:25 [DEBUG] Discovered plugin: terraform = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-terraform
2015/11/25 21:19:25 [DEBUG] Discovered plugin: tls = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-tls
2015/11/25 21:19:25 [DEBUG] Discovered plugin: vsphere = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provider-vsphere
2015/11/25 21:19:25 [DEBUG] Discovered plugin: chef = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provisioner-chef
2015/11/25 21:19:25 [DEBUG] Discovered plugin: file = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provisioner-file
2015/11/25 21:19:25 [DEBUG] Discovered plugin: local-exec = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provisioner-local-exec
2015/11/25 21:19:25 [DEBUG] Discovered plugin: remote-exec = /opt/homebrew-cask/Caskroom/terraform/0.6.7/terraform-provisioner-remote-exec
2015/11/25 21:19:25 [DEBUG] Detected home directory from env var: /Users/Danny
2015/11/25 21:19:25 [DEBUG] Attempting to open CLI config file: /Users/Danny/.terraformrc
2015/11/25 21:19:25 [DEBUG] Detected home directory from env var: /Users/Danny
@dverbeek84 - have you tried building from master?
@landro yes, I just did. It was not working; it also hangs. Switching back to 0.6.6 still works.
Switching to AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION resolved the issue for me. Originally I set my access credentials using variables and TF_VAR_* environment variables, similar to @landro.
I was experiencing this issue while running 'terraform plan' or 'terraform refresh'. All the AWS resources would just repeatedly say they were waiting on another resource every 5 seconds. I tried three different versions, but none of them worked: v0.6.6 (hang), v0.6.7 (crash) and master (hang).
Hey folks, some of the occurrences here might be explained by the HCL issue we found in #4082, which caused config parsing to hang on any file that ended in a comment line. That should be fixed on master with a make updatedeps to fetch the latest hashicorp/hcl dependency (and will be included in the forthcoming v0.6.8 release).
I think such a bug is rather critical, and it would be great if a new version of Terraform were released (at the very least containing just this fix, if possible). Something like version 0.6.7.1 :)
So I removed the trailing # and set credentials using AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION, and everything started working in both master and 0.6.6. Setting AWS credentials using variables still hangs.
@antonbabenko agreed! We're cutting a release today. :)
@landro and @mlehner - can either of you share with us some repro config / steps on the "AWS credentials with variables" hang? I've so far been unable to reproduce based on your description of your setup.
I'll try to reproduce in a simple repo later today, @phinze
Thanks @landro!
Managed to reproduce issue with a dead simple tf-file https://github.com/landro/terraform_aws_bug
HTH in tracking down the cause of this issue, @phinze
This is super helpful, @landro. Will dig into this tomorrow.
Hello again, unfortunately I'm not having any luck reproducing this using your example quite yet.
I can plan, apply, refresh, and destroy using v0.6.8. I also tried v0.6.6 and v0.6.7 and was unable to reproduce any hang.
The r53 APIs are quite slow - often taking minutes to complete creation, but I've gotten no hangs.
My prereqs config executed separately to get a valid zone ID:
variable "my_access_key" {}
variable "my_secret_key" {}

provider "aws" {
  access_key = "${var.my_access_key}"
  secret_key = "${var.my_secret_key}"
  region     = "eu-central-1"
}

resource "aws_vpc" "primary" {
  cidr_block = "10.1.2.0/24"
}

resource "aws_route53_zone" "primary" {
  vpc_id = "${aws_vpc.primary.id}"
  name   = "example.com"
}

output "zone_id" {
  value = "${aws_route53_zone.primary.id}"
}
The repro config:
variable "my_access_key" {}
variable "my_secret_key" {}

provider "aws" {
  access_key = "${var.my_access_key}"
  secret_key = "${var.my_secret_key}"
  region     = "eu-central-1"
}

resource "aws_route53_record" "default" {
  zone_id = "Z3TP2LAVI4BAXL" # Output from prereqs config
  name    = "example.com"
  type    = "A"
  ttl     = "300"
  records = [
    "10.0.0.1"
  ]
}
Other details:
- AWS_* env vars set.
- TF_VAR_my_access_key and TF_VAR_my_secret_key set to valid values.

Questions for @landro and anybody else seeing hangs:
Hi guys,
I experienced the same for both v0.6.7 and v0.6.8. After a little bit of debugging and network tracing, it turned out that the reason it hangs and loops forever is that the AWS provider does some background checks and tries to connect to 169.254.169.254:80 (source code reference) before it moves on with further steps.

On AWS machines the response to a GET request to http://169.254.169.254/latest/meta-data/ is almost immediate. Surprisingly, on my machine it was sometimes possible to establish a TCP connection to this host (the varying factor was actually the network I was connected to).

As soon as I added a rule that drops all connections to 169.254.0.0/16, the issue was completely gone.
After upgrading to 0.6.7 it works fine on OS X for me.
Hey @landro – are you running this on your local machine, or from an EC2 instance?
Sorry for the long wait. Just to clarify @phinze - I manage to reproduce the issue using
TF_VAR_my_access_key=acc TF_VAR_my_secret_key=sec TF_LOG=TRACE terraform plan
without any prereqs. I run the above command as is - literally using acc as my access key and sec as my secret key, and without modifying the bug.tf file in any way. I have run this from my laptop @catsby, on different networks, home, two different company networks and mobile 4g network. Same hangs all over.
I tried looking into what's going on with the 169.254.169.254:80 port that @jwadolowski mentioned. It turns out it accepts connections even when I'm disconnected from the internet (when I take down my wifi connection).

nmap -sT -p80 169.254.169.254

Nmap scan report for 169.254.169.254
Host is up (0.000065s latency).
PORT   STATE SERVICE
80/tcp open  http

Nmap done: 1 IP address (1 host up) scanned in 0.03 seconds

I noticed, however, that curling this service often hangs.
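One way to observe this behavior without letting the probe itself hang is to give curl a hard timeout. This is just a diagnostic sketch, not what the AWS provider does internally:

```shell
# Probe the EC2 metadata endpoint with a hard timeout so the probe
# itself cannot hang the way an unbounded HTTP client can.
if curl --silent --max-time 2 http://169.254.169.254/latest/meta-data/ >/dev/null 2>&1; then
  echo "metadata endpoint responded (likely on EC2)"
else
  echo "metadata endpoint unreachable or timed out (likely not on EC2)"
fi
```

Either way, the command returns within about 2 seconds instead of blocking indefinitely.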
I believe it might be a good idea to find a better way to determine whether we're running Terraform on Amazon or not. BTW, in my current project we're using a private cloud running cloudstack, and only using AWS for DNS.
@landro this is very good information. We had an internal chat about this issue and #3243 + #2693 and decided to revamp AWS creds handling in the provider. @catsby explains more in his comment over here: https://github.com/hashicorp/terraform/issues/2693#issuecomment-162029528
Stay tuned for a new PR that hopefully addresses all these issues! :+1:
Hello friends! I just merged #4254 which refactored our AWS Authentication chain some. I'm not sure it will help here, however, some bits have moved and it's worth checking out.
If any of you could be so kind as to build from master
and evaluate, please let us know your findings!
Hey all – I'm going to go ahead and close this, please let us know if you're still hitting this!
Performing a terraform plan on an existing infrastructure (based on cloudstack and aws providers) that was created using 0.6.6 crashes:
crash.txt