Hi @regin64 thanks for reporting this bug.
The attached debug log only contains the initial request for creation. There are a couple of reasons terraform apply may have timed out. To find the root cause we'll need more details. Would you mind attaching the full debug log (minus secret/sensitive data) and/or configs?
If you're worried about exposing secrets or sensitive data, feel free to encrypt the above using our PGP key available at https://keybase.io/hashicorp and send it to radek@hashicorp.com
Thanks.
Hi @radeksimko replying here for Regin, apologies for delay. Will send you direct pgp encrypted email now.
Hi @thelevante thank you for providing those details.
I'll share a relevant (yet anonymised) snippet here, just for context:
Error applying plan:
1 error(s) occurred:
* aws_lambda_function.redacted-name: 1 error(s) occurred:
* aws_lambda_function.redacted-name: Error creating Lambda function: RequestError: send request failed
caused by: Post https://lambda.us-west-2.amazonaws.com/2015-03-31/functions: dial tcp: lookup lambda.us-west-2.amazonaws.com on 127.0.0.1:53: read udp 127.0.0.1:56823->127.0.0.1:53: i/o timeout
I believe this is a client-side DNS related issue.
A couple of questions, about the output of these two commands:
dig lambda.us-west-2.amazonaws.com
curl -v https://lambda.us-west-2.amazonaws.com/2015-03-31/functions
I have the same problem (Lambda API DNS lookup failures) when our corporate OpenDNS client is running. The failure seems specific to Terraform:
plugin.terraform-provider-aws_v0.1.4_x4: caused by: Get https://lambda.us-east-1.amazonaws.com/2015-03-31/functions/example: dial tcp: lookup lambda.us-east-1.amazonaws.com on 127.0.0.1:53: read udp 127.0.0.1:61596->127.0.0.1:53: i/o timeout
I can resolve DNS just fine using other programs:
$ dig lambda.us-east-1.amazonaws.com @127.0.0.1
; <<>> DiG 9.8.3-P1 <<>> lambda.us-east-1.amazonaws.com @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45953
;; flags: qr rd ra; QUERY: 1, ANSWER: 9, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;lambda.us-east-1.amazonaws.com. IN A
;; ANSWER SECTION:
lambda.us-east-1.amazonaws.com. 60 IN CNAME prod-04-2014-elb-1276263184.us-east-1.elb.amazonaws.com.
prod-04-2014-elb-1276263184.us-east-1.elb.amazonaws.com. 60 IN A 50.16.135.137
prod-04-2014-elb-1276263184.us-east-1.elb.amazonaws.com. 60 IN A 54.86.201.80
prod-04-2014-elb-1276263184.us-east-1.elb.amazonaws.com. 60 IN A 34.226.4.125
prod-04-2014-elb-1276263184.us-east-1.elb.amazonaws.com. 60 IN A 34.199.141.99
prod-04-2014-elb-1276263184.us-east-1.elb.amazonaws.com. 60 IN A 54.85.88.123
prod-04-2014-elb-1276263184.us-east-1.elb.amazonaws.com. 60 IN A 34.192.59.90
prod-04-2014-elb-1276263184.us-east-1.elb.amazonaws.com. 60 IN A 34.233.233.173
prod-04-2014-elb-1276263184.us-east-1.elb.amazonaws.com. 60 IN A 34.233.179.150
;; Query time: 396 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Sep 26 03:07:04 2017
;; MSG SIZE rcvd: 715
$ curl -v https://lambda.us-east-1.amazonaws.com/2015-03-31/functions
* Trying 34.192.75.194...
* Connected to lambda.us-east-1.amazonaws.com (34.192.75.194) port 443 (#0)
> GET /2015-03-31/functions HTTP/1.1
> Host: lambda.us-east-1.amazonaws.com
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 403 Forbidden
< Content-Type: application/json
< Date: Tue, 26 Sep 2017 10:05:11 GMT
< x-amzn-ErrorType: MissingAuthenticationTokenException
< x-amzn-RequestId: 2f2f2594-a2a2-11e7-a83d-97f376d4fe4e
< Content-Length: 42
< Connection: keep-alive
<
* Connection #0 to host lambda.us-east-1.amazonaws.com left intact
{"message":"Missing Authentication Token"}
Thanks @radeksimko and @arohter for your responses. I am observing the same behavior as @arohter, specific to Terraform not other clients. I (or @regin64) will try using the AWS CLI and update this issue probably next week.
Any updates on this issue? I am experiencing the same problem.
Hi folks,
There is one last question to be answered (suggested by Radek): Were you able to reproduce this issue consistently?
Knowing this would help us diagnose the issue and narrow down where to look.
Thanks
Yes, 100% reliable to reproduce for me.
Yes, I can reproduce it right now.
Thanks for the feedback. Two last questions from me:
I saw the dial issue and the discussion about OpenDNS. I just want to rule out any other cause before digging further into it.
Thanks!
My Terraform version is 0.10.7. I just downloaded and installed Terraform from the HashiCorp website.
tf 0.10.7 + aws 1.1.0: still fails the same way, no matter what type of network I'm connected via:
2017-10-21T01:47:32.543-0700 [DEBUG] plugin.terraform-provider-aws_v1.1.0_x4: 2017/10/21 01:47:32 [DEBUG] [aws-sdk-go] DEBUG: Send Request lambda/GetFunction failed, will retry, error RequestError: send request failed
2017-10-21T01:47:32.543-0700 [DEBUG] plugin.terraform-provider-aws_v1.1.0_x4: caused by: Get https://lambda.us-east-1.amazonaws.com/2015-03-31/functions/manager: dial tcp: lookup lambda.us-east-1.amazonaws.com on 127.0.0.1:53: read udp 127.0.0.1:63270->127.0.0.1:53: i/o timeout
2017-10-21T01:47:32.543-0700 [DEBUG] plugin.terraform-provider-aws_v1.1.0_x4: 2017/10/21 01:47:32 [DEBUG] [aws-sdk-go] DEBUG: Retrying Request lambda/GetFunction, attempt 2
2017-10-21T01:47:32.543-0700 [DEBUG] plugin.terraform-provider-aws_v1.1.0_x4: 2017/10/21 01:47:32 [DEBUG] [aws-sdk-go] DEBUG: Request lambda/GetFunction Details:
2017-10-21T01:47:32.543-0700 [DEBUG] plugin.terraform-provider-aws_v1.1.0_x4: ---[ REQUEST POST-SIGN ]-----------------------------
2017-10-21T01:47:32.543-0700 [DEBUG] plugin.terraform-provider-aws_v1.1.0_x4: GET /2015-03-31/functions/manager HTTP/1.1
2017-10-21T01:47:32.543-0700 [DEBUG] plugin.terraform-provider-aws_v1.1.0_x4: Host: lambda.us-east-1.amazonaws.com
2017-10-21T01:47:32.543-0700 [DEBUG] plugin.terraform-provider-aws_v1.1.0_x4: User-Agent: aws-sdk-go/1.12.8 (go1.9; darwin; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.10.0-dev
Corporate OpenDNS local daemon proxy info:
Umbrella is running. Checking debug.opendns.com DNS…
"server m37.pao"
"device XXXXXXXXXXXXX"
"flags 36 0 40 10022003EEE1002010070000"
"originid 123456"
"orgid 12345"
"orgflags 1"
"actype 0"
"bundle 12345"
"source 123.123.123.123:47007"
"dnscrypt enabled (78862839306E)"
Currently using name servers: 127.0.0.1
Facing the same issue in region us-west-2; it works smoothly in us-east-2. dig/nslookup looks good. aws-cli works fine. Any workaround for this?
An update to the above comment: I got it working after flushing the DNS cache and restarting my laptop. I'm not sure which of the two fixed it. BTW, I am using OpenDNS.
@josephjoice nice to hear! thanks for the feedback :)
@roncato @arohter @thelevante could you try the same? (flushing the dns)
Thanks!
+1
EDIT: I debugged the call - I accidentally passed a wrong lambda function role and Terraform got "The role defined for the function cannot be assumed by Lambda." The error is not propagating to the user and Terraform is trying to create the function over and over again. See output.
2017-11-24T16:07:04.463+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: ---[ REQUEST POST-SIGN ]-----------------------------
2017-11-24T16:07:04.463+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: POST /2015-03-31/functions HTTP/1.1
2017-11-24T16:07:04.463+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: Host: lambda.us-east-1.amazonaws.com
2017-11-24T16:07:04.463+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: User-Agent: aws-sdk-go/1.12.27 (go1.9; darwin; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.11.0-beta1
2017-11-24T16:07:04.463+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: Content-Length: 3291
2017-11-24T16:07:04.463+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: X-Amz-Date: 20171124T150704Z
2017-11-24T16:07:04.463+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: Accept-Encoding: gzip
2017-11-24T16:07:04.463+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4:
2017-11-24T16:07:04.463+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: -----------------------------------------------------
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: 2017/11/24 16:07:04 [DEBUG] [aws-sdk-go] DEBUG: Response lambda/CreateFunction Details:
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: ---[ RESPONSE ]--------------------------------------
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: HTTP/2.0 400 Bad Request
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: Content-Length: 90
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: Content-Type: application/json
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: Date: Fri, 24 Nov 2017 15:07:04 GMT
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: X-Amzn-Errortype: InvalidParameterValueException
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: X-Amzn-Requestid: 21c3e4eb-d129-11e7-95b1-e9d6aeef59a5
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4:
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4:
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: -----------------------------------------------------
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: 2017/11/24 16:07:04 [DEBUG] [aws-sdk-go] {"Type":"User","message":"The role defined for the function cannot be assumed by Lambda."}
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: 2017/11/24 16:07:04 [DEBUG] [aws-sdk-go] DEBUG: Validate Response lambda/CreateFunction failed, not retrying, error InvalidParameterValueException: The role defined for the function cannot be assumed by Lambda.
Flushing has no effect. It's also not region specific. Same DNS lookup failure no matter what region.
Hi folks, I'm afraid there isn't much we can do on Terraform's side. Based on the conversation and the available data, this is a client-side problem related to DNS. There's a slight chance you're hitting a rare bug in Go's DNS library, but I'm more confident that this is a client-side and/or DNS-server issue which needs to be debugged and resolved outside of Terraform.
Regardless of where the problem is we don't have enough data to reproduce this problem and help, so I'm going to close this.
@squidfunk This problem is unrelated to this thread. You were apparently able to resolve DNS and reach the API endpoint. The error you mentioned may be caused by a genuine problem with insufficient permissions, or just by IAM being eventually consistent and slow to propagate the IAM role and/or associated policy. We don't ignore that error; we intentionally retry on this error code to avoid raising it to the user when it's just an effect of eventual consistency. See https://github.com/terraform-providers/terraform-provider-aws/blob/943230985fefc7b203eedaf6059e905279b27645/aws/resource_aws_lambda_function.go#L333-L353
@radeksimko just wanted to give some debugging hints for someone running into the same problem - it was just meant as an FYI for someone coming here from Google having problems creating Lambda functions.
FYI - in case you're connected to a VPN (which none of you here mentioned) and experience DNS resolution issues, we have an open issue for that in core which I recommend you to follow: https://github.com/hashicorp/terraform/issues/3536
I have been able to fix this by building the aws and nomad providers with CGO_ENABLED=1. I am not sure how that would best be integrated into a distribution that pulls the binaries using terraform init.
Here are the steps I followed:
go get github.com/terraform-providers/terraform-provider-aws
cd $GOPATH/src/github.com/terraform-providers/terraform-provider-aws
CGO_ENABLED=1 make build
cd -
cp $GOPATH/bin/terraform-provider-aws .terraform/plugins/darwin_amd64/terraform-provider-aws_v1.3.1_x4
terraform init
This is similar to the MRs we submitted to fix the nomad and vault brew formulas: https://github.com/Homebrew/homebrew-core/pull/7238 https://github.com/Homebrew/homebrew-core/pull/7246
But even if this is done for Terraform's brew formula, it still has an issue because the problem crops up with the providers in addition to core ala https://github.com/hashicorp/terraform/issues/3536 (maybe not, I only rebuilt the providers and it fixed the issues I was seeing in my use case).
@ramarnat what you're describing sounds like a problem with the built-in Go DNS resolver. AFAIK neither Nomad, Vault, Terraform, nor the providers themselves implement a custom DNS resolver. See https://golang.org/pkg/net/#hdr-Name_Resolution
Can you try and reproduce this outside of HashiCorp tools with a snippet of Go code that's just calling out to any known hostname?
Let me know if you need any help with that. If necessary I can build a binary for you, but I'll need to know your target platform (OS + arch).
There must be something unique/different about your DNS - I'm not suggesting it's necessarily wrong, but unique enough that many others (incl. myself) are unable to reproduce this problem.
Yep, this is absolutely because of the way Go does resolution. Specifically, a dynamic build (hence the CGO_ENABLED) will use native OS X DNS resolution (inspectable with scutil --dns) rather than the DNS resolvers defined in /etc/resolv.conf. Other behavioral differences are possible as well, so if you don't need the corporate VPN or split-DNS use cases, it is better to stick with the static build. If you do need them, following the above steps will fix the issue. However, unlike brew install nomad --with-dynamic, where it was possible to submit an MR for an optional flag, I don't know how easy it would be to provide that sort of option for Terraform to seamlessly download a dynamically linked version when doing the init.
That said, for folks who are willing to take a couple of extra steps, they can get past the issue by following the steps I put out above, which was the main reason for posting, at least till a more elegant solution could be found.
AFAIK we don't have any near-term plans for building with CGO, as cross-platform compilation then becomes potentially non-trivial, and so does the related maintenance.
However, if we can find a solution in pure Go I'm sure we'd be happy to look into that (assuming it solves the majority of the issues mentioned here and in the other core issue). After all, that's why all HashiCorp tools use https://github.com/mitchellh/go-homedir for homedir lookup: to avoid CGO. Admittedly we don't need to do that anymore since Go 1.9, but hopefully that gives you an idea of our direction and way of thinking.
@arohter did you ever solve this? We have the same issue and are also using Umbrella
Thanks to all for the contributions above. Switching DNS servers from Google's 8.8.8.8/8.8.4.4 to another DNS server fixed the issue for me.
We are aware of an issue with Terraform while using the Umbrella Roaming Client, specifically with the large DNS responses for lambda.us-west-2.amazonaws.com on OSX:
; MSG SIZE rcvd: 715
Since this is above the 512-byte limit for DNS over UDP (RFC 1035), this was causing a failure. Our roaming client version 2.0.62+ on OSX includes a fix for this issue: standard DNS queries are correctly truncated to 512 bytes and can once again be handled by Terraform. Note that this was observed with the roaming client on OSX; however, it would occur for any UDP DNS response over 512 bytes (check with dig, not nslookup), hence we're seeing reports across multiple providers at different times.
For lower versions of the client, a workaround is adding the domain lambda.us-west-2.amazonaws.com to your hosts file to ensure it resolves to an IP you receive when querying lambda.us-west-2.amazonaws.com.
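To make the size limit concrete, here is a small Go sketch. The function names are hypothetical; the 512-byte limit and the position of the TC (truncation) bit in the DNS header come from RFC 1035. It checks whether a response of a given size fits in a plain UDP DNS message and whether a raw message has the truncation flag set, which is how a server signals that the client should retry over TCP.

```go
package main

import (
	"errors"
	"fmt"
)

// maxPlainUDP is the RFC 1035 size limit for a DNS message over UDP
// when no EDNS0 extension is negotiated.
const maxPlainUDP = 512

// fitsPlainUDP reports whether a DNS message of size bytes fits that limit.
func fitsPlainUDP(size int) bool {
	return size <= maxPlainUDP
}

// truncated reports whether the TC bit is set in a raw DNS message.
// The header is 12 bytes; TC is bit 0x02 of the third byte.
func truncated(msg []byte) (bool, error) {
	if len(msg) < 12 {
		return false, errors.New("DNS message shorter than the 12-byte header")
	}
	return msg[2]&0x02 != 0, nil
}

func main() {
	// 715 is the "MSG SIZE rcvd" value reported by dig in this thread.
	fmt.Println(fitsPlainUDP(715)) // prints "false"

	hdr := make([]byte, 12)
	hdr[2] = 0x82 // QR=1 (response) and TC=1 (truncated)
	tc, _ := truncated(hdr)
	fmt.Println(tc) // prints "true"
}
```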
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!
This issue was originally opened by @regin64 as hashicorp/terraform#15783. It was migrated here as a result of the provider split. The original body of the issue is below.
Hi,
I was trying to create a Lambda function on AWS using Terraform but did not succeed. After Terraform tried to create the Lambda function, the debug output below (see the gist link) kept repeating until Terraform shut down due to a timeout (10 minutes).
Terraform Version
v0.9.9
Terraform Configuration Files
Debug Output
https://gist.github.com/regin64/51081e8310b19de20b80e9cc341debb1
Expected Behavior
Create the IAM role with policy and the Lambda function
Actual Behavior
The IAM role with policy was created; the Lambda function was not
Steps to Reproduce
terraform plan
terraform apply