hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

Not able to create lambda function with aws_lambda_function #1392

Closed hashibot closed 7 years ago

hashibot commented 7 years ago

This issue was originally opened by @regin64 as hashicorp/terraform#15783. It was migrated here as a result of the provider split. The original body of the issue is below.


Hi,

I was trying to create a Lambda function on AWS using Terraform but did not succeed. After Terraform tried to create the Lambda function, the debug output below (see the gist link) kept repeating until Terraform shut down due to the timeout (10 minutes).

Terraform Version

v0.9.9

Terraform Configuration Files

provider "aws" {
  access_key = "${var.aws_access_key}"
  secret_key = "${var.aws_secret_key}"
  region     = "${var.aws_region}"
}

resource "aws_lambda_function" "Resource-Creation-Tagger-2" {
  filename         = "lambda_function.zip"
  function_name    = "lambda_function"
  role             = "${aws_iam_role.Resource-Creation-Tagger-Role.arn}"
  handler          = "lambda_function.lambda_handler"
  runtime          = "python2.7"
  memory_size      = "128"
  timeout          = "3"
}

resource "aws_iam_role" "Resource-Creation-Tagger-Role" {
    name = "Resource-Creation-Tagger-Role"

    assume_role_policy = <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Principal": {
                "Service": "lambda.amazonaws.com"
            }
        }
    ]
}
EOF
}

resource "aws_iam_role_policy" "Resource-Tagger-Role-Policy" {
    name = "Resource-Tagger-Role-Policy"
    role = "${aws_iam_role.Resource-Creation-Tagger-Role.id}"

    policy = <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:*"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": "ec2:Describe*",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateTags"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
EOF
}

Debug Output

https://gist.github.com/regin64/51081e8310b19de20b80e9cc341debb1

Expected Behavior

Create IAM role with policy, the lambda function

Actual Behavior

IAM role with policy was created, the lambda function was not

Steps to Reproduce

  1. terraform plan
  2. terraform apply

radeksimko commented 7 years ago

Hi @regin64 thanks for reporting this bug.

The attached debug log only contains the initial request for creation. There are a couple of reasons terraform apply may have timed out. In order to find the root cause we'll need more details - do you mind attaching the full debug log (minus secret/sensitive data) and/or configs?

If you're worried about exposing secrets or sensitive data, feel free to encrypt the above using our PGP key available at https://keybase.io/hashicorp and send it to radek@hashicorp.com

Thanks.

thelevante commented 7 years ago

Hi @radeksimko, replying here for Regin, apologies for the delay. I will send you a direct PGP-encrypted email now.

radeksimko commented 7 years ago

Hi @thelevante thank you for providing those details.

I'll share a relevant (yet anonymised) snippet here, just for context:

Error applying plan:

1 error(s) occurred:

* aws_lambda_function.redacted-name: 1 error(s) occurred:

* aws_lambda_function.redacted-name: Error creating Lambda function: RequestError: send request failed
caused by: Post https://lambda.us-west-2.amazonaws.com/2015-03-31/functions: dial tcp: lookup lambda.us-west-2.amazonaws.com on 127.0.0.1:53: read udp 127.0.0.1:56823->127.0.0.1:53: i/o timeout

I believe this is a client-side DNS related issue.

A couple of questions:

arohter commented 7 years ago

I have the same problem - lambda api DNS lookup failures - when our corporate opendns client is running. Failure seems specific to terraform's use:

plugin.terraform-provider-aws_v0.1.4_x4: caused by: Get https://lambda.us-east-1.amazonaws.com/2015-03-31/functions/example: dial tcp: lookup lambda.us-east-1.amazonaws.com on 127.0.0.1:53: read udp 127.0.0.1:61596->127.0.0.1:53: i/o timeout

I can resolve DNS just fine using other programs:

$ dig lambda.us-east-1.amazonaws.com @127.0.0.1

; <<>> DiG 9.8.3-P1 <<>> lambda.us-east-1.amazonaws.com @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45953
;; flags: qr rd ra; QUERY: 1, ANSWER: 9, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;lambda.us-east-1.amazonaws.com.    IN  A

;; ANSWER SECTION:
lambda.us-east-1.amazonaws.com. 60 IN   CNAME   prod-04-2014-elb-1276263184.us-east-1.elb.amazonaws.com.
prod-04-2014-elb-1276263184.us-east-1.elb.amazonaws.com. 60 IN A 50.16.135.137
prod-04-2014-elb-1276263184.us-east-1.elb.amazonaws.com. 60 IN A 54.86.201.80
prod-04-2014-elb-1276263184.us-east-1.elb.amazonaws.com. 60 IN A 34.226.4.125
prod-04-2014-elb-1276263184.us-east-1.elb.amazonaws.com. 60 IN A 34.199.141.99
prod-04-2014-elb-1276263184.us-east-1.elb.amazonaws.com. 60 IN A 54.85.88.123
prod-04-2014-elb-1276263184.us-east-1.elb.amazonaws.com. 60 IN A 34.192.59.90
prod-04-2014-elb-1276263184.us-east-1.elb.amazonaws.com. 60 IN A 34.233.233.173
prod-04-2014-elb-1276263184.us-east-1.elb.amazonaws.com. 60 IN A 34.233.179.150

;; Query time: 396 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Sep 26 03:07:04 2017
;; MSG SIZE  rcvd: 715

$ curl -v https://lambda.us-east-1.amazonaws.com/2015-03-31/functions
*   Trying 34.192.75.194...
* Connected to lambda.us-east-1.amazonaws.com (34.192.75.194) port 443 (#0)
> GET /2015-03-31/functions HTTP/1.1
> Host: lambda.us-east-1.amazonaws.com
> User-Agent: curl/7.43.0
> Accept: */*
> 
< HTTP/1.1 403 Forbidden
< Content-Type: application/json
< Date: Tue, 26 Sep 2017 10:05:11 GMT
< x-amzn-ErrorType: MissingAuthenticationTokenException
< x-amzn-RequestId: 2f2f2594-a2a2-11e7-a83d-97f376d4fe4e
< Content-Length: 42
< Connection: keep-alive
< 
* Connection #0 to host lambda.us-east-1.amazonaws.com left intact
{"message":"Missing Authentication Token"}

thelevante commented 7 years ago

Thanks @radeksimko and @arohter for your responses. I am observing the same behavior as @arohter, specific to Terraform not other clients. I (or @regin64) will try using the AWS CLI and update this issue probably next week.

roncato commented 7 years ago

Any updates on this issue? I am experiencing the same problem.

Ninir commented 7 years ago

Hi folks,

There is one last question to be answered (suggested by Radek): Were you able to reproduce this issue consistently?

This would help us diagnose the issue and narrow down where to look.

Thanks

arohter commented 7 years ago

Yes, 100% reliable to reproduce for me.

roncato commented 7 years ago

Yes, I can reproduce it right now.

Ninir commented 7 years ago

Thanks for the feedback. 2 last questions for me:

I saw the dial issue and all the stuff regarding OpenDNS. Just want to remove any other issue before digging more into it.

Thanks!

roncato commented 7 years ago

My Terraform version is 0.10.7. I just downloaded and installed Terraform from the HashiCorp website.

arohter commented 7 years ago

tf 0.10.7 + aws 1.1.0: still fails the same way, no matter what type of network I'm connected via:

2017-10-21T01:47:32.543-0700 [DEBUG] plugin.terraform-provider-aws_v1.1.0_x4: 2017/10/21 01:47:32 [DEBUG] [aws-sdk-go] DEBUG: Send Request lambda/GetFunction failed, will retry, error RequestError: send request failed
2017-10-21T01:47:32.543-0700 [DEBUG] plugin.terraform-provider-aws_v1.1.0_x4: caused by: Get https://lambda.us-east-1.amazonaws.com/2015-03-31/functions/manager: dial tcp: lookup lambda.us-east-1.amazonaws.com on 127.0.0.1:53: read udp 127.0.0.1:63270->127.0.0.1:53: i/o timeout
2017-10-21T01:47:32.543-0700 [DEBUG] plugin.terraform-provider-aws_v1.1.0_x4: 2017/10/21 01:47:32 [DEBUG] [aws-sdk-go] DEBUG: Retrying Request lambda/GetFunction, attempt 2
2017-10-21T01:47:32.543-0700 [DEBUG] plugin.terraform-provider-aws_v1.1.0_x4: 2017/10/21 01:47:32 [DEBUG] [aws-sdk-go] DEBUG: Request lambda/GetFunction Details:
2017-10-21T01:47:32.543-0700 [DEBUG] plugin.terraform-provider-aws_v1.1.0_x4: ---[ REQUEST POST-SIGN ]-----------------------------
2017-10-21T01:47:32.543-0700 [DEBUG] plugin.terraform-provider-aws_v1.1.0_x4: GET /2015-03-31/functions/manager HTTP/1.1
2017-10-21T01:47:32.543-0700 [DEBUG] plugin.terraform-provider-aws_v1.1.0_x4: Host: lambda.us-east-1.amazonaws.com
2017-10-21T01:47:32.543-0700 [DEBUG] plugin.terraform-provider-aws_v1.1.0_x4: User-Agent: aws-sdk-go/1.12.8 (go1.9; darwin; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.10.0-dev

Corporate OpenDNS local daemon proxy info:

Umbrella is running. Checking debug.opendns.com DNS…
  "server m37.pao"
  "device XXXXXXXXXXXXX"
  "flags 36 0 40 10022003EEE1002010070000"
  "originid 123456"
  "orgid 12345"
  "orgflags 1"
  "actype 0"
  "bundle 12345"
  "source 123.123.123.123:47007"
  "dnscrypt enabled (78862839306E)"
Currently using name servers: 127.0.0.1

josephjoice commented 7 years ago

Facing the same issue in region us-west-2; it works smoothly in us-east-2. dig/nslookup looks good. aws-cli works fine. Any workaround for this?

josephjoice commented 7 years ago

An update to the above comment: I got it working after flushing the DNS cache and restarting my laptop. Not exactly sure which of the two fixed it. BTW I am using OpenDNS.

Ninir commented 7 years ago

@josephjoice nice to hear! thanks for the feedback :)

@roncato @arohter @thelevante could you try the same? (flushing the dns)

Thanks!

squidfunk commented 7 years ago

+1

EDIT: I debugged the call - I accidentally passed a wrong lambda function role and Terraform got "The role defined for the function cannot be assumed by Lambda." The error is not propagating to the user and Terraform is trying to create the function over and over again. See output.

2017-11-24T16:07:04.463+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: ---[ REQUEST POST-SIGN ]-----------------------------
2017-11-24T16:07:04.463+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: POST /2015-03-31/functions HTTP/1.1
2017-11-24T16:07:04.463+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: Host: lambda.us-east-1.amazonaws.com
2017-11-24T16:07:04.463+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: User-Agent: aws-sdk-go/1.12.27 (go1.9; darwin; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.11.0-beta1
2017-11-24T16:07:04.463+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: Content-Length: 3291
2017-11-24T16:07:04.463+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: X-Amz-Date: 20171124T150704Z
2017-11-24T16:07:04.463+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: Accept-Encoding: gzip
2017-11-24T16:07:04.463+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: 
2017-11-24T16:07:04.463+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: -----------------------------------------------------
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: 2017/11/24 16:07:04 [DEBUG] [aws-sdk-go] DEBUG: Response lambda/CreateFunction Details:
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: ---[ RESPONSE ]--------------------------------------
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: HTTP/2.0 400 Bad Request
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: Content-Length: 90
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: Content-Type: application/json
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: Date: Fri, 24 Nov 2017 15:07:04 GMT
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: X-Amzn-Errortype: InvalidParameterValueException
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: X-Amzn-Requestid: 21c3e4eb-d129-11e7-95b1-e9d6aeef59a5
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: 
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: 
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: -----------------------------------------------------
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: 2017/11/24 16:07:04 [DEBUG] [aws-sdk-go] {"Type":"User","message":"The role defined for the function cannot be assumed by Lambda."}
2017-11-24T16:07:04.911+0100 [DEBUG] plugin.terraform-provider-aws_v1.3.1_x4: 2017/11/24 16:07:04 [DEBUG] [aws-sdk-go] DEBUG: Validate Response lambda/CreateFunction failed, not retrying, error InvalidParameterValueException: The role defined for the function cannot be assumed by Lambda.

arohter commented 7 years ago

Flushing has no effect. It's also not region specific. Same DNS lookup failure no matter what region.

radeksimko commented 7 years ago

Hi folks, I'm afraid there isn't much we can do on Terraform's side. Based on the conversation and available data this is a client-side problem related to DNS. There's a slight chance of bumping into some rare bug in Go's DNS library, but I'm more confident this is a client-side and/or DNS-server related issue which needs to be debugged and resolved outside of Terraform.

Regardless of where the problem is we don't have enough data to reproduce this problem and help, so I'm going to close this.

@squidfunk This problem is unrelated to the thread here. You were apparently able to resolve DNS and reach the API endpoint. The error you mentioned may be caused by a genuine problem with insufficient permissions or just by IAM being eventually consistent and slow in propagating the IAM role and/or associated policy. We don't ignore that error, we just intentionally retry on this error code to avoid raising it to the user when it's just an effect of eventual consistency. See https://github.com/terraform-providers/terraform-provider-aws/blob/943230985fefc7b203eedaf6059e905279b27645/aws/resource_aws_lambda_function.go#L333-L353

squidfunk commented 7 years ago

@radeksimko just wanted to give some debugging hints for someone running into the same problem - it was just meant as an FYI for someone coming here from Google having problems creating Lambda functions.

radeksimko commented 6 years ago

FYI - in case you're connected to a VPN (which none of you here mentioned) and experience DNS resolution issues, we have an open issue for that in core which I recommend you to follow: https://github.com/hashicorp/terraform/issues/3536

ramarnat commented 6 years ago

I have been able to fix this by building the aws and nomad providers with CGO_ENABLED=1. I am not sure how that would best be integrated into a distribution that pulls the binaries using terraform init.

Here are the steps I followed:

go get github.com/terraform-providers/terraform-provider-aws
cd $GOPATH/src/github.com/terraform-providers/terraform-provider-aws
CGO_ENABLED=1 make build
cd -
cp $GOPATH/bin/terraform-provider-aws .terraform/plugins/darwin_amd64/terraform-provider-aws_v1.3.1_x4
terraform init

This is similar to the MRs we submitted to fix the nomad and vault brew formulas: https://github.com/Homebrew/homebrew-core/pull/7238 https://github.com/Homebrew/homebrew-core/pull/7246

But even if this is done for Terraform's brew formula, it still has an issue because the problem crops up with the providers in addition to core ala https://github.com/hashicorp/terraform/issues/3536 (maybe not, I only rebuilt the providers and it fixed the issues I was seeing in my use case).

radeksimko commented 6 years ago

@ramarnat what you're describing sounds like a problem with the built-in Go DNS resolver. AFAIK neither Nomad, Vault, Terraform nor the providers themselves implement a custom DNS resolver. See https://golang.org/pkg/net/#hdr-Name_Resolution

Can you try and reproduce this outside of HashiCorp tools with a snippet of Go code that's just calling out to any known hostname?

Let me know if you need any help with that - eventually I can build a binary for you, but I'll need to know your target platform (OS + arch).

There must be something unique/different about your DNS - I'm not suggesting it's necessarily wrong, but unique enough that many others (incl. myself) are unable to reproduce this problem.
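A reproduction along the lines requested above could look like the following sketch. It uses net.Resolver with PreferGo set, which the Go documentation describes as equivalent to GODEBUG=netdns=go scoped to that resolver. The default hostname and the 5-second timeout are arbitrary choices, and newResolver is a made-up helper name.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"time"
)

// newResolver returns a resolver that prefers Go's built-in DNS client
// when pureGo is true; otherwise the platform default is used.
// Hypothetical helper name, for illustration only.
func newResolver(pureGo bool) *net.Resolver {
	return &net.Resolver{PreferGo: pureGo}
}

func main() {
	// Hostname taken from the logs in this thread; pass another as the first argument.
	host := "lambda.us-east-1.amazonaws.com"
	if len(os.Args) > 1 {
		host = os.Args[1]
	}

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	start := time.Now()
	addrs, err := newResolver(true).LookupHost(ctx, host)
	fmt.Printf("lookup of %s took %s\n", host, time.Since(start).Round(time.Millisecond))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, addr := range addrs {
		fmt.Println(addr)
	}
}
```

If this fails with the same "i/o timeout" while dig succeeds against the same server, that would point at the interaction between Go's resolver and the local DNS setup rather than at Terraform or the provider.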

ramarnat commented 6 years ago

Yep, this is absolutely because of the way Go does resolution. Specifically, a dynamic build (hence CGO_ENABLED) uses native OS X DNS resolution (inspectable with scutil --dns) rather than the DNS resolvers defined in /etc/resolv.conf. Other behavioral differences are possible as well. If those cause problems and you don't need the corporate VPN or split-DNS use cases, it is better to stick with the static build. If you do need them, following the steps above will fix the issue. However, unlike brew install nomad --with-dynamic, where it was possible to submit an MR adding an optional flag, I don't know how easy it would be for Terraform to seamlessly download a dynamically linked version during init.

That said, for folks who are willing to take a couple of extra steps, they can get past the issue by following the steps I put out above, which was the main reason for posting, at least till a more elegant solution could be found.
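To make the /etc/resolv.conf point concrete, here is a rough Go sketch that lists the nameserver entries a resolver reading that file would see. It is a simplification (the real format has more directives), the function name is made up, and the sample file content is an assumption meant to resemble a machine running a local DNS proxy.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// nameservers extracts the "nameserver" entries from resolv.conf-style text.
// Simplified sketch: other directives (search, options, ...) are ignored.
func nameservers(conf string) []string {
	var servers []string
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		line := sc.Text()
		// Comment lines start with '#' or ';'.
		if strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == "nameserver" {
			servers = append(servers, fields[1])
		}
	}
	return servers
}

func main() {
	// Assumed example content, not taken from anyone's machine in this thread.
	conf := "# generated by the system\nnameserver 127.0.0.1\nsearch example.internal\n"
	fmt.Println(nameservers(conf)) // [127.0.0.1]
}
```

Feeding it the contents of the real /etc/resolv.conf and comparing with scutil --dns shows whether the two resolution paths would query different servers.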

radeksimko commented 6 years ago

AFAIK we don't have any near-term plans for building with CGO, because cross-platform compilation becomes potentially non-trivial, as does the related maintenance.

However, if we can find a solution in pure Go I'm sure we'd be happy to look into that (assuming it solves the majority of the issues mentioned here and in the other core issue). After all, that's why all HashiCorp tools use https://github.com/mitchellh/go-homedir for homedir lookup - to avoid CGO. Admittedly we don't need to do that anymore since Go 1.9, but hopefully that gives you an idea of our direction and way of thinking.

databrecht commented 6 years ago

@arohter did you ever solve this? We have the same issue and are also using Umbrella

nohealy commented 6 years ago

Thanks to all for the contributions above. Switching DNS servers from Google's 8.8.8.8/8.8.4.4 to another DNS server fixed the issue for me.

aharrison7 commented 6 years ago

We are aware of an issue with Terraform while using the Umbrella Roaming Client - specifically involving the large DNS responses for lambda.us-west-2.amazonaws.com on OS X.

; MSG SIZE rcvd: 715

Since this exceeds the 512-byte limit for DNS over UDP (RFC 1035), it was causing a failure. Our roaming client version 2.0.62+ on OS X includes a fix for this issue: standard DNS queries are correctly truncated to 512 bytes and can once again be handled by Terraform. Note that this does happen with the roaming client on OS X; however, it would occur for any UDP DNS response over 512 bytes (use dig, not nslookup, to check), hence we're seeing reports across multiple providers at different times.
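The size check and the truncation flag mentioned above can be sketched in Go as follows. The function names are made up for illustration, and the two sample headers are hand-built (using the id 45953 from the dig output earlier in the thread) rather than captured from the wire.

```go
package main

import "fmt"

// classicUDPLimit is the maximum DNS message size over UDP without EDNS0 (RFC 1035).
const classicUDPLimit = 512

// fitsClassicUDP reports whether a DNS message of the given size fits that limit.
func fitsClassicUDP(size int) bool { return size <= classicUDPLimit }

// isTruncated reports whether the TC (truncation) bit is set in a raw DNS
// message. The flag is bit 0x02 of the third byte of the 12-byte header.
func isTruncated(msg []byte) bool {
	return len(msg) >= 12 && msg[2]&0x02 != 0
}

func main() {
	fmt.Println(fitsClassicUDP(715)) // false: the 715-byte response quoted above

	// Header only: ID, two flag bytes, then four zeroed count fields.
	normal := []byte{0xb3, 0x81, 0x81, 0x80, 0, 0, 0, 0, 0, 0, 0, 0}    // flags: qr rd ra
	truncated := []byte{0xb3, 0x81, 0x83, 0x80, 0, 0, 0, 0, 0, 0, 0, 0} // flags: qr tc rd ra
	fmt.Println(isTruncated(normal), isTruncated(truncated)) // false true
}
```

A client that sees the TC bit is expected to retry the query over TCP. A proxy that instead returns an oversized UDP response, or none at all, leaves the client waiting, which would be consistent with the i/o timeouts reported in this thread.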

For lower versions of the client, a workaround is to add the domain lambda.us-west-2.amazonaws.com to your hosts file, mapped to an IP address you receive when querying lambda.us-west-2.amazonaws.com.

ghost commented 4 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!