hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

Assume role doesn't work via VPC #9869

Closed sergii-zemlianyi closed 4 years ago

sergii-zemlianyi commented 5 years ago

Terraform Version

Terraform v0.12.6

Affected Resource(s)

aws_iam_role aws_iam_role_policy

Terraform Configuration Files

provider "aws" {
  access_key = "${var.access_key}"
  secret_key = "${var.secret_key}"
  region     = "${var.region}"

  endpoints {
    sts = "https://vpce-xxxxxxx.sts.us-east-1.vpce.amazonaws.com"
  }

  assume_role {
    role_arn     = "${var.role_arn}"
    session_name = "${var.session_name}"
    external_id  = "${var.external_id}"
  }
}

module "iam_role_resources" {
  source = "../../modules/iam_role_resources"
}

Debug Output

Initializing the backend...
2019/08/23 13:52:39 [TRACE] Meta.Backend: merging -backend-config=... CLI overrides into backend configuration
2019/08/23 13:52:39 [TRACE] Meta.Backend: built configuration for "s3" backend with hash value 3570462577
2019/08/23 13:52:39 [TRACE] Preserving existing state lineage "f761743e-4847-4f41-e1e8-91d87cc7a1cd"
2019/08/23 13:52:39 [TRACE] Preserving existing state lineage "f761743e-4847-4f41-e1e8-91d87cc7a1cd"
2019/08/23 13:52:39 [TRACE] Meta.Backend: working directory was previously initialized for "s3" backend
2019/08/23 13:52:39 [TRACE] Meta.Backend: moving from default local state only to "s3" backend
2019/08/23 13:52:39 [INFO] Setting AWS metadata API timeout to 100ms
2019/08/23 13:52:39 [INFO] Ignoring AWS metadata API endpoint at default location as it doesn't return any instance-id
2019/08/23 13:52:39 [INFO] Attempting to AssumeRole arn:aws:iam::7633xxxxx:role/lsm_terraform_init (SessionName: "terraform", ExternalId: "terraform", Policy: "")
2019/08/23 13:52:39 [INFO] AWS Auth provider used: "StaticProvider"

Error: The role "arn:aws:iam::7633xxxxxxx:role/lsm_terraform_init" cannot be assumed.

There are a number of possible causes of this - the most common are:

Expected Behavior

Terraform should connect to the AWS STS endpoint via the VPC private connection.

Actual Behavior

Terraform fails when it attempts to assume the role.

Steps to Reproduce

1. Create any AWS IAM role.
2. Attach a policy that lets Terraform interact with AWS resources.
3. Add the Terraform user to the role's trusted principals.
4. Create a VPC STS endpoint.
5. Add the assume role and the STS endpoint to the Terraform AWS provider:

provider "aws" {
  access_key = "${var.access_key}"
  secret_key = "${var.secret_key}"
  region     = "${var.region}"

  endpoints {
    sts = "https://vpce-xxxxxxx.sts.us-east-1.vpce.amazonaws.com"
  }

  assume_role {
    role_arn     = "${var.role_arn}"
    session_name = "${var.session_name}"
    external_id  = "${var.external_id}"
  }
}

6. Unset proxies:

unset http_proxy https_proxy

7. Run apply:

$ ../../terraform init -reconfigure -backend-config="../../globals.tfvars" -backend-config="terraform.tfvars" -backend-config="secrets.tfvars"

Initializing the backend...

Error: The role "arn:aws:iam::7633xxxxxx:role/lsm_terraform_init" cannot be assumed.

There are a number of possible causes of this - the most common are:

If I set a proxy, init/plan/apply work fine. This suggests that Terraform goes via the internet, which would be a serious security breach for us. It looks like Terraform does not pick up the STS endpoint URL. The STS URL is accessible via the proxy using curl:

$ curl -I https://vpce-xxxxxxx.sts.us-east-1.vpce.amazonaws.com
HTTP/1.1 302 Found
x-amzn-RequestId: ded33b46-c5d2-11e9-9627-13da8f46d40c
Location: https://aws.amazon.com/iam
Content-Length: 0
Date: Fri, 23 Aug 2019 18:21:48 GMT

So how can I make Terraform connect to AWS through the VPC endpoint to assume the IAM role?

Thanks in advance, Sergii

ewbankkit commented 5 years ago

@szemlyanoy Configuration for the Terraform S3 Backend is done independently of the configuration for the Terraform AWS Provider and it looks like your error is occurring during initialization of the S3 Backend. Take a look at the S3 Backend documentation - it has all the various options you require (although in a slightly different format from the AWS Provider configuration 😄).

sergii-zemlianyi commented 5 years ago

Thanks for the reply, @ewbankkit.
At a glance, I am providing the required attributes for the S3 backend:

terraform{
  backend "s3" {
    key     = "terraform.tfstate"
    encrypt = true
    sts_endpoint = "https://vpce-xxxxxx.sts.us-east-1.vpce.amazonaws.com"
    external_id = "terraform"
    role_arn = "arn:aws:iam::763xxxxxx:role/lsm_terraform_init"
  }
}

And here is the trust policy of the role being assumed:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::763xxxxxx:user/lsm-terraform"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

The weird thing is that if I set proxies, the role is assumed and the backend can be initialized. If I unset proxies, with the same configs as above, I get this error:

Initializing the backend...
2019/08/23 16:31:44 [INFO] Setting AWS metadata API timeout to 100ms
2019/08/23 16:31:45 [INFO] Ignoring AWS metadata API endpoint at default location as it doesn't return any instance-id
2019/08/23 16:31:45 [INFO] Attempting to AssumeRole arn:aws:iam::763331026866:role/lsm_terraform_init (SessionName: "terraform", ExternalId: "terraform", Policy: "")
2019/08/23 16:31:45 [INFO] AWS Auth provider used: "StaticProvider"

Error: The role "arn:aws:iam::763331026866:role/lsm_terraform_init" cannot be assumed.

There are a number of possible causes of this - the most common are:

I do not see anything in IAM role preventing access from VPC.

sergii-zemlianyi commented 5 years ago

OK, I put this backend issue aside for a while and skipped assume role usage in the S3 backend by setting the S3 bucket permissions explicitly in the policy. But now an issue with the AWS provider pops up:

main.tf

provider "aws" {
  access_key = "${var.access_key}"
  secret_key = "${var.secret_key}"
  region     = "${var.region}"

  endpoints {
   sts = "https://vpce-xxxxxxx.sts.us-east-1.vpce.amazonaws.com"
  }

  assume_role {
    role_arn     = "${var.role_arn}"
    session_name = "${var.session_name}"
    external_id  = "${var.external_id}"
  }
}

backend.tf

terraform{
  backend "s3" {
    key     = "terraform.tfstate"
    encrypt = true
#    sts_endpoint = "https://vpce-xxxxxxxxxxxx.sts.us-east-1.vpce.amazonaws.com"
#    external_id = "terraform"
#    role_arn = "arn:aws:iam::7633xxxxxxx:role/lsm_terraform_init"
  }
}
unset http_proxy https_proxy
$ ../../terraform init -reconfigure -backend-config="../../globals.tfvars" -backend-config="terraform.tfvars" -backend-config="secrets.tfvars"

Initializing the backend...

Error: error validating provider credentials: error calling sts:GetCallerIdentity: RequestError: send request failed
caused by: Post https://sts.amazonaws.com/: dial tcp 54.239.29.25:443: connect: connection refused

Why does it not try to connect to the provided STS endpoint?

sergii-zemlianyi commented 5 years ago

Regarding the backend issue: the provider/resources config is commented out entirely. If I set the following in backend.tf

terraform{
  backend "s3" {
    key     = "terraform.tfstate"
    encrypt = true
    sts_endpoint = "https://vpce-xxxxxxxxxxxx.sts.us-east-1.vpce.amazonaws.com"
    external_id = "terraform"
    role_arn = "arn:aws:iam::763xxxxxxxxxxx:role/lsm_terraform_init"
  }
}

and set proxy to let TF connect via internet then ../../terraform init -reconfigure -backend-config="../../globals.tfvars" -backend-config="terraform.tfvars" -backend-config="secrets.tfvars"

returns

Error: error validating provider credentials: error calling sts:GetCallerIdentity: RequestError: send request failed
caused by: Post https://vpce-029bf1212bccf5012-82vurhdd.sts.us-east-1.vpce.amazonaws.com/: Service Unavailable

if I unset proxies then

$ TF_LOG=trace ../../terraform init -reconfigure -backend-config="../../globals.tfvars" -backend-config="terraform.tfvars" -backend-config="secrets.tfvars"
2019/08/23 18:16:41 [INFO] Terraform version: 0.12.6
2019/08/23 18:16:41 [INFO] Go runtime version: go1.12.4
2019/08/23 18:16:41 [INFO] CLI args: []string{"/home/szemlyanoy-loc/devops/terraform/terraform", "init", "-reconfigure", "-backend-config=../../globals.tfvars", "-backend-config=terraform.tfvars", "-backend-config=secrets.tfvars"}
2019/08/23 18:16:41 [DEBUG] Attempting to open CLI config file: /home/jira/.terraformrc
2019/08/23 18:16:41 [DEBUG] File doesn't exist, but doesn't need to. Ignoring.
2019/08/23 18:16:41 [INFO] CLI command args: []string{"init", "-reconfigure", "-backend-config=../../globals.tfvars", "-backend-config=terraform.tfvars", "-backend-config=secrets.tfvars"}

Initializing the backend...
2019/08/23 18:16:41 [TRACE] Meta.Backend: merging -backend-config=... CLI overrides into backend configuration
2019/08/23 18:16:41 [TRACE] Meta.Backend: built configuration for "s3" backend with hash value 3065339372
2019/08/23 18:16:41 [TRACE] Preserving existing state lineage "f761743e-4847-4f41-e1e8-91d87cc7a1cd"
2019/08/23 18:16:41 [TRACE] Preserving existing state lineage "f761743e-4847-4f41-e1e8-91d87cc7a1cd"
2019/08/23 18:16:41 [TRACE] Meta.Backend: working directory was previously initialized for "s3" backend
2019/08/23 18:16:41 [TRACE] Meta.Backend: moving from default local state only to "s3" backend
2019/08/23 18:16:41 [INFO] Setting AWS metadata API timeout to 100ms
2019/08/23 18:16:42 [INFO] Ignoring AWS metadata API endpoint at default location as it doesn't return any instance-id
2019/08/23 18:16:42 [INFO] Attempting to AssumeRole arn:aws:iam::763xxxxxxxxxx:role/lsm_terraform_init (SessionName: "terraform", ExternalId: "terraform", Policy: "")
2019/08/23 18:16:42 [INFO] AWS Auth provider used: "StaticProvider"

Error: The role "arn:aws:iam::763xxxxxx:role/lsm_terraform_init" cannot be assumed.

  There are a number of possible causes of this - the most common are:
    * The credentials used in order to assume the role are invalid
    * The credentials do not have appropriate permission to assume the role
    * The role ARN is not valid

ewbankkit commented 5 years ago

@szemlyanoy Have you verified that the arn:aws:iam::7633xxxxxxx:role/lsm_terraform_init role has the correct trust relationship with the base calling role?

sergii-zemlianyi commented 5 years ago

Yes, the trust is correct:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::7633xxxxxx:user/lsm-terraform"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "terraform"
        }
      }
    }
  ]
}

I was able to get past this issue using a workaround:

export https_proxy=<proxy>
export no_proxy=.sts.us-east-1.vpce.amazonaws.com

Now I can set the VPC STS endpoint in both the backend and the provider:

terraform{
  backend "s3" {
    key     = "terraform.tfstate"
    encrypt = true
    sts_endpoint = "https://vpce-xxxxxxxxx.sts.us-east-1.vpce.amazonaws.com"
  }
}

provider "aws" {
  access_key = var.access_key
  secret_key = var.secret_key
  region     = var.region
  endpoints {
   sts = "https://vpce-xxxxxxxxx.sts.us-east-1.vpce.amazonaws.com"
  }
  assume_role {
    role_arn     = var.role_arn
    session_name = var.session_name
    external_id  = var.external_id
  }
}

One question I am trying to figure out, and I would appreciate advice: one of my modules creates a role and policy. I want to start assuming this new role ARN in other modules. How can I let the AWS provider override the assumed role for a specific module?

Below is the module where I want to assume the role from another module, which exports the ARN: modules/aws_route53_res/main.tf

provider "aws" {
  assume_role {
    role_arn     = var.resources_role
  }
}

resource "aws_route53_resolver_endpoint" "to_BT_on-prem" {
... attributes...
}

But it fails, telling me that I need to provide all required attributes (region etc.) for the AWS provider.

Is there any way to override only a single attribute of the root provider configuration at the child module level?

Thanks!
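
One common pattern for this question, sketched under assumptions: a provider block declared in a child module is a separate configuration and does not inherit individual arguments from the root one. Instead of overriding a single argument in the child, a second, aliased provider configuration can be defined in the root module and handed to the module through the `providers` meta-argument. The alias `resources` and the variable `resources_role_arn` are hypothetical names; the module name and path are taken from this thread. Credentials and the STS endpoint are left out for brevity and would be set as in the root provider above.

```hcl
# Root module. Hypothetical names: alias "resources", var.resources_role_arn.
variable "region" {
  type = string
}

variable "resources_role_arn" {
  type        = string
  description = "ARN of a role that already exists before this apply"
}

# Default configuration, used by everything not given another one.
provider "aws" {
  region = var.region
}

# Second configuration: same region, plus the role to assume.
provider "aws" {
  alias  = "resources"
  region = var.region

  assume_role {
    role_arn = var.resources_role_arn
  }
}

# The child module declares no provider block of its own. Its resources
# that use the default "aws" provider receive the aliased configuration.
module "route53_resolver" {
  source = "../../modules/route53_res"

  providers = {
    aws = aws.resources
  }
}
```

This only covers how the configuration is passed down. The role ARN still has to be known, and the role assumable, before the apply starts.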

ewbankkit commented 5 years ago

@szemlyanoy It's always the proxy 😄.

From the documentation:

The configuration arguments defined by the provider may be assigned using expressions, which can for example allow them to be parameterized by input variables. However, since provider configurations must be evaluated in order to perform any resource type action, provider configurations may refer only to values that are known before the configuration is applied. In particular, avoid referring to attributes exported by other resources unless their values are specified directly in the configuration.

So you will be unable to refer to the newly created role/policy when configuring the 2nd provider if you keep everything in one pass of terraform apply. Is there any way you can split into two passes, the first creating the role/policy and the second using it for provider configuration?
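
A minimal sketch of the two-pass split suggested here, assuming the role is created in one configuration and consumed in a second one with its own directory and state. The output and variable name `resources_role_arn` is hypothetical; `aws_iam_role.lsm_terraform_resources` is the role resource named elsewhere in this thread and its definition is not repeated.

```hcl
# ---- Configuration 1 (applied first): creates the role and policy ----
# Exposes the ARN of the role so it can be handed to the second pass.
output "resources_role_arn" {
  value = aws_iam_role.lsm_terraform_resources.arn
}

# ---- Configuration 2 (separate directory and state, applied second) ----
variable "region" {
  type = string
}

# Supplied from the first pass, so it is known before this apply starts.
variable "resources_role_arn" {
  type = string
}

provider "aws" {
  region = var.region

  assume_role {
    role_arn = var.resources_role_arn
  }
}
```

Because the ARN arrives as a plain input variable in the second configuration, the provider block refers only to a value that is known before that configuration is applied, which is the constraint quoted from the documentation.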

sergii-zemlianyi commented 5 years ago

It seems the problem with the AWS provider attributes in the child module 'route53_resolver' is caused by the order of resource creation. On the initial apply, Terraform tries to create resources in this order: assume_role (without policy) -> route53 resolver -> assume_role_policy. It therefore fails with the error below, since aws_iam_role_policy has not been created yet:

2019-08-28T18:46:23.082-0400 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: 2019/08/28 18:46:23 [INFO] assume_role configuration set: (ARN: "arn:aws:iam::763331026866:role/lsm_terraform_resources", SessionID: "terraform", ExternalID: "terraform", Policy: "")
2019-08-28T18:46:23.089-0400 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: 2019/08/28 18:46:23 [INFO] Building AWS auth structure
2019-08-28T18:46:23.089-0400 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: 2019/08/28 18:46:23 [INFO] Setting AWS metadata API timeout to 100ms
2019-08-28T18:46:23.786-0400 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: 2019/08/28 18:46:23 [INFO] Ignoring AWS metadata API endpoint at default location as it doesn't return any instance-id
2019-08-28T18:46:23.786-0400 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: 2019/08/28 18:46:23 [INFO] Attempting to AssumeRole arn:aws:iam::763331026866:role/lsm_terraform_resources (SessionName: "terraform", ExternalId: "terraform", Policy: "")
2019-08-28T18:46:23.786-0400 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: 2019/08/28 18:46:23 [INFO] AWS Auth provider used: "StaticProvider"
2019/08/28 18:46:23 [ERROR] module.route53_resolver: eval: *terraform.EvalConfigProvider, err: The role "arn:aws:iam::7633xxxxx:role/lsm_terraform_resources" cannot be assumed.

  There are a number of possible causes of this - the most common are:
    * The credentials used in order to assume the role are invalid
    * The credentials do not have appropriate permission to assume the role
    * The role ARN is not valid

The log shows that the AWS provider inside the 'route53_resolver' module tried to assume the new role 'lsm_terraform_resources'. The main issue is that the policy had not been created yet. The role, however, was created before the 'route53_resolver' module, since we have an explicit dependency in the provider section:

provider "aws" {
  assume_role {
    role_arn     = var.resources_role
  }
}

So I just need to make aws_iam_role_policy be created together with aws_iam_role and then let other resources assume this role.
I tried to use the AWS provider's assume_role 'policy' attribute (https://www.terraform.io/docs/providers/aws/index.html#policy):

  assume_role {
    role_arn     = var.resources_role
    session_name = var.session_name
    external_id  = var.external_id
    policy = var.resources_role_policy
  }

Any advice on how to enforce this order: policy -> role (or vice versa) -> other modules?

Thanks in advance

sergii-zemlianyi commented 5 years ago

Can I export the role ARN from the 'aws_iam_role_policy_attachment' resource type? It is not clear from the docs what exactly this resource can output, but if it is possible, this might be a workaround.

sergii-zemlianyi commented 5 years ago

I even tried to implement a workaround to enforce the policy<->role->other modules order, like this:

resource "aws_iam_policy" "lsm_terraform_resources_all"
<...>
resource "aws_iam_role" "lsm_terraform_resources" 
<...>

resource "aws_iam_role_policy_attachment" "resources" {
  role = aws_iam_role.lsm_terraform_resources.name
  policy_arn = aws_iam_policy.lsm_terraform_resources_all.arn
}

data "aws_iam_role" "lsm_terraform_resources" {
  name = aws_iam_role_policy_attachment.resources.role
}

This means that by the time the module 'iam_role_resources' has finished, the role and policy should be created and attached. I get the ARN of the role to assume from the data source and supply it to the route53_resolver module's provider. But it still fails on the first apply:

module.iam_role_resources.data.aws_caller_identity.current: Refreshing state...
module.iam_role_resources.data.template_file.policy_resources_all_doc: Refreshing state...
module.iam_role_resources.data.template_file.role_trusts_doc: Refreshing state...
module.iam_role_resources.aws_iam_role.lsm_terraform_resources: Creating...
module.iam_role_resources.aws_iam_policy.lsm_terraform_resources_all: Creating...
module.iam_role_resources.aws_iam_policy.lsm_terraform_resources_all: Creation complete after 1s [id=arn:aws:iam::76333xxxxxx:policy/lsm_terraform_resources_all]
module.iam_role_resources.aws_iam_role.lsm_terraform_resources: Creation complete after 1s [id=lsm_terraform_resources]
module.iam_role_resources.aws_iam_role_policy_attachment.resources: Creating...
module.iam_role_resources.aws_iam_role_policy_attachment.resources: Creation complete after 0s [id=lsm_terraform_resources-20190829133304716100000001]
module.iam_role_resources.data.aws_iam_role.lsm_terraform_resources: Refreshing state...

Error: The role "arn:aws:iam::7633xxxxx:role/lsm_terraform_resources" cannot be assumed.

  There are a number of possible causes of this - the most common are:
    * The credentials used in order to assume the role are invalid
    * The credentials do not have appropriate permission to assume the role
    * The role ARN is not valid

  on ../../modules/route53_res/main.tf line 1, in provider "aws":
   1: provider "aws" { 

Interestingly, the trace log shows that after the resolver's failure Terraform uploads the state to the S3 backend:

2019/08/29 09:33:06 [ERROR] module.route53_resolver: eval: *terraform.EvalSequence, err: The role "arn:aws:iam::763331026866:role/lsm_terraform_resources" cannot be assumed.

  There are a number of possible causes of this - the most common are:
    * The credentials used in order to assume the role are invalid
    * The credentials do not have appropriate permission to assume the role
    * The role ARN is not valid
2019/08/29 09:33:06 [TRACE] [walkApply] Exiting eval tree: module.route53_resolver.provider.aws
2019/08/29 09:33:06 [TRACE] vertex "module.route53_resolver.provider.aws": visit complete
2019/08/29 09:33:06 [TRACE] dag/walk: upstream of "module.route53_resolver.aws_route53_resolver_endpoint.to_BT_on-prem_test (prepare state)" errored, so skipping
2019/08/29 09:33:06 [TRACE] dag/walk: upstream of "module.route53_resolver.aws_route53_resolver_endpoint.to_BT_on-prem_test" errored, so skipping
2019/08/29 09:33:06 [TRACE] dag/walk: upstream of "module.route53_resolver.provider.aws (close)" errored, so skipping
2019/08/29 09:33:06 [TRACE] dag/walk: upstream of "meta.count-boundary (EachMode fixup)" errored, so skipping
2019/08/29 09:33:06 [TRACE] dag/walk: upstream of "root" errored, so skipping
2019/08/29 09:33:06 [DEBUG] Uploading remote state to S3: {
  Body: buffer(0xc001014630),
  Bucket: "lsm-terraform-backend-dev",
  ContentLength: 9553,
  ContentType: "application/json",
  Key: "terraform.tfstate",
  ServerSideEncryption: "AES256"
}
2019/08/29 09:33:06 [DEBUG] [aws-sdk-go] DEBUG: Request s3/PutObject Details:
---[ REQUEST POST-SIGN ]-----------------------------
PUT /terraform.tfstate HTTP/1.1
...

I see both role and policy resources are sent in PUT request. Next apply would succeed with resolver module. I am not sure if this backend upload can cause the issue with resolver module not being able to assume role.

Any advice on that? Thanks in advance!
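
For comparison, the ordering that the data source above is meant to enforce can also be written as a module output with an explicit `depends_on`. This is only a sketch: the output name `resources_role_arn` is hypothetical, while the two resource addresses are the ones shown in this comment. As far as I can tell, `aws_iam_role_policy_attachment` itself exposes the role name and policy ARN it was given rather than a role ARN, so the ARN is read from the role resource.

```hcl
# In the module that creates the role (modules/iam_role_resources).
# Hypothetical output name; resource addresses are from this thread.
output "resources_role_arn" {
  value = aws_iam_role.lsm_terraform_resources.arn

  # Anything consuming this output waits until the policy is attached.
  depends_on = [aws_iam_role_policy_attachment.resources]
}
```

This states the same dependency without a data source lookup. Since the apply log above already shows the attachment completing before the provider failed, it should not be read as a confirmed fix for the first-apply error.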

bflad commented 4 years ago

Hi folks 👋 Version 3.0 of the Terraform AWS Provider will include a few authentication changes that should help in this case. Similar enhancements and fixes were applied to the Terraform S3 Backend (part of Terraform CLI) in version 0.13.0-beta2.

The Terraform AWS Provider major version update will release in the next two weeks or so. Please follow the v3.0.0 milestone for tracking the progress of that release. If you are still having trouble after updating when it's released, please file a new issue. Thanks!

ghost commented 4 years ago

This has been released in version 3.0.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.
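
A minimal sketch of version constraints that would require the releases mentioned above, assuming Terraform 0.13 or later (the `source` argument in `required_providers` is 0.13 syntax). The exact constraint strings are an assumption and can be made tighter or looser.

```hcl
terraform {
  # The S3 backend changes were mentioned for the 0.13 line of Terraform CLI.
  required_version = ">= 0.13"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      # Version 3.0.0 is the provider release referenced in this thread.
      version = ">= 3.0.0"
    }
  }
}
```

After changing the constraint, `terraform init -upgrade` selects a provider version that satisfies it.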

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!

ghost commented 4 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!