runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io

EKS assume-role issues? #1724

Open mzupan opened 3 years ago

mzupan commented 3 years ago

I'm trying to run Atlantis on EKS with IRSA enabled, attaching a service account that is allowed to assume a role with full admin permissions.

I'm also using the Helm chart.

I installed aws-cli and I'm able to run aws sts assume-role just fine, but whenever Atlantis runs it uses the IAM role for the worker node, not the IRSA role attached to the pod.

My env vars seem good

bash-5.0$ env | grep AWS
AWS_DEFAULT_REGION=us-east-1
AWS_REGION=us-east-1
AWS_ROLE_ARN=arn:aws:iam::022149140658:role/eks-atlantis-production
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token

Permissions seem ok

bash-5.0$ id
uid=100(atlantis) gid=1000(atlantis) groups=0(root),1000(atlantis)
bash-5.0$ ls -lsa /var/run/secrets/eks.amazonaws.com/serviceaccount/token
     0 lrwxrwxrwx    1 root     root            12 Jul 28 16:53 /var/run/secrets/eks.amazonaws.com/serviceaccount/token -> ..data/token
bash-5.0$ ls -lsa /var/run/secrets/eks.amazonaws.com/serviceaccount/..data/token
     4 -rw-r-----    1 atlantis atlantis       976 Jul 28 16:53 /var/run/secrets/eks.amazonaws.com/serviceaccount/..data/token

dhaven commented 3 years ago

Authenticating with AWS is handled by the Terraform provider, not Atlantis, so this is most likely just a configuration issue. Have a look at https://github.com/runatlantis/atlantis/issues/1138 which might help. Usually this happens because of permission issues and is solved by adding

securityContext:
  fsGroup: 65534

to the pod

ichasco-heytrade commented 2 years ago

I have the same error. Even after adding securityContext, it still fails.

terraform version
Terraform v0.15.1
atlantis version
atlantis 0.17.0
ls -lsa /var/run/secrets/eks.amazonaws.com/serviceaccount/token
     0 lrwxrwxrwx    1 root     atlantis        12 Oct 21 12:57 /var/run/secrets/eks.amazonaws.com/serviceaccount/token -> ..data/token
bash-5.0$ printenv | grep -i atlantis
ATLANTIS_PORT=4141
HOSTNAME=atlantis-0
ATLANTIS_PORT_80_TCP_PROTO=tcp
AWS_ROLE_ARN=arn:aws:iam::XXXXXXXXX:role/Atlantis_EKS_Role
ATLANTIS_GITLAB_TOKEN=XXXXXXXXXXXXXXX
ATLANTIS_DATA_DIR=/atlantis-data
HOME=/home/atlantis
ATLANTIS_GITLAB_USER=atlantis_xxxxxxx
ATLANTIS_SERVICE_PORT_ATLANTIS=80
ATLANTIS_SERVICE_PORT=80
ATLANTIS_LOG_LEVEL=debug
ATLANTIS_SERVICE_HOST=172.20.55.213
ATLANTIS_PORT_80_TCP=tcp://172.20.55.213:80
ATLANTIS_PORT_80_TCP_ADDR=172.20.55.213
ATLANTIS_REPO_WHITELIST=gitlab.com/xxxxxxxxxx/devops/terraform/projects/*
ATLANTIS_PORT_80_TCP_PORT=80
ATLANTIS_GITLAB_WEBHOOK_SECRET=xxxxxxxxxxxxxxxxxxx
bash-5.0$ printenv | grep -i aws
AWS_DEFAULT_REGION=eu-west-1
AWS_REGION=eu-west-1
AWS_ROLE_ARN=arn:aws:iam::xxxxxxxxxxxxxxxxxxxxxxx:role/Atlantis_EKS_Role
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token

This role has s3:* and dynamodb:* access.

Thanks

ichasco-heytrade commented 2 years ago

The policy I think I have to add is:

data "aws_iam_policy_document" "tfstate" {
  statement {
    sid    = "TfstateS3in"
    effect = "Allow"
    resources = [
      "${module.tfstate-s3.s3_bucket_arn}/*",
    ]
    actions = [
      "s3:PutObject",
      "s3:GetObject",
    ]
  }
  statement {
    sid    = "TfstateS3out"
    effect = "Allow"
    resources = [
      module.tfstate-s3.s3_bucket_arn,
    ]
    actions = [
      "s3:ListBucket",
    ]
  }
  statement {
    sid    = "TfstateDynamoDB"
    effect = "Allow"
    resources = [
      aws_dynamodb_table.terraform-lock.arn,
    ]
    actions = [
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem"
    ]
  }
}

ichasco-heytrade commented 2 years ago

I found my mistake: I needed to grant access to the KMS key because the S3 bucket is encrypted. With that in place, it works.
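
A minimal sketch of what that extra KMS access could look like, written in the same style as the policy above. The resource name `aws_kms_key.tfstate` is hypothetical (the thread does not show how the key is defined), and the exact set of actions may differ depending on how the bucket encryption is configured:

```hcl
data "aws_iam_policy_document" "tfstate_kms" {
  statement {
    sid    = "TfstateKms"
    effect = "Allow"
    resources = [
      # Hypothetical reference to the key that encrypts the state bucket.
      aws_kms_key.tfstate.arn,
    ]
    actions = [
      # Needed to read state objects encrypted with the key.
      "kms:Decrypt",
      # Needed to write state objects encrypted with the key.
      "kms:GenerateDataKey",
    ]
  }
}
```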

scarecrow111 commented 2 years ago

Hi @ichasco-heytrade. While deploying Atlantis through Helm, I found that configuring assume_role in the aws.config block does not work properly. It needs to be configured in Terraform to take effect. Have you encountered this problem?

ichasco-heytrade commented 2 years ago

I needed to configure it in Atlantis as well, because we keep the tfstate in an encrypted S3 bucket. To get access to it, I needed to grant access to both the S3 bucket and the KMS key.

Then, of course, you need to grant Terraform access to AWS so it can deploy resources. I use a multi-account setup with roles. This is an example:

provider "aws" {
  region = var.region
  assume_role {
    role_arn     = "arn:aws:iam::XXXXXXXXXXX:role/Terraform-Role"
    session_name = "${var.stage}-terraform"
  }
}

# ROOT

provider "aws" {
  alias  = "root"
  region = var.region
  assume_role {
    role_arn     = "arn:aws:iam::XXXXXXXXXXX:role/Terraform-Role"
    session_name = "root-terraform"
  }
}

# PRODUCTION

provider "aws" {
  alias  = "production"
  region = var.region
  assume_role {
    role_arn     = "arn:aws:iam::XXXXXXXXXXX:role/Terraform-Role"
    session_name = "production-terraform"
  }
}

scarecrow111 commented 2 years ago

Yes, this works fine, but configuring it in values.yaml doesn't work.

aws:
  credentials: |
    [atlantis]
    aws_access_key_id=**
    aws_secret_access_key=
    region=ap-northeast-1
  config: |
    [default]
    role_arn = arn:aws:iam::48:role/****
    source_profile = atlantis

ichasco-heytrade commented 2 years ago

I use IRSA in EKS

serviceAccount:
    create: true
    mount: true
    name:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::XXXXXXXXXX:role/Atlantis_EKS_Role

This is better than using static credentials.
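
For the annotation above to have any effect, the IAM role's trust policy has to let the cluster's OIDC provider assume it through sts:AssumeRoleWithWebIdentity. A rough sketch, assuming an `aws_iam_openid_connect_provider.eks` resource already exists and that Atlantis runs as service account `atlantis` in namespace `atlantis` (all of these names are assumptions, not taken from this thread):

```hcl
data "aws_iam_policy_document" "atlantis_irsa_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type = "Federated"
      # Hypothetical OIDC provider resource for the EKS cluster.
      identifiers = [aws_iam_openid_connect_provider.eks.arn]
    }

    condition {
      test = "StringEquals"
      # The condition key is the issuer URL without the scheme, plus ":sub".
      variable = "${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:sub"
      # Assumed namespace and service account name; adjust to the actual release.
      values = ["system:serviceaccount:atlantis:atlantis"]
    }
  }
}

resource "aws_iam_role" "atlantis" {
  name               = "Atlantis_EKS_Role"
  assume_role_policy = data.aws_iam_policy_document.atlantis_irsa_trust.json
}
```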

scarecrow111 commented 2 years ago

Thanks.

romelBen commented 9 months ago

I know I am reopening this thread about IRSA, but does anyone have any insight on testing their code locally when implementing IRSA? I was able to get Atlantis to work, but the provider block poses an issue for developers and myself when testing locally. The process is pretty much to comment out the assume_role block and uncomment it when done, which is a pain. I would appreciate any insight on this.

Here's an example of what I am talking about:

provider "aws" {
  region  = "us-east-1"

#  assume_role {
#    role_arn = "arn:aws:iam::123445678785:role/fufuAtlantisRole"
#  }
}

I experimented with profile, but that doesn't work. I also tested putting shared credentials onto the pod itself to help with the profile. Would appreciate any help.
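
One possible way to avoid commenting the block in and out is to make it conditional on a variable with a dynamic block. This is only a sketch: the variable name `assume_role_arn` is made up here, and it assumes Atlantis is given the value (for example through a `TF_VAR_assume_role_arn` environment variable on the pod) while local runs leave it unset:

```hcl
variable "assume_role_arn" {
  description = "Role to assume; leave unset (null) to use the caller's own credentials."
  type        = string
  default     = null
}

provider "aws" {
  region = "us-east-1"

  # Emits an assume_role block only when a role ARN was supplied.
  dynamic "assume_role" {
    for_each = var.assume_role_arn == null ? [] : [var.assume_role_arn]
    content {
      role_arn = assume_role.value
    }
  }
}
```

With this layout, a local `terraform plan` runs with whatever credentials the developer already has, and the same code assumes the role only where the variable is set.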

HyperMe1200 commented 4 months ago

Hi @romelBen. Have you found a way to work with Atlantis without providing the assume_role block?

samarara commented 4 months ago

I had a similar problem the last couple of days with deploying atlantis on an EKS cluster with IRSA. I got it working with the following aws config via helm:

aws:
  config: |
    [profile dev]
    role_arn = arn:aws:iam::{{ .Values.aws.accountNo }}:role/atlantis
    web_identity_token_file = /var/run/secrets/eks.amazonaws.com/serviceaccount/token
  directory: "/home/atlantis/.aws"

I haven't yet tested why web_identity_token_file needs to be specified here, since this value is also exposed as an environment variable. But my local Terraform provider config remains unchanged:

terraform {
  backend "s3" {
    key            = "eks/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    bucket         = "..."
    dynamodb_table = "..."
    profile        = "dev"
  }
  required_providers {
    aws = {
      version = "4.59.0"
    }
    sops = {
      source  = "carlpett/sops"
      version = "~> 0.5"
    }
    archive = {
      source  = "hashicorp/archive"
      version = "~> 2.2.0"
    }
  }
}

provider "aws" {
  region  = "us-east-1"
  profile = "dev"
}

https://docs.aws.amazon.com/cli/v1/userguide/cli-configure-role.html#cli-configure-role-oidc