runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io

[Help Needed] AWS iam role to manipulate EKS cluster #800

Closed by YesYouKenSpace 5 years ago

YesYouKenSpace commented 5 years ago

I am trying to get atlantis to manage our EKS cluster. Following the instructions at https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html, I added the following to the aws-auth configmap under mapRoles:

- rolearn: ${instance_role_arn}
  username: atlantis
  groups:
    - system:masters

I still get this error:

Error: Unauthorized

Does anyone know of any solution?

YesYouKenSpace commented 5 years ago

Fixed. I have no idea why, but changing the username helped.

andridzi commented 4 years ago

Hi @kennethtxytqw, could you please share which change helped? Thanks.

llamahunter commented 4 years ago

I am experiencing a similar problem. I have added the atlantis task role ARN to the EKS aws-auth configmap, but when the terraform task launched by atlantis tries to operate on the EKS cluster, it fails:

Error: Unauthorized

  on .terraform/modules/prometheus_operator/modules/prometheus-operator/main.tf line 36, in resource "kubernetes_namespace" "this":
  36: resource "kubernetes_namespace" "this" {

And looking at the EKS authorization logs I see this:

time="2020-05-06T05:17:59Z" level=warning msg="access denied" client="127.0.0.1:55512" error="input token was not properly formatted: X-Amz-Date parameter is expired (15 minute expiration) 2020-05-06 01:09:00 +0000 UTC" method=POST path=/authenticate

It appears that atlantis, or terraform via atlantis, is trying to use a token that is several hours old to authenticate to EKS?
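To put a number on "several hours", the two timestamps in the log line can be subtracted. The following is a minimal sketch in Python that parses only this one log line; the regular expressions and the function name `token_age` are illustrative assumptions, not part of any EKS tooling.

```python
import re
from datetime import datetime, timedelta, timezone

# Log line copied from the comment above.
LOG_LINE = (
    'time="2020-05-06T05:17:59Z" level=warning msg="access denied" '
    'client="127.0.0.1:55512" error="input token was not properly formatted: '
    'X-Amz-Date parameter is expired (15 minute expiration) '
    '2020-05-06 01:09:00 +0000 UTC" method=POST path=/authenticate'
)

# Limit stated in the log message itself.
MAX_TOKEN_AGE = timedelta(minutes=15)


def token_age(log_line: str) -> timedelta:
    """Return how old the token was when the authenticator rejected it."""
    # When the authenticator logged the rejection.
    rejected = re.search(r'time="([^"]+)"', log_line).group(1)
    # The X-Amz-Date the token was signed with, as echoed in the error.
    signed = re.search(
        r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \+0000 UTC", log_line
    ).group(1)
    rejected_at = datetime.strptime(rejected, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    signed_at = datetime.strptime(signed, "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    return rejected_at - signed_at


age = token_age(LOG_LINE)
print(age, age > MAX_TOKEN_AGE)
```

For this log line the token was a little over four hours old at the time of the request, far beyond the 15 minute limit named in the error.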

rverma-jm commented 3 years ago

@llamahunter did you find a solution for this?

llamahunter commented 3 years ago

Well, not really. The problem seems to be that the terraform plan caches the EKS auth token, so when you go to apply it later, the token has expired. We have to re-plan right before apply, and even then, with complex terraform there can still be EKS timeouts midway through the apply. We then need to re-plan and re-apply to finish applying the terraform. See https://github.com/terraform-providers/terraform-provider-aws/issues/13189 and https://github.com/hashicorp/terraform/issues/24886

YesYouKenSpace commented 3 years ago

I think @llamahunter is right. We (the team at my workplace) have an internal rule: if

  1. the plan involves kubernetes resources, whether via the helm or the kubernetes provider, and
  2. the plan is more than 5 minutes old,

then always re-plan before applying.
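The rule above could be checked mechanically against the JSON form of a plan. This is a rough sketch in Python, assuming the plan has already been converted with `terraform show -json` and parsed into a dict; the names `touches_kubernetes` and `must_replan` are hypothetical, and matching on the `kubernetes_` and `helm_` resource type prefixes is an assumption about how "involves kubernetes resources" would be detected.

```python
from datetime import timedelta

# Threshold taken from the rule above; adjust to taste.
MAX_PLAN_AGE = timedelta(minutes=5)

# Assumption: resources of the kubernetes and helm providers use these
# type prefixes (for example kubernetes_namespace, helm_release).
K8S_PREFIXES = ("kubernetes_", "helm_")


def touches_kubernetes(plan: dict) -> bool:
    """True if the plan lists any kubernetes or helm resource.

    `plan` is the parsed output of `terraform show -json <planfile>`.
    Every listed resource counts, including ones with no changes, which
    errs on the side of re-planning.
    """
    return any(
        change.get("type", "").startswith(K8S_PREFIXES)
        for change in plan.get("resource_changes", [])
    )


def must_replan(plan: dict, plan_age: timedelta) -> bool:
    """Apply both conditions of the rule: kubernetes involved AND plan too old."""
    return touches_kubernetes(plan) and plan_age > MAX_PLAN_AGE


example = {"resource_changes": [{"type": "kubernetes_namespace"}]}
print(must_replan(example, timedelta(minutes=6)))
```

A plan that only touches other providers is never flagged by this check, however old it is.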

trallnag commented 3 years ago

@kennethtxytqw, so performing a plan does recreate the token if the saved one has expired?

llamahunter commented 3 years ago

> @kennethtxytqw, so performing a plan does recreate the token if the saved one has expired?

In my experience, yes. However, you can still run into problems if you have a LONG running operation and the token expires in the middle of it. You will need to re-plan and re-apply to pick up from where you left off.

flixx commented 1 year ago

Hello,

as a workaround, we are using an extra plan step inside the apply command:

workflows:
  myworkflow:
    plan:
      steps:
        - init
        - plan
    apply:
      steps:
        # We have an extra plan here because the aws_eks_cluster_auth.token expires within 15min
        # https://github.com/runatlantis/atlantis/issues/800
        - plan
        - apply

This is a bit suboptimal because unintended or unapproved changes might sneak in between the reviewed plan and the fresh plan that actually gets applied. We still need to see whether this causes problems in practice.

nitrocode commented 1 year ago

@flixx couldn't you use a data source to retrieve the token, so that applying the terraform creates a new token with a new expiration? Or is that not correct?

edit: nvm, I see the relevant issue https://github.com/hashicorp/terraform/issues/24886

nitrocode commented 1 year ago

For now, until that issue is resolved, perhaps you could check when the plan file was generated: if it is more than X minutes old, run the plan and apply steps; otherwise, run only the apply step.
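A minimal sketch of that check in Python, using the plan file's modification time as a stand-in for "when the plan was generated". The 10 minute default and the function names are arbitrary illustrative choices. In an Atlantis custom `run` step the path would presumably come from the `PLANFILE` environment variable, but wiring this into a workflow is left out here.

```python
import os
import time
from typing import List, Optional

# The "X minutes" from the comment above; 10 is an arbitrary choice that
# stays under the 15 minute token lifetime.
MAX_AGE_SECONDS = 10 * 60


def plan_age_seconds(planfile: str, now: Optional[float] = None) -> float:
    """Seconds elapsed since the plan file was last written."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(planfile)


def steps_to_run(
    planfile: str,
    max_age_seconds: float = MAX_AGE_SECONDS,
    now: Optional[float] = None,
) -> List[str]:
    """Return the steps to execute: re-plan only if the plan is stale or missing."""
    if not os.path.exists(planfile):
        return ["plan", "apply"]
    if plan_age_seconds(planfile, now) > max_age_seconds:
        return ["plan", "apply"]
    return ["apply"]
```

Note that this only narrows the window: a long-running apply can still outlive the token, as described earlier in the thread.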

nitrocode commented 1 year ago

I noticed that the kubernetes provider configuration in the upstream issue doesn't use an exec block, like this one does:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.example.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.example.certificate_authority[0].data)
  # No static token; credentials come from the exec block below.
  token                  = null

  # Run the AWS CLI to obtain a token instead of reading one saved in the plan.
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args = [
      "eks", "get-token", "--cluster-name", local.eks_cluster_id
    ]
  }
}

Excerpt taken from https://github.com/cloudposse/terraform-aws-components/blob/master/modules/eks/efs-controller/provider-helm.tf

@flixx have you tried this method?
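To check what the exec command would hand to the provider, the same `aws eks get-token` call can be run by hand and its output inspected. The sketch below is in Python and assumes the command prints an ExecCredential JSON document with `status.token` and `status.expirationTimestamp` fields, and that the timestamp has the form `2020-05-06T05:31:59Z`. The function names are hypothetical, and `fetch_token` only works where the `aws` binary is installed.

```python
import json
import shutil
import subprocess
from datetime import datetime, timezone
from typing import Tuple


def parse_exec_credential(text: str) -> Tuple[str, datetime]:
    """Extract the bearer token and its expiry from ExecCredential JSON."""
    status = json.loads(text)["status"]
    # Assumed timestamp format: UTC with a trailing "Z".
    expires = datetime.strptime(
        status["expirationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return status["token"], expires


def fetch_token(cluster_name: str) -> Tuple[str, datetime]:
    """Run the same command as the exec block above."""
    if shutil.which("aws") is None:
        raise RuntimeError("aws CLI not found on PATH")
    result = subprocess.run(
        ["aws", "eks", "get-token", "--cluster-name", cluster_name],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_exec_credential(result.stdout)
```

The `shutil.which` guard makes the dependency explicit: the exec approach only works if the `aws` executable is present wherever terraform runs.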

flixx commented 1 year ago

@nitrocode Yes, this might work as well - however it would require us to build a custom atlantis docker image with the aws-cli binary included. Something I'd like to avoid at the moment.

nitrocode commented 1 year ago

You're correct. However, we highly encourage users to customize the container.

Here's mine for reference. It contains awscli v2 and a number of other tools.

https://github.com/nitrocode/atlantis-terraform-module/blob/main/Dockerfile