terraform-aws-modules / terraform-aws-eks

Terraform module to create Amazon Elastic Kubernetes (EKS) resources 🇺🇦
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws
Apache License 2.0

AWS Auth Config Map Permissions Issue Trying to upgrade to 8.0.0 #699

Closed paulalex closed 4 years ago

paulalex commented 4 years ago

I have issues

I am unable to upgrade to v8.0.0 of the module due to issues with the aws-auth config map. Prior to this upgrade the module worked fine for me.

What is the current behavior?

Trying to upgrade to version 8.0.0 of the module; terraform apply results in a permissions issue for the aws-auth config map.

I cannot successfully run a terraform apply and move to EKS 1.14 because I get errors from Terraform regarding the config map (see below). I am accessing the cluster from my MacBook with the kubeconfig for the cluster admin user, so I should have admin permissions on the cluster.

If this is a bug, how to reproduce? Please include a code sample if relevant.

Upgrade following the upgrade steps and then run a terraform apply.

Question - Is the following import command from the important notes section of the upgrade documentation correct?

terraform import module.cluster1.kubernetes_config_map.aws_auth[0] kube-system/aws-auth

What's the expected behavior?

The module is upgraded to 8.0.0, the cluster upgrades with no errors, and terraform apply works without permission issues regarding the aws-auth config map.

Are you able to fix this problem and submit a PR? Link here if you have already.

No

Environment details

EKS v1.13

Any other relevant info

If I delete my cluster and run version 8.0.0 then the apply times out after 15 minutes, and I cannot then run terraform apply again because the EKS control plane already exists.

The first time I try to run apply using the latest version 8.0.0 of the module I get a permissions error with the aws-auth config map.

Any subsequent attempts to run apply after this result in an error that permission was denied to the config map.

Error: configmaps "aws-auth" already exists

  on ../../modules/eks/aws_auth.tf line 52, in resource "kubernetes_config_map" "aws_auth":
  52: resource "kubernetes_config_map" "aws_auth" {
dpiddockcmp commented 4 years ago

Yes, as per the changelog instructions you must either import the existing aws-auth ConfigMap into the Terraform state, or delete the ConfigMap so it can be created from scratch. Only delete the ConfigMap if you know for certain which IAM user or role created the EKS cluster, as that same identity will need to do the recreating.
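
For example, assuming your module instance is named "eks" (adjust to your own configuration), the two options look roughly like this:

# Option 1: import the existing ConfigMap into the Terraform state
terraform import module.eks.kubernetes_config_map.aws_auth[0] kube-system/aws-auth

# Option 2: delete it so Terraform can recreate it (only if the same IAM identity created the cluster)
kubectl -n kube-system delete configmap aws-auth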

Unfortunately the kubernetes provider does not support force overwriting existing resources.

I struggled to get the import command to work as the k8s provider seems to ignore environment variables. I ended up pointing the provider at the generated kubeconfig file just for this operation:

provider "kubernetes" {
  config_path = "kubeconfig_${var.cluster_name}"
}
paulalex commented 4 years ago

@dpiddockcmp Yes, I think this is where I am also struggling. I had a little more success when I set load_config_file = true and exported KUBECONFIG from the command line, but once I can actually upgrade my test cluster I need to get this to work from a Jenkins pipeline, and doing the above won't be an option.

I am going to try your suggestion above and see if this gives me any more success. I don't know if it's a coincidence, but I destroyed my cluster again and then ran the apply from fresh, and this time it worked without timing out. I don't know if that is a red herring...

paulalex commented 4 years ago

Also, should this:

terraform import module.cluster1.kubernetes_config_map.aws_auth[0] kube-system/aws-auth

Actually be:

terraform import module.eks.kubernetes_config_map.aws_auth[0] kube-system/aws-auth

dpiddockcmp commented 4 years ago

If you create the cluster from scratch, you do not need to import a previously created aws-auth configmap as it should not exist. The module should create it properly for you.

The import path depends on what you've called your module definition in your configuration. It might not be at the top level if you have the definition in a sub module. There is no single correct answer here. Maybe the changelog could be better aligned with the examples, which usually use "eks"?

Not sure how you would pipeline this action. It only needs doing once per cluster for the pre-8 to 8 upgrade.

paulalex commented 4 years ago

Not sure how you would pipeline this action. It only needs doing once per cluster for the pre-8 to 8 upgrade.

Maybe for this reason I could do it manually and then upgrade the cluster using the pipeline and the latest version of the module afterwards.

paulalex commented 4 years ago

I rolled back to my original version and rebuilt my cluster as it is on prod right now. Next I added the kubernetes provider to my main.tf and then updated the terraform state using the following command:

terraform import -var-file=../../tfvars/dev.tfvars module.eks.kubernetes_config_map.aws_auth[0] kube-system/aws-auth

This results in the following output (all looks good?):

Import successful!
The resources that were imported are shown above. These resources are now in
your Terraform state and will henceforth be managed by Terraform.

I then ran an apply with v8.0.0 and I still get the same error:

module.eks.aws_launch_configuration.workers[0]: Destruction complete after 1s

Error: Failed to update Config Map: Unauthorized

  on ../../modules/eks/aws_auth.tf line 52, in resource "kubernetes_config_map" "aws_auth":
  52: resource "kubernetes_config_map" "aws_auth" {

@dpiddockcmp Is this the same issue you got when trying to use the import command? My provider currently looks like this:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  load_config_file       = false
  # config_path              = "kubeconfig_${var.eks_cluster_name}"
  version                  = "~> 1.10"
}
dpiddockcmp commented 4 years ago

My error was on the import command. It was ignoring all settings and picking up the default kube config file in ~/.kube/config. Trying to read from minikube which didn't have the configmap.

Seems weird that you get permission denied on update if you have just created the cluster.

paulalex commented 4 years ago

Seems weird that you get permission denied on update if you have just created the cluster.

Yes I know, I am pretty much out of ideas.

paulalex commented 4 years ago

Seems weird that you get permission denied on update if you have just created the cluster.

To add to this... I thought I would humour myself, press up on my keyboard, and run the exact same apply again, and it worked in literally a second.

How is it possible that I get permission denied, and then it's OK?

paulalex commented 4 years ago

@dpiddockcmp I think I am going to run through the entire set of steps again but I wondered if you could correct my thinking if it sounds completely incorrect.

Could the reason it fails and then works the second time be that the kubernetes provider has picked up the old values for the token and cluster CA certificate, tries to use these to update the config map, and the build fails with permission denied?

On the second apply it actually gets the new values and so the apply works this time. It's just a thought.

paulalex commented 4 years ago

@dpiddockcmp I did a little bit more testing today. I output the cluster token in my develop branch build, and then again whilst trying to upgrade to v8.0.0 of the eks module, after it fails and then is successful on the second run.

So it looks like the cluster auth token that is retrieved by the kubernetes provider when I start the apply of the 8.0.0 version of the module is actually changed midway through upgrading.

Could this be a side effect of the fact that I am on 1.13 and when I upgrade to v8.0.0 it also upgrades my eks version to 1.14?

Here are the tokens from the two consecutive runs of terraform apply (the first builds the cluster from scratch using my develop branch, and the second upgrades to version 8.0.0 of the module, which fails on the initial apply even though the cluster upgrades, and then is successful on the second run).

First build cluster from scratch:

aws_eks_cluster_token = k8s-aws-v1.aHR0cHM6Ly9zdHMuYW1hem9uYXdzLmNvbS8_QWN0aW9uPUdldENhbGxlcklkZW50aXR5JlZlcnNpb249MjAxMS0wNi0xNSZYLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFTSUEzTVNLU1FKUkUyREs2N0kyJTJGMjAyMDAxMjElMkZ1cy1lYXN0LTElMkZzdHMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDIwMDEyMVQxNTA2MTVaJlgtQW16LUV4cGlyZXM9MCZYLUFtei1TZWN1cml0eS1Ub2tlbj1Gd29HWlhJdllYZHpFSXolMkYlMkYlMkYlMkYlMkYlMkYlMkYlMkYlMkYlMkZ3RWFEQWlYOXF5OXdrcnJWNm5ncXlLcEFlWjJIaDRpTUl0UCUyRmNCV1hMMEtPSWFFSklGcXhzJTJCY3Z0cnFCZUV2VU1DSG1jNk13dyUyRmVPclo2QjRhdWRHYTBHZFVSVUY1SDQyWHNQTlJpZ1g4ZHRsM2R4MmVjRXFsdkNxaUdGMjIzRlQzV1ozTGVnZDJ3cFhlUzRFYVNsTzZCeVNHajd3RGJmTmMzQU1sNVczelZ2TThHdXZmZ3ZocURQYUUxWGZjJTJCRWRtMWFleTJYWmY1YklFZUJxdk9KdlFvUVFpSzdXQSUyQm5QRHdmNjFwTUN2bG1TcHJSSXlFdEhUSXBzMG93NmFiOFFVeUxmNEVHc0hLRElxMU1TM2dRcUFab0ZQeHk2TUolMkJBa3pnSkhobUFDWkFkZmRESlFqcXZJYWh3VHg3a1I3c1ElM0QlM0QmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JTNCeC1rOHMtYXdzLWlkJlgtQW16LVNpZ25hdHVyZT1hNTVhMjQ0NzNlMzA4Y2Y4YTlhYzQwNzI3OTY4YTk3MmNiMjA3ZDhjOTVlYzcyZTk2ZTc4NmEwNGY3NGNlNDY1

Second upgrade to version 8.0.0:

aws_eks_cluster_token = k8s-aws-v1.aHR0cHM6Ly9zdHMuYW1hem9uYXdzLmNvbS8_QWN0aW9uPUdldENhbGxlcklkZW50aXR5JlZlcnNpb249MjAxMS0wNi0xNSZYLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFTSUEzTVNLU1FKUkUyREs2N0kyJTJGMjAyMDAxMjElMkZ1cy1lYXN0LTElMkZzdHMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDIwMDEyMVQxNTM4MTRaJlgtQW16LUV4cGlyZXM9MCZYLUFtei1TZWN1cml0eS1Ub2tlbj1Gd29HWlhJdllYZHpFSXolMkYlMkYlMkYlMkYlMkYlMkYlMkYlMkYlMkYlMkZ3RWFEQWlYOXF5OXdrcnJWNm5ncXlLcEFlWjJIaDRpTUl0UCUyRmNCV1hMMEtPSWFFSklGcXhzJTJCY3Z0cnFCZUV2VU1DSG1jNk13dyUyRmVPclo2QjRhdWRHYTBHZFVSVUY1SDQyWHNQTlJpZ1g4ZHRsM2R4MmVjRXFsdkNxaUdGMjIzRlQzV1ozTGVnZDJ3cFhlUzRFYVNsTzZCeVNHajd3RGJmTmMzQU1sNVczelZ2TThHdXZmZ3ZocURQYUUxWGZjJTJCRWRtMWFleTJYWmY1YklFZUJxdk9KdlFvUVFpSzdXQSUyQm5QRHdmNjFwTUN2bG1TcHJSSXlFdEhUSXBzMG93NmFiOFFVeUxmNEVHc0hLRElxMU1TM2dRcUFab0ZQeHk2TUolMkJBa3pnSkhobUFDWkFkZmRESlFqcXZJYWh3VHg3a1I3c1ElM0QlM0QmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JTNCeC1rOHMtYXdzLWlkJlgtQW16LVNpZ25hdHVyZT1jZjY0ZjdjYTA5MTlkY2QwZTJlNzc5ZjA3YTFlNGQ2MzFmODQ4Yjg0ZThlMzI5NzFkOTIyMTUwNDdjODZlN2Rk

Is this the expected behaviour or is this a bug with the kubernetes provider?

dpiddockcmp commented 4 years ago

The IAM EKS tokens are only valid for 15 minutes. I would expect the token to change on every run of Terraform.

I guess if the window between generating the token and trying to use it to update aws-auth is too long then you will receive an access denied error.

Does the full output of the apply command give you any hints on when the data source is refreshed?

paulalex commented 4 years ago

The IAM EKS tokens are only valid for 15 minutes. I would expect the token to change on every run of Terraform.

This would be the reason then, as the apply takes around 19-20 minutes to finish. If I could defer the retrieval of the data items until the cluster is ready then this issue would probably not appear.

The documentation for data sources suggests depends_on can be used, but it is not recommended.
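
As a minimal sketch of that approach (resource and module names are assumptions), an explicit depends_on forces the data sources to be read only after the module has finished, so the 15-minute token is still fresh when the aws-auth config map is updated:

# Defer reading the endpoint and auth token until the whole EKS module is done
data "aws_eks_cluster" "cluster" {
  name       = module.eks.cluster_id
  depends_on = [module.eks]
}

data "aws_eks_cluster_auth" "cluster" {
  name       = module.eks.cluster_id
  depends_on = [module.eks]
}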

dotCipher commented 4 years ago

I noticed this issue while trying to debug my problem, and it seems very similar to the issue I was having.

When creating the eks cluster from scratch, I noticed that the k8s provider wasn't referencing the correct endpoint string. More specifically, using this provider declaration:

provider "kubernetes" {
  alias = "kubernetes-utility"
  host = data.aws_eks_cluster.eks-utility.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks-utility.certificate_authority.0.data)
  token = data.aws_eks_cluster_auth.eks-utility.token
  load_config_file = false
}

Caused this issue after apply:

...
aws_subnet.subnet-50add535-subnet-50add535: Modifications complete after 0s [id=subnet-50add535]
module.eks-utility.kubernetes_config_map.aws_auth[0]: Creating...

Error: Post http://localhost/api/v1/namespaces/kube-system/configmaps: dial tcp [::1]:80: connect: connection refused

  on .terraform/modules/eks-utility/terraform-aws-modules-terraform-aws-eks-c9b9c96/aws_auth.tf line 52, in resource "kubernetes_config_map" "aws_auth":
  52: resource "kubernetes_config_map" "aws_auth" {

However, when I move the generated kubeconfig_<eks_name> to ~/.kube/config I get a different error (Error: Unauthorized):

...
aws_iam_policy_attachment.AmazonEKS_CNI_Policy-policy-attachment: Modifications complete after 1s [id=AmazonEKS_CNI_Policy-policy-attachment]
aws_iam_policy_attachment.AmazonEKSClusterPolicy-policy-attachment: Modifications complete after 1s [id=AmazonEKSClusterPolicy-policy-attachment]
aws_iam_policy_attachment.AmazonEC2ContainerRegistryReadOnly-policy-attachment: Modifications complete after 1s [id=AmazonEC2ContainerRegistryReadOnly-policy-attachment]

Error: Unauthorized

  on .terraform/modules/eks-utility/terraform-aws-modules-terraform-aws-eks-c9b9c96/aws_auth.tf line 52, in resource "kubernetes_config_map" "aws_auth":
  52: resource "kubernetes_config_map" "aws_auth" {

This implies that the kubernetes Terraform provider is still trying to read the kubeconfig, instead of referencing the newly created EKS cluster that was declared as a Terraform resource.

Do you think this is a Terraform bug?

paulalex commented 4 years ago

@dpiddockcmp To get around the issue you had:

My error was on the import command. It was ignoring all settings and picking up the default kube config file in ~/.kube/config. Trying to read from minikube which didn't have the configmap.

I set load_config_file = true and then exported KUBECONFIG from the command line, which seemed to get around that; I then changed it back to false to run the apply.

barryib commented 4 years ago

@paulalex can we close this issue? It seems that you solved your problem.

paulalex commented 4 years ago

@barryib sure, I will close it now. I was not able to defer the loading of the data until later on as the module went into error, so right now the solution for me is to run it twice, manually, which isn't ideal, but that's unrelated to this issue.

Cheers

paulalex commented 4 years ago

@dotCipher did you manage to work out your issue? I have terraform apply working fine when running Terraform from my laptop, even using the assumed role credentials output into the Jenkins log file, and running kubectl commands such as kubectl get cm aws-auth -n kube-system outputs the config map.

I think there is a bug in the provider, because when I run terraform apply on Jenkins using the same credentials I get the same error as you get when you export your config to ~/.kube/config:

module.eks.kubernetes_config_map.aws_auth[0]: Refreshing state... [id=kube-system/aws-auth]
2020/02/07 10:24:11 [ERROR] module.eks: eval: *terraform.EvalRefresh, err: Unauthorized
2020/02/07 10:24:11 [ERROR] module.eks: eval: *terraform.EvalSequence, err: Unauthorized

And in my jenkins file log: Error: Unauthorized

My Jenkins server is running inside another EKS cluster used for management tools, and this Terraform build is running inside a pod and is building/managing another EKS cluster in a different AWS account, so I don't know if this is in some way related.

paulalex commented 4 years ago

I have got this fixed on Jenkins now... finally! So if you have this issue and are looking for answers, see this issue:

https://github.com/terraform-providers/terraform-provider-kubernetes/issues/716

In short, if the pod running your Terraform build is on a Kubernetes cluster, removing this environment variable should fix the issue:

KUBERNETES_SERVICE_HOST
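
A minimal sketch of that workaround, run in the build pod before Terraform:

# Unset the in-cluster API address injected by Kubernetes so the provider
# talks to the target EKS cluster instead of the cluster running Jenkins.
unset KUBERNETES_SERVICE_HOST
terraform apply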

dotCipher commented 4 years ago

Thanks, that helped

rastakajakwanna commented 4 years ago

Just in case you end up here (like me) while

  1. trying to resolve migration from a terraform-aws-eks module version which uses local-exec kubectl to a new version calling the kubernetes provider directly
  2. tf apply complains about Error: configmaps "aws-auth" already exists
  3. tf import fails with Error: Get http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth: dial tcp [::1]:80: connect: connection refused

Fix:

  1. temporarily change your provider to use load_config_file = true and config_path = "your_kubeconf"
  2. tf import the aws-auth configmap (kube-system/aws-auth), see the sketch below
  3. change your temporary provider configuration back to dynamic data
  4. apply the drift
  5. aws eks update-kubeconfig # you know the drill

Problem fixed.
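
A minimal sketch of steps 2 and 5, assuming the module instance is named "eks" and a cluster called "my-cluster" in us-east-1 (all assumptions, adjust to your setup):

# Step 2: import the existing ConfigMap while the provider reads the static kubeconfig
terraform import module.eks.kubernetes_config_map.aws_auth[0] kube-system/aws-auth

# Step 5: refresh your local kubeconfig once the drift has been applied
aws eks update-kubeconfig --name my-cluster --region us-east-1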
abdennour commented 4 years ago

@rastakajakwanna, regarding your comment https://github.com/terraform-aws-modules/terraform-aws-eks/issues/699#issuecomment-600729809, I think we share the same situation (1, 2, 3). However, I was not able to work out how you fixed it. Could you please elaborate more? For example, post the snippet of TF code before the fix and the new snippet of TF code after the fix?

Appreciated!

paulalex commented 4 years ago

@abdennour This is the same issue I initially had, and the same process fixed it for me. In the provider set load_config_file = true, then in your terminal session export KUBECONFIG=<your_config_path>, then run your Terraform and see if this helps.

rastakajakwanna commented 4 years ago

@paulalex Everybody suggests exporting the KUBECONFIG variable, while one can define it in provider.tf instead. That is the difference between my answer and the other replies.

@abdennour Dynamic config in provider.tf (fails with errors)

provider "kubernetes" {
  host = module.eks.cluster_endpoint
  #cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks.certificate_authority.0.data)
  # or use module output
  cluster_ca_certificate = base64decode(module.eks.cluster_ca)
  token                  = data.aws_eks_cluster_auth.eks.token
  load_config_file       = false
  version = "~> 1.9"
}

Change temporarily to static

provider "kubernetes" {
  host = module.eks-dev-infra.cluster_endpoint
  load_config_file = true
  # kubeconfig file relative to path where you execute tf, in my case it is the same dir
  config_path      = "kubeconfig_eks.yaml"
  version = "~> 1.9"
}

Then change back to dynamic.

However, I found today that it's not working for my GitLab pipeline (due to lack of privileges to call the EKS API). But that's outside the scope of this comment.

abdennour commented 4 years ago

@rastakajakwanna thank you so much. @paulalex BTW, I am running the Terraform process inside a container and I've already passed the KUBECONFIG env variable. But it seems the kubernetes provider looks into ~/.kube/config and ignores KUBECONFIG. So what I did: I created my container image based on Terraform but with a customized entrypoint:

#!/bin/bash
# Copy the kubeconfig referenced by KUBECONFIG to the default location the provider reads
if [ -f "${KUBECONFIG}" ]; then
  mkdir -p "${HOME}/.kube"
  cat "${KUBECONFIG}" > "${HOME}/.kube/config"
fi
exec "$@"
paulalex commented 4 years ago

Did you change the property load_config_file of the provider to true?

abdennour commented 4 years ago

@paulalex You are right. My mistake that I didn't read the documentation of this provider. Now things work without my custom image; reverting back to the official hashicorp/terraform images.

abdennour commented 4 years ago

@rastakajakwanna while upgrading from 7.0.0 to 10.0.0, all issues are gone except this one: Error: configmaps "aws-auth" already exists. I started with the static config:

provider "kubernetes" {
  host = module.eks.cluster_endpoint
  load_config_file = true
  # kubeconfig file relative to path where you execute tf, in my case it is the same dir
  config_path      = "kubeconfig_${local.cluster_name}"
  version = "~> 1.9"
}

Should I delete the aws-auth configmap (kubectl -n kube-system delete cm aws-auth ...) before upgrading the cluster?

@paulalex any thoughts?

jorgex1 commented 4 years ago

I was able to finally figure this out by importing my actual aws-auth configmap, but then it got overwritten. Is there a way to prevent the terraform module from applying it?

dpiddockcmp commented 4 years ago

You can stop the module from managing the aws-auth configmap by setting manage_aws_auth = false in your module block.

Warning: It will delete your configmap if you have already imported it to the terraform state. Remove it with e.g. terraform state rm module.eks.kubernetes_config_map.aws_auth before applying!
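
As a minimal sketch (the module name and the remaining arguments are assumptions, not your exact configuration):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 8.0"

  cluster_name    = "my-cluster"
  cluster_version = "1.14"
  subnets         = var.subnets
  vpc_id          = var.vpc_id

  # Leave the aws-auth ConfigMap entirely outside of Terraform's control
  manage_aws_auth = false
}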

aholbreich commented 4 years ago

What is the best option if I manage two clusters in one codebase? I don't use environments or anything like that.

barryib commented 4 years ago

What is the best option if I manage two clusters in one codebase? I don't use environments or anything like that.

I never tested it, but I think you can use providers with aliases: one alias per cluster, and provide those aliases to the terraform-aws-eks module.

More info https://www.terraform.io/docs/configuration/providers.html#selecting-alternate-providers
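
A minimal sketch of that idea (the aliases and module names are assumptions; the host/token arguments are elided and would look like the provider blocks earlier in this thread):

provider "kubernetes" {
  alias            = "cluster_a"
  # host, cluster_ca_certificate and token for cluster A
  load_config_file = false
}

provider "kubernetes" {
  alias            = "cluster_b"
  # host, cluster_ca_certificate and token for cluster B
  load_config_file = false
}

module "eks_a" {
  source = "terraform-aws-modules/eks/aws"
  providers = {
    kubernetes = kubernetes.cluster_a
  }
  # ... cluster A inputs
}

module "eks_b" {
  source = "terraform-aws-modules/eks/aws"
  providers = {
    kubernetes = kubernetes.cluster_b
  }
  # ... cluster B inputs
}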

aholbreich commented 4 years ago

@barryib this seems to work! Thanks.

sprinkaan88 commented 4 years ago

I am having the same issue. Created an EKS cluster and everything created successfully except for the aws_auth config map.

But I cannot connect to the EKS cluster at all. I run the aws eks update-kubeconfig command and it successfully updates my .kube/config, but when executing any kubectl command it fails with: error: You must be logged in to the server (Unauthorized)

So I cannot connect to it at all, hence any tampering with the provider to pass it details won't work in my case.

Any ideas why this would be the case; is something else going wrong with the creation for this to fail ?

dpiddockcmp commented 4 years ago

Are you accessing the cluster with the same user that created the cluster originally?

Clusters must be created with an IAM user or role. Do not use the root account; you will not be able to log in.

sprinkaan88 commented 4 years ago

Yup, that was the issue, not using the same role.

markhorrocks commented 4 years ago

I had an Unauthorized config map error as well. I deleted a stale ~/.kube/config file and ran apply again which worked.

shankar96 commented 3 years ago

I got into the same issue and resolved it after some observation. The cluster was created using https://github.com/terraform-aws-modules/terraform-aws-eks/blob/v6.0.1/aws_auth.tf and I was applying changes through the updated EKS Terraform, https://github.com/terraform-aws-modules/terraform-aws-eks/blob/v12.2.0/aws_auth.tf. I observed that the way the resource is created is totally different. What I did: I deleted the old aws-auth configmap that already existed and ran apply, and it worked perfectly.

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.