jceresini opened this issue 5 years ago
Running into the same issue here on EKS 1.14.6.
Ahh... this explains our issue when testing with AWS SSO-created roles too. See the issue referenced in this document. This has been a problem for quite a while (at least 14 months).
Pertinent passage:
For the rolearn be sure to remove the /aws-reserved/sso.amazonaws.com/ from the rolearn url, otherwise the arn will not be able to authorize as a valid user.
When we stumbled across this I assumed it was something about the SSO role but based on this issue it's probably the path.
We don't use EKS, but we have had this issue with 1.12 and 1.14.6 with aws-iam-authenticator. If you edit the configmap to remove the /gitlab-ci portion and restart the pods, you will likely find that access works.
My co-worker and I suspect that is because of the way that sts returns output for assumed-role session ARNs. We have a role arn:aws:iam::000000000000:role/bosun/bosun_deploy that we use for cluster administration of our kops-created clusters. If you assume the role and run aws sts get-caller-identity, we get the following:
{
  "UserId": "<redacted-AKID>:<redacted-userid>",
  "Account": "000000000000",
  "Arn": "arn:aws:sts::000000000000:assumed-role/bosun_deploy/<redacted-userid>"
}
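The path loss above can be sketched as a pure string transformation. The following is a minimal illustration, not part of any AWS tooling; the `assumed_role_arn` helper is made up for this sketch:

```shell
# Illustrative sketch: how STS shapes an assumed-role ARN from a role ARN.
# Note the role's path never appears in the output, so it cannot be
# recovered from the assumed-role ARN alone.
assumed_role_arn() {
  role_arn=$1
  session=$2
  account=$(printf '%s' "$role_arn" | cut -d: -f5)   # 5th colon-field is the account id
  role_name=${role_arn##*/}                          # segment after the last '/': the path is dropped
  printf 'arn:aws:sts::%s:assumed-role/%s/%s\n' "$account" "$role_name" "$session"
}

assumed_role_arn "arn:aws:iam::000000000000:role/bosun/bosun_deploy" "my-session"
# prints: arn:aws:sts::000000000000:assumed-role/bosun_deploy/my-session
```

Two roles named bosun_deploy under different paths would map to the same assumed-role ARN shape, which is exactly why the mapping cannot be inverted.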
I wish this were fixed; as of now, I'm not sure what to do other than create a role with a shortened path and switch to it. I suppose one can also just edit the role that gets input to the configmap itself.
Yeah, removing the path is how I identified it as the cause of the issue. The field name is rolearn, and the path is part of the ARN for a given role. I opened this so others running into the issue might find it, and also because I think something needs to address it, whether it's documentation (though I don't think docs are sufficient without changing the name of the field in the configmap) or a bugfix.
We just discovered the same, by using
$ curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
$ TOKEN=$(aws-iam-authenticator token -i fooCluster --token-only)
$ aws-iam-authenticator verify -i fooCluster -t ${TOKEN}
and comparing the role that the Pod uses (containing a path) vs. the one set in the token (path missing).
For now our workaround is also adding a role mapping to an IAM Role that "doesn't actually exist".
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
/remove-lifecycle stale
I was able to reproduce this issue. I created two roles, K8s-Admin and K8s-Admin-WithPath, using the following commands:
aws iam create-role \
--role-name K8s-Admin \
--description "Kubernetes administrator role (for AWS IAM Authenticator for Kubernetes)." \
--assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"AWS":"arn:aws:iam::<account id>:root"},"Action":"sts:AssumeRole","Condition":{}}]}' \
--output text \
--query 'Role.Arn'
aws iam create-role \
--role-name K8s-Admin-WithPath \
--path "/kubernetes/" \
--description "Kubernetes administrator role (for AWS IAM Authenticator for Kubernetes)." \
--assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"AWS":"arn:aws:iam::<account id>:root"},"Action":"sts:AssumeRole","Condition":{}}]}' \
--output text \
--query 'Role.Arn'
Mapped them to the cluster with:
eksctl create iamidentitymapping --cluster basic-demo --arn arn:aws:iam::<account id>:role/K8s-Admin --group system:masters --username iam-admin
eksctl create iamidentitymapping --cluster basic-demo --arn arn:aws:iam::<account id>:role/kubernetes/K8s-Admin-WithPath --group system:masters --username iam-admin-withpath
Then I attached the AWS ReadOnlyAccess policy to both roles. Next, I created two AWS CLI profiles, sandbox-k8s-admin and sandbox-k8s-admin-withpath, specifying the role_arn option to trigger an assume-role. After creating the profiles, I updated my local kubeconfig:
eksctl utils write-kubeconfig --cluster=basic-demo --profile=sandbox-k8s-admin --set-kubeconfig-context --region=us-east-2
kubectl get nodes
# returned list of nodes, expected
Then I switched over to the role with the path:
eksctl utils write-kubeconfig --cluster=basic-demo --profile=sandbox-k8s-admin-withpath --set-kubeconfig-context --region=us-east-2
kubectl get nodes
# error: You must be logged in to the server (Unauthorized)
Any news on this? This is quite weird behavior and hard to detect as an error.
We are seeing this issue as well, any word on resolution?
+1
I've enjoyed my 6+ hours lost to this.
terraform workaround:
join("/", values(regex("(?P<prefix>arn:aws:iam::[0-9]+:role)/[^/]+/(?P<role>.*)", <role-arn>)))
I'm not sure this is still needed with v0.5.1.
This was a very easy workaround for us, thank you.
Any update? Seems that this is still an issue.
Hello, I'm having the same issue with aws-iam-authenticator version 0.5.2.
This caught me too today; what a pain indeed. I can confirm that an instance role with a path will not be able to authenticate against the cluster. Hopefully this gets fixed soon.
Jan 28 05:05:01 ip-10-31-8-66.us-west-1.compute.internal kubelet[3907]: E0128 05:05:01.251418 3907 kubelet_node_status.go:92] Unable to register node "ip-10-31-8-66.us-west-1.compute.internal" with API server: Unauthorized
Adding this in the hope it saves someone else a few hours of their life.
A fix could be to have iam:GetRole permissions and look up the full role info by the "short" role name in Canonicalize(), returning the role ARN from AWS so that roles with a non-default (/) path would have the correct ARN: https://awscli.amazonaws.com/v2/documentation/api/latest/reference/iam/get-role.html
I could create a sample PR if that helps.
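A rough sketch of what that lookup could do, under the assumption that the caller holds iam:GetRole. The `arn_role_name` helper is hypothetical, named here only for illustration:

```shell
# Hypothetical sketch of canonicalization via iam:GetRole.
# Step 1: pull the role name out of an assumed-role ARN
# (arn:aws:sts::<account>:assumed-role/<role-name>/<session>).
arn_role_name() {
  rest=${1#*:assumed-role/}   # strip everything up to and including ':assumed-role/'
  printf '%s\n' "${rest%%/*}" # keep only the role name, dropping the session suffix
}

arn_role_name "arn:aws:sts::000000000000:assumed-role/bosun_deploy/my-session"
# prints: bosun_deploy

# Step 2 (not executed here): ask IAM for the authoritative ARN, which
# includes the path, e.g.:
#   aws iam get-role --role-name "$(arn_role_name "$caller_arn")" \
#     --query Role.Arn --output text
```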
Between #333, #268, #153, and #98, it would be good to get duplicates closed and have this tracked in one place.
/remove-lifecycle stale
@jceresini would you be willing to update the issue description to mention the likely duplicates? That would help with triage.
I'm not sure what you mean by that @sftim
The issue here is that the aws-auth configMap expects a rolearn, but you have to mangle the actual role ARN for it to work. When I submitted this, the caveat wasn't documented (to my knowledge). Now this document mentions it:
https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html
Important
The role ARN cannot include a path. The format of the role ARN must be arn:aws:iam::<123456789012>:role/. For more information, see aws-auth ConfigMap does not grant access to the cluster.
IMO, that means the rolearn field in the configMap isn't the role ARN.
If the authentication works without the path, I would assume it's easy for the logic that performs the authentication to handle the ARN with or without the path. That would save new users, who enter the actual role ARN into the configMap, from running into this odd behavior, without breaking functionality for everyone who has already entered a path-less role ARN in their config as a workaround.
Please copy the list of duplicates from https://github.com/kubernetes-sigs/aws-iam-authenticator/issues/268#issuecomment-805911359 into the description of this issue @jceresini (at the top - there's an edit button). That copying will make the duplication of issues more obvious.
I've never seen github issues handled that way. Github has a way to mark issues as duplicates and make it obvious: https://docs.github.com/en/issues/tracking-your-work-with-issues/marking-issues-or-pull-requests-as-a-duplicate
Regarding that list of issues:
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Mark this issue as rotten with /lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
If you were willing to list those issues in the description for this issue, @jceresini, you'd be making life a little easier for other contributors. /remove-lifecycle stale
/remove-lifecycle stale
I think one of the ways to fix this is for the authenticator to use the full ARN, if provided, when doing a lookup, and otherwise default to the base role path (/).
The issue seems to be here: https://github.com/kubernetes-sigs/aws-iam-authenticator/blob/85e50980d9d916ae95882176c18f14ae145f916f/pkg/arn/arn.go#L43
Not sure why, but the path is dropped from the ARN when doing a match.
If you have the ARN of an assumed role, you cannot infer the ARN of the role that was assumed. The path of the role is not encoded into the ARN of the assumed role. However, it is possible to query the AWS API to find out the ARN of a role that has that name, and to confirm whether the Role ID for that role matches the Role ID available to the server part of aws-iam-authenticator.
Once you know the ARN of the role and have confirmed the Role ID match, you have solid evidence of the caller's actual role ARN.
For more on Role IDs, see https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_identifiers.html (search for AROA).
/remove-lifecycle stale
terraform workaround:
join("/", values(regex("(?P<prefix>arn:aws:iam::[0-9]+:role)/[^/]+/(?P<role>.*)", <role-arn>)))
This didn't work for us on ARNs that contain nested "directories" in the path (e.g. arn:aws:iam::123456789012:role/with/nested/directories). Here's what did work:
replace(<role-arn>, "//.*//", "/")
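For anyone outside Terraform, the same path-stripping can be sketched in shell. The `strip_role_path` helper below is illustrative, and mirrors the greedy replace() variant so nested path segments are handled too:

```shell
# Illustrative: drop the path from a role ARN while keeping the role name.
# The greedy '/.*/' matches the entire path, nested segments included.
strip_role_path() {
  printf '%s\n' "$1" | sed -E 's|(:role)/.*/|\1/|'
}

strip_role_path "arn:aws:iam::123456789012:role/gitlab-ci/gitlab-runner"
# arn:aws:iam::123456789012:role/gitlab-runner
strip_role_path "arn:aws:iam::123456789012:role/with/nested/directories"
# arn:aws:iam::123456789012:role/directories
strip_role_path "arn:aws:iam::123456789012:role/no-path-role"
# unchanged: arn:aws:iam::123456789012:role/no-path-role
```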
I have a role with an ARN that looks like this: arn:aws:iam::XXXXXXXXXXXX:role/gitlab-ci/gitlab-runner. My aws-auth configmap was as follows:
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::XXXXXXXXXXXX:role/EKSWorkerNode
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
    - rolearn: arn:aws:iam::XXXXXXXXXXXX:role/EKSServiceWorker
      username: kubernetes-admin
      groups:
        - system:masters
    - rolearn: arn:aws:iam::XXXXXXXXXXXX:role/gitlab-ci/gitlab-runner
      username: gitlab-admin
      groups:
        - system:masters
I repeatedly got unauthorized errors from the cluster until I updated the rolearn to arn:aws:iam::XXXXXXXXXXXX:role/gitlab-runner. After that change my access worked as expected.
If it makes a difference, I'm using assume-role on our gitlab-runner, and using aws eks update-kubeconfig --region=us-east-1 --name=my-cluster to get kubectl configured.
Excuse me, can you show me what username: gitlab-admin is? Thanks.
Same problem... thank you very much.
@nckturner, as you added the tag "important-soon" more than 2 years ago, why is this issue still present? Moreover, I think @gothrek22 found the root cause: https://github.com/kubernetes-sigs/aws-iam-authenticator/blob/85e50980d9d916ae95882176c18f14ae145f916f/pkg/arn/arn.go#L43, but the code explains nothing, so do you have an explanation on your side?
If using paths in IAM is a "bad practice" it should be said, but if not, this bug could be a real blocker if you have two roles with the same name in different paths... And it also makes any automation very tricky.
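To make the collision concrete: if matching strips the path, two distinct roles collapse to the same key. This is a sketch of the behavior described above, not the authenticator's actual code; the `match_key` helper is made up for illustration:

```shell
# Illustrative: stripping the path before matching makes two distinct
# roles indistinguishable to the matcher.
match_key() {
  printf '%s\n' "$1" | sed -E 's|(:role)/.*/|\1/|'
}

match_key "arn:aws:iam::111111111111:role/teamA/deploy"
match_key "arn:aws:iam::111111111111:role/teamB/deploy"
# both print: arn:aws:iam::111111111111:role/deploy
```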
This is an important bug to fix. However, so far no contributor has provided a fix that has been merged.
Anyone who is willing to follow the Kubernetes code of conduct is welcome to work on this. Related to that: if you (i.e., if anyone) would like this bug fixed and are willing to offer a bounty, that offer might help move things forward.
If people want to highlight this issue to the vendor, AWS, then please visit https://github.com/aws/containers-roadmap/issues/573 and add a thumbs-up reaction.
/remove-lifecycle stale
My team just lost a few hours to this issue today. It'd be great to see it resolved.
Same thing happened to my team today...
/lifecycle frozen
Any update ? using paths in IAM is a "bad practice" or not ?
Lost two days on this; it probably should be fixed...
The following PR was merged, and appears to address the problem: https://github.com/kubernetes-sigs/aws-iam-authenticator/pull/670/, though it's unclear to me what the current effective status is, as I don't see any documentation updated as part of the pull request.
Looks like it was merged, but there has not been a release since then.
This change is live with EKS Access Entries, but it currently does not look at paths on roles in the aws-auth ConfigMap.