jtblin / kube2iam

kube2iam provides different AWS IAM roles for pods running on Kubernetes
BSD 3-Clause "New" or "Revised" License
1.96k stars 318 forks

New version of kube2iam fails to get regions #367

Open samsilborydoxo opened 7 months ago

samsilborydoxo commented 7 months ago

We had a cluster set to use the image tag latest, and when we attempted to add a new node group we got this error:

{"level":"fatal","error":"operation error RDS: DescribeDBClusters, failed to resolve service endpoint, an AWS region is required, but was not found","time":"2022-12-22T02:00:30Z","message":"error initializing application"}

Upon doing a rollout restart on the DaemonSet, the issue also appeared on our older node group. We tested this on another cluster, which was running docker.io/jtblin/kube2iam@sha256:2bcf95c937b0b5149ffe518e087de811e365badeea2a70b094e84b74ae156f33; it broke in the same fashion upon upgrading the image.

After a bit of thrashing around, we added "--use-regional-sts-endpoint" to the command line and set the default AWS region:

        env:
        - name: AWS_REGION
          value: us-east-1

samsilborydoxo commented 7 months ago

We also added ec2:DescribeRegions to the policy. That didn't resolve the issue, although we left it in the kube2iam policy. I'm fairly certain the node's role already had this permission as part of its other policies.

jtblin commented 7 months ago

latest is 0.11.2. There was no functional change with the previous latest which was 0.11., and the sha you mentioned above doesn't seem to correspond to any recent version so I am not sure what the issue is. STS endpoint is optional afaict.

nsharma-fy commented 7 months ago

We are also seeing a similar issue after upgrading to EKS 1.25:

$ curl http://169.254.169.254:80/latest/meta-data/iam/security-credentials/node-iam-role
operation error EC2: DescribeRegions, failed to resolve service endpoint, an AWS region is required, but was not found

jtblin commented 7 months ago

@nsharma-fy have you tried with a previous version of kube2iam? Is it failing because of the latest build or because of the EKS upgrade?

nsharma-fy commented 7 months ago

@jtblin , to add more, the only change we made was to upgrade EKS. The current version of kube2iam was working with EKS 1.24. The issue went away after removing kube2iam and adding a new node. AWS support also said it is kube2iam related.

I can try an older version of kube2iam. Do you have a suggestion on which version to test?

nsharma-fy commented 7 months ago

@jtblin, old image 0.11.1 works, so something changed in the latest version. It may not be related to the EKS upgrade. The timing was just a coincidence I guess. Thanks for your help.

Would you be looking at fixing the issue?

jtblin commented 7 months ago

The only differences between 0.11.1 and 0.11.2 were upgrading Go and the Alpine image so not sure why this would be happening and what could be "fixed": https://github.com/jtblin/kube2iam/compare/0.11.1...0.11.2

I've changed latest to point to 0.11.1 for now. I've merged the support for IMDSv2, which uses aws-sdk-go-v2, and it's available as jtblin/kube2iam:dev for testing. Can you give it a try?

Btw, if you use EKS, may I ask why you are still using kube2iam and not IAM roles for service accounts?

nsharma-fy commented 7 months ago

@jtblin, I tried jtblin/kube2iam:dev, and the same "operation error EC2: DescribeRegions" error is seen with it.

We have a few applications that do not yet support the STS AssumeRole API, so we depend on kube2iam for them.

Bujail commented 7 months ago

I'm also seeing the same issue in AWS with kube2iam v0.11.2.

time="2023-12-12T20:53:28Z" level=info msg="GET /latest/meta-data/iam/security-credentials/arn:aws:iam::xxxxxx:role/k8s_role (500) took 0.209007 ms" req.method=GET req.path="/latest/meta-data/iam/security-credentials/arn:aws:iam::xxxxxxx:role/k8s_role" req.remote=192.168.87.207 res.duration=0.20900700000000003 res.status=500
time="2023-12-12T20:53:28Z" level=error msg="Error assuming role operation error EC2: DescribeRegions, failed to resolve service endpoint, an AWS region is required, but was not found" ns.name=kube-system pod.iam.role="arn:aws:iam::xxxxxxxxx:role/k8s_role" req.method=GET req.path="/latest/meta-data/iam/security-credentials/arn:aws:iam::xxxxxxxxx:role/k8s_role" req.remote=192.168.87.212

As @samsilborydoxo mentioned, I added "--use-regional-sts-endpoint" to the command args and set the default AWS region:

    env:
    - name: AWS_REGION
      value: us-west-2

then kube2iam started to work.
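
For readers applying the same workaround, the two changes might sit together in the kube2iam DaemonSet roughly as in the sketch below. This is only an illustration, not a manifest taken from this thread: the container name is an assumption, any flags already present in your manifest should stay as they are, and the region value should be whichever region your cluster runs in.

```yaml
# Sketch of the kube2iam container inside the DaemonSet pod template
# (spec.template.spec). Only the flag and the AWS_REGION variable come
# from this thread; everything else is assumed for illustration.
containers:
  - name: kube2iam                      # assumed container name
    image: jtblin/kube2iam:0.11.2       # the version reported as failing above
    args:
      # ... keep your existing flags here ...
      - "--use-regional-sts-endpoint"   # workaround flag mentioned in this thread
    env:
      - name: AWS_REGION                # gives the AWS SDK an explicit region
        value: us-west-2                # replace with your cluster's region
```

After editing the manifest, the pods need to be recreated (for example with a rollout restart of the DaemonSet) before the new arguments and environment take effect.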

blinohod commented 5 months ago

I want to confirm this issue. It looks like the image tag on Docker Hub does not match the 0.11.2 git tag (it matches origin/release-0.11.2 instead).

docker run --rm jtblin/kube2iam:0.11.2 --version
Version: 0.11.2 - Commit: 90419d8 - Date: 2023-11-26-23:40
...
git describe 90419d8
0.11.2-9-g90419d8 # <===== 9 commits after 0.11.2 git tag
...
git log 0.11.2..90419d8  --oneline 
90419d8 (origin/release-0.11.2, origin/fix-ci) Build docker cross-platform with buildx
46b84f0 Change coveralls and README badge to use circleci
0204867 Explicitely get x/crypto/ssh/terminal
c801194 CircleCI Commit
d2731a2 Update modules
f31e48e Simplify printing and returning error
d1b7643 Fix iam regions unit tests
750566d Fix github.com/jstemmer/go-junit-report go.sum version
0bf7505 feat: support IMDSv2 ( use aws-go-sdk-v2 ) (#344)

So the 0.11.2 container image contains some significant changes added after the 0.11.2 git tag ;-)
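
Given that mismatch, one cautious option is to pin the DaemonSet to the version reported as working earlier in this thread rather than relying on latest or 0.11.2. A minimal sketch, assuming a container named kube2iam (the name is hypothetical; only the image tag comes from this thread):

```yaml
# Sketch: pin kube2iam to an explicit tag instead of a moving one.
containers:
  - name: kube2iam                  # assumed container name
    image: jtblin/kube2iam:0.11.1   # reported as working by nsharma-fy above
```

Pinning to an explicit version also avoids picking up a new build unintentionally when nodes are added or the DaemonSet is restarted.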