jtblin / kube2iam

kube2iam provides different AWS IAM roles for pods running on Kubernetes
BSD 3-Clause "New" or "Revised" License

Add support for Instance Metadata Service Version 2 (IMDSv2) #241

Closed msiuts closed 4 years ago

msiuts commented 4 years ago

AWS announced the Instance Metadata Service Version 2 (IMDSv2), which is already deployed and supported by the latest SDKs:

https://aws.amazon.com/about-aws/whats-new/2019/11/announcing-updates-amazon-ec2-instance-metadata-service/

The SDKs now try to do a PUT request to get the Session Token, which Kube2IAM answers with 403. Then the SDKs fall back to IMDSv1 and work fine, but it causes warnings in the logs and is also reflected in the metrics.

It would be great if in future Kube2IAM would also implement IMDSv2.

btalbot commented 4 years ago

For k8s users running on AWS and upgrading to an AWS SDK version that supports IMDSv2, this is a breaking change. Not the fault of kube2iam, but still breaking.

Logs of failing requests look like

kube2iam time="2019-11-21T20:42:14Z" level=info msg="PUT /latest/api/token (403) took 701714.000000 ns" req.method=PUT req.path=/latest/api/token req.remote=100.100.0.15 res.duration=701714 res.status=403

The response is very slow, and apps see it as very slow responses. However, the SDK seems to fall back to the old metadata API after several failures, so credentials are eventually available.

btalbot commented 4 years ago

A few more details about it from AWS here: https://aws.amazon.com/blogs/security/defense-in-depth-open-firewalls-reverse-proxies-ssrf-vulnerabilities-ec2-instance-metadata-service/

Seems like they updated most (maybe all by now) of their SDKs to expect IMDSv2 token service to be available. Ruby aws-sdk-core starting with 3.79.0 will retry the token service PUT many times with backoff before falling back to V1 credentials.

nithu0115 commented 4 years ago

@btalbot I ran into a similar issue yesterday. I then changed the PUT response hop limit to 3, which resolved my issue.

I hope that helps!

aws ec2 modify-instance-metadata-options --instance-id <instance ID> --http-put-response-hop-limit 3 --http-endpoint enabled --region <region>

from https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html#configuring-instance-metadata-options

btalbot commented 4 years ago

Hmm, so you're saying that IMDS is accepting the PUT from kube2iam which sets the x-forwarded-for header? They document that those will be rejected with a 403 which is the symptom I'm seeing.

btalbot commented 4 years ago

From my reading of the code at https://github.com/jtblin/kube2iam/blob/b4d482cbc4e11da2b14a47502ae058ba734e766e/server/server.go#L356

the httputil reverse proxy will always add the 'X-Forwarded-For' header which is generally the correct thing to do, but AWS intentionally rejects PUT requests which contain this now whether using kube2iam or not.

curl -isS -X PUT http://169.254.169.254/latest/api/token -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"
HTTP/1.0 200 OK
Content-Length: 56
Content-Type: text/plain
Date: Fri, 22 Nov 2019 23:56:25 GMT
X-Aws-Ec2-Metadata-Token-Ttl-Seconds: 21600
Connection: close
Server: EC2ws

curl -isS -X PUT http://169.254.169.254/latest/api/token -H "X-aws-ec2-metadata-token-ttl-seconds: 21600" -H "x-forwarded-for: 1.2.3.4"
HTTP/1.0 403 Forbidden
Content-Length: 337
Content-Type: text/html
Date: Fri, 22 Nov 2019 23:56:35 GMT
Connection: close
Server: EC2ws

btalbot commented 4 years ago

AWS has closed the issue and has decided that they will not let their SDKs skip the IMDSv2 attempt. This means that any app using any aws-sdk (for any language) running in a kube cluster that is also using kube2iam will see significant delays in getting instance profile credentials.

They suggest that HTTP proxies in front of IMDS must not include the X-Forwarded-For HTTP header, as that is the only way to avoid at least one round-trip timeout period.

beveradb commented 4 years ago

I believe we're currently affected by this issue too, but don't quite understand it well enough to say yet.

However, we're seeing very slow (5-10 seconds per request) completion of AWS CLI commands, and a ton of PUT /latest/api/token (403) errors in the kube2iam logs, both of which match @btalbot's comments above.

Is there anything we can do to help fix this, or is the workaround to downgrade to an older aws sdk version for now?

btalbot commented 4 years ago

AFAIK, the work-arounds are to:

rbvigilante commented 4 years ago

There's a bit more discussion in https://github.com/uswitch/kiam/issues/359, but I'm pretty sure it's possible to stop the httputil reverse proxy from adding the X-Forwarded-For header by stripping the RemoteAddr from the request, either in a custom handler or the code around the proxy. I made a pull request for Kiam that should, I think, fix this issue (https://github.com/uswitch/kiam/pull/381), if that's of any use.

rbvigilante commented 4 years ago

I can confirm that the change I made in Kiam does indeed allow traffic through to IMDSv2.

marko-asplund commented 4 years ago

Any news on this? Would the change made in Kiam also be applicable for kube2iam?

arunalakmal commented 4 years ago

Any news on this?

TattiQ commented 4 years ago

Hey guys, any news on this one?

mertant commented 4 years ago

Hi, any updates on this issue?

hanyouqing commented 4 years ago

Hi, any progress on this issue?

fradee commented 4 years ago

@jtblin Could you take a look at IMDSv2 (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-identity-documents.html)? kube2iam needs a fix. Or has this project died?

marko-asplund commented 4 years ago

@jtblin @struz @mwhittington21 I'd like to start by thanking you for your contributions on this project! 🙇 👍

I realise that owning a git repo and making it publicly available doesn't imply a responsibility to fix issues, but since this issue affects quite a lot of people, it would be great to at least get some sort of response, be it "busy with other stuff", "I see problems with the proposed fix" or what have you.

IMO it would be fair to at least provide this information to help people make their contingency plans (e.g. forking, migrating etc.).

The last commit in this repo is from March 5th, so it would also be great to know whether the project is in fact alive, as asked by @fradee.

Thank you! 👍

mwhittington21 commented 4 years ago

Hi @marko-asplund, sorry for the late reply.

I think that you can assume that this repository is in maintenance mode at this point. There are very few new features being worked on and it is definitely not the primary focus of @jtblin and myself right now. We are happy to accept work and contributions but will not likely be spearheading any new initiatives or features for kube2iam.

marko-asplund commented 4 years ago

@mwhittington21 Thanks for your reply! And thanks again for the work you and @jtblin have put into this project! 🙇

@DaspawnW has submitted PR #270 that according to some reports on the PR fixes this issue. Any chance of getting that PR reviewed and if it checks out fine, publish a new release?

mwhittington21 commented 4 years ago

Sure thing, I'll take a look now

mwhittington21 commented 4 years ago

This should be fixed now after merging #270

marko-asplund commented 4 years ago

Awesome - thanks @mwhittington21 & @DaspawnW! 👍 🙇

ahmsb8884 commented 3 years ago

How is it.. fixed? I am still observing on 10.11

ghost commented 2 years ago

It does work. I dug a little deeper and started to suspect the framework was injecting some headers that the metadata endpoint disagreed with... then I found that the helm chart was delivering the old 0.10.9 image.

Pop this on the end of your values.yaml and try again

@ahmsb8884 FYI

image:
  tag: 0.10.11