jtblin / kube2iam

kube2iam provides different AWS IAM roles for pods running on Kubernetes
BSD 3-Clause "New" or "Revised" License

Intermittent 502 when trying to access IMDS v2 using Kube2iam #307

Open cb-salaikumar opened 3 years ago

cb-salaikumar commented 3 years ago

Issue Details

We are facing intermittent timeout issues when using Kube2iam with IMDS v2.

What does the log say?

Per the logs, there is an "http: proxy error: context canceled" message followed by a 502 for PUT /latest/api/token requests.

How to replicate?

We hit the IMDS v2 metadata service through Kube2iam using both curl and the AWS Java SDK. More than 3 concurrent requests can start triggering the issue.

We generated 9K requests at a concurrency of 3, and around 10/100 requests failed due to this issue.
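For anyone who wants to reproduce this from inside a pod, the following is a minimal sketch of such a load test, not the script used above. It assumes the standard IMDS address 169.254.169.254 (the same value the log shows as metadata.url) and the standard IMDS v2 token header; the function names `request_token` and `run_load` are illustrative.

```python
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Standard IMDS address; inside a pod, kube2iam intercepts traffic to it.
IMDS_BASE = "http://169.254.169.254"


def request_token(base_url=IMDS_BASE, ttl_seconds=21600, timeout=5.0):
    """PUT /latest/api/token and return the HTTP status (0 on network error)."""
    req = urllib.request.Request(
        base_url + "/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses such as the 502 from the proxy land here.
        return err.code
    except OSError:
        # Connection failures and timeouts (URLError is an OSError subclass).
        return 0


def run_load(send, total, concurrency):
    """Call send() `total` times on `concurrency` threads; count each status."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(lambda _: send(), range(total)))


# Example run matching the numbers in this report (uncomment inside a pod):
# print(run_load(request_token, total=9000, concurrency=3))
```

Running the same script directly on the node, where requests bypass kube2iam, gives a baseline to compare the status counts against.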

How do you confirm the issue is not with the IMDS v2 API?

We ran the same experiment at the node level and had zero requests fail due to this issue.

Kube2iam versions

Log

time="2021-04-21T15:39:45Z" level=info msg="PUT /latest/api/token (200) took 2.940652 ms" req.method=PUT req.path=/latest/api/token req.remote=10.10.213.42 res.duration=2.940652 res.status=200
time="2021-04-21T15:39:45Z" level=debug msg="Proxy ec2 metadata request" metadata.url=169.254.169.254 req.method=GET req.path=/latest/meta-data/ami-id req.remote=10.10.213.42
time="2021-04-21T15:39:45Z" level=info msg="GET /latest/meta-data/ami-id (200) took 3.739825 ms" req.method=GET req.path=/latest/meta-data/ami-id req.remote=10.10.213.42 res.duration=3.739825 res.status=200
time="2021-04-21T15:39:45Z" level=debug msg="Proxy ec2 metadata request" metadata.url=169.254.169.254 req.method=PUT req.path=/latest/api/token req.remote=10.10.213.42
time="2021-04-21T15:39:45Z" level=info msg="PUT /latest/api/token (200) took 2.322276 ms" req.method=PUT req.path=/latest/api/token req.remote=10.10.213.42 res.duration=2.322276 res.status=200
time="2021-04-21T15:39:45Z" level=debug msg="Proxy ec2 metadata request" metadata.url=169.254.169.254 req.method=GET req.path=/latest/meta-data/ami-id req.remote=10.10.213.42
time="2021-04-21T15:39:45Z" level=info msg="GET /latest/meta-data/ami-id (200) took 4.099684 ms" req.method=GET req.path=/latest/meta-data/ami-id req.remote=10.10.213.42 res.duration=4.099684 res.status=200
time="2021-04-21T15:39:45Z" level=debug msg="Proxy ec2 metadata request" metadata.url=169.254.169.254 req.method=PUT req.path=/latest/api/token req.remote=10.10.213.42
time="2021-04-21T15:39:45Z" level=info msg="PUT /latest/api/token (200) took 2.153839 ms" req.method=PUT req.path=/latest/api/token req.remote=10.10.213.42 res.duration=2.153839 res.status=200
2021/04/21 15:39:45 http: proxy error: context canceled
time="2021-04-21T15:39:45Z" level=debug msg="Proxy ec2 metadata request" metadata.url=169.254.169.254 req.method=PUT req.path=/latest/api/token req.remote=10.10.213.42
time="2021-04-21T15:39:45Z" level=info msg="PUT /latest/api/token (502) took 1001.163778 ms" req.method=PUT req.path=/latest/api/token req.remote=10.10.213.42 res.duration=1001.163778 res.status=502
time="2021-04-21T15:39:45Z" level=debug msg="Proxy ec2 metadata request" metadata.url=169.254.169.254 req.method=GET req.path=/latest/meta-data/ami-id req.remote=10.10.213.42
time="2021-04-21T15:39:45Z" level=info msg="GET /latest/meta-data/ami-id (200) took 4.069870 ms" req.method=GET req.path=/latest/meta-data/ami-id req.remote=10.10.213.42 res.duration=4.06987 res.status=200
time="2021-04-21T15:39:45Z" level=debug msg="Proxy ec2 metadata request" metadata.url=169.254.169.254 req.method=GET req.path=/latest/dynamic/instance-identity/document req.remote=10.10.213.42
time="2021-04-21T15:39:45Z" level=info msg="GET /latest/dynamic/instance-identity/document (200) took 1.923733 ms" req.method=GET req.path=/latest/dynamic/instance-identity/document req.remote=10.10.213.42 res.duration=1.923733 res.status=200
time="2021-04-21T15:39:45Z" level=debug msg="Proxy ec2 metadata request" metadata.url=169.254.169.254 req.method=PUT req.path=/latest/api/token req.remote=10.10.213.42
time="2021-04-21T15:39:45Z" level=info msg="PUT /latest/api/token (200) took 3.447264 ms" req.method=PUT req.path=/latest/api/token req.remote=10.10.213.42 res.duration=3.4472639999999997 res.status=200
time="2021-04-21T15:39:45Z" level=debug msg="Proxy ec2 metadata request" metadata.url=169.254.169.254 req.method=GET req.path=/latest/meta-data/ami-id req.remote=10.10.213.42
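To pick the failing requests out of a longer log, something like the sketch below may help. It assumes one log entry per line in the key=value format shown above, and it relies only on the res.status and res.duration fields that appear in these entries; `parse_logfmt` and `failed_requests` are hypothetical helper names, not part of kube2iam.

```python
import shlex


def parse_logfmt(line):
    """Split a key=value log line into a dict; tokens without '=' are ignored."""
    fields = {}
    # shlex keeps quoted values such as msg="PUT ... ms" together as one token.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def failed_requests(lines, min_status=500):
    """Yield parsed entries whose res.status is at least min_status."""
    for line in lines:
        fields = parse_logfmt(line)
        status = fields.get("res.status")
        if status is not None and int(status) >= min_status:
            yield fields


# Example: print path and duration for every 5xx entry in a saved log file.
# The file name is a placeholder.
# with open("kube2iam.log") as handle:
#     for entry in failed_requests(handle):
#         print(entry["req.path"], entry["res.status"], entry["res.duration"])
```

Lines without key=value pairs, such as the plain "http: proxy error: context canceled" message, produce an empty dict and are skipped by the status filter.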

cb-salaikumar commented 3 years ago

@jtblin, please let me know if you need any further information on the issue.

wondersd commented 2 years ago

Ran into similar-looking errors as above (502s, "http: proxy error: context canceled", and a long res.duration). In my case I had put requests/limits on the pod that were too small.