kubernetes-sigs / aws-load-balancer-controller

A Kubernetes controller for Elastic Load Balancers
https://kubernetes-sigs.github.io/aws-load-balancer-controller/
Apache License 2.0

Does not work on Fresh EKS Cluster with Amazon Linux 2023 AMI Type Nodes #3695

Open sschamp opened 1 month ago

sschamp commented 1 month ago

Describe the bug

The pods fail to run on EKS nodes that use AL2023 instead of AL2:

{"level":"info","ts":"2024-05-14T10:15:35Z","msg":"version","GitVersion":"v2.7.2","GitCommit":"fb6460383b75e937e24548e69b6732f49b88755c","BuildDate":"2024-03-22T21:39:56+0000"}
{"level":"error","ts":"2024-05-14T10:15:38Z","logger":"setup","msg":"unable to initialize AWS cloud","error":"failed to introspect vpcID from EC2Metadata or Node name, specify --aws-vpc-id instead if EC2Metadata is unavailable: failed to fetch VPC ID from instance metadata: EC2MetadataError: failed to make EC2Metadata request\n\n\tstatus code: 401, request id: "}

Steps to reproduce

Expected outcome

The pods should be able to read the metadata of the node instance.

Environment

Additional Context:

It might be because AL2023 no longer allows you to query http://169.254.169.254/latest/meta-data/ directly: it uses IMDSv2 instead of IMDSv1 (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html). With IMDSv2 you need to request a session token first and send it with every metadata request:

curl -s http://169.254.169.254/latest/meta-data/ --header "X-aws-ec2-metadata-token: $TOKEN"

For example, requesting the token inline:

/usr/bin/curl --noproxy '*' -w "\n" -s -H "X-aws-ec2-metadata-token: $(curl --noproxy '*' -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")" http://169.254.169.254/latest/meta-data/instance-id
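The two-step IMDSv2 exchange described above (PUT for a session token, then GET with the token header) can be sketched in Python using only the standard library. This is an illustration under assumptions, not the controller's actual code: the helper names are hypothetical, and the `mac` and `network/interfaces/macs/<mac>/vpc-id` metadata paths are used here as one plausible way to look up a VPC ID. It only works when run on an EC2 instance, or in a pod whose node allows enough hops for the token response to reach it.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"  # link-local IMDS address from the thread
TOKEN_TTL_SECONDS = 21600  # same TTL as the curl example above

# IMDS is link-local, so bypass any configured proxy (like curl --noproxy '*').
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def build_token_request(ttl=TOKEN_TTL_SECONDS):
    """Build the PUT request that asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl)},
    )


def build_metadata_request(path, token):
    """Build a GET request for a metadata path, carrying the session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/{path.lstrip('/')}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def _read(request, timeout=2.0):
    """Send the request and return the response body as stripped text."""
    with _opener.open(request, timeout=timeout) as resp:
        return resp.read().decode().strip()


def fetch_vpc_id():
    """Hypothetical helper: token first, then the primary interface's VPC ID."""
    token = _read(build_token_request())
    mac = _read(build_metadata_request("mac", token))
    return _read(build_metadata_request(f"network/interfaces/macs/{mac}/vpc-id", token))
```

Calling `fetch_vpc_id()` on a node should return the VPC ID as a string. A request sent without the token header is what IMDSv2-only instances reject with a 401, which matches the status code in the log above.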

aravindsagar commented 1 month ago

/kind bug

oliviassss commented 1 month ago

@sschamp, hi, it might be that the EC2 instance uses a hop limit of 1 by default. Can you change the hop limit to 2 and see if that fixes the issue? Alternatively, you can specify the VPC ID directly through the controller's --aws-vpc-id flag. See: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.7/deploy/installation/#using-the-amazon-ec2-instance-metadata-server-version-2-imdsv2

sschamp commented 1 month ago

I went with the option of manually specifying --aws-vpc-id and all is well again. This issue can be closed.
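For anyone taking the same route, the sketch below shows one way to find the VPC ID to pass to the controller. It is an assumption-laden illustration: the function names are hypothetical, `boto3` is a third-party dependency that needs AWS credentials, and the installation page linked above mentions `--aws-region` alongside `--aws-vpc-id` for setups where instance metadata is unavailable.

```python
def controller_args(cluster_name, vpc_id, region):
    """Build controller flags that avoid VPC/region auto-detection via IMDS."""
    return [
        f"--cluster-name={cluster_name}",
        f"--aws-vpc-id={vpc_id}",
        f"--aws-region={region}",
    ]


def lookup_cluster_vpc_id(cluster_name, region):
    """Hypothetical helper: read the cluster's VPC ID from the EKS API."""
    import boto3  # third-party; imported lazily so controller_args works without it

    eks = boto3.client("eks", region_name=region)
    cluster = eks.describe_cluster(name=cluster_name)["cluster"]
    return cluster["resourcesVpcConfig"]["vpcId"]
```

The returned list would go into the container `args` of the controller Deployment, for example `controller_args("demo", lookup_cluster_vpc_id("demo", "eu-west-1"), "eu-west-1")`, where the cluster name and region are placeholders.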

oliviassss commented 1 month ago

Thanks for the confirmation, closing it now.

fcuello-fudo commented 1 month ago

> Thanks for the confirmation, closing it now.

Can we please reopen? Although setting --aws-vpc-id works, this does not match the documentation, which states that the VPC ID is auto-detected if it is not specified (which works for Amazon Linux 2 but not for 2023). This is a breaking change that more people will likely encounter as AL2023 sees wider adoption.

oliviassss commented 1 month ago

@fcuello-fudo, can you check your instance hop limit? For the controller to fetch the VPC ID, the hop limit must be at least 2, as we call out in the live docs: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.8/deploy/installation/#using-the-amazon-ec2-instance-metadata-server-version-2-imdsv2
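Checking and raising the hop limit on a single instance could look roughly like the sketch below. It assumes `boto3` (third-party) and credentials allowed to call `ec2:DescribeInstances` and `ec2:ModifyInstanceMetadataOptions`; the function names are hypothetical. Treating a missing `HttpPutResponseHopLimit` value as 1 is also an assumption made for this example. A change made on one instance does not carry over to replacement nodes, so for node groups the launch template's metadata options are probably the better place to set it.

```python
REQUIRED_HOP_LIMIT = 2  # minimum stated in the comment above


def needs_hop_limit_increase(metadata_options, required=REQUIRED_HOP_LIMIT):
    """Return True if the instance's PUT response hop limit is below `required`.

    `metadata_options` is the MetadataOptions dict from describe_instances.
    A missing value is treated as 1 (an assumption for this sketch).
    """
    return metadata_options.get("HttpPutResponseHopLimit", 1) < required


def raise_hop_limit(instance_id, region, required=REQUIRED_HOP_LIMIT):
    """Raise the hop limit on one instance if needed; return True if changed."""
    import boto3  # third-party; imported lazily so the check above works without it

    ec2 = boto3.client("ec2", region_name=region)
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    options = reservations[0]["Instances"][0].get("MetadataOptions", {})
    if not needs_hop_limit_increase(options, required):
        return False
    ec2.modify_instance_metadata_options(
        InstanceId=instance_id,
        HttpPutResponseHopLimit=required,
        HttpEndpoint="enabled",
    )
    return True
```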

jenademoodley commented 3 weeks ago

By default, AL2023 nodes in a managed node group have the hop limit set to 1: https://aws.amazon.com/blogs/containers/amazon-eks-optimized-amazon-linux-2023-amis-now-available/

For IMDSv2 the default hop count for MNG is set to 1.

Can we update the docs? This is a breaking change for the load balancer controller that users won't immediately be aware of.

fcuello-fudo commented 3 weeks ago

> Can we update the docs as this is a breaking change on the load balancer controller which users won't immediately be aware of.

Yeah, that was also my point.

oliviassss commented 3 weeks ago

/kind documentation