Closed liyihuang closed 2 months ago
@liyihuang, interesting, thanks for the report. Since you're using Cilium, I suppose you don't have the VPC CNI in your cluster, right? Can you check whether you're hitting this? https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.8/how-it-works/#ip-mode
IP mode: Ingress traffic starts at the ALB and reaches the Kubernetes pods directly. CNIs must support directly accessible POD ip via secondary IP addresses on ENI.
No, I have removed the AWS CNI from the EKS environment.
Cilium has an ENI mode in which it manages IPs from the VPC and assigns them as secondary IPs on the NIC (https://docs.cilium.io/en/latest/network/concepts/ipam/eni/).
I think I may have figured out why while typing this. Cilium is the CNI on this EKS cluster, and the cilium agent is not a normal pod with a pod IP address; it uses the IP of the host. The AWS LB controller will not be able to find the "endpoint IP" it is looking for, as it does for other services. I was going to attach dlv to the aws-lb-controller, but it looks like that's not necessary.
(⎈|arn:aws:eks:ca-central-1:679388779924:cluster/image-learning-liyi-2:default)~/go/bin k get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
aws-load-balancer-controller-775bc4868f-9xckb 1/1 Running 0 42m 10.6.1.79 ip-10-6-1-188.ca-central-1.compute.internal <none> <none>
cilium-2rwtv 1/1 Running 0 57m 10.6.0.83 ip-10-6-0-83.ca-central-1.compute.internal <none> <none>
cilium-bd5bq 1/1 Running 0 57m 10.6.2.181 ip-10-6-2-181.ca-central-1.compute.internal <none> <none>
cilium-kcdz8 1/1 Running 0 57m 10.6.1.188 ip-10-6-1-188.ca-central-1.compute.internal <none> <none>
cilium-operator-67bc84576c-6gp59 1/1 Running 0 57m 10.6.1.63 ip-10-6-1-63.ca-central-1.compute.internal <none> <none>
cilium-operator-67bc84576c-ptzjf 1/1 Running 0 57m 10.6.0.83 ip-10-6-0-83.ca-central-1.compute.internal <none> <none>
cilium-pvcxs 1/1 Running 0 57m 10.6.1.63 ip-10-6-1-63.ca-central-1.compute.internal <none> <none>
coredns-68c6b7b454-6gmx2 1/1 Running 0 70m 10.6.1.128 ip-10-6-1-63.ca-central-1.compute.internal <none> <none>
coredns-68c6b7b454-mfb6t 1/1 Running 0 70m 10.6.1.142 ip-10-6-1-63.ca-central-1.compute.internal <none> <none>
hubble-relay-665f995b56-2ps5x 1/1 Running 1 (57m ago) 63m 10.6.1.137 ip-10-6-1-63.ca-central-1.compute.internal <none> <none>
hubble-relay-665f995b56-7c7dh 1/1 Running 1 (57m ago) 63m 10.6.1.132 ip-10-6-1-63.ca-central-1.compute.internal <none> <none>
hubble-relay-665f995b56-jm8vx 1/1 Running 1 (57m ago) 63m 10.6.1.131 ip-10-6-1-63.ca-central-1.compute.internal <none> <none>
my-external-dns-94658f555-vblj4 1/1 Running 0 63m 10.6.1.140 ip-10-6-1-63.ca-central-1.compute.internal <none> <none>
As you can see here, the cilium-agent runs in the host network namespace and uses the same IP address as its host.
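The observation above can be checked mechanically against the listing. This is a minimal sketch, assuming the nodes use the default EC2 IP-based private DNS naming (so `ip-10-6-0-83...` encodes 10.6.0.83). The helper names `node_ip_from_name` and `uses_host_ip` are hypothetical, and the sample rows are copied from the kubectl output above.

```python
import re


def node_ip_from_name(node_name):
    """Extract the IP encoded in an EC2 IP-based hostname, or None.

    Assumption: default private DNS naming, e.g.
    ip-10-6-0-83.ca-central-1.compute.internal -> 10.6.0.83.
    """
    match = re.match(r"ip-(\d+)-(\d+)-(\d+)-(\d+)\.", node_name)
    return ".".join(match.groups()) if match else None


def uses_host_ip(pod_ip, node_name):
    """Return True when the pod reports the same IP as its node."""
    return pod_ip == node_ip_from_name(node_name)


# (pod name, pod IP, node name), taken from the listing above.
PODS = [
    ("aws-load-balancer-controller-775bc4868f-9xckb", "10.6.1.79",
     "ip-10-6-1-188.ca-central-1.compute.internal"),
    ("cilium-2rwtv", "10.6.0.83",
     "ip-10-6-0-83.ca-central-1.compute.internal"),
    ("cilium-operator-67bc84576c-6gp59", "10.6.1.63",
     "ip-10-6-1-63.ca-central-1.compute.internal"),
    ("coredns-68c6b7b454-6gmx2", "10.6.1.128",
     "ip-10-6-1-63.ca-central-1.compute.internal"),
]

# Pods whose IP equals the node IP are sharing the host network namespace.
host_ip_pods = [name for name, ip, node in PODS if uses_host_ip(ip, node)]
print(host_ip_pods)
```

On a live cluster, the pod's `spec.hostNetwork` field is a more direct signal than comparing addresses.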
Describe the bug
The aws-load-balancer-controller does not register targets for the Cilium Gateway API service when the nlb-target-type is ip.
I have the following services. Some of them come from the Cilium gateway via the Gateway API; the details service is just a normal service.
The aws-load-balancer-controller does not create the actual backend for the first one when the service carries this annotation:
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
The same annotation works fine with the details service. In the LB controller logs, you can see that the "registering targets" messages I usually see for other services are missing.
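One way to picture the symptom is the minimal sketch below. It assumes, as a hypothesis that this thread does not confirm, that in IP mode only endpoint addresses backed by a pod (a `targetRef` of kind `Pod`) become load balancer targets. The function `resolve_ip_targets`, the pod name, and the sample addresses are hypothetical and are not the controller's actual code or data from this cluster.

```python
def resolve_ip_targets(endpoint_addresses):
    """Return the IPs that would be registered as targets.

    Assumption (hypothetical model): an address is only usable
    when it references a pod through targetRef.kind == "Pod".
    """
    targets = []
    for address in endpoint_addresses:
        target_ref = address.get("targetRef")
        if not target_ref or target_ref.get("kind") != "Pod":
            # No backing pod: nothing to register for this address.
            continue
        targets.append(address["ip"])
    return targets


# Hypothetical endpoint of a normal service such as "details".
details_endpoints = [
    {"ip": "10.6.1.150", "targetRef": {"kind": "Pod", "name": "details-example"}},
]

# Hypothetical endpoint of a gateway service with no backing pod.
gateway_endpoints = [
    {"ip": "192.0.2.10"},
]

print(resolve_ip_targets(details_endpoints))
print(resolve_ip_targets(gateway_endpoints))
```

Under this model the normal service yields one target while the gateway service yields none, which would match the missing "registering targets" log lines, but the real cause still needs to be confirmed against the controller.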
here is the aws console screenshot
Steps to reproduce
Expected outcome: the AWS LB controller should register targets.
Environment
AWS Load Balancer controller version {"level":"info","ts":"2024-06-25T21:20:04Z","msg":"version","GitVersion":"v2.7.1","GitCommit":"f689bbdf73d30f23b44acfef2c3b8e7280cd66ee","BuildDate":"2024-02-09T16:21:17+0000"}
Kubernetes version 1.29.4-eks-036c24b
Using EKS (yes/no), if so version? 1.29.4-eks-036c24b
Additional Context: I'm from the Cilium team and will try my best to figure out whether this is caused by Cilium.