uswitch / kiam

Integrate AWS IAM with Kubernetes
Apache License 2.0

kiam iptables rules broken on AWS EKS... sometimes #508

Closed: adq closed this issue 3 years ago

adq commented 3 years ago

Hi, I've been busy getting KIAM working on a newly installed AWS EKS cluster. I haven't installed any custom CNI, so I assume I'm using the built-in Amazon CNI by default.

I'm using K8s 1.21 on EKS, and kiam 4.1 from your prebuilt image. I'm using the latest Amazon Linux AMI as deployed by eksctl/eks.

I've got the agent/server set up and added the correct annotations, but I hit a weird glitch: I wasn't getting the correct instance profile back from AWS, i.e. although the kiam infrastructure is working, the transparent credential replacement wasn't.

I've got a test node running the kiam-agent and also a pod I've annotated. I'm testing with a simple aws s3 ls.

I got a shell on the kiam-agent to see what was going on. I found the following iptables rules had been set up automatically by k8s/kiam:

Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination        
  196 14786 KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
    0     0 DNAT       tcp  --  !eth0   *       0.0.0.0/0            169.254.169.254      tcp dpt:80 to:10.5.41.23:8181

As you can see, kiam has appended its rule after the KUBE-SERVICES one. Also, it's never had any matches.

So I tested manually inserting it before the KUBE-SERVICES one (by running iptables -t nat -I PREROUTING 1 ... from the shell) and retested:

 pkts bytes target     prot opt in     out     source               destination        
    2   120 DNAT       tcp  --  !eth0   *       0.0.0.0/0            169.254.169.254      tcp dpt:80 to:10.5.41.23:8181
  203 15293 KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
    0     0 DNAT       tcp  --  !eth0   *       0.0.0.0/0            169.254.169.254      tcp dpt:80 to:10.5.41.23:8181

As you can see the rule is now catching packets, and in fact, the credentials insertion is working!
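To compare listings like the two above without eyeballing them, here is a minimal Python sketch that pulls out each rule's position and packet counter. It assumes the listing came from something like `iptables -t nat -L PREROUTING -v -n` (the exact command isn't shown above) and that the counters are plain integers; `parse_rules` and `metadata_rules` are hypothetical helper names, not part of kiam.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    position: int      # 1-based position within the chain
    pkts: int          # packet counter from the first column
    target: str
    in_iface: str
    destination: str


def parse_rules(listing: str) -> List[Rule]:
    """Parse rule lines from a verbose, numeric iptables chain listing."""
    rules: List[Rule] = []
    for line in listing.splitlines():
        fields = line.split()
        # Rule lines start with a numeric packet counter; the "Chain ..."
        # and column header lines do not. Abbreviated counters such as
        # "196K" are NOT handled here (assumption: small or exact counters).
        if len(fields) < 9 or not fields[0].isdigit():
            continue
        rules.append(
            Rule(len(rules) + 1, int(fields[0]), fields[2], fields[5], fields[8])
        )
    return rules


def metadata_rules(listing: str) -> List[Rule]:
    """Return the DNAT rules aimed at the EC2 metadata address."""
    return [
        r for r in parse_rules(listing)
        if r.target == "DNAT" and r.destination == "169.254.169.254"
    ]


# The second listing from the post (after the manual insert).
LISTING = """\
 pkts bytes target     prot opt in     out     source               destination
    2   120 DNAT       tcp  --  !eth0   *       0.0.0.0/0            169.254.169.254      tcp dpt:80 to:10.5.41.23:8181
  203 15293 KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
    0     0 DNAT       tcp  --  !eth0   *       0.0.0.0/0            169.254.169.254      tcp dpt:80 to:10.5.41.23:8181
"""

for rule in metadata_rules(LISTING):
    print(f"rule {rule.position}: in={rule.in_iface} pkts={rule.pkts}")
```

For this listing it reports the DNAT rule at position 1 with 2 packets and the one at position 3 with 0, which matches the counters shown above.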

Later on, though, I deleted and recreated the agent daemonset, and this time the rules were created the correct way round (i.e. the kiam rule was first). More testing shows that if I delete and recreate the daemonset, sometimes the rule is first (working) and sometimes it is second (not working).

This sounds like a timing issue in the daemonset setup... is there a reason you're not always inserting the rule as rule 1?

I'm running the following daemonsets: kiam-agent, aws-node, kube-proxy

I'm assuming aws-node + kiam-agent are randomly switching which order they start up in, causing this issue.

adq commented 3 years ago

Hi, further investigation shows I'd screwed up the "!eth0" thing originally. Now it works whichever order the rules are in. Weird. Sorry for the noise!
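For anyone else reading the `in` column in the listings above, here is a rough Python sketch of how an interface pattern such as `!eth0` is evaluated. It assumes standard iptables semantics: a leading `!` negates the match, a trailing `+` matches any interface name with that prefix, and `*` in a listing means any interface. `iface_matches` and the interface name `eni1a2b3c` are hypothetical and only for illustration.

```python
def iface_matches(spec: str, iface: str) -> bool:
    """Emulate matching an interface name against a pattern from the `in` column."""
    negate = spec.startswith("!")
    pattern = spec[1:] if negate else spec
    if pattern == "*":
        # Listings show "*" when the rule applies to any interface.
        matched = True
    elif pattern.endswith("+"):
        # Trailing "+" is a prefix wildcard, e.g. "eni+" matches "eni1a2b3c".
        matched = iface.startswith(pattern[:-1])
    else:
        matched = iface == pattern
    # A leading "!" inverts the result.
    return matched != negate


# "!eth0" applies to traffic arriving on anything except eth0.
for name in ("eth0", "eni1a2b3c"):
    print(name, iface_matches("!eth0", name))
```

So a rule listed with `!eth0` only ever sees packets that arrive on an interface other than eth0, regardless of where the rule sits in the chain.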