jtblin / kube2iam

kube2iam provides different AWS IAM roles for pods running on Kubernetes
BSD 3-Clause "New" or "Revised" License

Problem with AWS CNI plugin and podIP cache #244

Closed ltagliamonte-dd closed 4 years ago

ltagliamonte-dd commented 4 years ago

I'm running Kubernetes with the AWS CNI plugin. The AWS CNI keeps a pool of IPs per EC2 instance and assigns an IP to a container when it gets created. I'm running jobs that create a high churn of containers, and some of those are failing to get credentials, with kube2iam returning:

2 pods ([acquire-xlxlb dockerbuild-dvq8b]) with the ip 10.6.51.24 indexed

I tracked the error down to this function: https://github.com/jtblin/kube2iam/blob/42bea9880c50e88fc9fc544320c64573f66086c8/k8s/k8s.go#L89 It uses a NewIndexerInformer and logs each discovered pod; the indexing key is the pod name.

What if we use a NewInformer and explicitly manage the cache additions/deletions? If we do so, a pod that has hostNetwork can simply be left out of the cache.
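To make the proposal concrete, here is a minimal sketch of an explicitly managed IP-to-pod cache. It is not kube2iam's code and does not use client-go: `Pod`, `PodIPCache`, the `ci` namespace, and the `node-agent` pod are hypothetical names for illustration, and `Add`/`Delete` stand in for the handlers an informer would call. It assumes that the most recent pod seen for an IP is the current owner, which only holds if add events arrive in creation order.

```go
package main

import (
	"fmt"
	"sync"
)

// Pod is a hypothetical, trimmed-down stand-in for the Kubernetes Pod object.
type Pod struct {
	Namespace, Name, IP string
	HostNetwork         bool
}

func (p Pod) key() string { return p.Namespace + "/" + p.Name }

// PodIPCache maps a pod IP to the one pod assumed to own it right now.
type PodIPCache struct {
	mu   sync.RWMutex
	byIP map[string]Pod
}

func NewPodIPCache() *PodIPCache {
	return &PodIPCache{byIP: make(map[string]Pod)}
}

// Add would be called from the informer's add/update handlers.
// hostNetwork pods share the node IP, so they are never cached.
func (c *PodIPCache) Add(p Pod) {
	if p.HostNetwork || p.IP == "" {
		return
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byIP[p.IP] = p // assumption: the newest pod replaces any stale owner
}

// Delete would be called from the informer's delete handler.
// It removes the entry only if it still belongs to the deleted pod,
// so a late delete cannot evict a newer pod that reused the IP.
func (c *PodIPCache) Delete(p Pod) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cur, ok := c.byIP[p.IP]; ok && cur.key() == p.key() {
		delete(c.byIP, p.IP)
	}
}

// Lookup returns the pod currently cached for the IP, if any.
func (c *PodIPCache) Lookup(ip string) (Pod, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	p, ok := c.byIP[ip]
	return p, ok
}

func main() {
	c := NewPodIPCache()
	c.Add(Pod{Namespace: "ci", Name: "acquire-xlxlb", IP: "10.6.51.24"})
	c.Add(Pod{Namespace: "ci", Name: "dockerbuild-dvq8b", IP: "10.6.51.24"}) // IP reused
	c.Delete(Pod{Namespace: "ci", Name: "acquire-xlxlb", IP: "10.6.51.24"}) // late delete
	c.Add(Pod{Namespace: "kube-system", Name: "node-agent", IP: "10.6.51.1", HostNetwork: true})

	p, ok := c.Lookup("10.6.51.24")
	fmt.Println(p.Name, ok) // dockerbuild-dvq8b true
	_, ok = c.Lookup("10.6.51.1")
	fmt.Println(ok) // false
}
```

Keying the map by IP means a lookup can never return two pods, at the cost of trusting the order in which add events arrive.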

Is there currently a workaround for this?

ltagliamonte-dd commented 4 years ago

I'm still recording some failures after refreshing the cache every 5m.

More context: the AWS CNI plugin has a pool of ENIs/IPs per EC2 instance and assigns one to a pod when k8s creates it. Once the pod is terminated, the IP is reused after a cool-down period of 60s (which is not configurable).

I have short-lived containers, and the cache used by kube2iam can end up containing 2 pods with the same IP. I could be wrong about this, but I remember reading that the order of k8s pod events is not guaranteed, nor is it guaranteed that every event is actually sent by the k8s API. If that is the case, the cache can end up in a bad state.
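The failure mode described above can be reproduced with a small simulation. This is a sketch, not kube2iam's implementation: `Pod`, `IPIndex`, and `Resolve` are hypothetical names, and the index simply keeps every pod it has seen for an IP until a delete event removes it. If the delete event for the first pod is missed or arrives late, a lookup by IP becomes ambiguous.

```go
package main

import "fmt"

// Pod is a hypothetical, minimal stand-in for a Kubernetes pod.
type Pod struct {
	Name, IP string
}

// IPIndex is a naive index: every pod seen for an IP stays listed
// until a delete event for that pod is processed.
type IPIndex map[string][]Pod

// Add records a pod under its IP.
func (idx IPIndex) Add(p Pod) { idx[p.IP] = append(idx[p.IP], p) }

// Delete removes the named pod from the list kept for its IP.
func (idx IPIndex) Delete(p Pod) {
	kept := idx[p.IP][:0]
	for _, q := range idx[p.IP] {
		if q.Name != p.Name {
			kept = append(kept, q)
		}
	}
	if len(kept) == 0 {
		delete(idx, p.IP)
		return
	}
	idx[p.IP] = kept
}

// Resolve returns the pod for an IP and fails when the index is ambiguous.
func (idx IPIndex) Resolve(ip string) (Pod, error) {
	pods := idx[ip]
	switch len(pods) {
	case 0:
		return Pod{}, fmt.Errorf("no pod with the ip %s indexed", ip)
	case 1:
		return pods[0], nil
	default:
		names := make([]string, 0, len(pods))
		for _, p := range pods {
			names = append(names, p.Name)
		}
		return Pod{}, fmt.Errorf("%d pods (%v) with the ip %s indexed", len(pods), names, ip)
	}
}

func main() {
	idx := IPIndex{}
	idx.Add(Pod{Name: "acquire-xlxlb", IP: "10.6.51.24"})
	// The delete event for acquire-xlxlb has not been processed yet,
	// but the CNI has already handed its IP to a new pod.
	idx.Add(Pod{Name: "dockerbuild-dvq8b", IP: "10.6.51.24"})

	_, err := idx.Resolve("10.6.51.24")
	fmt.Println(err)

	// Once the late delete is processed, the lookup works again.
	idx.Delete(Pod{Name: "acquire-xlxlb", IP: "10.6.51.24"})
	p, err := idx.Resolve("10.6.51.24")
	fmt.Println(p.Name, err)
}
```

Any credential request that arrives between the second add and the late delete fails, which would match the intermittent failures seen under high pod churn.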

I understand why you implemented the check here: https://github.com/jtblin/kube2iam/blob/master/k8s/k8s.go#L102 but maybe in this case it would be possible to query the API server directly, retrieve the info for all the pods, and make the decision based on fresh data instead of the cache. What do you think? Any other ideas? @Jacobious52 @jtblin