lyft / cni-ipvlan-vpc-k8s

AWS VPC Kubernetes CNI driver using IPvlan
Apache License 2.0

IP batching #65

Closed jonathanburns closed 4 years ago

jonathanburns commented 5 years ago

This PR adds an "ipBatchSize" config option to the IPAM plugin.

When creating an interface or requesting additional IPs, the plugin will request the smaller of:

By convention, ipBatchSize == 0 indicates that the user wants to allocate the max limit every time.
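A minimal sketch of the selection rule described above, not the plugin's actual code. The two operands of "the smaller of" are not shown in this description, so the sketch assumes they are the configured ipBatchSize and some maximum number of IPs that may be requested. The names requestCount and limit are hypothetical, and treating negative values like 0 is also an assumption.

```go
package main

import "fmt"

// requestCount returns how many IPs to ask for in a single request.
// limit is a hypothetical name for the maximum number of IPs that may
// be requested (an assumption; the PR text does not name it).
// ipBatchSize == 0 means "request the max limit every time".
// Negative values are treated the same way, which is an assumption.
func requestCount(ipBatchSize, limit int) int {
	if ipBatchSize <= 0 || ipBatchSize > limit {
		return limit
	}
	return ipBatchSize
}

func main() {
	fmt.Println(requestCount(0, 10))  // batch size 0: falls back to the limit
	fmt.Println(requestCount(4, 10))  // batch size below the limit: used as-is
	fmt.Println(requestCount(16, 10)) // batch size above the limit: capped
}
```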

dbyron0 commented 5 years ago

We've been trying to track down some gnarly networking struggles that feel like they're in this neighborhood, and that this could fix. Any chance you can talk about the motivation for this change?

Thanks so much for your help.

jonathanburns commented 5 years ago

@dbyron0 sure thing :)

The issue is that the rate limit for ENI requests is very low. If you're trying to spin up a large number of pods simultaneously (common for things like "spark-on-k8s"), you quickly run up against it.
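To illustrate why batching helps here, the following sketch counts how many requests are needed to obtain one IP per pod. It assumes each request returns up to one full batch of IPs and that nothing else is consuming the rate limit. The function name requestsNeeded and the figures in main are hypothetical, not measurements.

```go
package main

import "fmt"

// requestsNeeded returns the number of requests required to obtain
// one IP for each of pods pods when every request yields up to
// batch IPs. A non-positive batch is treated as 1 (an assumption
// made only to keep the arithmetic well defined).
func requestsNeeded(pods, batch int) int {
	if pods <= 0 {
		return 0
	}
	if batch <= 0 {
		batch = 1
	}
	// Integer ceiling of pods / batch.
	return (pods + batch - 1) / batch
}

func main() {
	for _, batch := range []int{1, 5, 10} {
		fmt.Printf("100 pods, batch %d: %d requests\n", batch, requestsNeeded(100, batch))
	}
}
```

Under these assumptions, going from one IP per request to ten per request cuts the number of rate-limited calls by a factor of ten.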

dbyron0 commented 5 years ago

Well, wouldn't it be exciting if this was it. Sorry if this is a noob question, but if we are hitting the rate limit, is there any evidence somewhere -- a log message or a metric or something?

Thanks again.

jonathanburns commented 5 years ago

@dbyron0 🤔

I haven't seen the error myself. Looking at the code, I'm guessing this might actually manifest as the plugin adding a bunch of ENIs until you cannot add any more for your instance type.

It looks like when you fail to allocate an IP on an ENI, the error gets swallowed here:

https://github.com/lyft/cni-ipvlan-vpc-k8s/blob/master/plugin/ipam/main.go#L110-L111

The code would then attempt to add another ENI (until your instance can't add any more ENIs).

Then it would fail here, with a message showing that you cannot add any more ENIs:

https://github.com/lyft/cni-ipvlan-vpc-k8s/blob/master/plugin/ipam/main.go#L117

We should add more verbose messaging around that. I'll add a ticket to track it.