Closed jonathanburns closed 4 years ago
We've been trying to track down some gnarly networking struggles that feel like they're in this neighborhood, and that this change might fix. Any chance you can talk about the motivation for this change?
Thanks so much for your help.
@dbyron0 sure thing :)
The issue is that the rate-limit for ENI requests is very low. If you're trying to spin up a large number of pods simultaneously (common for things like "spark-on-k8s"), you quickly run up against the rate limits.
Well, wouldn't it be exciting if this was it. Sorry if this is a noob question, but if we are hitting the rate limit, is there any evidence somewhere -- a log message or a metric or something?
Thanks again.
@dbyron0 🤔
I haven't seen the error myself. Looking at the code, I'm guessing this might actually manifest as the plugin adding ENIs one after another until your instance type can't attach any more.
Seems like when you fail to allocate an IP on an ENI, the error would get swallowed here.
https://github.com/lyft/cni-ipvlan-vpc-k8s/blob/master/plugin/ipam/main.go#L110-L111
The code would then attempt to add another ENI (until your instance can't add any more ENIs).
Then it would fail here: https://github.com/lyft/cni-ipvlan-vpc-k8s/blob/master/plugin/ipam/main.go#L117
With a message showing that you cannot add any more ENIs.
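A minimal Go sketch of that control flow (the names `tryAllocate`, `addUntilExhausted`, and `errRateLimited` are hypothetical stand-ins for illustration, not the plugin's actual functions): each failed IP allocation is silently dropped and another ENI is added, so the only error that ever surfaces is the final "unable to add a new ENI" one.

```go
package main

import (
	"errors"
	"fmt"
)

// errRateLimited stands in for AWS throttling the ENI/IP API calls.
var errRateLimited = errors.New("RequestLimitExceeded")

// tryAllocate simulates allocating an IP on the given ENI; here it
// always fails, mimicking a sustained rate limit.
func tryAllocate(eni int) error { return errRateLimited }

// addUntilExhausted mimics the control flow described above: a failed
// allocation is swallowed and another ENI is added, until the
// instance's ENI limit is reached; only then does an error surface.
func addUntilExhausted(eniLimit int) (enis int, err error) {
	for enis = 1; ; enis++ {
		if allocErr := tryAllocate(enis); allocErr != nil {
			if enis >= eniLimit {
				// The only error the user ever sees.
				return enis, fmt.Errorf("unable to add a new ENI: %w", allocErr)
			}
			continue // allocation error silently dropped; add another ENI
		}
		return enis, nil
	}
}

func main() {
	enis, err := addUntilExhausted(3)
	fmt.Printf("ENIs added: %d, final error: %v\n", enis, err)
}
```

Logging the swallowed error at the `continue` point would make the underlying rate limit visible long before the ENI limit is hit.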
We should add more verbose messaging around that. I'll add a ticket to track it.
This PR adds an `ipBatchSize` config option to the IPAM plugin.
When creating an interface or requesting additional IPs, the plugin will request the smaller of the configured `ipBatchSize` and the number of free IP slots remaining on the interface.
By convention, `ipBatchSize == 0` indicates that the user wants to allocate the max limit every time.
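A sketch of that min logic (the helper name `requestCount` and its signature are mine for illustration, not the plugin's actual API):

```go
package main

import "fmt"

// requestCount returns how many IPs to ask for in one API call: the
// smaller of the configured ipBatchSize and the free slots left on
// the interface. A batch size of 0 means "fill to the max limit".
func requestCount(ipBatchSize, freeSlots int) int {
	if ipBatchSize == 0 || ipBatchSize > freeSlots {
		return freeSlots
	}
	return ipBatchSize
}

func main() {
	fmt.Println(requestCount(4, 10)) // configured batch fits: 4
	fmt.Println(requestCount(0, 10)) // 0 = allocate the max: 10
	fmt.Println(requestCount(8, 3))  // capped by free slots: 3
}
```

Keeping the batch small spreads allocation across more API calls, but each call stays cheap; setting it to 0 preserves the old fill-to-the-limit behavior.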