lyft / cni-ipvlan-vpc-k8s

AWS VPC Kubernetes CNI driver using IPvlan
Apache License 2.0

move to chained CNI execution #47

Status: Closed (paulnivin closed 6 years ago)

paulnivin commented 6 years ago
PaulFurtado commented 6 years ago

I'm excited to see ReuseIPWait implemented! I'm working on switching over to this from a similar AWS CNI plugin we wrote internally before Lyft and AWS released these plugins. Ours was implemented using routing, and we're itching for the IPvlan performance improvements. I had identified IP reuse as an issue and was about to implement a similar change, so I'm happy I don't have to now, thanks!

For background: early on with our CNI plugin, we noticed that rapid IP reuse was a big issue. We run hundreds of replicated clusters in Kubernetes (MySQL, Redis, etc.), so when we'd roll our nodes, we'd hit issues where a MySQL or Redis process that was replicating from some IP would suddenly be trying to replicate from a pod belonging to a different MySQL/Redis cluster, before its sidecar had time to tell it about its new master.

For this situation, 60s is generally good enough, but there is still a chance that it is too short, for example if an operator is in the middle of restarting on another node. We could raise that setting to something like 5 minutes, but then we'd risk running out of IPs on smaller nodes. The way we handled this on our end was to treat the available IP pool as a FIFO queue, so we'd optimistically wait as long as possible before reusing any IP. How would you feel about implementing that in this PR? It looks like it would be as easy as sorting registryFreeIPs from oldest to newest and then inverting the loop to iterate through registryFreeIPs instead of free.

If you'd prefer not to tackle it in this PR, I can follow up separately if you're all on board with the idea. Thanks!

lbernail commented 6 years ago

I just saw that you mentioned us for shared subnets, thanks. We've been using it for a few months on some mid-size clusters (100+ nodes) and it has been working great so far.

In addition, you could add to the doc that the containerd runtime also works (we haven't tested it at scale yet, but we will very soon).

paulnivin commented 6 years ago

@PaulFurtado I'm fine with moving to an approach where we optimistically wait as long as possible before reusing IPs. Let's tackle that as part of a separate PR tho.