lyft / cni-ipvlan-vpc-k8s

AWS VPC Kubernetes CNI driver using IPvlan
Apache License 2.0

Additional interfaces could be in the same subnet #32

Closed by lbernail 6 years ago

lbernail commented 6 years ago

The IPAM plugin currently requires that all ENIs attached to the instance are in different subnets, but the plugin works if interfaces are in the same subnet (in IPvlan L2 mode there is no route lookup on the host; packets just exit via the master interface).
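For reference, here's roughly how an IPvlan slave gets created in L2 mode, assuming the github.com/vishvananda/netlink library (this is a sketch; the function and interface names are placeholders):

```go
// Sketch: creating an IPvlan sub-interface in L2 mode with
// github.com/vishvananda/netlink. Names are placeholders.
package main

import "github.com/vishvananda/netlink"

func createIPvlanL2(masterName, slaveName string) error {
	master, err := netlink.LinkByName(masterName) // e.g. the ENI "eth1"
	if err != nil {
		return err
	}
	ipvlan := &netlink.IPVlan{
		LinkAttrs: netlink.LinkAttrs{
			Name:        slaveName, // e.g. "ipvl0", later moved into the pod netns
			ParentIndex: master.Attrs().Index,
		},
		// In L2 mode the slave shares the master's L2 segment: egress
		// packets go straight out the master with no host route lookup.
		Mode: netlink.IPVLAN_MODE_L2,
	}
	return netlink.LinkAdd(ipvlan)
}
```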

We currently run a slightly modified version of the plugin that removes this requirement and it works fine.

I'm happy to propose a PR to allow this behavior. Here is a first idea:

To make this work in L3 or L3S mode, we will need to create additional IP route tables (one per interface) that force traffic through the ipvlan master interface, and create IP rules for each pod so it uses the appropriate route table. (I haven't tested L3 mode with the plugin yet, but I'm not sure it can work without IP rules even with different subnets.)

Creating/deleting route tables for L3/L3S modes should probably be done in configureInterface, with a new unConfigureInterface function called from RemoveInterface.

Creating host IP rules could be done in new addIPRule and delIPRule functions called by cmdAdd and cmdDel; a rough sketch follows.
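To make this concrete, here's roughly what I have in mind, again assuming the github.com/vishvananda/netlink library; the table-numbering scheme and helper signatures are placeholders, not a final design:

```go
// Sketch of per-interface route tables and per-pod rules for L3/L3S
// modes, using github.com/vishvananda/netlink. The table numbering
// (e.g. 100 + ENI device number) and signatures are placeholders.
package main

import (
	"net"

	"github.com/vishvananda/netlink"
)

// configureInterface would also install a default route in a dedicated
// table so traffic is forced out through the ipvlan master interface.
func addInterfaceRouteTable(master netlink.Link, gw net.IP, tableID int) error {
	return netlink.RouteAdd(&netlink.Route{
		LinkIndex: master.Attrs().Index,
		Gw:        gw,      // the subnet's VPC router
		Table:     tableID, // e.g. 100 + ENI device number
	})
}

// addIPRule, called from cmdAdd: route the pod's source IP via the
// table that belongs to its master interface.
func addIPRule(podIP net.IP, tableID int) error {
	rule := netlink.NewRule()
	rule.Src = &net.IPNet{IP: podIP, Mask: net.CIDRMask(32, 32)}
	rule.Table = tableID
	return netlink.RuleAdd(rule)
}

// delIPRule, called from cmdDel: remove the rule on pod teardown.
func delIPRule(podIP net.IP, tableID int) error {
	rule := netlink.NewRule()
	rule.Src = &net.IPNet{IP: podIP, Mask: net.CIDRMask(32, 32)}
	rule.Table = tableID
	return netlink.RuleDel(rule)
}
```

Using one table per ENI keeps the per-pod state down to a single ip rule, which should make cleanup in cmdDel straightforward.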

I'm also happy to integrate this in a PR.

paulnivin commented 6 years ago

I need to work through all the edge cases, but this sounds intriguing -- I'm particularly concerned about traffic that bounces off kube-proxy in the default namespace and either ends up going to another pod on the same EC2 instance or exits back out of the originating pod to hit another EC2 instance.

paulnivin commented 6 years ago

How about we move to having all additional ENIs be created on the same subnet as the boot ENI and rip out all existing checks and support for multiple subnets? @theatrus and I are fine with this approach.

We can run the changes through conformance tests, but we're not expecting issues. Note that for deployments using large subnets, ARP table limits will need to be raised (see the sketch below). We currently operate our legacy stack with large subnets and it's fine in AWS, particularly given the lack of broadcast traffic.
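For reference, these are the kernel neighbor-table sysctls that typically need raising on large subnets; the values below are illustrative, not a recommendation:

```
# /etc/sysctl.d/99-arp.conf (illustrative values; tune for subnet size)
# Raise the neighbor (ARP) table limits so entries for a large subnet
# aren't garbage-collected or rejected.
net.ipv4.neigh.default.gc_thresh1 = 16384
net.ipv4.neigh.default.gc_thresh2 = 28672
net.ipv4.neigh.default.gc_thresh3 = 32768
```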

Regarding L3/L3S support on AWS, what's the use case?

lbernail commented 6 years ago

I like the idea; it would be simpler. The only limitation is that the primary interface and the pod interfaces will be in the same subnet (not a big deal, but depending on architecture choices it could be a constraint: we would not be able to use a secondary CIDR block for pods, for instance).

We have no use case for L3/L3S right now, but since there is a "mode" option in the plugin I wanted to make sure not to break anything.

In any case, I can help and work on whichever design we choose.

paulnivin commented 6 years ago

I'm also fine if there's an option for pods to be on a different subnet from the control plane/boot subnet. tl;dr: let's simplify the architecture, but if you all have network design constraints, we're happy to accommodate.

lbernail commented 6 years ago

After thinking about this and discussing it internally, we also think that using the boot ENI and removing support for multiple subnets is a good solution (a nice simplification).

We may need other subnets in the future but I think going with the simplest solution is the best for now

I can PR this quickly if you want

paulnivin commented 6 years ago

If you have time to PR, that would be great.

We've also been testing the stack with Calico in policy-only mode to add NetworkPolicy support to cni-ipvlan-vpc-k8s. So far so good in testing, but out of an abundance of caution: the current Calico documentation (https://docs.projectcalico.org/v3.1/getting-started/kubernetes/requirements#ip-pool-configuration) mentions that instance IPs must be distinct from pod IPs -- I believe this is just a documentation error and doesn't apply to policy-only mode. cc @bpownow

bpownow commented 6 years ago

One of the engineers at Tigera said it's fine as long as a pod is never given an address that a node already uses (per Kubernetes requirements).

lbernail commented 6 years ago

@paulnivin Sure, I'll work on it next week and let you know how it goes.