I need to work through all the edge cases, but this sounds intriguing -- I'm particularly concerned about traffic that bounces off of kube-proxy in the default namespace and either ends up going to another pod on the same EC2 instance or exits back out the originating pod to hit another EC2 instance.
How about we move to having all additional ENIs be created on the same subnet as the boot ENI and rip out all existing checks and support for multiple subnets? @theatrus and I are fine with this approach.
We can run the changes through conformance tests, but we're not expecting issues. Note that for deploys using large subnets, ARP table limits will need to be raised. We currently operate our legacy stack with large subnets and it's fine in AWS, particularly given the lack of broadcast traffic.
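(For reference, the neighbor/ARP table limits on Linux nodes are governed by the net.ipv4.neigh sysctls. The values below are only an illustrative starting point for a large subnet, not numbers taken from this thread.)

```
# Example only: raise ARP/neighbor table limits for large subnets.
# Tune these to your actual subnet size.
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 16384
net.ipv4.neigh.default.gc_thresh3 = 32768
```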
Regarding l3/l3s support on AWS, what's the use case?
I like the idea, it would be simpler. The only limitation is that the primary interface and the pod interfaces will be in the same subnet (not a big deal, but depending on architecture choices it could be a constraint: we would not be able to use a secondary CIDR block for pods, for instance).
We have no use case for l3/l3s right now, but since there is a "mode" option in the plugin I wanted to make sure not to break anything.
In any case, I can help and work on whichever design we choose.
I'm also fine if there's an option for pods to be on a different subnet from the control plane/boot subnet. tl;dr: let's simplify the architecture, but if you all have network design constraints, we're happy to accommodate.
After thinking about this and discussing it internally, we also think that using the boot ENI's subnet for all additional ENIs and removing support for multiple subnets is a good solution (a nice simplification).
We may need other subnets in the future but I think going with the simplest solution is the best for now
I can PR this quickly if you want
If you have time to PR, that would be great.
We've also been testing the stack with Calico in policy-only mode to add Network Policy support to cni-ipvlan-vpc-k8s. So far so good in testing, but out of an abundance of caution: the current Calico documentation https://docs.projectcalico.org/v3.1/getting-started/kubernetes/requirements#ip-pool-configuration mentions that instance IPs must be distinct from pod IPs -- I believe this is just a documentation error and doesn't apply to policy-only mode. cc @bpownow
One of the engineers at Tigera said it's fine as long as a pod is never given an address that a node already uses (per Kubernetes requirements).
@paulnivin Sure I'll work on it next week and let you know how it goes
The IPAM plugin currently requires that all ENIs attached to the instance are in different subnets, but the plugin will work if interfaces are in the same subnet (in IPVLAN L2 mode there is no route lookup on the host; packets just exit from the master interface).
We currently run a slightly modified version of the plugin that removes this requirement and it works fine.
I'm happy to propose a PR to allow this behavior. Here is a first idea:
- Remove the requirement that attached ENIs be in different subnets in `NewInterface` (`aws/interface.go`)
- To make it work in L3 or L3S mode we will need to create additional IP route tables (one per interface) forcing traffic through the IPVLAN master interface, and create IP rules for each pod to use the appropriate route table (I haven't tested L3 mode with the plugin yet, but I'm not sure it can work without IP rules even with different subnets); see the sketch below
- Creating / deleting route tables for L3/L3S modes should probably be done in `configureInterface` and a new `unConfigureInterface` function called in `RemoveInterface`
- Creating host IP rules could be done in new functions `addIPRule` and `delIPRule` called by `cmdAdd` and `cmdDel`

Also happy to integrate this in a PR
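For illustration, here is a minimal Go sketch of what the per-interface route tables and per-pod IP rules could look like using a netlink library such as vishvananda/netlink. The function names, table IDs, and the idea of calling them from `cmdAdd`/`cmdDel` are my assumptions for this sketch, not existing plugin code:

```go
// Hypothetical sketch only: function names, table IDs, and error handling are
// illustrative and do not reflect the plugin's current code.
package main

import (
	"fmt"
	"net"

	"github.com/vishvananda/netlink"
)

// addInterfaceRouteTable installs a default route through the IPVLAN master
// interface into a dedicated route table (one table per ENI).
func addInterfaceRouteTable(masterName string, gateway net.IP, tableID int) error {
	master, err := netlink.LinkByName(masterName)
	if err != nil {
		return fmt.Errorf("lookup %s: %v", masterName, err)
	}
	route := &netlink.Route{
		LinkIndex: master.Attrs().Index,
		Gw:        gateway,
		Table:     tableID,
	}
	// RouteReplace keeps the per-interface table setup idempotent across calls.
	return netlink.RouteReplace(route)
}

// addIPRule (called from cmdAdd in this sketch) points traffic sourced from
// the pod IP at the per-interface route table.
func addIPRule(podIP net.IP, tableID int) error {
	rule := netlink.NewRule()
	rule.Src = &net.IPNet{IP: podIP, Mask: net.CIDRMask(32, 32)}
	rule.Table = tableID
	return netlink.RuleAdd(rule)
}

// delIPRule (called from cmdDel in this sketch) removes the matching rule.
func delIPRule(podIP net.IP, tableID int) error {
	rule := netlink.NewRule()
	rule.Src = &net.IPNet{IP: podIP, Mask: net.CIDRMask(32, 32)}
	rule.Table = tableID
	return netlink.RuleDel(rule)
}

func main() {
	// Example values only: pod 10.0.1.25 behind a second ENI (eth1),
	// using route table 101 with the subnet gateway at 10.0.1.1.
	if err := addInterfaceRouteTable("eth1", net.ParseIP("10.0.1.1"), 101); err != nil {
		fmt.Println("route table:", err)
	}
	if err := addIPRule(net.ParseIP("10.0.1.25"), 101); err != nil {
		fmt.Println("ip rule:", err)
	}
}
```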