sathuish opened this issue 3 years ago
Yep, as you discovered, Calico can only detect the MTU based on the local node's configuration. This was by design, and of course has some limitations. However, as you said, manual MTU configuration does exist for such situations.
Path MTU discovery might solve this, but it is an undertaking we've so far tried to avoid due to the extra complexity it involves. For now I'm leaving this open as an enhancement, but I suggest continuing to use manually configured MTU values.
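For context on what path MTU discovery involves: the kernel's mechanism depends on sending with the DF bit set and on ICMP "Fragmentation Needed" replies making it back to the sender. Below is a minimal, hypothetical sketch, not Calico code; the destination host and port are placeholders, and the numeric socket option values are the usual Linux ones. It only queries whatever path-MTU estimate the kernel has already cached for a destination:

```python
# Rough sketch (not Calico code): ask the kernel for its cached path-MTU
# estimate toward a destination by sending DF-marked UDP datagrams and
# reading back the route's MTU when the kernel reports EMSGSIZE.
# It only reflects ICMP "Fragmentation Needed" feedback the kernel has
# already received; if routers drop that ICMP, the estimate stays wrong.
import socket

IP_MTU_DISCOVER = 10   # Linux setsockopt option: per-socket PMTU behaviour
IP_PMTUDISC_DO = 2     # always set DF; never fragment locally
IP_MTU = 14            # Linux getsockopt option: cached path MTU for the route

def kernel_path_mtu_estimate(host, port=9, start=9000):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    s.connect((host, port))                 # connect so the kernel tracks one route
    size = start
    while size > 68:                        # 68 bytes is the IPv4 minimum MTU
        try:
            s.send(b"\x00" * (size - 28))   # 20 B IPv4 + 8 B UDP headers
            break                           # accepted at this size
        except OSError:                     # EMSGSIZE: bigger than cached path MTU
            cached = s.getsockopt(socket.IPPROTO_IP, IP_MTU)
            size = cached if cached < size else size - 1
    s.close()
    return size
```

The catch, and the reason this issue exists, is that when intermediate routers or tunnels never deliver that ICMP feedback, the cached estimate never shrinks and oversized packets are silently black-holed.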
> We are trying to install the application across the regions
I'd also strongly recommend against running a single cluster across multiple regions, and instead use availability zones. A single Kubernetes cluster / Calico cluster across multiple regions is bound to cause you some pain, due to added latency and instability caused by running the control plane across the public internet.
If you need redundancy, I'd recommend a separate cluster-per-region, with nodes spread across AZs within the region.
@caseydavenport I wanted to follow up on this existing issue and raise the awareness about problems that happen when advertising services via BGP/ECMP.
In our environment we stumbled upon this and had to implement a fix. Cluster nodes are contained within a single region. Communication within a region always uses a jumbo MTU (inter-node and customer-to-externalIP). Communication between regions goes through upstream routers, which have different connectivity options (primarily MPLS, but also a backup VPN).
Even having MPLS in the path has caused issues due to the 4 bytes it requires for its header. The backup VPN may go through the internet, where the MTU can be as low as 1300 bytes.
The problem is described in https://blog.cloudflare.com/path-mtu-discovery-in-practice/ and Cloudflare's implementation is available at https://github.com/cloudflare/pmtud
In our implementation (the readme may not be up to date) we:
- push ICMP 3/4 frag-needed packets into a specific nflog group
- take the payload of the frag-needed packet and re-send it to all nodes within the same cluster (for now, over a separate L2 connection); a rough sketch of this handling follows below.
I am wondering whether that's something you would still consider to be in scope for Calico?
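To make that concrete, here is a rough, self-contained sketch of the frag-needed handling. It is only an illustration: it uses a plain raw ICMP socket (requires root) instead of the nflog group mentioned above, and the fan-out function that would re-send the payload to the other cluster nodes is a placeholder:

```python
# Illustrative sketch only: listen for ICMP "Fragmentation Needed"
# (type 3, code 4), extract the next-hop MTU and the embedded original
# header, and hand them to a fan-out callback that a real implementation
# would use to relay the packet to every node in the cluster.
import socket
import struct

def handle_frag_needed(fanout):
    # Raw ICMP socket (needs CAP_NET_RAW); on Linux each read returns the
    # full IPv4 packet, IP header included.
    s = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
    while True:
        pkt, src = s.recvfrom(65535)
        ihl = (pkt[0] & 0x0F) * 4                # outer IPv4 header length
        icmp = pkt[ihl:]
        icmp_type, icmp_code = icmp[0], icmp[1]
        if icmp_type == 3 and icmp_code == 4:    # Destination Unreachable / Frag Needed
            next_hop_mtu = struct.unpack("!H", icmp[6:8])[0]   # RFC 1191 field
            embedded = icmp[8:]                  # original IP header + first payload bytes
            fanout(src[0], next_hop_mtu, embedded)

if __name__ == "__main__":
    handle_frag_needed(
        lambda router, mtu, orig: print(f"frag-needed from {router}: next-hop MTU {mtu}"))
```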
@matthewdupre might be the right one to comment on this.
My first inclination is that this would be best handled as a separate solution, with Calico exposing the necessary surfaces to enable implementing PMTU without actually writing the code into Calico itself. However, I am happy to be convinced otherwise - I am not an expert on PMTU.
This is a very interesting discussion, as we are facing the same issue when using Calico in eBPF mode and advertising our service IP. We hit the problem of the MTU changing between networks before traffic reaches our service IP, and since we can't respond to ICMP, packets get dropped.
> since we can't respond to ICMP, packets get dropped
@ehsan310 why can't you respond to ICMP?
Any update on this one?
I have upgraded the cluster but did not get a chance to change the MTU back to the default 1500 to see if the issue is fixed. We have a production workload, so I have to see what I can do to test it. @tomastigera
We have an AWS multi-region setup. We are trying to install the application across the regions, and the deployment is failing due to MTU auto-detection being enabled in the CNI.
Expected Behavior
Data transfer/receiving between pods should work properly across regions.
Current Behavior
We are using Calico version 3.18. We have set the MTU value to 0 so that Calico auto-detects the MTU. With an AWS on-prem multi-region deployment, we face issues with pod-to-pod communication. When we reduced the MTU value to 1350, communication works properly without any issues.
Possible Solution
Add Path MTU discovery to Calico
Steps to Reproduce (for bugs)
1. Enable MTU auto-detection in Calico (MTU set to 0).
2. Deploy the cluster across AWS regions (multi-region).
3. Transfer/receive larger packets between pods (a rough probe sketch follows below).
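Since the broken case cannot rely on ICMP coming back, one way to exercise step 3 is an end-to-end probe that bisects datagram sizes and treats a missing echo as a drop. This is only an assumed reproduction aid, not Calico tooling; the port number and size bounds are placeholders:

```python
# Assumed reproduction aid: run echo_server() in one pod, find_usable_mtu()
# from a pod in another region, and compare the result with the MTU Calico
# auto-detected on the local node.
import socket

IP_MTU_DISCOVER, IP_PMTUDISC_DO = 10, 2    # usual Linux socket option values

def echo_server(bind="0.0.0.0", port=5005):
    # Echo every datagram back to its sender.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind((bind, port))
    while True:
        data, addr = s.recvfrom(65535)
        s.sendto(data, addr)

def find_usable_mtu(host, port=5005, lo=1200, hi=9000, timeout=2.0):
    # Bisect payload sizes with DF set; "no echo before the timeout" means
    # the packet was dropped somewhere on the path.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    s.settimeout(timeout)
    s.connect((host, port))
    best = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        try:
            s.send(b"\x00" * (mid - 28))   # 20 B IPv4 + 8 B UDP headers
            s.recv(65535)                  # echoed back: this size made it through
            best, lo = mid, mid + 1
        except (socket.timeout, OSError):  # timed out or rejected: too big
            hi = mid - 1
    return best
```

On the paths described above, one would expect such a probe to settle near the 1350 value that was found to work, while auto-detection keeps the much larger local interface MTU.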
Context
Add Path MTU discovery to Calico
Your Environment
- Calico version: 3.18.3
- Orchestrator version (e.g. kubernetes, mesos, rkt): Kubernetes v1.20.7
- Operating System and version: CentOS 7.9.2009