Expected Behavior
MTU is set appropriately, so packets aren't dropped by the underlying network.
Current Behavior
When I deploy a vl3 network with default settings, with clients in different clusters:
- client interfaces have an MTU of 8941
- within the same cluster there are no connection issues
- when the client is in a different cluster:
  - `ping 172.16.0.1` works
  - `ping 172.16.0.1 -s 5000` works, but not all packets are delivered
  - `ping 172.16.0.1 -s 6000` doesn't work
Local node interfaces have an MTU of 9001, but it seems that traffic between the two AWS clusters doesn't support this high MTU.
It should be noted that `ip a` on the AWS k8s nodes doesn't list the node's external IP, so we can't read an MTU from it. Maybe the external IP is implemented via some kind of external load balancer on a separate machine.
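A quick way to narrow down the effective limit is to probe the path MTU across the vl3 connection from the client with the Don't-Fragment bit set. A minimal sketch, assuming the vl3 NSE address 172.16.0.1 from above; the payload sizes are only illustrative:

```bash
# ICMP payload = packet size - 20 (IPv4 header) - 8 (ICMP header).
# 8913 corresponds to the full 8941 interface MTU, 1472 to a standard 1500 MTU.
for size in 8913 6000 5000 1472; do
    echo "== ICMP payload ${size} bytes =="
    ping -c 3 -M do -s "${size}" 172.16.0.1
done
```

Whichever sizes fail here but work toward a client in the same cluster point at the inter-cluster underlay rather than at NSM itself.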
Failure Information (for bugs)
There are zero issues in the NSM application logs. The control plane doesn't break, the data plane doesn't break, and small packets pass through the connection without issues. User applications that send data in small chunks work without issues, but as soon as an application tries to send a lot of data at once, all of the big packets are dropped.
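The "a lot of data at once" symptom can also be reproduced without a user application, for example with iperf3 over the vl3 address. A hedged sketch, assuming an iperf3 server can be run in a pod reachable at 172.16.0.1 (that placement is an assumption for illustration, not something from this report):

```bash
# Server side, in a pod on the vl3 network (assumed placement):
iperf3 -s

# Client side, from the kernel client in the other cluster:
iperf3 -c 172.16.0.1 -t 10           # bulk TCP with segments sized to the 8941 MTU: expected to stall
iperf3 -c 172.16.0.1 -t 10 -M 1400   # same transfer with the MSS clamped to 1400: expected to pass
```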
`ip a` in node network namespace
```log
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: mtu 9001 qdisc mq state UP group default qlen 1000
link/ether 0a:e3:8b:c3:f1:f9 brd ff:ff:ff:ff:ff:ff
inet 192.168.46.124/19 brd 192.168.63.255 scope global dynamic eth0
valid_lft 2873sec preferred_lft 2873sec
inet6 fe80::8e3:8bff:fec3:f1f9/64 scope link
valid_lft forever preferred_lft forever
3: enicb6d346a0a4@if3: mtu 9001 qdisc noqueue state UP group default
link/ether 4e:b3:db:4d:bc:b6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::4cb3:dbff:fe4d:bcb6/64 scope link
valid_lft forever preferred_lft forever
4: eni6ae81b41b83@if3: mtu 9001 qdisc noqueue state UP group default
link/ether 6a:ae:c8:22:b3:8d brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::68ae:c8ff:fe22:b38d/64 scope link
valid_lft forever preferred_lft forever
5: eth1: mtu 9001 qdisc mq state UP group default qlen 1000
link/ether 0a:bd:e3:c1:99:f5 brd ff:ff:ff:ff:ff:ff
inet 192.168.36.144/19 brd 192.168.63.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::8bd:e3ff:fec1:99f5/64 scope link
valid_lft forever preferred_lft forever
42: enie72f97ce6ee@if3: mtu 9001 qdisc noqueue state UP group default
link/ether c2:cb:9c:3b:38:72 brd ff:ff:ff:ff:ff:ff link-netnsid 9
inet6 fe80::c0cb:9cff:fe3b:3872/64 scope link
valid_lft forever preferred_lft forever
99: eni0d5f350a167@if3: mtu 9001 qdisc noqueue state UP group default
link/ether 32:f6:81:3b:64:df brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet6 fe80::30f6:81ff:fe3b:64df/64 scope link
valid_lft forever preferred_lft forever
125: enie4b3b6e5391@if3: mtu 9001 qdisc noqueue state UP group default
link/ether 76:b5:89:91:c8:d5 brd ff:ff:ff:ff:ff:ff link-netnsid 3
inet6 fe80::74b5:89ff:fe91:c8d5/64 scope link
valid_lft forever preferred_lft forever
130: eni9e22cd98c7f@if3: mtu 9001 qdisc noqueue state UP group default
link/ether fa:5d:68:36:29:54 brd ff:ff:ff:ff:ff:ff link-netnsid 6
inet6 fe80::f85d:68ff:fe36:2954/64 scope link
valid_lft forever preferred_lft forever
142: eni30dfde8b999@if3: mtu 9001 qdisc noqueue state UP group default
link/ether 1a:03:4a:0e:31:ce brd ff:ff:ff:ff:ff:ff link-netnsid 4
inet6 fe80::1803:4aff:fe0e:31ce/64 scope link
valid_lft forever preferred_lft forever
143: enic7fb4c8f760@if3: mtu 9001 qdisc noqueue state UP group default
link/ether fa:64:78:00:36:e9 brd ff:ff:ff:ff:ff:ff link-netnsid 5
inet6 fe80::f864:78ff:fe00:36e9/64 scope link
valid_lft forever preferred_lft forever
144: eni1901d7496d3@if3: mtu 9001 qdisc noqueue state UP group default
link/ether 36:5c:1a:d5:9f:28 brd ff:ff:ff:ff:ff:ff link-netnsid 7
inet6 fe80::345c:1aff:fed5:9f28/64 scope link
valid_lft forever preferred_lft forever
145: eni979e34c8276@if3: mtu 9001 qdisc noqueue state UP group default
link/ether ee:7e:6c:a2:3a:7a brd ff:ff:ff:ff:ff:ff link-netnsid 8
inet6 fe80::ec7e:6cff:fea2:3a7a/64 scope link
valid_lft forever preferred_lft forever
```
`kubectl describe node`
```log
Name: ip-192-168-46-124.ec2.internal
Roles:
Labels: alpha.eksctl.io/cluster-name=aws-msm-perf-test-2
alpha.eksctl.io/nodegroup-name=aws-msm-perf-test-2
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=m5.2xlarge
beta.kubernetes.io/os=linux
eks.amazonaws.com/capacityType=ON_DEMAND
eks.amazonaws.com/nodegroup=aws-msm-perf-test-2
eks.amazonaws.com/nodegroup-image=ami-013895b64fa9cbcba
eks.amazonaws.com/sourceLaunchTemplateId=lt-04bca74656998190b
eks.amazonaws.com/sourceLaunchTemplateVersion=1
failure-domain.beta.kubernetes.io/region=us-east-1
failure-domain.beta.kubernetes.io/zone=us-east-1b
k8s.io/cloud-provider-aws=7bc949e7766fc3c0c52e1829c55878b1
kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-192-168-46-124.ec2.internal
kubernetes.io/os=linux
node.kubernetes.io/instance-type=m5.2xlarge
topology.kubernetes.io/region=us-east-1
topology.kubernetes.io/zone=us-east-1b
Annotations: alpha.kubernetes.io/provided-node-ip: 192.168.46.124
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Mon, 11 Sep 2023 10:17:49 +0700
Taints:
Unschedulable: false
Lease:
HolderIdentity: ip-192-168-46-124.ec2.internal
AcquireTime:
RenewTime: Fri, 15 Sep 2023 15:40:01 +0700
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Fri, 15 Sep 2023 15:35:37 +0700 Mon, 11 Sep 2023 10:17:47 +0700 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 15 Sep 2023 15:35:37 +0700 Mon, 11 Sep 2023 10:17:47 +0700 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 15 Sep 2023 15:35:37 +0700 Mon, 11 Sep 2023 10:17:47 +0700 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 15 Sep 2023 15:35:37 +0700 Mon, 11 Sep 2023 10:17:59 +0700 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.46.124
ExternalIP: 54.226.247.131
InternalDNS: ip-192-168-46-124.ec2.internal
Hostname: ip-192-168-46-124.ec2.internal
ExternalDNS: ec2-54-226-247-131.compute-1.amazonaws.com
Capacity:
cpu: 8
ephemeral-storage: 83873772Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32386544Ki
pods: 58
Allocatable:
cpu: 7910m
ephemeral-storage: 76224326324
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 31369712Ki
pods: 58
System Info:
Machine ID: ec2a0bfebc1849377752b648679a0237
System UUID: ec2a0bfe-bc18-4937-7752-b648679a0237
Boot ID: 4eeba8c8-7e35-4b11-ad99-d4724d2ea2c1
Kernel Version: 5.10.186-179.751.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.19
Kubelet Version: v1.27.4-eks-8ccc7ba
Kube-Proxy Version: v1.27.4-eks-8ccc7ba
ProviderID: aws:///us-east-1b/i-08933201f0cc43b22
Non-terminated Pods: (5 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system aws-node-dph4t 25m (0%) 0 (0%) 0 (0%) 0 (0%) 4d5h
kube-system coredns-79df7fff65-b5fzm 100m (1%) 0 (0%) 70Mi (0%) 170Mi (0%) 4d5h
kube-system coredns-79df7fff65-wq4mw 100m (1%) 0 (0%) 70Mi (0%) 170Mi (0%) 4d5h
kube-system kube-proxy-gjbzk 100m (1%) 0 (0%) 0 (0%) 0 (0%) 4d5h
kube-system metrics-server-5dfcb456c-ctmhx 100m (1%) 0 (0%) 200Mi (0%) 0 (0%) 4d1h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 425m (5%) 0 (0%)
memory 340Mi (1%) 340Mi (1%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
```
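Since the node's external IP doesn't show up in `ip a`, the underlay MTU between the two clusters can still be probed node to node using the ExternalIP reported above (54.226.247.131 here; use the corresponding address of a node in the other cluster). A rough sketch, assuming ICMP is permitted between the node security groups, which this report doesn't verify:

```bash
# From a node (or a hostNetwork pod) in the other cluster, probe toward this
# node's ExternalIP with the Don't-Fragment bit set.
ping -c 3 -M do -s 8973 54.226.247.131   # 9001 - 20 (IPv4) - 8 (ICMP): full jumbo frame
ping -c 3 -M do -s 1472 54.226.247.131   # 1500 - 28: standard Ethernet-sized frame

# tracepath reports the path MTU it discovers hop by hop.
tracepath -n 54.226.247.131
```

If only the 1472-byte probe gets through, that would match AWS's documented 1500-byte MTU limit for traffic that leaves a VPC through an internet gateway, even though intra-VPC traffic supports jumbo frames (9001).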
Steps to Reproduce
1. Deploy a vl3 network (like this, but I placed the network service in one of the clusters instead of using a floating registry), with the vl3 NSE in cluster 1.
2. In cluster 2, create a kernel client that connects to the vl3 network.
3. Run `ip a` in this client and make sure that the MTU on the NSM interface is 8941.
4. Check that ping works: `ping 172.16.0.1` (the IP here is supposed to be the IP of the vl3 NSE).
5. Check that ping with a large additional payload doesn't work: `ping 172.16.0.1 -s 8000`.

In my tests I used WireGuard for the connection between forwarders. But previously I also tried switching to IPsec and had the same connection issues. I didn't specifically check the MTU and dropped-packet size when using IPsec, though.
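For context on the 8941 figure: it is exactly the node MTU of 9001 minus the 60 bytes of per-packet overhead WireGuard adds over IPv4, so the NSM-side MTU derivation looks consistent and the limit appears to come from the inter-cluster underlay instead. A quick check of the arithmetic:

```bash
# WireGuard over IPv4 adds 60 bytes per packet:
#   20 (outer IPv4) + 8 (UDP) + 32 (WireGuard data-message header and Poly1305 tag)
echo $(( 9001 - 20 - 8 - 32 ))   # prints 8941, the MTU seen on the NSM client interface
```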
Context
v1.8.0, v1.9.0, v1.10.0, 435613e15732b7fba1c047b495f88b1670f530a5