networkservicemesh / deployments-k8s


Data plane connectivity issues on AWS cluster because of incorrect MTU #9798

Open d-uzlov opened 1 year ago

d-uzlov commented 1 year ago

Expected Behavior

MTU is set appropriately, so packets aren't dropped by the underlying network.

Current Behavior

When I deploy a vl3 network with default settings, with clients in different clusters:

Local node interfaces have an MTU of 9001, but it seems that traffic between the 2 AWS clusters doesn't support this high MTU. (For what it's worth, AWS documents that traffic leaving a VPC over an internet gateway is limited to a 1500-byte path MTU, even though intra-VPC traffic supports jumbo frames up to 9001.)

It should be noted that `ip a` on AWS k8s nodes doesn't list the node's external IP, so we can't read the MTU from it. Maybe the external IP is implemented via some kind of external load balancer on a separate machine.
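To confirm where the limit actually is, one can probe the usable path MTU between the two clusters with "do not fragment" pings. This is only a sketch: it assumes a shell on a node (or a pod with host networking) in cluster 1, and the remote address is a placeholder for the other cluster's node external IP.

```bash
# REMOTE is a placeholder for the other cluster's node external IP.
REMOTE=203.0.113.10

# 8973 = 9001 - 20 (IPv4) - 8 (ICMP). This should fail if the inter-cluster path is not jumbo-capable.
ping -c 3 -M do -s 8973 "$REMOTE"

# 1472 = 1500 - 20 - 8. This should succeed if the path MTU is the usual 1500 bytes.
ping -c 3 -M do -s 1472 "$REMOTE"
```

If the 1472-byte probe passes but larger ones fail, the inter-cluster path is limited to a 1500-byte MTU and the 9001 value advertised by the node interfaces cannot be used end to end.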

Failure Information (for bugs)

There are no errors in the NSM application logs. The control plane doesn't break, the data plane doesn't break, and small packets pass through the connection without issues. User applications that send data in small chunks work fine. But when an application tries to send a lot of data at once, all of the big packets are dropped.
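One way to see the drops in action is to watch the encapsulated traffic on the nodes while a large ping runs through the vl3 connection. A rough sketch, assuming the forwarders tunnel over UDP (WireGuard) and that eth0 is the node's primary interface; both are assumptions to adjust to your setup:

```bash
# On the sending cluster's node: watch for large outgoing UDP datagrams
# (the WireGuard-encapsulated payload of the oversized ping).
sudo tcpdump -ni eth0 'udp and greater 1500'

# Run the same capture on the receiving cluster's node. If large packets
# appear on the sender but never arrive at the receiver, they are being
# dropped in transit rather than by NSM itself.
```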

`ip a` in node network namespace:

```log
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 0a:e3:8b:c3:f1:f9 brd ff:ff:ff:ff:ff:ff
    inet 192.168.46.124/19 brd 192.168.63.255 scope global dynamic eth0
       valid_lft 2873sec preferred_lft 2873sec
    inet6 fe80::8e3:8bff:fec3:f1f9/64 scope link
       valid_lft forever preferred_lft forever
3: enicb6d346a0a4@if3: mtu 9001 qdisc noqueue state UP group default
    link/ether 4e:b3:db:4d:bc:b6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::4cb3:dbff:fe4d:bcb6/64 scope link
       valid_lft forever preferred_lft forever
4: eni6ae81b41b83@if3: mtu 9001 qdisc noqueue state UP group default
    link/ether 6a:ae:c8:22:b3:8d brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::68ae:c8ff:fe22:b38d/64 scope link
       valid_lft forever preferred_lft forever
5: eth1: mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 0a:bd:e3:c1:99:f5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.36.144/19 brd 192.168.63.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::8bd:e3ff:fec1:99f5/64 scope link
       valid_lft forever preferred_lft forever
42: enie72f97ce6ee@if3: mtu 9001 qdisc noqueue state UP group default
    link/ether c2:cb:9c:3b:38:72 brd ff:ff:ff:ff:ff:ff link-netnsid 9
    inet6 fe80::c0cb:9cff:fe3b:3872/64 scope link
       valid_lft forever preferred_lft forever
99: eni0d5f350a167@if3: mtu 9001 qdisc noqueue state UP group default
    link/ether 32:f6:81:3b:64:df brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::30f6:81ff:fe3b:64df/64 scope link
       valid_lft forever preferred_lft forever
125: enie4b3b6e5391@if3: mtu 9001 qdisc noqueue state UP group default
    link/ether 76:b5:89:91:c8:d5 brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::74b5:89ff:fe91:c8d5/64 scope link
       valid_lft forever preferred_lft forever
130: eni9e22cd98c7f@if3: mtu 9001 qdisc noqueue state UP group default
    link/ether fa:5d:68:36:29:54 brd ff:ff:ff:ff:ff:ff link-netnsid 6
    inet6 fe80::f85d:68ff:fe36:2954/64 scope link
       valid_lft forever preferred_lft forever
142: eni30dfde8b999@if3: mtu 9001 qdisc noqueue state UP group default
    link/ether 1a:03:4a:0e:31:ce brd ff:ff:ff:ff:ff:ff link-netnsid 4
    inet6 fe80::1803:4aff:fe0e:31ce/64 scope link
       valid_lft forever preferred_lft forever
143: enic7fb4c8f760@if3: mtu 9001 qdisc noqueue state UP group default
    link/ether fa:64:78:00:36:e9 brd ff:ff:ff:ff:ff:ff link-netnsid 5
    inet6 fe80::f864:78ff:fe00:36e9/64 scope link
       valid_lft forever preferred_lft forever
144: eni1901d7496d3@if3: mtu 9001 qdisc noqueue state UP group default
    link/ether 36:5c:1a:d5:9f:28 brd ff:ff:ff:ff:ff:ff link-netnsid 7
    inet6 fe80::345c:1aff:fed5:9f28/64 scope link
       valid_lft forever preferred_lft forever
145: eni979e34c8276@if3: mtu 9001 qdisc noqueue state UP group default
    link/ether ee:7e:6c:a2:3a:7a brd ff:ff:ff:ff:ff:ff link-netnsid 8
    inet6 fe80::ec7e:6cff:fea2:3a7a/64 scope link
       valid_lft forever preferred_lft forever
```
`kubectl describe node`:

```log
Name:               ip-192-168-46-124.ec2.internal
Roles:
Labels:             alpha.eksctl.io/cluster-name=aws-msm-perf-test-2
                    alpha.eksctl.io/nodegroup-name=aws-msm-perf-test-2
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5.2xlarge
                    beta.kubernetes.io/os=linux
                    eks.amazonaws.com/capacityType=ON_DEMAND
                    eks.amazonaws.com/nodegroup=aws-msm-perf-test-2
                    eks.amazonaws.com/nodegroup-image=ami-013895b64fa9cbcba
                    eks.amazonaws.com/sourceLaunchTemplateId=lt-04bca74656998190b
                    eks.amazonaws.com/sourceLaunchTemplateVersion=1
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1b
                    k8s.io/cloud-provider-aws=7bc949e7766fc3c0c52e1829c55878b1
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-192-168-46-124.ec2.internal
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=m5.2xlarge
                    topology.kubernetes.io/region=us-east-1
                    topology.kubernetes.io/zone=us-east-1b
Annotations:        alpha.kubernetes.io/provided-node-ip: 192.168.46.124
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 11 Sep 2023 10:17:49 +0700
Taints:
Unschedulable:      false
Lease:
  HolderIdentity:  ip-192-168-46-124.ec2.internal
  AcquireTime:
  RenewTime:       Fri, 15 Sep 2023 15:40:01 +0700
Conditions:
  Type            Status  LastHeartbeatTime                LastTransitionTime               Reason                      Message
  ----            ------  -----------------                ------------------               ------                      -------
  MemoryPressure  False   Fri, 15 Sep 2023 15:35:37 +0700  Mon, 11 Sep 2023 10:17:47 +0700  KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure    False   Fri, 15 Sep 2023 15:35:37 +0700  Mon, 11 Sep 2023 10:17:47 +0700  KubeletHasNoDiskPressure    kubelet has no disk pressure
  PIDPressure     False   Fri, 15 Sep 2023 15:35:37 +0700  Mon, 11 Sep 2023 10:17:47 +0700  KubeletHasSufficientPID     kubelet has sufficient PID available
  Ready           True    Fri, 15 Sep 2023 15:35:37 +0700  Mon, 11 Sep 2023 10:17:59 +0700  KubeletReady                kubelet is posting ready status
Addresses:
  InternalIP:   192.168.46.124
  ExternalIP:   54.226.247.131
  InternalDNS:  ip-192-168-46-124.ec2.internal
  Hostname:     ip-192-168-46-124.ec2.internal
  ExternalDNS:  ec2-54-226-247-131.compute-1.amazonaws.com
Capacity:
  cpu:                8
  ephemeral-storage:  83873772Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32386544Ki
  pods:               58
Allocatable:
  cpu:                7910m
  ephemeral-storage:  76224326324
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             31369712Ki
  pods:               58
System Info:
  Machine ID:                 ec2a0bfebc1849377752b648679a0237
  System UUID:                ec2a0bfe-bc18-4937-7752-b648679a0237
  Boot ID:                    4eeba8c8-7e35-4b11-ad99-d4724d2ea2c1
  Kernel Version:             5.10.186-179.751.amzn2.x86_64
  OS Image:                   Amazon Linux 2
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.19
  Kubelet Version:            v1.27.4-eks-8ccc7ba
  Kube-Proxy Version:         v1.27.4-eks-8ccc7ba
ProviderID:                   aws:///us-east-1b/i-08933201f0cc43b22
Non-terminated Pods:          (5 in total)
  Namespace    Name                            CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------    ----                            ------------  ----------  ---------------  -------------  ---
  kube-system  aws-node-dph4t                  25m (0%)      0 (0%)      0 (0%)           0 (0%)         4d5h
  kube-system  coredns-79df7fff65-b5fzm        100m (1%)     0 (0%)      70Mi (0%)        170Mi (0%)     4d5h
  kube-system  coredns-79df7fff65-wq4mw        100m (1%)     0 (0%)      70Mi (0%)        170Mi (0%)     4d5h
  kube-system  kube-proxy-gjbzk                100m (1%)     0 (0%)      0 (0%)           0 (0%)         4d5h
  kube-system  metrics-server-5dfcb456c-ctmhx  100m (1%)     0 (0%)      200Mi (0%)       0 (0%)         4d1h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                425m (5%)   0 (0%)
  memory             340Mi (1%)  340Mi (1%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:
```
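Another quick check of what MTU the underlay really offers from this node toward the other cluster is tracepath, which performs path MTU discovery and prints the resulting pmtu. A small sketch; the target address is only a placeholder for the remote cluster node's external IP:

```bash
# tracepath reports the discovered path MTU hop by hop.
# 198.51.100.20 stands in for the other cluster's node external IP.
tracepath -n 198.51.100.20
```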

Steps to Reproduce

  1. Create 2 AWS clusters
    I used this config:

```yaml
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: aws-test-1
  region: us-east-1
  version: '1.27'

kubernetesNetworkConfig:
  ipFamily: IPv4

addons:
- name: vpc-cni
- name: coredns
- name: kube-proxy

iam:
  withOIDC: true

managedNodeGroups:
- name: aws-test-1
  instanceType: m5.2xlarge
  desiredCapacity: 1
```

  2. Deploy the NSM interdomain setup for the 2 clusters.
  3. Deploy a vl3 network (like this, but I placed the network service in one of the clusters instead of using a floating registry), with the vl3 NSE in cluster 1.
  4. In cluster 2, create a kernel client that connects to the vl3 network.
  5. Run `ip a` in this client and make sure that the MTU on the NSM interface is 8941.
  6. Check that ping works: `ping 172.16.0.1` (the IP here is supposed to be the IP of the vl3 NSE).
  7. Check that ping with a large additional payload doesn't work: `ping 172.16.0.1 -s 8000` (see the sketch after this list).
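As a manual workaround during testing, lowering the MTU on the client's NSM interface below the real path MTU (minus tunnel overhead) makes large transfers pass again. A rough sketch, assuming the client pod is named alpine-nsc in the default namespace, its NSM interface shows up as nsm-1, and the probes above showed a 1500-byte underlay path; all of these names and numbers are placeholders:

```bash
# Inspect the interfaces inside the client pod to find the NSM interface and its MTU.
kubectl exec alpine-nsc -- ip a

# Drop the NSM interface MTU below the assumed 1500-byte underlay path:
# 1500 - 60 bytes of WireGuard-over-IPv4 overhead leaves 1440; 1400 adds some margin.
# Requires NET_ADMIN in the pod.
kubectl exec alpine-nsc -- ip link set dev nsm-1 mtu 1400

# Re-run the failing check from step 7; large pings now get fragmented at the
# client interface into pieces that fit the path, so they should go through.
kubectl exec alpine-nsc -- ping -c 3 172.16.0.1 -s 8000
```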

In my tests I used WireGuard for the connection between forwarders. Previously I also tried switching to IPsec and hit the same connection issues, although I didn't specifically check the MTU and dropped packet sizes when using IPsec.
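For reference, the 8941 value from step 5 is consistent with WireGuard-over-IPv4 overhead being subtracted from the node MTU; the arithmetic below is only a back-of-the-envelope check, not taken from NSM code:

```bash
# 20 (outer IPv4) + 8 (UDP) + 32 (WireGuard data header + auth tag) = 60 bytes of overhead
echo $((9001 - 20 - 8 - 32))   # prints 8941, matching the MTU seen on the client's NSM interface
```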

Context

d-uzlov commented 1 year ago

Maybe NSM could do something from this list to mitigate this issue:

Also, maybe there is some misconfiguration in AWS, either on my side or on the AWS side.
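Independent of any NSM change, one generic mitigation for tunnel MTU mismatches is to clamp TCP MSS to the discovered path MTU on the nodes. This only helps TCP traffic (not the large ICMP pings above) and is offered purely as a hedged idea, not something NSM currently does:

```bash
# Clamp the MSS of forwarded TCP connections to the path MTU so TCP payloads
# never exceed what the underlay can carry (requires root on the node).
sudo iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
```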