projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

Support for NAT between Kubernetes nodes #5171

Open chaudhryfaisal opened 2 years ago

Expected Behavior

Pod-to-pod networking should work in VXLAN mode, including between nodes that are separated by NAT.

Current Behavior

Pod-to-pod networking does not work across NAT. Pods on the same node can reach each other, but pods on different nodes cannot.

Your Environment

I have multiple nodes in different NAT ranges, shown as Scenario A and Scenario B in the diagram below.

[Figure: multi-nat-k8s, network diagram of Scenario A and Scenario B]

Deployment

Calico

#calico-config-cm.yaml
apiVersion: v1
data:
  # Configure the backend to use.
  calico_backend: "vxlan"
  # The CNI network configuration to install on each node. The special
  # values in this config will be automatically populated.
  cni_network_config: |-
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "calico",
          "log_level": "info",
          "log_file_path": "/var/log/calico/cni/cni.log",
          "datastore_type": "kubernetes",
          "nodename": "__KUBERNETES_NODE_NAME__",
          "mtu": __CNI_MTU__,
          "ipam": {
              "type": "calico-ipam"
          },
          "policy": {
              "type": "k8s"
          },
          "kubernetes": {
              "kubeconfig": "__KUBECONFIG_FILEPATH__"
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        },
        {
          "type": "bandwidth",
          "capabilities": {"bandwidth": true}
        }
      ]
    }
  # Typha is disabled.
  typha_service_name: "none"
  # Configure the MTU to use for workload interfaces and tunnels.
  # By default, MTU is auto-detected, and explicitly setting this field should not be required.
  # You can override auto-detection by providing a non-zero value.
#  veth_mtu: "0"
  veth_mtu: "1350"
# Source: calico/templates/calico-config.yaml
# This ConfigMap is used to configure a self-hosted Calico installation.
kind: ConfigMap
metadata:
  name: calico-config
  namespace: kube-system

---
# calico-node-daemonset.yaml
apiVersion: apps/v1
# Source: calico/templates/calico-node.yaml
# This manifest installs the calico-node container, as well
# as the CNI plugins and network config on
# each master and worker node in a Kubernetes cluster.
kind: DaemonSet
metadata:
  labels:
    k8s-app: calico-node
  name: calico-node
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: calico-node
  template:
    metadata:
      labels:
        k8s-app: calico-node
    spec:
      containers:
        # Runs calico-node container on each Kubernetes node. This
        # container programs network policy and routes on each
        # host.
        - env:
            # Use Kubernetes API as the backing datastore.
            - name: DATASTORE_TYPE
              value: "kubernetes"
            # Wait for the datastore.
            - name: WAIT_FOR_DATASTORE
              value: "true"
            # Set based on the k8s node name.
            - name: NODENAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # Choose the backend to use.
            - name: CALICO_NETWORKING_BACKEND
              valueFrom:
                configMapKeyRef:
                  key: calico_backend
                  name: calico-config
            # Cluster type to identify the deployment type
            - name: CLUSTER_TYPE
              value: "k8s,bgp"
            # Auto-detect the BGP IP address.
            - name: IP
              value: "autodetect"
            # IPIP on the default pool is left unset (commented out); VXLAN is used instead.
            # - name: CALICO_IPV4POOL_IPIP
            #   value: "Never"
            # Enable or Disable VXLAN on the default IP pool.
            - name: CALICO_IPV4POOL_VXLAN
              value: "Always"
            # Set MTU for tunnel device used if ipip is enabled
            - name: FELIX_IPINIPMTU
              valueFrom:
                configMapKeyRef:
                  key: veth_mtu
                  name: calico-config
            # Set MTU for the VXLAN tunnel device.
            - name: FELIX_VXLANMTU
              valueFrom:
                configMapKeyRef:
                  key: veth_mtu
                  name: calico-config
            # Set MTU for the Wireguard tunnel device.
            - name: FELIX_WIREGUARDMTU
              valueFrom:
                configMapKeyRef:
                  key: veth_mtu
                  name: calico-config
            # The default IPv4 pool to create on startup if none exists. Pod IPs will be
            # chosen from this range. Changing this value after installation will have
            # no effect. This should fall within `--cluster-cidr`.
            - name: CALICO_IPV4POOL_CIDR
              value: "10.42.0.0/16"
            # Disable file logging so `kubectl logs` works.
            - name: CALICO_DISABLE_FILE_LOGGING
              value: "true"
            # Set Felix endpoint to host default action to ACCEPT.
            - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
              value: "ACCEPT"
            # Disable IPv6 on Kubernetes.
            - name: FELIX_IPV6SUPPORT
              value: "false"
            # Set Felix logging to "info"
            - name: FELIX_LOGSEVERITYSCREEN
              value: "info"
            - name: FELIX_HEALTHENABLED
              value: "true"
          envFrom:
            - configMapRef:
                # Allow KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT to be overridden for eBPF mode.
                name: kubernetes-services-endpoint
                optional: true
          image: rancher/mirrored-calico-node:v3.17.2
          imagePullPolicy: IfNotPresent
          livenessProbe:
            exec:
              command:
                - /bin/calico-node
                - -felix-live
            #                - -bird-live
            failureThreshold: 6
            initialDelaySeconds: 10
            periodSeconds: 10
          name: calico-node
          readinessProbe:
            exec:
              command:
                - /bin/calico-node
                - -felix-ready
            #                - -bird-ready
            periodSeconds: 10
          resources:
            requests:
              cpu: 250m
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /lib/modules
              name: lib-modules
              readOnly: true
            - mountPath: /run/xtables.lock
              name: xtables-lock
              readOnly: false
            - mountPath: /var/run/calico
              name: var-run-calico
              readOnly: false
            - mountPath: /var/lib/calico
              name: var-lib-calico
              readOnly: false
            - mountPath: /var/run/nodeagent
              name: policysync
            # For eBPF mode, we need to be able to mount the BPF filesystem at /sys/fs/bpf so we mount in the
            # parent directory.
            - mountPath: /sys/fs/
              # Bidirectional means that, if we mount the BPF filesystem at /sys/fs/bpf it will propagate to the host.
              # If the host is known to mount that filesystem already then Bidirectional can be omitted.
              mountPropagation: Bidirectional
              name: sysfs
            - mountPath: /var/log/calico/cni
              name: cni-log-dir
              readOnly: true
      hostNetwork: true
      initContainers:
        # This container performs upgrade from host-local IPAM to calico-ipam.
        # It can be deleted if this is a fresh installation, or if you have already
        # upgraded to use calico-ipam.
        - command: ["/opt/cni/bin/calico-ipam", "-upgrade"]
          env:
            - name: KUBERNETES_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: CALICO_NETWORKING_BACKEND
              valueFrom:
                configMapKeyRef:
                  key: calico_backend
                  name: calico-config
          envFrom:
            - configMapRef:
                # Allow KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT to be overridden for eBPF mode.
                name: kubernetes-services-endpoint
                optional: true
          image: rancher/mirrored-calico-cni:v3.17.2
          imagePullPolicy: IfNotPresent
          name: upgrade-ipam
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /var/lib/cni/networks
              name: host-local-net-dir
            - mountPath: /host/opt/cni/bin
              name: cni-bin-dir
        # This container installs the CNI binaries
        # and CNI network config file on each node.
        - command: ["/opt/cni/bin/install"]
          env:
            # Name of the CNI config file to create.
            - name: CNI_CONF_NAME
              value: "10-calico.conflist"
            # The CNI network config to install on each node.
            - name: CNI_NETWORK_CONFIG
              valueFrom:
                configMapKeyRef:
                  key: cni_network_config
                  name: calico-config
            # Set the hostname based on the k8s node name.
            - name: KUBERNETES_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # CNI MTU Config variable
            - name: CNI_MTU
              valueFrom:
                configMapKeyRef:
                  key: veth_mtu
                  name: calico-config
            # Prevents the container from sleeping forever.
            - name: SLEEP
              value: "false"
          envFrom:
            - configMapRef:
                # Allow KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT to be overridden for eBPF mode.
                name: kubernetes-services-endpoint
                optional: true
          image: rancher/mirrored-calico-cni:v3.17.2
          imagePullPolicy: IfNotPresent
          name: install-cni
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /host/opt/cni/bin
              name: cni-bin-dir
            - mountPath: /host/etc/cni/net.d
              name: cni-net-dir
        - image: rancher/mirrored-calico-pod2daemon-flexvol:v3.17.2
          imagePullPolicy: IfNotPresent
          name: flexvol-driver
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /host/driver
              name: flexvol-driver-host
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-node-critical
      restartPolicy: Always
      serviceAccountName: calico-node
      # Minimize downtime during a rolling upgrade or deletion; tell Kubernetes to do a "force
      # deletion": https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods.
      terminationGracePeriodSeconds: 0
      tolerations:
        # Make sure calico-node gets scheduled on all nodes.
        - effect: NoSchedule
          operator: Exists
        # Mark the pod as a critical add-on for rescheduling.
        - key: CriticalAddonsOnly
          operator: Exists
        - effect: NoExecute
          operator: Exists
      volumes:
        # Used by calico-node.
        - hostPath:
            path: /lib/modules
          name: lib-modules
        - hostPath:
            path: /var/run/calico
          name: var-run-calico
        - hostPath:
            path: /var/lib/calico
          name: var-lib-calico
        - hostPath:
            path: /run/xtables.lock
            type: FileOrCreate
          name: xtables-lock
        - hostPath:
            path: /sys/fs/
            type: DirectoryOrCreate
          name: sysfs
        # Used to install CNI.
        - hostPath:
            path: /opt/cni/bin
          name: cni-bin-dir
        - hostPath:
            path: /etc/cni/net.d
          name: cni-net-dir
        # Used to access CNI logs.
        - hostPath:
            path: /var/log/calico/cni
          name: cni-log-dir
        # Mount in the directory for host-local IPAM allocations. This is
        # used when upgrading from host-local to calico-ipam, and can be removed
        # if not using the upgrade-ipam init container.
        - hostPath:
            path: /var/lib/cni/networks
          name: host-local-net-dir
        # Used to create per-pod Unix Domain Sockets
        - hostPath:
            path: /var/run/nodeagent
            type: DirectoryOrCreate
          name: policysync
        # Used to install Flex Volume Driver
        - hostPath:
            path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
            type: DirectoryOrCreate
          name: flexvol-driver-host
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
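
A note on the MTU values: the ConfigMap above sets veth_mtu to 1350 and feeds it into FELIX_VXLANMTU, while the interface listings further down report mtu 1450 on vxlan.calico. The sketch below only works through the standard VXLAN-over-IPv4 header arithmetic, assuming no VLAN tag and no IP options; it does not explain why the two values differ. The function names are hypothetical helpers, not Calico settings.

```python
# Header sizes in bytes for VXLAN over IPv4 (assumes no VLAN tag, no IP options).
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14


def vxlan_overhead():
    """Bytes added in front of the inner IP packet by VXLAN encapsulation."""
    return OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET


def tunnel_mtu(underlay_mtu):
    """Largest inner IP packet that fits in one underlay packet of the given MTU."""
    return underlay_mtu - vxlan_overhead()


print(vxlan_overhead())   # 50
print(tunnel_mtu(1500))   # 1450
```

The same 50 bytes show up in the tcpdump output later in this report: the outer UDP packets have length 134 or 110 where the inner packets have length 84 or 60.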

Kubernetes

$ k get nodes
NAME             STATUS   ROLES               AGE   VERSION
10.158.192.106   Ready    worker              63m   v1.20.6
10.158.84.15     Ready    controlplane,etcd   63m   v1.20.6

$ k get pod -A -o wide
NAMESPACE       NAME                                      READY   STATUS      RESTARTS   AGE   IP               NODE             NOMINATED NODE   READINESS GATES
default         hello-cj82k                               2/2     Running     0          54m   10.42.43.5       10.158.192.106   <none>           <none>
default         hello-r85cc                               2/2     Running     0          54m   10.42.39.65      10.158.84.15     <none>           <none>
ingress-nginx   default-http-backend-f9745f-75p6x         1/1     Running     0          67m   10.42.43.4       10.158.192.106   <none>           <none>
ingress-nginx   nginx-ingress-controller-dkbhw            1/1     Running     1          67m   10.158.192.106   10.158.192.106   <none>           <none>
kube-system     calico-node-pw7l9                         1/1     Running     0          62m   10.158.84.15     10.158.84.15     <none>           <none>
kube-system     calico-node-r5hx5                         1/1     Running     0          57m   10.158.192.106   10.158.192.106   <none>           <none>
kube-system     coredns-5d5f598fb9-5h7nn                  1/1     Running     0          67m   10.42.43.3       10.158.192.106   <none>           <none>
kube-system     coredns-autoscaler-6f6d97f658-mvtbx       1/1     Running     0          67m   10.42.43.1       10.158.192.106   <none>           <none>
kube-system     metrics-server-74f44bbc45-xd2v8           1/1     Running     0          67m   10.42.43.2       10.158.192.106   <none>           <none>
kube-system     rke-coredns-addon-deploy-job-blnvp        0/1     Completed   0          67m   10.158.84.15     10.158.84.15     <none>           <none>
kube-system     rke-ingress-controller-deploy-job-s6z7r   0/1     Completed   0          67m   10.158.84.15     10.158.84.15     <none>           <none>
kube-system     rke-metrics-addon-deploy-job-7m2qq        0/1     Completed   0          67m   10.158.84.15     10.158.84.15     <none>           <none>

$ k get ippool
NAME                  AGE
default-ipv4-ippool   58m

$ k get ippool default-ipv4-ippool -o yaml
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  annotations:
    projectcalico.org/metadata: '{"uid":"fd09acbe-8bb8-4a95-87fc-94959c852330","creationTimestamp":"2021-12-09T19:54:38Z"}'
  creationTimestamp: "2021-12-09T19:54:38Z"
  generation: 1
  name: default-ipv4-ippool
  resourceVersion: "1110"
  uid: 19b33ef8-c3a7-43a0-9d60-1323d005ce00
spec:
  blockSize: 26
  cidr: 10.42.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Always

Routes and Network Interfaces

------------- 10.158.84.15 ( node2 ) -------------
> ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:b3:f3:08 brd ff:ff:ff:ff:ff:ff
    inet 172.20.137.2/24 brd 172.20.137.255 scope global eth0
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:19:d5:80:a8 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
6: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default 
    link/ether 66:73:9e:f9:f6:53 brd ff:ff:ff:ff:ff:ff
    inet 10.42.39.64/32 scope global vxlan.calico
       valid_lft forever preferred_lft forever
7: calia78822465be@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
> ip r
default via 172.20.137.1 dev eth0 proto static 
10.42.39.65 dev calia78822465be scope link 
10.42.43.0/26 via 10.42.43.0 dev vxlan.calico onlink 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
172.20.137.0/24 dev eth0 proto kernel scope link src 172.20.137.2 
------------- 10.158.192.106 ( node1 ) -------------
> ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet 169.254.0.2/32 scope global lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:8f:e8:d1 brd ff:ff:ff:ff:ff:ff
    inet 10.158.192.106/23 brd 10.158.193.255 scope global eth0
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:ad:d7:ed:d7 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
6: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default 
    link/ether 66:17:09:ef:c7:95 brd ff:ff:ff:ff:ff:ff
    inet 10.42.43.0/32 scope global vxlan.calico
       valid_lft forever preferred_lft forever
7: cali51bfbe3480a@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
8: cali066c9d0d2ec@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
9: calibf3a63224a5@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
10: caliaa03eb96470@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 3
11: calif7a7d5ef8db@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 4
> ip r
default via 10.158.192.10 dev eth0  proto static 
10.42.39.64/26 via 10.42.39.64 dev vxlan.calico onlink 
10.42.43.1 dev cali51bfbe3480a  scope link 
10.42.43.2 dev cali066c9d0d2ec  scope link 
10.42.43.3 dev calibf3a63224a5  scope link 
10.42.43.4 dev caliaa03eb96470  scope link 
10.42.43.5 dev calif7a7d5ef8db  scope link 
10.158.192.0/23 dev eth0  proto kernel  scope link  src 10.158.192.106 
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 linkdown 
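
To make the routing above easier to follow, here is a minimal sketch of a longest-prefix lookup over node1's routes. The route list is copied from the `ip r` output for 10.158.192.106 (only one of the per-pod /32 routes is included), and `lookup` is a hypothetical helper, not a Calico or iproute2 API. It shows that traffic for the remote pod 10.42.39.65 is handed to vxlan.calico, while 172.20.137.2 (node2's eth0 address) is not on-link for node1 and follows the default route.

```python
import ipaddress

# (prefix, device, gateway) entries taken from `ip r` on node1 (10.158.192.106).
NODE1_ROUTES = [
    ("0.0.0.0/0", "eth0", "10.158.192.10"),
    ("10.42.39.64/26", "vxlan.calico", "10.42.39.64"),
    ("10.42.43.5/32", "calif7a7d5ef8db", None),
    ("10.158.192.0/23", "eth0", None),
    ("172.17.0.0/16", "docker0", None),
]


def lookup(routes, dst):
    """Return the route whose prefix contains dst and has the longest prefix length."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in routes if addr in ipaddress.ip_network(r[0])]
    return max(matches, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)


# Remote pod on node2: matched by the /26 block route through the tunnel device.
print(lookup(NODE1_ROUTES, "10.42.39.65"))
# node2's own interface address: only the default route matches.
print(lookup(NODE1_ROUTES, "172.20.137.2"))
```

Whether the default gateway can actually deliver packets to 172.20.137.2 depends on the NAT setup in the diagram and cannot be determined from the route table alone.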

Analysis

POD to POD Test

NAME          READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
hello-cj82k   2/2     Running   0          56m   10.42.43.5    10.158.192.106   <none>           <none>
hello-r85cc   2/2     Running   0          56m   10.42.39.65   10.158.84.15     <none>           <none>

hello-cj82k >> 10.42.43.5 [10.158.192.106  >> 10.158.192.106]: GOOD
hello-cj82k >> 10.42.39.65 [10.158.192.106  >> 10.158.84.15]: NOT_WORKING
hello-r85cc >> 10.42.43.5 [10.158.84.15  >> 10.158.192.106]: NOT_WORKING
hello-r85cc >> 10.42.39.65 [10.158.84.15  >> 10.158.84.15]: GOOD

tcpdump for previous test

# node2
sudo tcpdump -v -n -i any port 4789
21:00:33.693658 IP (tos 0x0, ttl 64, id 12605, offset 0, flags [none], proto UDP (17), length 134)
    172.20.137.2.57746 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 64826, offset 0, flags [DF], proto ICMP (1), length 84)
    10.42.39.65 > 10.42.43.5: ICMP echo request, id 13568, seq 0, length 64
21:00:41.957075 IP (tos 0x0, ttl 64, id 13440, offset 0, flags [none], proto UDP (17), length 110)
    172.20.137.2.25072 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 4959, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.24896 > 10.42.43.2.4443: Flags [S], cksum 0x80c1 (correct), seq 3869876183, win 64240, options [mss 1460,sackOK,TS val 1042558583 ecr 0,nop,wscale 7], length 0
21:00:41.957615 IP (tos 0x0, ttl 64, id 13441, offset 0, flags [none], proto UDP (17), length 110)
    172.20.137.2.37471 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 52122, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.53379 > 10.42.43.2.4443: Flags [S], cksum 0x35ac (correct), seq 3934746570, win 64240, options [mss 1460,sackOK,TS val 1042558584 ecr 0,nop,wscale 7], length 0
21:00:41.957622 IP (tos 0x0, ttl 64, id 13442, offset 0, flags [none], proto UDP (17), length 110)
    172.20.137.2.1753 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 55720, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.59519 > 10.42.43.2.4443: Flags [S], cksum 0x2185 (correct), seq 3304626564, win 64240, options [mss 1460,sackOK,TS val 1042558584 ecr 0,nop,wscale 7], length 0
21:00:41.957816 IP (tos 0x0, ttl 64, id 13443, offset 0, flags [none], proto UDP (17), length 110)
    172.20.137.2.29715 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 16461, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.61275 > 10.42.43.2.4443: Flags [S], cksum 0x9f5c (correct), seq 3114344488, win 64240, options [mss 1460,sackOK,TS val 1042558584 ecr 0,nop,wscale 7], length 0
21:00:41.957940 IP (tos 0x0, ttl 64, id 13444, offset 0, flags [none], proto UDP (17), length 110)
    172.20.137.2.55992 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 20034, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.64007 > 10.42.43.2.4443: Flags [S], cksum 0x89de (correct), seq 3678800213, win 64240, options [mss 1460,sackOK,TS val 1042558584 ecr 0,nop,wscale 7], length 0
21:00:42.976144 IP (tos 0x0, ttl 64, id 13445, offset 0, flags [none], proto UDP (17), length 110)
    172.20.137.2.60937 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 55721, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.59519 > 10.42.43.2.4443: Flags [S], cksum 0x1d8b (correct), seq 3304626564, win 64240, options [mss 1460,sackOK,TS val 1042559602 ecr 0,nop,wscale 7], length 0
21:00:42.976170 IP (tos 0x0, ttl 64, id 13498, offset 0, flags [none], proto UDP (17), length 110)
    172.20.137.2.43210 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 16462, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.61275 > 10.42.43.2.4443: Flags [S], cksum 0x9b61 (correct), seq 3114344488, win 64240, options [mss 1460,sackOK,TS val 1042559603 ecr 0,nop,wscale 7], length 0
21:00:42.976195 IP (tos 0x0, ttl 64, id 13497, offset 0, flags [none], proto UDP (17), length 110)
    172.20.137.2.43612 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 20035, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.64007 > 10.42.43.2.4443: Flags [S], cksum 0x85e4 (correct), seq 3678800213, win 64240, options [mss 1460,sackOK,TS val 1042559602 ecr 0,nop,wscale 7], length 0
21:00:42.976224 IP (tos 0x0, ttl 64, id 13499, offset 0, flags [none], proto UDP (17), length 110)
    172.20.137.2.49919 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 52123, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.53379 > 10.42.43.2.4443: Flags [S], cksum 0x31b1 (correct), seq 3934746570, win 64240, options [mss 1460,sackOK,TS val 1042559603 ecr 0,nop,wscale 7], length 0
21:00:42.976230 IP (tos 0x0, ttl 64, id 13500, offset 0, flags [none], proto UDP (17), length 110)
    172.20.137.2.55217 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 4960, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.24896 > 10.42.43.2.4443: Flags [S], cksum 0x7cc5 (correct), seq 3869876183, win 64240, options [mss 1460,sackOK,TS val 1042559603 ecr 0,nop,wscale 7], length 0
21:00:44.992174 IP (tos 0x0, ttl 64, id 13615, offset 0, flags [none], proto UDP (17), length 110)
    172.20.137.2.33654 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 52124, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.53379 > 10.42.43.2.4443: Flags [S], cksum 0x29d2 (correct), seq 3934746570, win 64240, options [mss 1460,sackOK,TS val 1042561618 ecr 0,nop,wscale 7], length 0
21:00:44.992175 IP (tos 0x0, ttl 64, id 13614, offset 0, flags [none], proto UDP (17), length 110)
    172.20.137.2.42669 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 4961, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.24896 > 10.42.43.2.4443: Flags [S], cksum 0x74e6 (correct), seq 3869876183, win 64240, options [mss 1460,sackOK,TS val 1042561618 ecr 0,nop,wscale 7], length 0
21:00:44.992262 IP (tos 0x0, ttl 64, id 13616, offset 0, flags [none], proto UDP (17), length 110)
    172.20.137.2.37074 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 16463, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.61275 > 10.42.43.2.4443: Flags [S], cksum 0x9381 (correct), seq 3114344488, win 64240, options [mss 1460,sackOK,TS val 1042561619 ecr 0,nop,wscale 7], length 0
21:00:44.992269 IP (tos 0x0, ttl 64, id 13617, offset 0, flags [none], proto UDP (17), length 110)
    172.20.137.2.58950 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 55722, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.59519 > 10.42.43.2.4443: Flags [S], cksum 0x15aa (correct), seq 3304626564, win 64240, options [mss 1460,sackOK,TS val 1042561619 ecr 0,nop,wscale 7], length 0
21:00:44.992285 IP (tos 0x0, ttl 64, id 13618, offset 0, flags [none], proto UDP (17), length 110)
    172.20.137.2.54423 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 20036, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.64007 > 10.42.43.2.4443: Flags [S], cksum 0x7e03 (correct), seq 3678800213, win 64240, options [mss 1460,sackOK,TS val 1042561619 ecr 0,nop,wscale 7], length 0
^C
46 packets captured
56 packets received by filter
2 packets dropped by kernel
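
A small sketch for summarizing a capture like the one above by outer source and destination address. The regular expression and the `summarize` helper are illustrative assumptions based on the `tcpdump -v` output format shown here; the sample holds the first two packets of the node2 capture. Every VXLAN packet shown in the node2 capture is 172.20.137.2 > 10.158.192.106, so nothing from node1 arrives on node2.

```python
import re
from collections import Counter

# Matches only the outer header line of a VXLAN packet (UDP destination port 4789).
OUTER = re.compile(
    r"^\s+(\d+\.\d+\.\d+\.\d+)\.\d+ > (\d+\.\d+\.\d+\.\d+)\.4789: VXLAN"
)


def summarize(capture_text):
    """Count VXLAN packets per (outer source IP, outer destination IP) pair."""
    counts = Counter()
    for line in capture_text.splitlines():
        match = OUTER.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


SAMPLE = """\
21:00:33.693658 IP (tos 0x0, ttl 64, id 12605, offset 0, flags [none], proto UDP (17), length 134)
    172.20.137.2.57746 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 64826, offset 0, flags [DF], proto ICMP (1), length 84)
    10.42.39.65 > 10.42.43.5: ICMP echo request, id 13568, seq 0, length 64
21:00:41.957075 IP (tos 0x0, ttl 64, id 13440, offset 0, flags [none], proto UDP (17), length 110)
    172.20.137.2.25072 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 4959, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.24896 > 10.42.43.2.4443: Flags [S], cksum 0x80c1 (correct), seq 3869876183, win 64240, options [mss 1460,sackOK,TS val 1042558583 ecr 0,nop,wscale 7], length 0
"""

for (src, dst), count in summarize(SAMPLE).items():
    print(f"{src} -> {dst}: {count} packet(s)")
```

Running the same summary on the node1 capture below is a quick way to compare the two sides: there the packets with matching IP ids arrive with outer source 10.158.84.15, and node1's own echo request is sent to 172.20.137.2.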

# node1
sudo tcpdump -v -n -i any port 4789
21:00:27.452314 IP (tos 0x0, ttl 64, id 53396, offset 0, flags [none], proto UDP (17), length 134)
    10.158.192.106.46499 > 172.20.137.2.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 818, offset 0, flags [DF], proto ICMP (1), length 84)
    10.42.43.5 > 10.42.39.65: ICMP echo request, id 19968, seq 0, length 64
21:00:33.696577 IP (tos 0x0, ttl 59, id 12605, offset 0, flags [none], proto UDP (17), length 134)
    10.158.84.15.9990 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 64826, offset 0, flags [DF], proto ICMP (1), length 84)
    10.42.39.65 > 10.42.43.5: ICMP echo request, id 13568, seq 0, length 64
21:00:41.956965 IP (tos 0x0, ttl 59, id 13440, offset 0, flags [none], proto UDP (17), length 110)
    10.158.84.15.16675 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 4959, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.24896 > 10.42.43.2.4443: Flags [S], cksum 0x80c1 (correct), seq 3869876183, win 64240, options [mss 1460,sackOK,TS val 1042558583 ecr 0,nop,wscale 7], length 0
21:00:41.957036 IP (tos 0x0, ttl 59, id 13442, offset 0, flags [none], proto UDP (17), length 110)
    10.158.84.15.9211 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 55720, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.59519 > 10.42.43.2.4443: Flags [S], cksum 0x2185 (correct), seq 3304626564, win 64240, options [mss 1460,sackOK,TS val 1042558584 ecr 0,nop,wscale 7], length 0
21:00:41.957055 IP (tos 0x0, ttl 59, id 13441, offset 0, flags [none], proto UDP (17), length 110)
    10.158.84.15.28513 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 52122, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.53379 > 10.42.43.2.4443: Flags [S], cksum 0x35ac (correct), seq 3934746570, win 64240, options [mss 1460,sackOK,TS val 1042558584 ecr 0,nop,wscale 7], length 0
21:00:41.957131 IP (tos 0x0, ttl 59, id 13444, offset 0, flags [none], proto UDP (17), length 110)
    10.158.84.15.20023 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 20034, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.64007 > 10.42.43.2.4443: Flags [S], cksum 0x89de (correct), seq 3678800213, win 64240, options [mss 1460,sackOK,TS val 1042558584 ecr 0,nop,wscale 7], length 0
21:00:41.957325 IP (tos 0x0, ttl 59, id 13443, offset 0, flags [none], proto UDP (17), length 110)
    10.158.84.15.19783 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 16461, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.61275 > 10.42.43.2.4443: Flags [S], cksum 0x9f5c (correct), seq 3114344488, win 64240, options [mss 1460,sackOK,TS val 1042558584 ecr 0,nop,wscale 7], length 0
21:00:42.975254 IP (tos 0x0, ttl 59, id 13498, offset 0, flags [none], proto UDP (17), length 110)
    10.158.84.15.55979 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 16462, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.61275 > 10.42.43.2.4443: Flags [S], cksum 0x9b61 (correct), seq 3114344488, win 64240, options [mss 1460,sackOK,TS val 1042559603 ecr 0,nop,wscale 7], length 0
21:00:42.975403 IP (tos 0x0, ttl 59, id 13500, offset 0, flags [none], proto UDP (17), length 110)
    10.158.84.15.54059 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 4960, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.24896 > 10.42.43.2.4443: Flags [S], cksum 0x7cc5 (correct), seq 3869876183, win 64240, options [mss 1460,sackOK,TS val 1042559603 ecr 0,nop,wscale 7], length 0
21:00:42.975415 IP (tos 0x0, ttl 59, id 13445, offset 0, flags [none], proto UDP (17), length 110)
    10.158.84.15.9051 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 55721, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.59519 > 10.42.43.2.4443: Flags [S], cksum 0x1d8b (correct), seq 3304626564, win 64240, options [mss 1460,sackOK,TS val 1042559602 ecr 0,nop,wscale 7], length 0
21:00:42.975422 IP (tos 0x0, ttl 59, id 13499, offset 0, flags [none], proto UDP (17), length 110)
    10.158.84.15.43789 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 52123, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.53379 > 10.42.43.2.4443: Flags [S], cksum 0x31b1 (correct), seq 3934746570, win 64240, options [mss 1460,sackOK,TS val 1042559603 ecr 0,nop,wscale 7], length 0
21:00:42.975428 IP (tos 0x0, ttl 59, id 13497, offset 0, flags [none], proto UDP (17), length 110)
    10.158.84.15.29861 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 20035, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.64007 > 10.42.43.2.4443: Flags [S], cksum 0x85e4 (correct), seq 3678800213, win 64240, options [mss 1460,sackOK,TS val 1042559602 ecr 0,nop,wscale 7], length 0
21:00:44.991229 IP (tos 0x0, ttl 59, id 13614, offset 0, flags [none], proto UDP (17), length 110)
    10.158.84.15.27707 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 4961, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.24896 > 10.42.43.2.4443: Flags [S], cksum 0x74e6 (correct), seq 3869876183, win 64240, options [mss 1460,sackOK,TS val 1042561618 ecr 0,nop,wscale 7], length 0
21:00:44.991306 IP (tos 0x0, ttl 59, id 13618, offset 0, flags [none], proto UDP (17), length 110)
    10.158.84.15.40853 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 20036, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.64007 > 10.42.43.2.4443: Flags [S], cksum 0x7e03 (correct), seq 3678800213, win 64240, options [mss 1460,sackOK,TS val 1042561619 ecr 0,nop,wscale 7], length 0
21:00:44.991623 IP (tos 0x0, ttl 59, id 13615, offset 0, flags [none], proto UDP (17), length 110)
    10.158.84.15.23041 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 52124, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.53379 > 10.42.43.2.4443: Flags [S], cksum 0x29d2 (correct), seq 3934746570, win 64240, options [mss 1460,sackOK,TS val 1042561618 ecr 0,nop,wscale 7], length 0
21:00:44.991752 IP (tos 0x0, ttl 59, id 13617, offset 0, flags [none], proto UDP (17), length 110)
    10.158.84.15.58719 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 55722, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.59519 > 10.42.43.2.4443: Flags [S], cksum 0x15aa (correct), seq 3304626564, win 64240, options [mss 1460,sackOK,TS val 1042561619 ecr 0,nop,wscale 7], length 0
21:00:44.991777 IP (tos 0x0, ttl 59, id 13616, offset 0, flags [none], proto UDP (17), length 110)
    10.158.84.15.29753 > 10.158.192.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 16463, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.39.64.61275 > 10.42.43.2.4443: Flags [S], cksum 0x9381 (correct), seq 3114344488, win 64240, options [mss 1460,sackOK,TS val 1042561619 ecr 0,nop,wscale 7], length 0
^C
47 packets captured
47 packets received by filter
0 packets dropped by kernel
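
In a capture like the one above, the inner SYNs repeat with the same sequence number and no SYN-ACK appears, which is what TCP retransmission of an unanswered connection attempt looks like. A minimal sketch for counting those attempts per flow follows. It assumes `tcpdump -v` style text as pasted here; the `count_syn_attempts` helper and the `SAMPLE` string are illustrative names and are not part of Calico or tcpdump.

```python
import re
from collections import Counter

# Matches an inner TCP SYN line as printed by tcpdump in verbose mode.
# "Flags [S]," excludes SYN-ACKs, which tcpdump prints as "Flags [S.],".
SYN_RE = re.compile(
    r"^\s*(\d+(?:\.\d+){3})\.(\d+) > (\d+(?:\.\d+){3})\.(\d+): "
    r"Flags \[S\], .*?seq (\d+),"
)


def count_syn_attempts(capture_text):
    """Count how often each (src, sport, dst, dport, seq) SYN appears."""
    counts = Counter()
    for line in capture_text.splitlines():
        m = SYN_RE.match(line)
        if m:
            src, sport, dst, dport, seq = m.groups()
            counts[(src, int(sport), dst, int(dport), int(seq))] += 1
    return counts


# A few inner lines copied from the capture above (hypothetical sample input).
SAMPLE = """\
    10.42.39.64.61275 > 10.42.43.2.4443: Flags [S], cksum 0x9f5c (correct), seq 3114344488, win 64240, options [mss 1460,sackOK,TS val 1042558584 ecr 0,nop,wscale 7], length 0
    10.42.39.64.61275 > 10.42.43.2.4443: Flags [S], cksum 0x9b61 (correct), seq 3114344488, win 64240, options [mss 1460,sackOK,TS val 1042559603 ecr 0,nop,wscale 7], length 0
    10.42.39.64.24896 > 10.42.43.2.4443: Flags [S], cksum 0x7cc5 (correct), seq 3869876183, win 64240, options [mss 1460,sackOK,TS val 1042559603 ecr 0,nop,wscale 7], length 0
    10.42.39.64.61275 > 10.42.43.2.4443: Flags [S], cksum 0x9381 (correct), seq 3114344488, win 64240, options [mss 1460,sackOK,TS val 1042561619 ecr 0,nop,wscale 7], length 0
"""

if __name__ == "__main__":
    for flow, attempts in count_syn_attempts(SAMPLE).items():
        print(flow, attempts)
```

A flow whose count is greater than one and that never shows a `Flags [S.]` reply is a connection attempt that went unanswered.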

Warnings from calico node pods

$ k get pods -n kube-system -lk8s-app=calico-node -o wide
NAME                READY   STATUS    RESTARTS   AGE   IP               NODE             NOMINATED NODE   READINESS GATES
calico-node-pw7l9   1/1     Running   0          71m   10.158.84.15     10.158.84.15     <none>           <none>
calico-node-r5hx5   1/1     Running   0          65m   10.158.192.106   10.158.192.106   <none>           <none>

$ k logs -n kube-system calico-node-pw7l9 | grep -v INFO
CALICO_NETWORKING_BACKEND is vxlan - no need to run a BGP daemon
Calico node started successfully

2021-12-09 19:54:40.601 [ERROR][55] felix/route_table.go 920: Failed to get link attributes error=interface not present ifaceRegex="^vxlan.calico$" ipVersion=0x4
2021-12-09 19:54:40.934 [ERROR][55] felix/route_table.go 920: Failed to get link attributes error=interface not present ifaceRegex="^vxlan.calico$" ipVersion=0x4
2021-12-09 19:54:40.935 [ERROR][55] felix/route_table.go 920: Failed to get link attributes error=interface not present ifaceRegex="^vxlan.calico$" ipVersion=0x4
2021-12-09 19:54:40.965 [ERROR][55] felix/route_table.go 920: Failed to get link attributes error=interface not present ifaceRegex="^vxlan.calico$" ipVersion=0x4
2021-12-09 19:54:41.070 [ERROR][55] felix/route_table.go 920: Failed to get link attributes error=interface not present ifaceRegex="^vxlan.calico$" ipVersion=0x4
2021-12-09 20:02:40.042 [WARNING][55] felix/endpoint_mgr.go 1085: Could not set accept_ra: <nil> ifaceName="calia78822465be"
2021-12-09 20:02:41.727 [WARNING][55] felix/endpoint_mgr.go 1085: Could not set accept_ra: <nil> ifaceName="calia78822465be"

$ k logs -n kube-system calico-node-r5hx5 | grep -v INFO
CALICO_NETWORKING_BACKEND is vxlan - no need to run a BGP daemon
Calico node started successfully

2021-12-09 20:01:15.175 [WARNING][50] felix/int_dataplane.go 444: Can't enable XDP acceleration. error=kernel is too old (have: 4.15.0-159 but want at least: 4.16.0)
2021-12-09 20:01:15.432 [ERROR][50] felix/route_table.go 920: Failed to get link attributes error=interface not present ifaceRegex="^vxlan.calico$" ipVersion=0x4
2021-12-09 20:01:15.622 [ERROR][50] felix/route_table.go 920: Failed to get link attributes error=interface not present ifaceRegex="^vxlan.calico$" ipVersion=0x4
2021-12-09 20:01:15.683 [ERROR][50] felix/route_table.go 920: Failed to get link attributes error=interface not present ifaceRegex="^vxlan.calico$" ipVersion=0x4
2021-12-09 20:01:15.787 [ERROR][50] felix/route_table.go 920: Failed to get link attributes error=interface not present ifaceRegex="^vxlan.calico$" ipVersion=0x4
2021-12-09 20:01:15.894 [ERROR][50] felix/route_table.go 920: Failed to get link attributes error=interface not present ifaceRegex="^vxlan.calico$" ipVersion=0x4
2021-12-09 20:01:15.995 [ERROR][50] felix/route_table.go 920: Failed to get link attributes error=interface not present ifaceRegex="^vxlan.calico$" ipVersion=0x4
2021-12-09 20:01:16.101 [ERROR][50] felix/route_table.go 920: Failed to get link attributes error=interface not present ifaceRegex="^vxlan.calico$" ipVersion=0x4
2021-12-09 20:01:17.769 [WARNING][50] felix/endpoint_mgr.go 1085: Could not set accept_ra: <nil> ifaceName="cali51bfbe3480a"
2021-12-09 20:01:18.082 [WARNING][50] felix/endpoint_mgr.go 1085: Could not set accept_ra: <nil> ifaceName="cali066c9d0d2ec"
2021-12-09 20:01:18.373 [WARNING][50] felix/endpoint_mgr.go 1085: Could not set accept_ra: <nil> ifaceName="calibf3a63224a5"
2021-12-09 20:01:23.404 [WARNING][50] felix/endpoint_mgr.go 1085: Could not set accept_ra: <nil> ifaceName="calibf3a63224a5"
2021-12-09 20:01:23.450 [WARNING][50] felix/endpoint_mgr.go 1085: Could not set accept_ra: <nil> ifaceName="cali066c9d0d2ec"
2021-12-09 20:01:23.803 [WARNING][50] felix/endpoint_mgr.go 1085: Could not set accept_ra: <nil> ifaceName="caliaa03eb96470"
2021-12-09 20:01:24.477 [WARNING][50] felix/endpoint_mgr.go 1085: Could not set accept_ra: <nil> ifaceName="cali51bfbe3480a"
2021-12-09 20:01:29.736 [WARNING][50] felix/endpoint_mgr.go 1085: Could not set accept_ra: <nil> ifaceName="caliaa03eb96470"
2021-12-09 20:01:30.807 [WARNING][50] felix/endpoint_mgr.go 1085: Could not set accept_ra: <nil> ifaceName="cali066c9d0d2ec"
2021-12-09 20:01:31.432 [WARNING][50] felix/endpoint_mgr.go 1085: Could not set accept_ra: <nil> ifaceName="calibf3a63224a5"
2021-12-09 20:02:40.345 [WARNING][50] felix/endpoint_mgr.go 1085: Could not set accept_ra: <nil> ifaceName="calif7a7d5ef8db"
2021-12-09 20:02:42.065 [WARNING][50] felix/endpoint_mgr.go 1085: Could not set accept_ra: <nil> ifaceName="calif7a7d5ef8db"

Observations

  1. node1 -> node2: packets never leave node1
  2. node2 -> node1: packets reach node2, but node2 never responds

song-jiang commented 2 years ago

@chaudhryfaisal Thanks for the details; they are very helpful for understanding the issue.

I think this is working as expected. Calico VXLAN expects node-to-node traffic to arrive without NAT, and it drops any VXLAN packet whose source is not known to Calico. This is a security feature.
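
As a conceptual illustration only, the source check described above can be modelled as an allow-list lookup on the outer source IP. This is a sketch, not Felix's implementation: the real enforcement happens in Calico's dataplane rules, and the `should_accept_vxlan` function, the `KNOWN_NODE_IPS` set, and the translated address `203.0.113.7` are all hypothetical names and values chosen for the example.

```python
import ipaddress
from typing import Iterable


def should_accept_vxlan(outer_src: str, known_node_ips: Iterable[str]) -> bool:
    """Return True only if the packet's outer source IP is a known node address."""
    known = {ipaddress.ip_address(ip) for ip in known_node_ips}
    return ipaddress.ip_address(outer_src) in known


# Node addresses as the cluster knows them (taken from the pod listing above).
KNOWN_NODE_IPS = {"10.158.84.15", "10.158.192.106"}

if __name__ == "__main__":
    # Packet arriving with the sender's registered node IP.
    print(should_accept_vxlan("10.158.84.15", KNOWN_NODE_IPS))
    # Packet whose source was rewritten by a NAT device (hypothetical address).
    print(should_accept_vxlan("203.0.113.7", KNOWN_NODE_IPS))
```

Under this model, any NAT that rewrites the outer source address to something other than the registered node IP causes the packet to be rejected, regardless of whether the encapsulated traffic is legitimate.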

chaudhryfaisal commented 2 years ago

Yes, it seems that Calico VXLAN or IPIP+BGP expects node-to-node traffic without NAT. I was hoping there was some configuration available to accomplish this.

caseydavenport commented 2 years ago

What would an enhancement to support this look like?