k8snetworkplumbingwg / ovs-cni

Open vSwitch CNI plugin
Apache License 2.0

debug the network connectivity issue between nodes #275

Closed: zeddit closed this issue 1 year ago

zeddit commented 1 year ago

I am trying Multus with ovs-cni using the following configuration, but a pod cannot ping pods on other nodes.

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ovs-net-2-vlan
  annotations:
    k8s.v1.cni.cncf.io/resourceName: ovs-cni.network.kubevirt.io/br1
spec:
  config: '{
      "cniVersion": "0.4.0",
      "type": "ovs",
      "bridge": "br1",
      "ipam": {
        "type": "static"
      }
  }'
---
apiVersion: v1
kind: Pod
metadata:
  name: p
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
        {
          "name":"ovs-net-2-vlan",
          "ips": ["10.0.31.11/24"]
        }
]'
spec:
  containers:
  - name: samplepod
    command: ["sleep", "99999"]
    image: alpine
  nodeSelector:
    kubernetes.io/hostname: test-master-cpu01
---
apiVersion: v1
kind: Pod
metadata:
  name: q
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
        {
          "name":"ovs-net-2-vlan",
          "ips": ["10.0.31.12/24"]
        }
]'
spec:
  containers:
  - name: samplepod
    command: ["sleep", "99999"]
    image: alpine
  nodeSelector:
    kubernetes.io/hostname: test-kworker1-cpu01
    #kubernetes.io/hostname: test-master-cpu01

However, when the two pods are on the same host, they can ping each other. I don't know how to debug the problem.

My OVS installation steps and configuration are listed below.

# Step 1: prepare the OVS environment on the host (Ubuntu 20.04)
apt install openvswitch-switch  # ovs_version: "2.13.8"
systemctl start openvswitch-switch

ovs-vsctl add-br br1
ovs-vsctl add-port br1 ens160
ip addr flush ens160
# modify the netplan config to use the bridge br1
netplan apply
# after that, the host can reach the outside network, and a pod deployed on the host can also reach the outside world.

# Step 2: install ovs-cni
kubectl apply -f https://github.com/kubevirt/cluster-network-addons-operator/releases/download/v0.89.0/namespace.yaml
kubectl apply -f https://github.com/kubevirt/cluster-network-addons-operator/releases/download/v0.89.0/network-addons-config.crd.yaml
kubectl apply -f https://github.com/kubevirt/cluster-network-addons-operator/releases/download/v0.89.0/operator.yaml

# then apply the following NetworkAddonsConfig
apiVersion: networkaddonsoperator.network.kubevirt.io/v1
kind: NetworkAddonsConfig
metadata:
  name: cluster
spec:
  ovs: {}
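A quick way to confirm steps 1 and 2 is sketched below. It is a minimal sketch, assuming the bridge and uplink names used above (`br1`, `ens160`) and that the operator deploys its components into the `cluster-network-addons` namespace; the function names `check_node_ovs` and `check_ovs_cni` are hypothetical helpers, not part of ovs-cni.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the setup above: run check_node_ovs on
# every node, and check_ovs_cni once from a machine with kubectl access.

# check_node_ovs [BRIDGE] [UPLINK]: confirm the host-side OVS wiring.
check_node_ovs() {
  local bridge="${1:-br1}" uplink="${2:-ens160}"
  # The physical NIC must be a port on the bridge; otherwise traffic from
  # pods attached to the bridge has no way off the node.
  if ! ovs-vsctl list-ports "$bridge" | grep -qx "$uplink"; then
    echo "WARNING: $uplink is not attached to $bridge" >&2
  fi
  # After 'ip addr flush', the host address should be on the bridge,
  # and the uplink itself should be UP with no address.
  ip -br addr show dev "$bridge"
  ip -br addr show dev "$uplink"
}

# check_ovs_cni [BRIDGE]: confirm the operator has rolled out ovs-cni.
check_ovs_cni() {
  local bridge="${1:-br1}"
  kubectl wait networkaddonsconfig cluster --for condition=Available --timeout=120s
  kubectl get pods -n cluster-network-addons -o wide
  # The ovs-cni marker advertises each bridge as a node resource; the
  # resourceName annotation on the NetworkAttachmentDefinition relies on it.
  kubectl describe nodes | grep "ovs-cni.network.kubevirt.io/$bridge"
}
```

For example, `check_node_ovs br1 ens160` run on both test-master-cpu01 and test-kworker1-cpu01 should report `ens160` attached to `br1` on each node without printing the warning.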

The outside network configuration: the network between nodes uses a VLAN, but the VLAN is configured at the ESXi layer (my hosts are VMs on ESXi), so I assume the VLAN tag is added automatically.

Could you please give me some advice on how to debug this issue? Many thanks.
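One way to narrow down where a cross-node ping is lost is to watch the physical uplink on both nodes while pod `p` pings pod `q`. This is a minimal sketch, assuming the uplink is `ens160` and the pod subnet is `10.0.31.0/24` as in the manifests above; `trace_pod_traffic` and `start_cross_node_ping` are hypothetical helper names. If ARP requests leave node A's `ens160` but never show up on node B's, frames are being dropped between the VMs; if they never appear on node A's `ens160`, the bridge itself is the place to look. The `-e` flag also prints the link-level header, so you can see whether frames on the uplink carry an 802.1Q tag.

```shell
#!/usr/bin/env bash
# trace_pod_traffic [UPLINK] [SUBNET]: run as root on each node while the
# ping is running. Prints MAC headers (-e), skips name resolution (-n)
# and stops after 20 matching packets (-c 20).
trace_pod_traffic() {
  local uplink="${1:-ens160}" subnet="${2:-10.0.31.0/24}"
  tcpdump -e -n -i "$uplink" -c 20 "arp or (icmp and net $subnet)"
}

# start_cross_node_ping [SRC_POD] [DST_IP]: generate traffic from a pod.
start_cross_node_ping() {
  local src="${1:-p}" dst="${2:-10.0.31.12}"
  kubectl exec "$src" -- ping -c 5 "$dst"
}
```

Start `trace_pod_traffic` on both nodes first, then run `start_cross_node_ping p 10.0.31.12` from a third terminal.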

Also, is there a Slack channel or Discord server where I can ask questions?

zeddit commented 1 year ago

I ran some more experiments, mainly testing OVS itself. They are shown below.

# on one host
ip netns add ovs
ip link add dev vm1 type veth peer name vm2
ip link set vm2 netns ovs
ovs-vsctl add-port br1 vm1
ip link set dev vm1 up  # the host-side veth end must be up too, otherwise vm2 has no carrier
ip netns exec ovs ip addr add 10.0.14.1/24 dev vm2
ip netns exec ovs ip link set dev vm2 up
ip netns exec ovs bash
ping -I vm2 10.0.0.1
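One detail worth checking in this experiment: `vm2` is given `10.0.14.1/24`, but the ping target `10.0.0.1` is not inside `10.0.14.0/24`, and no default route is added in the namespace. A small helper for this kind of check is sketched below; `ip_to_int` and `same_subnet` are hypothetical names, only IPv4 is handled, and addresses are assumed to have no leading zeros.

```shell
#!/usr/bin/env bash
# ip_to_int A.B.C.D: print an IPv4 address as a 32-bit integer.
ip_to_int() {
  local IFS=.
  # Split the dotted quad into $1..$4.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_subnet ADDR/PREFIX OTHER: exit 0 if OTHER lies inside ADDR/PREFIX.
same_subnet() {
  local addr=${1%/*} prefix=${1#*/} a b mask
  # Netmask for the prefix length; /0 is special-cased to avoid a 32-bit shift.
  mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  a=$(ip_to_int "$addr")
  b=$(ip_to_int "$2")
  [ $(( a & mask )) -eq $(( b & mask )) ]
}
```

With these definitions, `same_subnet 10.0.14.1/24 10.0.0.1` fails, while `same_subnet 10.0.31.11/24 10.0.31.12` (the two pod addresses from the manifests) succeeds.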

However, the packets do not leave the host, and when I capture with tcpdump on the vm2 interface, I cannot see a single ICMP or ARP packet.

It seems OVS does not forward ARP packets, so my interface cannot use the L2 network.
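Before concluding that OVS drops ARP, it can help to look at the bridge's own view of the traffic. This is a minimal sketch using standard `ovs-ofctl` and `ovs-appctl` commands, assuming the bridge is `br1`; `inspect_ovs_forwarding` is a hypothetical helper name. On a bridge created with plain `ovs-vsctl add-br` and no OpenFlow controller, `dump-flows` normally shows a single `actions=NORMAL` flow, meaning the bridge behaves like an ordinary MAC-learning switch.

```shell
#!/usr/bin/env bash
# inspect_ovs_forwarding [BRIDGE]: show how the bridge is forwarding.
# Run it while a ping is in progress, then again a few seconds later.
inspect_ovs_forwarding() {
  local bridge="${1:-br1}"
  # OpenFlow port numbers and names (needed to read the output below).
  ovs-ofctl show "$bridge"
  # Flow table; n_packets on the NORMAL flow should grow during the ping.
  ovs-ofctl dump-flows "$bridge"
  # MAC learning table: which MAC was last seen on which port.
  ovs-appctl fdb/show "$bridge"
  # Per-port rx/tx/drop counters; compare two runs to see which port
  # receives the ARP requests and which one (if any) transmits them.
  ovs-ofctl dump-ports "$bridge"
}
```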

[Screenshot 2023-08-18 00:30:25]

I think the problem is with my OVS setup, but I don't know how to debug it or connect it to the outside world. Could you give me some advice? Thanks a lot.

zeddit commented 1 year ago

Now I am wondering why pods on the same host can talk to each other through OVS. I am surprised that I see no ARP reply, yet the pod's ARP table still gets populated. So I think some controller must be setting up the ARP table inside the pod.
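To see what the pod actually resolved, you can compare the peer pod's interface MAC with the entry in the pod's neighbour table. This is a minimal sketch, assuming the secondary interface has Multus's default name `net1` and that the alpine image's BusyBox `ip` applet supports `neigh`; `compare_pod_macs` is a hypothetical helper name. If `p`'s entry for `q`'s address shows `q`'s MAC, an ARP reply did reach `p`. Note that ARP replies between two pods on the same node are unicast frames switched inside `br1`, so they will not show up in a capture on the uplink.

```shell
#!/usr/bin/env bash
# compare_pod_macs [SRC_POD] [DST_POD] [IFNAME]: print DST's MAC on IFNAME
# and SRC's neighbour (ARP) table for the same interface name.
compare_pod_macs() {
  local src="${1:-p}" dst="${2:-q}" ifname="${3:-net1}"
  echo "== $dst: $ifname link =="
  kubectl exec "$dst" -- ip link show "$ifname"
  echo "== $src: neighbour table on $ifname =="
  kubectl exec "$src" -- ip neigh show dev "$ifname"
}
```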

[Screenshot 2023-08-18 00:53:42]

Now I want to know how to set up connectivity between hosts through OVS. What are the recommended settings and best practices? Documentation on this is lacking and it is quite difficult for newcomers; I could help with documentation on setting up Open vSwitch on Ubuntu.

zeddit commented 1 year ago

I figured it out. It's a problem with ESXi.
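The thread does not record which ESXi setting was at fault. For readers hitting the same symptom: when a VM bridges traffic for MAC addresses other than its own vNIC's (as `br1` does for pod interfaces), the vSwitch port group's security policy (promiscuous mode, MAC address changes, forged transmits) is a common place to look. The sketch below is hedged: it assumes a standard vSwitch and is run in the ESXi shell; the port group name `k8s-nodes` is a placeholder, the function names are hypothetical, and distributed switches are configured differently (through vCenter).

```shell
#!/bin/sh
# Run on the ESXi host. "k8s-nodes" is a placeholder for the port group
# the Kubernetes node VMs are attached to.

# show_pg_security [PORTGROUP]: print the port group's security policy.
show_pg_security() {
  esxcli network vswitch standard portgroup policy security get \
    --portgroup-name="${1:-k8s-nodes}"
}

# allow_nested_bridging [PORTGROUP]: let the VMs send and receive frames for
# MACs other than their own vNIC MAC. Promiscuous mode lets every VM on the
# port group see all of its traffic.
allow_nested_bridging() {
  esxcli network vswitch standard portgroup policy security set \
    --portgroup-name="${1:-k8s-nodes}" \
    --allow-promiscuous=true \
    --allow-mac-change=true \
    --allow-forged-transmits=true
}
```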