smartxworks / virtink

Lightweight Virtualization Add-on for Kubernetes
Apache License 2.0

VM network isn't reachable with flannel CNI #40

Closed kelvich closed 2 years ago

kelvich commented 2 years ago

I've tried to get Virtink running on stock k3s and can't connect to the VM:

curl -sfL https://get.k3s.io | K3S_KUBECONFIG_MODE="644" sh -
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.2/cert-manager.yaml
kubectl apply -f https://github.com/smartxworks/virtink/releases/download/v0.10.0/virtink.yaml
cat <<EOF | kubectl apply -f -
apiVersion: virt.virtink.smartx.com/v1alpha1
kind: VirtualMachine
metadata:
  name: ubuntu-container-rootfs
spec:
  instance:
    memory:
      size: 1Gi
    kernel:
      image: smartxworks/virtink-kernel-5.15.12
      cmdline: "console=ttyS0 root=/dev/vda rw"
    disks:
      - name: ubuntu
      - name: cloud-init
    interfaces:
      - name: pod
  volumes:
    - name: ubuntu
      containerRootfs:
        image: smartxworks/virtink-container-rootfs-ubuntu
        size: 4Gi
    - name: cloud-init
      cloudInit:
        userData: |-
          #cloud-config
          password: password
          chpasswd: { expire: False }
          ssh_pwauth: True
  networks:
    - name: pod
      pod: {}
EOF
export VM_NAME=ubuntu-container-rootfs
export VM_POD_NAME=$(kubectl get vm $VM_NAME -o jsonpath='{.status.vmPodName}')
export VM_IP=$(kubectl get pod $VM_POD_NAME -o jsonpath='{.status.podIP}')
kubectl run ssh-$VM_NAME --rm --image=alpine --restart=Never -it -- /bin/sh -c "apk add openssh-client && ssh ubuntu@$VM_IP"
If you don't see a command prompt, try pressing enter.
ssh: connect to host 10.42.0.14 port 22: Host is unreachable

Switching to Calico helps (saw that in e2e tests).

scuzhanglei commented 2 years ago

@kelvich Thanks for your report; we are looking into this issue. In the meantime there is a workaround: as of v0.10.0, released last week, you can use a VM with the masquerade binding method. I have tested it with flannel and it works well.

kelvich commented 2 years ago

@scuzhanglei Thank you, I can confirm that masquerade works for me with k3s + flannel.

As a nice side effect, it also gets past the "migration is disabled when VM has a bridged interface to the pod network" check when creating a migration.

scuzhanglei commented 2 years ago

@kelvich I found that setting flannel's hairpinMode: false also works. You can configure it by updating the kube-flannel/kube-flannel-cfg configmap and restarting all flannel pods. This is a workaround; we will try to find a better solution.
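
For anyone applying this workaround, the change itself is a one-field edit to the CNI config stored in that configmap. Below is a minimal Python sketch of the edit. The sample config is an assumption modeled on the upstream kube-flannel manifest (the `cni-conf.json` key), and `disable_hairpin` is a hypothetical helper, so compare it against your own configmap before relying on it.

```python
import json

# Sample CNI config. ASSUMPTION: modeled on the upstream kube-flannel manifest;
# the contents of your own kube-flannel-cfg configmap may differ.
SAMPLE_CNI_CONF = """
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {"type": "flannel", "delegate": {"hairpinMode": true, "isDefaultGateway": true}},
    {"type": "portmap", "capabilities": {"portMappings": true}}
  ]
}
"""


def disable_hairpin(cni_conf_json: str) -> str:
    """Return the CNI config with hairpinMode set to false on the flannel plugin.

    Hypothetical helper for illustration; other plugins are left untouched.
    """
    conf = json.loads(cni_conf_json)
    for plugin in conf.get("plugins", []):
        if plugin.get("type") == "flannel":
            plugin.setdefault("delegate", {})["hairpinMode"] = False
    return json.dumps(conf, indent=2)


if __name__ == "__main__":
    print(disable_hairpin(SAMPLE_CNI_CONF))
```

The flannel pods still need to be restarted afterwards, as noted above, so that the updated config is written out on each node.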

carezkh commented 2 years ago

Using the flannel CNI with hairpinMode: true affects the MAC learning table of the bridge in the virt-prerunner pod.

The network model when using the Virtink bridge network with the flannel CNI looks like this:

[flannel bridge cni0]<--->[veth-xx]<--->[eth0-nic]<--->[bridge br-eth0]<--->[tap-eth0]<--->[eth0]
^-------host------------------------^ ^---------------pod------------------------------^ ^--vm--^

During VM startup, cloud-init configures the VM's IP address (in the cloud-init init-network step). The VM sends a DHCP request as a broadcast packet, which is received by the DHCP server listening on br-eth0. When a Linux bridge receives a packet with a new source MAC address on a particular port, it stores the MAC address along with that port in its MAC learning table. At this point, the MAC table of br-eth0 contains the entry vm-eth0-MAC via tap-eth0. br-eth0 also forwards the DHCP broadcast out of eth0-nic; the flannel bridge cni0 receives it and sends it back to veth-xx, because hairpin mode is turned on for port veth-xx. br-eth0 then receives the DHCP broadcast again, this time on eth0-nic, and updates the MAC table entry to vm-eth0-MAC via eth0-nic. As a result, the DHCP server's reply is sent out through port eth0-nic and never reaches the DHCP client in the VM.
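
The sequence above can be reproduced with a toy model of a learning bridge. This is only a sketch for illustration, not Virtink or kernel code: `LearningBridge` and `dhcp_reply_port` are hypothetical names, and the hairpin reflection by cni0 is modeled simply as the same frame arriving back on eth0-nic.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningBridge:
    """Toy model of a Linux bridge with MAC learning (no hairpin on its own ports)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.fdb = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Learn src_mac on in_port and return the ports the frame is sent out of."""
        self.fdb[src_mac] = in_port
        out_port = self.fdb.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # Broadcast or unknown destination: flood to every other port.
            return [p for p in self.ports if p != in_port]
        return [] if out_port == in_port else [out_port]


def dhcp_reply_port(cni0_hairpin):
    """Return the br-eth0 port the DHCP reply to the VM would be sent out of."""
    vm_mac = "vm-eth0-MAC"
    br_eth0 = LearningBridge(["tap-eth0", "eth0-nic"])

    # 1. The VM's DHCP broadcast enters br-eth0 through tap-eth0.
    flooded = br_eth0.receive("tap-eth0", vm_mac, BROADCAST)

    # 2. The flooded copy leaves via eth0-nic; with hairpin mode on, cni0
    #    reflects it and it re-enters br-eth0 on eth0-nic.
    if cni0_hairpin and "eth0-nic" in flooded:
        br_eth0.receive("eth0-nic", vm_mac, BROADCAST)

    # 3. The DHCP server on br-eth0 replies using the learned entry.
    return br_eth0.fdb[vm_mac]


if __name__ == "__main__":
    print("hairpin on :", dhcp_reply_port(True))   # eth0-nic (reply misses the VM)
    print("hairpin off:", dhcp_reply_port(False))  # tap-eth0 (reply reaches the VM)
```

With hairpin on, the last frame seen from the VM's MAC arrived on eth0-nic, so the entry points away from the VM, which matches the "Host is unreachable" symptom in the original report.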

For now you can disable flannel's hairpin mode, but the VM will then be unable to access services that point to itself. For example, if you start an nginx VM and create a service for it, you cannot access that service from inside the nginx VM.

You can also drop the DHCP broadcast packets forwarded by br-eth0, but this does not solve the problem completely: ARP broadcasts also affect the br-eth0 MAC learning table, and those packets cannot be discarded.

A reasonable solution might be to disable MAC learning on port eth0-nic of br-eth0. The bridge has only two ports (eth0-nic and tap-eth0) and hairpin mode on br-eth0 is off, so a packet received on one port must be sent out of the other; disabling MAC learning there should do no harm.
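
To illustrate why this should be safe, here is a small Python sketch of a two-port bridge with a per-port learning switch. `TwoPortBridge` is a hypothetical toy model, not the actual fix: it only shows that with learning off on eth0-nic a reflected broadcast can no longer overwrite the VM's entry, and that frames to never-learned MACs still reach the other port through flooding.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class TwoPortBridge:
    """Toy two-port bridge where MAC learning can be switched off per port."""

    def __init__(self, port_a, port_b, learning):
        self.ports = (port_a, port_b)
        self.learning = dict(learning)  # port -> whether source MACs are learned
        self.fdb = {}                   # MAC address -> port

    def receive(self, in_port, src_mac, dst_mac):
        """Handle one frame and return the list of ports it is sent out of."""
        if self.learning[in_port]:
            self.fdb[src_mac] = in_port
        out_port = self.fdb.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # Flooding on a two-port bridge is just "send to the other port".
            return [p for p in self.ports if p != in_port]
        return [] if out_port == in_port else [out_port]


if __name__ == "__main__":
    br = TwoPortBridge("tap-eth0", "eth0-nic",
                       learning={"tap-eth0": True, "eth0-nic": False})
    br.receive("tap-eth0", "vm-eth0-MAC", BROADCAST)  # VM's DHCP broadcast
    br.receive("eth0-nic", "vm-eth0-MAC", BROADCAST)  # copy reflected by cni0
    print(br.fdb)  # {'vm-eth0-MAC': 'tap-eth0'}
    # Outbound unicast to a MAC that was never learned still goes out eth0-nic.
    print(br.receive("tap-eth0", "vm-eth0-MAC", "gateway-MAC"))  # ['eth0-nic']
```

On a real system the per-port knob in iproute2 is `bridge link set dev eth0-nic learning off`; whether Virtink ends up using that or something else is not decided in this thread.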