projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0
5.99k stars 1.33k forks

KubeVirt live migration fails with a fixed IP address in IPIP mode. #8663

Closed GaoChX closed 4 months ago

GaoChX commented 7 months ago

Expected Behavior

When a KubeVirt virtual machine is migrated, a virt-launcher pod with the same IP address is created on another node and the migration process begins. After the migration completes, the old pod changes to the 'Completed' status.

Current Behavior

Migration works normally in VXLAN mode but fails in IPIP mode: the new virt-launcher pod cannot be created, with the error shown under Context.

Possible Solution

The issue lies in this line of code: (https://github.com/GaoChX/calico/blob/71d6f8385a6272fc517e192fabc0898f7f565792/cni-plugin/pkg/dataplane/linux/dataplane_linux.go#L346)

In VXLAN mode, the remote route maintained on the other compute node is a subnet route, but in IPIP mode it is a /32 host route maintained by BIRD. Adding a route in this manner therefore conflicts with the existing /32 route, and an error is returned.

Steps to Reproduce (for bugs)

  1. Configure Calico to run in IPIP mode.
  2. Create a KubeVirt pod with a fixed IP address.
  3. Ensure that there are more than two nodes available.
  4. Try to migrate the virtual machine.

Context

Here is the error log:

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "206aa4ad8adc7966a48670507ec39cda27b4a13703905d5ca6499d7d33e8cc61": plugin type="multus" name="multus-cni-network" failed (add): [xcloud-default/virt-launcher-gcx-test-lhqmh/548833b8-b463-44ab-82c3-82000adc737b:gcx-test-nad-pod-network-eth1]: error adding container to network "gcx-test-nad-pod-network-eth1": error adding host side routes for interface: cali8ba68918cba, error: route (Ifindex: 57, Dst: 100.66.0.2/32, Scope: link) already exists for an interface other than 'cali8ba68918cba': route (Ifindex: 8, Dst: 100.66.0.2/32, Scope: universe, Iface: tunl0)

I tried replacing RouteAdd with RouteReplace, and it worked very well.

Your Environment

caseydavenport commented 6 months ago

"In VXLAN mode, the remote route maintained on the other compute node is a subnet route, but in IPIP mode it is a /32 host route maintained by BIRD"

The remote route maintained by BIRD should also be a subnet for the IPAM block that contains the route, not a /32.

The main cases where you should see /32 routes are when you have over-provisioned your IP pool (resulting in IP borrowing from other nodes) or when, e.g., the IP pool itself has been deleted.

I think the first step is figuring out why you're seeing a /32 route advertised in this case instead of using the aggregated route.

tomastigera commented 4 months ago

@GaoChX do you have any more information? I am closing this for now, but feel free to reopen it if you make any relevant progress.