kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

[BUG] IP conflict when kube-ovn-controller restart #4304

Open oilbeater opened 2 months ago

oilbeater commented 2 months ago

Kube-OVN Version

v1.13.0

Kubernetes Version

v1.29.2

Operation-system/Kernel Version

"Ubuntu 22.04.4 LTS" 6.5.0-1023-gcp

Description

When kube-ovn-controller is down and a Pod is deleted, the Pod's lsp and IP CRD still exist for a while after kube-ovn-controller starts again. If a new Pod is allocated the same IP during that window, the two conflict.

The lsp gc was removed from controller startup because it can take a long time and delays the recovery of other Pods after an incident in which lots of Pods are deleted while kube-ovn-controller is down. The ipam init now recovers information only from Pod annotations, so it will still hand out the IP addresses of deleted Pods even though those IPs may still be in use by an lsp.
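To make the mechanism above concrete, here is a toy model in Python. It is not Kube-OVN code: `ToyIPAM`, `init_from_pods`, `reserve`, and `simulate` are hypothetical names invented for this sketch, and the "consult leftover lsps" branch is shown only for contrast, not as the fix the project has adopted.

```python
# Toy model of the startup behaviour described above; NOT Kube-OVN code.
# Every name here is hypothetical and exists only for illustration.


class ToyIPAM:
    def __init__(self):
        self.used = {}  # ip -> owner name

    def init_from_pods(self, pod_ips):
        """Rebuild state only from live Pods' annotations (pod -> ip)."""
        for pod, ip in pod_ips.items():
            self.used[ip] = pod

    def reserve(self, lsp_ips):
        """Mark IPs still held by leftover lsps (lsp -> ip) as used."""
        for lsp, ip in lsp_ips.items():
            self.used.setdefault(ip, lsp)

    def allocate_static(self, pod, ip):
        """Hand out a requested static IP unless it is already recorded."""
        if ip in self.used:
            raise ValueError(f"{ip} already used by {self.used[ip]}")
        self.used[ip] = pod
        return ip


def simulate(consult_lsps):
    """Return True if PodB ends up sharing an IP with a stale lsp."""
    stale_lsps = {"nginx.default": "10.16.0.10"}  # PodA's lsp, not yet gc'ed
    live_pods = {}  # PodA was deleted while the controller was down

    ipam = ToyIPAM()
    ipam.init_from_pods(live_pods)
    if consult_lsps:
        ipam.reserve(stale_lsps)

    try:
        ipam.allocate_static("nginx5.default", "10.16.0.10")
    except ValueError:
        return False  # allocation refused, so no conflict
    return "10.16.0.10" in stale_lsps.values()


if __name__ == "__main__":
    print("annotations only ->", "conflict" if simulate(False) else "no conflict")
    print("also check lsps  ->", "conflict" if simulate(True) else "no conflict")
```

In this model, rebuilding state from Pod annotations alone lets the second Pod take an IP that a stale lsp still holds, which is the window the issue describes.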

Steps To Reproduce

  1. Create PodA with a static IP
  2. Scale kube-ovn-controller down to 0
  3. Delete PodA
  4. Scale kube-ovn-controller up to 1
  5. Create PodB with the same static IP as PodA

Current Behavior

The Pod cannot run, and kubectl ko nbctl show lists the conflicting lsps:

    port nginx.default
        addresses: ["3a:11:8e:0b:1f:6a 10.16.0.10"]
    port nginx5.default
        addresses: ["96:db:65:72:ba:46 10.16.0.10"]
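For a larger cluster, duplicates like the one above are hard to spot by eye. The following Python sketch scans text in the same shape as the excerpt above and reports any IP listed on more than one port. `find_ip_conflicts` is a hypothetical helper, not part of Kube-OVN or the kubectl ko plugin, and it assumes each addresses entry has the form "MAC IP [IP ...]" as in the output shown here.

```python
import re
from collections import defaultdict


def find_ip_conflicts(show_output):
    """Return {ip: [port, ...]} for every IP listed on more than one port."""
    ports_by_ip = defaultdict(list)
    current_port = None
    for line in show_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("port "):
            current_port = stripped.split(None, 1)[1]
        elif stripped.startswith("addresses:") and current_port:
            # Each quoted entry is assumed to look like "MAC IP [IP ...]";
            # entries without an IP (a single word) contribute nothing.
            for entry in re.findall(r'"([^"]*)"', stripped):
                for ip in entry.split()[1:]:
                    ports_by_ip[ip].append(current_port)
    return {ip: ports for ip, ports in ports_by_ip.items() if len(ports) > 1}


SAMPLE = """
    port nginx.default
        addresses: ["3a:11:8e:0b:1f:6a 10.16.0.10"]
    port nginx5.default
        addresses: ["96:db:65:72:ba:46 10.16.0.10"]
"""

if __name__ == "__main__":
    for ip, ports in find_ip_conflicts(SAMPLE).items():
        print(ip, "is claimed by", ", ".join(ports))
```

The output of kubectl ko nbctl show could be saved to a file and passed to the helper as a string; the sample embedded in the sketch is the excerpt from this report.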

Expected Behavior

The Pod runs normally.

github-actions[bot] commented 3 weeks ago

Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.