When kube-ovn-controller is down and a Pod is deleted, the LSP and IP CRD still exist for a while after kube-ovn-controller starts again. If a new Pod is allocated the same IP, we hit a conflict.
The LSP GC was removed from controller startup because it may take a long time and prevent other Pods from recovering from an incident where many Pods are deleted while kube-ovn-controller is down. And IPAM init now only recovers information from Pod annotations, so it will still allocate the IP addresses of deleted Pods even though those IPs may still be used by stale LSPs.
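The race can be illustrated with a minimal sketch. This is not the real Kube-OVN code: the two maps and the `allocate` function are hypothetical stand-ins that only show why rebuilding IPAM state from Pod annotations alone re-allocates an IP that a stale LSP still holds.

```go
package main

import "fmt"

// staleLSPs stands in for LSPs left in the OVN NB database for Pods that
// were deleted while kube-ovn-controller was down. IPAM init does not
// consult these. (Hypothetical data for illustration.)
var staleLSPs = map[string]string{"10.16.0.10": "nginx.default"}

// annotatedPods stands in for IPs recoverable from live Pod annotations.
// The deleted PodA is absent, so its IP looks free to the allocator.
var annotatedPods = map[string]string{}

// allocate reports whether a static IP can be handed out and, if a stale
// LSP still owns it, which port ends up with a duplicate address in OVN.
func allocate(ip string) (ok bool, conflict string) {
	if _, held := annotatedPods[ip]; held {
		return false, "" // visible to IPAM, properly rejected
	}
	if owner, held := staleLSPs[ip]; held {
		return true, owner // IPAM says free, but OVN disagrees
	}
	return true, ""
}

func main() {
	ok, dup := allocate("10.16.0.10")
	fmt.Printf("allocated=%v duplicateOf=%s\n", ok, dup)
	// Prints: allocated=true duplicateOf=nginx.default
}
```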
Steps To Reproduce
Create a PodA with a static IP
Scale the kube-ovn-controller down to 0
Delete the PodA
Scale the kube-ovn-controller up to 1
Create a PodB with the same static IP as PodA
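The steps above can be scripted roughly as follows. The Pod names, namespace, and IP match the output below; the `ovn.kubernetes.io/ip_address` annotation is the usual Kube-OVN way to request a static IP. This is a sketch and assumes PodA (`nginx`) was created beforehand with the same annotation.

```shell
# Scale the controller down so it cannot observe the deletion.
kubectl -n kube-system scale deployment kube-ovn-controller --replicas=0

# Delete PodA while the controller is down; its LSP and IP CR remain.
kubectl delete pod nginx --wait=true

# Bring the controller back; IPAM init recovers only from Pod annotations.
kubectl -n kube-system scale deployment kube-ovn-controller --replicas=1

# Create PodB requesting PodA's old static IP.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: nginx5
  annotations:
    ovn.kubernetes.io/ip_address: 10.16.0.10
spec:
  containers:
  - name: nginx
    image: nginx
EOF
```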
Current Behavior
The Pod cannot run, and kubectl ko nbctl show displays the conflicting LSPs:
port nginx.default
addresses: ["3a:11:8e:0b:1f:6a 10.16.0.10"]
port nginx5.default
addresses: ["96:db:65:72:ba:46 10.16.0.10"]
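As a manual workaround (an assumption on my part, not an official fix), the stale port and its IP custom resource can be removed so the address is no longer duplicated. `kubectl ko` wraps `ovn-nbctl`, and Kube-OVN names IP CRs `<pod>.<namespace>`:

```shell
# Delete the stale logical switch port left over from the deleted PodA.
kubectl ko nbctl lsp-del nginx.default

# Delete the matching IP custom resource so IPAM forgets the allocation.
kubectl delete ip nginx.default

# Recreate PodB so it gets a non-conflicting port.
kubectl delete pod nginx5
```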
Kube-OVN Version
v1.13.0
Kubernetes Version
v1.29.2
Operation-system/Kernel Version
"Ubuntu 22.04.4 LTS" 6.5.0-1023-gcp
Expected Behavior
Pod can run normally