```go
func (c *controller) PatrollerDo() {
	// step1: get pList
	pList, err := c.podLister.List(util.GetSuperClusterListerLabelsSelector())
	if err != nil {
		// ...
	}
	for _, cluster := range clusterNames {
		// step2: get vList
		vList := &corev1.PodList{}
		if err := c.MultiClusterController.List(cluster, vList); err != nil {
			// ...
		}
	}
}
```
Here, pList and vList are read from two caches in two separate steps, at different points in time. As the number of clusters grows, the gap between the two snapshots widens. A pPod created while the Checker is running may therefore be missing from pList even though its vPod already exists and is bound. This triggers the pod force-delete logic.
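The effect can be simulated with plain maps and slices (a toy model, not the real syncer types): any vPod that appears in the per-cluster vList but was created after the pList snapshot looks like an orphan to the checker.

```go
package main

import "fmt"

// missingFromSnapshot returns the vPods the checker would treat as orphans:
// present in vList but absent from the earlier pList snapshot. The data
// structures are illustrative stand-ins for the syncer's caches.
func missingFromSnapshot(pList map[string]bool, vList []string) []string {
	var orphans []string
	for _, v := range vList {
		if !pList[v] {
			orphans = append(orphans, v)
		}
	}
	return orphans
}

func main() {
	// Step 1: pPods snapshotted from the super-cluster lister at t0.
	pList := map[string]bool{"vc-1/pod-a": true}

	// Step 2: vPods listed per cluster at t1 > t0; vc-2/pod-b was created
	// and bound in between, so its pPod is not in the snapshot.
	vList := []string{"vc-1/pod-a", "vc-2/pod-b"}

	fmt.Println(missingFromSnapshot(pList, vList)) // the false "orphan"
}
```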
```go
// pPod not found and vPod still exists, the pPod may be deleted manually or by controller pod eviction.
// If the vPod has not been bound yet, we can create pPod again.
// If the vPod has been bound, we'd better delete the vPod since the new pPod may have a different nodename.
if isPodScheduled(vPod) {
	c.forceDeleteVPod(vObj.GetOwnerCluster(), vPod, false)
	return
}
```
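One way to close the window, sketched here with an illustrative decision function rather than the real syncer API, is to confirm against the live API server before force-deleting: a miss in the cache snapshot alone is not enough.

```go
package main

import "fmt"

// shouldForceDeleteVPod sketches the proposed double check: a bound vPod
// whose pPod is missing from the checker's cache snapshot is only deleted
// after the live API server also reports the pPod absent. The booleans are
// stand-ins for the lister lookup and a direct, uncached Get.
func shouldForceDeleteVPod(vPodBound, pPodInCache, pPodInAPIServer bool) bool {
	if !vPodBound || pPodInCache {
		// Unbound vPods are recreated instead; a cache hit means no orphan.
		return false
	}
	// Cache says the pPod is gone; re-check the live store before deleting.
	return !pPodInAPIServer
}

func main() {
	// Race case: the pPod was created after the cache snapshot was taken.
	fmt.Println(shouldForceDeleteVPod(true, false, true)) // skip the delete

	// The pPod is genuinely gone (manual deletion or eviction).
	fmt.Println(shouldForceDeleteVPod(true, false, false)) // safe to force delete
}
```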
Meanwhile, the Pod DWS syncer will also trigger deletion of the pPod.
What steps did you take and what happened:
When there are a large number of VCs in the cluster, vPods are unexpectedly deleted by the Pod Checker. The logic that triggers the deletion is in https://github.com/kubernetes-sigs/cluster-api-provider-nested/blob/main/virtualcluster/pkg/syncer/resources/pod/checker.go#L313, quoted above.

What did you expect to happen:
When a vPod is found to be bound but its pPod is missing, the Checker should double-check whether the pPod really exists before force-deleting the vPod.
/kind bug