What happened:
cloneset-controller is stuck in reconcile, waiting for ScaleExpectations to be satisfied.
What you expected to happen:
cloneset-controller should never get stuck; it should continue to reconcile when ScaleExpectations times out.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
t1: client-go@kruise-manager starts a watch at rv=1
t2: cloneset-controller@kruise-manager creates pod A (rv=100) and records an expectation for pod A's Create event via scaleExpectations.ExpectScale(podA)
t3: tons of watch events arrive; the APIServer watch cache range becomes [rv=100, rv=1000]
t4: pod A is deleted from etcd by someone else
t5: the watch connection to the APIServer randomly breaks; client-go@kruise-manager re-watches with rv=1 and receives "too old resource version" because of slow watch event handling (events produced too fast or consumed too slowly), then re-lists Pods
t6: after the re-list, cloneset-controller@kruise-manager will never receive pod A's Create & Delete events, because pod A was already deleted at t4
t7: cloneset-controller@kruise-manager stays stuck forever until restart
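The timeline above can be reproduced with a toy expectations store (a hypothetical, heavily simplified stand-in for kruise's scaleExpectations, not the real implementation): once a Create is expected but its watch event is lost across a relist, SatisfiedExpectations returns false on every reconcile, forever, unless something clears the expectation.

```go
package main

import (
	"fmt"
	"time"
)

// toyExpectations is a simplified stand-in for kruise's ScaleExpectations:
// it records pods whose Create events the controller is still waiting for.
type toyExpectations struct {
	pending          map[string]bool
	firstUnsatisfied time.Time
}

func newToyExpectations() *toyExpectations {
	return &toyExpectations{pending: map[string]bool{}}
}

// ExpectScale records that we created a pod and now wait for its watch event.
func (e *toyExpectations) ExpectScale(pod string) {
	if len(e.pending) == 0 {
		e.firstUnsatisfied = time.Now()
	}
	e.pending[pod] = true
}

// ObserveScale is called from the event handler when the Create event arrives.
func (e *toyExpectations) ObserveScale(pod string) {
	delete(e.pending, pod)
}

// SatisfiedExpectations reports whether reconcile may proceed, and for how
// long it has been blocked.
func (e *toyExpectations) SatisfiedExpectations() (bool, time.Duration) {
	if len(e.pending) == 0 {
		return true, 0
	}
	return false, time.Since(e.firstUnsatisfied)
}

func main() {
	exp := newToyExpectations()
	exp.ExpectScale("pod-A")
	// The Create event for pod-A is never delivered (lost across a relist),
	// so expectations stay unsatisfied no matter how often we reconcile.
	ok, blocked := exp.SatisfiedExpectations()
	fmt.Println(ok, blocked > 0) // prints: false true
}
```

Without a timeout that deletes the stale entry, nothing ever calls ObserveScale for the lost event, which is exactly why reconcile blocks until the manager restarts.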
How to fix
```go
if scaleSatisfied, unsatisfiedDuration, scaleDirtyPods := clonesetutils.ScaleExpectations.SatisfiedExpectations(request.String()); !scaleSatisfied {
	if unsatisfiedDuration >= expectations.ExpectationTimeout {
		klog.Warningf("Expectation unsatisfied overtime for %v, scaleDirtyPods=%v, overtime=%v", request.String(), scaleDirtyPods, unsatisfiedDuration)
		// delete the expectation on timeout so reconcile can make progress
		clonesetutils.ScaleExpectations.DeleteExpectations(request.String())
		return reconcile.Result{RequeueAfter: 10 * time.Second}, nil
	}
	klog.V(4).Infof("Not satisfied scale for %v, scaleDirtyPods=%v", request.String(), scaleDirtyPods)
	return reconcile.Result{RequeueAfter: expectations.ExpectationTimeout - unsatisfiedDuration}, nil
}
```
Environment:
- Kubernetes version (use `kubectl version`):