k8snetworkplumbingwg / whereabouts

A CNI IPAM plugin that assigns IP addresses cluster-wide
Apache License 2.0

Rechecking pending Pods (conflict resolved) #375

Closed · nicklesimba closed this 1 year ago

nicklesimba commented 1 year ago

This fix resolves the issue where, after a forceful node reboot, force deleting a pod in a stateful set causes the pod to be recreated and remain indefinitely in the Pending state.

This is a rebase of https://github.com/k8snetworkplumbingwg/whereabouts/pull/195

maiqueb commented 1 year ago

Maybe we should re-write the reconciler (which is cron-triggered) to use informers instead of this retry when the pod is pending.

We would be able to re-queue that pod liveness check to a later date.

Looking at the code, I'm especially worried about the blocking wait until the pod is no longer in pending state.

EDIT: another thing that could be helpful is to use pagination when listing the resources. Of course, this would only make sense if we manage to correlate the number of returned API results with the OOM kills.
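
For illustration, a minimal sketch of the pagination idea using client-go's Limit/Continue list options; `listAllPods` and `pageSize` are hypothetical names, not anything taken from the whereabouts codebase:

```go
// Sketch: page through pods instead of pulling them in a single List call,
// so a single response never has to hold the whole cluster's pods.
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const pageSize = 500 // hypothetical page size; would need tuning against the observed OOM kills

func listAllPods(ctx context.Context, client kubernetes.Interface) ([]corev1.Pod, error) {
	var pods []corev1.Pod
	opts := metav1.ListOptions{Limit: pageSize}
	for {
		page, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx, opts)
		if err != nil {
			return nil, err
		}
		pods = append(pods, page.Items...)
		if page.Continue == "" {
			return pods, nil // no further pages
		}
		opts.Continue = page.Continue // resume from the server-issued token
	}
}
```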

nicklesimba commented 1 year ago

> Looking at the code, I'm especially worried about the blocking wait until the pod is no longer in pending state.

@maiqueb can you point out where a blocking wait is happening? AFAIU, there's only a wait for 500ms at a time, and only three retries can happen, totalling 1.5s of waiting per pod. To me this seems very reasonable even if the issue were to crop up in several pods.
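
For context, this is roughly the shape of the bounded wait being discussed (at most three 500ms polls per pod); the names here are illustrative rather than the exact code in this PR:

```go
// Poll the pod a fixed number of times, sleeping between attempts, and report
// whether it left the Pending phase in time (~1.5s worst case per pod).
package example

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const (
	maxRetries    = 3
	retryInterval = 500 * time.Millisecond
)

func waitUntilNotPending(ctx context.Context, client kubernetes.Interface, namespace, name string) (bool, error) {
	for i := 0; i < maxRetries; i++ {
		pod, err := client.CoreV1().Pods(namespace).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		if pod.Status.Phase != corev1.PodPending {
			return true, nil
		}
		time.Sleep(retryInterval) // blocks the reconciler goroutine between polls
	}
	return false, nil
}
```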

nicklesimba commented 1 year ago

Fixed a unit test (a test that was previously expected to fail should now pass) with the latest force push. It was too small a change to keep as its own commit, so I squashed it down.

maiqueb commented 1 year ago

> Looking at the code, I'm especially worried about the blocking wait until the pod is no longer in pending state.
>
> @maiqueb can you point out where a blocking wait is happening? AFAIU, there's only a wait for 500ms at a time, and only three retries can happen, totalling 1.5s of waiting per pod. To me this seems very reasonable even if the issue were to crop up in several pods.

here: https://github.com/k8snetworkplumbingwg/whereabouts/pull/375/files#diff-8a16f9c8d1f2a9d01692f7cf9b2ee6a6ceceef840a9daaf4ee7e7e173aaf7ebfR133

It's a sleep: we block the thread for that duration.

What I'm trying to say is that we should re-queue the request and check, on the next iteration, whether the pod we read is still in the Pending state.

Is there something preventing this approach?
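
To sketch the re-queue half of the idea with client-go's delaying workqueue (the informer wiring is left out, and the key format, delay, and `stillPending` callback are placeholders for illustration):

```go
// Instead of sleeping inline, the worker pushes the pod key back onto a
// delaying workqueue and looks at it again on a later iteration.
package example

import (
	"time"

	"k8s.io/client-go/util/workqueue"
)

const recheckDelay = 500 * time.Millisecond // hypothetical delay before the next look

// processPendingPods drains the queue; pods still in Pending are handed back
// via AddAfter, so no worker ever blocks in a sleep.
func processPendingPods(queue workqueue.DelayingInterface, stillPending func(key string) bool) {
	for {
		item, shutdown := queue.Get()
		if shutdown {
			return
		}
		key := item.(string)
		if stillPending(key) {
			queue.AddAfter(key, recheckDelay) // re-check on a later iteration
		}
		// else: the pod left Pending; reconcile its IP allocation here.
		queue.Done(item)
	}
}
```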

nicklesimba commented 1 year ago

To summarize some discussion: we have decided to proceed with merging this for now and to track Miguel's suggested implementation as a separate task. The downside of the current implementation is that having many pending pods at the same time will cause the reconcile cycle to take a long time. However, it still fixes pods stuck in the Pending state, which is better than not having a fix at all.

To do things the "proper" way, we will need to keep a list of the pending pods in the reconcile looper struct and retry them. This would also need to be integrated with the ip-control-loop to sync retries.
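
A rough sketch of what that could look like, assuming a simple thread-safe set of pod keys held by the looper; `PendingTracker` and its methods are hypothetical names, and the ip-control-loop integration is left out entirely:

```go
// Remember pods whose IP reconciliation was deferred because the pod was
// still Pending when the looper last saw it, and retry them next cycle.
package example

import "sync"

type PendingTracker struct {
	mu      sync.Mutex
	pending map[string]struct{}
}

func NewPendingTracker() *PendingTracker {
	return &PendingTracker{pending: map[string]struct{}{}}
}

// Defer records a "namespace/name" key to be re-checked on the next cycle.
func (t *PendingTracker) Defer(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pending[key] = struct{}{}
}

// Drain returns the deferred keys and clears the set; pods found still
// Pending would simply be Defer()-ed again by the caller.
func (t *PendingTracker) Drain() []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	keys := make([]string, 0, len(t.pending))
	for k := range t.pending {
		keys = append(keys, k)
	}
	t.pending = map[string]struct{}{}
	return keys
}
```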