When the apiservers or clusters are unavailable for an extended period, a race can occur between the yawollet and the `loadbalancermachine_status_controller`, which wants to delete stale machines. The yawollet is likely to lose this race, because its error backoff has grown very high by then. (The controller is likely to be running again sooner, because it is stuck trying to get or renew its leader lease, which is retried more frequently.)
This PR fixes this by introducing a grace period before the LBM is actually deleted. If the deletion conditions are met (`shouldMachineBeDeleted`), we now set a condition first, and only once that condition is older than the grace period (and the machine is still not ready) do we actually delete the machine.
At the same time, this PR caps the yawollet's exponential error backoff so that it never exceeds the reconciliation period of the "happy path".