Problem description / Motivation
This seems counterintuitive, but: if the scheduler plugin's internal state does not match the cluster (either because it's delayed, or because it's inconsistent due to some bug), we can end up performing the correct action at the Filter step (i.e. allowing a Pod onto a node) but subsequently incorrectly rejecting it at the Reserve step.
When this kind of inconsistency happens, it's often made more severe by the fact that the scheduler framework does not take Reserve failures into account when it retries scheduling. If Filter and Reserve disagree, the Pod can get stuck repeatedly failing to be scheduled onto the same node, even if there's room elsewhere in the cluster.
In practice, we find that Reserve failures are much more likely to come from bugs in our scheduler plugin (false positives) than from racy resource acquisition (true positives).
Feature idea(s) / DoD
Change Reserve so that it cannot reject the pod.
Implementation ideas
This should be as simple as changing the value of a boolean passed to (*AutoscaleEnforcer).reserveResources().
This must be discussed internally before implementing: we should make sure that we can continue to detect issues that would have previously caused scheduling failures.