Problem description / Motivation

Currently the load on the scheduler is somewhat unusual: computes have (usually) short (but uneven) lifetimes, and varying external load produces regular usage spikes.

This load sometimes interacts with our node scoring algorithm to produce chaotic (in the mathematical sense) and cyclical fluctuations in reserved resources on the nodes. This has a single primary effect:

- We fail to produce nodes with lower usage when the cluster has capacity to get rid of a node (meaning we remain overprovisioned).
In particular, this happens most visibly when a node is added due to external demand — sometimes it is removed after demand returns to normal, but sometimes another node's usage goes down instead (though not far enough for that node to be removed).

A recent example is discussed here: https://neondb.slack.com/archives/C03TN5G758R/p1709660933447909

Feature idea(s) / DoD

To mitigate the issues above, the scheduler plugin should de-prioritize newer nodes. This provides a consistent ordering (preventing "swapping" usage between nodes) and explicitly prioritizes removal of the nodes that were added to satisfy immediate demand (which will have fewer long-running computes).
Implementation ideas
From the Slack thread linked above:
I'm imagining that the new node scoring algorithm should be the following (note: scores are always 0 to 100).

- If a node's usage is >85%:
  - Score is 33 * (1 - (usage fraction - 0.85)) — i.e. higher usage is worse.
- Else, if it's one of the youngest ceil(20% of N) nodes:
  - Score is 33 + min(33, rank within youngest nodes) — i.e. younger is worse, where rank 1 is the youngest node, rank 2 the second-youngest, etc.
- Otherwise:
  - Score is 66 + (usage fraction * 33) — i.e. higher usage is better.
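For concreteness, here is a minimal Go sketch of that scoring logic. The `nodeInfo` type and the `youngestRank` helper are hypothetical, introduced only for illustration; the real plugin's types and API will differ.

```go
package scoring

import (
	"math"
	"sort"
	"time"
)

// nodeInfo is a hypothetical per-node summary; the real plugin tracks more state.
type nodeInfo struct {
	Name          string
	UsageFraction float64   // reserved / total capacity, in [0, 1]
	CreatedAt     time.Time // when the node joined the cluster
}

// score returns the proposed 0-100 score for a node: nodes above 85% usage
// score lowest, the youngest ~20% of nodes score next-lowest, and all
// remaining nodes score higher as their usage increases.
func score(node nodeInfo, allNodes []nodeInfo) float64 {
	youngCutoff := int(math.Ceil(0.2 * float64(len(allNodes))))
	rank := youngestRank(node, allNodes)

	switch {
	case node.UsageFraction > 0.85:
		// Higher usage is worse.
		return 33 * (1 - (node.UsageFraction - 0.85))
	case rank <= youngCutoff:
		// Younger (smaller rank) is worse; the youngest node has rank 1.
		return 33 + math.Min(33, float64(rank))
	default:
		// Higher usage is better.
		return 66 + node.UsageFraction*33
	}
}

// youngestRank returns 1 for the youngest node, 2 for the second-youngest, etc.
func youngestRank(node nodeInfo, allNodes []nodeInfo) int {
	sorted := make([]nodeInfo, len(allNodes))
	copy(sorted, allNodes)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].CreatedAt.After(sorted[j].CreatedAt) // youngest first
	})
	for i, n := range sorted {
		if n.Name == node.Name {
			return i + 1
		}
	}
	return len(sorted)
}
```

As written, the three cases land in roughly separate score bands: overloaded nodes score about 28-33, the youngest nodes 34-66, and the remaining nodes 66-94 (since their usage fraction is at most 0.85).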