Motivation
With our desire to move towards overcommitting (#517), and the recent change to put the file cache on disk (neondatabase/cloud#7516), we run a higher risk of k8s node degradation from those resources actually being used.
Currently, the scheduler plugin:
- Only triggers live migration when reserved resources on a node go above a threshold
- Scores nodes only based on reserved resources
- Picks which VMs to migrate only based on 1-minute load average
This leaves us over-exposed to failures from disk usage, among other things. When we start overcommitting, we will similarly be at risk of node-level OOMs or CPU starvation (which will not affect each VM equally, because we don't have proper CPU requests).
DoD
The scheduler plugin takes live node and/or pod metrics into account when scheduling, when deciding to trigger migration, and when picking migration targets.
Tasks
- [ ] ...
- [ ] List tasks as they're created for this Epic
Implementation ideas
It seems like k8s.io/metrics is the package to use.
Design work is required to come up with the algorithms to use for node scoring, etc. Migration targets are especially tricky, because we basically have a trolley problem with noisy tenants: either we migrate them, which will likely take a long time because they're noisy, or we migrate many other VMs.
Other related tasks, Epics, and links
- #355
- #576