Epic: Scheduler-triggered migration informed by CPU/memory/disk metrics

sharnoff commented 10 months ago

Motivation

With our desire to move towards overcommitting (#517), and the recent change to put the file cache on disk (neondatabase/cloud#7516), we run higher risks of k8s node degradations because of those resources actually getting used.

Currently, the scheduler plugin:

- Only triggers live migration when reserved resources on a node go above a threshold
- Scores nodes only based on reserved resources
- Picks which VMs to migrate only based on 1-minute load average
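
For reference, a rough sketch of the three behaviours listed above. This is not the plugin's actual code: the `Node` and `VM` types, the 0-100 score range, and the choice to order VMs by ascending load average are all assumptions made for illustration.

```go
package main

import "sort"

// Node and VM are hypothetical types for illustration only;
// they are not the scheduler plugin's real types.
type Node struct {
	Name     string
	Reserved float64 // resources reserved by pods on the node
	Capacity float64 // total allocatable resources on the node
}

type VM struct {
	Name      string
	LoadAvg1m float64 // 1-minute load average reported for the VM
}

// shouldMigrate reports whether the reserved fraction of the node
// exceeds the threshold. Actual usage is not considered.
func shouldMigrate(n Node, threshold float64) bool {
	return n.Reserved/n.Capacity > threshold
}

// scoreNode scores a node from 0 to 100 using reserved resources only.
// Assumption: nodes with less reserved capacity score higher.
func scoreNode(n Node) int {
	free := 1 - n.Reserved/n.Capacity
	if free < 0 {
		free = 0
	}
	return int(free * 100)
}

// pickVictims returns a copy of vms ordered by 1-minute load average.
// Assumption: lower load average is preferred for migration.
func pickVictims(vms []VM) []VM {
	out := append([]VM(nil), vms...)
	sort.Slice(out, func(i, j int) bool { return out[i].LoadAvg1m < out[j].LoadAvg1m })
	return out
}
```

Note that nothing here looks at actual CPU, memory, or disk usage, which is the gap this Epic is about.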

This leaves us over-exposed to failures caused by disk usage, among other things. When we start overcommitting, we will similarly be at risk of node-level OOMs or CPU starvation (which will not affect each VM equally, because we don't have proper CPU requests).

DoD

The scheduler plugin takes live node and/or pod metrics into account when scheduling, when deciding to trigger migration, and when picking migration targets.
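
One possible shape for the "deciding to trigger migration" part, as a minimal sketch only. The `Usage` and `NodeMetrics` types, the resource names, and the rule "trigger on whichever of reserved or observed is higher" are assumptions, not a settled design.

```go
package main

import "sort"

// Usage is a hypothetical per-resource view of a node.
// Both fields are fractions of node capacity in the range 0..1.
type Usage struct {
	Reserved float64 // fraction reserved by scheduled pods
	Observed float64 // fraction actually in use, from live metrics
}

// NodeMetrics maps a resource name (e.g. "cpu", "memory", "disk") to its usage.
type NodeMetrics map[string]Usage

// pressure returns the higher of the reserved and observed fractions.
func pressure(u Usage) float64 {
	if u.Observed > u.Reserved {
		return u.Observed
	}
	return u.Reserved
}

// migrationReasons returns the sorted names of resources whose pressure
// exceeds the threshold. An empty result means no migration is needed.
func migrationReasons(m NodeMetrics, threshold float64) []string {
	var reasons []string
	for name, u := range m {
		if pressure(u) > threshold {
			reasons = append(reasons, name)
		}
	}
	sort.Strings(reasons)
	return reasons
}
```

With this shape, a node at 60% reserved memory but 95% observed memory would trigger migration, whereas today it would not.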

Implementation ideas

It seems like k8s.io/metrics is the package to use.

Design work is required to come up with the algorithms to use for node scoring, etc. Migration targets are especially tricky, because we basically have a trolley problem with noisy tenants — either we migrate them, which will likely take a long time because they're noisy, or we migrate many other VMs.
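
To make the trade-off concrete, here is a toy sketch that compares the two options by estimated total migration time. Everything in it is hypothetical: the `Candidate` type, the `EstSeconds` estimate (we have no such estimate today), and the greedy "cheapest first" selection are illustrative assumptions, not a proposed algorithm.

```go
package main

import "sort"

// Candidate is a hypothetical migration candidate.
type Candidate struct {
	Name       string
	Frees      float64 // amount of the contended resource freed by migrating it
	EstSeconds float64 // assumed estimate of how long the migration would take
}

// planMigration compares migrating the single noisy VM against migrating
// enough of the other VMs (cheapest first) to free at least `need`.
// It returns the chosen VM names and the summed estimated migration time.
// Assumption: migrating the noisy VM alone always frees enough.
func planMigration(noisy Candidate, others []Candidate, need float64) ([]string, float64) {
	sorted := append([]Candidate(nil), others...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].EstSeconds < sorted[j].EstSeconds })

	var names []string
	var freed, cost float64
	for _, c := range sorted {
		if freed >= need {
			break
		}
		names = append(names, c.Name)
		freed += c.Frees
		cost += c.EstSeconds
	}

	// Prefer the quiet VMs only if they free enough and are cheaper overall.
	if freed >= need && cost < noisy.EstSeconds {
		return names, cost
	}
	return []string{noisy.Name}, noisy.EstSeconds
}
```

A real design would also need to weigh disruption to the migrated tenants, not just total time.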

Tasks

- [ ] ...
- [ ] List tasks as they're created for this Epic

Other related tasks, Epics, and links

- #355
- #576

sharnoff commented 9 months ago

Some links from slack:

Omrigan commented 6 months ago
