neondatabase / autoscaling

Postgres vertical autoscaling in k8s

Load average can over-estimate demand for CPU #1114

Open sharnoff opened 1 month ago

sharnoff commented 1 month ago

Background

The autoscaler-agent calculates the "goal CU" based on demand for CPU, using the guest kernel's 1-minute load average metric.
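For reference, here's a minimal sketch of the kind of calculation this implies: pick enough CPUs that the 1-minute load average stays at or below some fraction of them, then round up to whole compute units. The parameter names (`loadAverageFractionTarget`, `cpusPerCU`) are illustrative, not necessarily the agent's actual configuration:

```go
package main

import (
	"fmt"
	"math"
)

// goalCU sketches a load-average-driven scaling goal: pick enough CPUs
// that the 1-minute load average is at most loadAverageFractionTarget
// of them, then round up to whole compute units. Names are illustrative.
func goalCU(loadAvg1Min, loadAverageFractionTarget, cpusPerCU float64) uint32 {
	goalCPUs := loadAvg1Min / loadAverageFractionTarget
	return uint32(math.Ceil(goalCPUs / cpusPerCU))
}

func main() {
	// e.g. a 1-minute load average of 3.5, a 0.9 target, 1 CPU per CU:
	fmt.Println(goalCU(3.5, 0.9, 1.0)) // 4
}
```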

Load average is an exponentially weighted moving average, updated every 5 seconds, of the instantaneous number of running or runnable tasks — i.e., it's an average of the run queue size.
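Concretely, the kernel folds a sample of the run-queue length into the average every 5 seconds, with older samples decaying on a 60-second time constant for the 1-minute figure. A floating-point sketch of that update (the kernel itself uses fixed-point constants, but the shape is the same):

```go
package main

import (
	"fmt"
	"math"
)

// Decay factor for the 1-minute load average: samples arrive every
// 5 seconds and decay with a 60-second time constant.
var exp1Min = math.Exp(-5.0 / 60.0)

// updateLoadAvg folds one 5-second sample of the run-queue length
// (running + runnable tasks) into the running average.
func updateLoadAvg(load float64, nRunnable int) float64 {
	return load*exp1Min + float64(nRunnable)*(1-exp1Min)
}

func main() {
	load := 0.0
	// With a constant queue of 4 tasks, the average converges toward 4:
	for i := 0; i < 24; i++ { // 24 samples = 2 minutes
		load = updateLoadAvg(load, 4)
	}
	fmt.Printf("%.2f\n", load) // ~3.46: still climbing toward 4
}
```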

For workloads with spiky parallelism, this can result in dramatic over-estimates if we interpret it as "demand" for CPU time. If there are 4x as many tasks as CPUs, each task may contribute 4x as much as it should to our measure of "demand" (because fair scheduling keeps every task in the queue for 4x as long).
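A rough simulation of that 4x case, using the same EWMA update with purely illustrative numbers: two bursts that do identical total work on 2 fully-busy CPUs, one split across 8 tasks and one across 2.

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	const sampleSecs = 5.0
	expFactor := math.Exp(-sampleSecs / 60.0) // 1-minute decay per 5s sample

	// Both bursts do 40 CPU-seconds of work and keep 2 CPUs busy for 20s:
	//   A: 8 tasks of 5 CPU-seconds each; fair scheduling keeps all 8
	//      runnable for the full 20s, so every sample sees a queue of 8.
	//   B: 2 tasks of 20 CPU-seconds each; every sample sees a queue of 2.
	loadA, loadB := 0.0, 0.0
	for t := sampleSecs; t <= 20; t += sampleSecs {
		loadA = loadA*expFactor + 8*(1-expFactor)
		loadB = loadB*expFactor + 2*(1-expFactor)
	}
	// A reads ~4x higher than B despite identical demand for CPU time.
	fmt.Printf("1-min load after the burst: A=%.2f, B=%.2f\n", loadA, loadB)
}
```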

In practice we believe this issue is quite rare (hence the "tech debt" label), but it's still worth addressing.

For more on load average, refer to:

Example of a user hitting this: https://neondb.slack.com/archives/C03TN5G758R/p1728409813336859

Implementation ideas

Don't use load average...?

It'd still be useful to get a measure of how much demand for CPU there is, but load average clearly doesn't give us that (and unfortunately CPU time won't, either).

sharnoff commented 1 week ago

There is an open RFC that, once implemented, will fix this issue: https://www.notion.so/neondatabase/131f189e004780b2915ef2fdb95bae6a