agent/core: Use cached LFC for memory scaling signal

neondatabase / autoscaling

Postgres vertical autoscaling in k8s

Apache License 2.0


In short: in addition to scaling up when Postgres uses a lot of memory, we should also scale up so that enough of the LFC fits into the page cache alongside that memory.

To answer "how much of the LFC is enough", this PR takes the minimum of the estimated LFC working set size (from the window size) and the cached memory (from the Cached field of /proc/meminfo, via Vector's host metrics).
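A minimal Go sketch of that minimum, for illustration only. The function name `lfcBytesToFit` and the use of `float64` byte counts are assumptions and not taken from the agent's code.

```go
package main

// lfcBytesToFit is a hypothetical helper (not the agent's real function).
// It returns how much of the LFC we want to fit in the page cache:
// the smaller of the estimated working set size and the memory that is
// currently reported as cached.
func lfcBytesToFit(estimatedWorkingSetBytes, cachedBytes float64) float64 {
	if cachedBytes < estimatedWorkingSetBytes {
		// Less is cached than the estimate asks for, so the cached
		// amount is the limit.
		return cachedBytes
	}
	// The working set estimate is the limit; cached memory beyond it
	// is not counted.
	return estimatedWorkingSetBytes
}
```

Taking the minimum means the signal never counts more than what is actually cached, and never more than the working set estimate.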

This PR also adds the memoryTotalFractionTarget field to the scaling config. It serves a similar purpose to memoryUsageFractionTarget, but applies to the total of memory usage and cached data.
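One way to picture how the two targets could interact is the Go sketch below. It assumes each target is the desired fraction of the VM's memory that the corresponding quantity should occupy, and that the larger of the two resulting sizes wins. The type, field, and function names are hypothetical, and the agent's real calculation may differ.

```go
package main

// memorySignal holds hypothetical inputs to the memory scaling decision.
type memorySignal struct {
	UsageBytes     float64 // memory in use, excluding page cache
	LFCCachedBytes float64 // part of the LFC we want to keep in page cache
}

// requiredMemoryBytes returns the smallest memory size for which both
// (assumed) conditions hold:
//
//	usage / size                 <= usageFractionTarget
//	(usage + lfcCached) / size   <= totalFractionTarget
func requiredMemoryBytes(s memorySignal, usageFractionTarget, totalFractionTarget float64) float64 {
	byUsage := s.UsageBytes / usageFractionTarget
	byTotal := (s.UsageBytes + s.LFCCachedBytes) / totalFractionTarget
	if byTotal > byUsage {
		return byTotal
	}
	return byUsage
}
```

With this shape, a workload with modest memory usage but a large cached LFC is driven by the total target, while a memory-heavy workload with little cached data is still driven by the usage target.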

This PR is part of #1030 and must be deployed before the vm-monitor changes, so that we don't regress performance for workloads that are both memory-heavy and reliant on the LFC being in the VM's page cache.

For more info, see: https://www.notion.so/neondatabase/0f75b15d47ad479094861302a99114af.

Notes for review:

- This PR is broken into two commits: the first is a refactor to make the second one easier to read.
- Planning to test on staging with neondatabase/neon#8668 using some familiar workloads (in particular: LFC-aware scaling tests and a pgvector index build).

| Impacted Packages | Coverage Δ | :robot: |
|---|---|---|
| github.com/neondatabase/autoscaling/pkg/agent/core | 68.76% (+0.93%) | :thumbsup: |
| github.com/neondatabase/autoscaling/pkg/api | 3.06% (-0.10%) | :thumbsdown: |

agent/core: Use cached LFC for memory scaling signal #1031

Merging this branch changes the coverage (1 decrease, 1 increase)
