Vertical autoscaling for a fleet of postgres instances running in a Kubernetes cluster.
For more on how Neon's Autoscaling works, check out https://neon.tech/docs/introduction/autoscaling.
Autoscaling is used internally within Neon, and makes some minor assumptions about Neon-specifics.
We do not officially support use of autoscaling externally. In other words, you're welcome to try it out yourself, submit bugs, fork the code, etc., but we make no guarantees about timely responses to issues that come up when running it locally.
For help from the community, check out our Discord: https://neon.tech/discord.
The deployment files and a vm-builder binary are attached to each release.
Check out Building and running below for local development.
We want to dynamically change the number of CPUs and the amount of memory of running postgres instances, without breaking TCP connections to postgres.
This is relatively easy when there are already spare resources on the physical (Kubernetes) node, but it takes careful coordination to move postgres instances from one node to another when the original node doesn't have the room.
We've tried a bunch of existing tools and settled on the following:
autoscaler-agent: a pod that triggers scaling decisions and makes resource requests to the K8s scheduler on the VMs' behalf to reserve additional resources for them.
vm-monitor: a binary that communicates with the autoscaler-agent so that it can immediately respond to memory pressure by scaling up (among other things).
Networking is preserved across migrations by giving each VM an additional IP address on a bridge network spanning the cluster with a flat topology; the L2 network figures out "by itself" where to send the packets after migration.
For more information, refer to ARCHITECTURE.md.
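The difference between the easy case (spare room on the node) and the hard case (migration) can be pictured with a toy shell function. This is only an illustrative sketch, not the scheduler's or the autoscaler-agent's actual logic; the function name `scaling_action` and both of its arguments are made up for this example.

```sh
# Toy illustration only: NOT the real scheduler logic.
# Hypothetical arguments:
#   $1  vCPUs currently free on the node
#   $2  extra vCPUs the VM would like
scaling_action() {
  node_free_cpu=$1
  extra_cpu_wanted=$2
  if [ "$extra_cpu_wanted" -le "$node_free_cpu" ]; then
    # The node has room: resources can be added without moving the VM.
    echo "scale-in-place"
  else
    # The node is too full: the VM has to move to another node first.
    echo "migrate"
  fi
}
```

For example, `scaling_action 4 2` reports that the VM can be scaled where it is, while `scaling_action 1 2` reports that a migration would be needed.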
Note: NeonVM and Autoscaling are not expected to work outside Linux x86.
Build the NeonVM Linux kernel (this takes a while, but only needs to be run once):
make kernel
Build docker images:
make docker-build
Start a local cluster with kind or k3d:
make kind-setup # or make k3d-setup
Deploy the NeonVM and Autoscaling components:
make deploy
Build and load the test VM:
make pg16-disk-test
Start the test VM:
kubectl apply -f vm-deploy.yaml
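If you run these steps often, they can be chained in a small script. The following is a minimal sketch that uses only the commands listed above; the function names `run_step` and `setup_local` and the `CLUSTER_TOOL` and `DRY_RUN` variables are invented for this example and are not part of the repository.

```sh
# Sketch: run the local setup steps in order, stopping at the first failure.
# CLUSTER_TOOL (kind or k3d) and DRY_RUN are hypothetical variables for this example.

# Print a step, and execute it unless DRY_RUN=1.
run_step() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

setup_local() {
  run_step make kernel &&
  run_step make docker-build &&
  run_step make "${CLUSTER_TOOL:-kind}-setup" &&
  run_step make deploy &&
  run_step make pg16-disk-test &&
  run_step kubectl apply -f vm-deploy.yaml
}
```

Calling `setup_local` from the repository root runs everything against kind; setting `CLUSTER_TOOL=k3d` switches the cluster step, and `DRY_RUN=1` only prints the commands.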
Broadly, the run-bench.sh script just exists to be expensive on CPU, so that more vCPU will be allocated to the VM. You can run it with:
scripts/run-bench.sh
# or:
VM_NAME=postgres16-disk-test scripts/run-bench.sh
allocate-loop
To test on-demand memory reservation, the allocate-loop binary is built into the test VM, and can be used to slowly increase memory allocations of arbitrary size. For example:
# After ssh-ing into the VM:
cgexec -g memory:neon-test allocate-loop 256 2280
#^^^^^^^^^^^^^^^^^^^^^^^^^               ^^^ ^^^^
# run it in the neon-test cgroup  ;  use 256 <-> 2280 MiB
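While allocate-loop is running, it can be handy to watch the cgroup's memory usage from a second shell inside the VM. The helper below is a sketch under assumptions: it expects the path of a file holding a byte count, which on a cgroup v2 system would typically be something like /sys/fs/cgroup/neon-test/memory.current. That path and the function name `cgroup_mem_mib` are assumptions for this example, not something this repository documents.

```sh
# Print the value stored in a cgroup memory usage file, converted to whole MiB.
# $1: path to a file containing a byte count, e.g. (assumed, cgroup v2)
#     /sys/fs/cgroup/neon-test/memory.current
cgroup_mem_mib() {
  bytes=$(cat "$1") || return 1
  echo $(( bytes / 1024 / 1024 ))
}
```

Something like `while sleep 1; do cgroup_mem_mib /sys/fs/cgroup/neon-test/memory.current; done` would then print one reading per second, provided the assumed path exists in your VM.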
To run the e2e tests, you first need to install the dependencies: kubectl, kind or k3d, and kuttl.
You can either download them from their websites or install them using Homebrew:
brew install kubectl kind k3d kuttl
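Before starting, a quick check that the tools are actually on your PATH can save a failed run. The function below is a small sketch; `missing_tools` is a name made up for this example, and the executable names you pass to it are up to you (kuttl in particular may be installed as a kubectl plugin binary rather than as `kuttl`, depending on how you installed it).

```sh
# Print the name of every listed tool that is not found on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `missing_tools kubectl kind || echo "install the tools listed above first"` prints any missing names followed by the hint.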
make kind-setup # or make k3d-setup, if you'd like to use k3d
make kernel
make deploy
make example-vms
make e2e