We should provide metrics for per-pod resource requests / VM usage

Rationale

There are a few reasons:

- The values "VM usage, else pod resources.requests" are used by cluster-autoscaler, and it currently doesn't expose those metrics
- The only similar thing we have is what the scheduler has reserved per-node for pods/VMs, which is further from the "ground truth" of what's actually used.
- We don't have a way to see the resource usage of VMs
  - See also: https://github.com/neondatabase/autoscaling/issues/375

Implementation ideas

I think all these problems can be solved in one go.

My tentative idea is to create a new, separately deployed component, with a single instance per cluster, that exposes two metrics for every running pod in the cluster: pod_or_vm_cpu_requests and pod_or_vm_mem_requests (in bytes). I'm not sure whether the metric names should also be prefixed with the component name.

These metrics will be defined for each running K8s pod as:

- If the vm.neon.tech/usage annotation is defined, use the CPU and memory given there
  - The VM controller sets this; see: https://github.com/neondatabase/autoscaling/pull/231
- Else, use the sum of all containers' resources.requests, or zero if there are none.
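
As a rough illustration of the rule above, here is a minimal Go sketch using only the standard library. Everything in it beyond the annotation name is an assumption made for the example: the `Pod` and `Resources` types are pared-down stand-ins for the real Kubernetes types, and the annotation is assumed to hold JSON with plain integers (CPU in millicores, memory in bytes). The real annotation may well use Kubernetes quantity strings instead, in which case the parsing would differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Resources is a CPU/memory pair. The units (millicores, bytes) and the
// JSON field names are assumptions for this sketch.
type Resources struct {
	CPUMilli int64 `json:"cpu"`
	MemBytes int64 `json:"memory"`
}

// Pod is a hypothetical stand-in for a Kubernetes pod, keeping only the
// fields this rule needs.
type Pod struct {
	Annotations       map[string]string
	ContainerRequests []Resources // one entry per container
}

const usageAnnotation = "vm.neon.tech/usage"

// podOrVMRequests returns the annotated VM usage if the annotation is
// present, else the sum of the containers' requests (zero if none).
func podOrVMRequests(p Pod) (Resources, error) {
	if raw, ok := p.Annotations[usageAnnotation]; ok {
		var r Resources
		if err := json.Unmarshal([]byte(raw), &r); err != nil {
			return Resources{}, fmt.Errorf("parsing %s: %w", usageAnnotation, err)
		}
		return r, nil
	}
	var total Resources
	for _, c := range p.ContainerRequests {
		total.CPUMilli += c.CPUMilli
		total.MemBytes += c.MemBytes
	}
	return total, nil
}

func main() {
	vm := Pod{Annotations: map[string]string{
		usageAnnotation: `{"cpu":250,"memory":1073741824}`,
	}}
	fmt.Println(podOrVMRequests(vm))
}
```

A malformed annotation is returned as an error here rather than silently falling back to the container requests; which behavior matches cluster-autoscaler is something to check against the patch.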

This is the same value that our patched cluster-autoscaler uses to make its decisions.

The actual implementation should be pretty easy — pkg/util/watch can be used, and the "add"/"update"/"delete" callbacks should be relatively simple as well.
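
To give a feel for how small the "add"/"update"/"delete" callbacks could be, here is a standalone Go sketch of the state they might maintain. It deliberately does not use pkg/util/watch, whose exact API is not reproduced here; the `Store`, `PodKey`, and `Requests` names are all hypothetical, and a real implementation would feed these values into the metrics library's gauges rather than a plain map.

```go
package main

import (
	"fmt"
	"sync"
)

// Requests holds the per-pod values behind the two proposed metrics.
// Units (millicores, bytes) are an assumption for this sketch.
type Requests struct {
	CPUMilli int64
	MemBytes int64
}

// PodKey identifies a pod within the cluster.
type PodKey struct {
	Namespace string
	Name      string
}

// Store is a hypothetical in-memory table of current per-pod values,
// updated from watch callbacks and read when metrics are scraped.
type Store struct {
	mu     sync.Mutex
	values map[PodKey]Requests
}

func NewStore() *Store {
	return &Store{values: make(map[PodKey]Requests)}
}

// OnAdd and OnUpdate both overwrite the entry for the pod.
func (s *Store) OnAdd(k PodKey, r Requests) { s.set(k, r) }

func (s *Store) OnUpdate(k PodKey, r Requests) { s.set(k, r) }

// OnDelete drops the pod so its metrics stop being reported.
func (s *Store) OnDelete(k PodKey) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.values, k)
}

func (s *Store) set(k PodKey, r Requests) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.values[k] = r
}

// Snapshot returns a copy that is safe to iterate without the lock.
func (s *Store) Snapshot() map[PodKey]Requests {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make(map[PodKey]Requests, len(s.values))
	for k, v := range s.values {
		out[k] = v
	}
	return out
}

func main() {
	s := NewStore()
	k := PodKey{Namespace: "default", Name: "example-vm"}
	s.OnAdd(k, Requests{CPUMilli: 250, MemBytes: 1 << 30})
	s.OnUpdate(k, Requests{CPUMilli: 500, MemBytes: 2 << 30})
	fmt.Println(s.Snapshot()[k])
	s.OnDelete(k)
	fmt.Println(len(s.Snapshot()))
}
```

Since the metrics are only meant to cover running pods, the update callback would probably also need to remove a pod once it leaves the running phase; that check is omitted here.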

Areas of future work

We might also want to have a separate metric for VM usage that's only present for VMs, so in grafana we can say "VM usage OR regular pod CPU usage" (where pod CPU usage is actually more like

I could equally see this being used to show migrations for each VM. Cluster-wide totals can be derived from the autoscaler-agent's billing metrics (because it tracks the counts for each VM phase), but we don't have anything available per-VM.

Prior issues, discussions:

cc @cicdteam, @arssher as people who may be interested in the outcome of this.
