neondatabase / autoscaling

Postgres vertical autoscaling in k8s
Apache License 2.0

scheduler plugin should have metrics for time events spend waiting in the queue #862

Closed: sharnoff closed this 4 months ago

sharnoff commented 6 months ago

Problem description / Motivation

We have this internal queue in the scheduler that's critical to correct functioning, but no observability into whether the queue is actually healthy.

Feature idea(s) / DoD

The scheduler plugin should expose metrics for queue wait duration, possibly similar to how neonvm-controller does it.

Omrigan commented 4 months ago

@sharnoff I added new metrics to https://neonprod.grafana.net/d/adbt34laf8ni8f/neon-autoscaling-scheduler-plugin?from=1713951126014&to=1713961926014&var-environment=prod&var-datasource=ZNX49CDVz&var-cluster=All&var-node_group=main_m6id_metal&var-node=All&var-autoscaler_agent_pod_suffixes=All&orgId=1&refresh=1m, please review.

sharnoff commented 4 months ago

Looks good! Had a couple of quick thoughts, but not enough to block closing this issue. We can discuss next week.