sharnoff closed this 4 months ago
@sharnoff added new metrics to the neon-autoscaling-scheduler-plugin dashboard: https://neonprod.grafana.net/d/adbt34laf8ni8f/neon-autoscaling-scheduler-plugin?from=1713951126014&to=1713961926014&var-environment=prod&var-datasource=ZNX49CDVz&var-cluster=All&var-node_group=main_m6id_metal&var-node=All&var-autoscaler_agent_pod_suffixes=All&orgId=1&refresh=1m. Please review.
Looks good! Had a couple of quick thoughts, but nothing that should block closing this issue. Can discuss next week.
Problem description / Motivation
We have an internal queue in the scheduler that is critical to its correct functioning, but no observability into whether that queue is actually healthy.
Feature idea(s) / DoD
The scheduler plugin should expose metrics for queue wait duration, possibly similar to how neonvm-controller does it.