Open sharnoff opened 8 months ago
Latency metrics measured by pkg/agent/core.State
would have allowed us to notice the effects of #614 (i.e. core.State
believed there was a super long-running request).
Status: waiting on @sharnoff and @stradig to review the internal RFC.
Motivation
Two reasons:
DoD
We should have histogram metrics recording:
Implementation ideas
AFAICT the basic idea is that we store some extra info in
agent/core.State
and add some extra callbacks inagent/core.Config
to increment some metrics when we determine that various parts of scaling (and the entire thing) have occurred.More design work is required, because the edge cases are quite subtle.
Tasks
Other related tasks, Epics, and links