If you want to implement this feature, comment to let us know (we'll work with you on design, scheduling, etc.)
Issue details
Affected area/feature
Operator metrics
Add a metric that is similar to the stacks_failing
stacks_failing - a set of gauge time series, labelled by namespace, that gives the number of stacks currently failing (stack.status.lastUpdate.state is failed)
that tracks the stacks that are currently being reconciled.
Compared to controller_runtime_active_workers the new metric should be labeled with the stack name, thus allowing visualization for what metrics are currently being updated, and to add alerts if a stack takes too long to update.
Hello!
Issue details
Affected area/feature
Operator metrics
Add a metric that is similar to the stacks_failing
that tracks the stacks that are currently being reconciled.
Compared to
controller_runtime_active_workers
the new metric should be labeled with the stack name, thus allowing visualization for what metrics are currently being updated, and to add alerts if a stack takes too long to update.