Description
The MetricPipeline already supports the runtime input type, which emits metrics about container and pod resource consumption. What is missing are further typical metrics:
from the apiserver about configured resource limits
With these metrics available, basic troubleshooting of Kubernetes workloads, including alerting, becomes possible.
Goal
Provide a way to collect a typical set of metrics for basic workload troubleshooting (comparable to the metrics used by the dashboards provided by the kube-prometheus-stack)
Criteria
Typical metrics can be collected that are needed to troubleshoot:
Pod compute resource
Node resource usage
Volume resource usage
Health of workloads (for example, a stuck deployment)
Namespace-specific metrics can be enabled per namespace (probably independently of non-namespaced resources)
Node-related and volume-related metrics can be enabled optionally, in addition to workload-related metrics
[ ] Switch defaults to "on" for basic selectors, so that new pipelines expose them by default while existing ones stay where they are
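The default switch in the last item could behave roughly as sketched below. This is only an illustration of the intended semantics, not the actual implementation: the selector names (pod, container, node, volume), the RuntimeInput class, and the resolve_selectors function are hypothetical, and the sketch assumes that an unset selector on an existing pipeline stays off.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical selector names derived from the criteria list; not a confirmed API.
BASIC_SELECTORS = ("pod", "container", "node", "volume")


@dataclass
class RuntimeInput:
    """Illustrative model of a runtime input with per-resource sub-selectors."""

    # A missing key or None means "not set by the user";
    # True or False is an explicit choice that must always be respected.
    selectors: Dict[str, Optional[bool]] = field(default_factory=dict)


def resolve_selectors(runtime: RuntimeInput, is_new_pipeline: bool) -> Dict[str, bool]:
    """Return the effective on/off state of every basic selector.

    Assumption: unset selectors default to "on" for new pipelines and
    to "off" for pipelines that existed before the default was switched.
    """
    default = is_new_pipeline
    effective: Dict[str, bool] = {}
    for name in BASIC_SELECTORS:
        explicit = runtime.selectors.get(name)
        effective[name] = default if explicit is None else explicit
    return effective
```

With this shape, a new pipeline that sets nothing gets all basic selectors, an explicit opt-out (for example volume set to False) is kept, and an existing pipeline only exposes what it already enabled.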
Reasons
The current feature set is a good start, but it is missing apiserver-related details such as limits, which are needed to get a complete picture for troubleshooting and to define relevant alerts. Furthermore, typical workload-health metrics from the apiserver are missing. Volume and node statistics are also important in daily operations.
The feature will be fully rolled out with version 1.27.0. Afterwards, the defaults will be changed so that the sub-selectors are enabled by default for new clusters.