Typical Kubernetes workload metrics as telemetry input to enable dashboarding and alerting

Description

The MetricPipeline already supports the input type runtime, which emits metrics about container and Pod resource consumption. Further typical metrics are still missing:

- from the apiserver, about configured resource limits
- from the apiserver, about the state of workloads
- from the kubelet, statistics of the volumes
- from the kubelet, statistics of the nodes

These are mainly the typical metrics produced by the kubeletstatsreceiver and the k8sclusterreceiver.

With these metrics available, basic troubleshooting of Kubernetes workloads, including alerting, becomes possible.

Goal

Provide a way to collect a typical set of metrics for basic workload troubleshooting, comparable to the metrics used by the dashboards that the kube-prometheus-stack provides.

Criteria

The typical metrics needed to troubleshoot the following areas can be collected:
- Pod compute resources
- Node resource usage
- Volume resource usage
- Health of workloads (for example, a stuck deployment)
Namespace-specific metrics can be enabled per namespace (probably independently of non-namespaced resources)
Node- and volume-related metrics can be enabled optionally, in addition to workload-related metrics
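
The criteria above combine two independent selections: a per-resource toggle and a per-namespace filter. The following is a minimal sketch of how such a selection could compose. All type, field, and function names are hypothetical and are not the MetricPipeline API. It also assumes that only node metrics are treated as non-namespaced.

```go
package main

import "fmt"

// ResourceKind names a group of runtime metrics. The identifiers are
// hypothetical and only mirror the areas listed in the criteria.
type ResourceKind string

const (
	Pod       ResourceKind = "pod"
	Container ResourceKind = "container"
	Node      ResourceKind = "node"
	Volume    ResourceKind = "volume"
)

// RuntimeInput is a hypothetical selection model for the runtime input.
type RuntimeInput struct {
	// Namespaces lists the namespaces to collect from; empty means all.
	Namespaces []string
	// Enabled holds the per-resource toggles; a missing key counts as off.
	Enabled map[ResourceKind]bool
}

// Collects reports whether metrics of the given kind from the given namespace
// pass the selection. Node metrics are assumed to be cluster-scoped, so the
// namespace filter is skipped for them.
func (in RuntimeInput) Collects(kind ResourceKind, namespace string) bool {
	if !in.Enabled[kind] {
		return false
	}
	if kind == Node || len(in.Namespaces) == 0 {
		return true
	}
	for _, ns := range in.Namespaces {
		if ns == namespace {
			return true
		}
	}
	return false
}

func main() {
	in := RuntimeInput{
		Namespaces: []string{"shop"},
		Enabled:    map[ResourceKind]bool{Pod: true, Container: true, Node: true},
	}
	fmt.Println(in.Collects(Pod, "shop"))        // selected namespace
	fmt.Println(in.Collects(Pod, "kube-system")) // filtered out by namespace
	fmt.Println(in.Collects(Node, ""))           // namespace filter not applied
	fmt.Println(in.Collects(Volume, "shop"))     // resource not enabled
}
```

Keeping the resource toggle separate from the namespace filter is what lets node metrics stay enabled or disabled regardless of which namespaces are selected.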

Actions

[x] Preparations
- [x] Build understanding of the available receivers and make an API proposal (https://github.com/kyma-project/telemetry-manager/issues/1001)
- [x] Come up with a concept on how to run the k8sclusterreceiver which does not fit into the current architectural setup (https://github.com/kyma-project/telemetry-manager/issues/1003)
[ ] Implementation & Rollout
- [x] Have the existing input configurable for pod and container (https://github.com/kyma-project/telemetry-manager/issues/1183)
- [x] Enrich the container and pod sub-inputs with the metrics from the k8sclusterreceiver (https://github.com/kyma-project/telemetry-manager/issues/1184)
- [x] Add new inputs for nodes, disabled by default for new pipelines (https://github.com/kyma-project/telemetry-manager/issues/1300)
- [x] Add manual Cloud Logging Dashboard for node metrics (https://github.com/kyma-project/telemetry-manager/pull/1494)
- [x] Add new inputs for volumes, disabled by default for new pipelines (https://github.com/kyma-project/telemetry-manager/issues/1301)
- [x] Add manual Cloud Logging Dashboard for volume metrics (https://github.com/kyma-project/telemetry-manager/pull/1528)
- [x] #1521
- [x] Add manual Cloud Logging Dashboard for workload metrics (https://github.com/kyma-project/telemetry-manager/pull/1577)
- [ ] Switch defaults to "on" for basic selectors, so that new pipelines expose them by default while existing pipelines keep their current settings
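
One way the last action item could work is to write defaults into a pipeline only where a toggle is still unset, so that values already stored with existing pipelines survive a later change of the default table. The sketch below illustrates that idea under stated assumptions. The types `Resources` and `Defaults`, the function `ApplyDefaults`, and the choice of node and volume as example toggles are all hypothetical and do not describe the actual telemetry-manager implementation.

```go
package main

import "fmt"

// Resources holds hypothetical per-resource toggles. A nil pointer means the
// field was never set; a non-nil pointer is a value stored with the pipeline.
type Resources struct {
	Node   *bool
	Volume *bool
}

// Defaults is the (hypothetical) default table in effect at a point in time.
type Defaults struct {
	Node   bool
	Volume bool
}

// ApplyDefaults writes a default into every unset toggle and leaves values
// that are already stored untouched.
func ApplyDefaults(r *Resources, d Defaults) {
	if r.Node == nil {
		v := d.Node
		r.Node = &v
	}
	if r.Volume == nil {
		v := d.Volume
		r.Volume = &v
	}
}

func main() {
	oldDefaults := Defaults{Node: false, Volume: false}
	newDefaults := Defaults{Node: true, Volume: true}

	// A pipeline created before the switch keeps its stored values even when
	// it is processed again with the new default table.
	existing := Resources{}
	ApplyDefaults(&existing, oldDefaults)
	ApplyDefaults(&existing, newDefaults)

	// A pipeline created after the switch picks up the new defaults.
	fresh := Resources{}
	ApplyDefaults(&fresh, newDefaults)

	fmt.Println(*existing.Node, *fresh.Node)
}
```

The pointer fields distinguish "never set" from "explicitly off", which a plain bool cannot express.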

Reasons

The current feature set is a good start but lacks apiserver-related details, such as limits, that are needed to get a complete picture for troubleshooting and to define relevant alerts. Furthermore, typical workload health metrics from the apiserver are missing. Volume and node statistics are also important in daily operations.

Attachments

Release Notes

kyma-project / telemetry-manager

Typical Kubernetes workload metrics as telemetry input to enable dashboarding and alerting #972