influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Extend kube_inventory plugin to include resourcequota measurement and extend node and pod measurement with few more metrics #9621

Open varunjain0606 opened 3 years ago

varunjain0606 commented 3 years ago

Feature Request

The kubernetes and kube_inventory input plugins cover most of the metrics needed to monitor k8s infrastructure, but a few resources and metrics are still missing. The plugins can easily be extended to include them, which would improve k8s monitoring.

Proposal

The kube_inventory plugin can be extended to report not only capacity and allocatable quantities but also health metrics such as node status, node count, and whether a node is schedulable. A new resource measurement, resourcequota, can also be added for better monitoring. The following metrics can easily be added using the "k8s.io/api/core/v1" library:

  1. kubernetes_node_condition_status
  2. kubernetes_node_count
  3. kubernetes_unschedulable

Example:

for _, val := range n.Status.Conditions {
    // ...
    fields["status_condition"] = string(val.Status)
}
fields["spec_unschedulable"] = n.Spec.Unschedulable
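
As a rough illustration of the idea, here is a minimal, self-contained sketch that emits one metric per node condition, so that a later condition does not overwrite the `status_condition` field of an earlier one. `Node`, `NodeCondition`, and `Metric` are simplified, hypothetical stand-ins for the real `k8s.io/api/core/v1` and Telegraf types, and the helper names are made up for this example. Mapping `True` to 1 and everything else to 0 is an assumption, chosen to match the integer fields in the desired output.

```go
package main

// NodeCondition and Node are simplified, hypothetical stand-ins for the
// corresponding types in k8s.io/api/core/v1, so the sketch compiles alone.
type NodeCondition struct {
	Type   string // e.g. "Ready"
	Status string // "True", "False" or "Unknown"
}

type Node struct {
	Name          string
	Unschedulable bool
	Conditions    []NodeCondition
}

// Metric is a hypothetical container for one emitted point.
type Metric struct {
	Tags   map[string]string
	Fields map[string]interface{}
}

// boolToInt converts a boolean into an integer field value (1 or 0).
func boolToInt(b bool) int64 {
	if b {
		return 1
	}
	return 0
}

// nodeConditionMetrics builds one metric per condition of the node.
// Assumption: a status of "True" becomes 1, anything else becomes 0.
func nodeConditionMetrics(n Node) []Metric {
	metrics := make([]Metric, 0, len(n.Conditions))
	for _, c := range n.Conditions {
		metrics = append(metrics, Metric{
			Tags: map[string]string{
				"node_name": n.Name,
				"condition": c.Type,
				"status":    c.Status,
			},
			Fields: map[string]interface{}{
				"status_condition":   boolToInt(c.Status == "True"),
				"spec_unschedulable": boolToInt(n.Unschedulable),
			},
		})
	}
	return metrics
}
```

Calling `nodeConditionMetrics` on a node with a `Ready` and a `DiskPressure` condition would yield two metrics, each tagged with its own `condition` and `status`.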

A new measurement type can also be added:

kubernetes_resourcequota
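
The issue does not spell out which fields this measurement would carry. One possible shape, sketched below under stated assumptions, flattens the quota's hard limits and current usage into fields. `ResourceQuota` here is a simplified, hypothetical stand-in for the `core/v1` type (the real one holds `resource.Quantity` values in `Status.Hard` and `Status.Used`), and the `hard_`/`used_` field prefixes are invented for illustration.

```go
package main

import "strings"

// ResourceQuota is a simplified, hypothetical stand-in for the core/v1
// ResourceQuota type. Quantities are plain int64 values in this sketch.
type ResourceQuota struct {
	Name      string
	Namespace string
	Hard      map[string]int64 // enforced limits per resource name
	Used      map[string]int64 // current usage per resource name
}

// fieldKey turns a resource name such as "requests.cpu" into a field key
// such as "hard_requests_cpu". The naming scheme is an assumption.
func fieldKey(prefix, resource string) string {
	r := strings.NewReplacer(".", "_", "/", "_")
	return prefix + "_" + r.Replace(resource)
}

// resourceQuotaFields flattens hard limits and usage into one field map.
func resourceQuotaFields(q ResourceQuota) map[string]interface{} {
	fields := make(map[string]interface{}, len(q.Hard)+len(q.Used))
	for name, v := range q.Hard {
		fields[fieldKey("hard", name)] = v
	}
	for name, v := range q.Used {
		fields[fieldKey("used", name)] = v
	}
	return fields
}
```

The quota name and namespace would naturally become tags rather than fields, so that quotas can be grouped per namespace.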

Current behavior

Currently, none of the metrics mentioned above are included in any input plugin.

Desired behavior

After the feature is implemented, the kube_inventory plugin output should look something like this:

> kubernetes_node,host=vjain count=8i 1628918652000000000
> kubernetes_node,condition=Ready,host=vjain,node_name=ip-172-17-0-2.internal,status=True status_condition=1i 1629177980000000000
> kubernetes_node,cluster_namespace=tools,condition=Ready,host=vjain,node_name=ip-172-17-0-2.internal,status=True allocatable_cpu_cores=4i,allocatable_memory_bytes=7186567168i,allocatable_millicpu_cores=4000i,allocatable_pods=110i,capacity_cpu_cores=4i,capacity_memory_bytes=7291424768i,capacity_millicpu_cores=4000i,capacity_pods=110i,spec_unschedulable=0i,status_condition=1i 1628918652000000000

Use case

We are planning to migrate our monitoring infrastructure from Prometheus to Telegraf and are trying to fill the gaps in the metrics we need. Combining this feature with the already raised https://github.com/influxdata/telegraf/issues/8546 will serve our purpose.

powersj commented 2 years ago

next steps: get PR updated without unnecessary binary or test file, review PR