paypal / load-watcher

Load watcher is a cluster-wide aggregator of metrics, developed for Trimaran: Real Load Aware Scheduler in Kubernetes.

Provide more details about cpu and memory prometheus metric. #51

Open WLBF opened 2 years ago

WLBF commented 2 years ago

It took me some time to find out what exactly the instance:node_cpu:ratio metric is. It seems the CPU and memory metrics come from the helm-charts/charts/kube-prometheus-stack/templates/prometheus/rules/kube-prometheus-node-recording.rules.yaml rule file, which has been removed and seems to have been replaced by the instance:node_load1_per_cpu:ratio rule in later versions. I think it would be better to have a detailed description of the CPU and memory metrics and to provide a way to configure their names.

wangchen615 commented 2 years ago

@WLBF, thanks for reporting this. @atantawi had a similar experience when using Helm to install Prometheus. The version with which @atantawi hit the issue is: https://prometheus-community.github.io/helm-charts/

Currently, the node usage metrics we use are instance:node_cpu:ratio and instance:node_memory_utilisation:ratio. The Prometheus distribution we have tested successfully is kube-prometheus, https://github.com/prometheus-operator/kube-prometheus.

We will dig into the details of the Helm chart version and see whether we need to let developers configure the metric names.
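
As a rough illustration of what configurable metric names could look like, here is a minimal Python sketch. It is not load-watcher's actual code: the environment variable names CPU_METRIC_NAME and MEMORY_METRIC_NAME, the helper functions, and the avg_over_time query shape are all assumptions made for the example. Only the two default metric names come from this thread.

```python
import os

# Defaults taken from this thread: the recording rules currently used.
DEFAULT_CPU_METRIC = "instance:node_cpu:ratio"
DEFAULT_MEMORY_METRIC = "instance:node_memory_utilisation:ratio"


def metric_names(env=None):
    """Return the CPU and memory metric names, allowing overrides.

    CPU_METRIC_NAME and MEMORY_METRIC_NAME are hypothetical variable
    names chosen for this sketch; an unset or empty value falls back
    to the default.
    """
    env = os.environ if env is None else env
    return {
        "cpu": env.get("CPU_METRIC_NAME") or DEFAULT_CPU_METRIC,
        "memory": env.get("MEMORY_METRIC_NAME") or DEFAULT_MEMORY_METRIC,
    }


def build_query(metric, window="15m"):
    """Build a PromQL expression averaging the metric over a window.

    The query shape is an assumption for illustration only.
    """
    return f"avg_over_time({metric}[{window}])"


if __name__ == "__main__":
    names = metric_names()
    for kind, metric in names.items():
        print(kind, "->", build_query(metric))
```

With something like this, a cluster whose Prometheus installation exposes differently named recording rules could point the watcher at them without a code change, while existing kube-prometheus setups would keep working with the defaults.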