Open reefland opened 2 years ago
Had the same issue and was becoming desperate, pulling my hair out. This is the config I finally came up with, and it seems to work with cadvisor v0.46 on a Kubernetes v1.24.8 cluster set up via Rancher.
```yaml
# CADVISOR SCRAPE JOB for the extra installed cadvisor, because of k8s v1.24 with containerd
# problems where some labels just have empty values on RKE clusters
- job_name: "kubernetes-cadvisor"
  kubernetes_sd_configs:
    - role: pod                  # we get the needed info from the pods
      namespaces:
        names:
          - monitoring           # in namespace monitoring
      selectors:
        - role: pod
          label: "app=cadvisor"  # and only select the cadvisor pods with this label set as source
  metric_relabel_configs:        # we relabel some labels inside the scraped metrics
    # this should look at the scraped metric and replace/add the label inside
    - source_labels: [container_label_io_kubernetes_pod_namespace]
      target_label: "namespace"
    - source_labels: [container_label_io_kubernetes_pod_name]
      target_label: "pod"
    - source_labels: [container_label_io_kubernetes_container_name]
      target_label: "container"
```
Now the container_* metrics have the labels that are needed in the Grafana dashboards we use here for Kubernetes clusters. For example:
container_memory_usage_bytes{container="cadvisor", container_label_io_kubernetes_container_name="cadvisor", container_label_io_kubernetes_pod_name="cadvisor-x6pfx", container_label_io_kubernetes_pod_namespace="monitoring", id="/kubepods/burstable/pod08586cc5-da59-499a-a60b-f7bf859ce7a5/77b8b44fce648487d4ed47dd9b143148e6cccb53ba2a73bfe9277d22f1a305d7", image="sha256:78367b75ee31241d19875ea7a1a6fa06aa42377bba54dbe8eac3f4722fd036b5", instance="10.42.2.139:8080", job="kubernetes-cadvisor", name="k8s_cadvisor_cadvisor-x6pfx_monitoring_08586cc5-da59-499a-a60b-f7bf859ce7a5_0", namespace="monitoring", pod="cadvisor-x6pfx"}
This blog post https://valyala.medium.com/how-to-use-relabeling-in-prometheus-and-victoriametrics-8b90fc22c4b2 helped a lot to understand how the different relabel_configs work.
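If you copy this, one optional addition: once the values have been copied into namespace/pod/container, the verbose container_label_io_kubernetes_* source labels can be dropped to keep the series lean. A minimal sketch, assuming the same label names as above (check your own scrape output before relying on it):

```yaml
metric_relabel_configs:
  - source_labels: [container_label_io_kubernetes_pod_namespace]
    target_label: "namespace"
  - source_labels: [container_label_io_kubernetes_pod_name]
    target_label: "pod"
  - source_labels: [container_label_io_kubernetes_container_name]
    target_label: "container"
  # optional: drop the verbose source labels once they have been copied
  - regex: "container_label_io_kubernetes_.*"
    action: labeldrop
```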
That helped a little... I now have a working container name, but still no pod returned by the external cadvisor:
container_cpu_usage_seconds_total{namespace="monitoring", container="grafana"}
container_cpu_usage_seconds_total{container="grafana", container_label_io_kubernetes_container_name="grafana", container_label_io_kubernetes_pod_namespace="monitoring", cpu="total", id="/kubepods/besteffort/pode00da7e6-0e0f-4cd9-aa75-b1e9bab32b38/8959546a3f87530a3059f775191d254d43db3a8ccf17bfa98495ab25a869326d", image="docker.io/grafana/grafana:9.2.4", instance="10.42.0.9:8080", job="kubernetes-cadvisor", name="8959546a3f87530a3059f775191d254d43db3a8ccf17bfa98495ab25a869326d", namespace="monitoring"}
name="8959546a3f87530a3059f775191d254d43db3a8ccf17bfa98495ab25a869326d"
value and not the pod name.Whereas the kubelet cadvisor does have the pod name:
container_cpu_usage_seconds_total{namespace="monitoring", pod=~"grafana.*"}
:
container_cpu_usage_seconds_total{cpu="total", endpoint="https-metrics", id="/kubepods/besteffort/pode00da7e6-0e0f-4cd9-aa75-b1e9bab32b38", instance="testlinux", job="kubelet", metrics_path="/metrics/cadvisor", namespace="monitoring", node="testlinux", pod="grafana-ff88df95-lbvr2", service="prometheus-kubelet"}
Did you apply any of the overlays, such as cadvisor-args.yaml?
Hmm, strange.
I'm not completely sure what's going on here on our systems, as the cadvisor stuff was set up by a colleague who left the company a few weeks ago and left a mess behind, and I now have to figure out how to fix the prometheus/prometheus-operator setup etc. :|
TL;DR: I had a look, and it seems the cadvisors run with the following arguments :D
```
--housekeeping_interval=2s
--max_housekeeping_interval=15s
--event_storage_event_limit=default=0
--event_storage_age_limit=default=0
--enable_metrics=app,cpu,disk,diskIO,memory,network,process
--docker_only
--store_container_labels=false
--whitelisted_container_labels=io.kubernetes.container.name, io.kubernetes.pod.name,io.kubernetes.pod.namespace, io.kubernetes.pod.name,io.kubernetes.pod.name
```
You can see io.kubernetes.pod.name is in there multiple times :shrug: - whereas it's only listed once in the example.
Even stranger... I noticed that the only two labels I had working were the ones with NO SPACES in front of them in the --whitelisted_container_labels field shown above (from the cadvisor-args.yaml overlay file). I removed the spaces and it started to work! Weird.
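For anyone hitting the same thing, a sketch of what the cleaned-up argument could look like based on the above (an assumption on my part: whitelist exactly the three labels the relabel rules rely on, with the duplicates and the spaces after the commas removed):

```yaml
# excerpt of the cadvisor DaemonSet container args (only the two relevant flags shown)
args:
  - --store_container_labels=false
  - --whitelisted_container_labels=io.kubernetes.container.name,io.kubernetes.pod.name,io.kubernetes.pod.namespace
```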
I have the same problem with container_cpu_usage_seconds_total: none of the results returned by the PromQL above have an image field, which is quite strange. The same monitoring chart works well on RKE v1.20.8 (without cadvisor).
For me it was fixed by configuring the containerd path with --containerd=/run/k3s/containerd/containerd.sock; after that it started showing metrics.
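For reference, a minimal sketch of how that could be wired into a cadvisor DaemonSet on K3s (the volume and mount names are made up here; the socket path is the one from the comment above):

```yaml
# excerpt of a cadvisor DaemonSet pod spec for K3s
containers:
  - name: cadvisor
    args:
      - --containerd=/run/k3s/containerd/containerd.sock   # point cadvisor at the K3s containerd socket
    volumeMounts:
      - name: containerd-sock                              # hypothetical volume name
        mountPath: /run/k3s/containerd/containerd.sock
        readOnly: true
volumes:
  - name: containerd-sock
    hostPath:
      path: /run/k3s/containerd/containerd.sock
      type: Socket
```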
I have successfully deployed cadvisor 0.45.0 (tried v0.45.0-containerd-cri as well) as a daemonset on K3s Kubernetes / containerd. I've only applied the cadvisor-args.yaml overlay, as the others did not seem relevant.

History

The bundled K3s (v1.24.3+k3s1) containerd is disabled as it does not support the ZFS snapshotter. Instead I'm using the containerd from Ubuntu 22.04 (1.5.9-0ubuntu3), and while it functions perfectly with containers for K3s and the ZFS snapshotter, it does not work properly with kubelet / cAdvisor / Prometheus, as image= and container= are missing. And a simple Prometheus query returned an empty set.

What I See Now

It was suggested I try this cadvisor instead, and it is better... almost, but not quite right. Hopefully I'm just missing something. Now that same Prometheus query returns 111 rows, here is an example for 3:

What doesn't seem right:
- container is now equal to "cadvisor" instead of the value specified in container_label_io_kubernetes_container_name
- namespace is now equal to "cadvisor" instead of the value specified in container_label_io_kubernetes_pod_namespace
- pod is now equal to "cadvisor-tqbj6" instead of the value specified in id

A Prometheus query of container_cpu_usage_seconds_total{image!="",container!="cadvisor"} returns an empty set.

Suggestions?