google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

Prometheus - Error on ingesting out-of-order samples on cAdvisor DaemonSet #3217

Open reefland opened 1 year ago

reefland commented 1 year ago

cAdvisor v0.46.0, deployed as a DaemonSet on Ubuntu 22.04 with containerd 1.5.9-0ubuntu3.1 (needed for the ZFS snapshotter), under K3s v1.25.4+k3s1.

Prometheus logs are full of messages such as:

ts=2022-12-25T00:17:56.075Z caller=scrape.go:1681 level=warn component="scrape manager" scrape_pool=kubernetes-cadvisor target=http://10.42.0.202:8080/metrics msg="Error on ingesting out-of-order samples" num_dropped=165
ts=2022-12-25T00:18:21.127Z caller=scrape.go:1681 level=warn component="scrape manager" scrape_pool=kubernetes-cadvisor target=http://10.42.2.212:8080/metrics msg="Error on ingesting out-of-order samples" num_dropped=183
ts=2022-12-25T00:18:26.317Z caller=scrape.go:1681 level=warn component="scrape manager" scrape_pool=kubernetes-cadvisor target=http://10.42.3.60:8080/metrics msg="Error on ingesting out-of-order samples" num_dropped=155
ts=2022-12-25T00:18:34.144Z caller=scrape.go:1681 level=warn component="scrape manager" scrape_pool=kubernetes-cadvisor target=http://10.42.1.146:8080/metrics msg="Error on ingesting out-of-order samples" num_dropped=183

This is pretty high: prometheus_target_scrapes_sample_out_of_order_total: 665354.
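To see whether these drops are ongoing rather than just accumulated history, the counter can be rated over a window (a plain query sketch; this metric is exposed by Prometheus about itself):

```promql
# Out-of-order samples dropped per second, averaged over the last 5 minutes.
rate(prometheus_target_scrapes_sample_out_of_order_total[5m])
```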

The IP addresses in the error messages above line up with my cAdvisor DaemonSet:

up{instance="10.42.0.202:8080", job="kubernetes-cadvisor", metrics_path="/metrics"}  |  1
up{instance="10.42.1.146:8080", job="kubernetes-cadvisor", metrics_path="/metrics"}  |  1
up{instance="10.42.2.212:8080", job="kubernetes-cadvisor", metrics_path="/metrics"}  |  1
up{instance="10.42.3.60:8080", job="kubernetes-cadvisor", metrics_path="/metrics"}   |  1
up{instance="10.42.4.58:8080", job="kubernetes-cadvisor", metrics_path="/metrics"}   |  1

overlays/cadvisor-args.yaml:

# This patch is an example of setting arguments for the cAdvisor container.
# This set of arguments mirrors what the kubelet currently uses for cAdvisor, 
# enables only cpu, memory, diskIO, disk and network metrics, and shows only
# container metrics.
---
apiVersion: apps/v1 # for Kubernetes versions before 1.9.0 use apps/v1beta2
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: cadvisor
spec:
  selector:
    matchLabels:
      name: cadvisor
  template:
    spec:
      containers:
        - name: cadvisor
          args:
            # https://github.com/google/cadvisor/blob/master/docs/runtime_options.md
            - --housekeeping_interval=15s
            - --max_housekeeping_interval=25s
            - --event_storage_event_limit=default=0
            - --event_storage_age_limit=default=0
            - --disable_metrics=advtcp,cpuset,memory_numa,percpu,sched,tcp,udp,disk,diskIO,accelerator,hugetlb,referenced_memory,cpu_topology,resctrl
            - --enable_metrics=app,cpu,disk,diskIO,memory,network,process
            - --docker_only                                         # only show stats for docker containers
            - --store_container_labels=false
            - --whitelisted_container_labels=io.kubernetes.container.name,io.kubernetes.pod.name,io.kubernetes.pod.namespace
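One way to narrow down out-of-order drops is to check a single target's payload for duplicate series (the same metric name and label set appearing more than once). This is a debugging sketch, not something from the thread; in practice the here-doc sample would be replaced with `curl -s http://<pod-ip>:8080/metrics` against one of the DaemonSet pod IPs from the logs above:

```shell
# Sketch: detect duplicate series in a scraped metrics payload.
# The sample below is synthetic; substitute the real scrape output.
metrics='container_cpu_usage_seconds_total{pod="a"} 1 1671900000000
container_cpu_usage_seconds_total{pod="a"} 2 1671900015000
container_memory_usage_bytes{pod="a"} 3 1671900000000'

printf '%s\n' "$metrics" \
  | grep -v '^#' \
  | awk '{print $1}' \
  | sort | uniq -d
```

Any line this prints is a series exposed twice in one scrape, which Prometheus can only ingest once per timestamp.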

In the Kube Prometheus Stack Helm values, I disable the kubelet's built-in cAdvisor scrapes:

        kubelet:
          enabled: true
          namespace: kube-system
          serviceMonitor:
            cAdvisor: false

Other relevant sections within the values file for cAdvisor:

            additionalScrapeConfigs:
              # CADVISOR SCRAPE JOB for externally installed cadvisor because of k8s with containerd problems
              - job_name: "kubernetes-cadvisor"
                kubernetes_sd_configs:
                  - role: pod  # we get needed info from the pods
                    namespaces:
                      names: 
                        - cadvisor
                    selectors:
                      - role: pod
                        label: "app=cadvisor"  # and only select the cadvisor pods with this label set as source
                metric_relabel_configs:  # we relabel some labels inside the scraped metrics
                  # this should look at the scraped metric and replace/add the label inside
                  - source_labels: [container_label_io_kubernetes_pod_namespace]
                    target_label: "namespace"
                  - source_labels: [container_label_io_kubernetes_pod_name]
                    target_label: "pod"
                  - source_labels: [container_label_io_kubernetes_container_name]
                    target_label: "container"

                ## metrics_path is required to match upstream rules and charts
                relabel_configs:
                  - action: replace
                    source_labels: [__metrics_path__]
                    target_label: metrics_path
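One mitigation often suggested for this symptom (not confirmed anywhere in this thread) is to stop honoring the exporter's self-reported sample timestamps, since cAdvisor attaches timestamps from its own housekeeping cache to container metrics and Prometheus rejects samples whose timestamps go backwards. A hedged sketch of the same scrape job with the standard `honor_timestamps` field disabled:

```yaml
# Sketch only: same job as above, ingesting samples at scrape time
# instead of cAdvisor's cached timestamps. Whether this resolves the
# drops in this particular setup is untested here.
- job_name: "kubernetes-cadvisor"
  honor_timestamps: false
  kubernetes_sd_configs:
    - role: pod
      namespaces:
        names:
          - cadvisor
```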
AliDevOps8 commented 10 months ago

Hello, did you ever find a solution? Regards

reefland commented 10 months ago

Nope. I gave up on the effort of running cAdvisor as a DaemonSet.