google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

Inconsistent container metrics in prometheus route #1704

Closed: zeisss closed this issue 6 years ago

zeisss commented 7 years ago

Our cAdvisor reports different containers each time we query the /metrics route. The problem is consistent across various environments and VMs. I initially found #1635 and thought this to be the same, but the linked #1572 explains that cAdvisor seems to pick up two systemd slices for the container, which is not the case according to my logs. Hence a separate issue, just to be sure.

17:50 $ curl -s http://docker-012.<domain>:8701/metrics | fgrep container_cpu_usage_seconds_total| wc -l
      98
17:51 $ curl -s http://docker-012.<domain>:8701/metrics | fgrep container_cpu_usage_seconds_total| wc -l
      18
17:51 $ curl -s http://docker-012.<domain>:8701/metrics | fgrep container_cpu_usage_seconds_total| wc -l
      98
17:51 $ curl -s http://docker-012.<domain>:8701/metrics | fgrep container_cpu_usage_seconds_total| wc -l

:8701 is started as follows: $ sudo /opt/cadvisor/bin/cadvisor -port 8701 -logtostderr -v=10

Neither dockerd nor cadvisor print any logs during those requests.

Startup Logs

I0725 17:02:09.462596  109834 storagedriver.go:50] Caching stats in memory for 2m0s
I0725 17:02:09.462727  109834 manager.go:143] cAdvisor running in container: "/"
W0725 17:02:09.496040  109834 manager.go:151] unable to connect to Rkt api service: rkt: cannot tcp Dial rkt api service: dial tcp 127.0.0.1:15441: getsockopt: connection refused
I0725 17:02:09.531430  109834 fs.go:117] Filesystem partitions: map[/dev/dm-0:{mountpoint:/ major:254 minor:0 fsType:ext4 blockSize:0} /dev/mapper/rs--pre--docker--012--vg-var:{mountpoint:/var/lib/docker/aufs major:254 minor:2 fsType:ext4 blockSize:0} /dev/mapper/rs--pre--docker--012--vg-varlog:{mountpoint:/var/log major:254 minor:3 fsType:ext4 blockSize:0}]
I0725 17:02:09.534803  109834 manager.go:198] Machine: {NumCores:8 CpuFrequency:2397223 MemoryCapacity:38034182144 MachineID:c63b565c3eea4c1bab8cc5d972595a51 SystemUUID:423B1F3E-804D-219F-8D0B-EECB74C81279 BootID:9b2c8857-539f-4adf-b2b5-c8e2672968b8 Filesystems:[{Device:/dev/mapper/rs--pre--docker--012--vg-var DeviceMajor:254 DeviceMinor:2 Capacity:40179982336 Type:vfs Inodes:2501856 HasInodes:true} {Device:/dev/mapper/rs--pre--docker--012--vg-varlog DeviceMajor:254 DeviceMinor:3 Capacity:20020748288 Type:vfs Inodes:1250928 HasInodes:true} {Device:/dev/dm-0 DeviceMajor:254 DeviceMinor:0 Capacity:12366823424 Type:vfs Inodes:775200 HasInodes:true}] DiskMap:map[254:1:{Name:dm-1 Major:254 Minor:1 Size:1023410176 Scheduler:none} 254:2:{Name:dm-2 Major:254 Minor:2 Size:40957378560 Scheduler:none} 254:3:{Name:dm-3 Major:254 Minor:3 Size:20476592128 Scheduler:none} 8:0:{Name:sda Major:8 Minor:0 Size:75161927680 Scheduler:cfq} 254:0:{Name:dm-0 Major:254 Minor:0 Size:12700352512 Scheduler:none}] NetworkDevices:[{Name:eth0 MacAddress:00:50:56:bb:37:43 Speed:10000 Mtu:1500}] Topology:[{Id:0 Memory:38034182144 Cores:[{Id:0 Threads:[0] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:15728640 Type:Unified Level:3}]} {Id:2 Memory:0 Cores:[{Id:0 Threads:[1] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:15728640 Type:Unified Level:3}]} {Id:4 Memory:0 Cores:[{Id:0 Threads:[2] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:15728640 Type:Unified Level:3}]} {Id:6 Memory:0 Cores:[{Id:0 Threads:[3] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:15728640 Type:Unified Level:3}]} {Id:8 Memory:0 Cores:[{Id:0 Threads:[4] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:15728640 Type:Unified Level:3}]} {Id:10 Memory:0 Cores:[{Id:0 Threads:[5] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:15728640 Type:Unified Level:3}]} {Id:12 Memory:0 Cores:[{Id:0 Threads:[6] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:15728640 Type:Unified Level:3}]} {Id:14 Memory:0 Cores:[{Id:0 Threads:[7] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:15728640 Type:Unified Level:3}]}] CloudProvider:Unknown InstanceType:Unknown InstanceID:None}
I0725 17:02:09.535661  109834 manager.go:204] Version: {KernelVersion:3.16.0-4-amd64 ContainerOsVersion:Debian GNU/Linux 8 (jessie) DockerVersion:1.13.1 DockerAPIVersion:1.26 CadvisorVersion:v0.26.1 CadvisorRevision:d19cc94}
I0725 17:02:09.577920  109834 factory.go:351] Registering Docker factory
W0725 17:02:09.577951  109834 manager.go:247] Registration of the rkt container factory failed: unable to communicate with Rkt api service: rkt: cannot tcp Dial rkt api service: dial tcp 127.0.0.1:15441: getsockopt: connection refused
I0725 17:02:09.577957  109834 factory.go:54] Registering systemd factory
I0725 17:02:09.578235  109834 factory.go:86] Registering Raw factory
I0725 17:02:09.578542  109834 manager.go:1121] Started watching for new ooms in manager
I0725 17:02:09.579461  109834 oomparser.go:185] oomparser using systemd
I0725 17:02:09.579565  109834 factory.go:116] Factory "docker" was unable to handle container "/"
I0725 17:02:09.579582  109834 factory.go:105] Error trying to work out if we can handle /: / not handled by systemd handler
I0725 17:02:09.579586  109834 factory.go:116] Factory "systemd" was unable to handle container "/"
I0725 17:02:09.579592  109834 factory.go:112] Using factory "raw" for container "/"
I0725 17:02:09.579959  109834 manager.go:913] Added container: "/" (aliases: [], namespace: "")
I0725 17:02:09.580102  109834 handler.go:325] Added event &{/ 2017-07-22 16:40:48.746304841 +0200 CEST containerCreation {<nil>}}
I0725 17:02:09.580139  109834 manager.go:288] Starting recovery of all containers
I0725 17:02:09.580237  109834 container.go:407] Start housekeeping for container "/"

Logs for a container

Example: I am missing the metrics for f7ba91df74c8. cAdvisor mentions the container ID only once:

I0725 17:02:09.693203  109834 factory.go:112] Using factory "docker" for container "/docker/f7ba91df74c8b923cf66ba2e0ef4190a2089f7dd258d7d57f7e92034192a1855"
I0725 17:02:09.695423  109834 manager.go:913] Added container: "/docker/f7ba91df74c8b923cf66ba2e0ef4190a2089f7dd258d7d57f7e92034192a1855" (aliases: [containernameredacted f7ba91df74c8b923cf66ba2e0ef4190a2089f7dd258d7d57f7e92034192a1855], namespace: "docker")
I0725 17:02:09.695640  109834 handler.go:325] Added event &{/docker/f7ba91df74c8b923cf66ba2e0ef4190a2089f7dd258d7d57f7e92034192a1855 2017-07-25 16:20:00.930924661 +0200 CEST containerCreation {<nil>}}
I0725 17:02:09.695779  109834 container.go:407] Start housekeeping for container "/docker/f7ba91df74c8b923cf66ba2e0ef4190a2089f7dd258d7d57f7e92034192a1855"

System

cadvisor_version_info{cadvisorRevision="d19cc94",cadvisorVersion="v0.26.1",dockerVersion="1.13.1",kernelVersion="3.16.0-4-amd64",osVersion="Debian GNU/Linux 8 (jessie)"} 1

We are running an old docker swarm setup with consul, consul-template and nginx per host. No Kubernetes.

micahhausler commented 7 years ago

We're observing the same behavior in the Kubernetes 1.7.0 kubelet (port 4194) and in the Docker image for v0.26.1.

Versions:

docker: 1.12.6
Kubelet: v1.7.0+coreos.0
OS: CoreOS Linux 1409.7.0
Kernel Version: 4.11.11-coreos

I ran cadvisor on kubernetes using the following DaemonSet

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: default
  labels:
    app: "cadvisor"
spec:
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: "cadvisor"
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '4194'
        prometheus.io/path: '/metrics'
    spec:
      containers:
      - name: "cadvisor"
        image: "google/cadvisor:v0.26.1"
        args:
        - "-port=4194"
        - "-logtostderr"
        livenessProbe:
          httpGet:
            path: /api
            port: 4194
        volumeMounts:
        - name: root
          mountPath: /rootfs
          readOnly: true
        - name: var-run
          mountPath: /var/run
        - name: sys
          mountPath: /sys
          readOnly: true
        - name: var-lib-docker
          mountPath: /var/lib/docker
          readOnly: true
        - name: docker-socket
          mountPath: /var/run/docker.sock
        resources:
          limits:
            cpu: 500.0m
            memory: 256Mi
          requests:
            cpu: 250.0m
            memory: 128Mi
      restartPolicy: Always
      volumes:
      - name: "root"
        hostPath:
          path: /
      - name: "var-run"
        hostPath:
          path: /var/run
      - name: "sys"
        hostPath:
          path: /sys
      - name: "var-lib-docker"
        hostPath:
          path: /var/lib/docker
      - name: "docker-socket"
        hostPath:
          path: /var/run/docker.sock

And this is what it looked like in Prometheus:

[screenshot: Prometheus graph of the flapping container metrics]
zeisss commented 7 years ago

Running the binary without root permissions fixes the problem, but then container labels are missing. Using the -docker-only flag or accessing Docker via TCP/IP makes no difference to the initial behavior.

fabxc commented 7 years ago

@zeisss @micahhausler are you both running Prometheus 2.0? In 1.x versions the flapping metrics are not caught by the new staleness handling and thus it should have no immediately visible effect.

In general, though, this is definitely wrong behavior by cAdvisor that violates the /metrics contract. It seems to be a recent regression. @derekwaynecarr @timothysc any idea what could have caused this?

idexter commented 7 years ago

@fabxc I'm using Prometheus 1.5.2 with cAdvisor on the host machine, and I also have this problem. As @zeisss said, running cAdvisor without root permissions fixes the problem, except that container labels are then missing.

Worst of all, Prometheus sometimes loses some container metrics entirely... In Grafana my graph of running containers looks like this:

containers_graph

And I see alerts from Alertmanager saying that containers are down, even though all containers are actually running the whole time.

zeisss commented 7 years ago

We currently have a workaround: running cadvisor as a non-root user. This is OK for us, as having the CPU and memory graphs is already a win. But AFAICT this mode is missing the Docker container labels as well as the network and disk I/O metrics.

zeisss commented 7 years ago

@fabxc no, we are still running a 1.x prometheus version - but having Prometheus work around this bug in cadvisor is not a good solution IMO.

zeisss commented 7 years ago

We are currently in the process of updating our DEV cluster to Docker 17.06-ce, where we still see this behavior when cAdvisor runs as root (/opt/cadvisor/bin/cadvisor -port 8701 -logtostderr):

$ while true; do curl -sS docker-host:8701/metrics | fgrep container_cpu_system_seconds_total | wc -l; sleep 1; done
      28
      28
       9
       9
       5
       6
^C
# HELP cadvisor_version_info A metric with a constant '1' value labeled by kernel version, OS version, docker version, cadvisor version & cadvisor revision.
# TYPE cadvisor_version_info gauge
cadvisor_version_info{cadvisorRevision="057293a",cadvisorVersion="v0.26.0.20+057293a1796d6a-dirty",dockerVersion="17.06.0-ce",kernelVersion="3.16.0-4-amd64",osVersion="Debian GNU/Linux 8 (jessie)"} 1
sylr commented 7 years ago

I've the same issue with Kubernetes 1.7.2 & 1.7.3.

https://github.com/kubernetes/kubernetes/issues/50151

bassebaba commented 7 years ago

I have the exact same problem as @DexterHD; it drives me crazy, because my container-down alert spams me with false alerts all the time.

[screenshot: Grafana graph showing containers intermittently reported as down]
igortg commented 7 years ago

I just started to explore cAdvisor. Seems to have the same issue using InfluxDB:

[screenshot: graph showing the same gaps with InfluxDB]

Hermain commented 7 years ago

Having the same issue with Docker 17.06, Prometheus, and Docker Swarm. Running v0.24.1 solved it for me.

matthiasr commented 7 years ago

cc @grobie

dixudx commented 7 years ago

/cc

roman-vynar commented 7 years ago

Same thing: 0.26 and 0.26.1 are unusable with Prometheus (in our case 1.7.x). They provide a random number of metrics, i.e. a different number of metrics exposed on the /metrics path from one moment to the next. Had to go back to good old 0.25. Docker 17.03/17.06.

bassebaba commented 7 years ago

@Hermain @roman-vynar According to the release notes, 0.26 includes "Bug: Fix prometheus metrics." So by reverting to 0.25, one misses out on whatever was fixed there (while the same release apparently broke something and introduced the gaps)? I can't find the Prometheus-related commit connected to v0.26 in order to see what was "fixed".

Do we have an ETA on fixing this? No devs in this issue? And no assignee?

matthiasr commented 7 years ago

According to https://github.com/google/cadvisor/issues/1690#issuecomment-313597011 the fix in 0.26.1 isn't working or is incomplete; maybe this is the same problem?

matthiasr commented 7 years ago

Does this problem happen on a cAdvisor built from master, which includes #1679?

sylr commented 7 years ago

~If someone can indicate me how to build hyperkube with a custom cAdvisor commit I'd like to make some tests.~ I think I found how to do this.

Thanks.

Cas-pian commented 7 years ago

I hit the same problem using cAdvisor 0.26.1 and Prometheus 1.7.1, but it's OK when I change cAdvisor to v0.25.0, and it's also OK with cAdvisor 0.26.1 and Prometheus 1.5.3. I'm a little confused; it seems to be a compatibility issue.

bboreham commented 7 years ago

Seeing the same high-level symptoms: for me it's the labels that are missing, not the containers. And when the labels are missing I get a lot more lines for other cgroups.

I'm running Kubernetes 1.7.3 on Ubuntu (Linux ip-172-20-3-76 4.4.0-92-generic #115-Ubuntu SMP Thu Aug 10 09:04:33 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux).

Two examples from the same kubelet on the same machine, a few seconds apart:

Example 1:

# curl -s 127.0.0.1:10255/metrics/cadvisor | grep container_cpu_user_seconds_total
# HELP container_cpu_user_seconds_total Cumulative user cpu time consumed in seconds.
# TYPE container_cpu_user_seconds_total counter
container_cpu_user_seconds_total{id="/"} 3.6788206e+06
container_cpu_user_seconds_total{id="/init.scope"} 69.43
container_cpu_user_seconds_total{id="/kubepods"} 3.49797001e+06
container_cpu_user_seconds_total{id="/kubepods/besteffort"} 162742.99
container_cpu_user_seconds_total{id="/kubepods/besteffort/pod13eacef1-8342-11e7-9534-0a97ed59c75e"} 69.47
container_cpu_user_seconds_total{id="/kubepods/besteffort/pod5f43c843-7db5-11e7-9534-0a97ed59c75e"} 703.82
container_cpu_user_seconds_total{id="/kubepods/besteffort/pod6b2e45d7-7db5-11e7-9534-0a97ed59c75e"} 70.04
container_cpu_user_seconds_total{id="/kubepods/besteffort/pod94ad7fd4-8351-11e7-9534-0a97ed59c75e"} 363.18
container_cpu_user_seconds_total{id="/kubepods/besteffort/pod965b711b-8262-11e7-9534-0a97ed59c75e"} 5.9
container_cpu_user_seconds_total{id="/kubepods/besteffort/podd2b82b9c-8355-11e7-9534-0a97ed59c75e"} 35733.13
container_cpu_user_seconds_total{id="/kubepods/besteffort/pode4c7eace-8352-11e7-9534-0a97ed59c75e"} 150.78
container_cpu_user_seconds_total{id="/kubepods/burstable"} 3.33525364e+06
container_cpu_user_seconds_total{id="/kubepods/burstable/pod53559243-7db5-11e7-9534-0a97ed59c75e"} 276743.3
container_cpu_user_seconds_total{id="/kubepods/burstable/pod55af46fe-834c-11e7-9534-0a97ed59c75e"} 105958.75
container_cpu_user_seconds_total{id="/kubepods/burstable/pod7964f3e653196edee64f6bad72589dee"} 366.77
container_cpu_user_seconds_total{id="/kubepods/burstable/pod7964f3e653196edee64f6bad72589dee/8d2eb34023eab40d08ba6e4be149e315c3844749f8321f44be2dcda024534757/\"\""} 366.65
container_cpu_user_seconds_total{id="/kubepods/burstable/podc7af9dff-8364-11e7-9534-0a97ed59c75e"} 434974.97
container_cpu_user_seconds_total{id="/kubepods/burstable/podcb5d3cc0-8364-11e7-9534-0a97ed59c75e"} 891563
container_cpu_user_seconds_total{id="/kubepods/burstable/podcf18531c-8365-11e7-9534-0a97ed59c75e"} 17225.18
container_cpu_user_seconds_total{id="/system.slice"} 151482.27
container_cpu_user_seconds_total{id="/system.slice/acpid.service"} 0
container_cpu_user_seconds_total{id="/system.slice/apparmor.service"} 0
container_cpu_user_seconds_total{id="/system.slice/apport.service"} 0
container_cpu_user_seconds_total{id="/system.slice/atd.service"} 0
container_cpu_user_seconds_total{id="/system.slice/cgroupfs-mount.service"} 0
container_cpu_user_seconds_total{id="/system.slice/cloud-config.service"} 0.32
container_cpu_user_seconds_total{id="/system.slice/cloud-final.service"} 0.37
container_cpu_user_seconds_total{id="/system.slice/cloud-init-local.service"} 0
container_cpu_user_seconds_total{id="/system.slice/cloud-init.service"} 0.63
container_cpu_user_seconds_total{id="/system.slice/console-setup.service"} 0
container_cpu_user_seconds_total{id="/system.slice/cron.service"} 25.49
container_cpu_user_seconds_total{id="/system.slice/dbus.service"} 14.82
container_cpu_user_seconds_total{id="/system.slice/docker.service"} 94117.92
container_cpu_user_seconds_total{id="/system.slice/ebtables.service"} 0
container_cpu_user_seconds_total{id="/system.slice/grub-common.service"} 0
container_cpu_user_seconds_total{id="/system.slice/ifup@cbr0.service"} 0
container_cpu_user_seconds_total{id="/system.slice/ifup@ens3.service"} 0.79
container_cpu_user_seconds_total{id="/system.slice/irqbalance.service"} 40.56
container_cpu_user_seconds_total{id="/system.slice/iscsid.service"} 1.69
container_cpu_user_seconds_total{id="/system.slice/keyboard-setup.service"} 0
container_cpu_user_seconds_total{id="/system.slice/kmod-static-nodes.service"} 0
container_cpu_user_seconds_total{id="/system.slice/kubelet.service"} 21323.06
container_cpu_user_seconds_total{id="/system.slice/lvm2-lvmetad.service"} 8.94
container_cpu_user_seconds_total{id="/system.slice/lvm2-monitor.service"} 0
container_cpu_user_seconds_total{id="/system.slice/lxcfs.service"} 0.37
container_cpu_user_seconds_total{id="/system.slice/lxd-containers.service"} 0
container_cpu_user_seconds_total{id="/system.slice/mdadm.service"} 0.02
container_cpu_user_seconds_total{id="/system.slice/networking.service"} 0
container_cpu_user_seconds_total{id="/system.slice/ondemand.service"} 0
container_cpu_user_seconds_total{id="/system.slice/open-iscsi.service"} 0
container_cpu_user_seconds_total{id="/system.slice/polkitd.service"} 3.63
container_cpu_user_seconds_total{id="/system.slice/rc-local.service"} 0
container_cpu_user_seconds_total{id="/system.slice/resolvconf.service"} 0
container_cpu_user_seconds_total{id="/system.slice/rsyslog.service"} 100.82
container_cpu_user_seconds_total{id="/system.slice/setvtrgb.service"} 0
container_cpu_user_seconds_total{id="/system.slice/snapd.firstboot.service"} 0
container_cpu_user_seconds_total{id="/system.slice/snapd.service"} 0.04
container_cpu_user_seconds_total{id="/system.slice/ssh.service"} 51.39
container_cpu_user_seconds_total{id="/system.slice/system-getty.slice"} 0
container_cpu_user_seconds_total{id="/system.slice/system-serial\\x2dgetty.slice"} 0
container_cpu_user_seconds_total{id="/system.slice/systemd-journal-flush.service"} 0
container_cpu_user_seconds_total{id="/system.slice/systemd-journald.service"} 489.31
container_cpu_user_seconds_total{id="/system.slice/systemd-logind.service"} 3.02
container_cpu_user_seconds_total{id="/system.slice/systemd-modules-load.service"} 0
container_cpu_user_seconds_total{id="/system.slice/systemd-random-seed.service"} 0
container_cpu_user_seconds_total{id="/system.slice/systemd-remount-fs.service"} 0
container_cpu_user_seconds_total{id="/system.slice/systemd-sysctl.service"} 0
container_cpu_user_seconds_total{id="/system.slice/systemd-timesyncd.service"} 0
container_cpu_user_seconds_total{id="/system.slice/systemd-tmpfiles-setup-dev.service"} 0
container_cpu_user_seconds_total{id="/system.slice/systemd-tmpfiles-setup.service"} 0
container_cpu_user_seconds_total{id="/system.slice/systemd-udev-trigger.service"} 0
container_cpu_user_seconds_total{id="/system.slice/systemd-udevd.service"} 0.52
container_cpu_user_seconds_total{id="/system.slice/systemd-update-utmp.service"} 0
container_cpu_user_seconds_total{id="/system.slice/systemd-user-sessions.service"} 0
container_cpu_user_seconds_total{id="/system.slice/ufw.service"} 0
container_cpu_user_seconds_total{id="/user.slice"} 29270.98

Example 2:

# curl -s 127.0.0.1:10255/metrics/cadvisor | grep container_cpu_user_seconds_total
# HELP container_cpu_user_seconds_total Cumulative user cpu time consumed in seconds.
# TYPE container_cpu_user_seconds_total counter
container_cpu_user_seconds_total{container_name="POD",id="/kubepods/besteffort/pod5f43c843-7db5-11e7-9534-0a97ed59c75e/e49ec1309ec25475a7edd8c4dd6d7003fef3f7debd053b234716649d920ac15f",image="gcr.io/google_containers/pause-amd64:3.0",name="k8s_POD_prom-node-exporter-w4nvq_monitoring_5f43c843-7db5-11e7-9534-0a97ed59c75e_1",namespace="monitoring",pod_name="prom-node-exporter-w4nvq"} 0
container_cpu_user_seconds_total{container_name="POD",id="/kubepods/besteffort/pod6b2e45d7-7db5-11e7-9534-0a97ed59c75e/2bf50e4b99aaf24eb05a61b9808d9e60d4fd78ba47ac7669ce29bb3f8c862501",image="gcr.io/google_containers/pause-amd64:3.0",name="k8s_POD_reboot-required-rn9h4_monitoring_6b2e45d7-7db5-11e7-9534-0a97ed59c75e_1",namespace="monitoring",pod_name="reboot-required-rn9h4"} 0
container_cpu_user_seconds_total{container_name="POD",id="/kubepods/besteffort/pod94ad7fd4-8351-11e7-9534-0a97ed59c75e/f3a1a656eabae83bb3a50206d7278b154fe1ddf2521e6a0bfd31667642867968",image="gcr.io/google_containers/pause-amd64:3.0",name="k8s_POD_memcached-296817331-t3q5v_kube-system_94ad7fd4-8351-11e7-9534-0a97ed59c75e_0",namespace="kube-system",pod_name="memcached-296817331-t3q5v"} 0
container_cpu_user_seconds_total{container_name="POD",id="/kubepods/besteffort/pod965b711b-8262-11e7-9534-0a97ed59c75e/c6e3b1012a1e607e4d164233f96a4c2ef83f377fc9dfb82e0dab7fc218e4e72a",image="gcr.io/google_containers/pause-amd64:3.0",name="k8s_POD_kured-wp23j_kube-system_965b711b-8262-11e7-9534-0a97ed59c75e_1",namespace="kube-system",pod_name="kured-wp23j"} 0
container_cpu_user_seconds_total{container_name="POD",id="/kubepods/besteffort/podd2b82b9c-8355-11e7-9534-0a97ed59c75e/e62bf79dd1981e285df9138a057b481357a5be6e464b43235e1335ac33bcf00b",image="gcr.io/google_containers/pause-amd64:3.0",name="k8s_POD_fluxd-3608285890-x4bz7_kube-system_d2b82b9c-8355-11e7-9534-0a97ed59c75e_0",namespace="kube-system",pod_name="fluxd-3608285890-x4bz7"} 0
container_cpu_user_seconds_total{container_name="POD",id="/kubepods/besteffort/pode4c7eace-8352-11e7-9534-0a97ed59c75e/e618f7cb1f3ec97f463ed9f97143890b80c730f53075a127d9f59714aab35163",image="gcr.io/google_containers/pause-amd64:3.0",name="k8s_POD_nats-651776541-6vrk3_scope_e4c7eace-8352-11e7-9534-0a97ed59c75e_0",namespace="scope",pod_name="nats-651776541-6vrk3"} 0
container_cpu_user_seconds_total{container_name="POD",id="/kubepods/burstable/pod53559243-7db5-11e7-9534-0a97ed59c75e/44f9f0113185f75f827eca36a42a7d2f91e166594c63eb2efecc7155eda03a70",image="gcr.io/google_containers/pause-amd64:3.0",name="k8s_POD_scope-probe-master-3cktj_kube-system_53559243-7db5-11e7-9534-0a97ed59c75e_1",namespace="kube-system",pod_name="scope-probe-master-3cktj"} 0
container_cpu_user_seconds_total{container_name="POD",id="/kubepods/burstable/pod55af46fe-834c-11e7-9534-0a97ed59c75e/9f9696a06e93a617a4e606731a474966c139681eef1a66344f0d06c965c68e47",image="gcr.io/google_containers/pause-amd64:3.0",name="k8s_POD_authfe-1607895901-bjnd9_default_55af46fe-834c-11e7-9534-0a97ed59c75e_0",namespace="default",pod_name="authfe-1607895901-bjnd9"} 0
container_cpu_user_seconds_total{container_name="POD",id="/kubepods/burstable/pod7964f3e653196edee64f6bad72589dee/7c3dc6bb8bb540224ca1f6d121d5fe2c5df0606ce5d45e7a0c802c29765c6625",image="gcr.io/google_containers/pause-amd64:3.0",name="k8s_POD_kube-proxy-ip-172-20-3-76.ec2.internal_kube-system_7964f3e653196edee64f6bad72589dee_1",namespace="kube-system",pod_name="kube-proxy-ip-172-20-3-76.ec2.internal"} 0
container_cpu_user_seconds_total{container_name="POD",id="/kubepods/burstable/podc7af9dff-8364-11e7-9534-0a97ed59c75e/3f5329bc7772496d70821ea9c9bc80045af6c29299c42d74e2d27baf8c3cc72a",image="gcr.io/google_containers/pause-amd64:3.0",name="k8s_POD_prometheus-2177618048-kgczb_monitoring_c7af9dff-8364-11e7-9534-0a97ed59c75e_0",namespace="monitoring",pod_name="prometheus-2177618048-kgczb"} 0
container_cpu_user_seconds_total{container_name="POD",id="/kubepods/burstable/podcb5d3cc0-8364-11e7-9534-0a97ed59c75e/decb876fb0dad43964deed741609ee45d3cf9049ae9f3ac934aefd596695302c",image="gcr.io/google_containers/pause-amd64:3.0",name="k8s_POD_fluxsvc-438909710-2jtz8_fluxy_cb5d3cc0-8364-11e7-9534-0a97ed59c75e_0",namespace="fluxy",pod_name="fluxsvc-438909710-2jtz8"} 0
container_cpu_user_seconds_total{container_name="POD",id="/kubepods/burstable/podcf18531c-8365-11e7-9534-0a97ed59c75e/133181c676d51606d4fa3d7d5c7e7455535636d30c5629526f0ba0cac5fcb522",image="gcr.io/google_containers/pause-amd64:3.0",name="k8s_POD_fluentd-loggly-z9jp4_monitoring_cf18531c-8365-11e7-9534-0a97ed59c75e_0",namespace="monitoring",pod_name="fluentd-loggly-z9jp4"} 0
container_cpu_user_seconds_total{container_name="POD",id="/kubepods/burstable/pode84e93da-865d-11e7-940d-12467a080e24/dabbd2c12e2d2666dd818b0c44be54760a701bdaf850ee4804b32efd36c42754",image="gcr.io/google_containers/pause-amd64:3.0",name="k8s_POD_collection-3392593966-7nxpw_scope_e84e93da-865d-11e7-940d-12467a080e24_0",namespace="scope",pod_name="collection-3392593966-7nxpw"} 0
container_cpu_user_seconds_total{container_name="authfe",id="/kubepods/burstable/pod55af46fe-834c-11e7-9534-0a97ed59c75e/c796e0b2c3afc41e1ed6750c9dc9f5550e19efe25f0aa717fe4f9b2578c16c67",image="quay.io/weaveworks/authfe@sha256:c82cb113d15e20f65690aa3ca7f3374ae7ed2257dee2bc131bd61b1ac2bf180a",name="k8s_authfe_authfe-1607895901-bjnd9_default_55af46fe-834c-11e7-9534-0a97ed59c75e_0",namespace="default",pod_name="authfe-1607895901-bjnd9"} 64953.6
container_cpu_user_seconds_total{container_name="billing-ingester",id="/kubepods/burstable/pode84e93da-865d-11e7-940d-12467a080e24/50c82895bd84971bc6b8b9f5873512710ab06f754a0e0d3261bc20a2fddd4533",image="quay.io/weaveworks/billing-ingester@sha256:5fd857a96cac13e9f96678e63a07633af45de0e83a34e8ef28f627cf0589a042",name="k8s_billing-ingester_collection-3392593966-7nxpw_scope_e84e93da-865d-11e7-940d-12467a080e24_0",namespace="scope",pod_name="collection-3392593966-7nxpw"} 187.34
container_cpu_user_seconds_total{container_name="collection",id="/kubepods/burstable/pode84e93da-865d-11e7-940d-12467a080e24/f33f520aa2ed2c6f2277064fed34c4797ddf76a0a0bef25309348517cb1c4030",image="quay.io/weaveworks/scope@sha256:45be0490dba82f68a20faba8994cde307e9ace863a310196ba91401122bda4f8",name="k8s_collection_collection-3392593966-7nxpw_scope_e84e93da-865d-11e7-940d-12467a080e24_0",namespace="scope",pod_name="collection-3392593966-7nxpw"} 5411.72
container_cpu_user_seconds_total{container_name="exporter",id="/kubepods/besteffort/pod94ad7fd4-8351-11e7-9534-0a97ed59c75e/cabd4c16d300232a8b823bd5a9553816ff7f0830c6d91634651b4f723035664f",image="prom/memcached-exporter@sha256:b814aa209e2d5969be2ab4c65b5eda547ba657fd81ba47f48b980d20b14befb7",name="k8s_exporter_memcached-296817331-t3q5v_kube-system_94ad7fd4-8351-11e7-9534-0a97ed59c75e_0",namespace="kube-system",pod_name="memcached-296817331-t3q5v"} 142.5
container_cpu_user_seconds_total{container_name="exporter",id="/kubepods/besteffort/pode4c7eace-8352-11e7-9534-0a97ed59c75e/81fd164c5cc91a483b73ada15ce13f19d3171fc6beddc940fc2b6e747141905d",image="tomwilkie/nats_exporter@sha256:189354d9c966f94d9685009250dc360582baf02f76ecbaa2233e15cff2bc8f7f",name="k8s_exporter_nats-651776541-6vrk3_scope_e4c7eace-8352-11e7-9534-0a97ed59c75e_0",namespace="scope",pod_name="nats-651776541-6vrk3"} 107.62
container_cpu_user_seconds_total{container_name="fluentd-loggly",id="/kubepods/burstable/podcf18531c-8365-11e7-9534-0a97ed59c75e/6fe6a67e02419f47a21854b73734042c0d457d42704be4302356180e4f357935",image="quay.io/weaveworks/fluentd-loggly@sha256:19a02a2f8627573572cc2ee3c706aa4ccdab0f59c3a04e577d28035681d30ddc",name="k8s_fluentd-loggly_fluentd-loggly-z9jp4_monitoring_cf18531c-8365-11e7-9534-0a97ed59c75e_0",namespace="monitoring",pod_name="fluentd-loggly-z9jp4"} 17386.12
container_cpu_user_seconds_total{container_name="flux",id="/kubepods/besteffort/podd2b82b9c-8355-11e7-9534-0a97ed59c75e/d4ef6d20b97c7f0fefc9d13c0f4b94290eb661035bf21b7f07f38acdd18cb85d",image="quay.io/weaveworks/flux@sha256:e462c0a7c316f5986b3808360dc7c8c269466033c75a1b9553aa8175e02646f7",name="k8s_flux_fluxd-3608285890-x4bz7_kube-system_d2b82b9c-8355-11e7-9534-0a97ed59c75e_0",namespace="kube-system",pod_name="fluxd-3608285890-x4bz7"} 36097.96
container_cpu_user_seconds_total{container_name="fluxsvc",id="/kubepods/burstable/podcb5d3cc0-8364-11e7-9534-0a97ed59c75e/aa00624319b1a96a18e0a4717f13e7456e558fea8b84e2694dc8d2b168a44d3d",image="quay.io/weaveworks/fluxsvc@sha256:8d91991f2f6894def54afda4b4afb858b0502ed841a7188db48210b94bfdae4a",name="k8s_fluxsvc_fluxsvc-438909710-2jtz8_fluxy_cb5d3cc0-8364-11e7-9534-0a97ed59c75e_0",namespace="fluxy",pod_name="fluxsvc-438909710-2jtz8"} 897247.03
container_cpu_user_seconds_total{container_name="kube-proxy",id="/kubepods/burstable/pod7964f3e653196edee64f6bad72589dee/8d2eb34023eab40d08ba6e4be149e315c3844749f8321f44be2dcda024534757",image="gcr.io/google_containers/kube-proxy-amd64@sha256:dba7121df9f74b40901fb655053af369f58c82c3636d8125986ce474a759be80",name="k8s_kube-proxy_kube-proxy-ip-172-20-3-76.ec2.internal_kube-system_7964f3e653196edee64f6bad72589dee_1",namespace="kube-system",pod_name="kube-proxy-ip-172-20-3-76.ec2.internal"} 368.98
container_cpu_user_seconds_total{container_name="kured",id="/kubepods/besteffort/pod965b711b-8262-11e7-9534-0a97ed59c75e/12b3c19d2f114a6a111fdc0375bb0c27fb9e108c166e6f674aeddcd5178faa0b",image="weaveworks/kured@sha256:305b073cd3fff9ba0f21a570ee8a9c018d30274fc35045134164c762f44828e0",name="k8s_kured_kured-wp23j_kube-system_965b711b-8262-11e7-9534-0a97ed59c75e_1",namespace="kube-system",pod_name="kured-wp23j"} 5.91
container_cpu_user_seconds_total{container_name="logging",id="/kubepods/burstable/pod55af46fe-834c-11e7-9534-0a97ed59c75e/8d7e46f3d99d2f13b04b7e07a4f1062e82450f02f8f7f03c8fb33a83f0248857",image="quay.io/weaveworks/logging@sha256:63c4e6783884e6fcdd24026606756748e5913ab4978efa61ed09034074ddbe27",name="k8s_logging_authfe-1607895901-bjnd9_default_55af46fe-834c-11e7-9534-0a97ed59c75e_0",namespace="default",pod_name="authfe-1607895901-bjnd9"} 41780.76
container_cpu_user_seconds_total{container_name="memcached",id="/kubepods/besteffort/pod94ad7fd4-8351-11e7-9534-0a97ed59c75e/e5d81ddecc6a587e55491e837db3ed46f274e3b02c764f4d6d1ca2e6228fbe0c",image="memcached@sha256:00b68b00139155817a8b1d69d74865563def06b3af1e6fc79ac541a1b2f6b961",name="k8s_memcached_memcached-296817331-t3q5v_kube-system_94ad7fd4-8351-11e7-9534-0a97ed59c75e_0",namespace="kube-system",pod_name="memcached-296817331-t3q5v"} 222.96
container_cpu_user_seconds_total{container_name="nats",id="/kubepods/besteffort/pode4c7eace-8352-11e7-9534-0a97ed59c75e/511ce33319ecc50b928e3dda7025d643c310a5573d89596f89798496d9868342",image="nats@sha256:2dfb204c4d8ca4391dbe25028099535745b3a73d0cf443ca20a7e2504ba93b26",name="k8s_nats_nats-651776541-6vrk3_scope_e4c7eace-8352-11e7-9534-0a97ed59c75e_0",namespace="scope",pod_name="nats-651776541-6vrk3"} 44.25
container_cpu_user_seconds_total{container_name="prom-node-exporter",id="/kubepods/besteffort/pod5f43c843-7db5-11e7-9534-0a97ed59c75e/1ceb1514b5339c67c70ec37d609d361d5ba656ee3697a12de0918f9902d0a134",image="weaveworks/node_exporter@sha256:4f0c14e89da784857570185c4b9f57acb20f4331ef10e013731ac9274243a5a8",name="k8s_prom-node-exporter_prom-node-exporter-w4nvq_monitoring_5f43c843-7db5-11e7-9534-0a97ed59c75e_1",namespace="monitoring",pod_name="prom-node-exporter-w4nvq"} 707.54
container_cpu_user_seconds_total{container_name="prom-run",id="/kubepods/besteffort/pod6b2e45d7-7db5-11e7-9534-0a97ed59c75e/75468eaf52cf3577dbb462d586fc5aa49a3f5a151fb668a734f8e99f825c1fc5",image="quay.io/weaveworks/docker-ansible@sha256:452d1249e40650249beb700349c7deee26c15da2621e8590f3d56033babb890b",name="k8s_prom-run_reboot-required-rn9h4_monitoring_6b2e45d7-7db5-11e7-9534-0a97ed59c75e_1",namespace="monitoring",pod_name="reboot-required-rn9h4"} 70.57
container_cpu_user_seconds_total{container_name="prometheus",id="/kubepods/burstable/podc7af9dff-8364-11e7-9534-0a97ed59c75e/e4e3b4f6285c9a12415f347aadbf150c6d782e6b881d2701d4257bf3a4de2651",image="prom/prometheus@sha256:4bf7ad89d607dd8de2f0cff1df554269bff19fe0f18ee482660f7a5dc685d549",name="k8s_prometheus_prometheus-2177618048-kgczb_monitoring_c7af9dff-8364-11e7-9534-0a97ed59c75e_0",namespace="monitoring",pod_name="prometheus-2177618048-kgczb"} 438158.08
container_cpu_user_seconds_total{container_name="scope-probe",id="/kubepods/burstable/pod53559243-7db5-11e7-9534-0a97ed59c75e/e57413febbcc1c28321ccb99df3bf30b9d6555a1db62b743d1b4ee877f23346b",image="quay.io/weaveworks/scope@sha256:bc6ee4a4a568f8075573a8ac44c27759307fce355c22ad66acb1e944b6361b62",name="k8s_scope-probe_scope-probe-master-3cktj_kube-system_53559243-7db5-11e7-9534-0a97ed59c75e_1",namespace="kube-system",pod_name="scope-probe-master-3cktj"} 278471.28
container_cpu_user_seconds_total{container_name="watch",id="/kubepods/burstable/podc7af9dff-8364-11e7-9534-0a97ed59c75e/fe6cdaa2c542c90cbca951cd97952d35c8c42fcd5e8f452030369a98e27c9b3f",image="weaveworks/watch@sha256:bb113953e19fff158de017c447be337aa7a3709c3223aeeab4a5bae50ee6f159",name="k8s_watch_prometheus-2177618048-kgczb_monitoring_c7af9dff-8364-11e7-9534-0a97ed59c75e_0",namespace="monitoring",pod_name="prometheus-2177618048-kgczb"} 0.1

Other metric families in the same scrape can be fine, e.g. container_fs_inodes_free.

bboreham commented 7 years ago

I think I figured out what is going wrong.

The function DefaultContainerLabels() conditionally adds various metric labels derived from the container (name, image, etc.). When used inside the kubelet this function is containerPrometheusLabels(), but it is essentially the same.

However, when it receives the metrics, Prometheus checks that all metrics in the same family have the same label set, and rejects those that do not.

Since containers are collected in (somewhat) random order, depending on which kind is seen first you get one set of metrics or the other.

Changing the container labels function to always add the same set of labels, adding "" when it doesn't have a real value, eliminates the issue in my testing.
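
To illustrate the idea, here is a minimal, self-contained Go sketch with a hypothetical container type (not the real cAdvisor/kubelet structs): the label function always emits the same key set, padding missing values with "".

package main

import "fmt"

// container is a hypothetical stand-in for cAdvisor's ContainerInfo.
type container struct {
	ID, Name, Image string
	Labels          map[string]string // Docker labels; may differ per container
}

// consistentLabels always returns the same label keys: fixed base labels plus
// every key in labelKeys, padded with "" when a container has no real value.
func consistentLabels(c container, labelKeys []string) map[string]string {
	out := map[string]string{"id": c.ID, "name": c.Name, "image": c.Image}
	for _, k := range labelKeys {
		out[k] = "" // placeholder so the family's label set stays consistent
	}
	for k, v := range c.Labels {
		out[k] = v // overwrite the placeholders we actually have values for
	}
	return out
}

func main() {
	keys := []string{"container_label_app"}
	web := container{ID: "/docker/abc", Name: "web", Image: "nginx",
		Labels: map[string]string{"container_label_app": "shop"}}
	raw := container{ID: "/system.slice/docker.service"}
	fmt.Println(consistentLabels(web, keys))
	fmt.Println(consistentLabels(raw, keys)) // same keys, empty values
}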

dashpole commented 7 years ago

Thanks @bboreham! Can you submit a PR with your fix? I will try and get this in the 1.8 release.

matthiasr commented 7 years ago

@dashpole this also needs to be fixed in the Kubernetes 1.7.x series, or it will be impossible to collect useful container metrics for anyone who relies on the Prometheus format.

If the real fix is too complex for a cherry-pick, there is an option that can be passed when initialising the Prometheus client that turns off these validations and restores the previous behaviour. Of course, not producing invalid metrics in the first place is preferable.

Shouldn't these errors have shown up in log files all over the place?

matthiasr commented 7 years ago

Related discussions are ongoing in kubernetes/kube-state-metrics#194 – I thought there is a knob but in reality it's just using a hacked up Prometheus client. I think @bboreham's fix is the right way to go and should be cherry-picked both onto the cAdvisor and Kubernetes release branches.

dashpole commented 7 years ago

@matthiasr I can cherry-pick this to 1.7. We had some errors that were introduced when we updated to prometheus v0.8.0 (#1680), but we were not sure what the root cause was until now. Because the checks were introduced recently, we couldn't point to a change in cAdvisor that caused the inconsistency, and I hadn't had a chance to look into this myself yet.

matthiasr commented 7 years ago

Aha, another knob. Looks like #1679 is not sufficient, since the issue still persists?

dashpole commented 7 years ago

Yes, I think that just made it report an incomplete set of metrics instead of none at all.

bboreham commented 7 years ago

Actually it looks like the function in cAdvisor is more complicated, as it copies all Docker labels, etc.

I will make a PR for kubelet.

To clarify, so far my testing has been in a stand-alone program bringing in parts of kubelet to find out what it was doing.

dashpole commented 7 years ago

It looks like #51473 fixes this in Kubernetes, and I'll cherry-pick it into the 1.7 branch. However, this won't fix it for stand-alone cAdvisor, as that uses the DefaultContainerLabels function. It would appear that exposing container labels as Prometheus labels is considered an anti-pattern. I am not quite sure what the best way forward is in that respect.

dashpole commented 7 years ago

Seems like InfluxDB has similar issues: https://github.com/google/cadvisor/issues/1730

bboreham commented 7 years ago

I know nothing of stand-alone cAdvisor usage. Are there any users reading this?

Could we have a pre-defined set of container labels which are copied as metric labels?

dashpole commented 7 years ago

Once we have the label whitelist (#1730), we can ensure all whitelisted labels are present as metric labels, or set them to "".

matthiasr commented 7 years ago

But would that whitelist be mandatory?

Even for an unbounded set of labels, I think the following would work:

  1. collect all label names
  2. initialize a map with these names as keys, all values as empty strings
  3. for each metric
    1. make a copy of the map
    2. in the copy, set the values for each label that has one
    3. generate the metrics with all labels, empty or not

This way the consistency condition is fulfilled, and because the Prometheus server treats empty label values the same as the label not being present at all, current behavior at query time is maintained.

This is a bit inefficient (multiple passes over the data, lots of copies and allocations), maybe someone can come up with a better solution?
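
For illustration, a minimal Go sketch of this two-pass idea over gathered metric families (an editorial sketch, not existing client_golang or cAdvisor code; in a real fix this logic would have to run before the client library's consistency check, e.g. inside the collector or a custom registry):

package labelfill

import (
	"github.com/golang/protobuf/proto"
	dto "github.com/prometheus/client_model/go"
)

// FillMissingLabels pads every metric in each family with empty-valued labels
// so that all metrics within a family share the same label-name set.
func FillMissingLabels(mfs []*dto.MetricFamily) {
	for _, mf := range mfs {
		// Pass 1: collect the union of label names used in this family.
		union := map[string]struct{}{}
		for _, m := range mf.Metric {
			for _, lp := range m.Label {
				union[lp.GetName()] = struct{}{}
			}
		}
		// Pass 2: add every missing label name with an empty value.
		for _, m := range mf.Metric {
			present := map[string]struct{}{}
			for _, lp := range m.Label {
				present[lp.GetName()] = struct{}{}
			}
			for name := range union {
				if _, ok := present[name]; !ok {
					m.Label = append(m.Label, &dto.LabelPair{
						Name:  proto.String(name),
						Value: proto.String(""),
					})
				}
			}
		}
	}
}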

dashpole commented 7 years ago

Does the set of labels need to be consistent across time, or just at a single point in time? That would only work if we don't need consistency across time.

matthiasr commented 7 years ago

Only at each point in time.

bboreham commented 7 years ago

It's only checked by the Prometheus client library at each scrape, but surely the desire from Prometheus is that labelling be consistent over all samples for the same metric?

It's not even "across time"; consider two samples from different machines taken at the same time.

matthiasr commented 7 years ago

In principle, yes, but there's a limit to how far into the future you can predict which labels there will be. Things get wonky if the label set changes all the time (and, depending on the queries, potentially at the point of change), but at some point things do need to change.

An approach that is pretty common (and that kube-state-metrics uses) is to contain the variable label sets in a separate "foo_labels" metric. This metric would need to deal with this variability but all the other metrics would have a fixed set of labels. This pushes the responsibility for getting labels and actual metrics together to query time, with the hope that at that point you know which labels you want. This kind of joining is possible in Prometheus, but I don't know if there are other systems that consume this endpoint; and if you want to do this kind of fundamental change at all.
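
For illustration, a rough client_golang sketch of that pattern (metric and label names are made up, and the label list is fixed here for brevity; a real implementation such as kube-state-metrics builds it dynamically per scrape):

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// The "real" metrics carry only a small, fixed label set.
	cpu := prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "container_cpu_user_seconds_total",
		Help: "Cumulative user cpu time consumed in seconds.",
	}, []string{"id"})

	// The variable per-container labels live only on a constant-1 metric and
	// are joined in at query time, e.g.:
	//   container_cpu_user_seconds_total * on (id) group_left(app) container_labels
	labels := prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "container_labels",
		Help: "Constant 1, carrying the variable container labels.",
	}, []string{"id", "app"})

	prometheus.MustRegister(cpu, labels)
	cpu.WithLabelValues("/docker/abc").Add(12.3)
	labels.WithLabelValues("/docker/abc", "shop").Set(1)

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9101", nil))
}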

beorn7 commented 7 years ago

So yeah, this is fundamentally the same problem as the kube-state-metrics issue referenced above.

I see three possible paths to a solution:

  1. The “technically correct” solution (the best kind of correct ;o): Munge all the container labels into a single Prometheus label with some syntax convention, like @matthiasr suggested above, e.g. `container_labels = "foo:bar,dings:bums"`. This has the problem that you would need quite involved relabeling rules on the Prometheus side to extract the labels you need. And it would require changes to already established procedures around usage of container labels (or the Kubernetes labels from kube-state-metrics).
  2. The pragmatic solution: Similar to what I proposed for https://github.com/kubernetes/kube-state-metrics/pull/194 , client_golang could provide a LabelFixingRegistry that auto-adds missing labels with empty strings, i.e. if some metrics have foo="bar" but others don't, it would attach foo="" to those other metrics. A variant on this would be to not provide this tooling in client_golang and ask the user of the package to implement it themself, as already suggested above. I believe the latter would be in line with what @brian-brazil said in https://github.com/kubernetes/kube-state-metrics/pull/194, namely that we should not make it easy for users to create labels with essentially inconsistent label dimensions.
  3. The “embrace badness of the world” solution: We could just give up and assume that there will always be label inconsistencies somewhere. Prometheus already deals with that by assuming all missing labels actually do exist but with an empty string as their value. What I suggested in the previous item would, ironically, have exactly the same effect on the metrics stored in Prometheus as just leaving out the labels with empty values. However, we currently do require label consistency within a single scrape (at least that's what the current registry in client_golang implements), so we would need to officially give up on that. I would still make it an explicit opt-in, i.e. you would need to use a LenientRegistry that allows label inconsistencies. The downside compared to the previous bullet point is that we would need to change our contract about exposition. Also, the empty-valued labels created by the solution in the previous bullet point would be a nice marker of where the inconsistencies happen.

Looking forward to feedback. If we go for a solution that requires either a LenientRegistry or a LabelFixingRegistry, this tooling would need to be provided in client_golang, which most likely boils down to me coding it.

brian-brazil commented 7 years ago

However, when it receives the metrics, Prometheus checks that all metrics in the same family have the same label set, and rejects those that do not.

This sounds like a bug on the Prometheus side. This should cause the whole scrape to fail, not silently drop metrics. Partial data is to be avoided.

Munge all the container labels into a single Prometheus label with some syntax convention, like @matthiasr suggested above, e.g. `container_labels = "foo:bar,dings:bums"`

I don't think that's a great idea, labels should be represented as labels and we generally try to dissuade users from building up structure inside label values. Having non-trivial relabelling rules doesn't really help anyone.

A variant on this would be to not provide this tooling in client_golang and ask the user of the package to implement it themself, as already suggested above.

Yes, this is what I'd go for. It shouldn't be too many lines of code. Likely most of these labels should also be moved to a per-container _info metric rather than being on all time series.

I believe the latter would be in line with what @brian-brazil said in kubernetes/kube-state-metrics#194, namely that we should not make it easy for users to create labels with essentially inconsistent label dimensions.

Yes, the client library guidelines are very clear about not allowing this for direct instrumentation, so it'd be in the spirit of the guidelines not to allow this given that Go already does label consistency checks. The Go client is the only client currently checking for this sort of inconsistency, though with the 2.0 scrape parser being laxer I can see other clients starting to have some checks to make up for that.

beorn7 commented 7 years ago

OK, nobody likes approach 1. Fair enough. @matthiasr also told me that was not what he meant. My bad for not reading carefully enough.

However, when it receives the metrics, Prometheus checks that all metrics in the same family have the same label set, and rejects those that do not.

This sounds like a bug on the Prometheus side. This should cause the whole scrape to fail, not silently drop metrics. Partial data is to be avoided.

I guess, “Prometheus” above means “the Prometheus client library”. The default behavior is indeed to fail the whole scrape, but you can explicitly set a continue on error behavior to still serve as many metrics as possible, see https://godoc.org/github.com/prometheus/client_golang/prometheus/promhttp#HandlerErrorHandling .
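
For reference, choosing between the default fail-the-whole-scrape behavior and the opt-in continue-on-error behavior looks roughly like this (registry contents omitted):

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	reg := prometheus.NewRegistry()
	// ... register collectors on reg ...

	// Default: any gathering error fails the whole scrape with HTTP 500.
	strict := promhttp.HandlerFor(reg, promhttp.HandlerOpts{
		ErrorHandling: promhttp.HTTPErrorOnError,
	})

	// Opt-in: serve whatever metrics could be gathered despite errors.
	lenient := promhttp.HandlerFor(reg, promhttp.HandlerOpts{
		ErrorHandling: promhttp.ContinueOnError,
	})

	http.Handle("/metrics", strict)
	http.Handle("/metrics-lenient", lenient)
	log.Fatal(http.ListenAndServe(":9102", nil))
}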

I like the approach of a …_info metric with all those container labels instead of assigning them everywhere. That's also how Kubernetes labels are handled. However, this approach is orthogonal to solving the consistency problem (it just reduces it to only one metric family).

Let's figure out if we want support for this in client_golang at all. If not, you know what to do in cAdvisor, and the same has to happen in kube-state-metrics. I'll document that approach in client_golang then.

If we feel we should have support in client_golang, the question would be between LenientRegistry (easy to implement, five lines of code or something) and the LabelFixingRegistry (slightly more complicated). In any case, it would be opt-in with a lot of warning signs attached to it.

matthiasr commented 7 years ago

I'm for the LabelFixingRegistry in the library. We already have two concrete examples at hand that need this, and if we don't solve it generally we'll just end up with several badly copy-pasted versions of the same thing.

brian-brazil commented 7 years ago

I guess, “Prometheus” above means “the Prometheus client library”. The default behavior is indeed to fail the whole scrape, but you can explicitly set a continue on error behavior to still serve as many metrics as possible, see

Yes, that's what I meant. In that case it's a cAdvisor bug that it sets ContinueOnError rather than using the default HTTPErrorOnError, as that was hiding this problem. This was introduced in #1679.

However, this approach is orthogonal to solving the consistency problem (it just reduces it to only one metric family).

Agreed.

Let's figure out if we want support for this in client_golang at all.

I would say no; there are only two use cases so far, which I don't think is enough. Even with warning signs, users will use it where it doesn't apply, just like ContinueOnError was used here to paper over a problem rather than fixing it.

I've implemented related code by hand in the past; it's not particularly complicated to write. It's standard data munging.

If we feel we should have support in client_golang, the question would be between LenientRegistry (easy to implement, five lines of code or something

If we go for it I'd go for this, but it'd feel weird that there'd now be the default registry settings, the lenient registry and the pedantic registry.

beorn7 commented 7 years ago

cAdvisor devs, how do you feel about implementing the "fill up with empty-valued labels to reach label consistency" approach as done in https://github.com/vladimirvivien/kubernetes/commit/8935d66160f5a53306c914c57f718aad58a8b508 ?

@brancz as the main https://github.com/kubernetes/kube-state-metrics/ dev, how do you feel about implementing it in parallel?

Just trying to test the waters on whether we want/need support for that in the Prometheus client_golang.

mfournier commented 7 years ago

For those stuck on 0.25.0 because of this issue, I've cherry-picked (04fc089) the patch to kube-state-metrics mentioned above (https://github.com/google/cadvisor/issues/1704#issuecomment-325418911) onto cAdvisor's local copy of client_golang/prometheus/registry.go. This simply disables the label consistency checking introduced in 0.26.0. I also pushed an image with the workaround to docker.io/camptocamp/cadvisor:v0.27.1_with-workaround-for-1704

NB: this is merely a workaround until a proper fix is available in a release!

ghost commented 6 years ago

We're observing the same behavior with version 0.27.0 and Docker 17.06.1. Metrics always contain the cAdvisor, Alertmanager, and Prometheus containers, but every couple of minutes our applications' container metrics are missing. Could you please let us know if (and when) a fix will be available? @mfournier's workaround URL is broken. Thanks.

beorn7 commented 6 years ago

After several discussions I had with various people, I came to the conclusion we want to support "label filling" within the Prometheus Go client. You can track progress here: https://github.com/prometheus/client_golang/issues/355

brian-brazil commented 6 years ago

I've looked into this, and there looks to be a simpler solution.

I believe that using the approach at https://github.com/kubernetes/kubernetes/pull/51473 in cAdvisor would be sufficient to resolve the issue here. That is, in DefaultContainerLabels, produce an empty string for the missing labels.

Is there something I'm missing?

brian-brazil commented 6 years ago

Ah, I see. It's the container.Spec.Labels and container.Spec.Envs which need extra handling.

brian-brazil commented 6 years ago

I've put together https://github.com/google/cadvisor/pull/1831 which I believe will fix this.

dashpole commented 6 years ago

The fix is released in version v0.28.3.