google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

cAdvisor constantly polls data that has been disabled #2897

Closed: eero-t closed this issue 3 years ago

eero-t commented 3 years ago

Setup:

Use-case:

Expected output:

Actual output:

eero-t commented 3 years ago

If I don't mount "/dev/kmsg", cAdvisor complains at startup about not being able to read something I had disabled ("oom_event"):

W0623 16:50:21.655644 1 manager.go:289] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory

PS. I actually want cAdvisor to have access only to sysfs, with no capabilities and no root, i.e. --cap-drop ALL --user 65534:65534 --volume=/sys:/sys:ro, and it seems to be working OK, but it still does these extra reads.
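
For context, cAdvisor detects OOM kills by tailing the kernel log through /dev/kmsg, which is why that warning appears when the device isn't mounted into the container. A minimal, self-contained sketch of that pattern follows; cAdvisor's real watcher (in utils/oomparser) is more involved, so this only illustrates why /dev/kmsg has to be readable:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/dev/kmsg")
	if err != nil {
		// This is the failure the warning above reports: without
		// /dev/kmsg, OOM event detection has nothing to read.
		fmt.Fprintf(os.Stderr, "disabling OOM events: %v\n", err)
		return
	}
	defer f.Close()

	// Scan kernel log records for OOM-kill messages.
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.Contains(line, "Killed process") || strings.Contains(line, "oom-kill") {
			fmt.Println("OOM event:", line)
		}
	}
}
```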

Creatone commented 3 years ago

I believe that this list is used only for Prometheus: https://github.com/google/cadvisor/blob/master/metrics/prometheus.go#L108

cAdvisor still collects this data, but does not serve the metrics.

I'll try to change it :)
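
Roughly, the pattern in the linked prometheus.go is that optional metric families are appended only when their kind is in the enabled set. A simplified, self-contained sketch of that gating; the types below are illustrative stand-ins, not cAdvisor's actual API:

```go
package main

import "fmt"

// Simplified stand-ins for cAdvisor's container.MetricKind / MetricSet.
type MetricKind string

const (
	CPUUsageMetrics    MetricKind = "cpu"
	MemoryUsageMetrics MetricKind = "memory"
	DiskIOMetrics      MetricKind = "diskIO"
)

type MetricSet map[MetricKind]struct{}

func (s MetricSet) Has(k MetricKind) bool { _, ok := s[k]; return ok }

type containerMetric struct {
	name      string
	condition MetricKind
}

// newCollector mirrors the shape of metrics/prometheus.go: a base list,
// plus optional families added only when their kind is enabled. The gap
// this issue points out is that the filter applied only to *serving*,
// while collection still read everything underneath.
func newCollector(enabled MetricSet) []string {
	names := []string{"container_last_seen"}
	optional := []containerMetric{
		{"container_cpu_usage_seconds_total", CPUUsageMetrics},
		{"container_memory_usage_bytes", MemoryUsageMetrics},
		{"container_fs_writes_bytes_total", DiskIOMetrics},
	}
	for _, m := range optional {
		if enabled.Has(m.condition) {
			names = append(names, m.name)
		}
	}
	return names
}

func main() {
	// Only CPU metrics enabled: memory and disk families are not served.
	fmt.Println(newCollector(MetricSet{CPUUsageMetrics: {}}))
}
```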

Creatone commented 3 years ago

The reason for that is that cAdvisor reads all cgroup stats and then keeps only the enabled ones.

https://github.com/google/cadvisor/blob/fc235468d30b09cb6b1a9339a464e523b38d3c77/container/libcontainer/handler.go#L83-L93

GetStats() comes from the runc/libcontainer package. I'll try to introduce an internal function that takes the enabled metrics into consideration.
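
A hypothetical sketch of that direction: read each cgroup controller only when its metric kind is enabled, instead of reading everything via libcontainer's all-or-nothing GetStats() and discarding the rest. All identifiers below are invented for illustration, not runc or cAdvisor names:

```go
package main

import "fmt"

type MetricKind string

const (
	CPUUsageMetrics    MetricKind = "cpu"
	MemoryUsageMetrics MetricKind = "memory"
)

type MetricSet map[MetricKind]struct{}

func (s MetricSet) Has(k MetricKind) bool { _, ok := s[k]; return ok }

type Stats struct {
	CPUUsageNs  uint64
	MemoryBytes uint64
}

// Stand-ins for per-controller readers; real code would parse files
// like cpuacct.usage and memory.usage_in_bytes under cgroupPath.
func setCPUStats(cgroupPath string, s *Stats) error    { return nil }
func setMemoryStats(cgroupPath string, s *Stats) error { return nil }

// getStats touches only the controllers whose metrics are enabled,
// avoiding the read-everything-then-filter pattern described above.
func getStats(cgroupPath string, enabled MetricSet) (*Stats, error) {
	s := &Stats{}
	if enabled.Has(CPUUsageMetrics) {
		if err := setCPUStats(cgroupPath, s); err != nil {
			return nil, err
		}
	}
	if enabled.Has(MemoryUsageMetrics) {
		if err := setMemoryStats(cgroupPath, s); err != nil {
			return nil, err
		}
	}
	return s, nil
}

func main() {
	s, _ := getStats("/sys/fs/cgroup/system.slice", MetricSet{CPUUsageMetrics: {}})
	fmt.Printf("%+v\n", s)
}
```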

eero-t commented 3 years ago

Actually, the issue I'm more worried about is that there is no support for disabling the rest of the metrics, the ones that are redundant because kubelet already provides them (with its own vendored cAdvisor version).

And it seems that there are no options to disable the extra endpoints (e.g. the JSON API ones, if one is interested only in the Prometheus metrics), like kubelet has. IMHO that is less of an issue than the redundant metrics, though.
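
For comparison, a standalone exporter that serves only Prometheus metrics simply never registers any other handler, so the JSON API doesn't exist and everything else returns 404. A minimal sketch using the Prometheus Go client; this is not an actual cAdvisor option, which at the time had no such switch:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	mux := http.NewServeMux()
	// Expose only the Prometheus scrape endpoint; no JSON API or web UI
	// handlers are registered on this mux.
	mux.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", mux))
}
```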

There are other tickets that touch on these subjects, though, so I was hesitant to file my own for them...

eero-t commented 3 years ago

The changes in https://github.com/google/cadvisor/pull/2900 look good to me. I'll test them later (hopefully today).

eero-t commented 3 years ago

Looks much better with #2900 (these are all the files accessed during one minute):

# strace -f -e openat -p $(pidof cadvisor) 2> cadvisor-trace.txt
^C
# grep openat cadvisor-trace.txt | grep -v "openat resumed" | cut -d'"' -f2 | sed 's%^.*/%%' | sort | uniq -c | sort -nr
   2002 cpu.shares
   2002 cpu.cfs_quota_us
   2002 cpu.cfs_period_us
   1976 pids.max
    134 system.slice
    133 wpa_supplicant.service
    133 user.slice
    133 upower.service
    133 udisks2.service
    133 thermald.service
    133 system-systemd\\x2dfsck.slice
    133 system-modprobe.slice
    133 system-getty.slice
    133 systemd-udevd.service
    133 systemd-timesyncd.service
    133 systemd-resolved.service
    133 systemd-logind.service
    133 systemd-journald.service
    133 switcheroo-control.service
    133 ssh.service
    133 rtkit-daemon.service
    133 rsyslog.service
    133 rpc-statd.service
    133 rpcbind.socket
    133 rpcbind.service
    133 rngd.service
    133 polkit.service
    133 NetworkManager.service
    133 networkd-dispatcher.service
    133 ModemManager.service
    133 lightdm.service
    133 kubepods.slice
    133 kubepods-burstable.slice
    133 kubepods-burstable-podf764b2f6_7ccc_467f_a506_d83705ab75d5.slice
    133 kubepods-burstable-podde1b6d0f_d0b8_45ee_b392_fae13ffd25f2.slice
    133 kubepods-burstable-podbb570f97_21cd_4a07_aed1_2db2111543e7.slice
    133 kubepods-besteffort.slice
    133 kubepods-besteffort-podf5976133_f5d4_46e1_9033_e65b64ebb6fd.slice
    133 kubepods-besteffort-podaa345d81_8b41_42f2_866a_aeaab745c79a.slice
    133 kubepods-besteffort-poda641626f_2edd_4b20_bf1a_534530313705.slice
    133 kubepods-besteffort-pod94ed4ee4_5e0b_4882_a19f_17393cfdf6ce.slice
    133 kubepods-besteffort-pod74a0e475_3bd6_452c_b6bd_705fd68cc204.slice
    133 kubepods-besteffort-pod64452191_4eee_4aac_b027_902fe377d294.slice
    133 kubepods-besteffort-pod51bb93fa_b1f7_4586_b9d2_0888026354c6.slice
    133 kubepods-besteffort-pod492f22f8_d5f9_4542_8976_b490f1539ee0.slice
    133 kubelet.service
    133 irqbalance.service
    133 docker.socket
    133 docker.service
    133 dbus.service
    133 cron.service
    133 containerd.service
    133 colord.service
    133 avahi-daemon.service
    133 accounts-daemon.service
     54 cpu,cpuacct
     52 stat
     52 os-release
     52 meminfo
     27 gpu
     27 devices
     26 limits
     26 fd
      4 kubepods-burstable-podbb570f97_21cd_4a07_aed1_2db2111543e7.slice:cri-containerd:d379fba6e1befb1df3d4ab201c97382c0fa8f5ae50eeaef8b21f7c056302752e
      4 kubepods-burstable-podbb570f97_21cd_4a07_aed1_2db2111543e7.slice:cri-containerd:bcbd176f8fd124740b60e8a806fd0114ca1168a639e839ef8be51230f97ee5d5
      4 kubepods-besteffort-podaa345d81_8b41_42f2_866a_aeaab745c79a.slice:cri-containerd:573e49a970a6bb43d1a1c0af196e8f4abe098dcda51f9bcf51682802be8f4757
      4 kubepods-besteffort-podaa345d81_8b41_42f2_866a_aeaab745c79a.slice:cri-containerd:4e03e0ca2d9e15492f301b8c9f52b87b0ac5996136d46c09f8173655ee98a3eb
      4 kubepods-besteffort-poda641626f_2edd_4b20_bf1a_534530313705.slice:cri-containerd:f2232a0b70bbb8b21dc89361f19784e214e3fcf1e5384360c29ed14a8e3739ef
      4 kubepods-besteffort-poda641626f_2edd_4b20_bf1a_534530313705.slice:cri-containerd:85b8b47e77b49b15d6c28a15ab7236b84218cced1cce0850784e227100884ab5
      4 kubepods-besteffort-pod94ed4ee4_5e0b_4882_a19f_17393cfdf6ce.slice:cri-containerd:714e5cbdce066b9efc5189a4fe08b3fbc0619c564546820a3f0822d0593ff9b1
      4 kubepods-besteffort-pod94ed4ee4_5e0b_4882_a19f_17393cfdf6ce.slice:cri-containerd:19ff66188a3110cbd1569ab2187d84cc0938a8aa7e8632365bef255e0597b902
      4 kubepods-besteffort-pod492f22f8_d5f9_4542_8976_b490f1539ee0.slice:cri-containerd:e7b531d0e5758213b7a4e811829f8928eff747484fba81b1bb4e720829c60614
      4 kubepods-besteffort-pod492f22f8_d5f9_4542_8976_b490f1539ee0.slice:cri-containerd:bb49bd2162bd63f23bcd4d76ff27b23193082109d7f5be323b63e7981eb08030
      4 kubepods-besteffort-pod492f22f8_d5f9_4542_8976_b490f1539ee0.slice:cri-containerd:563a3e209fd4aaef791496b6202894f5fd270261490de4cbb17f23038260fa51
      3 sys-kernel-tracing.mount
      3 sys-kernel-debug.mount
      3 sys-kernel-config.mount
      3 sys-fs-fuse-connections.mount
      3 run-rpc_pipefs.mount
      3 kubepods-burstable-podf764b2f6_7ccc_467f_a506_d83705ab75d5.slice:cri-containerd:b9d5806912a7f8147b0b1682988122043e83f13e1a0e48692194ba6688a13567
      3 kubepods-burstable-podf764b2f6_7ccc_467f_a506_d83705ab75d5.slice:cri-containerd:3342463ca3fca5320b3edde1ba722390f5093388b6abf7f75f5c0bd9198b52e0
      3 kubepods-burstable-podf764b2f6_7ccc_467f_a506_d83705ab75d5.slice:cri-containerd:2fc9a01728091828187e97e8773e17961559a65edf85d35ca186fb443f7e41f4
      3 kubepods-burstable-podde1b6d0f_d0b8_45ee_b392_fae13ffd25f2.slice:cri-containerd:d4b486fdacb92259464d0aa8c87963837779c6cecf8c4bf7a8263ca43b2394cc
      3 kubepods-burstable-podde1b6d0f_d0b8_45ee_b392_fae13ffd25f2.slice:cri-containerd:6125d4d388bd7474fb4179ca07338b78c036f89124303236f432ace0c264cea4
      3 kubepods-burstable-podde1b6d0f_d0b8_45ee_b392_fae13ffd25f2.slice:cri-containerd:01611422d4d25c5200d7307fffad7925b5a71fb46586c929a46d900ebe71203b
      3 kubepods-besteffort-podf5976133_f5d4_46e1_9033_e65b64ebb6fd.slice:cri-containerd:af89e80c9dc0460aaa99cc9b614264d2df6bca91d8f906f88fd06546a01ad7d1
      3 kubepods-besteffort-podf5976133_f5d4_46e1_9033_e65b64ebb6fd.slice:cri-containerd:1c1a013377bb85563fa10a62bc994fe2eae142f396f78374f751ea15930483d0
      3 kubepods-besteffort-pod74a0e475_3bd6_452c_b6bd_705fd68cc204.slice:cri-containerd:dfb211ab4fd56dbc3c3284e35068862ace7e890ab16676bf9772e28af84aa83c
      3 kubepods-besteffort-pod74a0e475_3bd6_452c_b6bd_705fd68cc204.slice:cri-containerd:501de36cdfc9ad8ec7fd7a35b25a0b6a3c7f8f9dd5b0cdca802a920cd04bee86
      3 kubepods-besteffort-pod64452191_4eee_4aac_b027_902fe377d294.slice:cri-containerd:5a78c22f268c5f3158b177d349275bc948c6429cce8e269ce77e856e62ebf531
      3 kubepods-besteffort-pod64452191_4eee_4aac_b027_902fe377d294.slice:cri-containerd:35a072c279da2debb7264522f5ac6e1d68adfb47c90b44124c7c3f49664a068c
      3 kubepods-besteffort-pod51bb93fa_b1f7_4586_b9d2_0888026354c6.slice:cri-containerd:768487bafd91b81dd752eb2b79da9a1f5e422b8504bbb09cc0d22a4faf60c055
      3 kubepods-besteffort-pod51bb93fa_b1f7_4586_b9d2_0888026354c6.slice:cri-containerd:63fe9be1dabb42c6a84e8662a191d6934126640bc6ba119216691909b3028dc5
      3 dev-mqueue.mount
      3 dev-hugepages.mount
      3 boot-efi.mount

While it's not part of this bug, I'm wondering: does cAdvisor really expect the OS release to change every second? All the OS updates I've done have taken minutes, not seconds, even from a local 100 MB/s mirror to an SSD...

I don't map the host rootfs into the container, i.e. the OS release and service updates happen only when the image itself is updated, so it would be good to get rid of that kind of (to me, redundant) CPU overhead too.
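
One way to avoid that overhead would be to read /etc/os-release once and memoize the result, e.g. with sync.Once. A minimal sketch of that idea, not cAdvisor's actual code:

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"sync"
)

var (
	once      sync.Once
	osRelease string
)

// getOSRelease reads /etc/os-release exactly once and caches the
// PRETTY_NAME value, instead of re-reading the file on every poll.
func getOSRelease() string {
	once.Do(func() {
		osRelease = "unknown"
		b, err := os.ReadFile("/etc/os-release")
		if err != nil {
			return
		}
		for _, line := range strings.Split(string(b), "\n") {
			if strings.HasPrefix(line, "PRETTY_NAME=") {
				osRelease = strings.Trim(strings.TrimPrefix(line, "PRETTY_NAME="), `"`)
				return
			}
		}
	})
	return osRelease
}

func main() {
	fmt.Println(getOSRelease())
}
```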

Should I file a new bug for that?