google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

Fails startup on RHEL: Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpuacct,cpu: no such file or directory #1843

Open davidkarlsen opened 6 years ago

davidkarlsen commented 6 years ago

docker info:

[root@alp-aot-ccm10 ~]# docker info
Containers: 8
 Running: 7
 Paused: 0
 Stopped: 1
Images: 10
Server Version: 1.12.6
Storage Driver: overlay
 Backing Filesystem: xfs
Logging Driver: syslog
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: null host overlay bridge
 Authorization: rhel-push-plugin
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Security Options: seccomp
Kernel Version: 3.10.0-693.11.1.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.4 (Maipo)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 3
CPUs: 4
Total Memory: 15.51 GiB
Name: alp-aot-ccm10
ID: QF2M:VGHP:7RGI:D64L:XDJ5:SXXE:CKZP:Y4MS:MT24:JLCE:FEXS:JQ6O
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://myregistry.com:8085/v1/
Insecure Registries:
 127.0.0.0/8
Registries: myregistry.com:8085 (secure), registry.access.redhat.com (secure), docker.io (secure)

docker cmd:

docker run --name cadvisor  --volume=/:/rootfs:ro --volume=/var/run:/var/run:rw --volume=/sys:/sys:ro --volume=/var/lib/docker/:/var/lib/docker:ro -p 8080:8080 -eSERVICE_TAGS=prom_monitored google/cadvisor:v0.28.3 >& /tmp/cadvisor.log

log: https://www.pastefile.com/WJ7uuP

Looks a lot like:

With v0.28.0 it does start, but it spits out a number of error messages like:

W1217 23:18:37.497988       1 container.go:358] Failed to create summary reader for "/system.slice/dev-disk-by\\x2did-dm\\x2duuid\\x2dLVM\\x2d1muAsoqiIpZiueG0ikoq6e8KZAJ9kdGiFrEWVCang9KEuR5dfGaq8RSFx6Xiq9fS.swap": none of the resources are being tracked.
W1217 23:18:37.498174       1 container.go:358] Failed to create summary reader for "/system.slice/dev-disk-by\\x2duuid-37d4e587\\x2d0f8a\\x2d4256\\x2d91df\\x2dffa294eae1de.swap": none of the resources are being tracked.
W1217 23:18:37.499201       1 container.go:358] Failed to create summary reader for "/system.slice/sysstat.service": none of the resources are being tracked.
W1217 23:18:37.499379       1 container.go:358] Failed to create summary reader for "/system.slice/systemd-random-seed.service": none of the resources are being tracked.
W1217 23:18:37.499484       1 container.go:358] Failed to create summary reader for "/system.slice/systemd-vconsole-setup.service": none of the resources are being tracked.
W1217 23:18:37.499827       1 container.go:358] Failed to create summary reader for "/system.slice/dev-mapper-rootvg\\x2dswap3_lv.swap": none of the resources are being tracked.
W1217 23:18:37.499936       1 container.go:358] Failed to create summary reader for "/system.slice/systemd-journal-catalog-update.service": none of the resources are being tracked.
W1217 23:18:37.500544       1 container.go:358] Failed to create summary reader for "/system.slice/abrt-ccpp.service": none of the resources are being tracked.
W1217 23:18:37.500693       1 container.go:358] Failed to create summary reader for "/system.slice/dev-disk-by\\x2did-dm\\x2dname\\x2drootvg\\x2dswap2_lv.swap": none of the resources are being tracked.
W1217 23:18:37.500797       1 container.go:358] Failed to create summary reader for "/system.slice/kmod-static-nodes.service": none of the resources are being tracked.
W1217 23:18:37.502601       1 container.go:358] Failed to create summary reader for "/system.slice/network.service": none of the resources are being tracked.
W1217 23:18:37.502738       1 container.go:358] Failed to create summary reader for "/system.slice/system-lvm2\\x2dpvscan.slice": none of the resources are being tracked.
W1217 23:18:37.507801       1 container.go:358] Failed to create summary reader for "/system.slice/system-systemd\\x2dfsck.slice": none of the resources are being tracked.
W1217 23:18:37.507960       1 container.go:358] Failed to create summary reader for "/system.slice/systemd-sysctl.service": none of the resources are being tracked.
W1217 23:18:37.508358       1 container.go:358] Failed to create summary reader for "/system.slice/systemd-update-done.service": none of the resources are being tracked.
W1217 23:18:37.508473       1 container.go:358] Failed to create summary reader for "/system.slice/dev-disk-by\\x2did-dm\\x2duuid\\x2dLVM\\x2d1muAsoqiIpZiueG0ikoq6e8KZAJ9kdGiCFD2klFSb7FQHBT8TYjNh6UpLHplJWB6.swap": none of the resources are being tracked.
W1217 23:18:37.508677       1 container.go:358] Failed to create summary reader for "/system.slice/dev-dm\\x2d0.swap": none of the resources are being tracked.
I1217 23:18:37.508726       1 manager.go:327] Recovery completed
I1217 23:18:37.561849       1 cadvisor.go:159] Starting cAdvisor version: v0.28.0-3d2e7fc on port 8080
vikaschoudhary16 commented 6 years ago

If you don't use --volume=/sys:/sys:ro, it starts fine. Another solution is:

mount -o remount,rw '/sys/fs/cgroup'
ln -s /sys/fs/cgroup/cpu,cpuacct /sys/fs/cgroup/cpuacct,cpu
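The symlink step can be wrapped in a small idempotent helper. This is a sketch, assuming a cgroup v1 host where the combined CPU hierarchy exists as cpu,cpuacct while cAdvisor tries to watch the reversed name cpuacct,cpu seen in the error message; the function name link_cpuacct_alias is hypothetical, and it assumes the remount to rw has already been done.

```shell
# Hypothetical helper: create the "cpuacct,cpu" alias that the failing
# inotify_add_watch call expects, pointing at the real "cpu,cpuacct"
# hierarchy. $1 is the cgroup root (default /sys/fs/cgroup), which must
# already be writable (mount -o remount,rw /sys/fs/cgroup).
link_cpuacct_alias() {
    root=${1:-/sys/fs/cgroup}
    real="$root/cpu,cpuacct"
    link="$root/cpuacct,cpu"

    # Nothing to alias if the host has no combined CPU hierarchy.
    if [ ! -d "$real" ]; then
        echo "no $real, skipping" >&2
        return 1
    fi

    # Leave an existing directory or symlink (even a dangling one) alone.
    if [ -e "$link" ] || [ -L "$link" ]; then
        return 0
    fi

    # Relative target: resolves to $root/cpu,cpuacct wherever $root lives.
    ln -s "cpu,cpuacct" "$link"
}
```

On a systemd host /sys/fs/cgroup is a tmpfs, so the link would have to be recreated after every reboot. The relative target is a deliberate choice: it resolves the same way on the host and inside a container that bind-mounts /sys at the same path.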

tn-osimis commented 6 years ago

Confirming:

# (. /etc/os-release && echo $PRETTY_NAME)
CentOS Linux 7 (Core)
# docker-compose run --rm cadvisor --version
cAdvisor version v0.28.3 (1e567c2)
# docker-compose run --rm cadvisor
I0202 14:32:55.337212       1 storagedriver.go:50] Caching stats in memory for 2m0s
I0202 14:32:55.338005       1 manager.go:151] cAdvisor running in container: "/sys/fs/cgroup/cpuacct,cpu"
I0202 14:32:55.341946       1 fs.go:139] Filesystem UUIDs: map[203575e2-7c86-408b-b2e0-59df18bba2fb:/dev/sda1 2fbf3bea-7fde-4f32-8ac9-e9402a07da5d:/dev/sdc 390dbbea-ae34-44e6-a0fe-1b0c8a061827:/dev/sdb1 bf383770-7408-48df-b204-d408f67e439b:/dev/sda2]
I0202 14:32:55.341976       1 fs.go:140] Filesystem partitions: map[/dev/sdb1:{mountpoint:/rootfs/mnt/resource major:8 minor:17 fsType:ext4 blockSize:0} shm:{mountpoint:/dev/shm major:0 minor:123 fsType:tmpfs blockSize:0} tmpfs:{mountpoint:/dev major:0 minor:127 fsType:tmpfs blockSize:0} /dev/sda2:{mountpoint:/var/lib/docker major:8 minor:2 fsType:xfs blockSize:0} /dev/sda1:{mountpoint:/rootfs/boot major:8 minor:1 fsType:xfs blockSize:0}]
I0202 14:32:55.346862       1 manager.go:225] Machine: {NumCores:4 CpuFrequency:2394447 MemoryCapacity:16809521152 HugePages:[{PageSize:1048576 NumPages:0} {PageSize:2048 NumPages:0}] MachineID:d1085630399a48c6b29cf2e1de0eb5f4 SystemUUID:9EE4C12D-E3F9-114B-A81E-1D9E29FFBFAB BootID:bce6d330-b70d-4534-9ef8-6c9702c66abe Filesystems:[{Device:/dev/sdb1 DeviceMajor:8 DeviceMinor:17 Capacity:33685192704 Type:vfs Inodes:2097152 HasInodes:true} {Device:shm DeviceMajor:0 DeviceMinor:123 Capacity:67108864 Type:vfs Inodes:2051943 HasInodes:true} {Device:overlay DeviceMajor:0 DeviceMinor:118 Capacity:31671447552 Type:vfs Inodes:15472128 HasInodes:true} {Device:tmpfs DeviceMajor:0 DeviceMinor:127 Capacity:8404758528 Type:vfs Inodes:2051943 HasInodes:true} {Device:/dev/sda2 DeviceMajor:8 DeviceMinor:2 Capacity:31671447552 Type:vfs Inodes:15472128 HasInodes:true} {Device:/dev/sda1 DeviceMajor:8 DeviceMinor:1 Capacity:520785920 Type:vfs Inodes:256000 HasInodes:true}] DiskMap:map[8:16:{Name:sdb Major:8 Minor:16 Size:34359738368 Scheduler:deadline} 8:32:{Name:sdc Major:8 Minor:32 Size:139586437120 Scheduler:deadline} 2:0:{Name:fd0 Major:2 Minor:0 Size:4096 Scheduler:deadline} 8:0:{Name:sda Major:8 Minor:0 Size:32212254720 Scheduler:deadline}] NetworkDevices:[{Name:br-bbac878bbbfb MacAddress:02:42:ec:de:0b:71 Speed:0 Mtu:1500} {Name:eth0 MacAddress:00:0d:3a:b3:75:01 Speed:40000 Mtu:1500}] Topology:[{Id:0 Memory:17179402240 Cores:[{Id:0 Threads:[0 1] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:31457280 Type:Unified Level:3}]} {Id:1 Threads:[2 3] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:31457280 Type:Unified Level:3}]}] Caches:[]}] CloudProvider:Azure InstanceType:Unknown InstanceID:9EE4C12D-E3F9-114B-A81E-1D9E29FFBFAB}
I0202 14:32:55.347416       1 manager.go:231] Version: {KernelVersion:3.10.0-693.17.1.el7.x86_64 ContainerOsVersion:Alpine Linux v3.4 DockerVersion:Unknown DockerAPIVersion:Unknown CadvisorVersion:v0.28.3 CadvisorRevision:1e567c2}
I0202 14:32:57.349385       1 factory.go:54] Registering systemd factory
I0202 14:32:57.350048       1 factory.go:86] Registering Raw factory
I0202 14:32:57.350405       1 manager.go:1178] Started watching for new ooms in manager
W0202 14:32:57.350432       1 manager.go:313] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
I0202 14:32:57.353222       1 manager.go:329] Starting recovery of all containers
I0202 14:32:57.424192       1 manager.go:334] Recovery completed
F0202 14:32:57.466754       1 cadvisor.go:156] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpuacct,cpu: no such file or directory
# grep --after-context=8 cadvisor:$ docker-compose.yml 
    cadvisor:
        image: google/cadvisor:v0.28.3
        volumes:
            - /:/rootfs:ro
            - /var/run:/var/run:ro
            - /sys:/sys:ro
            - /var/lib/docker/:/var/lib/docker:ro
            - /dev/disk:/dev/disk:ro
        restart: unless-stopped
# date
Fri Feb  2 14:35:31 UTC 2018
# uname --kernel-release
3.10.0-693.17.1.el7.x86_64

After removing /sys bindmount:

cadvisor_1             | I0202 14:50:32.674041       1 storagedriver.go:50] Caching stats in memory for 2m0s
cadvisor_1             | I0202 14:50:32.674871       1 manager.go:151] cAdvisor running in container: "/sys/fs/cgroup/cpuacct,cpu"
cadvisor_1             | I0202 14:50:32.679600       1 fs.go:139] Filesystem UUIDs: map[bf383770-7408-48df-b204-d408f67e439b:/dev/sda2 203575e2-7c86-408b-b2e0-59df18bba2fb:/dev/sda1 2fbf3bea-7fde-4f32-8ac9-e9402a07da5d:/dev/sdc 390dbbea-ae34-44e6-a0fe-1b0c8a061827:/dev/sdb1]
cadvisor_1             | I0202 14:50:32.679652       1 fs.go:140] Filesystem partitions: map[/dev/sdb1:{mountpoint:/rootfs/mnt/resource major:8 minor:17 fsType:ext4 blockSize:0} shm:{mountpoint:/dev/shm major:0 minor:123 fsType:tmpfs blockSize:0} tmpfs:{mountpoint:/dev major:0 minor:127 fsType:tmpfs blockSize:0} /dev/sda2:{mountpoint:/var/lib/docker major:8 minor:2 fsType:xfs blockSize:0} /dev/sda1:{mountpoint:/rootfs/boot major:8 minor:1 fsType:xfs blockSize:0}]
cadvisor_1             | I0202 14:50:32.683995       1 manager.go:225] Machine: {NumCores:4 CpuFrequency:2394447 MemoryCapacity:16809521152 HugePages:[{PageSize:1048576 NumPages:0} {PageSize:2048 NumPages:0}] MachineID:d1085630399a48c6b29cf2e1de0eb5f4 SystemUUID:9EE4C12D-E3F9-114B-A81E-1D9E29FFBFAB BootID:bce6d330-b70d-4534-9ef8-6c9702c66abe Filesystems:[{Device:overlay DeviceMajor:0 DeviceMinor:118 Capacity:31671447552 Type:vfs Inodes:15472128 HasInodes:true} {Device:tmpfs DeviceMajor:0 DeviceMinor:127 Capacity:8404758528 Type:vfs Inodes:2051943 HasInodes:true} {Device:/dev/sda2 DeviceMajor:8 DeviceMinor:2 Capacity:31671447552 Type:vfs Inodes:15472128 HasInodes:true} {Device:/dev/sda1 DeviceMajor:8 DeviceMinor:1 Capacity:520785920 Type:vfs Inodes:256000 HasInodes:true} {Device:/dev/sdb1 DeviceMajor:8 DeviceMinor:17 Capacity:33685192704 Type:vfs Inodes:2097152 HasInodes:true} {Device:shm DeviceMajor:0 DeviceMinor:123 Capacity:67108864 Type:vfs Inodes:2051943 HasInodes:true}] DiskMap:map[2:0:{Name:fd0 Major:2 Minor:0 Size:4096 Scheduler:deadline} 8:0:{Name:sda Major:8 Minor:0 Size:32212254720 Scheduler:deadline} 8:16:{Name:sdb Major:8 Minor:16 Size:34359738368 Scheduler:deadline} 8:32:{Name:sdc Major:8 Minor:32 Size:139586437120 Scheduler:deadline}] NetworkDevices:[{Name:eth0 MacAddress:02:42:ac:12:00:0b Speed:10000 Mtu:1500}] Topology:[{Id:0 Memory:17179402240 Cores:[{Id:0 Threads:[0 1] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:31457280 Type:Unified Level:3}]} {Id:1 Threads:[2 3] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:31457280 Type:Unified Level:3}]}] Caches:[]}] CloudProvider:Azure InstanceType:Unknown InstanceID:9EE4C12D-E3F9-114B-A81E-1D9E29FFBFAB}
cadvisor_1             | I0202 14:50:32.684572       1 manager.go:231] Version: {KernelVersion:3.10.0-693.17.1.el7.x86_64 ContainerOsVersion:Alpine Linux v3.4 DockerVersion:Unknown DockerAPIVersion:Unknown CadvisorVersion:v0.28.3 CadvisorRevision:1e567c2}
cadvisor_1             | I0202 14:50:34.685916       1 factory.go:54] Registering systemd factory
cadvisor_1             | I0202 14:50:34.686479       1 factory.go:86] Registering Raw factory
cadvisor_1             | I0202 14:50:34.687028       1 manager.go:1178] Started watching for new ooms in manager
cadvisor_1             | W0202 14:50:34.687068       1 manager.go:313] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
cadvisor_1             | I0202 14:50:34.689742       1 manager.go:329] Starting recovery of all containers
cadvisor_1             | I0202 14:50:34.691993       1 manager.go:334] Recovery completed
cadvisor_1             | I0202 14:50:34.694875       1 cadvisor.go:162] Starting cAdvisor version: v0.28.3-1e567c2 on port 8080

The message 'cAdvisor running in container: "/sys/fs/cgroup/cpuacct,cpu"' might suggest that the kernel still exposes the necessary data inside the container, so it is unclear whether bind-mounting /sys is actually necessary to get all features.
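One way to see the naming mismatch directly is to compare the controller lists in /proc/self/cgroup (lines of the form hierarchy-ID:controllers:path) with the directories that actually exist under the cgroup root. This is a diagnostic sketch, not cAdvisor's own logic: it assumes cAdvisor derives the directory name from the controller list as the kernel reports it, which the "running in container" log line suggests but does not prove. The function name check_cgroup_dirs is hypothetical.

```shell
# Hypothetical diagnostic: for each cgroup v1 hierarchy listed in a
# /proc/<pid>/cgroup file (e.g. "4:cpuacct,cpu:/some/path"), report whether
# a directory with exactly that controller list exists under the cgroup
# root. Prints "ok" or "MISSING" per hierarchy; returns 1 if any is missing.
check_cgroup_dirs() {
    cg_file=${1:-/proc/self/cgroup}
    cg_root=${2:-/sys/fs/cgroup}
    status=0
    while IFS=: read -r id controllers path; do
        # Skip the cgroup v2 entry ("0::/...") and named hierarchies such
        # as "name=systemd", whose directory name differs from the list.
        case $controllers in
            ""|name=*) continue ;;
        esac
        if [ -d "$cg_root/$controllers" ]; then
            echo "ok      $cg_root/$controllers"
        else
            echo "MISSING $cg_root/$controllers"
            status=1
        fi
    done < "$cg_file"
    return $status
}
```

Running it both on the host and inside the cAdvisor container (with and without the /sys bind mount) would show whether the cpuacct,cpu directory is the only one that cannot be found.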

alrf commented 6 years ago

I have the same issue - "Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpuacct: no such file or directory"

I0326 10:31:33.675230       1 manager.go:231] Version: {KernelVersion:4.4.30-32.54.amzn1.x86_64 ContainerOsVersion:Alpine Linux v3.4 DockerVersion:17.12.0-ce DockerAPIVersion:1.35 CadvisorVersion:v0.28.3 CadvisorRevision:1e567c2}
E0326 10:31:33.684262       1 factory.go:340] devicemapper filesystem stats will not be reported: usage of thin_ls is disabled to preserve iops
I0326 10:31:33.684748       1 factory.go:356] Registering Docker factory
I0326 10:31:35.685162       1 factory.go:54] Registering systemd factory
I0326 10:31:35.685727       1 factory.go:86] Registering Raw factory
I0326 10:31:35.686198       1 manager.go:1178] Started watching for new ooms in manager
I0326 10:31:35.686649       1 manager.go:329] Starting recovery of all containers
I0326 10:31:35.686677       1 manager.go:334] Recovery completed
F0326 10:31:35.686690       1 cadvisor.go:156] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpuacct: no such file or directory
robertofabrizi commented 6 years ago

Fails on the latest AWS Amazon Linux as well.

summera commented 6 years ago

Also having this issue on Amazon Linux. Removing /sys:/sys:ro worked, but now some data is missing.

summera commented 6 years ago

What worked for me on Amazon Linux was mounting the following volumes:

      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /cgroup:/sys/fs/cgroup:ro
      - /dev/disk/:/dev/disk:ro

I think the key was /cgroup:/sys/fs/cgroup:ro
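Since the fix above hinges on which host directory holds the controller hierarchies, the choice can be made per host. This is a sketch, assuming (as the volume list above implies) that the Amazon Linux host keeps its cgroup v1 hierarchies under /cgroup while most systemd distributions use /sys/fs/cgroup; the function name host_cgroup_root and the "has a memory directory" test are assumptions for illustration.

```shell
# Hypothetical helper: print the host directory that holds the cgroup v1
# controller hierarchies, so it can be bind-mounted at /sys/fs/cgroup inside
# the cAdvisor container. /cgroup is checked first, then /sys/fs/cgroup.
host_cgroup_root() {
    prefix=${1:-}   # optional alternate filesystem root, handy for testing
    for candidate in /cgroup /sys/fs/cgroup; do
        # Treat a directory as a cgroup root if it has a memory controller.
        if [ -d "$prefix$candidate/memory" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}
```

The result would then feed the volume flag, for example --volume="$(host_cgroup_root):/sys/fs/cgroup:ro", alongside the other mounts listed above.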

chinglinwen commented 5 years ago

I encountered the same issue.

I installed k8s v1.14 with kubeadm and it shows the issue. I then tried running cAdvisor with Docker as follows, and it shows the same error.

docker run command from the README:

sudo docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --publish=8090:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest

[root@kube-master-90-101 cpu,cpuacct]# docker logs cadvisor -f
W0412 02:42:52.504836       1 manager.go:349] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
W0412 02:42:53.775922       1 container.go:523] Failed to update stats for container "/libcontainer_5726_systemd_test_default.slice": open /sys/fs/cgroup/cpu,cpuacct/libcontainer_5726_systemd_test_default.slice/cpuacct.usage: no such file or directory, continuing to push stats
W0412 02:42:53.961954       1 container.go:523] Failed to update stats for container "/libcontainer_5774_systemd_test_default.slice": failed to parse memory.memsw.usage_in_bytes - read /sys/fs/cgroup/memory/libcontainer_5774_systemd_test_default.slice/memory.memsw.usage_in_bytes: no such device, continuing to push stats
E0412 02:43:03.729233       1 helpers.go:137] readString: Failed to read "/sys/fs/cgroup/memory/libcontainer_5800_systemd_test_default.slice/memory.limit_in_bytes": read /sys/fs/cgroup/memory/libcontainer_5800_systemd_test_default.slice/memory.limit_in_bytes: no such device
E0412 02:43:03.729351       1 helpers.go:137] readString: Failed to read "/sys/fs/cgroup/memory/libcontainer_5800_systemd_test_default.slice/memory.memsw.limit_in_bytes": read /sys/fs/cgroup/memory/libcontainer_5800_systemd_test_default.slice/memory.memsw.limit_in_bytes: no such device
E0412 02:43:03.729413       1 helpers.go:137] readString: Failed to read "/sys/fs/cgroup/memory/libcontainer_5800_systemd_test_default.slice/memory.soft_limit_in_bytes": read /sys/fs/cgroup/memory/libcontainer_5800_systemd_test_default.slice/memory.soft_limit_in_bytes: no such device
E0412 02:43:03.729604       1 helpers.go:137] readString: Failed to read "/sys/fs/cgroup/memory/libcontainer_5800_systemd_test_default.slice/memory.limit_in_bytes": read /sys/fs/cgroup/memory/libcontainer_5800_systemd_test_default.slice/memory.limit_in_bytes: no such device
E0412 02:43:03.729719       1 helpers.go:137] readString: Failed to read "/sys/fs/cgroup/memory/libcontainer_5800_systemd_test_default.slice/memory.memsw.limit_in_bytes": read /sys/fs/cgroup/memory/libcontainer_5800_systemd_test_default.slice/memory.memsw.limit_in_bytes: no such device
E0412 02:43:03.729775       1 helpers.go:137] readString: Failed to read "/sys/fs/cgroup/memory/libcontainer_5800_systemd_test_default.slice/memory.soft_limit_in_bytes": read /sys/fs/cgroup/memory/libcontainer_5800_systemd_test_default.slice/memory.soft_limit_in_bytes: no such device
W0412 02:43:03.730389       1 container.go:523] Failed to update stats for container "/libcontainer_5800_systemd_test_default.slice": failed to parse memory.usage_in_bytes - read /sys/fs/cgroup/memory/libcontainer_5800_systemd_test_default.slice/memory.usage_in_bytes: no such device, continuing to push stats
W0412 02:43:03.736941       1 raw.go:87] Error while processing event ("/sys/fs/cgroup/memory/libcontainer_5800_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): readdirent: no such file or directory
W0412 02:43:03.737093       1 raw.go:87] Error while processing event ("/sys/fs/cgroup/devices/libcontainer_5800_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/libcontainer_5800_systemd_test_default.slice: no such file or directory
W0412 02:43:03.776870       1 container.go:523] Failed to update stats for container "/libcontainer_5807_systemd_test_default.slice": failed to parse memory.kmem.failcnt - read /sys/fs/cgroup/memory/libcontainer_5807_systemd_test_default.slice/memory.kmem.failcnt: no such device, continuing to push stats
W0412 02:43:13.731604       1 container.go:523] Failed to update stats for container "/libcontainer_5890_systemd_test_default.slice": read /sys/fs/cgroup/memory/libcontainer_5890_systemd_test_default.slice/memory.use_hierarchy: no such device, continuing to push stats
W0412 02:43:13.856336       1 container.go:523] Failed to update stats for container "/libcontainer_5918_systemd_test_default.slice": open /sys/fs/cgroup/cpu,cpuacct/libcontainer_5918_systemd_test_default.slice/cpuacct.stat: no such file or directory, continuing to push stats
W0412 02:43:23.725388       1 container.go:523] Failed to update stats for container "/libcontainer_5956_systemd_test_default.slice": read /sys/fs/cgroup/cpu,cpuacct/libcontainer_5956_systemd_test_default.slice/cpuacct.usage: no such device, continuing to push stats
W0412 02:43:44.013934       1 container.go:523] Failed to update stats for container "/libcontainer_6159_systemd_test_default.slice": open /sys/fs/cgroup/cpu,cpuacct/libcontainer_6159_systemd_test_default.slice/cpuacct.usage: no such file or directory, continuing to push stats
W0412 02:43:53.849481       1 container.go:523] Failed to update stats for container "/libcontainer_6237_systemd_test_default.slice": open /sys/fs/cgroup/cpu,cpuacct/libcontainer_6237_systemd_test_default.slice/cpuacct.usage: no such file or directory, continuing to push stats
W0412 02:43:53.961765       1 container.go:523] Failed to update stats for container "/libcontainer_6259_systemd_test_default.slice": failed to parse memory.memsw.usage_in_bytes - read /sys/fs/cgroup/memory/libcontainer_6259_systemd_test_default.slice/memory.memsw.usage_in_bytes: no such device, continuing to push stats
W0412 02:44:03.959716       1 container.go:523] Failed to update stats for container "/libcontainer_6329_systemd_test_default.slice": failed to parse memory.kmem.tcp.max_usage_in_bytes - read /sys/fs/cgroup/memory/libcontainer_6329_systemd_test_default.slice/memory.kmem.tcp.max_usage_in_bytes: no such device, continuing to push stats
W0412 02:44:13.726519       1 container.go:523] Failed to update stats for container "/libcontainer_6354_systemd_test_default.slice": failed to parse memory.kmem.tcp.max_usage_in_bytes - read /sys/fs/cgroup/memory/libcontainer_6354_systemd_test_default.slice/memory.kmem.tcp.max_usage_in_bytes: no such device, continuing to push stats
W0412 02:44:13.773203       1 container.go:523] Failed to update stats for container "/libcontainer_6361_systemd_test_default.slice": failed to parse memory.limit_in_bytes - read /sys/fs/cgroup/memory/libcontainer_6361_systemd_test_default.slice/memory.limit_in_bytes: no such device, continuing to push stats
W0412 02:44:13.998371       1 container.go:523] Failed to update stats for container "/libcontainer_6415_systemd_test_default.slice": failed to parse memory.memsw.failcnt - read /sys/fs/cgroup/memory/libcontainer_6415_systemd_test_default.slice/memory.memsw.failcnt: no such device, continuing to push stats
W0412 02:44:33.779144       1 container.go:523] Failed to update stats for container "/libcontainer_6498_systemd_test_default.slice": failed to parse memory.memsw.limit_in_bytes - read /sys/fs/cgroup/memory/libcontainer_6498_systemd_test_default.slice/memory.memsw.limit_in_bytes: no such device, continuing to push stats
W0412 02:44:43.773088       1 container.go:523] Failed to update stats for container "/libcontainer_6582_systemd_test_default.slice": failed to parse memory.failcnt - read /sys/fs/cgroup/memory/libcontainer_6582_systemd_test_default.slice/memory.failcnt: no such device, continuing to push stats
W0412 02:44:43.851186       1 container.go:523] Failed to update stats for container "/libcontainer_6603_systemd_test_default.slice": failed to parse memory.kmem.tcp.max_usage_in_bytes - read /sys/fs/cgroup/memory/libcontainer_6603_systemd_test_default.slice/memory.kmem.tcp.max_usage_in_bytes: no such device, continuing to push stats
[root@kube-master-90-101 cpu,cpuacct]# docker info
Containers: 17
 Running: 11
 Paused: 0
 Stopped: 6
Images: 11
Server Version: 18.06.2-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 7.63GiB
Name: kube-master-90-101
ID: 32JE:TXBU:5XXB:LOKS:VZRC:VPA7:MTWT:73YB:VOH3:I5C6:2HZA:QIM3
Docker Root Dir: /data/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 harbor.haodai.net
 127.0.0.0/8
Registry Mirrors:
 https://5s7givlk.mirror.aliyuncs.com/
Live Restore Enabled: false
docker inspect google/cadvisor |grep Created
        "Created": "2018-11-12T21:58:11.043673622Z",
tn-osimis commented 4 years ago

It seems the issue was addressed sometime between 0.28 and 0.33, although we haven't checked which exact version/patch fixed it. We've tested 0.33 without any workaround on CentOS 8, and the container starts successfully.

Maybe wait for a second confirmation since we don't have a lot of data/understanding, but it's looking like this issue can be closed.

ElCoyote27 commented 4 years ago

@tn-osimis Please allow me to disagree about closing the issue. At present, RHEL7 is far more widespread than RHEL8. Also, since RHEL7 can run either Docker or podman/buildah/etc., monitoring containers on it is a very popular feature.

tn-osimis commented 4 years ago

@ElCoyote27 ah, if it still fails on RHEL7/CentOS7 then definitely keep it open indeed. Could you confirm you tested again on a recent cAdvisor version?

The reason I ask is that we did check whether the offending file in /sys had been renamed in RHEL8/CentOS8 (which would have explained why it worked), but it turns out it had not. This means we believe nothing changed in RHEL8/CentOS8 with regard to this issue, so we assumed the fix was in cAdvisor itself. Of course, this analysis could be wrong (and I'm writing from memory after looking at a coworker's screen).

ElCoyote27 commented 4 years ago

@tn-osimis Sure, let me test on RHEL7.7 and report back.

ElCoyote27 commented 4 years ago

@tn-osimis Still fails on RHEL7 (tested on RHEL7.7):

$ sudo docker run   --volume=/:/rootfs:ro   --volume=/var/run:/var/run:ro   --volume=/sys:/sys:ro   --volume=/var/lib/docker/:/var/lib/docker:ro   --volume=/dev/disk/:/dev/disk:ro   --publish=8080:8080   --name=cadvisor   gcr.io/google-containers/cadvisor:latest
W0115 17:09:45.588458       1 manager.go:256] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
W0115 17:09:45.684134       1 container.go:409] Failed to create summary reader for "/system.slice/system-openvpn\\x2dserver.slice/openvpn-server@krynn.service": none of the resources are being tracked.
F0115 17:09:45.829246       1 cadvisor.go:186] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpuacct,cpu: no such file or directory
ElCoyote27 commented 4 years ago

That one works for 0.27.3 on RHEL7.7 (but is ugly as hell)

docker run --rm -i \
        --name cadvisor \
        --volume=/:/rootfs:ro \
        --volume=/cgroup:/cgroup:ro \
        --volume=/var/run:/var/run:rw \
        --volume=/var/lib/docker/:/var/lib/docker:ro \
        --volume=/sys/bus:/sys/bus:ro \
        --volume=/sys/dev:/sys/dev:ro \
        --volume=/sys/devices:/sys/devices:ro \
        --volume=/sys/block:/sys/block:ro \
        --volume=/sys/class:/sys/class:ro \
        --volume=/sys/power:/sys/power:ro \
        --volume=/sys/firmware:/sys/firmware:ro \
        --volume=/sys/kernel:/sys/kernel:ro \
        --volume=/sys/module:/sys/module:ro \
        --volume=/sys/hypervisor:/sys/hypervisor:ro \
        --volume=/sys/fs/bpf:/sys/fs/bpf:ro \
        --volume=/sys/fs/xfs:/sys/fs/xfs:ro \
        --volume=/sys/fs/ext4:/sys/fs/ext4:ro \
        --volume=/sys/fs/fuse:/sys/fs/fuse:ro \
        --volume=/sys/fs/resctrl:/sys/fs/resctrl:ro \
        --volume=/sys/fs/selinux:/sys/fs/selinux:ro \
        --volume=/sys/fs/pstore:/sys/fs/pstore:ro \
        --volume=/sys/fs/cgroup/cpuset:/sys/fs/cgroup/cpuset:ro \
        --volume=/sys/fs/cgroup/blkio:/sys/fs/cgroup/blkio:ro \
        --volume=/sys/fs/cgroup/memory:/sys/fs/cgroup/memory:ro \
        --volume=/sys/fs/cgroup/pids:/sys/fs/cgroup/pids:ro \
        --volume=/sys/fs/cgroup/freezer:/sys/fs/cgroup/freezer:ro \
        --volume=/sys/fs/cgroup/net_prio:/sys/fs/cgroup/net_prio:ro \
        --volume=/sys/fs/cgroup/hugetlb:/sys/fs/cgroup/hugetlb:ro \
        --volume=/sys/fs/cgroup/devices:/sys/fs/cgroup/devices:ro \
        --volume=/sys/fs/cgroup/cpu,cpuacct:/sys/fs/cgroup/cpuacct,cpu:ro \
        --volume=/sys/fs/cgroup/perf_event:/sys/fs/cgroup/perf_event:ro \
        --volume=/sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd:ro \
        -p 8080:8080 -eSERVICE_TAGS=prom_monitored \
        google/cadvisor:v0.27.3 "$*"
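The long list of /sys mounts above can be generated instead of typed, which also sidesteps the failures reported below for paths that don't exist on a given host: Docker tries to create a missing -v source directory, and mkdir is not permitted under /sys. This is a sketch that mirrors the paths in the command above; the function name sys_volume_flags is hypothetical.

```shell
# Hypothetical helper: print, one per line, the --volume flags from the
# RHEL 7 workaround above, skipping any /sys path missing on this host
# (e.g. /sys/fs/xfs when the xfs module is not loaded).
sys_volume_flags() {
    prefix=${1:-}   # optional alternate filesystem root, handy for testing
    for p in bus dev devices block class power firmware kernel module \
             hypervisor fs/bpf fs/xfs fs/ext4 fs/fuse fs/resctrl fs/selinux \
             fs/pstore fs/cgroup/cpuset fs/cgroup/blkio fs/cgroup/memory \
             fs/cgroup/pids fs/cgroup/freezer fs/cgroup/net_prio \
             fs/cgroup/hugetlb fs/cgroup/devices fs/cgroup/perf_event \
             fs/cgroup/systemd; do
        if [ -e "$prefix/sys/$p" ]; then
            printf '%s\n' "--volume=/sys/$p:/sys/$p:ro"
        fi
    done
    # The combined CPU hierarchy is mounted under the reversed name that
    # appears in the inotify_add_watch error.
    if [ -e "$prefix/sys/fs/cgroup/cpu,cpuacct" ]; then
        printf '%s\n' "--volume=/sys/fs/cgroup/cpu,cpuacct:/sys/fs/cgroup/cpuacct,cpu:ro"
    fi
}
```

The output could replace the individual /sys lines, e.g. docker run --rm -i --name cadvisor --volume=/:/rootfs:ro --volume=/cgroup:/cgroup:ro --volume=/var/run:/var/run:rw --volume=/var/lib/docker/:/var/lib/docker:ro $(sys_volume_flags) -p 8080:8080 google/cadvisor:v0.27.3. The unquoted $(...) relies on word splitting, which is safe here only because none of the paths contain whitespace.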
ElCoyote27 commented 4 years ago

Please note that the above 'workaround' also works for 0.33.0 on RHEL7.7

secustor commented 4 years ago

@ElCoyote27 I could not replicate that workaround with CentOS Linux release 7.6.1810 (Core) and cAdvisor versions 0.27.3, 0.33.0 and 0.34.0.

/usr/bin/docker-current: Error response from daemon: error while creating mount source path '/sys/fs/xfs': mkdir /sys/fs/xfs: operation not permitted.

I get this error for seemingly random volumes under /sys/.
Changing the mount type from RO to RW has no effect on this behaviour.

ElCoyote27 commented 4 years ago

@secustor Remove the 'xfs' section. It probably means you don't have any XFS filesystems (the module isn't loaded). Or do a 'modprobe xfs' on the host.

secustor commented 4 years ago

@ElCoyote27 Thanks, by removing the offending volumes I managed to get it running. These are the ones I removed:

"/sys/fs/xfs:/sys/fs/xfs:ro",
"/sys/fs/fuse:/sys/fs/fuse:ro",
"/sys/fs/resctrl:/sys/fs/resctrl:ro",
"/sys/fs/selinux:/sys/fs/selinux:ro",
daverodgers77 commented 4 years ago

Hi,

I'm running RHEL 7.8 and the latest version of cAdvisor, but I'm also getting the error:

Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpuacct,cpu: no such file or directory

I've tried adding the following to my docker run command:

--volume=/cgroup:/cgroup:ro
or
--volume=/cgroup:/sys/fs/cgroup:ro

but no luck.

If I try: --volume=/sys/fs/cgroup/cpu,cpuacct:/sys/fs/cgroup/cpuacct,cpu:ro

I get a different error:

/usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:235: starting container process caused "container init exited prematurely".

Is it actually possible to use cAdvisor with RHEL at the moment?

Thanks,

Dave

daverodgers77 commented 4 years ago

https://github.com/google/cadvisor/issues/2155#issuecomment-631306789

I've added a comment on #2155 with a config that seems to work for me.

adityai commented 3 years ago

https://github.com/google/cadvisor/issues/2155#issuecomment-631306789 - the docker command in this comment for #2155 worked for me.