google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

cadvisor not working #1928

Open gauravkarens opened 6 years ago

gauravkarens commented 6 years ago

I0416 10:31:24.513018 1 storagedriver.go:50] Caching stats in memory for 2m0s
I0416 10:31:24.513510 1 manager.go:151] cAdvisor running in container: "/sys/fs/cgroup/cpuacct,cpu"
I0416 10:31:24.608594 1 fs.go:139] Filesystem UUIDs: map[]
I0416 10:31:24.608634 1 fs.go:140] Filesystem partitions: map[/dev/sda6:{mountpoint:/rootfs/var major:8 minor:6 fsType:ext3 blockSize:0} /dev/sda1:{mountpoint:/rootfs/boot major:8 minor:1 fsType:ext3 blockSize:0} /dev/sda5:{mountpoint:/rootfs/tmp major:8 minor:5 fsType:ext3 blockSize:0} /dev/mapper/optvg-optlv:{mountpoint:/rootfs/opt major:253 minor:0 fsType:ext4 blockSize:0} shm:{mountpoint:/rootfs/opt/docker-latest/lib/containers/a949b8024c39b71fde0c74f6eb3fc44fa81c336a340fe80bb3a514924e43b01b/shm major:0 minor:80 fsType:tmpfs blockSize:0} overlay:{mountpoint:/ major:0 minor:178 fsType:overlay blockSize:0} tmpfs:{mountpoint:/dev major:0 minor:182 fsType:tmpfs blockSize:0} /dev/sda2:{mountpoint:/rootfs major:8 minor:2 fsType:ext3 blockSize:0}]
I0416 10:31:24.618696 1 manager.go:225] Machine: {NumCores:4 CpuFrequency:2666761 MemoryCapacity:12331003904 HugePages:[{PageSize:2048 NumPages:0}] MachineID:776138ac4fe643cf9e2174648538e555 SystemUUID:4202D7A4-008A-D850-0214-03D3256FA251 BootID:a6f94e20-f8b7-4ac6-a822-505bfc698872 Filesystems:[{Device:/dev/sda6 DeviceMajor:8 DeviceMinor:6 Capacity:3103539200 Type:vfs Inodes:196608 HasInodes:true} {Device:/dev/sda1 DeviceMajor:8 DeviceMinor:1 Capacity:499355648 Type:vfs Inodes:128016 HasInodes:true} {Device:/dev/sda5 DeviceMajor:8 DeviceMinor:5 Capacity:3103539200 Type:vfs Inodes:196608 HasInodes:true} {Device:/dev/mapper/optvg-optlv DeviceMajor:253 DeviceMinor:0 Capacity:63254081536 Type:vfs Inodes:3932160 HasInodes:true} {Device:shm DeviceMajor:0 DeviceMinor:80 Capacity:67108864 Type:vfs Inodes:1505249 HasInodes:true} {Device:overlay DeviceMajor:0 DeviceMinor:178 Capacity:63254081536 Type:vfs Inodes:3932160 HasInodes:true} {Device:tmpfs DeviceMajor:0 DeviceMinor:182 Capacity:6165499904 Type:vfs Inodes:1505249 HasInodes:true} {Device:/dev/sda2 DeviceMajor:8 DeviceMinor:2 Capacity:9377800192 Type:vfs Inodes:589824 HasInodes:true}] DiskMap:map[8:0:{Name:sda Major:8 Minor:0 Size:53687091200 Scheduler:deadline} 8:16:{Name:sdb Major:8 Minor:16 Size:64424509440 Scheduler:deadline} 253:0:{Name:dm-0 Major:253 Minor:0 Size:64399343616 Scheduler:none} 2:0:{Name:fd0 Major:2 Minor:0 Size:4096 Scheduler:deadline}] NetworkDevices:[{Name:eth0 MacAddress:02:42:ac:11:00:0e Speed:10000 Mtu:1500}] Topology:[{Id:0 Memory:12884434944 Cores:[{Id:0 Threads:[0] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:25165824 Type:Unified Level:3}]} {Id:2 Memory:0 Cores:[{Id:0 Threads:[1] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:25165824 Type:Unified Level:3}]} {Id:4 Memory:0 Cores:[{Id:0 Threads:[2] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:25165824 Type:Unified Level:3}]} {Id:6 Memory:0 Cores:[{Id:0 Threads:[3] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:25165824 Type:Unified Level:3}]}] CloudProvider:Unknown InstanceType:Unknown InstanceID:None}
I0416 10:31:24.621578 1 manager.go:231] Version: {KernelVersion:3.10.0-693.11.6.el7.x86_64 ContainerOsVersion:Alpine Linux v3.4 DockerVersion:1.13.1 DockerAPIVersion:1.26 CadvisorVersion:v0.28.3 CadvisorRevision:1e567c2}
I0416 10:31:24.670378 1 factory.go:356] Registering Docker factory
I0416 10:31:26.671372 1 factory.go:54] Registering systemd factory
I0416 10:31:26.672593 1 factory.go:86] Registering Raw factory
I0416 10:31:26.674437 1 manager.go:1178] Started watching for new ooms in manager
W0416 10:31:26.674473 1 manager.go:313] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
I0416 10:31:26.684839 1 manager.go:329] Starting recovery of all containers
I0416 10:31:26.685812 1 manager.go:334] Recovery completed
I0416 10:31:26.690794 1 cadvisor.go:162] Starting cAdvisor version: v0.28.3-1e567c2 on port 8080

Please let me know how to fix this issue.

dashpole commented 6 years ago

What problem are you experiencing? I don't see any unusual errors in the logs...
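For anyone who wants to double-check a pasted cAdvisor log for warnings or errors, here is a minimal Python sketch. It assumes the usual glog prefix (a severity letter I, W, E or F, then MMDD, a time, a thread id and file:line]) and that entries may have been pasted onto one line. The function names are only illustrative and are not part of cAdvisor.

```python
import re

# glog prefix: severity letter, MMDD, hh:mm:ss.micros, thread id, file:line]
# (?<!\S) requires the prefix to start the text or follow whitespace.
GLOG_RE = re.compile(
    r"(?<!\S)(?P<sev>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+(?P<tid>\d+) "
    r"(?P<src>[\w.]+:\d+)\] ?"
)


def split_entries(blob):
    """Split concatenated glog entries into (severity, source, message) tuples."""
    matches = list(GLOG_RE.finditer(blob))
    entries = []
    for i, m in enumerate(matches):
        # The message runs until the next prefix, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(blob)
        entries.append((m.group("sev"), m.group("src"), blob[m.end():end].strip()))
    return entries


def problems(blob):
    """Keep only warning (W), error (E) and fatal (F) entries."""
    return [entry for entry in split_entries(blob) if entry[0] in "WEF"]
```

Calling problems() on the text of a log returns only the W/E/F entries, which is a quick way to confirm whether anything besides informational lines was logged.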

gauravkarens commented 6 years ago

The UI is not opening on port 8080. Also, can you suggest whether there is an API that can be used to stock the metric data coming from cAdvisor?

dashpole commented 6 years ago

Not sure what you mean by "stock"ing metrics... How are you running cAdvisor? Just with docker, or in kubernetes as a daemonset, or using the one built into the kubelet?

gauravkarens commented 6 years ago

I need to analyze per-container resource utilization over a period of one day. To do that, I need a record of the metric data produced by cAdvisor. How do I do that?

dashpole commented 6 years ago

Can you try running curl localhost:8080/metrics from the host and see if there is output? That would help determine whether it is a problem with cAdvisor or a problem reaching cAdvisor from your browser.
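Once /metrics does return output, one way to keep a day's worth of per-container data is to poll that endpoint and append the samples to a file. The startup log says stats are only cached in memory for 2m0s, so longer history needs some external store. The Python sketch below is a rough illustration, not an official cAdvisor client: the CSV layout, the function names, the 60-second interval and the container_ name prefix used for filtering are all assumptions.

```python
import csv
import time
import urllib.request


def parse_line(line):
    """Parse one Prometheus text-format sample into (series, value), else None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or HELP/TYPE comment
    if "{" in line:
        # Label values may contain spaces, so cut at the last closing brace.
        brace_end = line.rindex("}")
        series, rest = line[:brace_end + 1], line[brace_end + 1:].split()
    else:
        parts = line.split()
        series, rest = parts[0], parts[1:]
    if not rest:
        return None
    return series, float(rest[0])  # an optional trailing timestamp is ignored


def container_samples(text, prefix="container_"):
    """Return the samples whose metric name starts with the given prefix."""
    parsed = (parse_line(line) for line in text.splitlines())
    return [p for p in parsed if p and p[0].startswith(prefix)]


def record(path, url="http://localhost:8080/metrics", interval=60, iterations=1440):
    """Poll the endpoint and append (unix_time, series, value) rows to a CSV file."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for _ in range(iterations):
            with urllib.request.urlopen(url, timeout=10) as resp:
                text = resp.read().decode("utf-8")
            now = int(time.time())
            for series, value in container_samples(text):
                writer.writerow([now, series, value])
            f.flush()
            time.sleep(interval)
```

With the defaults, record("cadvisor_metrics.csv") would take 1440 samples at 60-second intervals, which covers roughly 24 hours. For anything beyond a one-off analysis, a proper time-series store scraping the same endpoint is probably a better fit.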

Eric-LiuGang commented 6 years ago

@dashpole Hi, I hit this problem too. The container is not running and I can't start it. OS: CentOS 7.4.1. Docker version: 1.13.1, installed via yum with the default settings.

dashpole commented 6 years ago

It likely isn't an issue with Docker, @Eric-LiuGang. Please share your cAdvisor log, and try curling the /metrics endpoint as I described in my previous comment.

Eric-LiuGang commented 6 years ago

When I run cadvisor with

sudo docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest

The container runs for about 2 seconds and then stops, so port 8080 never comes up.

The log is as follows:
Apr 27 10:44:48 docker2.gang_test dockerd-current[6646]: I0427 02:44:48.700521 1 storagedriver.go:50] Caching stats in memory for 2m0s
Apr 27 10:44:48 docker2.gang_test dockerd-current[6646]: I0427 02:44:48.701273 1 manager.go:151] cAdvisor running in container: "/sys/fs/cgroup/cpuacct,cpu"
Apr 27 10:44:48 docker2.gang_test dockerd-current[6646]: time="2018-04-27T10:44:48.723316610+08:00" level=warning msg="failed to retrieve docker-runc version: unknown output format: runc version 1.0.0-rc2\nspec: 1.0.0-rc2-dev\n"
Apr 27 10:44:48 docker2.gang_test dockerd-current[6646]: time="2018-04-27T10:44:48.723412514+08:00" level=warning msg="failed to retrieve docker-init version"
Apr 27 10:44:48 docker2.gang_test dockerd-current[6646]: time="2018-04-27T10:44:48.748355750+08:00" level=warning msg="failed to retrieve docker-runc version: unknown output format: runc version 1.0.0-rc2\nspec: 1.0.0-rc2-dev\n"
Apr 27 10:44:48 docker2.gang_test dockerd-current[6646]: time="2018-04-27T10:44:48.748467364+08:00" level=warning msg="failed to retrieve docker-init version"
Apr 27 10:44:48 docker2.gang_test dockerd-current[6646]: I0427 02:44:48.753382 1 fs.go:139] Filesystem UUIDs: map[]
Apr 27 10:44:48 docker2.gang_test dockerd-current[6646]: I0427 02:44:48.753428 1 fs.go:140] Filesystem partitions: map[/dev/mapper/docker-253:0-50554515-850de0dbc61dcb167094b4cd7e0f60c4dec47c37cf2e2479fb245a92cfae9331:{mountpoint:/ major:253 minor:3 fsType:xfs blockSize:0} tmpfs:{mountpoint:/dev major:0 minor:39 fsType:tmpfs blockSize:0} /dev/mapper/cl-root:{mountpoint:/var/lib/docker major:253 minor:0 fsType:xfs blockSize:0} /dev/sda1:{mountpoint:/rootfs/boot major:8 minor:1 fsType:xfs blockSize:0} shm:{mountpoint:/dev/shm major:0 minor:36 fsType:tmpfs blockSize:0}]
Apr 27 10:44:48 docker2.gang_test dockerd-current[6646]: I0427 02:44:48.760924 1 manager.go:225] Machine: {NumCores:4 CpuFrequency:2400085 MemoryCapacity:3974868992 HugePages:[{PageSize:2048 NumPages:0}] MachineID:5bb269792be0410d8e8cf4bcc561b3c5 SystemUUID:423735B7-EB95-DE79-C03D-EDA16581FDD0 BootID:8ae20ada-81dd-4d42-84c2-dbd650375df7 Filesystems:[{Device:shm DeviceMajor:0 DeviceMinor:36 Capacity:67108864 Type:vfs Inodes:485213 HasInodes:true} {Device:/dev/mapper/docker-253:0-50554515-850de0dbc61dcb167094b4cd7e0f60c4dec47c37cf2e2479fb245a92cfae9331 DeviceMajor:253 DeviceMinor:3 Capacity:10725883904 Type:vfs Inodes:5242368 HasInodes:true} {Device:tmpfs DeviceMajor:0 DeviceMinor:39 Capacity:1987432448 Type:vfs Inodes:485213 HasInodes:true} {Device:/dev/mapper/cl-root DeviceMajor:253 DeviceMinor:0 Capacity:18238930944 Type:vfs Inodes:8910848 HasInodes:true} {Device:/dev/sda1 DeviceMajor:8 DeviceMinor:1 Capacity:1063256064 Type:vfs Inodes:524288 HasInodes:true}] DiskMap:map[253:0:{Name:dm-0 Major:253 Minor:0 Size:18249416704 Scheduler:none} 253:1:{Name:dm-1 Major:253 Minor:1 Size:2147483648 Scheduler:none} 253:2:{Name:dm-2 Major:253 Minor:2 Size:107374182400 Scheduler:none} 253:3:{Name:dm-3 Major:253 Minor:3 Size:10737418240 Scheduler:none} 2:0:{Name:fd0 Major:2 Minor:0 Size:4096 Scheduler:deadline} 8:0:{Name:sda Major:8 Minor:0 Size:21474836480 Scheduler:deadline}] NetworkDevices:[{Name:ens160 MacAddress:00:50:56:b7:49:9c Speed:10000 Mtu:1500} {Name:ens192 MacAddress:00:50:56:b7:75:05 Speed:10000 Mtu:1500}] Topology:[{Id:0 Memory:4294500352 Cores:[{Id:0 Threads:[0] Caches:[]} {Id:1 Threads:[1] Caches:[]}] Caches:[{Size:12582912 Type:Unified Level:3}]} {Id:1 Memory:0 Cores:[{Id:0 Threads:[2] Caches:[]} {Id:1 Threads:[3] Caches:[]}] Caches:[{Size:12582912 Type:Unified Level:3}]}] CloudProvider:Unknown InstanceType:Unknown InstanceID:None}
Apr 27 10:44:48 docker2.gang_test dockerd-current[6646]: I0427 02:44:48.762428 1 manager.go:231] Version: {KernelVersion:3.10.0-693.21.1.el7.x86_64 ContainerOsVersion:Alpine Linux v3.4 DockerVersion:1.13.1 DockerAPIVersion:1.26 CadvisorVersion:v0.28.3 CadvisorRevision:1e567c2}
Apr 27 10:44:48 docker2.gang_test dockerd-current[6646]: time="2018-04-27T10:44:48.781672520+08:00" level=warning msg="failed to retrieve docker-runc version: unknown output format: runc version 1.0.0-rc2\nspec: 1.0.0-rc2-dev\n"
Apr 27 10:44:48 docker2.gang_test dockerd-current[6646]: time="2018-04-27T10:44:48.781791536+08:00" level=warning msg="failed to retrieve docker-init version"
Apr 27 10:44:48 docker2.gang_test dockerd-current[6646]: E0427 02:44:48.786038 1 factory.go:340] devicemapper filesystem stats will not be reported: usage of thin_ls is disabled to preserve iops
Apr 27 10:44:48 docker2.gang_test dockerd-current[6646]: I0427 02:44:48.787405 1 factory.go:356] Registering Docker factory
Apr 27 10:44:50 docker2.gang_test dockerd-current[6646]: I0427 02:44:50.788041 1 factory.go:54] Registering systemd factory
Apr 27 10:44:50 docker2.gang_test dockerd-current[6646]: I0427 02:44:50.788776 1 factory.go:86] Registering Raw factory
Apr 27 10:44:50 docker2.gang_test dockerd-current[6646]: I0427 02:44:50.789363 1 manager.go:1178] Started watching for new ooms in manager
Apr 27 10:44:50 docker2.gang_test dockerd-current[6646]: W0427 02:44:50.789410 1 manager.go:313] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
Apr 27 10:44:50 docker2.gang_test dockerd-current[6646]: I0427 02:44:50.794910 1 manager.go:329] Starting recovery of all containers
Apr 27 10:44:50 docker2.gang_test dockerd-current[6646]: I0427 02:44:50.877151 1 manager.go:334] Recovery completed
Apr 27 10:44:50 docker2.gang_test dockerd-current[6646]: F0427 02:44:50.908107 1 cadvisor.go:156] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpuacct,cpu: no such file or directory
Apr 27 10:44:50 docker2.gang_test oci-systemd-hook[6916]: systemdhook : 408094c2ab7d: Skipping as container command is /usr/bin/cadvisor, not init or systemd
Apr 27 10:44:50 docker2.gang_test oci-umount[6917]: umounthook : 408094c2ab7d: only runs in prestart stage, ignoring
Apr 27 10:44:50 docker2.gang_test dockerd-current[6646]: time="2018-04-27T10:44:50.979845203+08:00" level=error msg="containerd: deleting container" error="exit status 1: \"container 408094c2ab7d550bda56d583f9541a5af8e482ba021b993cc2a7d5daf6bf0842 does not exist\none or more of the container deletions failed\n\""
Apr 27 10:44:51 docker2.gang_test kernel: docker0: port 1(veth0f40d62) entered disabled state
Apr 27 10:44:51 docker2.gang_test NetworkManager[647]: [1524797091.0181] manager: (veth2a03b9c): new Veth device (/org/freedesktop/NetworkManager/Devices/28)
Apr 27 10:44:51 docker2.gang_test kernel: docker0: port 1(veth0f40d62) entered disabled state
Apr 27 10:44:51 docker2.gang_test kernel: device veth0f40d62 left promiscuous mode
Apr 27 10:44:51 docker2.gang_test kernel: docker0: port 1(veth0f40d62) entered disabled state
Apr 27 10:44:51 docker2.gang_test NetworkManager[647]: [1524797091.0314] device (veth0f40d62): released from master device docker0
Apr 27 10:44:51 docker2.gang_test kernel: XFS (dm-3): Unmounting Filesystem
Apr 27 10:44:51 docker2.gang_test dockerd-current[6646]: time="2018-04-27T10:44:51.102927017+08:00" level=warning msg="408094c2ab7d550bda56d583f9541a5af8e482ba021b993cc2a7d5daf6bf0842 cleanup: failed to unmount secrets: invalid argument"

The docker service start status: /usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --seccomp-profile=/etc/docker/seccomp.json --selinux-enabled --log-driver=journald --signature-verification=false

Is “selinux enabled” OK?
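The fatal line in the log above says that /sys/fs/cgroup/cpuacct,cpu does not exist. A quick thing to check on the host is what the combined cpu/cpuacct controller directory is actually called there, since some distributions name it cpu,cpuacct instead. Below is a minimal Python sketch for that check. The function names are illustrative, and the default root path assumes a cgroup v1 layout mounted at /sys/fs/cgroup.

```python
import os


def cpu_cgroup_dirs(root="/sys/fs/cgroup"):
    """List entries under root whose comma-separated name is exactly {cpu, cpuacct}.

    Returns (name, is_symlink) tuples, for example ("cpu,cpuacct", False).
    """
    found = []
    for name in sorted(os.listdir(root)):
        if set(name.split(",")) == {"cpu", "cpuacct"}:
            found.append((name, os.path.islink(os.path.join(root, name))))
    return found


def expected_dir_exists(root="/sys/fs/cgroup", name="cpuacct,cpu"):
    """Check whether the directory named in the fatal log line is present."""
    return os.path.isdir(os.path.join(root, name))
```

If cpu_cgroup_dirs() reports only cpu,cpuacct and expected_dir_exists() is False, the path cAdvisor tries to watch really is missing. A workaround that has been reported for this situation is adding a cpuacct,cpu symlink that points at cpu,cpuacct, but treat that as something to verify against the cAdvisor documentation rather than a fix confirmed in this thread.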

jwmarshall commented 6 years ago

I have a similar issue where cadvisor never completes the "Starting recovery of all containers" phase. According to the logs, the "Recovery completed" event is never emitted and the web UI port is never opened. I've strace'd cadvisor and there is an overwhelming number of futex errors, all pointing to the same address.

futex(0x13ae8b0, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)

The issue happens running in docker as well as via systemd on the host. It was working yesterday without any issues, but I suspect that it's something related to Docker. Unfortunately I cannot restart the docker daemon nor the host at this time to see if it fixes the issue. AFAICT there are no stuck or zombie containers. Cadvisor has been running for over three hours now with no change. I'm trying to replicate the issue on another system that I can restart if necessary.

Not sure why cadvisor fails to report its version number in the log (it was installed via apt), but maybe this will help:

Version: {KernelVersion:4.15.0-20-generic ContainerOsVersion:Ubuntu 18.04 LTS DockerVersion:17.12.1-ce DockerAPIVersion:1.35 CadvisorVersion: CadvisorRevision:}

apt-cache says it's cadvisor version 0.2.71
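To put a number on the futex errors described in this comment, saved strace output can be tallied by address, operation and error. This is a small Python sketch under the assumption that the lines look like the futex(...) = -1 EAGAIN example quoted above. The regular expression and the function name are illustrative only.

```python
import re
from collections import Counter

# Matches e.g. "futex(0x13ae8b0, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (...)".
# The errno name is optional because successful calls do not print one.
FUTEX_RE = re.compile(
    r"futex\((0x[0-9a-fA-F]+), (\w+).*?\)\s+= (-?\d+)(?: (E[A-Z]+))?"
)


def tally_futex(lines):
    """Count futex calls keyed by (address, operation, errno or 'ok')."""
    counts = Counter()
    for line in lines:
        m = FUTEX_RE.search(line)
        if m:
            addr, op, _ret, errno = m.groups()
            counts[(addr, op, errno or "ok")] += 1
    return counts
```

Passing the lines of a saved trace to tally_futex() and printing counts.most_common(5) shows whether a single address really dominates.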

jwmarshall commented 6 years ago

Pointing cadvisor at an invalid socket file for docker causes it to start immediately, though without docker statistics obviously. The futex errors remain, so maybe they're unrelated.
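Since startup only stalls when cadvisor talks to the real Docker socket, it may be worth checking whether the daemon answers on that socket at all. The Python sketch below sends a plain GET /_ping over the Unix socket with a timeout. The default path /var/run/docker.sock and the function name are assumptions, and a reply here only shows that the daemon responds to a ping, not that every container query returns.

```python
import socket


def docker_ping(sock_path="/var/run/docker.sock", timeout=5.0):
    """Return the HTTP status line from GET /_ping on the Docker socket.

    Returns None if the socket cannot be reached or nothing comes back
    within the timeout.
    """
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect(sock_path)
            s.sendall(b"GET /_ping HTTP/1.0\r\nHost: docker\r\n\r\n")
            data = s.recv(4096)
    except OSError:
        # Covers a missing socket file, permission errors and timeouts.
        return None
    status_line = data.split(b"\r\n", 1)[0].decode("ascii", "replace")
    return status_line or None
```

A None result, or a long wait before the timeout fires, would point at the daemon rather than at cadvisor.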