google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

Cadvisor restarting #1895

Open hshahar opened 6 years ago

hshahar commented 6 years ago

I am using cAdvisor version v0.28.3-1e567c2 in swarm mode on Ubuntu 16. When I deploy the service, cAdvisor continuously goes through a restart and recovery cycle without stopping. I see the following error: "Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory" The configuration is:

cadvisor:
  image: google/cadvisor:latest
  networks:

dashpole commented 6 years ago

The error message you list shouldn't cause cAdvisor to restart. Can you provide more of your log?

dashpole commented 6 years ago

I would also recommend using tagged releases. Perhaps try out v0.29.0?

hshahar commented 6 years ago

Hi, thanks for your answer. I changed to v0.29.0. This is the log from Docker:

1 storagedriver.go:50] Caching stats in memory for 2m0s
I0304 15:06:45.141205 1 manager.go:154] cAdvisor running in container: "/sys/fs/cgroup/cpu,cpuacct"
I0304 15:06:45.215549 1 fs.go:142] Filesystem UUIDs: map[]
I0304 15:06:45.215570 1 fs.go:143] Filesystem partitions: map[tmpfs:{mountpoint:/dev major:0 minor:103 fsType:tmpfs blockSize:0} /dev/xvda1:{mountpoint:/var/lib/docker/overlay2 major:202 minor:1 fsType:ext4 blockSize:0} /dev/xvdf:{mountpoint:/rootfs/gluster/data major:202 minor:80 fsType:xfs blockSize:0} shm:{mountpoint:/rootfs/var/lib/docker/containers/f375c3eae11fd9939170b0d93d485774408483ff19afc15cc32d76d6c0a998ab/shm major:0 minor:78 fsType:tmpfs blockSize:0}]
I0304 15:06:45.219300 1 manager.go:227] Machine: {NumCores:4 CpuFrequency:2400068 MemoryCapacity:16825765888 HugePages:[{PageSize:1048576 NumPages:0} {PageSize:2048 NumPages:0}] MachineID:e9e76c2f45f5416390755da00379b451 SystemUUID:EC2ADA12-F744-1CF5-5CFD-5A9BDF79232C BootID:3954118f-f1d1-43e8-996a-7cc2aafe824e Filesystems:[{Device:tmpfs DeviceMajor:0 DeviceMinor:103 Capacity:67108864 Type:vfs Inodes:2053926 HasInodes:true} {Device:/dev/xvda1 DeviceMajor:202 DeviceMinor:1 Capacity:20749852672 Type:vfs Inodes:2560000 HasInodes:true} {Device:/dev/xvdf DeviceMajor:202 DeviceMinor:80 Capacity:8579448832 Type:vfs Inodes:4194304 HasInodes:true} {Device:shm DeviceMajor:0 DeviceMinor:78 Capacity:67108864 Type:vfs Inodes:2053926 HasInodes:true} {Device:overlay DeviceMajor:0 DeviceMinor:42 Capacity:20749852672 Type:vfs Inodes:2560000 HasInodes:true}] DiskMap:map[202:0:{Name:xvda Major:202 Minor:0 Size:21474836480 Scheduler:none} 202:80:{Name:xvdf Major:202 Minor:80 Size:8589934592 Scheduler:none}] NetworkDevices:[{Name:eth0 MacAddress:0a:58:d9:18:bf:2a Speed:0 Mtu:9001}] Topology:[{Id:0 Memory:16825765888 Cores:[{Id:0 Threads:[0] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]} {Id:1 Threads:[1] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]} {Id:2 Threads:[2] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]} {Id:3 Threads:[3] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:31457280 Type:Unified Level:3}]}] CloudProvider:AWS InstanceType:t2.xlarge InstanceID:i-00368e8946e2112b8}
I0304 15:06:45.220074 1 manager.go:233] Version: {KernelVersion:4.4.0-1049-aws ContainerOsVersion:Alpine Linux v3.4 DockerVersion:17.09.1-ce DockerAPIVersion:1.32 CadvisorVersion:v0.29.0 CadvisorRevision:aaaa65d}
I0304 15:06:45.249195 1 factory.go:356] Registering Docker factory
I0304 15:06:47.249563 1 factory.go:54] Registering systemd factory
I0304 15:06:47.250974 1 factory.go:86] Registering Raw factory
I0304 15:06:47.252178 1 manager.go:1205] Started watching for new ooms in manager
W0304 15:06:47.252197 1 manager.go:340] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
I0304 15:06:47.252919 1 manager.go:356] Starting recovery of all containers
I0304 15:06:47.332977 1 manager.go:361] Recovery completed
I0304 15:06:47.440577 1 cadvisor.go:163] Starting cAdvisor version: v0.29.0-aaaa65d on port 8080
2018/03/04 15:07:05 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial unix:///var/run/containerd/containerd.sock: timeout"; Reconnecting to {unix:///var/run/containerd/containerd.sock

It looks like it's not restarting, but I don't know what this error means (failed to create ....).

dashpole commented 6 years ago

What version of docker? It looks like it is trying and failing to connect to containerd.

delskev commented 6 years ago

I got the same problem. Here's the error message:

2018/03/16 11:10:17 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial unix:///var/run/containerd/containerd.sock: timeout"; Reconnecting to {unix:///var/run/containerd/containerd.sock }

And here's my Docker version: Docker version 18.02.0-ce, build fc4de447b5

dashpole commented 6 years ago

cc @Random-Liu it looks like there are some issues with docker 18.02.0. Does that version have the new containerd built in?

Random-Liu commented 6 years ago

@delskev Does docker 18.02.0 have a socket at /var/run/containerd/containerd.sock?

If it doesn't, the error message is expected I think.
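One way to answer the socket question is to probe the path from wherever cAdvisor runs. The following is a minimal Python sketch, not part of cAdvisor or Docker: the helper name `probe_unix_socket` and the 2-second timeout are illustrative assumptions, and it only checks that something accepts connections on the path, not that it is actually containerd.

```python
import os
import socket
import stat


def probe_unix_socket(path, timeout=2.0):
    """Classify a path as 'missing', 'not-a-socket', 'unreachable', or 'ok'.

    Hypothetical helper for illustration; the names and timeout are assumptions.
    """
    try:
        mode = os.stat(path).st_mode
    except (FileNotFoundError, NotADirectoryError):
        return "missing"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # Succeeds only if a process is listening on the socket.
        sock.connect(path)
    except OSError:
        return "unreachable"
    finally:
        sock.close()
    return "ok"


if __name__ == "__main__":
    # Path taken from the error message in this thread.
    print(probe_unix_socket("/var/run/containerd/containerd.sock"))
```

A result of "missing" would match the case described above, where the error message is expected because no socket exists at that path.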

dashpole commented 6 years ago

The containerd error is probably a red herring. See my comment here: https://github.com/google/cadvisor/issues/1910#issuecomment-375394532 @hshahar @delskev are there any other error messages in the log? Would you be able to upload more of your log for me to look at?
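When a log has been pasted as one long line, as earlier in this thread, it can help to split it back into entries and keep only the warnings and errors. The sketch below assumes the glog-style header seen in the log above (severity letter, mmdd, time, thread id, file:line]). The function names are hypothetical. Lines from other loggers, such as the grpc message, do not match the header and stay attached to the preceding entry.

```python
import re

# glog-style header: severity letter, mmdd, time with microseconds, thread id, file:line]
HEADER = re.compile(
    r"([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+) ([\w.\-]+):(\d+)\] "
)


def split_entries(text):
    """Split a flattened log into (severity, source, message) tuples."""
    matches = list(HEADER.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # Each message runs until the next header, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        message = text[m.end():end].strip()
        entries.append((m.group(1), f"{m.group(5)}:{m.group(6)}", message))
    return entries


def warnings_and_errors(text):
    """Keep only entries logged at warning (W), error (E), or fatal (F) severity."""
    return [entry for entry in split_entries(text) if entry[0] in "WEF"]


# Excerpt copied from the log posted in this thread.
sample = (
    "I0304 15:06:47.252178 1 manager.go:1205] Started watching for new ooms in manager "
    "W0304 15:06:47.252197 1 manager.go:340] Could not configure a source for OOM detection, "
    "disabling OOM events: open /dev/kmsg: no such file or directory "
    "I0304 15:06:47.252919 1 manager.go:356] Starting recovery of all containers"
)

for severity, source, message in warnings_and_errors(sample):
    print(severity, source, message)
```

On the excerpt, only the OOM-detection warning from manager.go:340 is reported, which matches the observation that nothing else in the posted log points to a restart.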