google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache #2341

Closed. iglov closed this issue 1 year ago.

iglov commented 4 years ago

I have cAdvisor running as a systemd unit on a CentOS 7 server, and in the logs I see this warning every minute:

cadvisor[103112]: W1120 14:47:56.678182 103112 container.go:422] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache

What is this and how do I fix it?
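A minimal systemd unit along these lines might look like the following sketch (unit layout and binary path are assumptions, not taken from the report; the flags are the ones listed just below):

    [Unit]
    Description=cAdvisor
    After=network-online.target docker.service

    [Service]
    ExecStart=/usr/local/bin/cadvisor \
        --docker=unix:///var/run/docker.sock \
        --listen_ip=192.168.49.177 \
        --port=4194 \
        --disable_root_cgroup_stats=true \
        --docker_only=true \
        --logtostderr=true
    Restart=always

    [Install]
    WantedBy=multi-user.target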

OS: CentOS Linux release 7.7.1908 (Core)
cAdvisor version: v0.34.0 (24a6a52f)
Run with flags: --docker=unix:///var/run/docker.sock --listen_ip=192.168.49.177 --port=4194 --disable_root_cgroup_stats=true --docker_only=true --logtostderr=true

Output of cadvisor validate:

cAdvisor version: v0.34.0

OS version: CentOS Linux 7 (Core)

Kernel version: [Supported and recommended]
    Kernel version is 3.10.0-957.27.2.el7.x86_64. Versions >= 2.6 are supported. 3.0+ are recommended.

Cgroup setup: [Supported and recommended]
    Available cgroups: map[blkio:1 cpu:1 cpuacct:1 cpuset:1 devices:1 freezer:1 hugetlb:1 memory:1 net_cls:1 net_prio:1 perf_event:1 pids:1]
    Following cgroups are required: [cpu cpuacct]
    Following other cgroups are recommended: [memory blkio cpuset devices freezer]
    Hierarchical memory accounting enabled. Reported memory usage includes memory used by child containers.
    Cpu cfs bandwidth is enabled.

Cgroup mount setup: [Supported and recommended]
    Cgroups are mounted at /sys/fs/cgroup.
    Cgroup mount directories: blkio cpu cpu,cpuacct cpuacct cpuset devices freezer hugetlb memory net_cls net_cls,net_prio net_prio perf_event pids systemd 
    Any cgroup mount point that is detectible and accessible is supported. /sys/fs/cgroup is recommended as a standard location.
    Cgroup mounts:
    cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
    cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
    cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
    cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
    cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
    cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_prio,net_cls 0 0
    cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
    cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
    cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
    cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
    cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpuacct,cpu 0 0

Docker version: [Supported and recommended]
    Docker version is 19.03.2. Versions >= 1.0 are supported. 1.2+ are recommended.

Docker driver setup: [Supported and recommended]
    Storage driver is overlay2.

Block device setup: [Supported, but not recommended]
    None of the devices support 'cfq' I/O scheduler. No disk stats can be reported.
     Disk "dm-0" Scheduler type "none".
     Disk "dm-14" Scheduler type "none".
     Disk "sdf" Scheduler type "deadline".
     Disk "dm-11" Scheduler type "none".
     Disk "dm-2" Scheduler type "deadline".
     Disk "sdc" Scheduler type "deadline".
     Disk "dm-8" Scheduler type "none".
     Disk "dm-1" Scheduler type "none".
     Disk "dm-13" Scheduler type "none".
     Disk "sdd" Scheduler type "deadline".
     Disk "dm-12" Scheduler type "none".
     Disk "dm-15" Scheduler type "none".
     Disk "dm-3" Scheduler type "deadline".
     Disk "dm-5" Scheduler type "deadline".
     Disk "dm-6" Scheduler type "deadline".
     Disk "sdg" Scheduler type "deadline".
     Disk "sdk" Scheduler type "deadline".
     Disk "dm-10" Scheduler type "none".
     Disk "dm-4" Scheduler type "deadline".
     Disk "dm-7" Scheduler type "none".
     Disk "sde" Scheduler type "deadline".
     Disk "sdh" Scheduler type "deadline".
     Disk "sdi" Scheduler type "deadline".
     Disk "sdj" Scheduler type "deadline".
     Disk "dm-9" Scheduler type "none".
     Disk "sda" Scheduler type "deadline".
     Disk "sdb" Scheduler type "deadline".

Inotify watches: 

Managed containers: 
    /docker/885218b4482590eacd6b833b73ace81ea47bec55d289f938bed9212598a395aa
        Namespace: docker
        Aliases:
            kibana
            885218b4482590eacd6b833b73ace81ea47bec55d289f938bed9212598a395aa
    /docker/8d0e2be90b560d0f010caf54f19beb2bf351cbfc72ad167085978017c2cfd274
        Namespace: docker
        Aliases:
            logger
            8d0e2be90b560d0f010caf54f19beb2bf351cbfc72ad167085978017c2cfd274
    /
    /docker/8c81f9da5fd00038183e70f624784a7f00fa394b8e57db8da049e78ccc7dd9b3
        Namespace: docker
        Aliases:
            grafana
            8c81f9da5fd00038183e70f624784a7f00fa394b8e57db8da049e78ccc7dd9b3
    /docker/21470eaa8aba17420810a41a0c132e353ae9dd8284b1ee308aa8dd61e8618eff
        Namespace: docker
        Aliases:
            kibana_nginx
            21470eaa8aba17420810a41a0c132e353ae9dd8284b1ee308aa8dd61e8618eff
    /docker/7641d657dde2d2cb742e02235e771563a8cbd90323c8a9138e250dba3b7231d0
        Namespace: docker
        Aliases:
            prometheus
            7641d657dde2d2cb742e02235e771563a8cbd90323c8a9138e250dba3b7231d0
    /docker/85d6202f073cd8b3c11b8876e12aff5707874b7bee6a526c3967e46103fcf991
        Namespace: docker
        Aliases:
            elasticsearch
            85d6202f073cd8b3c11b8876e12aff5707874b7bee6a526c3967e46103fcf991
    /docker/876c03fb744659d908f7438cdef53014e8d5cb8b910e71a83e99def44fbd7463
        Namespace: docker
        Aliases:
            registry_ui
            876c03fb744659d908f7438cdef53014e8d5cb8b910e71a83e99def44fbd7463
    /docker/b741731238af219329e4e87a9e8f82f91f46795de1f64d762db085c29a6c6dbc
        Namespace: docker
        Aliases:
            resolver
            b741731238af219329e4e87a9e8f82f91f46795de1f64d762db085c29a6c6dbc
dashpole commented 4 years ago

Hmmm... This might be a bug with --disable_root_cgroup_stats. The message seems to indicate that it is looking for stats for the root ("/") cgroup. You can probably get rid of the error message by not adding that parameter.

I'll triage this and look into it when I can find time. Or, if you are interested, you can try to dig into it and submit a fix. The error is coming from nextHousekeepingInterval(), which is called as part of housekeeping().
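As a very rough, simplified sketch of that pattern (stand-in types, not the actual cAdvisor source): with dynamic housekeeping, the loop asks the in-memory cache for a container's most recent samples to decide the next interval, and for "/" the cache presumably stays empty when --disable_root_cgroup_stats is set, so the lookup fails on every pass:

    package main

    import (
        "errors"
        "fmt"
        "time"
    )

    // containerStats and memoryCache are simplified stand-ins for cAdvisor's types.
    type containerStats struct {
        Timestamp time.Time
    }

    type memoryCache struct {
        samples map[string][]containerStats
    }

    // RecentStats returns up to n of the newest cached samples for a container,
    // or an error when nothing has been stored for it yet.
    func (c *memoryCache) RecentStats(name string, n int) ([]containerStats, error) {
        s := c.samples[name]
        if len(s) == 0 {
            return nil, errors.New("unable to find data in memory cache")
        }
        if len(s) > n {
            s = s[len(s)-n:]
        }
        return s, nil
    }

    // nextHousekeepingInterval mimics the dynamic-housekeeping decision: inspect
    // the two newest cached samples to decide whether the interval can be widened.
    func nextHousekeepingInterval(cache *memoryCache, name string, current time.Duration) time.Duration {
        stats, err := cache.RecentStats(name, 2)
        if err != nil {
            // This is the branch that floods the log: with root cgroup stats
            // disabled, "/" never gets any samples, so every pass ends up here.
            fmt.Printf("W Failed to get RecentStats(%q) while determining the next housekeeping: %v\n", name, err)
            return current
        }
        if len(stats) == 2 {
            // A real implementation compares the two samples here and widens the
            // interval (up to a maximum) when nothing has changed between them.
        }
        return current
    }

    func main() {
        cache := &memoryCache{samples: map[string][]containerStats{}} // "/" has no data
        fmt.Println("next housekeeping in", nextHousekeepingInterval(cache, "/", 10*time.Second))
    }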

Other than the log message, what symptoms are you seeing? Are any metrics you expect missing?

iglov commented 4 years ago

Hey @dashpole! Nah, I don't want to drop --disable_root_cgroup_stats, because I don't want to get a huge pile of cgroup metrics :) I don't know exactly how this affects my monitoring; I was hoping somebody here could tell me. I just see this error in the logs and it worries me :) P.S. Unfortunately I'm not a programmer and have no idea how to fix it :(

nobody4t commented 4 years ago

@dashpole Is there any progress on this? We're hitting the same issue.

dashpole commented 4 years ago

@dongwangdw are there any symptoms other than the log message?

sbueringer commented 4 years ago

@dashpole We've also been using the flag for 1-2 weeks, and apart from the log message we see no symptoms.

dashpole commented 4 years ago

OK, I'm 90% sure we just need to lower the log verbosity when running with --disable_root_cgroup_stats.
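A hypothetical illustration of what that could look like, assuming the klog logger that produces the W/I-prefixed lines above (not an actual patch):

    package main

    import (
        "errors"
        "flag"

        "k8s.io/klog/v2"
    )

    func main() {
        klog.InitFlags(nil)
        flag.Parse()

        // Stand-in for the RecentStats failure seen in the logs.
        err := errors.New("unable to find data in memory cache")

        // Before: emitted as a warning on every housekeeping pass.
        // klog.Warningf("Failed to get RecentStats(%q) while determining the next housekeeping: %v", "/", err)

        // After: only visible when running with a higher -v level, since the
        // condition is expected whenever --disable_root_cgroup_stats is set.
        klog.V(4).Infof("Failed to get RecentStats(%q) while determining the next housekeeping: %v", "/", err)
        klog.Flush()
    }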

nobody4t commented 4 years ago

@dashpole The error messages are below.

E0216 04:10:40.342721 8215 cadvisor_stats_provider.go:440] Partial failure issuing cadvisor.ContainerInfoV2: partial failures: ["/libcontainer_61055_systemd_test_default.slice": RecentStats: unable to find data in memory cache]

hyperkube: E0217 00:04:33.996461 8215 helpers.go:137] readString: Failed to read "/sys/fs/cgroup/memory/libcontainer_108824_systemd_test_default.slice/memory.limit_in_bytes": read /sys/fs/cgroup/memory/libcontainer_108824_systemd_test_default.slice/memory.limit_in_bytes: no such device

dashpole commented 4 years ago

@dongwangdw that looks unrelated to this issue. Feel free to open a new issue if you would like.

trallnag commented 3 years ago

The following shows up in the logs when I activate --disable_root_cgroup_stats=true:

W0929 12:48:23.346172 1 container.go:448] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache
W0929 12:48:29.519371 1 prometheus.go:1789] Couldn't get containers: partial failures: ["/": containerDataToContainerInfo: unable to find data in memory cache]

In addition, most or many metrics disappear. Is this related? These are the flags I'm running with:

"--allow_dynamic_housekeeping=true",
"--global_housekeeping_interval=1m0s",
"--housekeeping_interval=10s",
# Currently buggy. https://github.com/google/cadvisor/issues/2602.
# "--disable_root_cgroup_stats=true", 
"--raw_cgroup_prefix_whitelist=/ecs",
"--docker_only=true",
"--store_container_labels=false",
join("", [
  "--whitelisted_container_labels='",
  "com.amazonaws.ecs.container-name,",
  "com.amazonaws.ecs.task-definition-family,",
  "promstack.namespace,",
  "promstack.alias,",
  "promstack.api_type,",
  "'"
]),
"--disable_metrics=tcp,advtcp,udp,sched,hugetlb,disk,diskIO,accelerator,resctrl",
sbueringer commented 3 years ago

@trallnag I'm not sure what you changed but maybe our experience can help. We upgraded from cadvisor 0.36 to 0.37 and all container_ metrics disappeared. We use cadvisor in Kubernetes with containerd. Removing the --disable_root_cgroup_stats option solved our problem and we got container_ metrics again.

itsx commented 3 years ago

Hey @dashpole @iglov! We ran into the exact same issue when we tried to upgrade from v0.33.0 to v0.36.0. The docker logs command shows the following warnings at roughly one-minute intervals:

W1030 19:58:24.826509       1 container.go:425] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache
W1030 19:59:26.022628       1 container.go:425] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache
W1030 20:00:27.176149       1 container.go:425] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache
W1030 20:01:27.980109       1 container.go:425] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache
W1030 20:02:29.232640       1 container.go:425] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache
W1030 20:03:29.366618       1 container.go:425] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache
W1030 20:04:30.211497       1 container.go:425] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache
W1030 20:05:30.825290       1 container.go:425] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache
W1030 20:06:31.976701       1 container.go:425] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache
W1030 20:07:33.350365       1 container.go:425] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache
W1030 20:08:33.915653       1 container.go:425] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache
W1030 20:09:34.812295       1 container.go:425] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache
W1030 20:10:35.357128       1 container.go:425] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache
W1030 20:11:35.930555       1 container.go:425] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache
W1030 20:12:37.640540       1 container.go:425] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache

Also, the cAdvisor UI is broken. The browser shows just this message: failed to get container "/" with error: unable to find data in memory cache

However, the exported metrics and the Prometheus /metrics endpoint are OK.

Our setup:

cAdvisor: sudo docker run --name cadvisor_test -d --restart=always --volume=/:/rootfs:ro --volume=/var/run:/var/run:ro --volume=/sys:/sys:ro --volume=/var/lib/docker/:/var/lib/docker:ro --volume=/dev/disk/:/dev/disk:ro --publish=8082:8080 gcr.io/cadvisor/cadvisor:v0.36.0 --docker_only=true --store_container_labels=false --disable_root_cgroup_stats=true --v=0

Notes:

OS:
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 18.04.2 LTS
    Release:        18.04
    Codename:       bionic

Docker:

$ sudo docker version

Client: Docker Engine - Community
 Version:           19.03.13
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        4484c46d9d
 Built:             Wed Sep 16 17:02:36 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.13
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       4484c46d9d
  Built:            Wed Sep 16 17:01:06 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.7
  GitCommit:        8fba4e9a7d01810a393d5d25a3621dc101981175
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

$ sudo docker info

Client:
 Debug Mode: false

Server:
 Containers: 23
  Running: 23
  Paused: 0
  Stopped: 0
 Images: 110
 Server Version: 19.03.13
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-118-generic
 Operating System: Ubuntu 18.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.66GiB
 Name: XXX
 ID: I3VV:H32G:ZJHD:NOV5:A424:VXV...
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
itsx commented 3 years ago

I ran cAdvisor again with the --v=99 flag (now correctly bumping the verbosity up :)). The whole output is too long to post here, so I chose a snippet around the W1030 warnings. There are some error messages which might help...

docker logs output:


...
[{Size:26214400 Type:Unified Level:3}]}] CloudProvider:Unknown InstanceType:Unknown InstanceID:None}
I1030 20:54:39.288789       1 manager.go:199] Version: {KernelVersion:4.15.0-118-generic ContainerOsVersion:Alpine Linux v3.10 DockerVersion:19.03.13 DockerAPIVersion:1.40 CadvisorVersion:v0.36.0 CadvisorRevision:4fe450a2}
I1030 20:54:39.291301       1 factory.go:123] Registration of the mesos container factory failed: unable to create mesos agent client: failed to get version
I1030 20:54:39.291331       1 factory.go:54] Registering systemd factory
I1030 20:54:39.294215       1 factory.go:137] Registering containerd factory
I1030 20:54:39.294424       1 factory.go:123] Registration of the crio container factory failed: Get http://%2Fvar%2Frun%2Fcrio%2Fcrio.sock/info: dial unix /var/run/crio/crio.sock: connect: no such file or directory
I1030 20:54:39.324261       1 factory.go:369] Registering Docker factory
I1030 20:54:39.324763       1 factory.go:101] Registering Raw factory
I1030 20:54:39.325248       1 manager.go:1158] Started watching for new ooms in manager
I1030 20:54:39.330242       1 nvidia.go:53] No NVIDIA devices found.
I1030 20:54:39.330329       1 factory.go:167] Error trying to work out if we can handle /: / not handled by systemd handler
I1030 20:54:39.330345       1 factory.go:178] Factory "systemd" was unable to handle container "/"
I1030 20:54:39.330387       1 factory.go:178] Factory "containerd" was unable to handle container "/"
I1030 20:54:39.330400       1 factory.go:178] Factory "docker" was unable to handle container "/"
I1030 20:54:39.330419       1 factory.go:174] Using factory "raw" for container "/"
I1030 20:54:39.331484       1 manager.go:950] Added container: "/" (aliases: [], namespace: "")
I1030 20:54:39.332237       1 handler.go:325] Added event &{/ 2020-10-30 15:34:44.043969861 +0000 UTC containerCreation {<nil>}}
I1030 20:54:39.332367       1 manager.go:272] Starting recovery of all containers
I1030 20:54:39.332722       1 container.go:467] Start housekeeping for container "/"
W1030 20:54:39.332981       1 container.go:425] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache
I1030 20:54:39.362562       1 factory.go:167] Error trying to work out if we can handle /docker/b05469a97e2c679d51acc3966437695144f642497dab1cef910a0064f1c39333: /docker/b05469a97e2c679d51acc3966437695144f642497dab1cef910a0064f1c39333 not handled by systemd handler
I1030 20:54:39.362597       1 factory.go:178] Factory "systemd" was unable to handle container "/docker/b05469a97e2c679d51acc3966437695144f642497dab1cef910a0064f1c39333"
I1030 20:54:39.363544       1 factory.go:167] Error trying to work out if we can handle /docker/b05469a97e2c679d51acc3966437695144f642497dab1cef910a0064f1c39333: failed to load container: container "b05469a97e2c679d51acc3966437695144f642497dab1cef910a0064f1c39333" in namespace "k8s.io": not found
I1030 20:54:39.363574       1 factory.go:178] Factory "containerd" was unable to handle container "/docker/b05469a97e2c679d51acc3966437695144f642497dab1cef910a0064f1c39333"
I1030 20:54:39.366333       1 factory.go:174] Using factory "docker" for container "/docker/b05469a97e2c679d51acc3966437695144f642497dab1cef910a0064f1c39333"
I1030 20:54:39.368321       1 manager.go:950] Added container: "/docker/b05469a97e2c679d51acc3966437695144f642497dab1cef910a0064f1c39333" (aliases: [qxxx b05469a97e2c679d51acc3966437695144f642497dab1cef910a0064f1c39333], namespace: "docker")
I1030 20:54:39.369107       1 handler.go:325] Added event &{/docker/b05469a97e2c679d51acc3966437695144f642497dab1cef910a0064f1c39333 2020-05-26 13:59:15.572317419 +0000 UTC containerCreation {<nil>}}
I1030 20:54:39.369439       1 factory.go:167] Error trying to work out if we can handle /system.slice/snapd.seeded.service: /system.slice/snapd.seeded.service not handled by systemd handler
I1030 20:54:39.369748       1 factory.go:178] Factory "systemd" was unable to handle container "/system.slice/snapd.seeded.service"
I1030 20:54:39.370035       1 factory.go:178] Factory "containerd" was unable to handle container "/system.slice/snapd.seeded.service"
I1030 20:54:39.370483       1 factory.go:178] Factory "docker" was unable to handle container "/system.slice/snapd.seeded.service"
I1030 20:54:39.369507       1 container.go:467] Start housekeeping for container "/docker/b05469a97e2c679d51acc3966437695144f642497dab1cef910a0064f1c39333"
I1030 20:54:39.370731       1 factory.go:171] Factory "raw" can handle container "/system.slice/snapd.seeded.service", but ignoring.
I1030 20:54:39.371142       1 manager.go:908] ignoring container "/system.slice/snapd.seeded.service"
I1030 20:54:39.371178       1 factory.go:167] Error trying to work out if we can handle /system.slice/cloud-init.service: /system.slice/cloud-init.service not handled by systemd handler
I1030 20:54:39.371198       1 factory.go:178] Factory "systemd" was unable to handle container "/system.slice/cloud-init.service"
I1030 20:54:39.371218       1 factory.go:178] Factory "containerd" was unable to handle container "/system.slice/cloud-init.service"
I1030 20:54:39.371242       1 factory.go:178] Factory "docker" was unable to handle container "/system.slice/cloud-init.service"
I1030 20:54:39.371263       1 factory.go:171] Factory "raw" can handle container "/system.slice/cloud-init.service", but ignoring.
I1030 20:54:39.371290       1 manager.go:908] ignoring container "/system.slice/cloud-init.service"
I1030 20:54:39.371317       1 factory.go:167] Error trying to work out if we can handle /docker/317ac8a9aeba3a7df7c2933becf93da3801765c958df1b80e4b4b8ecebb5d947: /docker/317ac8a9aeba3a7df7c2933becf93da3801765c958df1b80e4b4b8ecebb5d947 not handled by systemd handler
I1030 20:54:39.371340       1 factory.go:178] Factory "systemd" was unable to handle container "/docker/317ac8a9aeba3a7df7c2933becf93da3801765c958df1b80e4b4b8ecebb5d947"
I1030 20:54:39.372072       1 factory.go:167] Error trying to work out if we can handle /docker/317ac8a9aeba3a7df7c2933becf93da3801765c958df1b80e4b4b8ecebb5d947: failed to load container: container "317ac8a9aeba3a7df7c2933becf93da3801765c958df1b80e4b4b8ecebb5d947" in namespace "k8s.io": not found
I1030 20:54:39.372098       1 factory.go:178] Factory "containerd" was unable to handle container "/docker/317ac8a9aeba3a7df7c2933becf93da3801765c958df1b80e4b4b8ecebb5d947"
I1030 20:54:39.374996       1 factory.go:174] Using factory "docker" for container "/docker/317ac8a9aeba3a7df7c2933becf93da3801765c958df1b80e4b4b8ecebb5d947"
I1030 20:54:39.377106       1 manager.go:950] Added container: "/docker/317ac8a9aeba3a7df7c2933becf93da3801765c958df1b80e4b4b8ecebb5d947" (aliases: [gxxx 317ac8a9aeba3a7df7c2933becf93da3801765c958df1b80e4b4b8ecebb5d947], namespace: "docker")
I1030 20:54:39.377962       1 handler.go:325] Added event &{/docker/317ac8a9aeba3a7df7c2933becf93da3801765c958df1b80e4b4b8ecebb5d947 2020-05-06 07:43:01.701829768 +0000 UTC containerCreation {<nil>}}
I1030 20:54:39.378270       1 factory.go:167] Error trying to work out if we can handle /system.slice/systemd-networkd.service: /system.slice/systemd-networkd.service not handled by systemd handler
I1030 20:54:39.378402       1 factory.go:178] Factory "systemd" was unable to handle container "/system.slice/systemd-networkd.service"
I1030 20:54:39.378536       1 factory.go:178] Factory "containerd" was unable to handle container "/system.slice/systemd-networkd.service"
I1030 20:54:39.378674       1 factory.go:178] Factory "docker" was unable to handle container "/system.slice/systemd-networkd.service"
I1030 20:54:39.378807       1 factory.go:171] Factory "raw" can handle container "/system.slice/systemd-networkd.service", but ignoring.
I1030 20:54:39.378950       1 manager.go:908] ignoring container "/system.slice/systemd-networkd.service"
I1030 20:54:39.379078       1 factory.go:171] Factory "systemd" can handle container "/system.slice/sys-fs-fuse-connections.mount", but ignoring.
I1030 20:54:39.379315       1 manager.go:908] ignoring container "/system.slice/sys-fs-fuse-connections.mount"
I1030 20:54:39.379474       1 factory.go:171] Factory "systemd" can handle container "/system.slice/dev-mqueue.mount", but ignoring.
I1030 20:54:39.378358       1 container.go:467] Start housekeeping for container "/docker/317ac8a9aeba3a7df7c2933becf93da3801765c958df1b80e4b4b8ecebb5d947"
I1030 20:54:39.379678       1 manager.go:908] ignoring container "/system.slice/dev-mqueue.mount"
I1030 20:54:39.380064       1 factory.go:167] Error trying to work out if we can handle /system.slice/grub-common.service: /system.slice/grub-common.service not handled by systemd handler
I1030 20:54:39.380094       1 factory.go:178] Factory "systemd" was unable to handle container "/system.slice/grub-common.service"
I1030 20:54:39.380116       1 factory.go:178] Factory "containerd" was unable to handle container "/system.slice/grub-common.service"
I1030 20:54:39.380136       1 factory.go:178] Factory "docker" was unable to handle container "/system.slice/grub-common.service"
I1030 20:54:39.380159       1 factory.go:171] Factory "raw" can handle container "/system.slice/grub-common.service", but ignoring.
I1030 20:54:39.380188       1 manager.go:908] ignoring container "/system.slice/grub-common.service"
I1030 20:54:39.380212       1 factory.go:167] Error trying to work out if we can handle /system.slice/snapd.socket: /system.slice/snapd.socket not handled by systemd handler
I1030 20:54:39.380232       1 factory.go:178] Factory "systemd" was unable to handle container "/system.slice/snapd.socket"
I1030 20:54:39.380250       1 factory.go:178] Factory "containerd" was unable to handle container "/system.slice/snapd.socket"
I1030 20:54:39.380266       1 factory.go:178] Factory "docker" was unable to handle container "/system.slice/snapd.socket"
I1030 20:54:39.380286       1 factory.go:171] Factory "raw" can handle container "/system.slice/snapd.socket", but ignoring.
I1030 20:54:39.380311       1 manager.go:908] ignoring container "/system.slice/snapd.socket"
I1030 20:54:39.380345       1 factory.go:167] Error trying to work out if we can handle /docker/71cc527a91e9c97649c8ad906b7af1b6bc9e1ed5f668c2855396f2e66fe71313: /docker/71cc527a91e9c97649c8ad906b7af1b6bc9e1ed5f668c2855396f2e66fe71313 not handled by systemd handler
I1030 20:54:39.380375       1 factory.go:178] Factory "systemd" was unable to handle container "/docker/71cc527a91e9c97649c8ad906b7af1b6bc9e1ed5f668c2855396f2e66fe71313"
I1030 20:54:39.380974       1 factory.go:167] Error trying to work out if we can handle /docker/71cc527a91e9c97649c8ad906b7af1b6bc9e1ed5f668c2855396f2e66fe7131
...
caoshitong369 commented 3 years ago

When I set --disable_root_cgroup_stats=true, I get: container.go:448] Failed to get RecentStats("/") while determining the next housekeeping: unable to find data in memory cache

caoshitong369 commented 3 years ago

(screenshots attached, not reproduced here)

m1keil commented 3 years ago

Docker version 20.10.2, build 2291f61
cAdvisor version v0.37.0 (65fa5b44)
Ubuntu 20.04.1 LTS

Having --disable_root_cgroup_stats=true results in no container_* metrics and similar errors in the logs as described above. We found this out while upgrading from 16.04 LTS, which also included an upgrade of cAdvisor from 0.36 to 0.37 and Docker from 19.03 to 20.10.

I've tried downgrading cAdvisor, without luck.

null-test-7 commented 3 years ago

I tried restarting docker and the kubelet, but it didn't work. After I rebooted the node, cAdvisor went back to normal.