google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

Issue launching container with cgroups v2 / AWS Linux 2023 #3468

Closed: benw10-1 closed this 8 months ago

benw10-1 commented 9 months ago

OS: AWS Linux 2023.2.20231113
Docker Version: 24.0.5
cAdvisor Image: cadvisor:latest

Launch cmd:

docker run \
  --volume=/dev/kmsg:/dev/kmsg:ro \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --volume=/sys:/sys:ro \
  -p=9225:9225 \
  --detach=true \
  --name=cadvisor \
  --privileged=true \
  gcr.io/cadvisor/cadvisor \
  --port=9225 --store_container_labels=false --docker_only --disable_metrics=disk,network,tcp,udp,sched,process

docker logs ... output:

goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc0000d0001, 0xc00026a000, 0x68, 0xb8)
    /go/pkg/mod/k8s.io/klog/v2@v2.2.0/klog.go:996 +0xb8
k8s.io/klog/v2.(*loggingT).output(0x2441740, 0xc000000003, 0x0, 0x0, 0xc0001f5570, 0x2375833, 0xb, 0xaf, 0x0)
    /go/pkg/mod/k8s.io/klog/v2@v2.2.0/klog.go:945 +0x19d
k8s.io/klog/v2.(*loggingT).printf(0x2441740, 0x3, 0x0, 0x0, 0x16bcdd4, 0x1e, 0xc000679ea0, 0x1, 0x1)
    /go/pkg/mod/k8s.io/klog/v2@v2.2.0/klog.go:733 +0x17a
k8s.io/klog/v2.Fatalf(...)
    /go/pkg/mod/k8s.io/klog/v2@v2.2.0/klog.go:1456
main.main()
    /go/src/github.com/google/cadvisor/cmd/cadvisor.go:175 +0x3b8

goroutine 20 [syscall]:
os/signal.signal_recv(0x0)
    /usr/lib/go/src/runtime/sigqueue.go:147 +0x9c
os/signal.loop()
    /usr/lib/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
    /usr/lib/go/src/os/signal/signal_unix.go:29 +0x41

goroutine 21 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x2441740)
    /go/pkg/mod/k8s.io/klog/v2@v2.2.0/klog.go:1131 +0x8b
created by k8s.io/klog/v2.init.0
    /go/pkg/mod/k8s.io/klog/v2@v2.2.0/klog.go:416 +0xd6

goroutine 9 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc000130100)
    /go/pkg/mod/go.opencensus.io@v0.22.4/stats/view/worker.go:276 +0x100
created by go.opencensus.io/stats/view.init.0
    /go/pkg/mod/go.opencensus.io@v0.22.4/stats/view/worker.go:34 +0x68
F0204 01:01:37.719214       1 cadvisor.go:175] Failed to create a manager: mountpoint for cpu not found

I'm assuming this is an issue with AWS Linux 2023 using cgroup v2 instead of v1. I'm not completely sure, though; it could be something else specific to AWS Linux 2023, or I could be missing something obvious.

When looking for the cpu folder, I could not find it in the expected spot.

On the previous OS (AWS Linux 2), the cpu folder was here:

ls /sys/fs/cgroup
blkio  cpu  cpuacct  cpu,cpuacct  cpuset  devices  freezer  hugetlb  memory  net_cls  net_cls,net_prio  net_prio  perf_event  pids  systemd

But now on AWS Linux 2023 it looks like this:

ls /sys/fs/cgroup/
cgroup.controllers      cgroup.pressure  cgroup.subtree_control  cpu.stat               dev-hugepages.mount  io.cost.model  io.stat           memory.reclaim  sys-fs-fuse-connections.mount  sys-kernel-tracing.mount
cgroup.max.depth        cgroup.procs     cgroup.threads          cpuset.cpus.effective  dev-mqueue.mount     io.cost.qos    memory.numa_stat  memory.stat     sys-kernel-config.mount        system.slice
cgroup.max.descendants  cgroup.stat      cpu.pressure            cpuset.mems.effective  init.scope           io.pressure    memory.pressure   misc.capacity   sys-kernel-debug.mount         user.slice

It seems like my volume mapping is just wrong, but do I now have to map all of these files individually or something?
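
For context on the layout change: on cgroup v2 the controllers are no longer mounted as separate cpu/cpuacct/memory/... directories; they all live in one unified hierarchy. As a rough sanity check on the host (a generic check, nothing cAdvisor-specific), the available controllers can be read from the root of that hierarchy:

# lists controllers available on the unified (v2) hierarchy; the exact set varies by kernel/config
cat /sys/fs/cgroup/cgroup.controllers

If cpu appears in that list, the controller is available even though there is no /sys/fs/cgroup/cpu directory.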

(sanity check):

mount | grep 'cgroup'
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel)
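
Another common way to confirm the same thing (again a generic check, not specific to this setup) is the filesystem type of /sys/fs/cgroup:

# "cgroup2fs" means cgroup v2 (unified hierarchy); "tmpfs" means the legacy v1 layout
stat -fc %T /sys/fs/cgroup/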

Note that downgrading to cgroup v1 is not really an option.

jjgmckenzie commented 8 months ago

@benw10-1 I had this exact issue and discovered that I was using gcr.io/cadvisor/cadvisor:latest - which is not actually the latest version; it's from 2020. What worked for me was using gcr.io/cadvisor/cadvisor:v0.47.2 instead; now everything works fine.
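
For anyone else hitting this, pinning the tag is the whole fix. As a sketch, here is the launch command from the original post with only the image reference changed (all other flags left exactly as they were):

docker run \
  --volume=/dev/kmsg:/dev/kmsg:ro \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --volume=/sys:/sys:ro \
  -p=9225:9225 \
  --detach=true \
  --name=cadvisor \
  --privileged=true \
  gcr.io/cadvisor/cadvisor:v0.47.2 \
  --port=9225 --store_container_labels=false --docker_only --disable_metrics=disk,network,tcp,udp,sched,process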