google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

Error while processing event ("/sys/fs/cgroup/cpu,cpuacct/system.slice/unbound-anchor.service") #2533

Open isabelnoronha61 opened 4 years ago

isabelnoronha61 commented 4 years ago

docker-compose.yml:

version: '3.7'
services:
  cadvisor:
    image: google/cadvisor
    volumes:

CPU usage goes beyond 100%.

dashpole commented 4 years ago

Interesting... The "no such file or directory" errors usually indicate that the cgroup was created and then immediately removed before cAdvisor was able to process the event. Generally, they are safe to ignore.

dashpole commented 4 years ago

You should probably also specify the cAdvisor version explicitly. We recently had to stop pushing images with the "latest" tag, as we implemented an immutable image policy. Do you know what version is actually running?

isabelnoronha61 commented 4 years ago

You should probably also specify the cAdvisor version explicitly. We recently had to stop pushing images with the "latest" tag, as we implemented an immutable image policy. Do you know what version is actually running?

Today I pulled the latest image, gcr.io/google_containers/cadvisor:v0.36.0. On my monitoring host this is the relevant snippet of the docker-compose file:

cadvisor:
  image: gcr.io/google_containers/cadvisor:v0.36.0
  volumes:
    - /:/rootfs:ro
    - /var/run:/var/run:rw
    - /sys:/sys:ro
    - /var/lib/docker/:/var/lib/docker:ro
    - /cgroup/cpu:/cgroup/cpu
    - /cgroup/cpuacct:/cgroup/cpuacct
    - /cgroup/cpuset:/cgroup/cpuset
    - /cgroup/memory:/cgroup/memory
    - /cgroup/blkio:/cgroup/blkio
    - /cgroup:/sys/fs/cgroup:ro
    - /cgroup:/cgroup:ro
  privileged: true
  ports:
    - 8080:8080
  command:
    - --allow_dynamic_housekeeping=true
    - --housekeeping_interval=30s
    - --global_housekeeping_interval=2m
    - --disable_metrics=disk,tcp,udp
    - --docker_only=true

cadvisor logs:

I0514 05:57:31.939039 1 manager.go:1148] Exiting thread watching subcontainers
I0514 05:57:31.939072 1 manager.go:365] Exiting global housekeeping thread
I0514 05:57:31.939092 1 cadvisor.go:231] Exiting given signal: terminated
I0514 07:22:52.001142 1 manager.go:1148] Exiting thread watching subcontainers
I0514 07:22:52.001212 1 manager.go:365] Exiting global housekeeping thread
I0514 07:22:52.001281 1 cadvisor.go:231] Exiting given signal: terminated

However, the same docker-compose file running on a target which contains around 2K containers gives following log. F0514 07:48:34.137076 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpu,cpuacct: no space left on device F0514 07:48:48.975485 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpuset: no space left on device F0514 07:49:06.327658 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpu,cpuacct: no space left on device F0514 07:49:35.141404 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/devices: no space left on device F0514 07:50:45.864350 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/pids: no space left on device F0514 07:51:00.594952 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/memory: no space left on device F0514 07:51:22.240519 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/devices: no space left on device F0514 07:51:37.171702 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/pids: no space left on device F0514 07:51:52.068816 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpu,cpuacct: no space left on device F0514 07:52:17.045079 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpu,cpuacct: no space left on device F0514 07:52:32.120150 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/devices: no space left on device F0514 07:52:47.124536 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/pids: no space left on device F0514 07:53:05.982415 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/devices: no space left on device F0514 07:53:20.969768 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpu,cpuacct: no space left on device F0514 07:53:35.691449 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/blkio: no space left on device F0514 07:53:50.408721 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpu,cpuacct: no space left on device F0514 07:54:15.782048 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpu,cpuacct: no space left on device F0514 07:54:30.962349 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/memory: no space left on device F0514 07:54:45.685793 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/devices: no space left on device F0514 07:55:00.551526 1 cadvisor.go:188] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpu,cpuacct: no space left on device W0514 07:56:01.662330 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/memory/user.slice/user-0.slice/session-7648.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/user.slice/user-0.slice/session-7648.scope: no such file or directory W0514 07:56:01.662576 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/devices/user.slice/user-0.slice/session-7648.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/user.slice/user-0.slice/session-7648.scope: no such file or directory W0514 
07:56:01.710082 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/pids/user.slice/user-0.slice/session-7648.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/pids/user.slice/user-0.slice/session-7648.scope: no such file or directory 2020/05/14 07:56:22 http: superfluous response.WriteHeader call from github.com/prometheus/client_golang/prometheus/promhttp.httpError (http.go:306) I0514 07:56:32.876908 1 manager.go:1148] Exiting thread watching subcontainers I0514 07:56:32.876978 1 manager.go:365] Exiting global housekeeping thread I0514 07:56:32.877026 1 cadvisor.go:231] Exiting given signal: terminated W0514 07:57:09.568674 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/blkio/user.slice/user-0.slice/session-7649.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/user.slice/user-0.slice/session-7649.scope: no such file or directory W0514 07:57:09.568885 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/memory/user.slice/user-0.slice/session-7649.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/user.slice/user-0.slice/session-7649.scope: no such file or directory W0514 07:57:09.568959 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/devices/user.slice/user-0.slice/session-7649.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/user.slice/user-0.slice/session-7649.scope: no such file or directory W0514 07:57:09.569006 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/pids/user.slice/user-0.slice/session-7649.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/pids/user.slice/user-0.slice/session-7649.scope: no such file or directory W0514 07:58:04.142329 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/memory/user.slice/user-0.slice/session-7650.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/user.slice/user-0.slice/session-7650.scope: no such file or directory W0514 07:58:04.223811 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/devices/user.slice/user-0.slice/session-7650.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/user.slice/user-0.slice/session-7650.scope: no such file or directory W0514 07:58:04.255806 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/pids/user.slice/user-0.slice/session-7650.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/pids/user.slice/user-0.slice/session-7650.scope: no such file or directory W0514 08:00:02.283887 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/cpu,cpuacct/system.slice/sysstat-collect.service": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/cpu,cpuacct/system.slice/sysstat-collect.service: no such file or directory W0514 08:00:02.284123 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/blkio/system.slice/sysstat-collect.service": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/system.slice/sysstat-collect.service: no such file or directory W0514 08:00:02.284217 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/memory/system.slice/sysstat-collect.service": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/system.slice/sysstat-collect.service: no such file or directory W0514 08:00:02.284277 1 watcher.go:87] Error while processing event 
("/sys/fs/cgroup/devices/system.slice/sysstat-collect.service": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/system.slice/sysstat-collect.service: no such file or directory W0514 08:00:02.284342 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/pids/system.slice/sysstat-collect.service": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/pids/system.slice/sysstat-collect.service: no such file or directory W0514 08:00:02.285099 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/cpu,cpuacct/user.slice/user-0.slice/session-7652.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/cpu,cpuacct/user.slice/user-0.slice/session-7652.scope: no such file or directory W0514 08:00:02.285342 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/blkio/user.slice/user-0.slice/session-7652.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/user.slice/user-0.slice/session-7652.scope: no such file or directory W0514 08:00:02.285560 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/memory/user.slice/user-0.slice/session-7652.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/user.slice/user-0.slice/session-7652.scope: no such file or directory W0514 08:00:02.285812 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/devices/user.slice/user-0.slice/session-7652.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/user.slice/user-0.slice/session-7652.scope: no such file or directory W0514 08:00:02.285943 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/pids/user.slice/user-0.slice/session-7652.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/pids/user.slice/user-0.slice/session-7652.scope: no such file or directory 2020/05/14 08:00:28 http: superfluous response.WriteHeader call from github.com/prometheus/client_golang/prometheus/promhttp.httpError (http.go:306) W0514 08:01:02.167895 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/cpu,cpuacct/user.slice/user-0.slice/session-7653.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/cpu,cpuacct/user.slice/user-0.slice/session-7653.scope: no such file or directory W0514 08:01:03.218814 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/blkio/user.slice/user-0.slice/session-7653.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/user.slice/user-0.slice/session-7653.scope: no such file or directory W0514 08:01:03.218994 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/memory/user.slice/user-0.slice/session-7653.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/user.slice/user-0.slice/session-7653.scope: no such file or directory W0514 08:01:03.219060 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/devices/user.slice/user-0.slice/session-7653.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/user.slice/user-0.slice/session-7653.scope: no such file or directory W0514 08:01:03.219126 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/pids/user.slice/user-0.slice/session-7653.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/pids/user.slice/user-0.slice/session-7653.scope: no such file or directory W0514 08:02:02.624902 1 watcher.go:87] Error while processing event 
("/sys/fs/cgroup/cpu,cpuacct/user.slice/user-0.slice/session-7654.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/cpu,cpuacct/user.slice/user-0.slice/session-7654.scope: no such file or directory W0514 08:02:02.688773 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/blkio/user.slice/user-0.slice/session-7654.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/user.slice/user-0.slice/session-7654.scope: no such file or directory W0514 08:02:02.689125 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/memory/user.slice/user-0.slice/session-7654.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/user.slice/user-0.slice/session-7654.scope: no such file or directory W0514 08:02:02.689252 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/devices/user.slice/user-0.slice/session-7654.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/user.slice/user-0.slice/session-7654.scope: no such file or directory W0514 08:02:02.770711 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/pids/user.slice/user-0.slice/session-7654.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/pids/user.slice/user-0.slice/session-7654.scope: no such file or directory 2020/05/14 08:04:22 http: superfluous response.WriteHeader call from github.com/prometheus/client_golang/prometheus/promhttp.httpError (http.go:306) W0514 08:05:01.957659 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/devices/user.slice/user-0.slice/session-7657.scope": 0x40000100 == IN_CREATE|IN_ISDIR): open /sys/fs/cgroup/devices/user.slice/user-0.slice/session-7657.scope: no such file or directory W0514 08:05:01.958081 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/pids/user.slice/user-0.slice/session-7657.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/pids/user.slice/user-0.slice/session-7657.scope: no such file or directory W0514 08:06:02.058847 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/cpu,cpuacct/user.slice/user-0.slice/session-7658.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/cpu,cpuacct/user.slice/user-0.slice/session-7658.scope: no such file or directory W0514 08:06:02.106817 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/blkio/user.slice/user-0.slice/session-7658.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/user.slice/user-0.slice/session-7658.scope: no such file or directory W0514 08:06:02.442804 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/memory/user.slice/user-0.slice/session-7658.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/user.slice/user-0.slice/session-7658.scope: no such file or directory W0514 08:06:02.787811 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/devices/user.slice/user-0.slice/session-7658.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/user.slice/user-0.slice/session-7658.scope: no such file or directory W0514 08:06:03.747833 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/pids/user.slice/user-0.slice/session-7658.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/pids/user.slice/user-0.slice/session-7658.scope: no such file or directory 2020/05/14 08:06:23 http: superfluous response.WriteHeader call from 
github.com/prometheus/client_golang/prometheus/promhttp.httpError (http.go:306) W0514 08:07:02.943069 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/cpu,cpuacct/user.slice/user-0.slice/session-7659.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/cpu,cpuacct/user.slice/user-0.slice/session-7659.scope: no such file or directory W0514 08:07:02.945246 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/blkio/user.slice/user-0.slice/session-7659.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/user.slice/user-0.slice/session-7659.scope: no such file or directory W0514 08:07:02.945510 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/memory/user.slice/user-0.slice/session-7659.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/user.slice/user-0.slice/session-7659.scope: no such file or directory W0514 08:07:02.945722 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/devices/user.slice/user-0.slice/session-7659.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/user.slice/user-0.slice/session-7659.scope: no such file or directory W0514 08:07:02.946031 1 watcher.go:87] Error while processing event ("/sys/fs/cgroup/pids/user.slice/user-0.slice/session-7659.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/pids/user.slice/user-0.slice/session-7659.scope: no such file or directory

I increased max_user_watches like below:

sudo sysctl fs.inotify.max_user_watches=1048576
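
For anyone hitting the same "inotify_add_watch ... no space left on device" failures, the per-user inotify limits can be inspected and persisted roughly like this (a sketch; it assumes a distro that reads /etc/sysctl.d/, and the value is only an example, not a recommendation):

# Show the current per-user inotify limits
sysctl fs.inotify.max_user_watches fs.inotify.max_user_instances

# Raise the watch limit for the running system
sudo sysctl -w fs.inotify.max_user_watches=1048576

# Persist it across reboots
echo 'fs.inotify.max_user_watches=1048576' | sudo tee /etc/sysctl.d/90-inotify.conf
sudo sysctl --system

The ENOSPC error from inotify_add_watch specifically points at max_user_watches; max_user_instances is shown only for reference.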

isabelnoronha61 commented 4 years ago

Is cAdvisor capable of monitoring around 2K containers? When I use the top command, I see cAdvisor's CPU usage exceeding 1000%. I have configured various run-time flags.

dashpole commented 4 years ago

Try with --disable_metrics=percpu,hugetlb,sched,tcp,udp,advtcp,disk

isabelnoronha61 commented 4 years ago

After upgrading to v0.36.0, I don't get system.slice metrics anymore.

isabelnoronha61 commented 4 years ago

Try with --disable_metrics=percpu,hugetlb,sched,tcp,udp,advtcp,disk

Still, it's going beyond 500%!! Are there any tweaks?

cadvisor:
  image: google/cadvisor
  image: gcr.io/google_containers/cadvisor:v0.36.0
  volumes:
    - /:/rootfs:ro
    - /var/run:/var/run:rw
    - /sys:/sys:ro
    - /var/lib/docker/:/var/lib/docker:ro
    #- /cgroup/cpu:/cgroup/cpu
    #- /cgroup/cpuacct:/cgroup/cpuacct
    #- /cgroup/cpuset:/cgroup/cpuset
    #- /cgroup/memory:/cgroup/memory
    #- /cgroup/blkio:/cgroup/blkio
    #- /cgroup:/sys/fs/cgroup:ro
    - /cgroup:/cgroup:ro
  ports:
    - 8080:8080
  privileged: true
  command:
    - --allow_dynamic_housekeeping=true
    - --housekeeping_interval=5m
    - --global_housekeeping_interval=2m
    - --disable_metrics=percpu,sched,tcp,udp,advtcp,disk,network
    - --docker_only=true
  restart: always
  deploy:
    mode: global

isabelnoronha61 commented 4 years ago

Is it because the number of containers is 2K?

dashpole commented 4 years ago

I haven't ever run with that many before, so it is definitely possible.

isabelnoronha61 commented 4 years ago

Okay, is there any kind of workaround to lower the CPU consumption? This is going to run not just on one server but on at least 14 servers. I am using such a huge number of containers for simulation.

isabelnoronha61 commented 4 years ago

Can cAdvisor do some kind of load balancing while doing discovery for the container metrics?

dashpole commented 4 years ago

I'm not sure I understand the last comment. In general, I welcome any performance improvements, so if you can generate perf flamegraphs or similar, I am happy to help identify the primary consumers of CPU time and figure out how to optimize cAdvisor to meet your needs.
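
For reference, one way to produce such a flamegraph (a sketch, assuming perf and Brendan Gregg's FlameGraph scripts are available; the PID variable and the duration are placeholders):

# Sample cAdvisor's on-CPU stacks for 60 seconds
sudo perf record -F 99 -g -p "$CADVISOR_PID" -- sleep 60

# Fold the stacks and render an SVG flamegraph
sudo perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > cadvisor-flame.svg

As the later comments note, running perf itself adds overhead, so the profile should be read as an approximation of where cAdvisor spends its time.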

isabelnoronha61 commented 4 years ago

[image: cadvisor14]

isabelnoronha61 commented 4 years ago

[image: perf]

isabelnoronha61 commented 4 years ago

@dashpole Is there any other way to get proper CPU stats?

dashpole commented 4 years ago

It almost looks like perf events are enabled... I can see __perf_event_task_schedule... and __intel_pmu_enable... cc @iwankgb.

@isabelnoronha61 can you try the previous version with the same args? v0.35.0 IIRC.

The actual cAdvisor Go code's usage is on the left side. Serving requests (the far left) is about half of that usage, and the other portion is likely spent scanning cgroups and collecting the metrics.

iwankgb commented 4 years ago

@dashpole @isabelnoronha61 I will take a look at this in the European evening, but what seems weird to me is that we can't see the perf_event_open syscall (it would be syscall 298 rather than 64). @isabelnoronha61 is there any chance that there is another application collecting perf events running on the host? Or is the resctrl filesystem (memory bandwidth and cache allocation/monitoring) in use? That would explain all these writes to MSRs.

iwankgb commented 4 years ago

BTW, I don't think @isabelnoronha61 enabled perf events, so they should not be collected at all, unless there is some insane bug causing cAdvisor to collect some events when no configuration is provided.

iwankgb commented 4 years ago

@isabelnoronha61 if you were using the perf tool to generate the flame graph, then you probably affected overall system performance: each time a context switch occurred, MSRs must have been read from and written to, otherwise the counters would not store valid values. Is there any chance to zoom in on the right half of the image? That is where the answer is, I think. @dashpole I don't think it's related to perf in cAdvisor, but it's definitely a side effect of using perf in general. runtime.findrunnable() is responsible for finding a goroutine waiting for execution, and if writing to MSRs happens higher in the stack then it is related to measuring cAdvisor's performance, I believe.

dashpole commented 4 years ago

Ah, that makes sense. Sorry for the goose chase. @isabelnoronha61 if you could share the svg, that would be helpful.

Also, can you share cAdvisor's CPU usage, in cores during the run?
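
For reference, top (in its default Irix mode) and docker stats both report CPU as a percentage of a single core, so a reading of 500% corresponds to roughly 5 cores. A couple of ways to read that off the host (a sketch; the process/container name cadvisor is a placeholder):

# Per-process view: %CPU of 500 means roughly 5 cores
top -b -n 1 -p "$(pgrep -f cadvisor | head -1)"

# Per-container view: CPUPerc of 500% likewise means ~5 cores
docker stats --no-stream --format '{{.Name}}: {{.CPUPerc}}' cadvisor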

isabelnoronha61 commented 4 years ago

@iwankgb Yeah, I'm not using perf in cAdvisor; I'm taking the system stats. Here is the CPU usage based on cores when cAdvisor is at 500%:

[screenshot: cadvisor_500%]

This is very odd, I don't understand!

[screenshot: Weird]

Even though cAdvisor is at just 123%:

[screenshot: Weid2]

isabelnoronha61 commented 4 years ago

Ah, that makes sense. Sorry for the goose chase. @isabelnoronha61 if you could share the svg, that would be helpful.

Also, can you share cAdvisor's CPU usage, in cores during the run?

I tried sending the SVG, but GitHub doesn't allow the SVG format.

iwankgb commented 4 years ago

I think that cAdvisor's CPU usage will depend on the number of monitored containers. As @dashpole mentioned above, 2000 is quite a large number. You can take a look at #1498; some useful advice might be hidden there. You can also try to rebuild cAdvisor with pprof support; it might give more useful information on what is causing the problem.
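
As a rough illustration of what that pprof-based investigation could look like (a sketch, assuming a cAdvisor build that exposes Go's net/http/pprof handlers under /debug/pprof; the host and port are placeholders):

# Grab a 30-second CPU profile from the running binary
go tool pprof http://<cadvisor-host>:8080/debug/pprof/profile?seconds=30

# Useful commands inside the interactive pprof session:
#   top       - heaviest functions by CPU
#   web       - call graph (requires graphviz)
#   list <fn> - annotated source for a function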

isabelnoronha61 commented 4 years ago

@iwankgb Yeah, sure. Could you have a look at this PNG?

[image: profile]

iwankgb commented 4 years ago

It seems to me that the Prometheus exposition format and HTTP response compression are your problem. You can try to use a storage driver instead of Prometheus, but I have no idea whether that is feasible in your case, and I won't promise that it will help. If not, it might be possible to disable compression in the Prometheus client (you'll have to verify that).
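
As a quick way to gauge how heavy a single scrape is on a host with ~2K containers (a sketch; the host and port are placeholders, and curl only shows payload size and client-side timing, not cAdvisor's CPU cost directly):

# Size of one Prometheus scrape and how long it takes to serve
curl -sS -o /dev/null -w '%{size_download} bytes in %{time_total}s\n' \
  http://<cadvisor-host>:8080/metrics

Every scrape re-encodes (and, when the client asks for gzip, compresses) all series for every container, so the scrape interval and the --disable_metrics list directly scale this cost.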

dashpole commented 4 years ago

Yep. Serving all of the metrics in Prometheus format is the main user of CPU, it seems. This is where something like the OpenTelemetry format would be useful to have...

dashpole commented 4 years ago

The JSON endpoints are likely even worse. I'm not sure about the storage drivers.

isabelnoronha61 commented 4 years ago

So can the metrics from cAdvisor be stored directly in VictoriaMetrics using the storage driver flags, instead of scraping into the Prometheus TSDB? Then, in the Prometheus config, make use of remote_read and continue with PromQL queries and rendering in Grafana?

iwankgb commented 4 years ago

It looks doable, judging by the VictoriaMetrics readme.
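
For completeness, an untested sketch of what that could look like with cAdvisor's InfluxDB storage driver pointed at VictoriaMetrics' InfluxDB-compatible write endpoint (the hostname, port, and database name are placeholders; whether the two protocols actually line up needs to be verified against both projects' docs):

cadvisor \
  --docker_only=true \
  --housekeeping_interval=30s \
  --storage_driver=influxdb \
  --storage_driver_host=victoriametrics:8428 \
  --storage_driver_db=cadvisor

If that turns out not to work, VictoriaMetrics can also scrape the Prometheus /metrics endpoint itself (via vmagent or its built-in scraper), which sidesteps the storage driver entirely, though it keeps the Prometheus-format serving cost discussed above.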