oijkn / Docker-Raspberry-PI-Monitoring

A docker-compose stack solution for monitoring host and containers with Prometheus, Grafana, cAdvisor and NodeExporter.

monitoring-cadvisor unable to get data from containers. #17

Closed: iAmSaugata closed this issue 1 year ago

iAmSaugata commented 2 years ago

I am on PiOS 64 Lite; the dashboard is running, but there is no data from containers. I have updated /boot/cmdline.txt and /etc/docker/daemon.json as suggested in the following thread: https://github.com/oijkn/Docker-Raspberry-PI-Monitoring/issues/7

These are the commands that allowed me to run all the containers without any errors:

git clone https://github.com/oijkn/Docker-Raspberry-PI-Monitoring.git
mkdir /home/pi/Docker-Raspberry-PI-Monitoring/grafana/data
mkdir /home/pi/Docker-Raspberry-PI-Monitoring/grafana/provisioning
mkdir /home/pi/Docker-Raspberry-PI-Monitoring/grafana/provisioning/datasources
mkdir /home/pi/Docker-Raspberry-PI-Monitoring/grafana/provisioning/plugins
mkdir /home/pi/Docker-Raspberry-PI-Monitoring/grafana/provisioning/notifiers
mkdir /home/pi/Docker-Raspberry-PI-Monitoring/grafana/provisioning/dashboards
mkdir /home/pi/Docker-Raspberry-PI-Monitoring/prometheus/data
mkdir /home/pi/Docker-Raspberry-PI-Monitoring/prometheus/config
sudo chown -R 472:472 /home/pi/Docker-Raspberry-PI-Monitoring/grafana
sudo chown -R 472:472 /home/pi/Docker-Raspberry-PI-Monitoring/prometheus
sudo chmod 777 /home/pi/Docker-Raspberry-PI-Monitoring/prometheus/data
sudo mv /home/pi/Docker-Raspberry-PI-Monitoring/prometheus/prometheus.yml /home/pi/Docker-Raspberry-PI-Monitoring/prometheus/config/prometheus.yml
sudo touch /home/pi/Docker-Raspberry-PI-Monitoring/grafana/grafana.ini
sudo docker-compose up -d

This is the error output from the monitoring-cadvisor container:

```
failed to fetch hugetlb info github.com/opencontainers/runc/libcontainer/cgroups/fs2.statHugeTlb /go/pkg/mod/github.com/opencontainers/runc@v1.0.0-rc95/libcontainer/cgroups/fs2/hugetlb.go:35 github.com/opencontainers/runc/libcontainer/cgroups/fs2.(manager).GetStats /go/pkg/mod/github.com/opencontainers/runc@v1.0.0-rc95/libcontainer/cgroups/fs2/fs2.go:125 github.com/google/cadvisor/container/libcontainer.(Handler).GetStats /go/src/github.com/google/cadvisor/container/libcontainer/handler.go:83 github.com/google/cadvisor/container/raw.(rawContainerHandler).GetStats /go/src/github.com/google/cadvisor/container/raw/handler.go:232 github.com/google/cadvisor/manager.(containerData).updateStats /go/src/github.com/google/cadvisor/manager/container.go:637 github.com/google/cadvisor/manager.(containerData).housekeepingTick /go/src/github.com/google/cadvisor/manager/container.go:583 github.com/google/cadvisor/manager.(containerData).housekeeping /go/src/github.com/google/cadvisor/manager/container.go:531 runtime.goexit /usr/lib/go/src/runtime/asm_arm64.s:1133], continuing to push stats
W0316 05:24:55.189639 1 container.go:586] Failed to update stats for container "/system.slice/ssh.service": error while statting cgroup v2: [open /sys/kernel/mm/hugepages: no such file or directory failed to fetch hugetlb info github.com/opencontainers/runc/libcontainer/cgroups/fs2.statHugeTlb /go/pkg/mod/github.com/opencontainers/runc@v1.0.0-rc95/libcontainer/cgroups/fs2/hugetlb.go:35 github.com/opencontainers/runc/libcontainer/cgroups/fs2.(manager).GetStats /go/pkg/mod/github.com/opencontainers/runc@v1.0.0-rc95/libcontainer/cgroups/fs2/fs2.go:125 github.com/google/cadvisor/container/libcontainer.(Handler).GetStats /go/src/github.com/google/cadvisor/container/libcontainer/handler.go:83 github.com/google/cadvisor/container/raw.(rawContainerHandler).GetStats /go/src/github.com/google/cadvisor/container/raw/handler.go:232 github.com/google/cadvisor/manager.(containerData).updateStats /go/src/github.com/google/cadvisor/manager/container.go:637 github.com/google/cadvisor/manager.(containerData).housekeepingTick /go/src/github.com/google/cadvisor/manager/container.go:583 github.com/google/cadvisor/manager.(containerData).housekeeping /go/src/github.com/google/cadvisor/manager/container.go:531 runtime.goexit /usr/lib/go/src/runtime/asm_arm64.s:1133], continuing to push stats
W0316 05:24:55.196379 1 container.go:586] Failed to update stats for container "/system.slice/docker-8c01dc5b893a7e67d5fa12416d5457993bf01e4c94590b069bd115da9746a5ee.scope": error while statting cgroup v2: [open /sys/kernel/mm/hugepages: no such file or directory failed to fetch hugetlb info github.com/opencontainers/runc/libcontainer/cgroups/fs2.statHugeTlb /go/pkg/mod/github.com/opencontainers/runc@v1.0.0-rc95/libcontainer/cgroups/fs2/hugetlb.go:35 github.com/opencontainers/runc/libcontainer/cgroups/fs2.(manager).GetStats /go/pkg/mod/github.com/opencontainers/runc@v1.0.0-rc95/libcontainer/cgroups/fs2/fs2.go:125 github.com/google/cadvisor/container/libcontainer.(Handler).GetStats /go/src/github.com/google/cadvisor/container/libcontainer/handler.go:83 github.com/google/cadvisor/container/docker.(dockerContainerHandler).GetStats /go/src/github.com/google/cadvisor/container/docker/handler.go:460 github.com/google/cadvisor/manager.(containerData).updateStats /go/src/github.com/google/cadvisor/manager/container.go:637 github.com/google/cadvisor/manager.(containerData).housekeepingTick /go/src/github.com/google/cadvisor/manager/container.go:583 github.com/google/cadvisor/manager.(containerData).housekeeping /go/src/github.com/google/cadvisor/manager/container.go:531 runtime.goexit /usr/lib/go/src/runtime/asm_arm64.s:1133], continuing to push stats
W0316 05:25:10.338801 1 manager.go:696] Error getting data for container /system.slice/systemd-journald.service because of race condition
W0316 05:25:10.340027 1 manager.go:696] Error getting data for container /system.slice/udisks2.service because of race condition
W0316 05:25:10.340726 1 manager.go:696] Error getting data for container /system.slice/docker-e5dc78565eabaf33e8c9fafee9fdc258a51675cad55c62b72853cf829b6015b7.scope because of race condition
W0316 05:25:10.341810 1 manager.go:696] Error getting data for container /system.slice/dbus.service because of race condition
W0316 05:25:10.342723 1 manager.go:696] Error getting data for container /user.slice/user-1000.slice/user@1000.service/init.scope because of race condition
W0316 05:25:10.343710 1 manager.go:696] Error getting data for container /system.slice/system-getty.slice/getty@tty1.service because of race condition
W0316 05:25:10.344704 1 manager.go:696] Error getting data for container /user.slice/user-1000.slice/user@1000.service/app.slice because of race condition
W0316 05:25:10.345870 1 manager.go:696] Error getting data for container /system.slice/polkit.service because of race condition
W0316 05:25:10.346642 1 manager.go:696] Error getting data for container /system.slice/docker-05d382abaeafcf2e927f58d382a0db1a2f9bf6046b541ea9dd68039fcece8246.scope because of race condition
W0316 05:25:10.347522 1 manager.go:696] Error getting data for container /system.slice/docker-8214be662772b232ddf894a0a148046d890ef0ebd6c0aac0c1357a1e34dcaeaf.scope because of race condition
W0316 05:25:10.348694 1 manager.go:696] Error getting data for container /system.slice/docker.socket because of race condition
W0316 05:25:10.349664 1 manager.go:696] Error getting data for container /system.slice/system-bthelper.slice/bthelper@hci0.service because of race condition
W0316 05:25:10.350410 1 manager.go:696] Error getting data for container /system.slice/docker-8c01dc5b893a7e67d5fa12416d5457993bf01e4c94590b069bd115da9746a5ee.scope because of race condition
W0316 05:25:10.351592 1 manager.go:696] Error getting data for container /system.slice/systemd-udevd.service because of race condition
W0316 05:25:10.354269 1 manager.go:696] Error getting data for container /system.slice/wpa_supplicant.service because of race condition
W0316 05:25:10.355296 1 manager.go:696] Error getting data for container /system.slice/docker-91fc4c0d91cbf1006e32a05b411cdd93256df1cd81e0941a55152fe32f60d7cd.scope because of race condition
W0316 05:25:10.357057 1 manager.go:696] Error getting data for container /system.slice/system-modprobe.slice because of race condition
W0316 05:25:10.359454 1 manager.go:696] Error getting data for container /system.slice/cron.service because of race condition
W0316 05:25:10.360809 1 manager.go:696] Error getting data for container /system.slice/ssh.service because of race condition
W0316 05:25:10.361964 1 manager.go:696] Error getting data for container /init.scope because of race condition
W0316 05:25:10.363086 1 manager.go:696] Error getting data for container /user.slice/user-1000.slice/user@1000.service/app.slice/dbus.socket because of race condition
W0316 05:25:10.364426 1 manager.go:696] Error getting data for container /system.slice/systemd-timesyncd.service because of race condition
W0316 05:25:10.365281 1 manager.go:696] Error getting data for container /system.slice/docker-bc086bdffefc5fcb26a585393fea83aa2c34b7086fb0a3ed4d0bc3bb50f14acb.scope because of race condition
W0316 05:25:10.366045 1 manager.go:696] Error getting data for container /system.slice/docker-cbb07b89b243ec69e84d41a8e6882759e883acabd79d252afff812f5ca86ee2d.scope because of race condition
W0316 05:25:10.367837 1 manager.go:696] Error getting data for container /system.slice because of race condition
W0316 05:25:10.369058 1 manager.go:696] Error getting data for container /system.slice/containerd.service because of race condition
W0316 05:25:10.370220 1 manager.go:696] Error getting data for container /system.slice/rsyslog.service because of race condition
W0316 05:25:10.371408 1 manager.go:696] Error getting data for container /system.slice/rng-tools-debian.service because of race condition
W0316 05:25:10.372182 1 manager.go:696] Error getting data for container /system.slice/docker-5b314ce7d65024a244eb3d8de2c84a3aa1b40985ad9a60519630c342f218eeba.scope because of race condition
W0316 05:25:10.373434 1 manager.go:696] Error getting data for container /system.slice/systemd-logind.service because of race condition
W0316 05:25:10.374623 1 manager.go:696] Error getting data for container /system.slice/system-systemd\x2dfsck.slice because of race condition
W0316 05:25:10.375867 1 manager.go:696] Error getting data for container /system.slice/docker.service because of race condition
W0316 05:25:10.378050 1 manager.go:696] Error getting data for container /user.slice/user-1000.slice because of race condition
W0316 05:25:10.379356 1 manager.go:696] Error getting data for container /system.slice/bluetooth.service because of race condition
W0316 05:25:10.380133 1 manager.go:696] Error getting data for container /system.slice/docker-20de10b24a1ba0aa4d5ae0b3c37dd631d1364e34bd110547f96b006c32029040.scope because of race condition
W0316 05:25:10.380907 1 manager.go:696] Error getting data for container /system.slice/docker-6fd738388b5bc9ba4ef77d193976a760d2ee89d3bb7414c65c7fa53ae320231b.scope because of race condition
W0316 05:25:10.382111 1 manager.go:696] Error getting data for container /system.slice/system-bthelper.slice because of race condition
W0316 05:25:10.383069 1 manager.go:696] Error getting data for container /user.slice/user-1000.slice/session-129.scope because of race condition
W0316 05:25:10.383866 1 manager.go:696] Error getting data for container /system.slice/docker-2b893059e92263d4eedc1510749edece6f4ca649b555516f59402911208ce5c0.scope because of race condition
W0316 05:25:10.385079 1 manager.go:696] Error getting data for container /system.slice/avahi-daemon.service because of race condition
W0316 05:25:10.386091 1 manager.go:696] Error getting data for container /user.slice/user-1000.slice/session-127.scope because of race condition
W0316 05:25:10.387280 1 manager.go:696] Error getting data for container /user.slice because of race condition
W0316 05:25:10.388309 1 manager.go:696] Error getting data for container /user.slice/user-1000.slice/user@1000.service because of race condition
W0316 05:25:10.389847 1 manager.go:696] Error getting data for container /system.slice/dhcpcd.service because of race condition
W0316 05:25:10.391069 1 manager.go:696] Error getting data for container /system.slice/triggerhappy.service because of race condition
W0316 05:25:10.392404 1 manager.go:696] Error getting data for container /system.slice/hciuart.service because of race condition
W0316 05:25:10.393589 1 manager.go:696] Error getting data for container /system.slice/system-getty.slice because of race condition
```

iAmSaugata commented 2 years ago

For those who are having the same problem, please build your own Docker image using the Dockerfile from the following location: https://github.com/google/cadvisor/issues/3011#issuecomment-975630481

Use the following for an error-free deployment on PiOS 64-bit.

git clone https://github.com/oijkn/Docker-Raspberry-PI-Monitoring.git

mkdir /home/pi/Docker-Raspberry-PI-Monitoring/grafana/data

mkdir /home/pi/Docker-Raspberry-PI-Monitoring/grafana/provisioning

mkdir /home/pi/Docker-Raspberry-PI-Monitoring/grafana/provisioning/datasources

mkdir /home/pi/Docker-Raspberry-PI-Monitoring/grafana/provisioning/plugins

mkdir /home/pi/Docker-Raspberry-PI-Monitoring/grafana/provisioning/notifiers

mkdir /home/pi/Docker-Raspberry-PI-Monitoring/grafana/provisioning/dashboards

mkdir /home/pi/Docker-Raspberry-PI-Monitoring/prometheus/data

mkdir /home/pi/Docker-Raspberry-PI-Monitoring/prometheus/config

sudo chown -R 472:472 /home/pi/Docker-Raspberry-PI-Monitoring/grafana

sudo chown -R 472:472 /home/pi/Docker-Raspberry-PI-Monitoring/prometheus

sudo chmod 777 /home/pi/Docker-Raspberry-PI-Monitoring/prometheus/data

sudo mv /home/pi/Docker-Raspberry-PI-Monitoring/prometheus/prometheus.yml /home/pi/Docker-Raspberry-PI-Monitoring/prometheus/config/prometheus.yml

sudo touch /home/pi/Docker-Raspberry-PI-Monitoring/grafana/grafana.ini

cd Docker-Raspberry-PI-Monitoring

truncate -s 0 docker-compose.yml

sudo nano docker-compose.yml

Use the following content for your docker-compose.yml

version: "3.8" services: cadvisor: container_name: monitoring-cadvisor command: --raw_cgroup_prefix_whitelist=/docker/ --disable_metrics=hugetlb devices:

  • /dev/kmsg expose:
  • 8080 hostname: rpi-cadvisor image: my_image:latest ipc: shareable networks:
  • rpimonitor_default ports:
  • 8228:8080 privileged: true restart: unless-stopped volumes:
  • /:/rootfs:ro
  • /var/run:/var/run:ro
  • /sys:/sys:ro
  • /var/lib/docker/:/var/lib/docker:ro
  • /dev/disk/:/dev/disk:ro

    grafana: container_name: monitoring-grafana environment:

  • GF_USERS_ALLOW_SIGN_UP=false
  • GF_PATHS_CONFIG=/etc/grafana/grafana.ini
  • GF_PATHS_DATA=/var/lib/grafana
  • GF_PATHS_HOME=/usr/share/grafana
  • GF_PATHS_LOGS=/var/log/grafana
  • GF_PATHS_PLUGINS=/var/lib/grafana/plugins
  • GF_PATHS_PROVISIONING=/etc/grafana/provisioning hostname: rpi-grafana image: grafana/grafana:latest networks:
  • rpimonitor_default ports:
  • 3000:3000 restart: unless-stopped volumes:

    to be modified depending on your needs

  • /home/pi/Docker-Raspberry-PI-Monitoring/grafana/data:/var/lib/grafana
  • /home/pi/Docker-Raspberry-PI-Monitoring/grafana/provisioning:/etc/grafana/provisioning
  • /home/pi/Docker-Raspberry-PI-Monitoring/grafana/grafana.ini:/etc/grafana/grafana.ini

    node-exporter: container_name: monitoring-node-exporter expose:

  • 9100 hostname: rpi-exporter image: prom/node-exporter:latest networks:
  • rpimonitor_default restart: unless-stopped

    prometheus: command:

  • '--config.file=/etc/prometheus/prometheus.yml'
  • '--storage.tsdb.path=/prometheus' container_name: monitoring-prometheus expose:
  • 9090 hostname: rpi-prom image: prom/prometheus:latest networks:
  • rpimonitor_default restart: unless-stopped volumes:

    to be modified depending on your needs

  • /home/pi/Docker-Raspberry-PI-Monitoring/prometheus/data:/prometheus
  • /home/pi/Docker-Raspberry-PI-Monitoring/prometheus/config:/etc/prometheus

    - /home/pi/Docker-Raspberry-PI-Monitoring/prometheus/prometheus.yml:/etc/prometheus/prometheus.yaml

    links:

  • cadvisor:cadvisor
  • node-exporter:node-exporter

networks: rpimonitor_default: external: true
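
Note: the compose file attaches every service to an external network named rpimonitor_default. If that network does not already exist on the host, it likely has to be created once before starting the stack (this step is inferred from the external: true setting, not stated elsewhere in the thread):

sudo docker network create rpimonitor_default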

Save it and execute the following:

sudo docker-compose up -d

iAmSaugata commented 2 years ago

[screenshot: monitoring dashboard]

I guess it is still not able to get the memory data.

[screenshot: memory panel]

martadinata666 commented 2 years ago

So which version are you building? Can you give more detail about your cmdline? PiOS 64-bit, I assume Bullseye?

https://github.com/novaspirit/pi-hosted/blob/master/docs/rpi_docker_monitor.md

iAmSaugata commented 2 years ago

Update: all the information is now showing. You have to update /boot/cmdline.txt as shown in the following screenshot (everything on a single line).
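
The cmdline screenshot is not reproduced in this copy of the thread; on Raspberry Pi OS the change normally amounts to appending the cgroup flags below to the end of the single existing line in /boot/cmdline.txt and then rebooting. The exact flags are taken from the pi-hosted guide linked above, so treat them as an assumption rather than something shown in this thread:

cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1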

[screenshot: docker-memory]

iAmSaugata commented 2 years ago

> So which version are you building? Can you give more detail about your cmdline? PiOS 64-bit, I assume Bullseye?
>
> https://github.com/novaspirit/pi-hosted/blob/master/docs/rpi_docker_monitor.md

Yes, PiOS 64-bit Bullseye.

I updated /boot/cmdline.txt with the instructed values, and now all the information is showing properly. https://github.com/novaspirit/pi-hosted/blob/master/docs/rpi_docker_monitor.md

iAmSaugata commented 2 years ago

This is the one: 'Race condition' and 'failed to fetch hugetlb info' errors in Raspbian · Issue #3011 · google/cadvisor: https://github.com/google/cadvisor/issues/3011

```dockerfile
FROM arm64v8/golang as build
RUN apt update && apt install -y git dmsetup
RUN git clone \
    --branch release-v0.38 \
    --depth 1 \
    https://github.com/google/cadvisor.git \
    /go/src/github.com/google/cadvisor
WORKDIR /go/src/github.com/google/cadvisor
RUN make build

FROM arm64v8/debian
COPY --from=build /go/src/github.com/google/cadvisor/cadvisor /usr/bin/cadvisor
EXPOSE 8080
ENTRYPOINT ["/usr/bin/cadvisor", "-logtostderr"]
```
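
Assuming the Dockerfile above is saved in an empty directory on the Pi, it can be built into the image name that the compose file references (my_image:latest is just the placeholder tag used earlier; any name works as long as the compose file matches):

sudo docker build -t my_image:latest .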



martadinata666 commented 2 years ago

Change the branch part from 0.38 to 0.44 and remove the cmdline tweak.
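
In other words, the clone step in the Dockerfile above would become something like the following (assuming cAdvisor's release branches keep the same release-vX.YY naming scheme):

```dockerfile
RUN git clone \
    --branch release-v0.44 \
    --depth 1 \
    https://github.com/google/cadvisor.git \
    /go/src/github.com/google/cadvisor
```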

Robin-KEB commented 2 years ago

@martadinata666 Is it necessary to remove the cmdline tweak? Because if you remove it, your docker stats command won't show memory data.

martadinata666 commented 2 years ago

So here is the thing about cAdvisor: cAdvisor <= 0.39 can't handle cgroup v2 properly and a lot of data is missing, so the cmdline tweak is there to downgrade your system to cgroup v1. With 0.40+ you need to remove the tweak, as those versions already support v2 properly; 0.44 is the latest and has the fewest known issues with this. That's how it is supposed to work.

@Robin-KEB I don't know which system you are using; a newer cAdvisor will detect the cgroup version automatically, as it should. I have only seen docker stats fail to show memory on PiOS, so I assume you are on PiOS? It ships with that feature stripped out, so you need the cmdline tweak. Basically any PiOS, 32- or 64-bit, needs the cmdline tweak.
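
Not mentioned in the thread, but a quick way to confirm which cgroup version a host is actually running is to check the filesystem type mounted at /sys/fs/cgroup; it reports cgroup2fs under cgroup v2 and tmpfs under v1:

stat -fc %T /sys/fs/cgroup/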

oijkn commented 1 year ago

This issue has been inactive for some time. Please let me know if it is still relevant and needs attention, or I will close it.