google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

Disk metrics for containers when using overlay2 with size limits #2667

Open anghelutar opened 4 years ago

anghelutar commented 4 years ago

I am trying to monitor the root partitions inside containers using gcr.io/cadvisor/cadvisor:v0.37.0 and Prometheus.

Docker is set up with:

"storage-driver": "overlay2",
"storage-opts": ["overlay2.size=5G"],

For example, for the following container, I would like to get the 5% usage of the root filesystem (actually 4.06% when computed exactly):

[root@infra-05 docker-compose]# docker exec -ti infra-05_logging-elasticsearch df
Filesystem                               1K-blocks     Used Available Use% Mounted on
overlay                                    5242880   213172   5029708   5% /
tmpfs                                        65536        0     65536   0% /dev
tmpfs                                      8023452        0   8023452   0% /sys/fs/cgroup
/dev/mapper/vg0-varlibdockerlv            41932800 30960284  10972516  74% /etc/hosts
shm                                          65536        0     65536   0% /dev/shm
/dev/mapper/vg0-vardockerelasticsearchlv 146785280 24772532 122012748  17% /usr/share/elasticsearch/data
/dev/mapper/vg0-lvroot                     3997376  3023344    747936  81% /usr/share/elasticsearch/config/log4j2.properties
tmpfs                                      8023452        0   8023452   0% /proc/acpi
tmpfs                                      8023452        0   8023452   0% /proc/scsi
tmpfs                                      8023452        0   8023452   0% /sys/firmware
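The exact percentage can be reproduced from the `overlay` row of the df output. A minimal Python sketch, assuming only the two numbers copied from that row (the function name `usage_percent` is made up for illustration):

```python
# Values copied from the "overlay ... /" row of the df output (1K-blocks).
total_kib = 5242880  # size of the container root filesystem (overlay2.size=5G)
used_kib = 213172    # space used on the container root filesystem


def usage_percent(used: int, total: int) -> float:
    """Return used/total expressed as a percentage."""
    return used / total * 100


exact = usage_percent(used_kib, total_kib)
print(f"{exact:.4f}%")
```

This is the value the Prometheus query should ideally return for the container.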

The following query in Prometheus:

container_fs_usage_bytes{instance="infra-05.node.consul:9103",name=~".*elasticsearch.*"}/container_fs_limit_bytes{instance="infra-05.node.consul:9103",name=~".*elasticsearch.*"}*100

brings back a single line:

{container_label_co_elastic_logs_module="elasticsearch",container_label_com_docker_compose_config_hash="ad29cbce151d36e15c4b13d72b1818bc0808e1936890e51e46e9c6e0816bb6a7",container_label_com_docker_compose_container_number="1",container_label_com_docker_compose_oneoff="False",container_label_com_docker_compose_project="docker-compose",container_label_com_docker_compose_service="logging-elasticsearch",container_label_com_docker_compose_version="1.21.2",container_label_org_label_schema_build_date="2019-11-26T01:06:52.520070Z",container_label_org_label_schema_license="Elastic-License",container_label_org_label_schema_name="Elasticsearch",container_label_org_label_schema_schema_version="1.0",container_label_org_label_schema_url="https://www.elastic.co/products/elasticsearch",container_label_org_label_schema_usage="https://www.elastic.co/guide/en/elasticsearch/reference/index.html",container_label_org_label_schema_vcs_ref="e9ccaed468e2fac2275a3761849cbee64b39519f",container_label_org_label_schema_vcs_url="https://github.com/elastic/elasticsearch",container_label_org_label_schema_vendor="Elastic",container_label_org_label_schema_version="7.5.0",container_label_org_opencontainers_image_created="2019-11-26T01:06:52.520070Z",container_label_org_opencontainers_image_documentation="https://www.elastic.co/guide/en/elasticsearch/reference/index.html",container_label_org_opencontainers_image_licenses="Elastic-License",container_label_org_opencontainers_image_revision="e9ccaed468e2fac2275a3761849cbee64b39519f",container_label_org_opencontainers_image_source="https://github.com/elastic/elasticsearch",container_label_org_opencontainers_image_title="Elasticsearch",container_label_org_opencontainers_image_url="https://www.elastic.co/products/elasticsearch",container_label_org_opencontainers_image_vendor="Elastic",container_label_org_opencontainers_image_version="7.5.0",customer="Xenit",device="/dev/mapper/vg0-varlibdockerlv",environment="Xenit_PROD",id="/docker/677f57566219f459f30d6c4abac0235b3ab43c0f8031747bf676dfa9b33d1047",image="elasticsearch:7.5.0",instance="infra-05.node.consul:9103",job="cadvisor",name="infra-05_logging-elasticsearch",service="cadvisor"} | 0.5084897741147741

I noticed that the root partition in that container is available if I query for device=~overlay.*:

container_fs_usage_bytes{instance="infra-05.node.consul:9103",device=~"overlay.*"}/container_fs_limit_bytes{instance="infra-05.node.consul:9103",device=~"overlay.*"}*100

{customer="Xenit",device="overlay_0-109",environment="Xenit_PROD",id="/",instance="infra-05.node.consul:9103",job="cadvisor",service="cadvisor"} | 4.0659332275390625
...

But how can I link the device "overlay_0-109" to the container infra-05_logging-elasticsearch?

dashpole commented 4 years ago

To make sure I understand: The usage is correct, but the limit isn't associated with the correct container?

anghelutar commented 4 years ago

It's not associated with any container. The only non-custom labels I get are:

device="overlay_0-109",id="/",instance="infra-05.node.consul:9103"

Looks like a machine metric.

dashpole commented 4 years ago

Yeah, I'm not 100% sure, but I think this is the same problem we have with Docker volumes. We can monitor the disk space used by the container's writable layer by using (the Go equivalent of) du, but we don't associate filesystems mounted into containers with the container.

anghelutar commented 4 years ago

A mount or a volume can belong to multiple containers, but the root partition is usually not a mount, so it could be linked to a single container. Is that technically not possible?

dashpole commented 4 years ago

Oh, I get it. Everything looks correct from cAdvisor's perspective: That is the root device "/". I was thinking this would be represented as some other device. In that case, you are just trying to figure out how to do the join?

I think you might need to join with the container metric first before dividing. I believe that would generate one metric stream for each container with the value of the node level metric, which you can then use as the denominator in your query.
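As an illustration only, here is a minimal Python sketch of what such a many-to-one join on `instance` would compute. It assumes every container on the node shares the same `overlay2.size` limit, so any node-level overlay limit can serve as the denominator. The sample values are derived from the df output in this thread (KiB times 1024), not scraped from cAdvisor, and the helper names are hypothetical.

```python
# Hypothetical samples shaped like the series discussed in this thread.
INSTANCE = "infra-05.node.consul:9103"

container_usage = [  # stands in for container_fs_usage_bytes (per container)
    {"labels": {"instance": INSTANCE, "name": "infra-05_logging-elasticsearch"},
     "value": 213172 * 1024},
]
node_limits = [  # stands in for container_fs_limit_bytes{id="/",device=~"overlay.*"}
    {"labels": {"instance": INSTANCE, "id": "/", "device": "overlay_0-109"},
     "value": 5242880 * 1024},
]


def limit_per_instance(samples):
    """Collapse node-level overlay limits to one value per instance (max)."""
    limits = {}
    for s in samples:
        inst = s["labels"]["instance"]
        limits[inst] = max(limits.get(inst, 0), s["value"])
    return limits


def join_usage_percent(usage, limits):
    """Join on `instance`, keep the container labels, divide by the node limit."""
    out = []
    for s in usage:
        limit = limits.get(s["labels"]["instance"])
        if limit:  # skip instances that expose no overlay limit series
            out.append({"labels": s["labels"],
                        "value": s["value"] / limit * 100})
    return out


result = join_usage_percent(container_usage, limit_per_instance(node_limits))
for r in result:
    print(r["labels"]["name"], round(r["value"], 4))
```

In PromQL this kind of match is normally written with the `on(instance) group_left()` modifiers, for example (untested against this setup) `container_fs_usage_bytes{name=~".*elasticsearch.*"} / on(instance) group_left() max by (instance) (container_fs_limit_bytes{id="/",device=~"overlay.*"}) * 100`. It does not tell you which overlay device belongs to which container; it only works because the limit is assumed to be identical for all of them.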

anghelutar commented 4 years ago

Uhm, I don't think that's possible, or at least I don't know how to do it. How and where would I do the join?

bobsongplus commented 3 years ago

Any progress?

amaraldavi1 commented 4 months ago

I'm also running into this issue. Any progress on this? Thank you!