google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

whitelisted docker container labels not visible on prometheus #2570

Closed pradykaushik closed 4 years ago

pradykaushik commented 4 years ago

I am using Prometheus to monitor container-specific metrics, with cAdvisor as the exporter. First of all, I would like to say that setting up cAdvisor has been super easy (kudos to the developers!! 👍) and the documentation has been helpful. I am having a slight issue when it comes to exporting whitelisted container labels as Prometheus labels.

The Docker containers running on my cluster have the following labels assigned to them (an illustrative example of how they are attached follows the list):

  1. electron_task_id=<mesos task id>
  2. electron_task_hostname=<host on which task is running>
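
For illustration only (this is not the actual command Mesos generates; the image name is a placeholder), labels like these are attached when a container is started:

docker run -d \
    --label electron_task_id=<mesos task id> \
    --label electron_task_hostname=<host on which task is running> \
    <image>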

I am running both Prometheus and cAdvisor standalone, with the following configs for each.

Prometheus systemd file

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target

Prometheus config yaml

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'cadvisor'
    scrape_interval: 5s
    static_configs:
      - targets: ['<ip>:9090']

cAdvisor systemd file

[Unit]
Description=cAdvisor
Wants=network-online.target
After=network-online.target docker.service

[Service]
User=cadvisor
Group=cadvisor
Type=simple
ExecStart=/usr/local/bin/cadvisor -port 9090 -store_container_labels=false -whitelisted_container_labels=electron_task_hostname,electron_task_id

[Install]
WantedBy=multi-user.target

Once I start running docker containers on the machine running cadvisor, I am not able to view any metric with the labels electron_task_hostname and electron_task_id.

Is there something that I'm missing here? Any insight would be much appreciated.

dashpole commented 4 years ago

Can you just do a docker inspect to make sure the labels are on the docker container?
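
For instance, something along these lines prints just the labels (the container ID/name is a placeholder):

docker inspect --format '{{ json .Config.Labels }}' <container-id-or-name>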

For reference, the code that does the whitelisting is here: https://github.com/google/cadvisor/blob/master/metrics/prometheus.go#L1667
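
For anyone skimming, the gist is roughly the following. This is a simplified sketch, not the actual cAdvisor source (the function and variable names here are made up): only whitelisted label keys are kept, and each key is sanitized and given a prefix before being exported as a Prometheus label.

// Simplified sketch of container-label whitelisting (not the real cAdvisor code).
package main

import (
	"fmt"
	"regexp"
)

var invalidLabelChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// whitelistedLabels keeps only the whitelisted container labels and turns each
// key into a Prometheus-safe label name with a "container_label_" prefix.
func whitelistedLabels(containerLabels map[string]string, whitelist []string) map[string]string {
	out := make(map[string]string)
	for _, key := range whitelist {
		if value, ok := containerLabels[key]; ok {
			out["container_label_"+invalidLabelChars.ReplaceAllString(key, "_")] = value
		}
	}
	return out
}

func main() {
	labels := map[string]string{
		"electron_task_id":       "electron-h2-7-609e2041-9e22-482f-a5f1-659f42e0cafe",
		"electron_task_hostname": "stratos-001",
	}
	fmt.Println(whitelistedLabels(labels, []string{"electron_task_hostname", "electron_task_id"}))
}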

Also, what version of cAdvisor?

pradykaushik commented 4 years ago

docker inspect on one of the running containers shows me the labels under the Config element.

"Labels": {
                "electron_task_hostname": "<hostname>",
                "electron_task_id": "electron-h2-7-609e2041-9e22-482f-a5f1-659f42e0cafe"
}

cAdvisor version = v0.36.0.223

I also tried not whitelisting and just leaving store_container_labels=true, which is the default, and that doesn't work either.

dashpole commented 4 years ago

Also, we add the prefix "container_label_" to container labels, to ensure they don't collide with common Prometheus labels (e.g. name or image). Can you see if there is a label container_label_electron_task_hostname?

pradykaushik commented 4 years ago

I ran the query container_cpu_load_average_10s{container_label_electron_task_hostname!=""} and that tells me that there is no data.

dashpole commented 4 years ago

That metric might not exist if you don't have --enable_load_reader set. Can you try container_cpu_usage_seconds_total?

pradykaushik commented 4 years ago

query: container_cpu_usage_seconds_total{container_label_electron_task_hostname!=""} result: No data.

On the host machine, docker ps shows me two containers running, and inspecting them I can see the labels.

dashpole commented 4 years ago

can you curl localhost:9090/metrics to check what is coming from cAdvisor? It may be easier than trying to write promql

pradykaushik commented 4 years ago

It's a huge file, so I just ran curl localhost:9090/metrics | grep container_label_electron_task_hostname and it came back empty. Note that here localhost is the host on which cAdvisor is running (a different host from the one running Prometheus).
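
For reference, if the labels were being exported, the grep would match lines shaped roughly like this (the metric value and the exact set of accompanying labels are made up here and vary by cAdvisor version):

container_cpu_usage_seconds_total{container_label_electron_task_hostname="stratos-001",container_label_electron_task_id="electron-h2-7-609e2041-9e22-482f-a5f1-659f42e0cafe",id="/docker/4a4b4063667c...",image="<image>",name="<container name>"} 12.34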

Prometheus is running on one of the mesos master nodes. cAdvisor is running on mesos agents.

dashpole commented 4 years ago

can you look at one metric, and post the labels that it has?

One other possibility is that it isn't able to connect to the container runtime.

dashpole commented 4 years ago

can you also grab the logs for cadvisor?

pradykaushik commented 4 years ago

sure.

Here are the logs from sudo journalctl -a -u cadvisor:

Jun 03 15:56:09 stratos-001 cadvisor[5531]: W0603 15:56:09.982693    5531 manager.go:1140] Failed to process watch event {EventType:0 Name:/docker/4a4b4063667c2d74b46fbe02a9c8dc7568f28347a30ba370c7f937e: failed to identify the read-write layer ID for container
Jun 03 15:56:13 stratos-001 cadvisor[5531]: W0603 15:56:13.129789    5531 manager.go:1140] Failed to process watch event {EventType:0 Name:/docker/e9e95485960a088cf6095f3379f9b2007e3ef8a670191ba1b03a78c: failed to identify the read-write layer ID for container
Jun 03 15:56:14 stratos-001 cadvisor[5531]: W0603 15:56:14.854820    5531 manager.go:1140] Failed to process watch event {EventType:0 Name:/docker/50c9f320cb57c1b6f740a0bf35f6d8b9fe951fc7bf43dbc3f15b78c: failed to identify the read-write layer ID for container
Jun 03 15:56:16 stratos-001 cadvisor[5531]: W0603 15:56:16.169959    5531 manager.go:1140] Failed to process watch event {EventType:0 Name:/docker/27539e8905b833852a6b0a2a91d82a708ef2ca6b55f5b5db558c7c4: failed to identify the read-write layer ID for container
Jun 03 15:56:19 stratos-001 cadvisor[5531]: W0603 15:56:19.010615    5531 manager.go:1140] Failed to process watch event {EventType:0 Name:/docker/59fed89c8e5b3cadd8cf412631f835a4f40b63938b0abb9d2479cce: failed to identify the read-write layer ID for container
Jun 03 15:56:20 stratos-001 cadvisor[5531]: W0603 15:56:20.337683    5531 manager.go:1140] Failed to process watch event {EventType:0 Name:/docker/3f7b4be6b3f3edc33c8118a9bf7a7bc1c241902086fbf0b330b59c1: failed to identify the read-write layer ID for container

Seems to be a permissions issue, although the cadvisor user has already been added to the docker group. Is there anything else I might need to do to increase privileges?
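
For what it's worth, membership in the docker group only grants access to the Docker API socket; the on-disk state that cAdvisor presumably needs for the "read-write layer ID" lookup lives under /var/lib/docker, which is readable only by root by default. A quick way to check (the path is the stock Docker data directory; adjust if yours differs):

sudo -u cadvisor ls /var/lib/docker    # "Permission denied" here would explain the warnings above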

pradykaushik commented 4 years ago

I changed the systemd file to run cAdvisor with root privileges and that fixed the issue. Thanks a ton @dashpole.
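
Concretely, dropping the User= and Group= lines (so that systemd runs the service as root) is one way to do this; a sketch of such a unit, assuming the same flags as above:

[Unit]
Description=cAdvisor
Wants=network-online.target
After=network-online.target docker.service

[Service]
# No User=/Group= lines: systemd defaults to running the service as root,
# giving cAdvisor access to the Docker state it could not read before.
Type=simple
ExecStart=/usr/local/bin/cadvisor -port 9090 -store_container_labels=false -whitelisted_container_labels=electron_task_hostname,electron_task_id

[Install]
WantedBy=multi-user.target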

dashpole commented 4 years ago

Glad to hear you got it working.