google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

Label container_label_com_docker_swarm_node_id is the same for different nodes in swarm cluster #3045

Open matjaz99 opened 2 years ago

matjaz99 commented 2 years ago

Hello

I have a Swarm cluster of 3 nodes and I have deployed cAdvisor globally. Then I deployed some services; in my case Elasticsearch with 3 instances (es01, es02 and es03, one instance per node, within the same stack), but it could be any other service as well.

In Prometheus I do receive metrics from all 3 nodes (according to the instance label), but the label container_label_com_docker_swarm_node_id shows the same value regardless of which node/instance the metric originates from, which I think is wrong.
The labels container_label_com_docker_swarm_service_name, container_label_com_docker_swarm_service_id, container_label_com_docker_swarm_task_name and container_label_com_docker_swarm_task_id likewise all show only one of the services from the 3 nodes (es01 in this case).

From the metrics I would incorrectly conclude that service es01 is running on all 3 nodes and that all 3 nodes have the same node_id. In fact there are no metrics for the other two services at all.

My setup is the same on all 3 nodes: CentOS 7 as the OS, Docker 18.09.3, cAdvisor v0.43.0.

I am also attaching a screenshot for a better explanation. The wrong labels are marked in yellow. What confuses me is that the instance label clearly shows that metrics were collected from all 3 nodes, so I would expect to see 3 different node_id values.

What am I doing wrong? Did I miss something?

image

Cheers,
Matjaž

P.S. It would also be nice to have a _swarm_id label in the metrics, so I can correlate which nodes belong to the same Swarm cluster.
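
For now I work around the missing cluster identification on the Prometheus side by stamping every sample with a cluster label via external_labels (see the prometheus.yml further down); a minimal sketch, where devops is just my own cluster name:

global:
  external_labels:
    cluster: devops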

ZealousMacwan commented 2 years ago

@matjaz99 I'm trying to implement what you have already achieved (container metrics from all nodes), but I'm only getting them for the manager node. Could you please share your compose file and prometheus.yml?

matjaz99 commented 2 years ago

Hi @ZealousMacwan

Sorry for the late reply. Here are the compose file and prometheus.yml:

compose.yml

version: '3.6'

networks:
  monitoring_network:
    driver: overlay
    attachable: true

configs:
  prometheus_config:
    file: ./prometheus_config/prometheus.yml
  alert_rules:
    file: ./prometheus_config/alert_rules/alert_rules.yml

services:

  prometheus:
    image: prom/prometheus:v2.31.1
    ports:
      - 9090:9090
    networks:
      - monitoring_network
    command:
      - '--config.file=/prometheus_config/prometheus.yml'
      - '--web.listen-address=:9090'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.external-url=https://prometheus/prometheus/'
      - '--web.route-prefix=/'
      - '--web.enable-lifecycle'
      - '--web.enable-admin-api'
      - '--storage.tsdb.path=/prometheus_data'
      - '--storage.tsdb.retention.time=380d'
      - '--storage.tsdb.retention.size=450GB'
      - '--storage.tsdb.min-block-duration=15m'
      - '--storage.tsdb.max-block-duration=15m'
    volumes:
      - /data/prometheus:/prometheus_data
      - ./prometheus_config/targets:/prometheus_config/targets
      - /etc/hosts:/etc/hosts
    configs:
      - source: prometheus_config
        target: /prometheus_config/prometheus.yml
      - source: alert_rules
        target: /prometheus_config/alert_rules.yml
    user: root
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
        - node.role == manager
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
      labels:
        - "traefik.port=9090"
        - "traefik.backend=prometheus"
        - "traefik.enable=true"
        - "traefik.docker.network=prom_monitoring_network"
        - "traefik.frontend.rule=PathPrefixStrip:/prometheus"
        - "traefik.backend.loadbalancer.sticky=true"
    logging:
      driver: "json-file"
      options:
        max-size: "50m"
        max-file: "3"

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.43.0
    networks:
      - monitoring_network
    ports:
      - 9080:8080
    command: -logtostderr -docker_only
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /:/rootfs:ro
      - /var/run:/var/run
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    deploy:
      mode: global
      resources:
        limits:
          cpus: "0.5"
          memory: 2048M
        reservations:
          cpus: '0.25'
          memory: 64M
      labels:
        - "swarm.cluster.name=devops"

prometheus.yml

global:
  scrape_interval:     15s
  evaluation_interval: 15s

  external_labels:
    cluster: devops

rule_files:
  - alert_rules.yml

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
    metric_relabel_configs:
    - source_labels: [ __name__ ]
      regex: '^go_.*'
      action: drop

  - job_name: 'cadvisor'
    file_sd_configs:
    - files:
      - /prometheus_config/targets/cadvisor_nodes.yml
      refresh_interval: 1m
    metric_relabel_configs:
    - source_labels: [ __name__ ]
      regex: '^go_.*'
      action: drop

Note: I am using the file-based service discovery mechanism (because it is reloaded automatically, without restarting Prometheus).

cadvisor_nodes.yml

- targets:
  - mcrk-docker-1:9080
  - mcrk-docker-2:9080
  - mcrk-docker-3:9080
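
A side note, not part of my actual setup: file_sd target groups can also carry static labels, so each scrape target could be tagged with its node name explicitly (the swarm_node label name below is just my own choice). A minimal sketch of such a targets file:

- targets: ['mcrk-docker-1:9080']
  labels:
    swarm_node: mcrk-docker-1
- targets: ['mcrk-docker-2:9080']
  labels:
    swarm_node: mcrk-docker-2
- targets: ['mcrk-docker-3:9080']
  labels:
    swarm_node: mcrk-docker-3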

Remark: all files are shortened because they contain some company-related stuff which I cannot share here. What is left should be sufficient to reproduce the issue.

SamK commented 10 months ago

Hi, I think I'm having the same issue. Did you manage to fix it somehow?

Edit: So, it's not a bug, it's a feature: you are using the Swarm ingress routing mesh, which makes the published port global. That is why you receive data from another, random node. To force the port to bind on its own node, you must bypass the routing mesh by setting the port "mode" to "host", like this:

     ports:
-      - 9080:8080
+      - mode: host
+        target: 8080
+        published: 9080

If you redeploy over the existing service, the deployment will fail with a bind: address already in use error, so you should docker service rm your service before redeploying (scaling is not possible in global mode anyway).
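
Applied to the compose file above (version '3.6' already supports the long port syntax), the cadvisor service's ports section would then look roughly like this; a sketch of just the changed part, not the full service definition:

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.43.0
    ports:
      - mode: host
        target: 8080
        published: 9080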

life5ign commented 5 months ago

Edit: So: it's not a bug, it's a feature: You are using the [Swarm ingress routing mesh]

@SamK the swarm routing mesh strikes again! Thank you for this brilliant observation; this just fixed the "data from a random node" issue that was plaguing me and my dashboards (and sanity).