open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[receiver/hostmetrics] Network info from within container despite mounting /hostfs #34400

Open dhilgarth opened 1 month ago

dhilgarth commented 1 month ago

Component(s)

receiver/hostmetrics

What happened?

Description

My otel collector is running in a container. I've followed the documentation to ensure that it is actually monitoring the host and not the container. And this works for everything except network.
My host has the interface enp7s0 and no interface eth0 or eth1.
Yet I get data for eth0 and eth1 (as you can see below, the container is in two networks) and not for enp7s0.

Collector version

0.106.1

Environment information

Environment

OS: "Ubuntu 22.04"
Docker Swarm: "Docker version 24.0.7, build 24.0.7-0ubuntu2~22.04.1"

OpenTelemetry Collector configuration

receivers:
  hostmetrics:
    collection_interval: 15s
    root_path: /hostfs
    scrapers:
      cpu:
        metrics:
          system.cpu.logical.count:
            enabled: true
          system.cpu.physical.count:
            enabled: true
          system.cpu.frequency:
            enabled: true
          system.cpu.utilization:
            enabled: true
      load: {}
      memory:
        metrics:
          system.linux.memory.available:
            enabled: true
          system.memory.limit:
            enabled: true
          system.memory.utilization:
            enabled: true
      disk: {}
      filesystem:
        metrics:
          system.filesystem.utilization:
            enabled: true
      paging:
        metrics:
          system.paging.utilization:
            enabled: true
          system.paging.usage:
            enabled: true
      network: {}
      process:
        mute_process_io_error: true
        mute_process_exe_error: true
        mute_process_user_error: true
        metrics:
          process.cpu.utilization:
            enabled: true
          process.memory.utilization:
            enabled: true
          process.disk.io:
            enabled: true
          process.disk.operations:
            enabled: true
          process.threads:
            enabled: true
          process.paging.faults:
            enabled: true

Log output

No response

Additional context

This is the Docker Swarm Service definition:

docker service inspect --pretty otel-collector

ID:             n38j4io4frootqte5d7iwbft1
Name:           otel-collector
Labels:
 com.docker.stack.image=otel/opentelemetry-collector-contrib:0.106.1
 com.docker.stack.namespace=monitoring
Service Mode:   Global
Placement:
 Constraints:   [node.platform.os == linux]
ContainerSpec:
 Image:         otel/opentelemetry-collector-contrib:0.106.1
 Args:          --config=/etc/otel-collector-config.yaml --feature-gates=-pkg.translator.prometheus.NormalizeName
 Env:           OTEL_RESOURCE_ATTRIBUTES=host.name={{.Node.Hostname}},os.type={{.Node.Platform.OS}},dockerswarm.service.name={{.Service.Name}},dockerswarm.task.name={{.Task.Name}}
 User: 0
Mounts:
 Target:        /hostfs
  Source:       /
  ReadOnly:     true
  Type:         bind
 Target:        /var/run/docker.sock
  Source:       /var/run/docker.sock
  ReadOnly:     true
  Type:         bind
 Target:        /var/lib/docker/containers
  Source:       /var/lib/docker/containers
  ReadOnly:     true
  Type:         bind
Configs:
 Target:        /etc/otel-collector-config.yaml
  Source:       otel-collector-config_XRNDChoHzH_Dsg
Resources:
Networks: web_new monitoring_new
Endpoint Mode:  vip

ChrsMark commented 1 month ago

This is only a guess, but what about setting hostNetwork to true on the Collector Pod?

dhilgarth commented 1 month ago

I'm on Docker Swarm, not on Kubernetes. My understanding is that a container can either use the host network or be in overlay networks.
I need the container to be in overlay networks to be able to talk to the final destinations for the collected data.

rogercoll commented 2 weeks ago

I think @ChrsMark's comment was in the right direction. In Docker, the equivalent of the hostNetwork parameter is simply called network (the --network=host flag).

In Linux environments, the hostmetrics receiver reads from whatever proc filesystem it is given; the network interface stats specifically are read from the /proc/net/dev file. The key to your issue is that proc is a pseudo-filesystem: it doesn't exist on disk but is generated dynamically by the kernel. As a result, the network files reflect the network namespace of the process reading them, even when the path is bind-mounted from the host.
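To make the file layout concrete, here is a minimal Python sketch that parses /proc/net/dev content into per-interface byte counters. It is only an illustration and not the receiver's actual implementation; `parse_net_dev` and `SAMPLE` are hypothetical names, and the sample text is the container-side output shown further down in this comment.

```python
from typing import Dict, Tuple


def parse_net_dev(text: str) -> Dict[str, Tuple[int, int]]:
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev content."""
    stats: Dict[str, Tuple[int, int]] = {}
    # The first two lines are column headers, not interfaces.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        fields = counters.split()
        if len(fields) < 16:
            continue  # skip malformed or truncated rows
        # 8 receive columns come first, then 8 transmit columns,
        # so field 0 is received bytes and field 8 is transmitted bytes.
        stats[name.strip()] = (int(fields[0]), int(fields[8]))
    return stats


# Hypothetical sample: the view from a container without --network=host.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
  eth0:    2058      13    0    0    0     0          0         0        0       0    0    0    0     0       0          0
"""

if __name__ == "__main__":
    for iface, (rx, tx) in parse_net_dev(SAMPLE).items():
        print(f"{iface}: rx_bytes={rx} tx_bytes={tx}")
```

Whatever interfaces appear in this file are the only ones a reader can report, which is why the namespace the file is generated for matters more than the path it is read from.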

Reading network stats without the host's network:

$ docker run -it --user root -v /proc/net/dev:/host-proc-net ubuntu

root@ab671aece539:/# cat host-proc-net
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
  eth0:    2058      13    0    0    0     0          0         0        0       0    0    0    0     0       0          0

Reading network stats with the host's network (same network as the host):

$ docker run -it --user root --network=host -v /proc/net/dev:/host-proc-net ubuntu

root@neck-21fvs0qh00:/# cat host-proc-net
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1347670    8939    0    0    0     0          0         0  1347670    8939    0    0    0     0       0          0
enp72:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
 wlan0: 34873370   47887    0    0    0     0          0         0 12566182   34445    0    0    0     0       0          0
docker0:       0       0    0    0    0     0          0         0     3988      23    0    0    0     0       0          0
br-ef339b805f93:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0

I reckon this is not an issue with the receiver: the host's network files it expects are simply not available without the --network=host flag.
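As a quick diagnostic, the two outputs above can be compared to tell which network namespace a given /proc/net/dev view belongs to. The following is a rough Python sketch under stated assumptions: `interface_names` and `sees_host_namespace` are hypothetical helpers, the sample strings are shortened copies of the outputs above, and the interface to look for (here wlan0) has to be one you know exists only on the host.

```python
from typing import Set

HEADER = (
    "Inter-|   Receive                                                |  Transmit\n"
    " face |bytes    packets errs drop fifo frame compressed multicast"
    "|bytes    packets errs drop fifo colls carrier compressed\n"
)

# Shortened copies of the two views shown above (hypothetical variable names).
CONTAINER_VIEW = HEADER + (
    "    lo:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0\n"
    "  eth0:    2058      13    0    0    0     0          0         0        0       0    0    0    0     0       0          0\n"
)
HOST_VIEW = HEADER + (
    "    lo: 1347670    8939    0    0    0     0          0         0  1347670    8939    0    0    0     0       0          0\n"
    " wlan0: 34873370   47887    0    0    0     0          0         0 12566182   34445    0    0    0     0       0          0\n"
    "docker0:       0       0    0    0    0     0          0         0     3988      23    0    0    0     0       0          0\n"
)


def interface_names(net_dev_text: str) -> Set[str]:
    """Return the interface names listed in /proc/net/dev content."""
    names: Set[str] = set()
    for line in net_dev_text.splitlines()[2:]:  # skip the two header lines
        if ":" in line:
            names.add(line.split(":", 1)[0].strip())
    return names


def sees_host_namespace(net_dev_text: str, expected_host_iface: str) -> bool:
    """True if an interface known to exist only on the host is visible."""
    return expected_host_iface in interface_names(net_dev_text)


if __name__ == "__main__":
    print("container view sees host:", sees_host_namespace(CONTAINER_VIEW, "wlan0"))
    print("host view sees host:", sees_host_namespace(HOST_VIEW, "wlan0"))
    print("host-only interfaces:",
          sorted(interface_names(HOST_VIEW) - interface_names(CONTAINER_VIEW)))
```

Running the same check against the file the collector actually reads (for example under the configured root_path) would show whether it is looking at the host's interfaces or the container's.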

@dhilgarth Would it work to collect the hostmetrics in a sidecar container that runs with the host network and forwards them to the exit collector (OTLP receiver)?