open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[receiver/hostmetrics] filesystem scraper doesn't respect root_path #35990

Closed povilasv closed 1 month ago

povilasv commented 1 month ago

Component(s)

receiver/hostmetrics

What happened?

Description

It looks like https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/35504 introduced a regression.

The filesystem scraper now ignores root_path and tries to open /mounts directly, even though root_path is set. Users with root_path configured will start getting these errors, and no filesystem metrics:

2024-10-25T08:45:13.299+0300    error   scraperhelper/scrapercontroller.go:205  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "data_type": "metrics", "error": "open /mounts: no such file or directory", "scraper": "filesystem"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
        go.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:205
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
        go.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:177

IMO this is an important regression, as we use this in the opentelemetry-helm-charts hostMetrics preset: https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/templates/_config.tpl#L71-L88

Steps to Reproduce

receivers:
  hostmetrics:
    root_path: /tmp
    collection_interval: 30s
    scrapers:
      cpu:
        metrics:
          system.cpu.time:
            enabled: true
      disk:
        metrics:
          system.disk.io:
            enabled: true
          system.disk.operations:
            enabled: true
      filesystem:
        metrics:
          system.filesystem.usage:
            enabled: true
      load:
        metrics:
          system.cpu.load_average.1m:
            enabled: true
          system.cpu.load_average.5m:
            enabled: true
          system.cpu.load_average.15m:
            enabled: true
      memory:
        metrics:
          system.memory.usage:
            enabled: true
      network:
        metrics:
          system.network.connections:
            enabled: true
          system.network.io:
            enabled: true
      paging:
        metrics:
          system.paging.operations:
            enabled: true
          system.paging.usage:
            enabled: true
      process:
        metrics:
          process.cpu.time:
            enabled: true
          process.memory.usage:
            enabled: true

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [batch]
      exporters: [debug]

  telemetry:
    logs:
      level: "debug"

You will get:

2024-10-25T09:03:56.691+0300    error   scraperhelper/scrapercontroller.go:205  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "data_type": "metrics", "error": "open /mounts: no such file or directory", "scraper": "filesystem"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
        go.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:205
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
        go.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:177

although it should try to open /tmp/mounts
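The expected root_path behavior is essentially a prefix join on every absolute host path the scraper opens. A minimal sketch of that expectation (not the receiver's actual code):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// hostPath mimics the expected root_path behavior: every absolute host
// path the scraper opens should be prefixed with the configured root_path,
// so root_path=/tmp and /mounts should resolve to /tmp/mounts.
func hostPath(rootPath, path string) string {
	return filepath.Join(rootPath, path)
}

func main() {
	fmt.Println(hostPath("/tmp", "/mounts")) // /tmp/mounts
}
```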

Expected Result

The scraper resolves host paths under root_path, e.g. it opens /tmp/mounts.

Actual Result

The scraper opens /mounts directly and fails with "no such file or directory", producing no filesystem metrics.

Collector version

v0.112.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

github-actions[bot] commented 1 month ago

Pinging code owners:

TylerHelmuth commented 1 month ago

@povilasv is there a workaround?

povilasv commented 1 month ago

The workaround is to manually set the env var:

HOST_PROC_MOUNTINFO=/hostfs/proc/1
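In a Kubernetes deployment (such as the helm chart preset mentioned above), this maps to an env entry on the collector container; the value assumes the host root filesystem is mounted at /hostfs:

```yaml
env:
  - name: HOST_PROC_MOUNTINFO
    value: /hostfs/proc/1
```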

marcelaraujo commented 1 month ago

@povilasv, I am seeing the same issue after upgrading to the latest version.

{"level":"error","ts":1729866966.393247,"caller":"scraperhelper/scrapercontroller.go:205","msg":"Error scraping metrics","kind":"receiver","name":"hostmetrics","data_type":"metrics","error":"open /mounts: no such file or directory","scraper":"hostmetrics","stacktrace":"go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport\n\tgo.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:205\ngo.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1\n\tgo.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:181"}
{"level":"error","ts":1729866996.3939178,"caller":"scraperhelper/scrapercontroller.go:205","msg":"Error scraping metrics","kind":"receiver","name":"hostmetrics","data_type":"metrics","error":"open /mounts: no such file or directory","scraper":"hostmetrics","stacktrace":"go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport\n\tgo.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:205\ngo.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1\n\tgo.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:181"}
{"level":"error","ts":1729867026.3932698,"caller":"scraperhelper/scrapercontroller.go:205","msg":"Error scraping metrics","kind":"receiver","name":"hostmetrics","data_type":"metrics","error":"open /mounts: no such file or directory","scraper":"hostmetrics","stacktrace":"go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport\n\tgo.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:205\ngo.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1\n\tgo.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:181"}
povilasv commented 1 month ago

@marcelaraujo try setting the environment variable to HOST_PROC_MOUNTINFO=/hostfs/proc/1

It should work around this issue.

marcelaraujo commented 1 month ago

Hi @povilasv

It didn't work.

What should be the values for this case?

env:
   - name: HOST_PROC_MOUNTINFO
     value: /proc/1/self
volumes:
   - name: hostfs
     hostPath:
        path: /
volumeMounts:
   - name: hostfs
     mountPath: /hostfs
     readOnly: true
     mountPropagation: HostToContainer
config:
   receivers:
      hostmetrics:
          root_path: /hosts

When I tried your environment variable, I got a different error:

 {"level":"error","ts":1729870334.3869734,"caller":"scraperhelper/scrapercontroller.go:205","msg":"Error scraping metrics","kind":"receiver","name":"hostmetrics","data_type":"metrics","error":"failed to read usage at /hostfs/conf: no such file or directory; failed to read usage at /hostfs/hostfs/run/containerd/io.containerd.runtime.v2.task/k8s.io/b9d436f09506a98c09ba162dc17d2692f70430adf0601dfc1cd2f676c0253b80/rootfs/conf: no such file or directory","scraper":"hostmetrics","stacktrace":"go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport\n\tgo.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:205\ngo.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1\n\tgo.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:177"}
atoulme commented 1 month ago

Looking at this issue and going to attempt to reproduce; the environment variable is used in two places, and setting it as a workaround might not be the fix.

atoulme commented 1 month ago

Using your config file, on Mac, trying to reproduce:

docker run --rm -it -v $(pwd)/config.yaml:/etc/otelcol-contrib/config.yaml -v /:/tmp otel/opentelemetry-collector-contrib:latest

I don't see the error reported. I will try to reproduce on a Linux VM next.

atoulme commented 1 month ago

Reproducing on Linux now, taking it further.

atoulme commented 1 month ago

Adding -e HOST_PROC_MOUNTINFO=/tmp/proc/1 doesn't fix the issue.

2024-10-25T16:20:16.489Z    error   scraperhelper/scrapercontroller.go:205  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "data_type": "metrics", "error": "failed to read usage at /tmp/tmp/boot: no such file or directory; failed to read usage at /tmp/tmp/boot/efi: no such file or directory; failed to read usage at /tmp/tmp/snap/snapd/21759: no such file or directory; failed to read usage at /tmp/tmp/snap/amazon-ssm-agent/7993: no such file or directory; failed to read usage at /tmp/tmp/snap/core18/2829: no such file or directory; failed to read usage at /tmp/tmp/snap/core22/1621: no such file or directory; failed to read usage at /tmp/tmp/snap/amazon-ssm-agent/9565: no such file or directory; failed to read usage at /tmp/tmp/snap/core18/2846: no such file or directory; failed to read usage at /tmp/tmp/snap/snapd/22991: no such file or directory; failed to read usage at /tmp/tmp/snap/core22/1663: no such file or directory; failed to read usage at /tmp/etc/otelcol-contrib/config.yaml: no such file or directory", "scraper": "hostmetrics"}
atoulme commented 1 month ago

Using -e HOST_PROC_MOUNTINFO="" fixes the issue, restoring the correct default for this env var. The fix is to not set a default value for HOST_PROC_MOUNTINFO in the envMap at all, as any non-empty value will skew path resolution. I will open a PR with the fix shortly.

atoulme commented 1 month ago

https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/36000 fixes this.

I am going to try and revive https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/32536 to make sure we have this fix tested.

marcelaraujo commented 1 month ago

@atoulme Confirming the suggestion worked.