open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Docker resource detection fails after system reboot #34761

Open litetex opened 2 months ago

litetex commented 2 months ago

Component(s)

processor/resourcedetection, processor/resourcedetection/internal/docker

What happened?

Description

I'm running the collector in a container with the resourcedetection processor's docker detector enabled.

The corresponding config looks like this:

processors:
  resourcedetection:
    detectors: [docker]

After a system restart, the resource detector loses its functionality (the labels from the Docker host are no longer picked up) and the following message shows up in the log:

warn    internal/resourcedetection.go:130       failed to detect resource       {"kind": "processor", "name": "resourcedetection", "pipeline": "logs", "error": "failed getting OS type: failed to fetch Docker OS type: Get \"http://%2Fvar%2Frun%2Fdocker.sock/v1.46/info\": context deadline exceeded"}

This is likely because the Docker daemon is still starting up when the collector container queries it, so the request to the Docker socket times out.

Possible fixes:

It would be great to also make this configurable.

Collector version

v0.107.0

Environment information

Environment

Ubuntu 24.04 LTS, Docker CE 27.1.2

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

github-actions[bot] commented 2 months ago

Pinging code owners:

rogercoll commented 2 months ago

Crash the container when this happens so that it can get restarted automatically due to the restart policy

Maybe a new error_mode configuration option, like the one in the transform processor, would help: when set to propagate, it would return any detector error and crash the collector.

rogercoll commented 2 months ago

Another viable solution would be what is being proposed here: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34876