open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[processor/resourcedetection], [receiver/dockerstats] Collector cannot query Docker socket in official contrib images #11791

Open mx-psi opened 2 years ago

mx-psi commented 2 years ago

Describe the bug

The docker detector from the resource detection processor and the dockerstats receiver do not work on official opentelemetry-collector-contrib images, or any other image that runs the Collector under a user other than root.

Steps to reproduce

Run the resource detection processor's docker detector or the dockerstats receiver while mounting the /var/run/docker.sock socket:

docker run -v /var/run/docker.sock:/var/run/docker.sock:ro -v <mount config here> otel/opentelemetry-collector-contrib

What did you expect to see?

The Docker detector should add the host machine's host.name and its operating system.

The Docker stats receiver should produce valid metrics.

What did you see instead?

Both components fail because they lack permission to access the Docker socket.

What version did you use?

This can be reproduced on the latest version; it has happened since v0.40.0 (more specifically, since #6380).

What config did you use?

For both components, the default configuration from the README reproduces this; see e.g. the resource detection processor:

processors:
  resourcedetection/docker:
    detectors: [env, docker]
    timeout: 2s
    override: false

Environment

This happens on every Docker version and with every Collector image since v0.40.0.

Additional context

This has happened since #6380 because of a permissions issue: the mounted socket is owned by root and is typically accessible only to root and to members of the host's docker group, so the Collector's non-root user cannot connect to it. AFAICT, Docker does not currently allow mounting volumes with permissions for a specific user (see moby/moby#2259), and we can't chown the socket at build time, so we have to choose between running as a non-root user and supporting this.

This is not a problem on downstream or custom distributions that run as root.

For getting the hostname that the Docker detector would report, a workaround is to override the container's hostname using something like --hostname $(hostname). I don't know of a workaround for getting the host's operating system or for getting metrics from the dockerstats receiver.

mx-psi commented 2 years ago

@open-telemetry/collector-contrib-maintainer I assume we want to keep running under a non-root user. If we don't find a solution that works when not running as root, should the docker detector be deprecated and eventually removed? This would still be useful on downstream distros that run as root, but I don't know if that is a common case.

mx-psi commented 2 years ago

It is possible to override the user by doing docker run -u 0, but I don't feel very comfortable telling people to run as root if our official policy is to run as non-root.

TylerHelmuth commented 2 years ago

pinging @jrcamp @pmm-sumo @Aneurysm9 @dashpole as code owners

dashpole commented 2 years ago

Access to the Docker socket generally grants a very high level of privilege, and I agree that it should not be the recommended configuration. Mounting individual files (e.g. /etc/hostname) seems like a better way to get some of the information you are interested in than fetching it from Docker. I haven't looked into it at all, but I wonder whether the system detector with a few files mounted read-only would work the same as the docker detector.

github-actions[bot] commented 2 years ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

mx-psi commented 2 years ago

The dockerstatsreceiver also queries the Docker socket and thus suffers from the same problem: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/dockerstatsreceiver#configuration

ErvalhouS commented 1 year ago

> The dockerstatsreceiver also queries the Docker socket and thus suffers from the same problem https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/dockerstatsreceiver#configuration

This is also why I'm here.

github-actions[bot] commented 1 year ago

Pinging code owners for receiver/dockerstats: @rmfitzpatrick @jamesmoessis. See Adding Labels via Comments if you do not have permissions to add labels yourself.

mx-psi commented 1 year ago

I think realistically we only have two options here:

I feel like, at least for the dockerstats receiver, option 1 would cause a lot of pain, so I would rather work on option 2.

rmfitzpatrick commented 1 year ago

This is a general Docker concern: the container user needs to be in the host's docker group, i.e. have the socket's group ID as one of its groups:

$ docker run -v /var/run/docker.sock:/var/run/docker.sock:ro --group-add $(stat -c '%g' /var/run/docker.sock) otel/opentelemetry-collector-contrib <...>
# or if specifying the user:group directly
$ docker run -v /var/run/docker.sock:/var/run/docker.sock:ro --user "some.user:$(stat -c '%g' /var/run/docker.sock)" otel/opentelemetry-collector-contrib <...>
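The group lookup in these commands can be wrapped in a helper. This is a hypothetical sketch (socket_gid and collector_docker_args are illustrative names, not part of Docker or the Collector); it uses stat -c as above, which is GNU coreutils syntax, and falls back to stat -f, the BSD/macOS spelling:

```shell
#!/bin/sh
# Hypothetical helpers (not part of the Collector or Docker).

# socket_gid PATH: print the numeric GID that owns PATH. Uses GNU stat -c,
# as in the commands above, and falls back to BSD/macOS stat -f.
socket_gid() {
    [ -e "$1" ] || return 1
    stat -c '%g' "$1" 2>/dev/null || stat -f '%g' "$1"
}

# collector_docker_args PATH: print the docker run flags that mount the
# socket read-only and add its group as a supplementary group.
collector_docker_args() {
    gid=$(socket_gid "$1") || return 1
    printf '%s\n' "-v $1:$1:ro --group-add $gid"
}
```

With these, docker run $(collector_docker_args /var/run/docker.sock) otel/opentelemetry-collector-contrib <...> expands to the first command above. The unquoted substitution relies on word splitting, so it only suits socket paths without whitespace.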

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

gbbr commented 1 year ago

I don't want to make a strong promise, but I am interested in working on this and will try out the proposal above and report back. Hopefully I can reproduce.

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

carlreid commented 1 year ago

@gbbr I see you're making a lot of nice progress relating to the dockerstats receiver, but I also just hit this permission issue when trying to mount the docker.sock. Did you manage to figure something out that works without needing to run as privileged? Especially since we'd like to take advantage of https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/22149.

gbbr commented 1 year ago

@carlreid on this front specifically no. I am not aware of a better method unfortunately.

R-Sommer commented 1 year ago

In docker-compose.yml, group_add can be set to the host's docker group ID, e.g.:

group_add:
  - "998"

The double quotes are necessary; otherwise this error occurs:

* 'group_add[0]' expected type 'string', got unconvertible type 'int', value: '998'

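Using "docker" instead of its ID results in this error, because the group name is looked up in the container image's group file, which has no docker entry: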

Error response from daemon: Unable to find group docker: no matching entries in group file
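The quoting requirement can be handled by generating the override instead of hand-editing it. This is a hypothetical sketch (compose_group_add_override is an illustrative name, and GNU stat is assumed) that always emits the GID as a quoted string:

```shell
#!/bin/sh
# Hypothetical helper (not part of the Collector or Compose): print a Compose
# override that adds the socket's group to one service. The GID is written
# as a quoted YAML string to avoid the "expected type 'string'" error above.
# Assumes GNU stat -c.
compose_group_add_override() {
    service=$1
    sock=${2:-/var/run/docker.sock}
    [ -e "$sock" ] || return 1
    gid=$(stat -c '%g' "$sock") || return 1
    printf 'services:\n  %s:\n    group_add:\n      - "%s"\n' "$service" "$gid"
}
```

For example, compose_group_add_override otelcol > docker-compose.override.yml writes a file that Compose merges with docker-compose.yml by default. The service name otelcol is a placeholder for whatever the Collector service is called.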

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

schewara commented 5 months ago

As already mentioned in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/11791#issuecomment-1369840892, setting the user parameter to the required <UID>:<GID> is definitely the most straightforward solution.

In my opinion, the permission issue is common to all containers that access the Docker socket or other files on the host system, and it is actually a good thing: it forces everyone to take a step back and rethink whether the access is really needed.

I also think this is outside the Collector's scope, as you can never foresee what the runtime environment looks like.

Consider, for example, a scenario where the Docker engine runs in rootless mode: the individual permissions of the user on the host OS will almost certainly break everything all over again.
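As an illustration of how the rootless case differs, here is a hypothetical helper (docker_socket_path is an illustrative name) that resolves the socket to mount from DOCKER_HOST. It assumes the common rootless setup where DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock points at a socket owned by the unprivileged user, so both the mount source and the group to add change:

```shell
#!/bin/sh
# Hypothetical helper (not part of the Collector or Docker): resolve the
# host path of the Docker socket from DOCKER_HOST. Rootful Docker listens on
# /var/run/docker.sock by default; rootless Docker is usually reached via
# DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock, owned by the
# unprivileged user rather than root:docker.
docker_socket_path() {
    case ${DOCKER_HOST:-} in
        "")
            # No DOCKER_HOST: assume the rootful default.
            printf '%s\n' /var/run/docker.sock ;;
        unix://*)
            # Strip the scheme to get the filesystem path to mount.
            printf '%s\n' "${DOCKER_HOST#unix://}" ;;
        *)
            # tcp:// and other schemes have no socket file to mount.
            echo "DOCKER_HOST is not a Unix socket: $DOCKER_HOST" >&2
            return 1 ;;
    esac
}
```

A fixed /var/run/docker.sock mount with a hard-coded GID such as 998 only fits the rootful default; in the rootless case both values would have to come from the path this returns.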

For completeness, here is the snippet from our compose file, which works for us without any issues:

    ...
    volumes: 
      - type: bind
        source: /path/to/otelcol-config.yaml
        target: /etc/otelcol/config.yaml
      - type: bind 
        source: /var/run/docker.sock
        target: /var/run/docker.sock
        read_only: True
    user: 10001:998
    command: ['--config=file:/etc/otelcol/config.yaml']
    ...