open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Docker logs receiver #31597

Closed: aboguszewski-sumo closed this issue 2 weeks ago

aboguszewski-sumo commented 7 months ago

The purpose and use-cases of the new component

There are three types of logs that can be scraped:

  1. Container logs
  2. Daemon logs
  3. Docker events

Currently, Docker container logs can be fetched using the filelog receiver. However, there are cases where you can't access the log files but can access the Docker API (@swiatekm-sumo please elaborate if needed). I'm not aware of any receivers able to scrape daemon logs or events.
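For reference, a minimal filelog receiver sketch for containers using Docker's default json-file logging driver might look like the below; the log path and timestamp layout are assumptions based on the default daemon setup:

filelog:
  include:
    - /var/lib/docker/containers/*/*-json.log
  operators:
    ## each line is a JSON object like {"log":"...","stream":"stdout","time":"..."}
    - type: json_parser
      timestamp:
        parse_from: attributes.time
        layout: '%Y-%m-%dT%H:%M:%S.%LZ'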

Example configuration for the component

This is something worth discussing in this issue. The config below should express what I think we need:

dockerlogs:
    events:
      enabled: true
      filters:
        ## to be discussed, see https://docs.docker.com/engine/api/v1.44/#tag/System/operation/SystemEvents

    container:
      enabled: true
      excluded_containers:
        - "*-nginx"
        - custom-container
      ## possibly more, see https://docs.docker.com/engine/api/v1.44/#tag/Container/operation/ContainerLogs

    daemon:
    ## Let's skip these for now

    ## these are taken directly from dockerstats receiver
    endpoint: unix:///var/run/docker.sock
    collection_interval: 2s
    timeout: 20s
    api_version: 1.24

Where I could, I tried to be consistent with the dockerstats receiver.
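For completeness, here's a minimal sketch of how the proposed receiver could be wired into a logs pipeline (dockerlogs is the name proposed above, not an existing component):

receivers:
  dockerlogs:   ## hypothetical name from this proposal
    endpoint: unix:///var/run/docker.sock

exporters:
  debug: {}

service:
  pipelines:
    logs:
      receivers: [dockerlogs]
      exporters: [debug]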

Telemetry data types supported

Logs, but see also the Additional context section.

Is this a vendor-specific component?

Code Owner(s)

@aboguszewski-sumo and possibly more people from Sumo Logic

Sponsor (optional)

@astencel-sumo

Additional context

There is already an issue regarding Docker events: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/29096. There is also the dockerstats receiver, but it currently scrapes only metrics.

Now, the question is: how should we resolve the potential connection between these three? We can either:

  1. add events and container log collection to the existing dockerstats receiver, or
  2. create a new, separate receiver for logs.

If you look at the issue linked above, we can see that codeowners of dockerstats receiver approved the idea of scraping events.

aboguszewski-sumo commented 7 months ago

I think it's reasonable to ping the codeowners of the dockerstats receiver here: @rmfitzpatrick @jamesmoessis

swiatekm commented 7 months ago

Intuitively, it makes most sense to me to have all signal types handled in the same receiver. This is simpler for both users and maintainers, especially given that the receivers will most likely share a lot of code and dependencies.

On the topic of getting logs from the Docker API vs scraping them from disk, this is a bit similar to the Kubernetes case, but the tradeoffs are significantly different. The performance penalty is much smaller for Docker, because we just read from a (usually local) socket, as opposed to forcing the API Server to do a lot of additional work. There's also a way to enrich Kubernetes telemetry with metadata via the k8sattributes processor, whereas there's no such way for Docker.
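For contrast, the Kubernetes enrichment path mentioned above looks roughly like this (a minimal k8sattributes sketch; the extracted fields are illustrative):

processors:
  k8sattributes:
    extract:
      metadata:
        ## attach pod-level metadata to telemetry passing through the pipeline
        - k8s.pod.name
        - k8s.namespace.name
        - k8s.node.name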

andrzej-stencel commented 7 months ago

I propose to:

  1. Add the option to collect Docker events as logs to the existing Docker Stats receiver.
  2. Possibly also add the option to collect container logs via the same Docker API (sketched below).
  3. Rename the Docker Stats receiver to Docker receiver (breaking change).
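If we went this way, the merged receiver's configuration could look roughly like this; every key under logs is hypothetical and only illustrates the proposal:

docker:
  ## existing dockerstats options would stay as-is
  endpoint: unix:///var/run/docker.sock
  collection_interval: 10s
  ## hypothetical logs section layered on top
  logs:
    events:
      enabled: true
    containers:
      enabled: true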

Here's a related previous issue to collect Docker events: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/29096

github-actions[bot] commented 7 months ago

Pinging code owners for receiver/dockerstats: @rmfitzpatrick @jamesmoessis. See Adding Labels via Comments if you do not have permissions to add labels yourself.

aboguszewski-sumo commented 7 months ago

As for the formal side of the changes, I propose this order:

  1. Add fetching of events and container logs in separate PRs and mark them as experimental
  2. After everything has been added, change the receiver's name (I'm not sure if there are any special procedures for that)
  3. After the name has been changed (or at the same time), change the logs and events status to alpha

Let me know if this is incorrect.

andrzej-stencel commented 7 months ago

I don't think 2. should be blocked by first doing 1.

Renaming a component in a graceful manner is a non-trivial endeavor. We did something similar when changing the logging exporter to debug. In practice, this could perhaps work as follows:

  1. Create a new docker receiver that will share the metrics collection code with the existing docker_stats receiver. Deprecate the docker_stats receiver in favor of the docker receiver.
  2. Add the new code for logs collection (Docker events, container logs - possibly in separate PRs) in the new docker receiver, leaving the old docker_stats receiver's functionality unchanged.
  3. Remove the deprecated docker_stats receiver after a long time, perhaps 6 or 12 months. During the transition, both receivers could run side by side, as sketched below.
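A hypothetical sketch of that transition period, with both receivers configured side by side (the docker receiver does not exist yet):

receivers:
  docker_stats:   ## deprecated, but still working during the transition
    endpoint: unix:///var/run/docker.sock
  docker:         ## hypothetical new receiver sharing the same implementation
    endpoint: unix:///var/run/docker.sock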

github-actions[bot] commented 4 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

aboguszewski-sumo commented 4 months ago

@andrzej-stencel could you remove the stale label?

github-actions[bot] commented 2 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 2 weeks ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.