vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0
17.56k stars 1.54k forks source link

Verify `docker_logs` support for podman #6807

Open otisg opened 3 years ago

otisg commented 3 years ago

Current Vector Version

0.12

Use-cases

@jszwedko mentioned this in Discord on March 5, 2021: _I'm not personally familiar with podman, but it seems like it exposes a docker compatible API: https://docs.podman.io/en/latest/_static/api.html . If that is true, it might Just Work with the existing docker integrations_

Trying to use Vector 0.12 with Podman, but:

thread 'tokio-runtime-worker' panicked at 'called Option::unwrap() on a None value', src/sour ces/docker_logs.rs:472:55 note: run with RUST_BACKTRACE=1 environment variable to display a backtrace Mar 16 15:43:20.377 ERROR source{component_kind="source" component_name=docker component_type=doc ker_logs}: vector::topology: An error occurred that vector couldn't handle.

So there might be some differences in the protocol after all...

Attempted Solutions

Proposal

References

jszwedko commented 1 year ago

Reusing this issue to validate Vector's support for podman. Resolving it should include documentation and tests.

sushain97 commented 11 months ago

I tested this briefly and while it didn't crash, it certainly didn't work either. It appears to just be constantly starting and stopping the log stream:

2023-10-17T07:28:00.242771Z  INFO source{component_kind="source" component_id=docker_logs component_type=docker_logs component_name=docker_logs}: vector::internal_events::docker_logs: Started watching for container logs. container_id=8285392c0d45719cd78adc08b63f595bd71ad20a0a20ecf8a94bfb7377354cf4
2023-10-17T07:28:00.276397Z  INFO source{component_kind="source" component_id=docker_logs component_type=docker_logs component_name=docker_logs}: vector::internal_events::docker_logs: Stopped watching for container logs. container_id=9ad5803896ac91f9143f9c634632633001a9d37170f852173f3e24d69de4dabb
2023-10-17T07:28:00.276545Z  INFO source{component_kind="source" component_id=docker_logs component_type=docker_logs component_name=docker_logs}: vector::internal_events::docker_logs: Started watching for container logs. container_id=9ad5803896ac91f9143f9c634632633001a9d37170f852173f3e24d69de4dabb
2023-10-17T07:28:00.278314Z  INFO source{component_kind="source" component_id=docker_logs component_type=docker_logs component_name=docker_logs}: vector::internal_events::docker_logs: Stopped watching for container logs. container_id=0fe024f8327bee8e5ce97eb62bc0b4914fe29597f9da01152d8936a7015c4611
2023-10-17T07:28:00.278382Z  INFO source{component_kind="source" component_id=docker_logs component_type=docker_logs component_name=docker_logs}: vector::internal_events::docker_logs: Started watching for container logs. container_id=0fe024f8327bee8e5ce97eb62bc0b4914fe29597f9da01152d8936a7015c4611
2023-10-17T07:28:00.303191Z  INFO source{component_kind="source" component_id=docker_logs component_type=docker_logs component_name=docker_logs}: vector::internal_events::docker_logs: Stopped watching for container logs. container_id=9ad5803896ac91f9143f9c634632633001a9d37170f852173f3e24d69de4dabb
2023-10-17T07:28:00.303319Z  INFO source{component_kind="source" component_id=docker_logs component_type=docker_logs component_name=docker_logs}: vector::internal_events::docker_logs: Started watching for container logs. container_id=9ad5803896ac91f9143f9c634632633001a9d37170f852173f3e24d69de4dabb
2023-10-17T07:28:00.306309Z  INFO source{component_kind="source" component_id=docker_logs component_type=docker_logs component_name=docker_logs}: vector::internal_events::docker_logs: Stopped watching for container logs. container_id=8285392c0d45719cd78adc08b63f595bd71ad20a0a20ecf8a94bfb7377354cf4

Config:

[sources.docker_logs]
type = "docker_logs"
docker_host = "unix:///var/run/podman/podman.sock"

My guess is that there's a subtle incompatibility between the Docker and Podman log APIs.

tinchopu commented 5 months ago

Yes I also observed a significant increase in load generated on the podman process as soon as I started to read the logs with vector Config

sources:
  docker_host:
    type: docker_logs
    docker_host: "/run/user/1000/podman/podman.sock"
    exclude_containers:
      - opensearch
      - blackbox-exporter
      - influxdb
      - influxdb-temp

System load / podman process when vector is running:

Bildschirmfoto 2024-04-24 um 16 04 08

Here how it changes when I exit vector:

Bildschirmfoto 2024-04-24 um 16 19 35
tinchopu commented 5 months ago

Ok update from my side: the issue was podman version 4.3.1. Updating to 5.1.1 (build from source) fixed the issue

jjzxcd commented 2 weeks ago

I tested Vector 0.41.0 with Podman 5.2.1. It works if Podman generates system events and container logs within 2 minutes after Vector starts. Otherwise, Vector returns timeout errors and stops watching for events and container logs:

vector[19350]: 2024-08-21T17:05:19.072757Z  INFO source{component_kind="source" component_id=src_podman component_type=docker_logs}: vector::sources::docker_logs: Capturing logs from now on. now=2024-08-21T17:05:19.072742638+00:00

vector[19350]: 2024-08-21T17:05:19.072925Z  INFO source{component_kind="source" component_id=src_podman component_type=docker_logs}: vector::sources::docker_logs: Listening to docker log events.

vector[19350]: 2024-08-21T17:05:19.114233Z  INFO source{component_kind="source" component_id=src_podman component_type=docker_logs}: vector::internal_events::docker_logs: Started watching for container logs. container_id=6c7b2459bcb71617e192003e88955527b738bb20dc4b6b5cbbc798112b45b357

vector[19350]: 2024-08-21T17:07:19.098385Z ERROR source{component_kind="source" component_id=src_podman component_type=docker_logs}: vector::internal_events::docker_logs: Error in communication with Docker daemon. error=RequestTimeoutError error_type="connection_failed" stage="receiving" container_id=None internal_log_rate_limit=true

vector[19350]: 2024-08-21T17:07:19.115508Z ERROR source{component_kind="source" component_id=src_podman component_type=docker_logs}: vector::internal_events::docker_logs: Internal log [Error in communication with Docker daemon.] is being suppressed to avoid flooding.

vector[19350]: 2024-08-21T17:07:19.115597Z  INFO source{component_kind="source" component_id=src_podman component_type=docker_logs}: vector::internal_events::docker_logs: Stopped watching for container logs. container_id=6c7b2459bcb71617e192003e88955527b738bb20dc4b6b5cbbc798112b45b357

It appears that the docker_logs source in Vector uses a hardcoded DEFAULT_TIMEOUT for requests. If no response is received within this default timeout period while querying events and logs, Vector will return an error and stop watching for container logs without further retries. With the Docker API, everything works fine because it immediately sends an HTTP response status code when a stream is started, even if no data is available yet. However, Podman delays sending the response until some events or logs are actually generated, which can lead to issues if the timeout expires before any data is available. Since there is definitely a difference in the behavior of the Docker API and the Podman compat API for the logs and events endpoints, I opened https://github.com/containers/podman/issues/23712

Config:

sources:
  src_podman:
    type: docker_logs
    docker_host: unix:///run/podman/podman.sock