vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Support ingesting data from named pipes #3874

Open jszwedko opened 3 years ago

jszwedko commented 3 years ago

Current Vector Version

vector 0.11.0 (gbffecc4 x86_64-unknown-linux-gnu 2020-09-14)

Use-cases

(Copied from discord: https://discordapp.com/channels/742820443487993987/746070591097798688/755150918969982986)

The user has a setup where logs are written to a dynamic set of named pipes, which they would like to ingest with Vector.

Attempted Solutions

We originally thought that the file source might be able to handle this, but it appears not, because fingerprinting and checkpointing require seeks (the error below is raised while fingerprinting):

Sep 14 16:03:46.828  INFO vector: Log level "info" is enabled.
Sep 14 16:03:46.835  INFO vector: Loading configs. path=["/tmp/fifo.toml"]
Sep 14 16:03:46.868  INFO vector::topology: Running healthchecks.
Sep 14 16:03:46.868  INFO vector::topology: Starting source "my_source_id"
Sep 14 16:03:46.868  INFO vector::topology::builder: Healthcheck: Passed.
Sep 14 16:03:46.869  INFO vector::topology: Starting sink "console"
Sep 14 16:03:46.869  INFO vector: Vector has started. version="0.11.0" git_version="v0.9.0-677-gbffecc4" released="Mon, 14 Sep 2020 15:46:35 +0000" arch="x86_64"
Sep 14 16:03:46.869  INFO source{name=my_source_id type=file}: vector::sources::file: Starting file server. include=["/tmp/my_pipe"] exclude=[]
Sep 14 16:03:52.868 ERROR source{name=my_source_id type=file}:file_server: vector::internal_events::file: failed reading file for fingerprinting. path="/tmp/my_pipe" error=Os { code: 29, kind: Other, message: "Illegal seek" }

The error is logged when the pipe is written to. Additionally, Vector hangs when shutting down.
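The "Illegal seek" (OS error 29) in the log above is ESPIPE: lseek(2) on a pipe or FIFO always fails, so any code path that seeks in a named pipe will hit it. Below is a minimal sketch that reproduces the error outside Vector. It assumes Linux (where ESPIPE is 29 and opening a FIFO read+write does not block) and a `mkfifo` binary on PATH. The helper names and temp file name are made up for illustration; this is not Vector's fingerprinting code.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Seek, SeekFrom};
use std::path::Path;
use std::process::Command;

/// Create a named pipe by shelling out to `mkfifo` (assumed to be on PATH;
/// real code would more likely call mkfifo(3) through the `libc` crate).
fn make_fifo(path: &Path) -> io::Result<()> {
    if Command::new("mkfifo").arg(path).status()?.success() {
        Ok(())
    } else {
        Err(io::Error::new(io::ErrorKind::Other, "mkfifo failed"))
    }
}

/// Seek to offset 0 in a fresh FIFO and return the raw OS error code, if any.
fn seek_fifo_error_code() -> io::Result<Option<i32>> {
    let path = std::env::temp_dir().join(format!("vector_fifo_demo_{}", std::process::id()));
    let _ = fs::remove_file(&path); // clean up a leftover from a previous run
    make_fifo(&path)?;
    // On Linux, opening a FIFO for read+write succeeds without a peer (fifo(7)).
    let mut fifo = OpenOptions::new().read(true).write(true).open(&path)?;
    let result = fifo.seek(SeekFrom::Start(0)); // lseek(2) on a FIFO -> ESPIPE
    fs::remove_file(&path)?;
    Ok(result.err().and_then(|e| e.raw_os_error()))
}

fn main() -> io::Result<()> {
    println!("seek on FIFO failed with OS error {:?}", seek_fifo_error_code()?);
    Ok(())
}
```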

The config:

[sources.my_source_id]
# General
type = "file" # required
data_dir = "/tmp/vector" # optional, no default
include = ["/tmp/my_pipe"]

[sinks.console]
inputs = ["my_source_id"]
type = "console"
encoding.codec = "json"

Vector can read the pipe through the stdin source, but that only works for one pipe per Vector instance.

Proposal

Not sure! Depending on feasibility, we could extend the file source or the stdin source, or add a new source to handle this.

hanshuebner commented 9 months ago

I'm in need of this as well. It seems that the behavior of the source used for named pipes would be similar to the file descriptor source, except that:

I think this is sufficiently different from both the file and the file descriptor source, so a new "Named Pipe" source might be best.

Would a PR with this have a chance to be merged? Other ideas or suggestions?
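To make the idea of a separate source concrete, here is a minimal sketch of what its core loop might do, assuming newline-delimited events and Unix FIFO semantics. `read_named_pipe` and `demo` are hypothetical names, not Vector APIs, and `max_lines` exists only so the sketch terminates. The demo assumes a `mkfifo` binary on PATH.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, BufRead, BufReader, Write};
use std::path::Path;
use std::process::Command;
use std::thread;

/// Hypothetical core loop of a "named pipe" source (not Vector code).
fn read_named_pipe(path: &Path, max_lines: usize) -> io::Result<Vec<String>> {
    let mut events = Vec::new();
    while events.len() < max_lines {
        // Opening a FIFO read-only blocks until a writer opens the other end.
        let mut reader = BufReader::new(File::open(path)?);
        let mut line = String::new();
        loop {
            line.clear();
            // Strictly sequential reads: no seek, no fingerprint, no checkpoint.
            if reader.read_line(&mut line)? == 0 {
                break; // EOF: every writer closed the pipe; reopen and wait for the next one.
            }
            events.push(line.trim_end_matches('\n').to_string());
            if events.len() == max_lines {
                break;
            }
        }
    }
    Ok(events)
}

/// Demo: create a FIFO, write two lines from another thread, read them back.
fn demo() -> io::Result<Vec<String>> {
    let path = std::env::temp_dir().join(format!("vector_named_pipe_{}", std::process::id()));
    let _ = fs::remove_file(&path);
    if !Command::new("mkfifo").arg(&path).status()?.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "mkfifo failed"));
    }
    let writer_path = path.clone();
    let writer = thread::spawn(move || -> io::Result<()> {
        // Opening for write blocks until the reader has opened the FIFO.
        let mut pipe = OpenOptions::new().write(true).open(&writer_path)?;
        pipe.write_all(b"first\nsecond\n")
        // `pipe` is dropped here, closing the write end.
    });
    let events = read_named_pipe(&path, 2);
    writer.join().expect("writer thread panicked")?;
    fs::remove_file(&path)?;
    events
}

fn main() -> io::Result<()> {
    println!("{:?}", demo()?);
    Ok(())
}
```

Reopening on EOF is the main behavioural difference from the file source: a FIFO has no offset to checkpoint, so the reader simply waits for the next writer. A real source would also have to watch for pipes appearing and disappearing, since the original use case involves a dynamic set of them.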

asymmetric commented 1 month ago

I bumped into this when using the file source to ingest Nomad logs. Nomad creates named pipes, and the file source tries to seek in them.

This blows up at runtime with:

thread 'vector-worker' panicked at lib/file-source/src/file_watcher/mod.rs:124:67:
called `Result::unwrap()` on an `Err` value: Os { code: 29, kind: NotSeekable, message: "Illegal seek" }

I worked around it using the exclude option.
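Instead of listing each pipe in exclude, a file source could in principle skip FIFOs automatically by checking the file type before fingerprinting. Below is a minimal sketch of that check. It assumes Unix (`std::os::unix::fs::FileTypeExt`) and a `mkfifo` binary for the demo. `partition_fifos` and `demo` are hypothetical helpers, not part of Vector.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::FileTypeExt;
use std::path::PathBuf;
use std::process::Command;

/// Split candidate paths into regular files (safe for a seek-based reader)
/// and FIFOs (which would fail with "Illegal seek"). `fs::metadata` follows
/// symlinks and only stats the path, so it never blocks on a FIFO.
fn partition_fifos(paths: &[PathBuf]) -> io::Result<(Vec<PathBuf>, Vec<PathBuf>)> {
    let mut regular = Vec::new();
    let mut fifos = Vec::new();
    for p in paths {
        let ft = fs::metadata(p)?.file_type();
        if ft.is_fifo() {
            fifos.push(p.clone());
        } else if ft.is_file() {
            regular.push(p.clone());
        }
        // Directories, sockets and devices are ignored in this sketch.
    }
    Ok((regular, fifos))
}

/// Demo: one regular log file and one named pipe in a scratch directory.
fn demo() -> io::Result<(Vec<PathBuf>, Vec<PathBuf>)> {
    let dir = std::env::temp_dir().join(format!("vector_fifo_filter_{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let log = dir.join("app.log");
    let pipe = dir.join("app.pipe");
    fs::write(&log, "hello\n")?;
    let _ = fs::remove_file(&pipe);
    if !Command::new("mkfifo").arg(&pipe).status()?.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "mkfifo failed"));
    }
    let result = partition_fifos(&[log, pipe]);
    fs::remove_dir_all(&dir)?;
    result
}

fn main() -> io::Result<()> {
    let (regular, fifos) = demo()?;
    println!("regular: {:?}\nfifos: {:?}", regular, fifos);
    Ok(())
}
```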