Open jszwedko opened 2 years ago
There is an implementation of batch handling in the file source here: https://github.com/vectordotdev/vector/pull/11667
I currently work around this using stdin and a somewhat excessive cat:
(cat logfile; kill -s TERM 0) | ./vector
This lets me run a metrics source concurrently while processing the logfile, and exits vector once the logfile has been completely handled. If you do this inside a bash script, you need to set -m first.
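For anyone who would rather not rely on shell job control, the same idea can be sketched in Python: stream the file to the child's stdin, then send SIGTERM once everything has been written. This is only a rough sketch. The function name `feed_then_terminate` and the `grace_secs` parameter are made up for illustration, and unlike `kill -s TERM 0` it signals only the child process rather than the whole process group.

```python
import signal
import subprocess


def feed_then_terminate(cmd, input_path, grace_secs=30.0):
    """Stream input_path to cmd's stdin, then send SIGTERM and wait for exit.

    Returns the child's exit status (negative signal number if it was
    terminated by a signal, as reported by subprocess).
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    try:
        with open(input_path, "rb") as src:
            # Copy in chunks so a large logfile is never fully held in memory.
            for chunk in iter(lambda: src.read(64 * 1024), b""):
                proc.stdin.write(chunk)
        # Closing flushes buffered data and gives the child EOF on stdin.
        proc.stdin.close()
    except BrokenPipeError:
        # The child went away before consuming all input; fall through.
        pass

    # Equivalent of the `kill -s TERM` step, aimed at the child only.
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace_secs)
    except subprocess.TimeoutExpired:
        # Hypothetical fallback: force-kill if shutdown takes too long.
        proc.kill()
        return proc.wait()


# Example call (assumes a ./vector binary whose config has a stdin source):
# feed_then_terminate(["./vector"], "logfile")
```

As with the shell version, the signal is sent as soon as the last byte has been written, so this relies on the child flushing what it has already read during its own shutdown.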
As suggested in some of the links referenced above, I tried adding remove_after_secs: 0 to my file source, thinking that the use of inotify might induce some change, but the behaviour stayed the same: I still had to Ctrl-C vector to end the process after all files were processed and removed :-(
I have a use case with the http_client source. There is no clear "end", but maybe a timeout could be considered?
@jszwedko Hello, is this still being considered?
We have exactly this need. We are running vector inside a pod where another container fetches data from an API endpoint and writes it into JSON files; vector runs in a second container, reading these files and shipping them to kafka or an http endpoint.
After the container running the collector script terminates, the container with vector keeps running. We are now looking into how we can kill this container, but it would be nice if vector supported this natively.
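Until there is native support, one possible stopgap is a small wrapper used as the entrypoint of the vector container. It starts vector as a child, waits for a marker file on the shared volume, and then sends SIGTERM. This is only a sketch under several assumptions: the collector would have to be changed to write the marker file when it finishes, and the names `run_until_sentinel`, `sentinel_path`, `settle_secs` and the paths in the example call are all hypothetical. The settle delay is a crude guess at how long vector needs to pick up the last files, not a guarantee that everything was read.

```python
import os
import signal
import subprocess
import time


def run_until_sentinel(cmd, sentinel_path, poll_secs=1.0,
                       settle_secs=5.0, grace_secs=30.0):
    """Run cmd until sentinel_path exists, then send SIGTERM and wait.

    Returns the child's exit status as reported by subprocess.
    """
    proc = subprocess.Popen(cmd)

    # Poll until the marker file appears or the child exits on its own.
    while proc.poll() is None and not os.path.exists(sentinel_path):
        time.sleep(poll_secs)

    if proc.poll() is None:
        # Give the child a moment to read files written just before the marker.
        time.sleep(settle_secs)
        proc.send_signal(signal.SIGTERM)

    try:
        return proc.wait(timeout=grace_secs)
    except subprocess.TimeoutExpired:
        # Hypothetical fallback: force-kill if shutdown takes too long.
        proc.kill()
        return proc.wait()


# Example call (binary path, config path and marker path are assumptions):
# run_until_sentinel(["vector", "--config", "/etc/vector/vector.yaml"],
#                    "/data/collector.done")
```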
I think we are still open to it, but no concrete timeline.
We've had a number of different requests to support ETL-like use-cases in Vector and so I figured it'd be useful to create this issue to track them all in one place.
Currently Vector is architected for stream processing and doesn't support ETL execution very well. This is primarily due to the lack of source support for bulk execution where the source shuts down after all input has been processed.
Users have asked for this functionality for the file source and the aws_s3 source, but it is easy to see that it could be desirable for any archive-like source. It could even be useful for sources like kafka, where it would drain a topic and then shut down.
Refs: