Open DpoBoceka opened 5 years ago
Hey @DpoBoceka, I'm not opposed to adding the ability to watch and track input files. However, it's a fairly large task, so I'm not likely to take this on myself any time soon.
I'll just leave this here in case someone is interested.
https://github.com/radovskyb/watcher
With that library we could implement `Input.Connect()` and `Read()` the bytes of a file from that channel as we do now. I would like to try that out later.
Any advice before I get to it?
So I think this behaviour should be added to the `file` input rather than `files`, because `files` specifically consumes each discrete file as a payload instead of line by line.
I would propose the following additions:
- Support multiple paths in the `file` input. We need to preserve backwards compatibility here, so `path` still needs to allow a string value, but we can either add another field `paths` which is an array, or allow `path` to be either a string or an array of strings.
- Add a field `cache` which allows users to specify a cache resource to store metadata about when and where we last read from each file being consumed.
- When `cache` is specified, for each file path being consumed we store the consumed position in the cache using the path as the key (maybe hashed). It might be worth storing this in a structured way so that we can add more context later (JSON format?). We should also flush these offsets in a separate goroutine at intervals.
- When `cache` is specified, for each file we query the cache to see if there's a pre-existing position to consume from. If there is not, or if the position is greater than the file's current size (meaning it's been rotated), then we consume from the beginning.

Allowing users to specify their own `cache` resource not only means they can store this metadata however they like, but also gives them control over things like TTLs. It probably makes sense to eventually flesh out the `file` cache type to support TTLs itself, as it's the most likely candidate for this purpose.
I wish "file" input could support tail mode (with truncation/move detection, as in https://github.com/hpcloud/tail) and "super asterisk" as in https://github.com/influxdata/telegraf/tree/master/plugins/inputs/tail
Use case: reading syslog-generated log files (rotated and/or created based on current time)
@Jeffail Since the `file` plugin has been deprecated, should this feature be in a new `tail-file` plugin or be added as a feature to `files` after all?
Hey @abh, it's actually the `files` input that has been deprecated in favour of `file`. The reason for that was that the `file` input got a new field `codec` along with support for multiple paths via the new `paths` field, and so it supports everything that the `files` input did (and more).
However, I think it might be difficult to map all the different `codec` options over to a watcher because they expect to consume an `io.Reader`, whereas a file watcher will want to chop the file byte stream into discrete lines (or follow a custom delimiter), so I think it might be sensible to go with a separate implementation for now.
Maybe a good path would be to create a new input marked as experimental, iterate on it a few times, and if we can eventually find a way to introduce the codecs from the normal `file` input then we can combine them; otherwise they'll remain separate.
Is this something you're considering working on? If so let me know if I can help or provide any guidance, it would be awesome to finally get it done.
Looks like https://github.com/influxdata/tail is a maintained version of https://github.com/hpcloud/tail
There's also https://github.com/nxadm/tail which looks a bit more active.
Just had a quick look in there and it doesn't look like that much code, TBH. Might be worth maintaining that logic directly in Benthos.
Later edit: This is definitely not something we want in Benthos: https://github.com/nxadm/tail/blob/master/winfile/winfile.go I wonder if there's a separate library for it...
Also need this.
I've already started using https://github.com/nxadm/tail and it's been good.
For consistency consider following the SFTP "watcher" pattern. https://www.benthos.dev/docs/components/inputs/sftp
Thanks for an excellent project.
> For consistency consider following the SFTP "watcher" pattern. https://www.benthos.dev/docs/components/inputs/sftp
> Thanks for an excellent project.

I had a look. It's using polling; is that your point? I think polling is a good base to start from. We could add debounce too.
This could be used as a base: https://github.com/loov/watchrun/tree/master
It uses polling and also high-resolution timers.
It would be nice to be able to use Benthos instead of Filebeat or rsyslog for simple log shipping, so it could cover more use cases. But currently the Benthos "files" input reads each path just once, so in order to ship new logs we have to restart the instance. I also wonder whether it keeps any metadata recording where Benthos stopped reading, in case it is restarted.