redpanda-data / connect

Fancy stream processing made operationally mundane
https://docs.redpanda.com/redpanda-connect/about/

upgrade "files" input with watchdog #271

Open DpoBoceka opened 5 years ago

DpoBoceka commented 5 years ago

It would be nice to be able to use Benthos instead of Filebeat or rsyslog for simple log shipping, so that it could expand its reach and cover more use cases. Currently, however, Benthos's "files" input reads its paths only once, so to ship new log lines we have to restart the instance. I also wonder whether it keeps any metadata that would tell it where it stopped reading if the instance were restarted.

Jeffail commented 5 years ago

Hey @DpoBoceka, I'm not opposed to adding the ability to watch and track input files. However, it's a fairly large task, so I'm not likely to take this on myself any time soon.

DpoBoceka commented 5 years ago

I'll just leave this here in case someone is interested: https://github.com/radovskyb/watcher. With that library we could implement Input.Connect() and then Read() the bytes of a file from the watcher's channel, as we do now. I'd like to try that out later.

DpoBoceka commented 4 years ago

Any advice before I get started on it?

Jeffail commented 4 years ago

So I think this behaviour should be added to the file input rather than files, because files specifically consumes each discrete file as a single payload rather than line by line.

I would propose the following additions:

Allowing users to specify their own cache resource not only means they can store this metadata however they like but it also gives them control over things like TTLs. It probably makes sense to eventually flesh out the file cache type to support TTLs itself as it's the most likely candidate for this purpose.

miko commented 4 years ago

I wish the "file" input could support a tail mode (with truncation/move detection, as in https://github.com/hpcloud/tail) and a "super asterisk" (**) glob, as in https://github.com/influxdata/telegraf/tree/master/plugins/inputs/tail

Use case: reading syslog-generated log files (rotated and/or created based on the current time)

abh commented 3 years ago

@Jeffail Since the file plugin has been deprecated, should this feature be in a new tail-file plugin or be added as a feature to files after all?

Jeffail commented 3 years ago

Hey @abh, it's actually the files input that has been deprecated in favour of file. The file input gained a new codec field and supports multiple paths via the new paths field, so it supports everything the files input did (and more).

However, I think it might be difficult to map all the different codec options onto a watcher: the codecs expect to consume an io.Reader, whereas a file watcher will want to chop the file's byte stream into discrete lines (or split on a custom delimiter). So I think it would be sensible to go with a separate implementation for now.

Maybe a good path would be to create a new input marked as experimental, iterate on it a few times, and, if we eventually find a way to bring over the codecs from the normal file input, combine the two; otherwise they'll remain separate.

Is this something you're considering working on? If so, let me know if I can help or provide any guidance; it would be awesome to finally get this done.

mihaitodor commented 2 years ago

Looks like https://github.com/influxdata/tail is a maintained version of https://github.com/hpcloud/tail

Jeffail commented 2 years ago

There's also https://github.com/nxadm/tail which looks a bit more active.

mihaitodor commented 2 years ago

Just had a quick look in there, and it doesn't look like that much code, TBH. It might be worth maintaining that logic directly in Benthos.

Edit: This is definitely not something we want in Benthos: https://github.com/nxadm/tail/blob/master/winfile/winfile.go I wonder if there's a separate library for it...

gedw99 commented 1 year ago

Also need this.

I already started using https://github.com/nxadm/tail and it's been good.

terryherron commented 1 year ago

For consistency, consider following the SFTP "watcher" pattern: https://www.benthos.dev/docs/components/inputs/sftp

Thanks for an excellent project.

gedw99 commented 1 year ago

Had a look. It's using polling; is that your point? I think polling is a good base to start from, too. We could also add debouncing.

gedw99 commented 1 year ago

This could be used as a base: https://github.com/loov/watchrun/tree/master

It uses polling and also high-resolution timers.