Open vanga opened 9 years ago
@vanga it's a very old thread, but did you figure out a solution for this? I'm in the same situation right now, specifically when feeding in Load Balancer logs.
I'm thinking of implementing option 3, but that comes at the cost of losing real-time logs.
I went with the 1st option.
We have S3 access logs being collected in a bucket. We are using the S3 input plugin to index these files into ELK.
After a couple of months of usage we noticed an unusually high number of requests made to S3 (~1 billion/month), which costs $440. This is only the charge for the number of requests, which is negligible for most use cases, so hardly anyone pays attention to it.
When I looked at the billing reports, there were around 950 million HEAD requests made to the bucket that holds these logs. The S3 input plugin must be making all these requests (file watching?).
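As a rough sanity check on those numbers, here is a small sketch of the arithmetic. The rate of $0.0004 per 1,000 HEAD requests is an assumption (it is the S3 Standard list price in many regions, but the thread does not state which rate applied), and `request_cost_usd` is just a hypothetical helper name.

```python
# Rough cost check for the HEAD request count quoted above.
# ASSUMPTION: $0.0004 per 1,000 GET/HEAD requests; the rate is not stated
# in this thread and varies by region and storage class.
PRICE_PER_1000_HEAD_USD = 0.0004


def request_cost_usd(requests: int, price_per_1000: float = PRICE_PER_1000_HEAD_USD) -> float:
    """Return the request charge in USD for a given number of requests."""
    return requests / 1000 * price_per_1000


head_requests = 950_000_000  # HEAD count from the billing report
head_cost = request_cost_usd(head_requests)
print(f"Estimated HEAD request charge: ${head_cost:.2f}")
```

Under that assumed rate the HEAD requests alone come to about $380, so they would account for most of the $440. The remainder presumably comes from other request types that are billed at different rates.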
I am not sure whether the plugin itself needs some kind of optimization. I think the logs that people store in S3 don't change over time (my assumption), so once a file has been indexed there is no need to keep watching it.
From a user's perspective, the options I can think of to avoid these requests are: 1) Move the files to a different location once they have been indexed. 2) Download the files to a local drive with a cron job and use the file input plugin to index them into ES. 3) Use daily prefixes so that the plugin only watches that day's files (the log files are named with timestamps). 4) Raise the polling interval from the default if some delay is acceptable; S3 access logs are generated hourly, so there is an hour's delay anyway.
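To get a feel for how much options 3 and 4 could save, here is a back-of-envelope model. It assumes the plugin issues one HEAD request per object under the watched prefix on every poll, and that the polling interval is 60 seconds (the plugin default, as far as I know). Neither assumption is confirmed in this thread, and the object counts in the scenarios are made up for illustration.

```python
# Back-of-envelope model of polling traffic.
# ASSUMPTION (not confirmed in this thread): each poll issues one HEAD
# request per object under the watched prefix.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 30-day month


def polls_per_month(interval_seconds: int) -> int:
    """Number of polling cycles in a 30-day month."""
    return SECONDS_PER_MONTH // interval_seconds


def head_requests_per_month(objects_under_prefix: int, interval_seconds: int) -> int:
    """Estimated HEAD requests per month under the one-HEAD-per-object model."""
    return polls_per_month(interval_seconds) * objects_under_prefix


# Average object count implied by ~950 million HEAD requests at a 60 s interval.
implied_objects = 950_000_000 // polls_per_month(60)

# Hypothetical scenarios; the object counts are illustrative, not measured.
whole_bucket = head_requests_per_month(22_000, 60)    # watch everything
daily_prefix = head_requests_per_month(700, 60)       # option 3: one day's files
hourly_poll = head_requests_per_month(22_000, 3600)   # option 4: poll once an hour

print(f"implied objects watched: {implied_objects:,}")
for label, count in [
    ("whole bucket, 60 s interval", whole_bucket),
    ("daily prefix, 60 s interval", daily_prefix),
    ("whole bucket, 3600 s interval", hourly_poll),
]:
    print(f"{label}: {count:,} HEAD requests/month")
```

If the model holds, the request count scales with the number of watched objects multiplied by the number of polls, so shrinking either factor (a narrower prefix, or a longer interval) cuts the bill roughly in proportion. Options 1 and 2 attack the same product by keeping the watched location nearly empty.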
Any opinions and suggestions are welcome.
Thanks