logstash-plugins / logstash-input-s3

Apache License 2.0

CloudTrail-optimized polling #86

Open brandond opened 8 years ago

brandond commented 8 years ago

A very common use case for S3 polling is ingest of CloudTrail logs, which have a fixed key format within a bucket: /AWSLogs/<AccountId>/CloudTrail/<region>/<YYYY>/<MM>/<DD>/<AccountId>_CloudTrail_<region>_<ISODate>_<random>.json.gz
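To make the key layout concrete, here is a rough Python sketch (not plugin code) that builds the per-day prefix and parses a key back into its parts. The function names are hypothetical, and the exact shapes of `<ISODate>` (assumed to be `YYYYMMDDTHHMMZ`) and `<random>` (assumed alphanumeric) are assumptions; the leading slash is treated as optional, since S3 keys are usually listed without it.

```python
import re
from datetime import date

# Regex mirroring the key format quoted above. The timestamp and
# random-suffix shapes are assumptions, not taken from the issue.
CLOUDTRAIL_KEY = re.compile(
    r"^/?AWSLogs/(?P<account>\d+)/CloudTrail/(?P<region>[a-z0-9-]+)/"
    r"(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/"
    r"(?P=account)_CloudTrail_(?P=region)_"
    r"(?P<stamp>\d{8}T\d{4}Z)_(?P<suffix>[A-Za-z0-9]+)\.json\.gz$"
)


def day_prefix(account_id, region, day):
    """Return the S3 prefix holding one account/region/day of CloudTrail logs."""
    return f"AWSLogs/{account_id}/CloudTrail/{region}/{day:%Y/%m/%d}/"


def parse_key(key):
    """Split a CloudTrail object key into its fields, or return None if it does not match."""
    match = CLOUDTRAIL_KEY.match(key)
    return match.groupdict() if match else None
```

Since the prefix is fully determined by account, region and date, a poller could list one day at a time instead of walking the whole bucket.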

Given this fixed structure, ingest and incremental polling can be optimized given:

The process would look something like:

Using the above logic, the lastdb file only needs to persist a small amount of information:

I am happy to work on this with an optimized poller class that could be selected via configuration option. Not sure if I should fork the current master branch, or the WIP threading branch?

joshbrand commented 8 years ago

ELB logs in S3 follow a similar convention and could easily work with this, with just a slightly different prefix: /AWSLogs/<accountid>/elasticloadbalancing/<region>/. It would be awesome if this could apply to those as well.
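As a rough illustration of how one prefix builder might cover both cases, here is a small Python sketch. The function name is hypothetical, and it assumes the load balancer logs use the same `<YYYY>/<MM>/<DD>` subdirectories under the region as CloudTrail does, which the comment above does not spell out.

```python
from datetime import date


def service_day_prefix(account_id, service, region, day):
    """Build a per-day S3 prefix for any AWSLogs-style service directory.

    `service` is the path segment after the account id, e.g. "CloudTrail"
    or "elasticloadbalancing". The date subdirectories are an assumption.
    """
    return f"AWSLogs/{account_id}/{service}/{region}/{day:%Y/%m/%d}/"
```

With this shape, the only per-service difference a poller would need to know is the `service` path segment.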