Open brandond opened 8 years ago
ELB logs in S3 follow a similar convention and could easily work with this, just with a slightly different prefix: /AWSLogs/<accountid>/elasticloadbalancing/<region>/
So it would be awesome if this could apply to those as well.
A very common use case for S3 polling is ingest of CloudTrail logs, which have a fixed key format within a bucket:
/AWSLogs/<AccountId>/CloudTrail/<region>/<YYYY>/<MM>/<DD>/<AccountId>_CloudTrail_<region>_<ISODate>_<random>.json.gz
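As a rough illustration of how that fixed layout can be exploited, here is a minimal Ruby sketch that splits a key into its components. The method name `parse_cloudtrail_key` and the character classes used for each field (digits for the account ID, lowercase letters, digits and hyphens for the region) are assumptions made for this example, not something the plugin defines.

```ruby
# Hypothetical helper: split a CloudTrail object key into its parts.
# Returns a Hash of named captures, or nil if the key does not fit the layout.
def parse_cloudtrail_key(key)
  pattern = %r{
    \A/?AWSLogs/(?<account_id>\d+)/CloudTrail/(?<region>[a-z0-9-]+)/
    (?<year>\d{4})/(?<month>\d{2})/(?<day>\d{2})/
    \d+_CloudTrail_[a-z0-9-]+_(?<iso_date>[^_/]+)_(?<random>[^_/.]+)\.json\.gz\z
  }x
  match = pattern.match(key)
  match && match.named_captures
end

# Example key with made-up account ID, timestamp and random suffix.
key = "AWSLogs/123456789012/CloudTrail/us-east-1/2016/01/02/" \
      "123456789012_CloudTrail_us-east-1_20160102T0000Z_abcdEFGH.json.gz"
info = parse_cloudtrail_key(key)
puts "#{info['region']} #{info['year']}-#{info['month']}-#{info['day']}" if info
```

Because the date is part of the prefix, a poller never has to open an object to know which day it belongs to.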
Given this fixed structure, ingest and incremental polling can be optimized given:
The process would look something like:
/AWSLogs/<AccountId>/CloudTrail/<region>/ prefixes
<YYYY>/<MM>/<DD>/ sub-prefix
<DD> token from current_prefix and call list_objects_v2({prefix: parent_prefix, start_after: current_prefix})
<MM> and <YYYY> tokens
/AWSLogs/<AccountId>/CloudTrail/<region> prefixes are present and spawn new poller threads as necessary
/AWSLogs/<AccountId>/CloudTrail/<region> prefix disappears, it should terminate.
Using the above logic, the lastdb file only needs to persist a small amount of information:
/AWSLogs/<AccountId>/CloudTrail/<region>/ prefixes with:
<YYYY>/<MM>/<DD>/)
I am happy to work on this with an optimized poller class that could be selected via configuration option. Not sure if I should fork the current master branch, or the WIP threading branch?
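To make the prefix-advancing step from the process above more concrete, here is one possible reading of it as a Ruby sketch, not a finished poller. The helper names `next_date_prefix` and `first_descendant` are hypothetical. `client` is assumed to be an `Aws::S3::Client` (or anything that answers `list_objects_v2` the same way), and the use of `delimiter: "/"` is my assumption rather than part of the call quoted above. Pagination is ignored because there are at most 31 day prefixes under any parent.

```ruby
# Hypothetical helper: descend `levels` times, always taking the smallest
# common prefix, until a <YYYY>/<MM>/<DD>/ prefix is reached.
def first_descendant(client, bucket, prefix, levels)
  levels.times do
    resp = client.list_objects_v2(bucket: bucket, prefix: prefix, delimiter: "/")
    prefix = resp.common_prefixes.map(&:prefix).min
    return nil unless prefix
  end
  prefix
end

# Hypothetical helper: given the day prefix currently being polled, find the
# next day prefix that exists under region_prefix, or nil if there is none yet.
def next_date_prefix(client, bucket, region_prefix, current_prefix)
  prefix = current_prefix
  # Strip the <DD> token first, then <MM>, then <YYYY>.
  while prefix.length > region_prefix.length
    parent = prefix.sub(%r{[^/]+/\z}, "")
    resp = client.list_objects_v2(bucket: bucket, prefix: parent,
                                  delimiter: "/", start_after: prefix)
    # Filter explicitly so the result does not depend on how the service
    # treats a start_after value that is itself a common prefix.
    sibling = resp.common_prefixes.map(&:prefix).find { |p| p > prefix }
    if sibling
      depth = sibling[region_prefix.length..-1].count("/") # 1=year, 2=month, 3=day
      return first_descendant(client, bucket, sibling, 3 - depth)
    end
    prefix = parent
  end
  nil
end
```

With this shape, the only state a poller would need to carry between runs is its region prefix and the day prefix it last reached, which matches the small lastdb footprint described above.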