brandond closed 4 years ago
Just got back from vacation, will check ASAP.
@ph I've been running this for a few weeks now, watching a couple S3 buckets containing CloudTrail logs and (more recently) Firehose exports from CloudWatch Logs. It works a lot differently than it used to, but runs with much lower overhead than it did previously.
I haven't even looked at any of the tests; I can try to get those cleaned up at some point.
Here's my first shot at a patch to address #86. It changes the sincedb and poller functionality to store a 'marker' so that the poller remembers where it was in the object list the last time it ran, and picks up there again. It does this by holding a rolling tail list of objects that have been queued for processing, and remembering the earliest one in the list that has been successfully processed. This is where it resumes when listing bucket objects.
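To make the marker idea concrete, here is a rough in-memory sketch in Python. It is not the code from this patch: `MarkerTracker`, `list_after`, `poll`, and `tail_size` are hypothetical names made up for illustration, a sorted list stands in for the bucket listing, and every object is assumed to process successfully. It only shows my reading of the description above: keep a bounded tail of queued keys, resume listing after the earliest processed key in that tail, and skip anything that gets relisted.

```python
from bisect import bisect_right
from collections import deque


class MarkerTracker:
    """Hypothetical helper: rolling tail of queued keys plus a resume marker."""

    def __init__(self, tail_size=1000):
        self.tail = deque(maxlen=tail_size)  # most recently queued keys, oldest first
        self.processed = set()               # keys in the tail that finished processing

    def queue(self, key):
        if key in self.tail:
            return  # already queued, do not add it twice
        if len(self.tail) == self.tail.maxlen:
            # The oldest key is about to roll off the tail, so forget it.
            self.processed.discard(self.tail[0])
        self.tail.append(key)

    def mark_processed(self, key):
        if key in self.tail:
            self.processed.add(key)

    def marker(self):
        # Earliest key in the tail that has been processed,
        # or None to list from the start of the prefix.
        for key in self.tail:
            if key in self.processed:
                return key
        return None


def list_after(sorted_keys, marker):
    """Stand-in for a bucket listing: keys strictly after `marker`, in key order."""
    if marker is None:
        return list(sorted_keys)
    return sorted_keys[bisect_right(sorted_keys, marker):]


def poll(bucket_keys, tracker):
    """One poll cycle: list from the marker, process new keys, return them."""
    done = []
    for key in list_after(sorted(bucket_keys), tracker.marker()):
        if key in tracker.processed:
            continue  # relisted key that was already handled
        tracker.queue(key)
        tracker.mark_processed(key)  # processing is assumed to succeed in this sketch
        done.append(key)
    return done
```

With a small `tail_size`, each poll only relists the handful of keys between the marker and the end of the listing, instead of walking the whole prefix every time.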
This really only works for buckets where objects can be counted on to show up at the 'end' of the key space within a given prefix. This is guaranteed to be true for things like CloudTrail, but probably not other use cases. I'll probably continue to enhance this PR to allow for selectable polling strategies that could include: