logstash-plugins / logstash-input-s3


Fix missing file ingestion in same second #220

Closed: kaisecheng closed this 3 years ago

kaisecheng commented 3 years ago

This PR makes two changes to fix files being left unprocessed:

  1. Take `last_modified` from the S3 summary instead of the object details. In the original implementation, the gap within one iteration between the summary call (t1) and the details call (t2) could be long, depending on file size. In that window a file could be updated, and the later timestamp was saved to sincedb. On the next iteration only files with `last_modified > sincedb` are processed, so files updated between t1 and t2 could be left unprocessed.
  2. Set a cutoff time: files with `last_modified > cutoff` (now - 3s) are deferred to the next iteration. A file whose `last_modified` equals t1 may only become visible in a listing after the summary call at t1. Because of the `last_modified > sincedb` check, the next iteration would not process files with `last_modified` equal to t1. With the cutoff in place, S3 should hopefully return a repeatable file list for the processed window (see the sketch after this list).
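A minimal sketch of how the two fixes combine, not the plugin's actual code: it assumes aws-sdk-s3 v3, a hypothetical in-memory `SinceDB` stand-in for the sincedb file, and a hypothetical `process` placeholder for downloading and ingesting a file.

```ruby
require 'aws-sdk-s3'

CUTOFF_SECONDS = 3 # files newer than (now - 3s) are deferred to the next run

# Hypothetical in-memory stand-in for the plugin's sincedb file.
class SinceDB
  def initialize; @newest = Time.at(0); end
  def read; @newest; end
  def write(time); @newest = time; end
end

# Hypothetical placeholder for downloading and ingesting one S3 object.
def process(summary)
  puts "processing #{summary.key} (last_modified=#{summary.last_modified})"
end

def list_new_objects(bucket, prefix, sincedb)
  cutoff = Time.now - CUTOFF_SECONDS
  newest_seen = sincedb.read

  bucket.objects(prefix: prefix).each do |summary|
    # Fix 1: take last_modified from the listing summary (t1), never from a
    # later details/HEAD call (t2), so sincedb reflects the listing's view.
    last_modified = summary.last_modified

    next if last_modified <= sincedb.read # already processed in a previous run
    next if last_modified > cutoff        # Fix 2: too recent, defer to next run

    process(summary)
    newest_seen = last_modified if last_modified > newest_seen
  end

  sincedb.write(newest_seen)
end

# Usage sketch:
#   bucket = Aws::S3::Resource.new(region: 'us-east-1').bucket('my-bucket')
#   list_new_objects(bucket, 'logs/', SinceDB.new)
```

Deferring anything newer than the cutoff trades a few seconds of latency for a listing that is stable by the time it is compared against sincedb.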

Related Issue: https://github.com/logstash-plugins/logstash-input-s3/issues/221

The red CI is from Logstash 6.8, which has never been green since branching.