logstash-plugins / logstash-input-s3

Apache License 2.0

Files being unprocessed with the same last modified timestamp #221

Closed kaisecheng closed 3 years ago

kaisecheng commented 3 years ago

The precision of S3 timestamps is one second, so files sharing the same last_modified value can end up being processed across two iterations. However, after each file is processed, its timestamp is written to sincedb. On the next iteration, any file whose timestamp is less than or equal (<=) to the sincedb value is skipped. As a result, users occasionally see files with the same timestamp left unprocessed.
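A minimal sketch of the skip logic described above (hypothetical names, not the plugin's actual code): because the comparison against sincedb is strict, a second file that happens to share the same one-second last_modified value is never selected on the next poll.

```ruby
require 'time'

# sincedb stores the timestamp of the last processed object;
# on the next poll, only objects strictly newer than it are picked up.
sincedb = Time.parse("2021-01-01 00:00:10 UTC")

# Hypothetical listing: a.log was processed in the previous iteration,
# b.log arrived within the same second, c.log one second later.
objects = [
  { key: "logs/a.log", last_modified: Time.parse("2021-01-01 00:00:10 UTC") },
  { key: "logs/b.log", last_modified: Time.parse("2021-01-01 00:00:10 UTC") },
  { key: "logs/c.log", last_modified: Time.parse("2021-01-01 00:00:11 UTC") },
]

to_process = objects.select { |o| o[:last_modified] > sincedb }
to_process.each { |o| puts o[:key] }
# Only "logs/c.log" is selected; "logs/b.log" is silently skipped
# because S3 timestamps only carry one-second precision.
```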

Related issues and PRs:
https://github.com/logstash-plugins/logstash-input-s3/issues/57
https://github.com/logstash-plugins/logstash-input-s3/pull/61
https://github.com/logstash-plugins/logstash-input-s3/pull/189
https://github.com/logstash-plugins/logstash-input-s3/pull/192

kaisecheng commented 3 years ago

Fixed in v3.6.0

yogevyuval commented 3 years ago

@kaisecheng This is great, but is it merged into Logstash? That is, which Logstash version ships the new plugin version?

kaisecheng commented 3 years ago

@yogevyuval Logstash 7.13

sunfriendli commented 1 year ago

The problem still exists in Logstash 8.2.3 with logstash-input-s3 3.8.3. Files with the same last modified timestamp may be ignored, leading to data loss.

kaisecheng commented 1 year ago

@sunfriendli Could you create a new issue in this repo with reproducing steps and log for further investigation?

sunfriendli commented 1 year ago

> @sunfriendli Could you create a new issue in this repo with reproducing steps and log for further investigation?

Hello @kaisecheng , I created a new issue at #244