logstash-plugins / logstash-input-s3

Apache License 2.0

Miss files due to the same last modified time #191

Open brucezhao11 opened 4 years ago

brucezhao11 commented 4 years ago

We're using the Logstash S3 input plugin to consume files written by Spark. Because Spark writes files concurrently, many of them share the same last modified time. Since S3 does not guarantee strong consistency, a single listing may return only some of those files. The ones that appear later, carrying the same last modified time, may then never be processed.

S3 Description: A process writes a new object to Amazon S3 and immediately lists keys within its bucket. Until the change is fully propagated, the object might not appear in the list.

File List:
2019-10-27 02:56:03 214283086 part-00000-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:56:03 214282388 part-00001-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:55:59 213951314 part-00002-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:56:03 214436993 part-00003-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:56:03 214117584 part-00004-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:55:59 214373123 part-00005-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:56:03 214342724 part-00006-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:56:03 214619587 part-00007-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:55:59 214146139 part-00008-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:56:03 214505891 part-00009-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:56:03 214004818 part-00010-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:55:59 214139449 part-00011-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
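
To illustrate how this can lose files, here is a minimal Python sketch of a poller that only tracks a last modified high-water mark. It assumes a strict "newer than" comparison against the stored timestamp, which is an assumption for illustration and not taken from the plugin's source. The `poll` function and the shortened file names are hypothetical.

```python
from datetime import datetime

def poll(listing, high_water_mark):
    """Return the keys strictly newer than the mark, plus the advanced mark.

    ASSUMPTION: the tracker stores one timestamp and uses a strict '>' test.
    """
    fresh = [(key, mtime) for key, mtime in listing
             if high_water_mark is None or mtime > high_water_mark]
    for _, mtime in fresh:
        if high_water_mark is None or mtime > high_water_mark:
            high_water_mark = mtime
    return [key for key, _ in fresh], high_water_mark

# Two objects written in the same second (hypothetical, shortened names).
t = datetime(2019, 10, 27, 2, 56, 3)
first_listing = [("part-00000.json", t)]                          # part-00001 not yet visible
second_listing = [("part-00000.json", t), ("part-00001.json", t)]  # now both are visible

processed, mark = poll(first_listing, None)
later, mark = poll(second_listing, mark)

# part-00001 has the same timestamp as the mark, so the strict test skips it.
missed = [key for key, _ in second_listing if key not in processed + later]
print(missed)
```

Under these assumptions the second poll picks up nothing, because the late-appearing file is not strictly newer than the stored timestamp. Tracking processed keys as well as the timestamp would avoid the gap, at the cost of more state.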

jasonpepper commented 4 years ago

I think this is a duplicate of issue #57

hard-working-boy commented 4 years ago

See https://github.com/logstash-plugins/logstash-input-s3/issues/57: setting sincedb_path to /dev/null makes the plugin ignore mtime. @jasonpepper, the change that adds the sincedb_disabled property does not appear to have been accepted. Was the pull request not accepted?