logstash-plugins / logstash-input-s3


Files with the same last-modified timestamp miss processing #244


sunfriendli commented 2 years ago

Logstash information

  1. Logstash version: 8.2.3
  2. Logstash installation source: logstash-8.2.3-linux-x86_64.tar.gz
  3. How is Logstash being run: supervisor
  4. How was the Logstash plugin installed: default plugin bundled with Logstash
  5. Input configuration:
    input {
        s3 {
                type => "something"
                sincedb_path => "/data/elk/logstash-8.2.3/since_db_s3_something"
                temporary_directory => "/tmp/logstash/shippersomething"
                bucket => "s3-bucket"
                prefix => "logserver/something"
                interval => 120
                region => "us-west-2"
                codec => "json"
                access_key_id => "*********************"
                secret_access_key => "****************************"
        }
    }

  6. JVM version: 1.8.0_232
  7. OS version: CentOS Linux release 7.6.1810

We upload our log files to S3 every minute over the public network, and Logstash then pulls them from S3 and outputs them to ES on the subnet.

When the public network is unstable, some uploads to S3 fail and have to be retried. Once the network recovers, several files can end up with the same last-modified timestamp. When Logstash processes files that share a last-modified timestamp, some of them are missed.

This problem occurs frequently: in my case files are uploaded to S3 every minute, and 3~10 files are missed every day.
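
The behaviour described above matches what would happen if the sincedb cursor stores only the newest last-modified timestamp seen so far and only objects strictly newer than that value are processed on the next poll. The following is a minimal Python sketch of that assumption, not the plugin's actual Ruby code; the `poll` helper, the object keys, and the timestamps are hypothetical.

    # Minimal illustrative sketch (Python, not the plugin's Ruby code) of how a
    # cursor that stores only a last-modified timestamp and uses a strict
    # "newer than" comparison can skip files that share that timestamp.
    from datetime import datetime, timezone

    def poll(objects, since):
        """Process S3 objects whose last-modified time is strictly newer than the cursor."""
        processed = []
        for key, last_modified in sorted(objects, key=lambda o: o[1]):
            if last_modified > since:   # strict comparison: a tie with the cursor is dropped
                processed.append(key)
                since = max(since, last_modified)
        return processed, since

    t0 = datetime(2022, 6, 1, 10, 0, 0, tzinfo=timezone.utc)
    sincedb = datetime.min.replace(tzinfo=timezone.utc)

    # Poll 1: a.log uploaded successfully; b.log's upload failed and will be retried.
    done, sincedb = poll([("logserver/something/a.log", t0)], sincedb)
    print(done)  # ['logserver/something/a.log']; the cursor is now t0

    # Poll 2: b.log is re-uploaded and receives the same last-modified second as
    # a.log, so it is never strictly newer than the cursor and is silently skipped.
    done, sincedb = poll([("logserver/something/a.log", t0),
                          ("logserver/something/b.log", t0)], sincedb)
    print(done)  # []

Under that assumption, a re-uploaded file whose timestamp ties the stored cursor is never picked up, which would explain the handful of files missed every day.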

TheVastyDeep commented 1 year ago

See also #191.