logstash-plugins / logstash-input-s3

Apache License 2.0

sincedb file not created, files from bucket not deleted #236

Open niekosau opened 2 years ago

niekosau commented 2 years ago

Logstash information:

  1. Logstash version (e.g. bin/logstash --version): logstash 8.0.0
  2. Logstash installation source (e.g. built from source, with a package manager: DEB/RPM, expanded from tar or zip archive, docker): RPM repository from https://artifacts.elastic.co/packages/8.x/yum
  3. How is Logstash being run (e.g. as a service/service manager: systemd, upstart, etc., via command line, docker/kubernetes): systemd unit provided by the package
  4. How was the Logstash plugin installed: shipped with Logstash

OS version (uname -a if on a Unix-like system): Rocky Linux 8.5 (4.18.0-348.7.1.el8_5.x86_64)

Description of the problem including expected versus actual behavior: files from the bucket (on-premise radosgw) are not removed, and the sincedb file is not created/updated.

Steps to reproduce: Configuration:

input {
  s3 {
    access_key_id => "XXXXXXXXXX"
    secret_access_key => "xxxxxxxxxxxxxxxxx"
    bucket => "test-bucket"
    endpoint => "https://s3.domain.tld"
    delete => true
    sincedb_path  => "/var/lib/logstash/s3-sincedb.db"
    additional_settings => {
      force_path_style => true
      follow_redirects => false
    }
  }
}
output {
  stdout {}
}

Tested older plugin versions; the last working version is 3.5.0. The plugin was downgraded by executing: bin/logstash-plugin install --version 3.5.0 logstash-input-s3

Please include a minimal but complete recreation of the problem, including (e.g.) pipeline definition(s), settings, locale, etc. The easier you make it for us to reproduce, the more likely it is that somebody will take the time to look at it.

  1. Create a bucket and put some files
  2. Start logstash with minimal configuration
  3. Files are not removed after processing and the sincedb file is not created, so the same files are processed again on the next interval.
dabelousov commented 2 years ago

@niekosau can you show your Logstash log for the s3 input? Maybe I have the same problem.

zeroad commented 2 years ago

I'm using logstash 7.17.4 with plugin version 3.8.3

I think the comparison logic is the culprit here. In my case it is comparing timestamps whose string representations end in different local formats.

Just before that comparison I added

::File.open('/tmp/debug.log', 'a') { |file| file.write("object.last_modified: ", object.last_modified, ",  log.last_modified: ", log.last_modified, "\n") }

Which gives me:

object.last_modified: 2021-06-25 16:38:23 +0000,  log.last_modified: 2021-06-25 16:38:23 UTC

I'm not familiar with Ruby, but when comparing the dates it probably casts them to the string representations above, which are not the same.

As a first step, I fixed it on my side by comparing the Unix timestamps:

if object.last_modified.to_i == log.last_modified.to_i
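To illustrate (a standalone sketch of my own, not the plugin's code): two Time values can refer to the same second yet compare unequal when one of them carries subsecond precision, which is one plausible reason a strict == check fails while the to_i comparison succeeds. The timestamps are the ones from my debug output, with an assumed fractional part added:

```ruby
require 'time'

# Assumed scenario: the value from the S3 API carries fractional seconds,
# while the value re-read from the sincedb has whole seconds only.
api_time     = Time.parse("2021-06-25 16:38:23.5 +0000")
sincedb_time = Time.parse("2021-06-25 16:38:23 UTC")

api_time == sincedb_time            # => false: Time#== compares the exact instant
api_time.to_i == sincedb_time.to_i  # => true: to_i truncates to whole seconds
```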

Still, this does not work 100% as expected. The logic here saves the latest timestamp seen in S3 to the sincedb. If you do not delete your files in S3, you will always re-import every file that has the same timestamp as the one stored at the sincedb path, which means you will import the latest file again and again.

To fix this issue I always add 1s to the timestamp written to the sincedb, after this line

since = since + 1

I'm not sure if there will be any side effects from simply adding +1 to the timestamp (at least it now works for me as expected).
Someone should probably fix the comparison logic properly.
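For clarity, a self-contained sketch of the effect (my own approximation of the selection logic described above, not the plugin's actual code): if objects are picked up whenever their mtime is not older than the stored sincedb value, the newest object is re-read on every poll, and persisting since + 1 avoids that:

```ruby
# Hypothetical selection logic: keep objects whose mtime >= stored sincedb time.
since   = Time.utc(2021, 6, 25, 16, 38, 23)    # last timestamp seen, as persisted
objects = [Time.utc(2021, 6, 25, 16, 38, 23),  # the object that set `since`
           Time.utc(2021, 6, 25, 16, 40, 0)]   # a genuinely newer object

objects.count { |t| t.to_i >= since.to_i }     # => 2: the latest file is re-imported

bumped = since + 1                             # the workaround: persist since + 1 second
objects.count { |t| t.to_i >= bumped.to_i }    # => 1: only newer objects remain
```

The obvious trade-off is that any other object sharing that exact second would be skipped, which is presumably the side effect to watch for.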