logstash-plugins / logstash-integration-aws

Apache License 2.0

S3 input does not delete files #45

Open keeshoekzema opened 3 months ago

keeshoekzema commented 3 months ago

I am aware that I am using an S3-compatible solution and this is not supported. However, this is an easy fix.

Logstash information:

Please include the following information:

  1. Logstash version: 8.13.4
  2. Logstash installation source: docker.elastic.co/logstash/logstash:8.13.4

Description of the problem including expected versus actual behavior:

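When running the S3 input plugin against Ceph RGW as S3 storage, processed files are neither deleted nor backed up. With delete => true, I would expect each object to be removed once it has been processed.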

Steps to reproduce:

input {
  s3 {
    access_key_id => "${ACCESS_KEY_ID}"
    secret_access_key => "${SECRET_ACCESS_KEY}"
    bucket => "test"
    additional_settings => {
      force_path_style => true
      follow_redirects => false
    }
    endpoint => "http://ceph-rgw"
    delete => true
  }
}

Provide logs (if relevant):

This gives the following output:

[2024-05-23T14:52:01,565][INFO ][logstash.inputs.s3       ][main][..] object-in-bucket-xyz is updated at 2024-05-23 13:58:33 UTC and will process in the next cycle

This means the file is not deleted. In the next cycle the timestamps still don't match, so it is again not deleted.

After changing the log output to include both timestamps, the issue becomes clear:

object.last_modified: 2024-05-23 13:33:33.000000000 Z
log.last_modified: 2024-05-23 13:33:33.898000000 Z

An easy fix would be to change line 380 to:

if object.last_modified.floor == log.last_modified.floor
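
To illustrate the mismatch outside of Logstash, here is a minimal standalone Ruby sketch that uses the two timestamps from the log above. The helper name `same_second?` and the variable names are hypothetical and not taken from the plugin; the only thing assumed is `Time#floor`, which is available in Ruby 2.7 and later.

```ruby
# Hypothetical helper (not part of the plugin): compare two timestamps
# at whole-second precision by truncating the sub-second part.
def same_second?(a, b)
  a.floor == b.floor
end

# Timestamp as reported in object.last_modified (no sub-second part).
listed = Time.utc(2024, 5, 23, 13, 33, 33)
# Timestamp as reported in log.last_modified (898 ms, passed as microseconds).
fetched = Time.utc(2024, 5, 23, 13, 33, 33, 898_000)

puts listed == fetched               # => false, the exact comparison fails
puts same_second?(listed, fetched)   # => true, the truncated comparison succeeds
```

Truncating both sides keeps the check symmetric, so it should not matter which of the two calls returns the less precise value.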

I have not tested this against S3, but I assume S3 returns the timestamp with millisecond precision in both places, while Ceph RGW does not. I totally understand if you don't want to fix this; in that case I'll just leave this information here as a workaround for anyone who runs into the same problem.