logstash-plugins / logstash-input-s3

Apache License 2.0
fix: issue with timestamp comparison #248

Open Mantas2 opened 1 year ago

Mantas2 commented 1 year ago

Problem: The logstash-input-s3 plugin has a known issue in its object timestamp logic that causes problems with S3-compatible storage solutions other than AWS S3. The issue has been discussed in an earlier pull request ("Fix object timestamp logic") and in an issue ("sincedb file not created, files from bucket not deleted").

Proposed Solution: This pull request proposes a fix for both problems. It improves the timestamp handling logic so that the logstash-input-s3 plugin works with more S3-compatible backends.

Context: It’s important to note that the logstash-input-s3 plugin was originally designed to work only with AWS S3 and does not officially support other S3-compatible storage solutions. However, implementing the proposed fix would make the plugin suitable for a significant number of alternative S3-compatible solutions, eliminating the need for unsupported forks.

Microseconds Comparison: The core issue lies in the comparison of timestamps with microsecond precision, which causes two main problems: the sincedb file is not created, and files from the S3 bucket are read more than once. The behavior is well explained in the Railsware blog post "Time comparison in Ruby", which discusses the pitfalls and confusion around time comparison in Ruby.
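To illustrate the kind of mismatch described above, here is a minimal Ruby sketch. It assumes a timestamp that carries a fractional part and a copy of the same instant that has gone through a whole-second string format; the helper `round_trip_whole_seconds` and the sample values are hypothetical and are not taken from the plugin's code.

```ruby
require "time"

# Hypothetical helper: simulate persisting a timestamp as a whole-second
# string and reading it back, which drops the fractional part.
def round_trip_whole_seconds(time)
  Time.parse(time.utc.strftime("%Y-%m-%d %H:%M:%S UTC"))
end

# Made-up listing timestamp with 123456 microseconds.
listed = Time.utc(2023, 5, 1, 12, 0, 0, 123_456)
stored = round_trip_whole_seconds(listed)

puts listed == stored            # false: the fractional parts differ
puts listed > stored             # true: the same instant now looks "newer"
puts listed.to_i == stored.to_i  # true: equal at whole-second granularity
```

Under these assumptions, a strict comparison treats an object as newer than the checkpoint derived from that very object, which is consistent with the repeated reads reported here.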

Root Cause Uncertainty: The root cause of the microsecond difference between the file list timestamps in buckets and the last sincedb writes is still uncertain. The issue does not occur when using the logstash-input-s3 plugin with AWS S3, only with other S3-compatible backends.

The issue is present in Cloudflare R2 and DigitalOcean Spaces, and the fix has been tested with both. Cloudflare R2: an S3-compatible storage solution provided by Cloudflare. DigitalOcean Spaces: an S3-compatible object storage service offered by DigitalOcean.

cla-checker-service[bot] commented 1 year ago

❌ Author of the following commits did not sign a Contributor Agreement: 4ad48b6a8e5c124fa83e8499fbe6977cb1238254

Please, read and sign the above mentioned agreement if you want to contribute to this project

Mantas2 commented 1 year ago

I have an update from the R2 Engineering team: they have confirmed that the issue with the Logstash plugin is related to the difference in timestamp granularity between R2 and S3. While S3 uses timestamps with second granularity, R2 offers millisecond granularity. The same seems to be true of other S3-compatible services such as DigitalOcean Spaces.
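One way to make a comparison insensitive to this granularity difference is to reduce both sides to whole seconds first. The sketch below is only an illustration of that idea, not necessarily what this pull request does; `newer_at_second_granularity?` is a hypothetical name and the timestamps are made up.

```ruby
# Hypothetical helper, not part of the plugin's API: compare two Time values
# after truncating both to whole seconds.
def newer_at_second_granularity?(object_time, checkpoint)
  object_time.to_i > checkpoint.to_i
end

checkpoint = Time.at(1_700_000_000)           # whole-second checkpoint
same_sec   = Time.at(1_700_000_000, 250_000)  # same second, plus 250 ms
next_sec   = Time.at(1_700_000_001)           # one second later

puts same_sec > checkpoint                                  # true with a plain comparison
puts newer_at_second_granularity?(same_sec, checkpoint)    # false after truncation
puts newer_at_second_granularity?(next_sec, checkpoint)    # true
```

The trade-off of this approach is that an object modified later within the same second as the checkpoint would not count as newer, so it may not suit every workload.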

Romuss commented 1 year ago

@Mantas2 please sign a contributor agreement. We have the same problem with the plugin. Thanks

Mantas2 commented 1 year ago

> @Mantas2 please sign a contributor agreement. We have the same problem with the plugin. Thanks

I have tried to, multiple times. I am still waiting for someone to verify it, I guess.

Derekt2 commented 5 months ago

Same issue here with MinIO S3 buckets.

fabionitto commented 4 months ago

Will this code be merged? We are having the same problem here with MinIO buckets.