logstash-plugins / logstash-integration-aws

Apache License 2.0

Backport the fix for uploading corrupted GZIP objects to AWS S3 (s3-output PR #249). #20

Closed mashhurs closed 1 year ago

mashhurs commented 1 year ago

Release notes

Logstash now recovers corrupted GZIP files at restart and uploads the repaired files to S3.

What does this PR do?

Backports the logstash-output-s3 fix for corrupted GZIP files being uploaded to S3 (PR #249).

Why is it important/What is the impact to the user?

When the S3 output plugin is used with the GZIP encoding option, Logstash may crash in some cases. After a crash, the GZIP stream is left open and the file has no trailer. At restart, Logstash uploads this corrupted file to S3, and users who download the object find that it cannot be used. This PR recovers the corrupted file at restart and uploads a valid GZIP file to S3.
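To illustrate the general idea of recovering a GZIP file that lost its trailer, here is a minimal Python sketch. It is not the plugin's implementation; the function name `recover_truncated_gzip` and the simulated crash (dropping the last 8 bytes, which hold the CRC32 and size fields) are assumptions made for illustration only.

```python
import gzip
import zlib


def recover_truncated_gzip(data: bytes) -> bytes:
    """Return a complete gzip stream holding whatever could be decoded from `data`.

    Hypothetical helper for illustration; not part of the plugin.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    # A streaming decoder returns the bytes it could decode instead of
    # failing because the end-of-stream trailer is missing.
    recovered = decoder.decompress(data)
    # Re-compressing writes a fresh header and a valid trailer.
    return gzip.compress(recovered)


# Simulate a crash: drop the 8-byte gzip trailer (CRC32 + uncompressed size).
original = b'{"message":"hello"}\n' * 100
truncated = gzip.compress(original)[:-8]
repaired = recover_truncated_gzip(truncated)
```

If the file is cut off in the middle of the compressed data rather than just before the trailer, a sketch like this would keep only the bytes decoded up to that point, so the last events may be lost or partial.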

Checklist

Author's Checklist

How to test this PR locally

  1. Use the following config:
    input {
      stdin {}
    }
    output {
      s3 {
        region => "ca-central-1"
        bucket => "logstash-mashhur-test-1"
        codec => "json_lines"
        canned_acl => "private"
        prefix => "test-%{+YYYY.MM.dd}"
        additional_settings => {
          "force_path_style" => true
        }
        encoding => "gzip"
        upload_queue_size => 10
        upload_workers_count => 2
        time_file => 1
        rotation_strategy => "time"
        temporary_directory => "/Users/mashhur/Dev/elastic/temp/s3-temp"
        validate_credentials_on_root_bucket => false
      }
    }
  2. Send some input data, then kill the Logstash process.
  3. Restart Logstash.
  4. Download the uploaded S3 object with the AWS CLI. Note that downloading through a browser automatically decompresses the file, so prefer the plain AWS CLI.
    aws s3api get-object --bucket logstash-mashhur-test-1 --key test-2023.01.05/ls.s3.b2854743-82b4-4a73-9350-36746e6ff3aa.2023-01-05T13.21.part2.txt.gz my_downloaded_s3_object.txt.gz
  5. Decompress the file and check its contents
    gunzip my_downloaded_s3_object.txt.gz
    cat my_downloaded_s3_object.txt
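As an optional extra check alongside step 5, the downloaded object can be validated programmatically before it is decompressed. The following is a minimal Python sketch; the helper name `is_valid_gzip` is hypothetical, and it only confirms that the stream decodes fully and that the trailer checks pass.

```python
import gzip
import zlib


def is_valid_gzip(path: str) -> bool:
    """Return True if the file at `path` decompresses cleanly to the end.

    Hypothetical helper for illustration; not part of the plugin or the AWS CLI.
    """
    try:
        with gzip.open(path, "rb") as stream:
            # Read in chunks so large objects are not held in memory.
            while stream.read(1024 * 1024):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers a bad header or a CRC mismatch, EOFError a missing
        # trailer, and zlib.error damaged compressed data.
        return False
```

For example, calling `is_valid_gzip("my_downloaded_s3_object.txt.gz")` on the file from step 4, before running `gunzip`, should return `True` when the recovery worked.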

Related issues