logstash-plugins / logstash-input-s3


S3 plugin not functioning correctly for GZ files from Firehose #180

Open · apatnaik14 opened this issue 5 years ago

apatnaik14 commented 5 years ago

I was testing the S3 plugin for a production POC in which a Firehose delivery stream delivers CloudWatch logs into an S3 bucket, from which I read them into Logstash with the S3 plugin.

My Logstash config is below:

    input {
      s3 {
        bucket => "test"
        region => "us-east-1"
        role_arn => "test"
        interval => 10
        additional_settings => {
          "force_path_style" => true
          "follow_redirects" => false
        }
      }
    }

    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]
        sniffing => false
        index => "s3-logs-%{+YYYY-MM-dd}"
      }
      stdout {
        codec => rubydebug
      }
    }

As I start up Logstash locally, I can see the data reaching Logstash, but it is not in the proper format, as shown below.

{ "type" => "s3", "message" => "\u001F�\b\u0000\u0000\u0000\u0000\u0000\u0000\u0000͒�n\u00131\u0010�_��\u0015�����x���MC)\u0005D\u0016!**", "@version" => "1", "@timestamp" => 2019-07-12T15:32:37.328Z }

I also tried adding codec => "gzip_lines" to the configuration, but then Logstash was not able to process the files at all. The documentation suggests the S3 plugin supports GZ files out of the box. Could anyone point out what I am doing wrong?

Regards, Arpan

Please find version and OS information below.

Luk3rson commented 4 years ago

Hi @yaauie, I am having the same issue. Is there an update on this? I tried several different decoders, without any results.

Thanks a lot.

apatnaik14 commented 4 years ago

Hi @yaauie !

Is there a plan to merge the above changes into the plugin?

Regards, Arpan

mrudrara commented 3 years ago

@apatnaik14 I am in a similar boat as you! Did you have any luck with any other workarounds you tried?

Luk3rson commented 3 years ago

Hey @mrudrara, I created a simple Lambda function that adds the .gz extension to each file uploaded to the S3 bucket. The Lambda is invoked by the bucket's PUT event rule. I can share the function if you would like.
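A minimal sketch of that approach (not Luk3rson's exact function, which is linked later in the thread; it assumes Python 3 with boto3 and an s3:ObjectCreated:Put notification on the bucket) might look like:

```python
# Hypothetical sketch: give Firehose output objects a .gz extension so the
# Logstash S3 input recognizes them as gzip. Triggered by the bucket's
# s3:ObjectCreated:Put notification.
import urllib.parse

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Skip objects that already have the extension; the copy below
        # fires another PUT event, so this guard keeps the function from
        # re-invoking itself on its own output.
        if key.endswith(".gz"):
            continue

        # Copy to the same key with a .gz suffix, then drop the original.
        s3.copy_object(
            Bucket=bucket,
            Key=key + ".gz",
            CopySource={"Bucket": bucket, "Key": key},
        )
        s3.delete_object(Bucket=bucket, Key=key)
```

Note that copy_object handles objects up to 5 GB in a single request; larger objects would need a multipart copy.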

mrudrara commented 3 years ago

Thanks @Luk3rson! Really appreciate it. Did you ever run into issues with too many Lambda invocations?

mrudrara commented 3 years ago

@Luk3rson can you share the function, maybe as a gist?

Thanks in advance.

Luk3rson commented 3 years ago

Hi @mrudrara, apologies for the late reply. Here is my function: Luk3rson's GZIP Lambda convertor. Regards

mrudrara commented 3 years ago

Hi @Luk3rson, really appreciate it. Meanwhile, the AWS Support engineer I was working with also recommended "Data Transformation with Lambda".

glen-uc commented 3 years ago

Hi @Luk3rson, @mrudrara

If the folder only contains gz logs, you can set this option on the s3 input (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-s3.html#plugins-inputs-s3-gzip_pattern):

    gzip_pattern => ".*?$"

so that the input plugin treats every file as gz without appending a .gz extension via the Lambda (see the sketch below).
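For context, the input block from earlier in the thread would then look something like this (a sketch; the bucket, region, and role_arn values are the placeholders from the original report):

```
input {
  s3 {
    bucket => "test"
    region => "us-east-1"
    role_arn => "test"
    interval => 10
    additional_settings => {
      "force_path_style" => true
      "follow_redirects" => false
    }
    # Treat every object as gzip-compressed, regardless of its extension
    gzip_pattern => ".*?$"
  }
}
```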