logstash-plugins / logstash-input-kinesis

Logstash Plugin for AWS Kinesis Input
Apache License 2.0

support CloudWatch Logs streams #2

Closed. codekitchen closed this issue 8 years ago

codekitchen commented 8 years ago

This issue was imported from the old repo @ https://github.com/codekitchen/logstash-input-kinesis/issues/6

CloudWatch Logs can be supported today if you set up logstash-filter-split and a gzip codec; see this comment.

However, this is a common enough use case that it makes sense to add native support in this plugin, so that people don't have to do all that extra setup.
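For anyone landing here before native support exists, here is a rough sketch of that workaround. It assumes the Kinesis records have already been gunzipped and JSON-decoded (e.g. by a suitable codec on the input), so each event carries the CloudWatch Logs envelope fields plus a logEvents array:

filter {
  split {
    # fan the logEvents array out into one Logstash event per log line
    field => "logEvents"
  }
  mutate {
    # hoist the per-line message up to the top level; the remaining
    # envelope fields (logGroup, logStream, owner, ...) stay as-is
    rename => { "[logEvents][message]" => "message" }
  }
}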

vaibhavinbayarea commented 8 years ago

Following up on the discussion in the original thread, I'm presently using logstash-input-kinesis with data transferred and received in gzip format.

My gzipped data presently looks like: [ {event_dict1}, {event_dict2}, {event_dict3}, ... ]

where event_dict1 is a dictionary holding the values for a log message (program name, msg, timestamp, severity, etc.).

So there is a use case in addition to the CloudWatch format; not sure how common it is, though :)

Maybe an additional config parameter, something like "transport => gzip | cloudwatch ..." or "gzip_compressed_data => true"?
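A rough sketch of what that could look like on the input, purely as a suggestion (the gzip_compressed_data option below is not an existing setting, and the stream/application names are placeholders):

input {
  kinesis {
    kinesis_stream_name  => "test_stream"
    application_name     => "my_app"
    region               => "us-west-1"
    gzip_compressed_data => true   # suggested option, not implemented
    codec => json                  # the json codec emits one event per element of a top-level array
  }
}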

threadwaste commented 8 years ago

@codekitchen I'm pretty motivated to work on this. I have something baking for personal use, but wanted to shed assumptions in a contribution. Basically:

  1. Do you see a need for handling both cases? For example, adding compressed => [nil, "gzip"] and cloudwatch_logs => [true, false], sketched after this list. This brings up the issue of handling dependent configuration options, but it makes the plugin a bit more flexible.
  2. I don't see the value in retaining the original logEvents field, nor in nesting individual events under it à la the split plugin. So I take the other top-level fields, merge each logEvent into a new event, decorate them, and send them off to the output queue. The result is n events with id, logGroup, logStream, message, messageType, owner, subscriptionFilters, and timestamp. Sane?
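For concreteness, the sketch referenced in point 1, with hypothetical option names and placeholder stream/application/region values (none of this is merged configuration):

input {
  kinesis {
    kinesis_stream_name => "my_stream"
    application_name    => "my_app"
    region              => "us-east-1"
    compressed          => "gzip"   # hypothetical; nil (the default) would skip decompression
    cloudwatch_logs     => true     # hypothetical; unwrap the CloudWatch Logs envelope into individual events
  }
}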
threadwaste commented 8 years ago

@codekitchen @vaibhavinbayarea I took a crack at the specific case in this issue, and support for general gzip decompression came out as a side effect.

vaibhavinbayarea commented 8 years ago

@threadwaste that's great! I can test the patch out sometime by the end of the week. By the way, I can only test the gzipped scenario, as I don't have CloudWatch in my setup.

codekitchen commented 8 years ago

This is great, thank you. Sorry about the silence, I try to be very responsive on OSS projects but I've been camping without internet access for the last 2 weeks. I'll be in a place on Monday where I can check this out in detail.

threadwaste commented 8 years ago

@codekitchen No reason for apologies. Sounds like time well spent! The PR definitely needs to be finished: README.md updates, cleaning up the branching logic in Worker#process_record, and so on. The foundation is likely reviewable enough, though.

vaibhavinbayarea commented 8 years ago

Hi @threadwaste,

I checked out your branch "cloudwatch-log-source"; the last commit is ed9a17699ca424c7c3ff49ea9879026f8d7fc840.

I have tested the following config:

input {
  kinesis {
    region => "us-west-1"
    application_name => "log_imports_openstack"
    kinesis_stream_name => "test_stream"
    compression => "gzip"
    codec => json
  }
}

The plugin worked as expected. Thanks!

threadwaste commented 8 years ago

@codekitchen I've cut a gem of the codec discussed in PR #3. I'm only updating this issue in case you wanted to close it out.

codekitchen commented 8 years ago

Great, thanks @threadwaste. I'll add something to the README about it.