logstash-plugins / logstash-codec-avro

A logstash codec plugin for decoding and encoding Avro records
Apache License 2.0
15 stars 63 forks source link

adding support for stripping magic byte and schema id #20

Closed bruce-szalwinski closed 7 years ago

bruce-szalwinski commented 7 years ago

Support for stripping magic byte and schema id

When using Avro schema to post to Kafka topic, the messages all have five byte header (magic byte + 4 bytes for schema id). When using avro codec, these bytes need to be stripped off before trying to decode the message. The strip_header config triggers this behavior.

joekiller commented 7 years ago

Builds pass and code looks alright to me.

bruce-szalwinski commented 7 years ago

Thanks @joekiller. What's next? Just need a committer to say LGTM?

joekiller commented 7 years ago

Yeah unfortunately I'm not a contributor on here. I'll see who I can ping.

-Joe Sent via mobile.

On Nov 2, 2016 3:56 PM, "Bruce Szalwinski" notifications@github.com wrote:

Thanks @joekiller https://github.com/joekiller. What's next? Just need a committer to say LGTM?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/logstash-plugins/logstash-codec-avro/pull/20#issuecomment-257980346, or mute the thread https://github.com/notifications/unsubscribe-auth/AA-bx7gD2AV9jW9mSNmpe6GnO6t0Lbjaks5q6OrfgaJpZM4KcSsS .

talevy commented 7 years ago

thank you for the contribution, but this is a very specific protocol built on top of Avro.

there is already a codec for such a protocol being pushed by Confluent:

https://github.com/revpoint/logstash-codec-avro_schema_registry

bruce-szalwinski commented 7 years ago

Ok. Thanks. I just looked over at logstash-code-avro_schema_registry and it does exactly what the strip_header => true config would have done. So now the community gets two codecs when they could have had one and a config. Oh well. Thanks for playing.

mrauter commented 7 years ago

+1 for this PR. Avro schema is a very common use case

reachfh commented 6 years ago

There is an Avro standard for "single-object encoding", in the spec here: https://avro.apache.org/docs/1.8.2/spec.html#binary_encoding

It consists of:

  1. A two-byte marker, C3 01, to show that the message is Avro and uses this single-record format (version 1).
  2. The 8-byte little-endian CRC-64-AVRO fingerprint of the object's schema
  3. The Avro object encoded using Avro's binary encoding

This is independent of the schema registry, and useful for people who do not want to run a schema registry. So it would be useful for this plugin to support it, rather than making another.

spockz commented 1 year ago

This seems like very basic functionality to include in the standard plugin without complicating it. Can we reconsider merging this feature?

reachfh commented 1 year ago

I ended up writing a plugin to handle the header: https://github.com/cogini/logstash-codec-avro_header

ouwe-knutselaar commented 1 year ago

I would still like to have this functionality inside this plugin instead of using the https://github.com/revpoint/logstash-codec-avro_schema_registry.

If you update logstash, the Gemfile of Ruby is updated and the link to the logstash-codec-avro_schema_registry is lost. When logstash is restarted it cannot find the plugin and goes in to a high CPU loop. You don't want that on your business systems.

So every time you have to update logstash you have to be aware that you also need to reinstall the non standard plugins.