trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Support Confluent Schema Registry #2105

Open jeqo opened 4 years ago

jeqo commented 4 years ago

Confluent Schema Registry is commonly used to store Avro schemas and to reduce the size of Kafka records: each record carries a schema ID instead of the full schema.
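For context, records produced with Confluent's serializers use a small framing header: one magic byte (`0`), a 4-byte big-endian schema ID, and then the Avro binary body, which carries no schema of its own. The sketch below only splits that header from the payload; the function name `split_confluent_record` is hypothetical, and actually decoding the Avro body would still require fetching the writer schema for the extracted ID.

```python
import struct

MAGIC_BYTE = 0  # first byte of every Confluent-framed record


def split_confluent_record(value: bytes) -> tuple[int, bytes]:
    """Split a Confluent-framed Kafka record value into (schema_id, avro_payload).

    Layout: 1 magic byte (0), a 4-byte big-endian schema ID, then the
    Avro binary-encoded body.
    """
    if len(value) < 5:
        raise ValueError("record is too short for the Confluent wire format")
    # ">BI" = big-endian unsigned byte followed by a 4-byte unsigned int
    magic, schema_id = struct.unpack(">BI", value[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}")
    return schema_id, value[5:]


# Usage: a record referencing schema ID 42 with a 3-byte Avro body.
record = bytes([0, 0, 0, 0, 42]) + b"\x02\x06\x08"
print(split_confluent_record(record))  # (42, b'\x02\x06\x08')
```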

Currently, Kafka records can only be decoded with Avro schemas supplied as files.

I'd like to propose adding Schema Registry support to the Avro decoder.
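One way a registry-aware decoder could resolve schemas is to look up each schema ID once and cache the result. The sketch below is a hypothetical helper, not Trino code: `fetch_schema_from_registry` calls the Schema Registry REST endpoint `GET /schemas/ids/{id}`, whose JSON response has a `schema` field, while `CachingSchemaLookup` and the registry URL in the usage comment are assumptions for illustration. It assumes, as Confluent documents, that a schema ID always maps to the same schema, so cached entries never need invalidation.

```python
import json
import urllib.request
from typing import Callable, Dict


def fetch_schema_from_registry(base_url: str, schema_id: int) -> str:
    """Fetch a schema string by ID via GET {base_url}/schemas/ids/{schema_id}."""
    with urllib.request.urlopen(f"{base_url}/schemas/ids/{schema_id}") as response:
        return json.load(response)["schema"]


class CachingSchemaLookup:
    """Hypothetical decoder-side helper: resolve each schema ID once, then reuse it."""

    def __init__(self, fetch: Callable[[int], str]):
        self._fetch = fetch  # any function mapping a schema ID to a schema string
        self._cache: Dict[int, str] = {}

    def schema_for(self, schema_id: int) -> str:
        # Only hit the registry on a cache miss.
        if schema_id not in self._cache:
            self._cache[schema_id] = self._fetch(schema_id)
        return self._cache[schema_id]


# Usage against a running registry (the URL is an assumption):
# lookup = CachingSchemaLookup(lambda i: fetch_schema_from_registry("http://localhost:8081", i))
# writer_schema = lookup.schema_for(42)
```

Injecting the fetch function keeps the cache independent of the transport, so it can be exercised without a live registry.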

Work items:

martint commented 4 years ago

@elonazoulay, you've been looking into this, no?

elonazoulay commented 4 years ago

Yep, we will contribute what we have shortly. We also have a use case for this: we use the Schema Registry to supply metadata, and we publish keys and values using the String and Avro Kafka deserializers from Confluent.

@jeqo, this sounds very similar to our use case. Once we put what we have up, it would be great to collaborate on it. Some parts may be very specific to our use case, and we can make them more general.

elonazoulay commented 4 years ago

@jeqo, here is the pull request: #2361. We're still cleaning up the part where the Schema Registry is used to infer the schema (i.e. without the need for JSON files).
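Inferring the schema from the registry instead of JSON table description files essentially means turning the fetched Avro record schema into a column list. The sketch below assumes a flat top-level record with primitive fields; `infer_columns` and the `AVRO_TO_TRINO` type mapping are illustrative assumptions, not the mapping used in #2361 or in Trino's Avro decoder.

```python
import json
from typing import List, Tuple

# Assumed mapping from Avro primitive types to Trino SQL types (illustrative only).
AVRO_TO_TRINO = {
    "boolean": "BOOLEAN",
    "int": "INTEGER",
    "long": "BIGINT",
    "float": "REAL",
    "double": "DOUBLE",
    "string": "VARCHAR",
    "bytes": "VARBINARY",
}


def infer_columns(schema_json: str) -> List[Tuple[str, str]]:
    """Derive (column_name, sql_type) pairs from a flat top-level Avro record schema."""
    schema = json.loads(schema_json)
    if schema.get("type") != "record":
        raise ValueError("expected a top-level Avro record schema")
    columns = []
    for field in schema["fields"]:
        avro_type = field["type"]
        # Treat the common nullable pattern ["null", T] as plain T;
        # SQL columns are nullable anyway.
        if isinstance(avro_type, list):
            non_null = [t for t in avro_type if t != "null"]
            if len(non_null) != 1:
                raise ValueError(f"unsupported union for field {field['name']}")
            avro_type = non_null[0]
        # Nested records, arrays, maps, etc. are out of scope for this sketch.
        if not isinstance(avro_type, str) or avro_type not in AVRO_TO_TRINO:
            raise ValueError(f"unsupported type for field {field['name']}: {avro_type}")
        columns.append((field["name"], AVRO_TO_TRINO[avro_type]))
    return columns


order_schema = (
    '{"type": "record", "name": "Order", "fields": ['
    '{"name": "id", "type": "long"}, {"name": "note", "type": ["null", "string"]}]}'
)
print(infer_columns(order_schema))  # [('id', 'BIGINT'), ('note', 'VARCHAR')]
```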

elonazoulay commented 4 years ago

It looks like there is a lot of overlap between #2106 and #2361 :)

OneCricketeer commented 4 years ago

Link https://github.com/prestodb/presto/issues/11354

(edit) Why are there multiple repos?

zhenik commented 4 years ago

> Link prestodb/presto#11354
>
> (edit) Why are there multiple repos?

One is the Facebook distribution and the other is the community open-source distribution; the linked blog post explains it. I had the same question when I saw your issue, @cricket007.

findepi commented 4 years ago

@zhenik, the blog you linked above is just subjective commentary and IMO doesn't explain much. @cricket007, you can see the previous discussion under https://github.com/prestosql/presto/issues/380. If you have any doubts, I encourage you to reach out on our community Slack.

Let's keep the discussion here focused on the Confluent Schema Registry.

OneCricketeer commented 4 years ago

Thanks for the links!

I only opened that issue after I saw that the Kafka reader had been added.