hortonworks / registry

Schema Registry
Apache License 2.0

Add protobuf schema support #721

Open jhsenjaliya opened 4 years ago

jhsenjaliya commented 4 years ago

We should add protobuf schema support in the schema registry.

Here is what we are thinking this would include:

1) Take the protobuf IDL schema string.
2) Parse the schema using the wire lib.
3) Create a proto schema class that holds all the components of the schema.
4) Extract fields from the schema and store them as SchemaFieldInfo.
5) Provide backward, forward, both, and none compatibility.
6) Store the raw schema.
7) Provide converter functionality (XML or JSON to proto and the other way around), with a new API to request the XML or JSON representation of the stored proto schema (similar to what is mentioned in the Registry Roadmap (Integration->Convertor)).
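To make the "extract fields" step concrete, here is a minimal sketch in Python. It is not the wire lib and not the registry's code: `FieldInfo` is a hypothetical stand-in for SchemaFieldInfo, and the regex only understands flat `<type> <name> = <number>;` lines (no nested messages, maps, oneofs, or options). A real implementation would walk the parser's output instead.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldInfo:
    """Hypothetical stand-in for SchemaFieldInfo; only the shape matters here."""
    name: str
    type: str
    index: int  # the protobuf field number


# Simplified on purpose: one flat field declaration per line,
# optionally prefixed with "repeated".
FIELD_RE = re.compile(
    r"^[ \t]*(?:repeated[ \t]+)?([\w.]+)[ \t]+(\w+)[ \t]*=[ \t]*(\d+)[ \t]*;",
    re.MULTILINE,
)


def extract_fields(proto_idl: str) -> list[FieldInfo]:
    """Return one FieldInfo per field declaration found in the IDL string."""
    return [
        FieldInfo(name=name, type=ftype, index=int(number))
        for ftype, name, number in FIELD_RE.findall(proto_idl)
    ]


EXAMPLE_IDL = """
syntax = "proto3";
message User {
  string name = 1;
  int32 age = 2;
  repeated string emails = 3;
}
"""

fields = extract_fields(EXAMPLE_IDL)
for field in fields:
    print(field)
```

The `syntax` and `message` lines are skipped because they do not have the `<type> <name> = <number>;` shape.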

Required changes, as a heads-up:

1) We need to capture the index of each field in the schema_field_info table. This is relevant information for proto.
2) We need a syntaxType in the schema_metadata_info table, so that each schema type can have its own syntax validator rather than the current JSON validation for all types.
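A small sketch of why the field index matters for the compatibility checks: protobuf identifies fields on the wire by number, not by name, so a check has to be keyed on that number. The rule below is an assumption chosen for illustration (a field number that exists in both versions must keep its type); real protobuf compatibility rules are richer, and the function names are hypothetical.

```python
def changed_field_numbers(old: dict[int, str], new: dict[int, str]) -> list[int]:
    """Return field numbers present in both versions whose type differs."""
    return sorted(n for n in old.keys() & new.keys() if old[n] != new[n])


def is_compatible(old: dict[int, str], new: dict[int, str]) -> bool:
    """Simplified rule: no shared field number may change its type."""
    return not changed_field_numbers(old, new)


# Each schema version is reduced to {field number: type}.
v1 = {1: "string", 2: "int32"}
v2 = {1: "string", 2: "int32", 3: "bool"}  # adds field 3
v3 = {1: "string", 2: "string"}            # retypes field 2

print(is_compatible(v1, v2))          # adding a field passes this check
print(changed_field_numbers(v1, v3))  # field 2 is flagged
```

Because the maps are keyed by number, renaming a field does not register as a change here, which matches how the wire format treats it.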

Please provide comments if you have any.

jhsenjaliya commented 4 years ago

@shanmukhsista

michaelandrepearce commented 4 years ago

Why not store and present the schema in JSON form, e.g. for .proto files? It would then be more aligned with how we handle Avro, which uses a JSON form to store the schema, and it would be more consumable.

michaelandrepearce commented 4 years ago

There is some complexity to add on the client side to support dynamic registration by capturing the schema from the generic message. Note that both apicur.io and Confluent Schema Registry have achieved this when adding protobuf support, so they can be used as a point of reference.

We should also ensure that, as was done with Avro, the Confluent compatibility layer is extended to support protobuf.

jhsenjaliya commented 4 years ago

I agree that taking a JSON representation of proto3 (btw, this is only for proto3, I think) would simplify things. The problem is that users would need a utility to convert proto to JSON, and that is not very straightforward without going through the proto parser and a bunch of other things. Let me take a stab at this first; if it does not work out, JSON would be the easy route anyway. Thanks!

michaelandrepearce commented 4 years ago

Go look at Confluent's code, it's there. Also, there's a JSON formatter in proto ;)

jhsenjaliya commented 4 years ago

Sure, the JSON formatter is from the proto lib, so we definitely plan to use that for JSON -> proto_descriptor -> JSON. But I also wanted to provide proto_IDL -> proto_descriptor -> (proto_IDL or JSON or XML). Let me put up the PR once we have a working version; please help us with the review. Thanks
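As a rough illustration of the "descriptor -> (proto_IDL or JSON)" half of that pipeline, here is a sketch that renders one intermediate structure in two output formats. The `descriptor` dict is a hypothetical, heavily reduced stand-in for a real protobuf descriptor (one flat message, scalar fields only), and `to_json` / `to_idl` are illustrative names, not the proto lib's API.

```python
import json

# Hypothetical reduced "descriptor": a message name plus its flat fields.
descriptor = {
    "name": "User",
    "fields": [
        {"name": "name", "type": "string", "number": 1},
        {"name": "age", "type": "int32", "number": 2},
    ],
}


def to_json(desc: dict) -> str:
    """Render the descriptor as a JSON document."""
    return json.dumps(desc, indent=2, sort_keys=True)


def to_idl(desc: dict) -> str:
    """Render the descriptor back to proto3 IDL text, ordered by field number."""
    lines = ['syntax = "proto3";', "", f"message {desc['name']} {{"]
    for field in sorted(desc["fields"], key=lambda f: f["number"]):
        lines.append(f"  {field['type']} {field['name']} = {field['number']};")
    lines.append("}")
    return "\n".join(lines)


print(to_json(descriptor))
print(to_idl(descriptor))
```

Keeping a single intermediate form means each additional output format (XML, for instance) is just one more renderer.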

michaelandrepearce commented 4 years ago

Also remember that this will need client-side support. Obviously the key bit would be to auto-register a Message (as with Avro), where it gets the schema from the object. There should also be DynamicMessage support, so an app can take the schema from the registry and process the message without needing precompiled classes.
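To show what "process the message without needing precompiled classes" amounts to, here is a minimal sketch that decodes protobuf wire-format bytes using only a field-number-to-name mapping, as a client might obtain from the registry. It is not DynamicMessage itself. It handles just two wire types (varint and length-delimited) and ignores nested messages, packed fields, and negative or zigzag integers; the `schema` shape is an assumption.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def decode(buf: bytes, schema: dict[int, tuple[str, str]]) -> dict:
    """Decode a flat message; schema maps field number -> (name, type)."""
    out = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (string, bytes, ...)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in schema:  # fields unknown to the schema are skipped
            name, ftype = schema[number]
            if ftype == "string" and isinstance(value, bytes):
                value = value.decode("utf-8")
            out[name] = value
    return out


# Schema as a client might fetch it at runtime (hypothetical shape).
schema = {1: ("name", "string"), 2: ("age", "int32")}

# Field 1 (length-delimited) = "Ann", field 2 (varint) = 150.
payload = bytes([0x0A, 0x03]) + b"Ann" + bytes([0x10, 0x96, 0x01])
message = decode(payload, schema)
print(message)
```

Each key on the wire is `(field_number << 3) | wire_type`, which is why the decoder needs only the numbers from the schema and never the generated classes.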

michaelandrepearce commented 4 years ago

Lastly, something to think about: it's actually possible to convert protobuf to Avro (and likewise the schema), so it would be good to support that in the registry.

E.g. a use case would be a microservice world predominantly using protobuf (e.g. gRPC or classical messaging), where you then want to ingest all of that into Kafka and a larger data system that is all working in Avro.
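A sketch of what the schema half of such a conversion could look like for a flat message. The type mapping is an assumption made for illustration (for example, Avro has no unsigned types, so `uint32` is widened to `long` here), and nested messages, enums, maps, and oneofs are left out entirely.

```python
import json

# Assumed scalar mapping; a real converter would need to settle these choices.
PROTO_TO_AVRO = {
    "double": "double",
    "float": "float",
    "int32": "int",
    "sint32": "int",
    "sfixed32": "int",
    "int64": "long",
    "sint64": "long",
    "sfixed64": "long",
    "uint32": "long",  # no unsigned types in Avro, so widen
    "bool": "boolean",
    "string": "string",
    "bytes": "bytes",
}


def to_avro_schema(message_name: str, fields: list[tuple[str, str, bool]]) -> dict:
    """Build an Avro record schema from (proto type, field name, repeated) tuples."""
    avro_fields = []
    for proto_type, name, repeated in fields:
        avro_type = PROTO_TO_AVRO[proto_type]
        if repeated:  # proto "repeated" maps naturally to an Avro array
            avro_type = {"type": "array", "items": avro_type}
        avro_fields.append({"name": name, "type": avro_type})
    return {"type": "record", "name": message_name, "fields": avro_fields}


avro_schema = to_avro_schema(
    "User",
    [("string", "name", False), ("int32", "age", False), ("string", "emails", True)],
)
print(json.dumps(avro_schema, indent=2))
```

Note that the protobuf field numbers have no counterpart in the Avro record, so a converter like this is one-way unless the numbers are kept somewhere else.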

jhsenjaliya commented 4 years ago

+1, the idea of protobuf -> Avro sounds cool. We will try that once we have the initial protobuf support we are planning (as you described in your first comment). Thanks for the help in making the feature better.