Irate-Walrus opened this issue 3 years ago
Hi @Irate-Walrus
I am sorry for the delay. Your feature request makes sense.
Why did I follow the design of `one serializer per schema`? Because `Faust` was designed around `1 topic --> 1 schema`.
> It appears `_loads` and `_dumps` must be synchronous due to Faust compatibility

Yes. I opened a PR a long time ago to make everything `async`. Unfortunately `Faust` is not maintained any more. I am part of `faust-streaming`, but I do not have enough time to contribute.
I think that this feature will be helpful, but what I would suggest is:
1. `Avro/Json` schemas should be registered beforehand in the schema server. This is a good practice.
2. The `schema_id` and the `serialization_type` must be included in the `kafka headers`. This is also a good practice.

Read the `kafka headers` and you will know which schema you have to use in order to deserialize. If we follow the ⬆️ steps, you won't need any explicit relationship between `Record` and `Schema`.
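For example, something like this (just a sketch; the header names are a convention you would pick, not something the library enforces):

```python
# Kafka headers are (str, bytes) pairs; every event carries its own
# schema id and serialization type, so the consumer never needs a
# hardcoded Record -> Schema mapping.

def make_headers(schema_id: int, serialization_type: str) -> list[tuple[str, bytes]]:
    return [
        ("schema_id", str(schema_id).encode()),
        ("serialization_type", serialization_type.encode()),
    ]


def read_headers(headers: list[tuple[str, bytes]]) -> tuple[int, str]:
    meta = dict(headers)
    return int(meta["schema_id"]), meta["serialization_type"].decode()


# Consumer side: look up the schema by id and pick the right deserializer.
schema_id, serialization_type = read_headers(make_headers(42, "avro"))
```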
Hi @marcosschroh,
Thanks for taking the time to get back to me 🎉
It is a shame that `Faust` is no longer maintained. I was using `faust-streaming` but eventually moved on when I wanted rpc-like features and `async` support.
> 1. `Avro/Json` schemas should be registered beforehand in the schema server. This is a good practice.

I totally agree. I was thinking of something more like a code-first approach here: run a CLI tool, generate the schemas, and push them up to the registry.
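Something along these lines, for example (sketch only — the subject naming and registry URL are assumptions on my part):

```python
import dataclasses

from dataclasses_avroschema import AvroModel
from schema_registry.client import SchemaRegistryClient, schema


@dataclasses.dataclass
class User(AvroModel):
    name: str
    age: int


client = SchemaRegistryClient(url="http://localhost:8081")

# Code-first: the avro schema is generated from the model and pushed
# to the registry under a subject named after the target topic.
user_schema = schema.AvroSchema(User.avro_schema())
schema_id = client.register("users-topic-value", user_schema)
print(f"registered schema id: {schema_id}")
```

A CLI tool would just loop over all the models and do this per topic.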
> 2. The `schema_id` and the `serialization_type` must be included in the `kafka headers`. This is also a good practice.

First I've heard of this, although it makes a lot of sense. I assume `Confluent` uses a magic byte, as `kafka headers` are relatively new. I haven't used their products recently, so I am unsure if they now use headers.

> you won't need any explicit relationship between `Record` and `Schema`
I was taking the perspective of something like `FastAPI`, where the `kafka message` is automatically deserialised and then parsed into the correct `AvroModel` class. Not sure this is possible without registering the `AvroModel` somehow.
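e.g. a rough sketch of the registration I have in mind (all names here are hypothetical):

```python
import dataclasses
import json

from dataclasses_avroschema import AvroModel

# Hypothetical model registry: messages are parsed into the right
# AvroModel subclass by record name, instead of wiring one serializer
# per model by hand.
MODEL_REGISTRY: dict[str, type] = {}


def register_model(cls):
    MODEL_REGISTRY[json.loads(cls.avro_schema())["name"]] = cls
    return cls


@register_model
@dataclasses.dataclass
class UserCreated(AvroModel):
    user_id: str


def parse(record_name: str, payload: dict) -> AvroModel:
    # `payload` is the already-deserialized avro data.
    return MODEL_REGISTRY[record_name](**payload)
```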
Happy for this issue to be closed if you feel like this is not within the goals/scope of this project 😊. Much appreciated.
In a `pythonic world`, using only the `AvroModel` will be enough because all the teams in your organization will use the same models, but this is not always the case, as teams can use different programming languages to talk to `kafka`. In this sense, you need a way to share metadata for events, and you do it using `kafka headers`.
As you correctly mentioned, `Confluent` has its own protocol with the `magic byte`. The reason behind it is that `kafka headers` are relatively "new": they needed a way to tell their consumers which `schema` was used to `serialize` the event, so they included it in the `payload`.
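For reference, their framing is tiny: a zero `magic byte`, then the schema id as 4 big-endian bytes, then the avro payload:

```python
import struct

MAGIC_BYTE = 0


def pack_confluent(schema_id: int, avro_payload: bytes) -> bytes:
    # Confluent wire format: magic byte 0 + 4-byte big-endian schema id + payload.
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def unpack_confluent(message: bytes) -> tuple[int, bytes]:
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("not a Confluent-framed message")
    return schema_id, message[5:]
```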
I think that using pre-registered `schemas` and the `kafka headers` is the way to go. Even if you use `AvroModel`, I would recommend sending the `schema-id` in the `headers`. I have a ticket in the backlog to add `Meta.schema_id` to `dataclasses-avroschemas`, which will help in these cases (used mainly as documentation, I guess). Also, I think we need a `Generic` serializer that will be smart enough to serialize/deserialize Confluent and non-Confluent events.
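Roughly what I have in mind for the detection part (just a sketch; the header name is up for discussion):

```python
def detect_schema_id(message: bytes, headers: list[tuple[str, bytes]]) -> int:
    """Sketch: resolve the schema id for Confluent and non-Confluent events."""
    # Non-Confluent event: the schema id travels in the kafka headers.
    meta = dict(headers or [])
    if "schema_id" in meta:
        return int(meta["schema_id"])
    # Confluent event: the payload starts with the 0 magic byte,
    # followed by the 4-byte big-endian schema id.
    if message[:1] == b"\x00":
        return int.from_bytes(message[1:5], "big")
    raise ValueError("cannot determine the schema for this event")
```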
What do you think?
> In a `pythonic world`, using only the `AvroModel` will be enough because all the teams in your organization will use the same models, but this is not always the case, as teams can use different programming languages to talk to `kafka`. In this sense, you need a way to share metadata for events, and you do it using `kafka headers`.
I concur. Obviously you will still need to let the serializer know what class it will be serializing to/from, but this can be independent of the actual pre-registered schema. It does raise the question of how to check whether the registered schema actually matches the class representation of it, although you could probably offload that to something like `pydantic`.
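e.g. a naive check like this could flag drift between the model and the registered schema (sketch; assumes `dataclasses-avroschema`'s `avro_schema()`, with the registered schema fetched via whatever client call applies):

```python
import json


def schema_matches_model(model_cls, registered_schema: str) -> bool:
    # Naive structural comparison between the schema generated from the
    # model and the schema actually registered in the schema server.
    return json.loads(model_cls.avro_schema()) == json.loads(registered_schema)
```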
> I think that using pre-registered `schemas` and the `kafka headers` is the way to go. Even if you use `AvroModel`, I would recommend sending the `schema-id` in the `headers`. I have a ticket in the backlog to add `Meta.schema_id` to `dataclasses-avroschemas`, which will help in these cases (used mainly as documentation, I guess). Also, I think we need a `Generic` serializer that will be smart enough to serialize/deserialize Confluent and non-Confluent events.
For `dataclasses-avroschemas` I approached it the same way as you and added additional information to the model's `Meta` class as a custom solution, but as you said, documenting that would help. A `Generic` serializer is a good idea, whether it detects support for `kafka headers` or is as simple as a user configuration option.
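For reference, my custom solution looked roughly like this (the extra `Meta` attributes are my own additions, not part of the library):

```python
import dataclasses

from dataclasses_avroschema import AvroModel


@dataclasses.dataclass
class UserCreated(AvroModel):
    user_id: str

    class Meta:
        namespace = "com.example.users"
        # Custom, non-library attributes read by my own serializer:
        schema_id = 42
        topic = "users"


# A generic serializer can then read the extra metadata directly:
assert UserCreated.Meta.schema_id == 42
```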
Feature Request ✨

While I may be treating the package incorrectly, it appears that, by intention, there exists one serializer per schema. I was interested in a generic serializer that took any registered `Record`/`AvroModel` subclass and attempted to serialize/deserialize it. Additionally, the ability to register a specific schema to a confluent kafka topic value/key was desirable. I quickly whipped up an example:
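In outline it looked something like this (a sketch of the idea only; apart from `register_to_topic`, the names are illustrative):

```python
import json

# Sketch: one serializer for any registered Record/AvroModel subclass,
# plus registration of a schema against a confluent topic value/key.
class GenericSerializer:
    def __init__(self, registry_client):
        self.registry_client = registry_client
        self.models = {}  # record name (from the Record namespace) -> model class

    def register_to_topic(self, model_cls, topic: str, is_key: bool = False):
        # Register the model's schema as the confluent value/key schema of
        # the topic, using the `<topic>-value` / `<topic>-key` subjects.
        self.models[json.loads(model_cls.avro_schema())["name"]] = model_cls
        subject = f"{topic}-{'key' if is_key else 'value'}"
        return self.registry_client.register(subject, model_cls.avro_schema())
```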
Using the `Record` namespace, `register_to_topic` will attempt to register the schema as a confluent topic value or key schema.

Considerations 🔍️:

- The `Record` namespace is used for model identification

Questions 🤔:

- `_loads` and `_dumps` must be synchronous due to Faust compatibility; would there be any way to use an `AsyncSchemaRegistryClient` and support async avro writers within these functions?

Notes 📝: