kafkajs / confluent-schema-registry

is a library that makes it easier to interact with the Confluent schema registry
https://www.npmjs.com/package/@kafkajs/confluent-schema-registry
MIT License

Example for schema evolution? #46


krukru commented 4 years ago

Hi there,

Looking at the docs, you reference "Compatibility and schema evolution", but I am unable to find any example on how to achieve schema evolution, which is one of the main benefits of using Avro.

For example, I would like new consumers to be able to read old producer data. Does this library support this?

For example:

V1 schema

{
    "type": "record",
    "name": "Foo",
    "fields": [
        {
            "name": "f1",
            "type": "string"
        }
    ]
}

V2 schema

{
    "type": "record",
    "name": "Foo",
    "fields": [
        {
            "name": "f1",
            "type": "string"
        },
        {
            "name": "f2",
            "type": "string",
            "default": ""
        }
    ]
}

I would like V2 consumers to see the property f2 with the value "" when reading data produced with schema V1, but this does not happen, since the message is decoded using the old schema.

Nevon commented 4 years ago

None of this really has anything to do with this library, but rather with Avro itself. What this library does is read the schema id from messages, fetch the corresponding schema from the schema registry if needed, and then decode the message using that schema. Anything to do with schema evolution is purely up to how you design your Avro schemas.

In your example, in order for the schema change to be backwards compatible, the field f2 would have to be optional. Most of the time you would make it a union of null and string, but maybe using a default value of empty string works as well - I'm not sure off the top of my head, but it's easy to try by publishing your v1 and then seeing what happens when you try to publish v2. There is a table explaining what the different compatibility modes mean here: https://docs.confluent.io/current/schema-registry/avro.html#summary
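For illustration, the union-based approach would look something like this (a sketch, not verified against the registry's compatibility checker):

```json
{
    "type": "record",
    "name": "Foo",
    "fields": [
        { "name": "f1", "type": "string" },
        { "name": "f2", "type": ["null", "string"], "default": null }
    ]
}
```

With the union, old data simply resolves f2 to null, which is unambiguously "not present" rather than an empty string that might be a legitimate value.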

Backwards compatibility in the Avro sense of the word means actually using the v2 schema to read messages produced with v1. That said, what happens in reality with this library is that you would use the v1 schema to read the old message and the v2 schema to read the new message, so f2 would be undefined whenever you read a v1 message (the key wouldn't even exist).
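To make that concrete, a small sketch using plain objects (no registry or Avro involved) of the difference between decoding with the writer's schema and resolving against a reader schema's defaults:

```javascript
// Decoding a v1-encoded message with the v1 schema yields only the v1 fields,
// so `f2` is simply absent rather than defaulted to "".
const fromV1Schema = { f1: "hello" }; // what decode() would return for a v1 message

console.log("f2" in fromV1Schema); // false: the key does not even exist
console.log(fromV1Schema.f2);      // undefined

// Reader-schema resolution (what this issue asks for) would instead fill the
// missing field from the v2 schema's default:
const v2Defaults = { f2: "" };
const resolvedAsV2 = { ...v2Defaults, ...fromV1Schema };
console.log(resolvedAsV2.f2); // ""
```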

krukru commented 4 years ago

Hi @Nevon, thanks for the quick response! I went ahead and looked at avsc (the library this project uses to process Avro), and they mention resolvers as a way to achieve schema evolution. Looking at the Schema interface in this project, it does not expose the createResolver method, but it should exist in the Schema class. Would you consider it useful to expose createResolver in Schema?

ivank commented 3 years ago

At our shop we developed a simpler implementation of a schema registry client back in the day, since this project was not yet available. It's called Castle (as in "Kafka's greatest work" :)).

Anyway, we recently encountered this issue ourselves, so we implemented reader schemas.

You guys are right, avsc does support it with createResolver, and it's pretty easy to implement. I'd be happy to give it a go and write a PR for it here too, if I had some guidance on where you think it's best to add it.

nick-zh commented 3 years ago

@ivank I would be really interested in this as well. @Nevon, could you let us know if such a PR would be accepted and how we would best approach it? I think the flexibility of reader schemas could be a great asset.

nick-zh commented 3 years ago

I just saw there is already an open PR for this: #137