kafka-ops / julie

A solution to help you build automation and gitops in your Apache Kafka deployments. The Kafka gitops!
MIT License
421 stars, 114 forks

Solution proposal for reference schema support (#282) #293

Open marnold-twilio opened 3 years ago

marnold-twilio commented 3 years ago

Is your feature request related to a problem? Please describe.
Reference schemas are not currently supported, as described in #282.

Describe the solution you'd like
I'm happy to write up the solution, but I wanted to open an issue for discussion before opening a PR. My proposed solution is as follows:

  1. Add a new key, references, to each entry in a topic's schemas section of the descriptor file. Its value is a list of file paths to the referenced schemas:

    context: "my"
    source: "dev"
    projects:
      - name: "test"
        topics:
          - name: "my.topic.1.0"
            config:
              replication.factor: "1"
              num.partitions: "1"
            schemas:
              - value.schema.file: "schemas/avro/MyOrder-with-reference.avsc"
                references:
                  - "schemas/avro/reference.avsc"
                  - "schemas/avro/reference-with-reference.avsc"

    This will be parsed and the list of references will be accessible via the Subject object.

  2. The ultimate goal is to register the reference schemas in dependency order (a reverse topological sort, so that each schema is registered only after every schema it references) and to pass in the references when registering the schemas in SchemaRegistryManager.save.
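To make step 1 more concrete, here is a minimal Python sketch of how a loader could pick up the proposed references key. It assumes the descriptor has already been parsed from YAML into plain dicts and lists. SchemaEntry and collect_schema_entries are hypothetical names standing in for julie's actual Subject handling (julie itself is written in Java), so treat this as an illustration of the data flow, not the real parser:

```python
from dataclasses import dataclass, field


@dataclass
class SchemaEntry:
    """Hypothetical stand-in for julie's Subject: one item of a topic's schemas list."""

    topic: str
    value_schema_file: str
    # Paths of the referenced schemas; empty when the references key is absent.
    references: list[str] = field(default_factory=list)


def collect_schema_entries(descriptor: dict) -> list[SchemaEntry]:
    """Walk projects -> topics -> schemas and collect each value schema
    together with the optional references list proposed in step 1."""
    entries = []
    for project in descriptor.get("projects", []):
        for topic in project.get("topics", []):
            for schema in topic.get("schemas", []):
                entries.append(
                    SchemaEntry(
                        topic=topic["name"],
                        value_schema_file=schema["value.schema.file"],
                        references=list(schema.get("references", [])),
                    )
                )
    return entries


if __name__ == "__main__":
    # The descriptor from step 1, as a YAML parser would return it.
    descriptor = {
        "context": "my",
        "source": "dev",
        "projects": [{
            "name": "test",
            "topics": [{
                "name": "my.topic.1.0",
                "config": {"replication.factor": "1", "num.partitions": "1"},
                "schemas": [{
                    "value.schema.file": "schemas/avro/MyOrder-with-reference.avsc",
                    "references": [
                        "schemas/avro/reference.avsc",
                        "schemas/avro/reference-with-reference.avsc",
                    ],
                }],
            }],
        }],
    }
    for entry in collect_schema_entries(descriptor):
        print(entry.value_schema_file, "->", entry.references)
```

Treating a missing references key as an empty list keeps existing descriptors, which have no references, valid without changes.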

Let's say we have schema A, which references schema B, and schema B references schemas C and D. To register the schemas correctly, we can build a layered dependency list: [[C, D], [B], [A]]. C and D are registered first and their versions are noted. At that point we can register B, passing the SchemaReference objects for C and D to schemaRegistryClient.parseSchema. The same process applies when registering schema A.
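The layering described above can be sketched with Kahn's algorithm, emitting one layer at a time. This is a minimal Python illustration under stated assumptions: InMemoryRegistry is a hypothetical stand-in for a Schema Registry client (it is not the Confluent API), each schema's subject and reference name are simply the schema's own name, and Reference only mirrors the name/subject/version triple that a Schema Registry reference carries. A real setup would derive subjects and reference names separately (for Avro, the reference name is typically the referenced type's full name):

```python
from collections import defaultdict
from dataclasses import dataclass


def registration_layers(references: dict[str, list[str]]) -> list[list[str]]:
    """Kahn's algorithm, one layer at a time: each layer holds the schemas whose
    references all sit in earlier layers, so layers can be registered in order."""
    nodes = set(references)
    for refs in references.values():
        nodes.update(refs)  # include schemas that only appear as references

    # pending[s]: how many of s's distinct references are not yet placed in a layer
    pending = {n: len(set(references.get(n, []))) for n in nodes}
    dependents = defaultdict(set)  # ref -> schemas that reference it
    for schema, refs in references.items():
        for ref in set(refs):
            dependents[ref].add(schema)

    layers = []
    ready = sorted(n for n, count in pending.items() if count == 0)
    while ready:
        layers.append(ready)
        unlocked = []
        for done in ready:
            for dependent in dependents[done]:
                pending[dependent] -= 1
                if pending[dependent] == 0:
                    unlocked.append(dependent)
        ready = sorted(unlocked)

    # Anything never placed is part of (or depends on) a reference cycle.
    if sum(len(layer) for layer in layers) != len(nodes):
        raise ValueError("cyclic schema references; no valid registration order")
    return layers


@dataclass(frozen=True)
class Reference:
    """Mirrors the name/subject/version triple a Schema Registry reference carries."""
    name: str
    subject: str
    version: int


class InMemoryRegistry:
    """Hypothetical stand-in for a Schema Registry client. Unlike a real registry,
    it does not deduplicate identical schemas: every call adds a new version."""

    def __init__(self):
        self.versions = defaultdict(list)

    def register(self, subject, schema_text, references):
        self.versions[subject].append((schema_text, tuple(references)))
        return len(self.versions[subject])  # versions start at 1


def register_in_order(references, schema_texts, registry):
    """Register every schema after the schemas it references, passing the
    collected Reference objects along. Assumes subject name == schema name."""
    registered = {}
    for layer in registration_layers(references):
        for name in layer:
            refs = [registered[ref] for ref in references.get(name, [])]
            version = registry.register(name, schema_texts[name], refs)
            registered[name] = Reference(name=name, subject=name, version=version)
    return registered
```

For the A/B/C/D example, registration_layers({"A": ["B"], "B": ["C", "D"]}) returns [["C", "D"], ["B"], ["A"]]. Sorting within each layer only makes the order deterministic. Schemas in the same layer do not depend on each other, so they could be registered in any order. A reference cycle raises an error instead of silently producing a partial order.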

Describe alternatives you've considered
We currently use the maven-schema-registry Maven plugin, and while it works fine, we would like to consolidate schema and topic management into the same descriptor files.

Additional context
I'm happy to add this feature and already have some of the code written. I just wanted to bring up my proposed solution for review by @purbon.

varminas commented 1 month ago

Is there any progress on this issue? At the moment I am forced to work around it by copying the same type into every schema that uses it. KTB can compile such a configuration but, as you know, the last definition of the type wins. This is bad practice: a type that should be defined once and referenced from other schemas has to be copy-pasted instead, and every change must then be repeated in all of those schemas, which makes mistakes very likely.

This was more or less acceptable for a long time, but after the Maven plugin "org.apache.avro:avro-compiler" was updated to version 1.12.0 it became a real problem: the plugin is stricter and no longer allows the same type to be redefined multiple times. So I have to use an older version of the Maven plugin to keep the schemas compatible with KTB.