quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io

Improved support for Kafka consumer only and schema-registry #28600

Open henrikcaesar opened 1 year ago

henrikcaesar commented 1 year ago

Description

I have been reading the guide https://quarkus.io/guides/kafka-schema-registry-avro, and it assumes that the producer and consumer are within the same Quarkus application. Often (perhaps most of the time?), the consumer lives in a separate Quarkus application, and that case is not covered by the guide.

Also, a Kafka consumer-only application does not work with quarkus-confluent-registry-avro out of the box. quarkus-confluent-registry-avro relies on quarkus-avro for code generation from Avro schemas, without any Maven plugin, and quarkus-avro assumes that the Avro schemas are placed in src/main/avro. But as a consumer I need to download the schemas from the registry using kafka-schema-registry-maven-plugin, typically to target/avro, and hence no code is generated. Even if I download to src/main/avro no code is generated. So, to get this to work I need to use the Avro codegen Maven plugin, as sketched below. Since the guide doesn't mention this, it leads to confusion and trial and error before getting it to work.
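A minimal sketch of such an avro-maven-plugin configuration, assuming the schemas were first downloaded to target/avro by kafka-schema-registry-maven-plugin (the plugin version and directories are illustrative):

<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.11.1</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <!-- the schema goal generates Java classes from .avsc files -->
        <goal>schema</goal>
      </goals>
      <configuration>
        <!-- illustrative: point at the directory the schemas were downloaded to -->
        <sourceDirectory>${project.build.directory}/avro</sourceDirectory>
        <outputDirectory>${project.build.directory}/generated-sources/avro</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>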

Implementation ideas

Update quarkus-avro with a configurable schema source directory? That might not work if code generation runs before the download by kafka-schema-registry-maven-plugin.

Update https://quarkus.io/guides/kafka-schema-registry-avro with a consumer-only example to help users trying to use Quarkus together with a schema registry.

quarkus-bot[bot] commented 1 year ago

/cc @alesj, @cescoffier, @ozangunalp

cescoffier commented 1 year ago

As for gRPC (and Protobuf), I would recommend sharing the Avro schema in a separate artifact. Alternatively, as proposed, you can use kafka-schema-registry-maven-plugin (but it only works with Confluent) and set the output directory to src/main/avro.
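For the first option, a minimal sketch with hypothetical coordinates for a schema-only artifact that both the producer and consumer projects depend on:

<dependencies>
  <!-- hypothetical artifact holding the shared .avsc files (or the classes generated from them) -->
  <dependency>
    <groupId>org.acme</groupId>
    <artifactId>orders-avro-schemas</artifactId>
    <version>1.0.0</version>
  </dependency>
</dependencies>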

cescoffier commented 1 year ago

I guess this should just be documented.

henrikcaesar commented 1 year ago

Alternatively, as proposed, you can use kafka-schema-registry-maven-plugin (but it only works for confluent) and set the output directory to src/main/avro.

Downloading to src/main/avro works and code is generated (I couldn't get this to work last week, but perhaps I downloaded the schemas to src/main/resources/avro by mistake).

We download the schemas from the schema registry on each build and generate code from them. Isn't it better to download to ${project.build.directory}/avro? But that doesn't work with quarkus-avro. If we have to download to src/main/avro, these files, or the whole directory, must be added to .gitignore so they aren't committed to the repository by mistake. Or is it perhaps recommended to download the files manually and actually add the schemas to the repository?

ozangunalp commented 1 year ago

@henrikcaesar I've been looking at our Avro codegen support lately. Quarkus codegen support (Avro, or Protobuf for gRPC) looks at all source directories configured by the build tool. So if you are using Maven, you can use build-helper-maven-plugin to add an additional source directory, and quarkus-avro will look at source-dir/avro for schema files to compile. For your use case you can configure kafka-schema-registry-maven-plugin to download the schemas into that directory.

A pseudo pom.xml build section should look like this:

<build>
  <plugins>
    <plugin>
      <groupId>io.confluent</groupId>
      <artifactId>kafka-schema-registry-maven-plugin</artifactId>
      <version>7.3.0</version>
      <configuration>
        <schemaRegistryUrls>
          <param>http://127.0.0.1:8081</param>
        </schemaRegistryUrls>
        <outputDirectory>${project.build.directory}/schemas/avro/</outputDirectory>
        <subjectPatterns>
          <!-- illustrative pattern: adjust to the subjects you consume -->
          <param>^.*-value$</param>
        </subjectPatterns>
      </configuration>
      <executions>
        <execution>
          <phase>generate-sources</phase>
          <goals>
            <!-- the download goal fetches the matching schemas into outputDirectory -->
            <goal>download</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>build-helper-maven-plugin</artifactId>
      <version>3.3.0</version>
      <executions>
        <execution>
          <id>add-source</id>
          <phase>generate-sources</phase>
          <goals>
            <goal>add-source</goal>
          </goals>
          <configuration>
            <sources>
              <!-- quarkus-avro scans source-dir/avro, so the schemas land in .../schemas/avro -->
              <source>${project.build.directory}/schemas/</source>
            </sources>
          </configuration>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>io.quarkus</groupId>
      <artifactId>quarkus-maven-plugin</artifactId>
      <version>${quarkus.version}</version> <!-- placeholder for your Quarkus version -->
      <extensions>true</extensions>
      <executions>
        <execution>
          <goals>
            <goal>build</goal>
            <!-- generate-code is bound to generate-sources; declaring this plugin last keeps it
                 running after the schema download and add-source executions in the same phase -->
            <goal>generate-code</goal>
            <goal>generate-code-tests</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

Just don't forget to invoke the generate-sources phase when you are in dev mode.

cescoffier commented 1 year ago

@ozangunalp should we add a section in the doc?

ozangunalp commented 1 year ago

I mentioned this (very) briefly in #29489.

ozangunalp commented 1 year ago

@cescoffier But this doc is hidden inside the Kafka schema registry guide. I think we need to create separate codegen docs for Avro and Protobuf.

henrikcaesar commented 1 year ago

@ozangunalp thanks! I also think the docs could mention this. A consumer-only example with a schema registry could cover it? But a separate docs section for the code generation could also be useful.

cescoffier commented 1 month ago

gRPC already has a dedicated guide for the codegen (https://quarkus.io/guides/grpc-generation-reference). But yes, +1 for having one for Kafka/Avro.