quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.58k stars 2.63k forks source link

Bump of Avro from 1.11.3 to 1.12.0 can results in AvroTypeException #42504

Open michalvavrik opened 1 month ago

michalvavrik commented 1 month ago

Describe the bug

With https://github.com/quarkusio/quarkus/pull/42324 I started to get exception during cache update in my app with SmallRye Reactive Messaging. I tried to check https://github.com/apache/avro/releases/tag/release-1.12.0 but I am not able to identify which commit caused this. I suppose there can be something wrong with my scheme. Additionally, there is still nothing about 1.12.0 in https://avro.apache.org/ so I don't think this is user friendly to expect our users to go one by one through Avro merge requests and try to identify what has changed.

Expected behavior

There is nothing about breaking changes in Quarkus migration guides, I think it should be documented that Avro version has changed. Suggestion how to deal with this is for extra points.

I am not able to tell whether this is actual bug or just my wrong scheme.

Actual behavior

I am getting exception and don't have expected messages:

12:12:16,841 INFO  [app] 12:12:16,536 Failed to update cache value for key: 2: org.apache.avro.AvroTypeException: Undefined schema: io.quarkus.ts.messaging.kafka.status
12:12:16,842 INFO  [app]    at org.apache.avro.util.SchemaResolver$ResolvingVisitor.visitNonTerminal(SchemaResolver.java:187)
12:12:16,843 INFO  [app]    at org.apache.avro.util.Schemas.visitNonTerminal(Schemas.java:109)
12:12:16,844 INFO  [app]    at org.apache.avro.util.Schemas.visit(Schemas.java:82)
12:12:16,845 INFO  [app]    at org.apache.avro.ParseContext.lambda$ensureSchemasAreResolved$0(ParseContext.java:301)
12:12:16,846 INFO  [app]    at java.base/java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:833)
12:12:16,848 INFO  [app]    at org.apache.avro.ParseContext.ensureSchemasAreResolved(ParseContext.java:301)
12:12:16,849 INFO  [app]    at org.apache.avro.ParseContext.resolve(ParseContext.java:324)
12:12:16,850 INFO  [app]    at org.apache.avro.Schema$Parser.parse(Schema.java:1541)
12:12:16,851 INFO  [app]    at org.apache.avro.Schema$Parser.parse(Schema.java:1515)
12:12:16,852 INFO  [app]    at io.apicurio.registry.serde.avro.AvroSchemaUtils.parse(AvroSchemaUtils.java:69)
12:12:16,853 INFO  [app]    at io.apicurio.registry.serde.avro.AvroSchemaParser.parseSchema(AvroSchemaParser.java:58)
12:12:16,854 INFO  [app]    at io.apicurio.registry.serde.avro.AvroSchemaParser.parseSchema(AvroSchemaParser.java:37)
12:12:16,855 INFO  [app]    at io.apicurio.registry.resolver.AbstractSchemaResolver.resolveSchemaWithReferences(AbstractSchemaResolver.java:220)
12:12:16,856 INFO  [app]    at io.apicurio.registry.resolver.AbstractSchemaResolver.lambda$resolveSchemaByGlobalId$1(AbstractSchemaResolver.java:188)
12:12:16,857 INFO  [app]    at io.apicurio.registry.resolver.ERCache.lambda$getValue$0(ERCache.java:201)
12:12:16,858 INFO  [app]    at io.apicurio.registry.resolver.ERCache.retry(ERCache.java:254)
12:12:16,859 INFO  [app]    at io.apicurio.registry.resolver.ERCache.getValue(ERCache.java:200)
12:12:16,860 INFO  [app]    at io.apicurio.registry.resolver.ERCache.getByGlobalId(ERCache.java:170)
12:12:16,861 INFO  [app]    at io.apicurio.registry.resolver.AbstractSchemaResolver.resolveSchemaByGlobalId(AbstractSchemaResolver.java:184)
12:12:16,862 INFO  [app]    at io.apicurio.registry.resolver.DefaultSchemaResolver.resolveSchemaByArtifactReference(DefaultSchemaResolver.java:176)
12:12:16,863 INFO  [app]    at io.apicurio.registry.serde.AbstractKafkaDeserializer.resolve(AbstractKafkaDeserializer.java:147)
12:12:16,864 INFO  [app]    at io.apicurio.registry.serde.AbstractKafkaDeserializer.readData(AbstractKafkaDeserializer.java:136)
12:12:16,865 INFO  [app]    at io.apicurio.registry.serde.AbstractKafkaDeserializer.deserialize(AbstractKafkaDeserializer.java:122)
12:12:16,865 INFO  [app]    at io.smallrye.reactive.messaging.kafka.fault.DeserializerWrapper.lambda$deserialize$1(DeserializerWrapper.java:77)
12:12:16,866 INFO  [app]    at io.smallrye.reactive.messaging.kafka.fault.DeserializerWrapper.wrapDeserialize(DeserializerWrapper.java:109)
12:12:16,867 INFO  [app]    at io.smallrye.reactive.messaging.kafka.fault.DeserializerWrapper.deserialize(DeserializerWrapper.java:77)
12:12:16,868 INFO  [app]    at org.apache.kafka.common.serialization.Deserializer.deserialize(Deserializer.java:73)
12:12:16,868 INFO  [app]    at org.apache.kafka.clients.consumer.internals.CompletedFetch.parseRecord(CompletedFetch.java:321)
12:12:16,869 INFO  [app]    at org.apache.kafka.clients.consumer.internals.CompletedFetch.fetchRecords(CompletedFetch.java:283)
12:12:16,869 INFO  [app]    at org.apache.kafka.clients.consumer.internals.FetchCollector.fetchRecords(FetchCollector.java:168)
12:12:16,870 INFO  [app]    at org.apache.kafka.clients.consumer.internals.FetchCollector.collectFetch(FetchCollector.java:134)
12:12:16,871 INFO  [app]    at org.apache.kafka.clients.consumer.internals.Fetcher.collectFetch(Fetcher.java:145)
12:12:16,871 INFO  [app]    at org.apache.kafka.clients.consumer.internals.LegacyKafkaConsumer.pollForFetches(LegacyKafkaConsumer.java:693)
12:12:16,871 INFO  [app]    at org.apache.kafka.clients.consumer.internals.LegacyKafkaConsumer.poll(LegacyKafkaConsumer.java:617)
12:12:16,872 INFO  [app]    at org.apache.kafka.clients.consumer.internals.LegacyKafkaConsumer.poll(LegacyKafkaConsumer.java:590)
12:12:16,872 INFO  [app]    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:874)
12:12:16,873 INFO  [app]    at io.smallrye.reactive.messaging.kafka.impl.ReactiveKafkaConsumer.lambda$poll$4(ReactiveKafkaConsumer.java:199)
12:12:16,873 INFO  [app]    at io.smallrye.context.impl.wrappers.SlowContextualFunction.apply(SlowContextualFunction.java:21)
12:12:16,874 INFO  [app]    at io.smallrye.mutiny.operators.uni.UniOnItemTransform$UniOnItemTransformProcessor.onItem(UniOnItemTransform.java:36)
12:12:16,874 INFO  [app]    at io.smallrye.mutiny.operators.uni.UniOperatorProcessor.onItem(UniOperatorProcessor.java:47)
12:12:16,874 INFO  [app]    at io.smallrye.mutiny.operators.uni.UniMemoizeOp.forwardTo(UniMemoizeOp.java:123)
12:12:16,875 INFO  [app]    at io.smallrye.mutiny.operators.uni.UniMemoizeOp.subscribe(UniMemoizeOp.java:67)
12:12:16,875 INFO  [app]    at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
12:12:16,876 INFO  [app]    at io.smallrye.mutiny.operators.uni.UniRunSubscribeOn.lambda$subscribe$0(UniRunSubscribeOn.java:27)
12:12:16,876 INFO  [app]    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
12:12:16,876 INFO  [app]    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
12:12:16,877 INFO  [app]    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
12:12:16,877 INFO  [app]    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
12:12:16,877 INFO  [app]    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
12:12:16,878 INFO  [app]    at java.base/java.lang.Thread.run(Thread.java:1583)

How to Reproduce?

Steps to reproduce:

git clone git@github.com:michalvavrik/quarkus-test-suite.git
cd quarkus-test-suite/messaging/kafka-strimzi-avro-reactive-messaging

# run one of
mvn clean verify -Dit.test=DevModeApicurioIT
mvn clean verify -Dit.test=StrimziKafkaAvroIT

Output of uname -a or ver

Fedora 40

Output of java -version

OpenJDK 21

Quarkus version or git rev

999-SNAPSHOT

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.9.6

Additional information

No response

quarkus-bot[bot] commented 1 month ago

/cc @alesj (kafka), @cescoffier (kafka), @ozangunalp (kafka)

michalvavrik commented 1 month ago

/cc @yrodiere

yrodiere commented 1 month ago

I tried to check https://github.com/apache/avro/releases/tag/release-1.12.0 but I am not able to identify which commit caused this

Could it be that this test was relying on unsafe/unspecified behavior?

Or it could just be a bug in Avro 1.12.

Additionally, there is still nothing about 1.12.0 in https://avro.apache.org/ so I don't think this is user friendly to expect our users to go one by one through Avro merge requests and try to identify what has changed.

That seems obvious, yes.

There is nothing about breaking changes in Quarkus migration guides, I think it should be documented that Avro version has changed

We don't really have anything actionable to document, though. I'd only add something if we can recommend users to take action.

Of course we could list every single dependency update there, just for information, but IMO that kind of thing only makes sense if it's automated.

/cc @yrodiere

I'll let @alesj / @cescoffier / @ozangunalp have a look, as the problem seems to be in reactive messaging. FWIW the build passed with the upgrade, so either yours is a very specific case, or there are missing tests in Quarkus.

michalvavrik commented 1 month ago

Could it be that this test was relying on unsafe/unspecified behavior?

I don't say it's not possible, if I knew I'd not create this issue. This status enum is generated and I use it in that application, so I'd expect that if it is generated, it can also be used.

We don't really have anything actionable to document, though. I'd only add something if we can recommend users to take action. Of course we could list every single dependency update there, just for information, but IMO that kind of thing only makes sense if it's automated.

This kinda depends on whether fault is at my scheme or not. Because if there is a breaking change caused by this bump, it must be documented.

yrodiere commented 1 month ago

We don't really have anything actionable to document, though. I'd only add something if we can recommend users to take action. Of course we could list every single dependency update there, just for information, but IMO that kind of thing only makes sense if it's automated.

This kinda depends on whether fault is at my scheme or not. Because if there is a breaking change caused by this bump, it must be documented.

Seems we agree then: we can't just mention the upgrade in the migration guide, we need a specific breaking change (and corresponding upgrade action). Let's see what everyone's investigation leads to.

ozangunalp commented 1 month ago

I don't know yet how exactly but the option apicurio.registry.dereference-schema=true resolves this issue. By the way, it is the default value since io.apicurio:apicurio-registry-serdes-avro-serde:2.5.11.Final (with https://github.com/Apicurio/apicurio-registry/issues/4463)

I don't get why dereferencing would be required with Avro 1.12.0, but it is maybe because of https://issues.apache.org/jira/browse/AVRO-3666

I'll let @carlesarnal comment.

rsvoboda commented 1 month ago

Quarkus uses 2.5.10.Final</apicurio-registry.version> - https://github.com/quarkusio/quarkus/blob/main/bom/application/pom.xml#L199

rsvoboda commented 1 month ago

We have applied workaround to our tests so the daily CI is not red because of this issue. @carlesarnal please let us know about Ozan's tip with AVRO-3666

fedinskiy commented 5 hours ago

It looks like this bug is not affecting Quarkus of the recent versions (eg 3.14), which supported by the fact, that https://issues.apache.org/jira/browse/AVRO-3666 is closed

rsvoboda commented 5 hours ago

@fedinskiy AVRO-3666 is the possible trigger / root cause

Quarkus main moved from apicurio-registry.version 2.5.10.Final to 2.6.3.Final during the time

And as hinted in https://github.com/quarkusio/quarkus/issues/42504#issuecomment-2286638391 apicurio.registry.dereference-schema=true is used since 2.5.11.Final

So the upgraded apicurio-registry is kinda hiding the issue.

@ozangunalp what to do with the issue? There was no response from @carlesarnal.

carlesarnal commented 5 hours ago

https://issues.apache.org/jira/browse/AVRO-3666

Sorry, I've been on PTO and this got lost in my inbox. I don't know why AVRO-3666 would cause the issue, I'll need to dig deeper into it. Our libraries have been tested with Avro 1.11.3, so there might be some issue behind it. I'll post here my findings.