quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.87k stars 2.71k forks source link

Bump of Avro from 1.11.3 to 1.12.0 can results in AvroTypeException #42504

Open michalvavrik opened 3 months ago

michalvavrik commented 3 months ago

Describe the bug

With https://github.com/quarkusio/quarkus/pull/42324 I started to get exception during cache update in my app with SmallRye Reactive Messaging. I tried to check https://github.com/apache/avro/releases/tag/release-1.12.0 but I am not able to identify which commit caused this. I suppose there can be something wrong with my scheme. Additionally, there is still nothing about 1.12.0 in https://avro.apache.org/ so I don't think this is user friendly to expect our users to go one by one through Avro merge requests and try to identify what has changed.

Expected behavior

There is nothing about breaking changes in Quarkus migration guides, I think it should be documented that Avro version has changed. Suggestion how to deal with this is for extra points.

I am not able to tell whether this is actual bug or just my wrong scheme.

Actual behavior

I am getting exception and don't have expected messages:

12:12:16,841 INFO  [app] 12:12:16,536 Failed to update cache value for key: 2: org.apache.avro.AvroTypeException: Undefined schema: io.quarkus.ts.messaging.kafka.status
12:12:16,842 INFO  [app]    at org.apache.avro.util.SchemaResolver$ResolvingVisitor.visitNonTerminal(SchemaResolver.java:187)
12:12:16,843 INFO  [app]    at org.apache.avro.util.Schemas.visitNonTerminal(Schemas.java:109)
12:12:16,844 INFO  [app]    at org.apache.avro.util.Schemas.visit(Schemas.java:82)
12:12:16,845 INFO  [app]    at org.apache.avro.ParseContext.lambda$ensureSchemasAreResolved$0(ParseContext.java:301)
12:12:16,846 INFO  [app]    at java.base/java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:833)
12:12:16,848 INFO  [app]    at org.apache.avro.ParseContext.ensureSchemasAreResolved(ParseContext.java:301)
12:12:16,849 INFO  [app]    at org.apache.avro.ParseContext.resolve(ParseContext.java:324)
12:12:16,850 INFO  [app]    at org.apache.avro.Schema$Parser.parse(Schema.java:1541)
12:12:16,851 INFO  [app]    at org.apache.avro.Schema$Parser.parse(Schema.java:1515)
12:12:16,852 INFO  [app]    at io.apicurio.registry.serde.avro.AvroSchemaUtils.parse(AvroSchemaUtils.java:69)
12:12:16,853 INFO  [app]    at io.apicurio.registry.serde.avro.AvroSchemaParser.parseSchema(AvroSchemaParser.java:58)
12:12:16,854 INFO  [app]    at io.apicurio.registry.serde.avro.AvroSchemaParser.parseSchema(AvroSchemaParser.java:37)
12:12:16,855 INFO  [app]    at io.apicurio.registry.resolver.AbstractSchemaResolver.resolveSchemaWithReferences(AbstractSchemaResolver.java:220)
12:12:16,856 INFO  [app]    at io.apicurio.registry.resolver.AbstractSchemaResolver.lambda$resolveSchemaByGlobalId$1(AbstractSchemaResolver.java:188)
12:12:16,857 INFO  [app]    at io.apicurio.registry.resolver.ERCache.lambda$getValue$0(ERCache.java:201)
12:12:16,858 INFO  [app]    at io.apicurio.registry.resolver.ERCache.retry(ERCache.java:254)
12:12:16,859 INFO  [app]    at io.apicurio.registry.resolver.ERCache.getValue(ERCache.java:200)
12:12:16,860 INFO  [app]    at io.apicurio.registry.resolver.ERCache.getByGlobalId(ERCache.java:170)
12:12:16,861 INFO  [app]    at io.apicurio.registry.resolver.AbstractSchemaResolver.resolveSchemaByGlobalId(AbstractSchemaResolver.java:184)
12:12:16,862 INFO  [app]    at io.apicurio.registry.resolver.DefaultSchemaResolver.resolveSchemaByArtifactReference(DefaultSchemaResolver.java:176)
12:12:16,863 INFO  [app]    at io.apicurio.registry.serde.AbstractKafkaDeserializer.resolve(AbstractKafkaDeserializer.java:147)
12:12:16,864 INFO  [app]    at io.apicurio.registry.serde.AbstractKafkaDeserializer.readData(AbstractKafkaDeserializer.java:136)
12:12:16,865 INFO  [app]    at io.apicurio.registry.serde.AbstractKafkaDeserializer.deserialize(AbstractKafkaDeserializer.java:122)
12:12:16,865 INFO  [app]    at io.smallrye.reactive.messaging.kafka.fault.DeserializerWrapper.lambda$deserialize$1(DeserializerWrapper.java:77)
12:12:16,866 INFO  [app]    at io.smallrye.reactive.messaging.kafka.fault.DeserializerWrapper.wrapDeserialize(DeserializerWrapper.java:109)
12:12:16,867 INFO  [app]    at io.smallrye.reactive.messaging.kafka.fault.DeserializerWrapper.deserialize(DeserializerWrapper.java:77)
12:12:16,868 INFO  [app]    at org.apache.kafka.common.serialization.Deserializer.deserialize(Deserializer.java:73)
12:12:16,868 INFO  [app]    at org.apache.kafka.clients.consumer.internals.CompletedFetch.parseRecord(CompletedFetch.java:321)
12:12:16,869 INFO  [app]    at org.apache.kafka.clients.consumer.internals.CompletedFetch.fetchRecords(CompletedFetch.java:283)
12:12:16,869 INFO  [app]    at org.apache.kafka.clients.consumer.internals.FetchCollector.fetchRecords(FetchCollector.java:168)
12:12:16,870 INFO  [app]    at org.apache.kafka.clients.consumer.internals.FetchCollector.collectFetch(FetchCollector.java:134)
12:12:16,871 INFO  [app]    at org.apache.kafka.clients.consumer.internals.Fetcher.collectFetch(Fetcher.java:145)
12:12:16,871 INFO  [app]    at org.apache.kafka.clients.consumer.internals.LegacyKafkaConsumer.pollForFetches(LegacyKafkaConsumer.java:693)
12:12:16,871 INFO  [app]    at org.apache.kafka.clients.consumer.internals.LegacyKafkaConsumer.poll(LegacyKafkaConsumer.java:617)
12:12:16,872 INFO  [app]    at org.apache.kafka.clients.consumer.internals.LegacyKafkaConsumer.poll(LegacyKafkaConsumer.java:590)
12:12:16,872 INFO  [app]    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:874)
12:12:16,873 INFO  [app]    at io.smallrye.reactive.messaging.kafka.impl.ReactiveKafkaConsumer.lambda$poll$4(ReactiveKafkaConsumer.java:199)
12:12:16,873 INFO  [app]    at io.smallrye.context.impl.wrappers.SlowContextualFunction.apply(SlowContextualFunction.java:21)
12:12:16,874 INFO  [app]    at io.smallrye.mutiny.operators.uni.UniOnItemTransform$UniOnItemTransformProcessor.onItem(UniOnItemTransform.java:36)
12:12:16,874 INFO  [app]    at io.smallrye.mutiny.operators.uni.UniOperatorProcessor.onItem(UniOperatorProcessor.java:47)
12:12:16,874 INFO  [app]    at io.smallrye.mutiny.operators.uni.UniMemoizeOp.forwardTo(UniMemoizeOp.java:123)
12:12:16,875 INFO  [app]    at io.smallrye.mutiny.operators.uni.UniMemoizeOp.subscribe(UniMemoizeOp.java:67)
12:12:16,875 INFO  [app]    at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
12:12:16,876 INFO  [app]    at io.smallrye.mutiny.operators.uni.UniRunSubscribeOn.lambda$subscribe$0(UniRunSubscribeOn.java:27)
12:12:16,876 INFO  [app]    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
12:12:16,876 INFO  [app]    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
12:12:16,877 INFO  [app]    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
12:12:16,877 INFO  [app]    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
12:12:16,877 INFO  [app]    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
12:12:16,878 INFO  [app]    at java.base/java.lang.Thread.run(Thread.java:1583)

How to Reproduce?

Steps to reproduce:

git clone git@github.com:michalvavrik/quarkus-test-suite.git
cd quarkus-test-suite/messaging/kafka-strimzi-avro-reactive-messaging

# run one of
mvn clean verify -Dit.test=DevModeApicurioIT
mvn clean verify -Dit.test=StrimziKafkaAvroIT

Output of uname -a or ver

Fedora 40

Output of java -version

OpenJDK 21

Quarkus version or git rev

999-SNAPSHOT

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.9.6

Additional information

No response

quarkus-bot[bot] commented 3 months ago

/cc @alesj (kafka), @cescoffier (kafka), @ozangunalp (kafka)

michalvavrik commented 3 months ago

/cc @yrodiere

yrodiere commented 3 months ago

I tried to check https://github.com/apache/avro/releases/tag/release-1.12.0 but I am not able to identify which commit caused this

Could it be that this test was relying on unsafe/unspecified behavior?

Or it could just be a bug in Avro 1.12.

Additionally, there is still nothing about 1.12.0 in https://avro.apache.org/ so I don't think this is user friendly to expect our users to go one by one through Avro merge requests and try to identify what has changed.

That seems obvious, yes.

There is nothing about breaking changes in Quarkus migration guides, I think it should be documented that Avro version has changed

We don't really have anything actionable to document, though. I'd only add something if we can recommend users to take action.

Of course we could list every single dependency update there, just for information, but IMO that kind of thing only makes sense if it's automated.

/cc @yrodiere

I'll let @alesj / @cescoffier / @ozangunalp have a look, as the problem seems to be in reactive messaging. FWIW the build passed with the upgrade, so either yours is a very specific case, or there are missing tests in Quarkus.

michalvavrik commented 3 months ago

Could it be that this test was relying on unsafe/unspecified behavior?

I don't say it's not possible, if I knew I'd not create this issue. This status enum is generated and I use it in that application, so I'd expect that if it is generated, it can also be used.

We don't really have anything actionable to document, though. I'd only add something if we can recommend users to take action. Of course we could list every single dependency update there, just for information, but IMO that kind of thing only makes sense if it's automated.

This kinda depends on whether fault is at my scheme or not. Because if there is a breaking change caused by this bump, it must be documented.

yrodiere commented 3 months ago

We don't really have anything actionable to document, though. I'd only add something if we can recommend users to take action. Of course we could list every single dependency update there, just for information, but IMO that kind of thing only makes sense if it's automated.

This kinda depends on whether fault is at my scheme or not. Because if there is a breaking change caused by this bump, it must be documented.

Seems we agree then: we can't just mention the upgrade in the migration guide, we need a specific breaking change (and corresponding upgrade action). Let's see what everyone's investigation leads to.

ozangunalp commented 3 months ago

I don't know yet how exactly but the option apicurio.registry.dereference-schema=true resolves this issue. By the way, it is the default value since io.apicurio:apicurio-registry-serdes-avro-serde:2.5.11.Final (with https://github.com/Apicurio/apicurio-registry/issues/4463)

I don't get why dereferencing would be required with Avro 1.12.0, but it is maybe because of https://issues.apache.org/jira/browse/AVRO-3666

I'll let @carlesarnal comment.

rsvoboda commented 3 months ago

Quarkus uses 2.5.10.Final</apicurio-registry.version> - https://github.com/quarkusio/quarkus/blob/main/bom/application/pom.xml#L199

rsvoboda commented 3 months ago

We have applied workaround to our tests so the daily CI is not red because of this issue. @carlesarnal please let us know about Ozan's tip with AVRO-3666

fedinskiy commented 2 months ago

It looks like this bug is not affecting Quarkus of the recent versions (eg 3.14), which supported by the fact, that https://issues.apache.org/jira/browse/AVRO-3666 is closed

rsvoboda commented 2 months ago

@fedinskiy AVRO-3666 is the possible trigger / root cause

Quarkus main moved from apicurio-registry.version 2.5.10.Final to 2.6.3.Final during the time

And as hinted in https://github.com/quarkusio/quarkus/issues/42504#issuecomment-2286638391 apicurio.registry.dereference-schema=true is used since 2.5.11.Final

So the upgraded apicurio-registry is kinda hiding the issue.

@ozangunalp what to do with the issue? There was no response from @carlesarnal.

carlesarnal commented 2 months ago

https://issues.apache.org/jira/browse/AVRO-3666

Sorry, I've been on PTO and this got lost in my inbox. I don't know why AVRO-3666 would cause the issue, I'll need to dig deeper into it. Our libraries have been tested with Avro 1.11.3, so there might be some issue behind it. I'll post here my findings.

carlesarnal commented 2 months ago

It turns out Avro changed the exception they're throwing between those two versions when a schema cannot be resolved, causing this problem. Dereferencing the schema solves the issue because it's being resolved and the whole content is being inlined by the server at deserializing time. In Avro 1.11.3, a SchemaParseException is thrown in this situation where as in Avro 1.12 a AvroTypeException is thrown. I'll add a test case on our side to cover this, but the underlying problem is that we are actually using an older version of Avro, so even if we add test cases, they would be discovered in Quarkus first if we don't align the versions.

ozangunalp commented 2 months ago

Thanks for looking into this @carlesarnal ! So this stems from having an incompatible Avro version between Quarkus and apicurio-registry-serdes-avro-serde ?

carlesarnal commented 2 months ago

Thanks for looking into this @carlesarnal ! So this stems from having an incompatible Avro version between Quarkus and apicurio-registry-serdes-avro-serde ?

That's exactly it, yes. That, and that the tests for the avro support in the schema-registry extension are probably not complex enough. We would have caught this in Apicurio Registry (I immediately got the error once I upgraded on our side), but the integration tests in Quarkus have to be extended IMO.

michalvavrik commented 1 week ago

That's exactly it, yes. That, and that the tests for the avro support in the schema-registry extension are probably not complex enough. We woul have caught this in Apicurio Registry (I immediately got the error once I upgraded on our side), but the integration tests in Quarkus have to be extended IMO.

+1

gkozyryatskyy commented 1 day ago

Hello guys! Im seeing exactly same problem after quarkus version update to 3.15.2. I try to fix and update apicurio to 2.6.5 even set APICURIO_REGISTRY_DEREFERENCE_SCHEMA: true manually, but this does not help. After reading the thread, seems like the problem exactly in new Avro 1.12.0 dependency. Any suggestion how i can fix it for quarkus 3.15.2 P.S. for example Downgrade Avro to older version?

gkozyryatskyy commented 1 day ago

I solved the problem from my side. For whoever need details regarding this, problem was:

carlesarnal commented 1 day ago

I solved the problem from my side. For whoever need details regarding this, problem was:

  • I had apicurio deployed with schemas with references
  • apicurio.registry.dereference-schema=true forces to push dereferenced schemas to apicurio, but if schemas already with references there, it is not helped
  • so I set apicurio.registry.deserializer.dereference-schema=true instead to force deserializer dereference the schemas on request
  • property should be set to every channel mp.messaging.incoming.*.apicurio.registry.deserializer.dereference-schema=true

Thanks for your comment @gkozyryatskyy. That is the only workaround for now. I'm already working on upgrading avro on our side. We'll do a new release and it will be fixed in newer versions of Quarkus I'm assuming.