quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.4k stars 2.57k forks source link

Quarkus 3 dev mode classloader issue - not working with aws crt http client #35569

Open madocx opened 10 months ago

madocx commented 10 months ago

Describe the bug

Quarkus dev mode classloader seems to have issues with the aws-crt http client that can be used with dynamodb. Everything works perfectly when compiled natively, but in dev mode get a strange QuarkusConfigFactory error. Looking at some old threads with similar errors, it seems like classloading is typically the culprit.

The issues appear to be limited to certain mutiny methods, as using "invoke" vs "call" does not produce the error.

Expected behavior

Operation returns successfully, just as it does when using netty-nio http client in dev mode, or the same as it does when using aws-crt client in native executable.

Actual behavior

Below error occurs when attempting to work with the Uni created from the aws sdk's response. Same thing occurs if using their SDKPublisher methods.

2023-08-25 11:51:44,739 ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] (vert.x-eventloop-thread-0) HTTP Request to /fruits failed, error id: 29981583-2563-4019-bac3-f820f0fdc29d-1: java.util.ServiceConfigurationError: io.smallrye.config.SmallRyeConfigFactory: io.quarkus.runtime.configuration.QuarkusConfigFactory not a subtype
    at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:589)
    at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNextService(ServiceLoader.java:1237)
    at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNext(ServiceLoader.java:1265)
    at java.base/java.util.ServiceLoader$2.hasNext(ServiceLoader.java:1300)
    at java.base/java.util.ServiceLoader$3.hasNext(ServiceLoader.java:1385)
    at io.smallrye.config.SmallRyeConfigProviderResolver.getFactoryFor(SmallRyeConfigProviderResolver.java:100)
    at io.smallrye.config.SmallRyeConfigProviderResolver.getConfig(SmallRyeConfigProviderResolver.java:76)
    at io.smallrye.config.SmallRyeConfigProviderResolver.getConfig(SmallRyeConfigProviderResolver.java:64)
    at org.eclipse.microprofile.config.ConfigProvider.getConfig(ConfigProvider.java:85)
    at io.smallrye.context.impl.DefaultValues.<init>(DefaultValues.java:41)
    at io.smallrye.context.SmallRyeContextManager.<init>(SmallRyeContextManager.java:63)
    at io.smallrye.context.SmallRyeContextManager$Builder.build(SmallRyeContextManager.java:337)
    at io.smallrye.context.SmallRyeContextManagerProvider.getContextManager(SmallRyeContextManagerProvider.java:48)
    at io.smallrye.context.SmallRyeContextManagerProvider.getContextManager(SmallRyeContextManagerProvider.java:37)
    at io.smallrye.context.SmallRyeContextManagerProvider.getManager(SmallRyeContextManagerProvider.java:97)
    at io.smallrye.context.SmallRyeThreadContext.getCurrentThreadContextOrDefaultContexts(SmallRyeThreadContext.java:160)
    at io.smallrye.mutiny.context.DefaultContextPropagationInterceptor.getThreadContext(DefaultContextPropagationInterceptor.java:12)
    at io.smallrye.mutiny.context.BaseContextPropagationInterceptor.decorate(BaseContextPropagationInterceptor.java:92)
    at io.smallrye.mutiny.infrastructure.Infrastructure.decorate(Infrastructure.java:254)
    at io.smallrye.mutiny.groups.UniOnItem.transform(UniOnItem.java:114)
    at io.smallrye.mutiny.groups.UniOnItem.lambda$call$2(UniOnItem.java:80)
    at io.smallrye.context.impl.wrappers.SlowContextualFunction.apply(SlowContextualFunction.java:21)
    at io.smallrye.mutiny.operators.uni.UniOnItemTransformToUni$UniOnItemTransformToUniProcessor.performInnerSubscription(UniOnItemTransformToUni.java:68)
    at io.smallrye.mutiny.operators.uni.UniOnItemTransformToUni$UniOnItemTransformToUniProcessor.onItem(UniOnItemTransformToUni.java:57)
    at io.smallrye.mutiny.operators.uni.builders.UniCreateFromCompletionStage$CompletionStageUniSubscription.forwardResult(UniCreateFromCompletionStage.java:63)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncApiCallMetricCollectionStage.lambda$execute$0(AsyncApiCallMetricCollectionStage.java:56)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncApiCallTimeoutTrackingStage.lambda$execute$2(AsyncApiCallTimeoutTrackingStage.java:69)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:177)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$null$0(MakeAsyncHttpRequestStage.java:105)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.completeResponseFuture(MakeAsyncHttpRequestStage.java:238)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$executeHttpRequest$3(MakeAsyncHttpRequestStage.java:163)
    at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
    at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
    at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
    at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:581)
    at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
    at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
    at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)

How to Reproduce?

Reproducer: https://github.com/madocx/quarkus-reproducer

  1. Run with mvn quarkus:dev
  2. Make request to fruits resource

To eliminate error, either change http client to netty-nio, or replace call inocation with invoke.

Output of uname -a or ver

Linux a-20b6iv5ait3pg 5.15.122-77.145.amzn2.x86_64 #1 SMP Tue Aug 1 20:49:01 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Output of java -version

openjdk version "11.0.14.1" 2022-02-08 OpenJDK Runtime Environment Temurin-11.0.14.1+1 (build 11.0.14.1+1) OpenJDK 64-Bit Server VM Temurin-11.0.14.1+1 (build 11.0.14.1+1, mixed mode)

GraalVM version (if different from Java)

No response

Quarkus version or git rev

3.2.4.Final

Build tool (ie. output of mvnw --version or gradlew --version)

apache-maven-3.9.4 (also occurs with older versions)

Additional information

No response

gsmet commented 10 months ago

@scrocquesel are you aware of this issue?

scrocquesel commented 10 months ago

@scrocquesel are you aware of this issue?

No, could it be move to the quarkiverse aws repo ?

@madocx is it a regression from a previous version ?

scrocquesel commented 10 months ago

What also fix the issue is to force running the completionStage back to the initial thread context.

    public Uni<Fruit> get(String name) {
        Key partitionKey = Key.builder().partitionValue(name).build();
        // capture context
        Context context = Vertx.currentContext();
        return Uni.createFrom().completionStage(() -> fruitTable.getItem(partitionKey))
                .emitOn(command -> context.runOnContext(x -> command.run())) // Restore the context
                .call(result -> Uni.createFrom().item(logResults(result)));
//              .invoke(result -> Uni.createFrom().item(logResults(result)));
    }

I don't have enough knowledge of how quarkus dev mode manage Thread and TCL to see what going on. But it certainly comes from the fact that aws crt is creating thread from the JNI context.

madocx commented 10 months ago

@scrocquesel I'm happy to move it there if you believe that's where it belongs. The error was coming from QuarkusConfigFactory so figured I'd start here, but if its a problem with the extension that definitely makes sense.

This code was running on 2.16.9.Final before, but was not using the awc-crt http client at that time.

scrocquesel commented 10 months ago

Could stay here, it's more related to quarkus intrinsics

bhdrk commented 10 months ago

I had the same problem working with the quarkus-mongo-client. My code was:

    getCollection().insertOne(doc)
      .map(checkDocResult(doc));

Adding the -Dmutiny.disableCallBackDecorators=true system property solves the problem. But it may cause another problem if you are working with context on mutiny.

scrocquesel commented 5 months ago

I've made some progress on this one and believe it is related to the one fixed for S3CRT by wrapping the FUTURE_COMPLETION_EXECUTOR to restore a previously captured TCCL (PR linked above). I'll explore what can be done for aws-crt in the quarkus-amazon-service extension, as it should be the same solution.

If you don't use the extension, I guess you will have to do something similar as I'm not aware of a way to fix that globally at the quarkus core level. The issue is that the completion callback is called from a JNI thread, and the classloader is not set automatically for those threads. I don't have enough knowledge on how threading works in Java regarding JNI. I know @stuartwdouglas did some magic with the default ForkJoinPool, but I guess AWS SDK is creating its own thread pool for processing the request, and we don't have control over it.