quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.53k stars 2.61k forks source link

Newer versions of rest-client-jackson start getting NoStackTraceTimeoutException after several hours of running while ignore retry parameters #42029

Open fedinskiy opened 1 month ago

fedinskiy commented 1 month ago

Describe the bug

I have an application, which retrieves data every several minutes from https://api.github.com/graphql using json client. When the app uses the latest 3.x versions, all these requests start failing with NoStackTraceTimeoutException after 30-50 hours after start up. When the app uses 3.8.5, it doesn't encounter this problem.

The client is annotated with @Retry(delay = 2, delayUnit = SECONDS), but this seems to be ignored, since the error is logged in two bursts with 10 second intervals, like this: 2024-07-20 19:09:00,910 2024-07-20 19:09:00,960 2024-07-20 19:19:00,838 2024-07-20 19:19:01,077 2024-07-20 19:29:01,043 2024-07-20 19:29:01,050 2024-07-20 19:39:01,051 2024-07-20 19:39:01,333

The application is run in Openshift. When the app is restarted by scaling to 0 and back, it runs without problems for another day.

Expected behavior

The requests do not start to fail with timeouts and use retry options for retries

Actual behavior

The app was started at 2024-07-19 08:36:51,772. There was several errors in the logs (eg 502 bad gateway), then there was this:

2024-07-20 18:39:01,614 ERROR [io.qua.sch.com.run.StatusEmitterInvoker] (executor-thread-404) Error occurred while executing task for trigger IntervalTrigger [id=1_io.quarkus.activity.GitHubOpenPrQueueService#updateOpenPrQueueInOrganization, interval=600000]: java.util.concurrent.CompletionException: jakarta.ws.rs.ProcessingException: The timeout of 30000 ms has been exceeded when getting a connection to api.github.com:443
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
    at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:874)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:887)
    at java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2325)
    at java.base/java.util.concurrent.CompletableFuture$MinimalStage.whenComplete(CompletableFuture.java:2902)
    at io.quarkus.scheduler.common.runtime.DefaultInvoker.invoke(DefaultInvoker.java:24)
    at io.quarkus.scheduler.common.runtime.StatusEmitterInvoker.invoke(StatusEmitterInvoker.java:35)
    at io.quarkus.scheduler.runtime.SimpleScheduler$ScheduledTask.doInvoke(SimpleScheduler.java:463)
    at io.quarkus.scheduler.runtime.SimpleScheduler$ScheduledTask$2.call(SimpleScheduler.java:445)
    at io.quarkus.scheduler.runtime.SimpleScheduler$ScheduledTask$2.call(SimpleScheduler.java:442)
    at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$0(ContextImpl.java:178)
    at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:279)
    at io.vertx.core.impl.ContextImpl.lambda$internalExecuteBlocking$2(ContextImpl.java:210)
    at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:604)
    at org.jboss.threads.EnhancedQueueExecutor$Task.doRunWith(EnhancedQueueExecutor.java:2516)
    at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2495)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1495)
    at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:11)
    at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:11)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: jakarta.ws.rs.ProcessingException: The timeout of 30000 ms has been exceeded when getting a connection to api.github.com:443
    at org.jboss.resteasy.reactive.client.impl.InvocationBuilderImpl.unwrap(InvocationBuilderImpl.java:212)
    at org.jboss.resteasy.reactive.client.impl.InvocationBuilderImpl.method(InvocationBuilderImpl.java:328)
    at io.quarkus.activity.graphql.GraphQLClient$$QuarkusRestClientInterface.graphql(Unknown Source)
    at io.quarkus.activity.graphql.GraphQLClient$$CDIWrapper.graphql(Unknown Source)
    at io.quarkus.activity.graphql.GraphQLClient$$CDIWrapper_Subclass.graphql$$superforward(Unknown Source)
    at io.quarkus.activity.graphql.GraphQLClient$$CDIWrapper_Subclass$$function$$1.apply(Unknown Source)
    at io.quarkus.arc.impl.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:73)
    at io.quarkus.arc.impl.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:62)
    at io.smallrye.faulttolerance.FaultToleranceInterceptor.lambda$syncFlow$3(FaultToleranceInterceptor.java:258)
    at io.smallrye.faulttolerance.core.InvocationContext.call(InvocationContext.java:19)
    at io.smallrye.faulttolerance.core.Invocation.apply(Invocation.java:29)
    at io.smallrye.faulttolerance.core.retry.Retry.doApply(Retry.java:92)
    at io.smallrye.faulttolerance.core.retry.Retry.apply(Retry.java:46)
    at io.smallrye.faulttolerance.FaultToleranceInterceptor.syncFlow(FaultToleranceInterceptor.java:260)
    at io.smallrye.faulttolerance.FaultToleranceInterceptor.intercept(FaultToleranceInterceptor.java:187)
    at io.smallrye.faulttolerance.FaultToleranceInterceptor_Bean.intercept(Unknown Source)
    at io.quarkus.arc.impl.InterceptorInvocation.invoke(InterceptorInvocation.java:42)
    at io.quarkus.arc.impl.AroundInvokeInvocationContext.perform(AroundInvokeInvocationContext.java:30)
    at io.quarkus.arc.impl.InvocationContexts.performAroundInvoke(InvocationContexts.java:27)
    at io.quarkus.activity.graphql.GraphQLClient$$CDIWrapper_Subclass.graphql(Unknown Source)
    at io.quarkus.activity.graphql.GraphQLClient$$CDIWrapper_ClientProxy.graphql(Unknown Source)
    at io.quarkus.activity.github.GitHubService.getOpenPrQueueInOrganization(GitHubService.java:175)
    at io.quarkus.activity.github.GitHubService_ClientProxy.getOpenPrQueueInOrganization(Unknown Source)
    at io.quarkus.activity.GitHubOpenPrQueueService.buildOpenPrQueueInOrganization(GitHubOpenPrQueueService.java:47)
    at io.quarkus.activity.GitHubOpenPrQueueService.updateOpenPrQueueInOrganization(GitHubOpenPrQueueService.java:29)
    at io.quarkus.activity.GitHubOpenPrQueueService_ClientProxy.updateOpenPrQueueInOrganization(Unknown Source)
    at io.quarkus.activity.GitHubOpenPrQueueService_ScheduledInvoker_updateOpenPrQueueInOrganization_e5d168e0f7b339244f388e8ce1217ce6aabfb5e7.invokeBean(Unknown Source)
    ... 16 more
Caused by: io.vertx.core.impl.NoStackTraceTimeoutException: The timeout of 30000 ms has been exceeded when getting a connection to api.github.com:443

And then, starting from 2024-07-20 18:49:00,900 and 2024-07-20 18:49:01,256 there are two of these error in the logs every 10 seconds for the next two (at the time of writing) days.

How to Reproduce?

  1. git clone -b reproducer/latest git@github.com:fedinskiy/gh-activity.git
  2. Create .env file with the following content: ACTIVITY_TOKEN=${an API token for github}
  3. Start it locally with mvn quarkus:dev For openshift:
  4. Create an Openshift secret with the following content:
    apiVersion: v1
    kind: Secret
    metadata:
    name: gh-activity-token
    stringData:
    ACTIVITY_TOKEN: your_token
  5. Deploy the app: mvn clean package -Dquarkus.openshift.deploy=true -Dquarkus.openshift.env.vars."activity-logins"=fedinskiy

Output of uname -a or ver

4.18.0-553.5.1.el8_10.x86_64

Output of java -version

17.0.7, vendor: Red Hat, Inc.

Quarkus version or git rev

3.12.3

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.9.6 (bc0240f3c744dd6b6ec2920b3cd08dcc295161ae)

Additional information

$ oc version
Client Version: 4.15.0-202404030309.p0.g62c4d45.assembly.stream.el8-62c4d45
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Kubernetes Version: v1.27.6+d54805
quarkus-bot[bot] commented 1 month ago

/cc @brunobat (opentelemetry,tracing), @cescoffier (rest-client), @geoand (jackson,kubernetes,openshift,rest-client), @gsmet (jackson), @iocanel (kubernetes,openshift), @radcortez (opentelemetry,tracing)