quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.34k stars 2.55k forks source link

Quarkus Scheduler, when calling a token-protected endpoint (keycloak) fails to refresh the token #41461

Open spirostz opened 5 days ago

spirostz commented 5 days ago

Describe the bug

Using Quarkus Scheduler, (@Scheduled). Used Latest Quarkus LTS 3.8.5

Inside scheduler method there is a RestClient calling an Token Protected API (eg. keycloak) When calling a token-protected endpoint (keycloak), fails to refresh the token after the token expiration see image.

I have FULLY reproduced the problem here, just git clone, START the application and wait 10 minutes: https://github.com/spirostz/quarkus_scheduler_token_problem

There is no problem when calling the same endpoint outside Quarkus Scheduler. Also reproduced in a test in the above repo.

image

Expected behavior

Inside scheduler method, all calls to a token-protected endpoint should never fail with 401 error

Actual behavior

Inside scheduler method, all calls to a token-protected endpoint should respond without error when

How to Reproduce?

1 Clone the https://github.com/spirostz/quarkus_scheduler_token_problem 2 Start the Application 3 Wait 10 minutes and you will see the error in the logs

Output of uname -a or ver

MINGW64_NT-10.0-22631 spirospc 3.3.3-341.x86_64 2022-01-17 11:45 UTC x86_64 Msys

Output of java -version

java version "17.0.8" 2023-07-18 LTS Java(TM) SE Runtime Environment (build 17.0.8+9-LTS-211) Java HotSpot(TM) 64-Bit Server VM (build 17.0.8+9-LTS-211, mixed mode, sharing)

Quarkus version or git rev

3.8.5

Build tool (ie. output of mvnw --version or gradlew --version)

Maven home: C:\tools\apache-maven-3.8.5-bin\apache-maven-3.8.5 Java version: 17.0.8, vendor: Oracle Corporation, runtime: C:\Program Files\Java\jdk-17 Default locale: en_US, platform encoding: Cp1252 OS name: "windows 11", version: "10.0", arch: "amd64", family: "windows"

Additional information

### logs containing the failure:

2024-06-26 13:07:48,052 INFO [org.tes.doc.DockerClientProviderStrategy] (build-42) Loaded org.testcontainers.dockerclient.NpipeSocketClientProviderStrategy from ~/.testcontainers.properties, will try it first 2024-06-26 13:07:48,262 INFO [org.tes.doc.DockerClientProviderStrategy] (build-42) Found Docker environment with local Npipe socket (npipe:////./pipe/docker_engine) 2024-06-26 13:07:48,264 INFO [org.tes.DockerClientFactory] (build-42) Docker host IP address is localhost 2024-06-26 13:07:48,279 INFO [org.tes.DockerClientFactory] (build-42) Connected to docker: Server Version: 26.1.4 API Version: 1.45 Operating System: Docker Desktop Total Memory: 11964 MB 2024-06-26 13:07:48,293 INFO [org.tes.ima.PullPolicy] (build-42) Image pull policy will be performed by: DefaultPullPolicy() 2024-06-26 13:07:48,295 INFO [org.tes.uti.ImageNameSubstitutor] (build-42) Image name substitution will be performed by: DefaultImageNameSubstitutor (composite of 'ConfigurationFileImageNameSubstitutor' and 'PrefixingImageNameSubstitutor') 2024-06-26 13:07:48,299 INFO [org.tes.DockerClientFactory] (build-42) Checking the system... 2024-06-26 13:07:48,300 INFO [org.tes.DockerClientFactory] (build-42) ?? Docker server version should be at least 1.6.0 2024-06-26 13:07:48,323 INFO [io.qua.oid.dep.dev.key.KeycloakDevServicesProcessor] (build-42) Using Quarkus powered Keycloak distribution 2024-06-26 13:07:48,343 INFO [tc.qua.io/.0.4] (build-42) Creating container for image: quay.io/keycloak/keycloak:24.0.4 2024-06-26 13:07:48,540 INFO [org.tes.uti.RegistryAuthLocator] (build-42) Credential helper/store (docker-credential-desktop) does not have credentials for quay.io 2024-06-26 13:07:48,548 INFO [tc.tes.6.0] (build-42) Creating container for image: testcontainers/ryuk:0.6.0 2024-06-26 13:07:48,620 INFO [org.tes.uti.RegistryAuthLocator] (build-42) Credential helper/store (docker-credential-desktop) does not have credentials for https://index.docker.io/v1/ 2024-06-26 13:07:48,741 INFO [tc.tes.6.0] (build-42) Container testcontainers/ryuk:0.6.0 is starting: 05f3bb0109b1548459d17c5443e581751ea5b3d17e9e79abc461924137459be0 2024-06-26 13:07:49,326 INFO [tc.tes.6.0] (build-42) Container testcontainers/ryuk:0.6.0 started in PT0.777672S 2024-06-26 13:07:49,404 INFO [tc.qua.io/.0.4] (build-42) Container quay.io/keycloak/keycloak:24.0.4 is starting: 621159e8996b1bfc69a27f226783bd47dad769e01496c13be5ed26e790461524 2024-06-26 13:08:11,118 INFO [tc.qua.io/.0.4] (build-42) Container quay.io/keycloak/keycloak:24.0.4 started in PT22.7750323S 2024-06-26 13:08:13,555 INFO [io.qua.oid.dep.dev.key.KeycloakDevServicesProcessor] (build-42) Dev Services for Keycloak started.


--/ \/ / / / | / \/ //_/ / / / / -/ // / // / |/ , / ,< / // /\ \
--_
__// |//|//||____//
2024-06-26 13:08:14,874 INFO [io.quarkus] (Quarkus Main Thread) code-with-quarkus 1.0.0-SNAPSHOT on JVM (powered by Quarkus 3.8.5) started in 28.167s. Listening on: http://localhost:8080 2024-06-26 13:08:14,876 INFO [io.quarkus] (Quarkus Main Thread) Profile dev activated. Live Coding activated. 2024-06-26 13:08:14,877 INFO [io.quarkus] (Quarkus Main Thread) Installed features: [cdi, keycloak-authorization, oidc, oidc-client, oidc-client-filter, resteasy, resteasy-client, resteasy-client-jackson, scheduler, security, smallrye-context-propagation, vertx] 2024-06-26 13:08:30,200 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:09:00,004 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:09:30,008 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:10:00,000 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:10:30,006 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:11:00,014 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:11:30,008 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:12:00,007 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:12:30,001 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:13:00,011 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:13:30,012 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:14:00,005 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:14:30,009 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:15:00,010 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:15:30,014 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:16:00,003 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:16:30,002 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:17:00,002 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:17:30,000 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:18:00,000 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:18:30,004 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:18:30,022 ERROR [io.qua.sch.com.run.StatusEmitterInvoker] (vert.x-worker-thread-1) Error occurred while executing task for trigger CronTrigger [id=1_org.acme.Schedule#schedule, cron=0/30 ?, gracePeriod=PT1S, timeZone=null]: java.util.concurrent.CompletionException: jakarta.ws.rs.WebApplicationException: Unknown error, status code 401 at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347) at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:874) at java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:887) at java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2325) at java.base/java.util.concurrent.CompletableFuture$MinimalStage.whenComplete(CompletableFuture.java:2902) at io.quarkus.scheduler.common.runtime.DefaultInvoker.invoke(DefaultInvoker.java:24) at io.quarkus.scheduler.common.runtime.StatusEmitterInvoker.invoke(StatusEmitterInvoker.java:35) at io.quarkus.scheduler.runtime.SimpleScheduler$ScheduledTask.doInvoke(SimpleScheduler.java:443) at io.quarkus.scheduler.runtime.SimpleScheduler$ScheduledTask$2.call(SimpleScheduler.java:425) at io.quarkus.scheduler.runtime.SimpleScheduler$ScheduledTask$2.call(SimpleScheduler.java:422) at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$0(ContextImpl.java:178) at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:279) at io.vertx.core.impl.ContextImpl.lambda$internalExecuteBlocking$2(ContextImpl.java:210) at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18) at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513) at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538) at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29) at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:833) Caused by: jakarta.ws.rs.WebApplicationException: Unknown error, status code 401 at org.jboss.resteasy.microprofile.client.DefaultResponseExceptionMapper.toThrowable(DefaultResponseExceptionMapper.java:42) at org.jboss.resteasy.microprofile.client.ExceptionMapping$HandlerException.mapException(ExceptionMapping.java:60) at io.quarkus.restclient.runtime.QuarkusProxyInvocationHandler.invoke(QuarkusProxyInvocationHandler.java:172) at jdk.proxy5/jdk.proxy5.$Proxy90.call(Unknown Source) at org.acme.Schedule.schedule(Schedule.java:17) at org.acme.Schedule_ClientProxy.schedule(Unknown Source) at org.acme.Schedule_ScheduledInvoker_schedule_dce0524c3f1de653f83719900424446459b7c471.invokeBean(Unknown Source) ... 15 more

2024-06-26 13:19:00,005 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:19:30,013 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client 2024-06-26 13:20:00,009 INFO [org.acm.Schedule] (vert.x-worker-thread-1) Calling client

quarkus-bot[bot] commented 5 days ago

/cc @manovotn (scheduler), @mkouba (scheduler), @pedroigor (keycloak), @sberyozkin (keycloak)

sberyozkin commented 5 days ago

@spirostz I see you use Keycloak Dev Service, with default realm properties, and I can only guess that after 10 mins, a limit on a number of times a refresh token can be used is reached, or the refresh token's lifetime may be set equal to the access token's lifetime (FYI, I don't think it is possible to directly control a RT lifetime using Keycloak admin API).

I think the best option where it can be kept under control for a long time, while using the Dev Service, is to create a custom realm with the required properties and use quarkus.keycloak.devservice.realm-path property to get it loaded.

Can you please try to update https://github.com/quarkusio/quarkus/blob/main/extensions/oidc/deployment/src/main/java/io/quarkus/oidc/deployment/devservices/keycloak/KeycloakDevServicesProcessor.java#L777, to say 100 and see if it can help, we can set it to a high enough value by default if it makes a difference. I can work with your reproducer, but I have quite a few other issues in front of me right now

spirostz commented 5 days ago

@sberyozkin Thank you for you response, I did the change through the GUI and the problem is still there.

Network data exchanged: image

Keep in mind that I used Keycloak Dev Service just to replicate the problem and keep the reproduction setup minimal (plug n play approach). The problem identified on a production environment with production ready keycloak server.

There is no problem when calling the same endpoint outside Quarkus Scheduler, so I believe is something related to scheduler.

If you find sometime to reproduce it with my repo:

sberyozkin commented 5 days ago

@spirostz OK, I see, Can you please add quarkus.keycloak.devservice.show-logs=true and check what error is reported at the Keycloak end ?

spirostz commented 5 days ago

@sberyozkin I enabled the logs quarkus.keycloak.devservices.show-logs=true but nothing logged before or after the error.

I confirmed that logs worked since I saw them during the keycloak startup eg [org.keycloak.services] (main) KC-SERVICES0050: Initializing master realm

I pushed the new property to my repo.

image

sberyozkin commented 4 days ago

@spirostz Thanks, I'll have a closer look next week. Let me ask you to check something else in the meantime...

Can you confirm, that when Scheduler runs the client, the access token is reaching the protected endpoint, before it expires ?

Can you add a refresh token time skew property, quarkus.oidc-client.refresh-token-time-skew, and set it to a value larger than the scheduler interval, for ex, if it runs every 30 secs, set the skew to 70 secs... One possibility, is that a nearly expired token is sent to the target and by the time it arrives it has expired. So this property will ensure a nearly expired token gets refreshed

spirostz commented 7 hours ago

@sberyozkin By setting the quarkus.oidc-client.refresh-token-time-skew=70s there was no error.

sberyozkin commented 3 hours ago

@spirostz Ok, thanks, in that case, do you agree we can close this issue ? OIDC client does not refresh the token which is still valid, while the refresh token time skew property is taking care of the edge cases where the token might expire immediately after it was propagated. OidcClient itself can not rewind the request chain if the error is returned, which I guess can be indirectly resolved by using Fault tolerance annotations only...