quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0

App fails to start when connection to OIDC server times out #41551

Closed by sbaeumlisberger 14 hours ago

sbaeumlisberger commented 5 days ago

Describe the bug

When the connection to the OIDC server times out on startup, the app fails to start with the following error:

2024-06-28 12:07:35,007 WARN  [io.qua.oid.run.OidcRecorder] (vert.x-eventloop-thread-1) OIDC server is not available at the 'http://idp--test:8082/auth/realms/master' URL. Please make sure it is correct. Note it has to end with a realm value if you work with Keycloak, for example: 'https://localhost:8180/auth/realms/quarkus'
2024-06-28 12:07:35,026 ERROR [io.qua.run.Application] (main) Failed to start application (with profile [prod]): java.lang.RuntimeException: Failed to start quarkus
    at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source)
    at io.quarkus.runtime.Application.start(Application.java:101)
    at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:111)
    at io.quarkus.runtime.Quarkus.run(Quarkus.java:71)
    at io.quarkus.runtime.Quarkus.run(Quarkus.java:44)
    at io.quarkus.runtime.Quarkus.run(Quarkus.java:124)
    at io.quarkus.runner.GeneratedMain.main(Unknown Source)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at io.quarkus.bootstrap.runner.QuarkusEntryPoint.doRun(QuarkusEntryPoint.java:62)
    at io.quarkus.bootstrap.runner.QuarkusEntryPoint.main(QuarkusEntryPoint.java:33)
Caused by: io.smallrye.mutiny.TimeoutException
    at io.smallrye.mutiny.operators.uni.UniBlockingAwait.await(UniBlockingAwait.java:64)
    at io.smallrye.mutiny.groups.UniAwait.atMost(UniAwait.java:65)
    at io.quarkus.oidc.runtime.OidcRecorder.createStaticTenantContext(OidcRecorder.java:173)
    at io.quarkus.oidc.runtime.OidcRecorder.setup(OidcRecorder.java:91)
    at io.quarkus.deployment.steps.OidcBuildStep$setup1958257566.deploy_0(Unknown Source)
    at io.quarkus.deployment.steps.OidcBuildStep$setup1958257566.deploy(Unknown Source)
    ... 11 more

Expected behavior

The application starts and attempts to connect during the first request. This is already the behaviour for any other connection error.

Actual behavior

No response

How to Reproduce?

Output of uname -a or ver

No response

Output of java -version

No response

Quarkus version or git rev

No response

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

quarkus-bot[bot] commented 5 days ago

/cc @pedroigor (oidc), @sberyozkin (oidc)

sberyozkin commented 5 days ago

@sbaeumlisberger Hi, I'm not sure "crashes" is the correct term; I'd prefer "fails to start". What else should it do if the connection is blocking for 10 seconds? It can't just wait indefinitely.

I'm assuming you have already set quarkus.oidc.use-blocking-dns-lookup=true.

But there is another property, quarkus.oidc.connection-timeout, which, incidentally, is set to 10 secs by default.

Can you please set it to 30 secs or so? That should resolve it.

sbaeumlisberger commented 5 days ago

You are right, it fails to start. I think the correct way would be to continue starting and attempt the connection when needed (first request). This is already the case for any other connection error.

Increasing the timeout is not an option for us. We do not want the application start to be delayed more than necessary. I will try the DNS option, but I do not think that it will solve the problem.

For context: The application is deployed on Kubernetes and the OIDC server is not required for the full functionality of the app.

sberyozkin commented 5 days ago

@sbaeumlisberger quarkus.oidc.connection-timeout is what can help. If it takes 20 secs to start, what difference does it make if, instead of waiting for the OIDC connection to complete at startup, the first request arrives while the OIDC connection is still being established and it takes another 15 secs to finish?

Perhaps, a better option for your case, given that you don't need OIDC immediately, is to use TenantConfigResolver instead of configuring it in application.properties, it will initiate a connection during the first request when the OIDC server is already available or nearly available.

Or another, similar, option is to disable the discovery and request that JWK keys are resolved at the first request (I can provide more details, if it can be of interest).
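Both options above amount to deferring the OIDC connection until the first request. The following is a minimal, JDK-only sketch of that idea and not the Quarkus OIDC API: `LazyProviderConnection`, `metadataForRequest` and the `connector` supplier are hypothetical names standing in for whatever performs the actual connection.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration: nothing is contacted at construction (startup);
// the connection is made on the first request and cached afterwards.
class LazyProviderConnection {
    private final Supplier<String> connector; // assumed: performs the real connection
    private String metadata;                  // null until the first request

    LazyProviderConnection(Supplier<String> connector) {
        this.connector = connector;
    }

    // Called per request: connects on first use, then reuses the cached result.
    synchronized String metadataForRequest() {
        if (metadata == null) {
            metadata = connector.get();
        }
        return metadata;
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        LazyProviderConnection connection = new LazyProviderConnection(() -> {
            attempts.incrementAndGet();
            return "provider-metadata";
        });
        System.out.println("attempts after startup: " + attempts.get());
        connection.metadataForRequest();
        connection.metadataForRequest();
        System.out.println("attempts after two requests: " + attempts.get());
    }
}
```

With this shape, startup time does not depend on the provider at all, at the cost of the first request paying for the connection.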

sberyozkin commented 5 days ago

@sbaeumlisberger That said, I can probably handle it exactly the same way as when a connection IO error is reported, where the connection is retried at the first request... In the meantime, the 2 options above for delaying the connection until the 1st request should do it.

sbaeumlisberger commented 5 days ago

Thank you for the fast help. I'll try the two options for delaying the connection.

sbaeumlisberger commented 2 days ago

Many thanks for the tips. The solution was a custom TenantResolver.

I still think the error handling could be improved, but I will close this issue now that we have found a good solution.

sberyozkin commented 2 days ago

@sbaeumlisberger, thanks for the confirmation. Let me re-open it though, as I'd like to check whether we can postpone the connection attempt in case of the io.smallrye.mutiny.TimeoutException, so I need an open tracker as a reminder :-)

sberyozkin commented 2 days ago

For example, if it works, then by decreasing the connection timeout to 2 or 3 secs, the startup will continue after the timeout.
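The behaviour proposed here could look roughly like the sketch below. It uses only the JDK and is not the actual OidcRecorder code: `DeferredOnTimeout`, `start` and `metadataForRequest` are hypothetical names, and the `connect` supplier stands in for the real connection attempt. Only the timeout case is handled; other failures simply propagate in this sketch.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration: try to connect at startup with a short timeout;
// on timeout, let startup continue and retry on the first request.
class DeferredOnTimeout {
    private final Supplier<CompletableFuture<String>> connect; // assumed connection attempt
    private String metadata; // null until an attempt succeeds

    DeferredOnTimeout(Supplier<CompletableFuture<String>> connect) {
        this.connect = connect;
    }

    // Returns true if connected at startup, false if the attempt timed out
    // and the connection was deferred.
    synchronized boolean start(Duration timeout)
            throws InterruptedException, ExecutionException {
        try {
            metadata = connect.get().get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // do not fail startup; retry later
        }
    }

    // Called per request: retries the connection if startup did not establish it.
    synchronized String metadataForRequest()
            throws InterruptedException, ExecutionException {
        if (metadata == null) {
            metadata = connect.get().get();
        }
        return metadata;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated provider: the first attempt never completes, later ones succeed.
        DeferredOnTimeout oidc = new DeferredOnTimeout(() ->
                calls.getAndIncrement() == 0
                        ? new CompletableFuture<String>()
                        : CompletableFuture.completedFuture("provider-metadata"));
        System.out.println("connected at startup: " + oidc.start(Duration.ofMillis(50)));
        System.out.println("first request got: " + oidc.metadataForRequest());
    }
}
```

With a short timeout, the startup delay is bounded by that timeout while a provider that is up in time is still connected eagerly.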