quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.82k stars 2.69k forks source link

java.net.Inet6AddressImpl.lookupAllHostAddr blocks the main thread when OIDC auth server is configured #31851

Closed andychoi-ibm closed 1 year ago

andychoi-ibm commented 1 year ago

Describe the bug

It seems to be hanging resolving the hostname of the configured OIDC auth server while starting the app in OpenShift that doesn't have IPv6 configured. I tried configuring the following config option but got the same result.

quarkus.native.additional-build-args=-J-Djava.net.preferIPv4Stack=true,-J-Djava.net.preferIPv4Addresses=true

The app works otherwise but this prevents the app from starting in milliseconds when running a native executable.

2023-03-15 01:48:23,811 WARN [io.ver.cor.imp.BlockedThreadChecker] (vertx-blocked-thread-checker) Thread Thread[vert.x-eventloop-thread-1,5,main] has been blocked for 3998 ms, time limit is 2000 ms: io.vertx.core.VertxException: Thread blocked at java.base@17.0.6/java.net.Inet6AddressImpl.lookupAllHostAddr(Inet6AddressImpl.java) at java.base@17.0.6/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:933) at java.base@17.0.6/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1534) at java.base@17.0.6/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:852) at java.base@17.0.6/java.net.InetAddress.getAllByName0(InetAddress.java:1524) at java.base@17.0.6/java.net.InetAddress.getAllByName(InetAddress.java:1381) at java.base@17.0.6/java.net.InetAddress.getAllByName(InetAddress.java:1305) at java.base@17.0.6/java.net.InetAddress.getByName(InetAddress.java:1255) at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:156) at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:153) at java.base@17.0.6/java.security.AccessController.executePrivileged(AccessController.java:144) at java.base@17.0.6/java.security.AccessController.doPrivileged(AccessController.java:569) at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:153) at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:41) at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:61) at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:53) at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:55) at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:31) at io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:106) at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:206) at io.netty.bootstrap.Bootstrap.doResolveAndConnect(Bootstrap.java:162) at io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:139) at io.vertx.core.net.impl.ChannelProvider.handleConnect(ChannelProvider.java:155) at io.vertx.core.net.impl.ChannelProvider.connect(ChannelProvider.java:106) at io.vertx.core.net.impl.ChannelProvider.connect(ChannelProvider.java:92) at io.vertx.core.net.impl.NetClientImpl.connectInternal2(NetClientImpl.java:273) at io.vertx.core.net.impl.NetClientImpl.lambda$connectInternal$1(NetClientImpl.java:240) at io.vertx.core.impl.future.FutureImpl$3.onSuccess(FutureImpl.java:141) at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54) at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base@17.0.6/java.lang.Thread.run(Thread.java:833) at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:775) at org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.thread.PosixPlatformThreads.pthreadStartRoutine(PosixPlatformThreads.java:203) 2023-03-15 01:48:24,927 DEBUG [io.net.han.ssl.SslHandler] (vert.x-eventloop-thread-1) [id: 0x87f1d756, L:/172.21.21.49:55900 - R:xxx.ibm.com/###.###.###.###:443] HANDSHAKEN: protocol:TLSv1.2 cipher suite:TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384

Expected behavior

The native executable starts in milliseconds.

Actual behavior

The native executable starts in 4-5 seconds.

How to Reproduce?

Steps to reproduce the behavior.

  1. Configure an app to use OIDC.
  2. Build a container that runs the app in native mode.
  3. Run the container in OpenShift that doesn't have IPv6 configured.

Output of uname -a or ver

Linux graphql-api-7f776b8c48-2hn4f 4.18.0-305.57.1.el8_4.x86_64 #1 SMP Tue Jul 5 17:55:55 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

Output of java -version

java version "17.0.5" 2022-10-18 IBM Semeru Runtime Certified Edition 17.0.5.0 (build 17.0.5+8) Eclipse OpenJ9 VM 17.0.5.0 (build openj9-0.35.0, JRE 17 Linux amd64-64-Bit Compressed References 20221018_272 (JIT enabled, AOT enabled) OpenJ9 - e04a7f6c1 OMR - 85a21674f JCL - e1263b46ebe based on jdk-17.0.5+8)

GraalVM version (if different from Java)

mandrel-java17-22.3.1.0-Final

Quarkus version or git rev

2.16.0.Final

Build tool (ie. output of mvnw --version or gradlew --version)

Maven home: /home/xxx/apache-maven-3.9.0 Java version: 17.0.5, vendor: IBM Corporation, runtime: /opt/ibm/ibm-semeru-certified-17-jdk Default locale: en_US, platform encoding: UTF-8 OS name: "linux", version: "4.18.0-425.10.1.el8_7.x86_64", arch: "amd64", family: "unix"

Additional information

No response

quarkus-bot[bot] commented 1 year ago

/cc @Karm (mandrel), @galderz (mandrel), @geoand (openshift), @iocanel (openshift), @pedroigor (oidc), @sberyozkin (oidc), @zakkak (mandrel)

sberyozkin commented 1 year ago

Hey @stuartwdouglas @cescoffier @vietj @pmlopes Can you recommend something like configuring Vert.x Client to prefer IPV4 resolution ?

sberyozkin commented 1 year ago

This is probably more like an external setup problem, as per the problem description:

It seems to be hanging resolving the hostname of the configured OIDC auth server while starting the app in OpenShift that doesn't have IPv6 configured.

zakkak commented 1 year ago

passing quarkus.native.additional-build-args=-J-Djava.net.preferIPv4Stack=true,-J-Djava.net.preferIPv4Addresses=true should work around the issue though (i.e. it should not result in java.net.Inet6AddressImpl.lookupAllHostAddr(Inet6AddressImpl.java) being invoked) .

@andychoi-ibm could you please provide a maven project that reproduces the issue?

cescoffier commented 1 year ago

@sberyozkin no much we can do. Vert.x resolves the DNS name using the default JDK API (which is blocking). If the DNS server is slow you get this. However, it should only happen once, then the resolution is cached.

There is an async DNS client in Vert.x. However, it is experimental and is disabled by default.

cescoffier commented 1 year ago

-Djava.net.preferIPv4Stack=true would force the resolution to IPv4 - however if the DNS server is slow, it won't help.

sberyozkin commented 1 year ago

Thanks @cescoffier, I'd like to keep this issue open until quarkus.native.additional-build-args=-J-Djava.net.preferIPv4Stack=true,-J-Djava.net.preferIPv4Addresses=true is cleared, as asked by Foivos

andychoi-ibm commented 1 year ago

@zakkak You can use https://github.com/quarkusio/quarkus-quickstarts/tree/main/microprofile-graphql-quickstart to reproduce the issue.

  1. Edit pom.xml to add the OIDC dependency.
    <dependency>
      <groupId>io.quarkus</groupId>
      <artifactId>quarkus-oidc</artifactId>
    </dependency>
  1. Add the following config options to application.properties.
quarkus.log.level=DEBUG
quarkus.oidc.enabled=false
quarkus.native.additional-build-args=-J-Djava.net.preferIPv4Stack=true,-J-Djava.net.preferIPv4Addresses=true

%prod.quarkus.oidc.enabled=true
%prod.quarkus.oidc.application-type=service
%prod.quarkus.oidc.discovery-enabled=false
%prod.quarkus.oidc.auth-server-url=https://quarkus.io
%prod.quarkus.oidc.jwks-path=/jwks

Note: quarkus.oidc.auth-server-url can be set to any URL.

  1. Run 'mvn clean package -Pnative' and build an image.

podman build -f src/main/docker/Dockerfile.native -t default-route-openshift-image-registry.apps..openshiftapps.com//microprofile-graphql-quickstart .

  1. Push the image to OpenShift.

podman push default-route-openshift-image-registry.apps..openshiftapps.com//microprofile-graphql-quickstart

  1. Create a deployment to start the app.

@cescoffier We're running the app in AWS ROSA and we haven't experienced DNS server slowness with other apps.

zakkak commented 1 year ago

Thank you for the reproducer @andychoi-ibm.

Regarding java.net.preferIPv4* not being respected.

java.net.InetAddress is registered for class reinitialization by GraalVM https://github.com/oracle/graal/blob/251442c6a74ef7ea93ed5239e669ed2b7fb96214/substratevm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/jdk/JNIRegistrationJavaNet.java#L59 which means that no matter what value you pass to java.net.preferIPv4* at build-time the application will check the value at runtime and use the corresponding stack.

This way we can run the same native executable on different platforms without failing if IPv6 is or isn't available on that platform etc. This also allows us to choose whether to use IPv6 or not even when supported.

So to work-around the issue using java.net.preferIPv4* you need to move it's definition from build-time to run-time, i.e.:

you may remove quarkus.native.additional-build-args=-J-Djava.net.preferIPv4Stack=true,-J-Djava.net.preferIPv4Addresses=true and pass -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Addresses=true when running the container.

andychoi-ibm commented 1 year ago

@zakkak Thank you for providing the detailed analysis and a solution! Would you be able to recommend a good option to pass the system properties to the running container?

cescoffier commented 1 year ago

You can edit the docker.native file and pass the property.

zakkak commented 1 year ago

Would you be able to recommend a good option to pass the system properties to the running container?

@andychoi-ibm No, unfortunately I am not familiar with that part, I will leave it to someone more familiar (and I will read the answer to learn something new)

You can edit the docker.native file and pass the property.

@cescoffier (a bit off topic) is this the standard way of doing this for JVM-mode as well? I would expect some consistency between the different modes, e.g., if application.properties is loaded/read when running in JVM-mode, I would expect it to get loaded in native-mode and in containerized builds.

andychoi-ibm commented 1 year ago

@zakkak @cescoffier Thank you, both! I was able to disable IPv6 by passing the properties via CMD but, unfortunately, got the same result with IPv4 so it seems Clement was right that this is caused by slow DNS resolution.

2023-03-16 21:31:53,736 WARN [io.ver.cor.imp.BlockedThreadChecker] (vertx-blocked-thread-checker) Thread Thread[vert.x-eventloop-thread-1,5,main] has been blocked for 3999 ms, time limit is 2000 ms: io.vertx.core.VertxException: Thread blocked at java.base@17.0.6/java.net.Inet4AddressImpl.lookupAllHostAddr(Inet4AddressImpl.java)

I'm not sure if this is an issue with OpenShift/RHEL but I noticed the issue doesn't occur when I add a simple init container with an echo command, but the total startup time is roughly the same as it takes time to run the init container...

I think we can close this issue as it doesn't seem to be an issue with Quarkus.

andychoi-ibm commented 1 year ago

The issue no longer occurred once we updated the OCP version to 4.10.53.

acasanova99 commented 1 year ago

This is the manual configuration for Keycloak is:

quarkus.oidc.application-type=service
quarkus.oidc.discovery-enabled=false

quarkus.oidc.token.path=/protocol/openid-connect/token
quarkus.oidc.jwks.path=/protocol/openid-connect/certs
quarkus.oidc.user.info-path=/protocol/openid-connect/userinfo
quarkus.oidc.introspection-path=/protocol/openid-connect/tokens/introspect

PD: Remember that if you are using Keycloak, the parameter: quarkus.oidc.auth-server-url should end up with your realm, i.e. <your-path>/auth/realms/<your-realm>