quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.84k stars 2.7k forks source link

Websocket client fails to connect when using native transport #41082

Open mkorman9 opened 5 months ago

mkorman9 commented 5 months ago

Describe the bug

quarkus-websockets client is unable to connect to the server endpoint when running tests with quarkus.vertx.prefer-native-transport set to true. java.io.InterruptedIOException is thrown by the connectToServer method and the suppressed exception is

java.util.concurrent.ExecutionException: java.lang.IllegalStateException: incompatible event loop type: io.netty.channel.epoll.EpollEventLoop
        at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
        at io.undertow.websockets.ServerWebSocketContainer.connectToServerInternal(ServerWebSocketContainer.java:445)
        at io.undertow.websockets.ServerWebSocketContainer.connectToServerInternal(ServerWebSocketContainer.java:411)
        at io.undertow.websockets.ServerWebSocketContainer.connectToServer(ServerWebSocketContainer.java:255)

Issue is only visible when native transport is enabled, so netty-transport-native-epoll dependency with matching platform should be present on classpath.

Expected behavior

Websockets client should be able to successfully connect to the server endpoint when running in tests

Actual behavior

connectToServer fails with java.io.InterruptedIOException

How to Reproduce?

  1. Prepare the environment running x86_64 Linux with any supported Java version
  2. Clone the minimal example that reproduces the issue - https://github.com/mkorman9/quarkus-websockets-bug
  3. Run ./gradlew build
  4. Tests should fail with:
    WebsocketTest > test() FAILED
    java.lang.RuntimeException at WebsocketTest.java:24
        Caused by: java.io.InterruptedIOException at WebsocketTest.java:21
  5. Changing quarkus.vertx.prefer-native-transport to false in application.properties fixes the issue

Github Action reproducing the issue jas been already prepared - https://github.com/mkorman9/quarkus-websockets-bug/actions/runs/9436762644/job/25991803842

Output of uname -a or ver

Linux fv-az659-258 6.5.0-1021-azure #22~22.04.1-Ubuntu SMP Tue Apr 30 16:08:18 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Output of java -version

openjdk version "21.0.3" 2024-04-16 LTS OpenJDK Runtime Environment Temurin-21.0.3+9 (build 21.0.3+9-LTS) OpenJDK 64-Bit Server VM Temurin-21.0.3+9 (build 21.0.3+9-LTS, mixed mode, sharing)

Quarkus version or git rev

3.11.0

Build tool (ie. output of mvnw --version or gradlew --version)

------------------------------------------------------------ Gradle 8.6 ------------------------------------------------------------ Build time: 2024-02-02 16:47:16 UTC Revision: d55c486870a0dc6f6278f53d21381396d0741c6e Kotlin: 1.9.20 Groovy: 3.0.17 Ant: Apache Ant(TM) version 1.10.13 compiled on January 4 2023 JVM: 21.0.3 (Eclipse Adoptium 21.0.3+9-LTS) OS: Linux 6.5.0-1021-azure amd64

Additional information

No response

quarkus-bot[bot] commented 5 months ago

/cc @geoand (kotlin), @zakkak (native-image)

geoand commented 5 months ago

Weird issue.

@cescoffier any idea what could be causing it?

Furthermore, any chance you can test this @mkorman9 with the new websockets-next?

cescoffier commented 5 months ago

Native transports are not supported in native. See https://quarkus.io/guides/vertx-reference#native-transport.

mkorman9 commented 5 months ago

@geoand websockets-next seems to work fine -> https://github.com/mkorman9/quarkus-websockets-bug/tree/next build -> https://github.com/mkorman9/quarkus-websockets-bug/actions/runs/9438594374/job/25995742353

so if it's a bug it's most likely somewhere in the "old" extension.

@cescoffier I haven't tried a native build, my example is for JVM only. I guess the bot added the native-image tag by mistake.

geoand commented 5 months ago

Thanks for checking

cescoffier commented 5 months ago

So, when enabling native transport, it replaces the event loop implementation. Somewhere in the undertow extension, this possibility has not been anticipated. I need to check in details if we can remove the downcast.

Websocket-next being on top of Vertx (without the undertow layer) does not have the problem, as Vertx is perfectly aware of the event loop implementation replacement.

cescoffier commented 5 months ago

I can't reproduce it on Mac (switched to KQueue instead of Epoll). @geoand can you try the reproducer on Linux and find the exact place where the downcast is done?

geoand commented 5 months ago

Sure thing

geoand commented 5 months ago

The problem is that io.undertow.websockets.WebsocketConnectionBuilder is passed the EventLoopGroup configured from WebsocketCoreRecorder but it's hardcoded to use .channel(NioSocketChannel.class) (in line 186).

So the easy solution would be to change WebsocketConnectionBuilder to always use a NioEventLoop

geoand commented 5 months ago

I can take a look at a proper fix later on today

geoand commented 5 months ago

https://github.com/quarkusio/quarkus-http/pull/155 is a workaround. A proper solution would mean passing the Channel class all around so it would have to break some public class and I am not sure if that is acceptable.

mkouba commented 1 month ago

The workaround proposed in https://github.com/quarkusio/quarkus-http/pull/155 was merged. What are the next steps?